  1. #1
    Join Date
    May 2013
    Posts
    2

    Regarding Resume Parser Tool

    My goal is to compare two resumes and flag them as duplicates when a given set of fields (say x, y and z) match.

    As you may know, resume styles differ. How do I recognize that a name field is a name field, so that I can store it somewhere and compare it with the same field in another resume?
    So far I have used the Interop method to get the entire document content as a string. I split that string on \t, \r and empty spaces and collect the pieces in an array. How do I get from that array to my own standard XML format like the one below? I have heard that people use natural language processing for resume parsing tools. Otherwise, kindly suggest a method or algorithm for the resume parser. My platform is the .NET Framework. Thanks in advance.


    Code:
    <CANDIDATE_FULL_NAME>CandidateName here</CANDIDATE_FULL_NAME>
    <CANDIDATE_FIRST_NAME>CandidateFirstName here</CANDIDATE_FIRST_NAME>
    <CANDIDATE_LAST_NAME>CandidateLastName here</CANDIDATE_LAST_NAME>
    <PRIMARY_EMAIL_ID>name@gmail.com</PRIMARY_EMAIL_ID>
    <PHONE_BASIC>+919720018454155</PHONE_BASIC>
    <DOB>8/2/1987</DOB>
    <STREET1></STREET1>
    <STREET2></STREET2>
    <CITY></CITY>
    <REGION></REGION>
    <COUNTRY></COUNTRY>
    <PIN></PIN>
    Last edited by BioPhysEngr; May 5th, 2013 at 02:00 AM. Reason: convert pre tags to code tags

  2. #2
    Join Date
    Feb 2011
    Location
    United States
    Posts
    1,016

    Re: Regarding Resume Parser Tool

    Fascinating question!

    However, I think trying to apply NLP to this project will transform it into a gigantic boondoggle. It's a pain to get things into a machine-readable format even when the input pattern is fairly predictable, and with free-form resumes it is probably entirely intractable for your purposes. If you're just looking to hunt down duplicates, I would try something simple: dump the data to text format (e.g. with antiword), search for the first string that looks like an e-mail address (using grep or C# Regex), and call it a duplicate if that e-mail matches any other e-mail address in your data set.
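
    A minimal sketch of the e-mail idea, written in Python for brevity (the same pattern should carry over to C# Regex). It assumes the resumes have already been dumped to plain text. The names first_email and find_duplicates and the {resume_id: text} layout are made up for illustration, and the pattern is intentionally loose, not a full address validator.

```python
import re

# Deliberately loose pattern: good enough to spot an address in resume text,
# not a complete e-mail validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def first_email(text):
    """Return the first e-mail-like string in text, lowercased, or None."""
    match = EMAIL_RE.search(text)
    return match.group(0).lower() if match else None


def find_duplicates(resumes):
    """Group resume ids by their first e-mail address.

    `resumes` is a hypothetical dict of {resume_id: plain_text}.
    Only e-mails shared by more than one resume are returned.
    """
    by_email = {}
    for resume_id, text in resumes.items():
        email = first_email(text)
        if email is not None:
            by_email.setdefault(email, []).append(resume_id)
    return {email: ids for email, ids in by_email.items() if len(ids) > 1}
```

    Lowercasing before comparing means Name@Gmail.com and name@gmail.com count as the same address. Resumes with no detectable address are simply skipped, so they would need some other check.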

    An alternative approach would be to dump the data to text format and then use a diff tool to try to quantify the difference between any two resumes. Presumably near-duplicates will have similar resume structure (maybe they added a few lines, but overall duplicates will be similar in most cases, even across time).
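
    A rough sketch of the diff idea, using Python's standard difflib module in place of an external diff tool. The function names and the 0.8 threshold are assumptions for illustration only; the threshold would need tuning against real resumes.

```python
import difflib


def normalize(text):
    """Lowercase each non-empty line and collapse runs of whitespace."""
    return [" ".join(line.split()).lower()
            for line in text.splitlines() if line.strip()]


def similarity(text_a, text_b):
    """Return a ratio in [0, 1]; 1.0 means the normalized lines are identical."""
    matcher = difflib.SequenceMatcher(
        None, normalize(text_a), normalize(text_b), autojunk=False
    )
    return matcher.ratio()


def is_near_duplicate(text_a, text_b, threshold=0.8):
    # threshold is an assumed starting point, not a recommended value.
    return similarity(text_a, text_b) >= threshold
```

    Comparing line by line (not character by character) keeps a few added lines from dragging the score down much, which fits the case where someone resubmits a slightly updated resume.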

    If you absolutely must proceed with the resume -> XML approach, this research article describes a method for doing exactly what you want, though it may be difficult for you to implement: http://acl.ldc.upenn.edu/P/P05/P05-1062.pdf

    Also, forum note: The <pre> tags don't work, but you can get the same effect by wrapping [code] and [/code] tags around areas in which you want to preserve formatting.
    Best Regards,

    BioPhysEngr
    http://blog.biophysengr.net
    --
    All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.

  3. #3
    Arjay (Moderator / EX MS MVP, Power Poster)
    Join Date
    Aug 2004
    Posts
    13,490

    Re: Regarding Resume Parser Tool

    @BioPhysEngr. I just took a look at your blog. Wow. How do you find time to post here?

  4. #4
    Join Date
    May 2013
    Posts
    2

    Re: Regarding Resume Parser Tool

    I have compared the e-mail ID and phone number, but I need your suggestions for comparing names.
