  1. #1
    Join Date
    May 2013
    Posts
    4

    c# advice - parsing text file

    Hi,

    I have a large text file, and I want to find several different sentences (strings) in it.

    What would be an efficient way to go through the file and look for different sentences?

    I could read every line and write dozens of 'if' statements to check whether each line contains one of the sentences.

    I could also use a switch statement inside a loop that goes through each line.

    However, it doesn't sound that efficient.

    I'd love to hear your suggestions.

    Thanks in advance.

  2. #2
    Join Date
    Feb 2011
    Location
    United States
    Posts
    1,016

    Re: c# advice - parsing text file

    It depends. If this is a one-off hack that just needs to work as soon as possible and that you will never use again, then either your if-else if blocks or a switch statement with hard-coded strings will work fine. But it will be (very) ugly.

    If you want to do it a little better, you should consider separating your logic from your data. Suppose you have LargeFile.txt and a list of strings whose presence you want to check, stored one string per line in ToMatch.txt. I'd do something like:

    Code:
    using System;
    using System.Collections.Generic;

    //Read the data in
    string[] largeLines = System.IO.File.ReadAllLines("LargeFile.txt");
    string[] targets = System.IO.File.ReadAllLines("ToMatch.txt");
    
    //Build a hashtable of target strings
    Dictionary<string,int> targetTable = new Dictionary<string,int>();
    foreach(string trg in targets)
    {
        targetTable[trg] = 0;  //Any value will do, we're just setting it to be present in the table at all
    }
    
    //Check each line of the large file to see if it exactly matches one of our targets
    foreach(string line in largeLines)
    {
        //If the whole line equals a target, print the line
        if( targetTable.ContainsKey(line) )
            Console.WriteLine(line);
    }
    (N.B.: Did not try to compile, might have some syntax errors - it's just illustrative).
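
Note that ContainsKey only finds lines that are exactly equal to a target. If the sentences may appear anywhere inside a line, as the original question suggests, one simple option is a nested loop with string.Contains. The following is only a minimal sketch under that assumption, not the code from the post above; the helper name FindContaining and the sample data are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns every line that contains at least one target
// as a substring (string.Contains compares ordinally and is case-sensitive).
List<string> FindContaining(IEnumerable<string> lines, IReadOnlyList<string> targets)
{
    var hits = new List<string>();
    foreach (string line in lines)
    {
        foreach (string target in targets)
        {
            if (line.Contains(target))
            {
                hits.Add(line);
                break; // one match is enough; move on to the next line
            }
        }
    }
    return hits;
}

// Made-up sample data standing in for LargeFile.txt and ToMatch.txt
string[] sampleLines = { "The quick brown fox", "jumps over", "the lazy dog" };
string[] sampleTargets = { "brown fox", "lazy" };

List<string> containing = FindContaining(sampleLines, sampleTargets);
foreach (string hit in containing)
    Console.WriteLine(hit);
```

This costs roughly (number of lines) x (number of targets) substring searches, so it gives up the hash-lookup speed of the exact-match version. For a very long target list, a multi-pattern algorithm such as Aho-Corasick would likely scale better.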

    If the file is really huge, you can iterate over it line by line with a StreamReader (in System.IO) instead of loading it all into memory:

    Code:
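    //The using block closes the reader even if an exception is thrown
    using( StreamReader r = new StreamReader("LargeFile.txt") )
    {
        string line;
        while( (line = r.ReadLine()) != null )
        {
            //Check line against targetTable here
        }
    }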
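
To show how the streaming loop and the lookup table fit together, here is a minimal, self-contained sketch. It assumes the target list is small enough to keep in memory and that only whole-line matches are wanted; the helper name FindExactMatches and the temporary demo files are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: streams largePath one line at a time and returns the
// lines that exactly equal one of the lines in targetsPath.
List<string> FindExactMatches(string largePath, string targetsPath)
{
    // Assumption: the target list fits comfortably in memory
    var targetTable = new Dictionary<string, int>();
    foreach (string trg in File.ReadLines(targetsPath))
        targetTable[trg] = 0; // the value is never read; only the key matters

    var matches = new List<string>();
    using (var reader = new StreamReader(largePath))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (targetTable.ContainsKey(line))
                matches.Add(line);
        }
    }
    return matches;
}

// Demo with temporary files standing in for LargeFile.txt and ToMatch.txt
string largeFile = Path.GetTempFileName();
string targetFile = Path.GetTempFileName();
File.WriteAllLines(largeFile, new[] { "alpha", "beta", "gamma", "beta" });
File.WriteAllLines(targetFile, new[] { "beta", "delta" });

List<string> exactMatches = FindExactMatches(largeFile, targetFile);
Console.WriteLine(string.Join(", ", exactMatches));

File.Delete(largeFile);
File.Delete(targetFile);
```

Only one line of the large file is held in memory at a time, so memory use depends on the size of the target list rather than on the size of the large file.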
    Alternatively, you can do this on the bash shell with a one liner:

    Code:
    grep -e '^FirstString$' -e '^SecondString$' -e '^ThirdString$' LargeFile.txt
    You can use grep under Windows too, but it will take a little work to install. Alternatively, Cygwin may come to the rescue: http://www.cygwin.com/

    Hope that helps!
    Best Regards,

    BioPhysEngr
    http://blog.biophysengr.net
    --
    All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.

  3. #3
    Join Date
    May 2013
    Posts
    4

    Re: c# advice - parsing text file

    Hi Bio,
    Thank you for your advice.

    Could you please explain what this loop does?

    foreach(string trg in targets)
    {
        targetTable[trg] = 0; //Any value will do, we're just setting it to be present in the table at all
    }

  4. #4
    Join Date
    Feb 2011
    Location
    United States
    Posts
    1,016

    Re: c# advice - parsing text file

    A Dictionary is a hash-table data structure that associates a key with a value. Asking for the key will return the value. Think of it a little like accessing an array, except that you don't have to use integers as the index: you can use another type, for example string.

    It turns out that it is (on average) extremely efficient to retrieve values from a Dictionary based on their key, or - equivalently - to check whether a given key exists in the Dictionary.

    A few lines later, we are doing exactly that. Namely, we are checking whether a given string is a key in the table when we call:
    Code:
    if( targetTable.ContainsKey(line) )
    The code snippet you quoted inserts the key (trg, which is one of the lines of ToMatch.txt) and associates it with a value (in this case, zero). Since all we care about here is that the key exists in the Dictionary, and not what the value is, it doesn't matter what value I assigned. I could assign 0. Or 100. Or even a random value for every key.
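
Since the values are never read, a HashSet<string>, which stores keys only, would do the same job. The sketch below is just an illustration with made-up sample strings (not the contents of ToMatch.txt), showing that both collections answer the membership question the same way.

```csharp
using System;
using System.Collections.Generic;

// Made-up sample targets for illustration
string[] targets = { "FirstString", "SecondString" };

// Dictionary version: the stored values are arbitrary and never read
var targetTable = new Dictionary<string, int>();
foreach (string trg in targets)
    targetTable[trg] = 0;

// HashSet version: stores the keys only, with no placeholder values
var targetSet = new HashSet<string>(targets);

bool inTable = targetTable.ContainsKey("SecondString");
bool inSet = targetSet.Contains("SecondString");
bool missing = targetSet.Contains("ThirdString");

Console.WriteLine($"{inTable} {inSet} {missing}"); // prints "True True False"
```

Both lookups are hash-based, so either choice keeps the per-line check fast on average; the HashSet simply makes the "only the key matters" intent explicit.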

    Make sense?
