-
May 28th, 2013, 03:03 PM
#1
c# advice - parsing text file
Hi,
I have a large text file, and I want to find a number of different sentences (strings) in it.
What would be an efficient way to go through the file and look for them?
I could read every line and write dozens of 'if' statements to check whether each line contains any of the sentences.
I could also put a switch-case statement inside a loop that goes through each line.
However, neither approach sounds very efficient.
I'd love to hear your suggestions.
Thanks in advance.
-
May 28th, 2013, 07:48 PM
#2
Re: c# advice - parsing text file
It depends. If this is a one-off hack that just needs to work as quickly as possible and that you will never use again, then either your if/else-if blocks or a switch statement with hard-coded strings will work fine. But it will be (very) ugly.
If you want to do it a little bit better, you should consider separating your logic from your data. So suppose you have LargeFile.txt and a list of strings whose presence you want to check, stored one string to a line in ToMatch.txt. I'd do something like:
Code:
//Needs: using System; using System.Collections.Generic;
//Read the data in
string[] largeLines = System.IO.File.ReadAllLines("LargeFile.txt");
string[] targets = System.IO.File.ReadAllLines("ToMatch.txt");
//Build a hashtable of target strings
Dictionary<string,int> targetTable = new Dictionary<string,int>();
foreach(string trg in targets)
{
    targetTable[trg] = 0; //Any value will do, we're just setting it to be present in the table at all
}
//Check each line of the large file to see if it matches one of our targets
foreach(string line in largeLines)
{
    //If this line is exactly equal to a target, print the line
    if( targetTable.ContainsKey(line) )
        Console.WriteLine(line);
}
(N.B.: Did not try to compile, might have some syntax errors - it's just illustrative).
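One caveat: the table lookup above only finds lines that are exactly equal to a target. If the sentences can appear inside longer lines, a lookup on the whole line will miss them. A minimal sketch of substring matching might look like the following. The class and method names (SubstringSearch, FindLinesContaining) are made up for illustration, and since it compares every target against every line it will be slower than the table lookup:

```csharp
using System;
using System.Collections.Generic;

public static class SubstringSearch
{
    // Hypothetical helper: returns every line that contains at least one
    // target as a substring (ordinal, case-sensitive comparison).
    public static List<string> FindLinesContaining(IEnumerable<string> lines, IEnumerable<string> targets)
    {
        // Copy the targets once so they can be re-enumerated for every line
        List<string> targetList = new List<string>(targets);
        List<string> matches = new List<string>();

        foreach (string line in lines)
        {
            foreach (string target in targetList)
            {
                // Skip empty targets (e.g. a blank line in ToMatch.txt),
                // because an empty string is "found" in every line
                if (target.Length > 0 && line.IndexOf(target, StringComparison.Ordinal) >= 0)
                {
                    matches.Add(line);
                    break; // one hit is enough, do not add the same line twice
                }
            }
        }
        return matches;
    }
}
```

You could call it as SubstringSearch.FindLinesContaining(largeLines, targets) with the two arrays read in above.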
If the file is really huge, you can iterate over it line by line with a StreamReader (in System.IO) instead of reading it all into memory:
Code:
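//Read one line at a time so the whole file never has to sit in memory
using (StreamReader r = new StreamReader("LargeFile.txt"))
{
    string line;
    while ((line = r.ReadLine()) != null)
    {
        //Process the line here, e.g. check targetTable.ContainsKey(line)
    }
} //Leaving the using block closes the reader, even if an exception is thrown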
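To show how the streaming loop and the table lookup fit together, here is a rough sketch. StreamingMatcher and MatchingLines are hypothetical names, not anything from the framework, and like the rest of this post it is only meant to be illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamingMatcher
{
    // Hypothetical helper: lazily yields each line from the reader that is
    // a key in targetTable. Only one line is held in memory at a time.
    public static IEnumerable<string> MatchingLines(TextReader reader, Dictionary<string, int> targetTable)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (targetTable.ContainsKey(line))
                yield return line;
        }
    }
}

// Possible usage (the reader must stay open while the results are enumerated):
//
//   using (StreamReader r = new StreamReader("LargeFile.txt"))
//   {
//       foreach (string hit in StreamingMatcher.MatchingLines(r, targetTable))
//           Console.WriteLine(hit);
//   }
```

Taking a TextReader rather than a file name is just a design choice here: it lets the same method run over a StreamReader or over a StringReader when you want to try it out on a small piece of text.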
Alternatively, you can do this in a bash shell with a one-liner:
Code:
grep -e '^FirstString$' -e '^SecondString$' -e '^ThirdString$' LargeFile.txt
You can use grep under Windows too, but it takes a little work to install. Cygwin is one way to get it: http://www.cygwin.com/
Hope that helps!
Best Regards,
BioPhysEngr
http://blog.biophysengr.net
--
All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.
-
May 29th, 2013, 01:00 PM
#3
Re: c# advice - parsing text file
Hi Bio,
Thank you for your advice.
Could you please explain what this loop does?
foreach(string trg in targets)
{
targetTable[trg] = 0; //Any value will do, we're just setting it to be present in the table at all
}
-
May 29th, 2013, 01:24 PM
#4
Re: c# advice - parsing text file
A Dictionary is a hash-table data structure that associates a key with a value. Asking for the key will return the value. Think of it a little like accessing an array, except that you don't have to use integers to index into it: you can use, for example, a string.
It turns out that it is (on average) extremely efficient to retrieve values from a Dictionary based on their key, or - equivalently - to check whether a given key exists in the Dictionary.
A few lines later, we are doing exactly that. Namely, we are checking whether a given string is a key in the table when we call:
Code:
if( targetTable.ContainsKey(line) )
The code snippet you quoted inserts the key (trg, which is one of the lines of ToMatch.txt) and associates it with a value (in this case, zero). Since all we care about here is that the key exists in the Dictionary, and not what the value is, it doesn't matter what value I assigned. I could assign 0. Or 100. Or even a random value for every key.
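Since the value is never read, the same idea can also be written with a HashSet<string> (in System.Collections.Generic), which stores keys only. This is a variation on the code in my earlier post rather than what it actually did, and the names TargetSet and Filter are made up for the example:

```csharp
using System;
using System.Collections.Generic;

public static class TargetSet
{
    // Hypothetical helper: returns the lines that are exactly equal to
    // one of the targets, using a HashSet instead of a Dictionary.
    public static List<string> Filter(IEnumerable<string> lines, IEnumerable<string> targets)
    {
        // Building the set plays the role of the foreach loop that filled
        // targetTable; duplicate targets are simply ignored
        HashSet<string> targetSet = new HashSet<string>(targets);

        List<string> matches = new List<string>();
        foreach (string line in lines)
        {
            // Contains on a HashSet corresponds to ContainsKey on a Dictionary
            if (targetSet.Contains(line))
                matches.Add(line);
        }
        return matches;
    }
}
```

Either way the lookup is a hash lookup, so which one you use is mostly a matter of taste. The HashSet version just makes it more obvious that only the keys matter.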
Make sense?
Best Regards,
BioPhysEngr