-
January 30th, 2012, 07:53 AM
#1
Reading large Text Files
Hi,
I have asked this question before but I have never solved it.
My problem is I have very large text files (approx 2 GB+).
They contain records, one per line.
Lines are not all the same length, and the fields within them can be different lengths each time.
I am currently reading the file line by line, then splitting the data by common characters in the records. Processing the full file currently takes 3 hours. This is way too slow for its purpose.
What would be the best way to achieve this?
Thanks.
-
January 30th, 2012, 09:10 AM
#2
Re: Reading large Text Files
**EDIT: Thinking about it twice, this may not be correct. A way to speed it up would be to read more than one line at a time (say 100), to decrease the number of times you must physically access the file.
If you're sure a line is a record, you could first try splitting the file into smaller pieces, so you process no more than 100 MB at a time. The split process will add overhead (I suppose it would take around 40 minutes for a 2 GB file), but the subsequent reading should be faster.
I don't know if this will really improve things, but it is worth a try.
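As a minimal sketch of the "read more per physical access" idea, assuming .NET and a hypothetical ProcessRecord routine standing in for the original parsing code: a StreamReader over a FileStream with a large buffer keeps the line-by-line loop but cuts down on disk round-trips.

```vb
Imports System.IO

Module ChunkedReader
    Sub Main()
        ' A 1 MB buffer means far fewer physical reads than the default,
        ' while the code still consumes the file one record (line) at a time.
        Using fs As New FileStream("big.txt", FileMode.Open, FileAccess.Read,
                                   FileShare.Read, 1 << 20)
            Using sr As New StreamReader(fs)
                Dim line As String = sr.ReadLine()
                While line IsNot Nothing
                    ProcessRecord(line)
                    line = sr.ReadLine()
                End While
            End Using
        End Using
    End Sub

    Sub ProcessRecord(ByVal line As String)
        ' Hypothetical placeholder for the per-record splitting/parsing.
    End Sub
End Module
```

This avoids the separate file-splitting pass entirely, so there is no 40-minute split overhead to pay back.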
Last edited by Cimperiali; January 30th, 2012 at 09:19 AM.
...at present time, using mainly Net 4.0, Vs 2010
Special thanks to Lothar "the Great" Haensler, Chris Eastwood , dr_Michael, ClearCode, Iouri and
all the other wonderful people who made and make Codeguru a great place.
Come back soon, you Gurus.
-
January 30th, 2012, 10:14 AM
#3
Re: Reading large Text Files
Hi,
Yeah, I have been thinking about splitting the file; I still have to experiment with that idea. Could running the pieces side by side in two separate threads also help?
Just as a guide: I am currently running the script as-is, and after an hour it has processed 4,109,500 records/lines.
Last edited by martind132; January 30th, 2012 at 10:22 AM.
-
January 30th, 2012, 10:33 AM
#4
Re: Reading large Text Files
You should definitely be reading the file in much larger chunks. You may also want to take a look at how you process each line: when dealing with files this large, many simple little things can cause a very noticeable speed difference.
I remember back in VB5 we had a program that processed text files, and on a 500 KB file it took 16 minutes running line by line. Each line was read and then passed as a String to a function. I changed the parameter type from String to Variant, and after that little change the process completed in 15-20 seconds.
Most things will not be that extreme, but every tick adds up when you repeat a process thousands or millions of times.
Always use [code][/code] tags when posting code.
-
January 30th, 2012, 10:44 AM
#5
Re: Reading large Text Files
Yeah, I've been running through the processing of each line and looking at what can be improved.
Some of my values have to be integers but arrive in the file as empty strings. What is the best way to turn these into zero?
I am currently using a function that checks for an empty string and returns zero if it's empty.
-
January 30th, 2012, 10:52 AM
#6
Re: Reading large Text Files
I haven't really tested various methods to compare speed, and most of my experience is in VB6, but my first thought would be to just use Val() on the string. I believe it returns 0 if the string is empty; it will also return 0 if the string contains only alphabetic characters.
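A quick sketch of the Val() behaviour described above (Microsoft.VisualBasic's Conversion.Val, which returns a Double; the input strings are just illustrative):

```vb
Imports Microsoft.VisualBasic

Module ValDemo
    Sub Main()
        ' Val() never throws on bad input; it simply yields 0
        ' for empty or non-numeric strings.
        Console.WriteLine(Val(""))     ' 0
        Console.WriteLine(Val("abc"))  ' 0
        Console.WriteLine(Val("42"))   ' 42
        Console.WriteLine(Val("12x"))  ' 12 (parses the leading digits only)
    End Sub
End Module
```

So the separate empty-string check can be dropped: the conversion and the default-to-zero behaviour come in one call.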
Always use [code][/code] tags when posting code.
-
January 30th, 2012, 11:15 AM
#7
Re: Reading large Text Files
Ah, cool. Replaced them with Val() (all 56 of them, lol).
-
January 30th, 2012, 01:03 PM
#8
Re: Reading large Text Files
It probably can be sped up further. If you show some of your code... Async methods to process all the files at once might be the answer.
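One way to sketch the "process lines concurrently" idea on .NET 4.0 (no Async/Await yet, but the Task Parallel Library is available; ProcessRecord is again a hypothetical placeholder): a single reader streams the file lazily via File.ReadLines while Parallel.ForEach fans the per-line work out across cores.

```vb
Imports System.IO
Imports System.Threading.Tasks

Module ParallelReader
    Sub Main()
        ' File.ReadLines is lazy, so the 2 GB file is never held
        ' in memory at once; only the parsing is parallelised.
        Parallel.ForEach(File.ReadLines("big.txt"),
                         Sub(line) ProcessRecord(line))
    End Sub

    Sub ProcessRecord(ByVal line As String)
        ' Per-record parsing goes here. It must be thread-safe,
        ' and lines will NOT be processed in file order.
    End Sub
End Module
```

This only pays off if the per-line parsing is CPU-bound rather than disk-bound, and any shared output (a database, a results list) needs synchronisation.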