Reading large Text Files

Thread: Reading large Text Files

  1. #1
    Join Date
    Aug 2011
    Posts
    23

    Reading large Text Files

    Hi,

    I have asked this question before but I have never solved it.
    My problem is that I have very large text files (roughly 2 GB and up).
    They contain one record per line.
    The lines are not all the same length, and the fields within them vary in length too.

    I am currently reading the file line by line, then splitting each line on the delimiter characters common to the records. Processing the full file currently takes 3 hours, which is far too slow for its purpose.

    What would be the best way to speed this up?

    Thanks.

  2. #2
    Join Date
    Jul 2000
    Location
    Milano, Italy
    Posts
    7,726

    Re: Reading large Text Files

    EDIT: On second thought, this might not be correct. One way to speed it up would be to read more than one line at a time (say 100), to reduce the number of times you must physically access the file.
    If you're sure each line is a record, you could try splitting the file first into smaller pieces, so that you process no more than 100 MB at a time. The split process will add overhead (I suppose it would take around 40 minutes for a 2 GB file), but the subsequent reading should be faster.

    I don't know whether this will really improve things, but it is worth a try.
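
The batching idea can be sketched roughly as follows. This is a minimal illustration in Python rather than VB.NET; the function name `read_in_batches`, the 100-line batch size, and the 1 MB buffer are assumptions made for the example, not measured recommendations.

```python
from itertools import islice


def read_in_batches(path, batch_size=100, buffer_bytes=1024 * 1024):
    """Yield lists of up to batch_size lines, with the newline stripped."""
    # A larger buffer means fewer physical reads from the disk
    # (buffer_bytes is an assumed value; tune it by measuring).
    with open(path, "r", encoding="utf-8", buffering=buffer_bytes) as handle:
        while True:
            # islice pulls at most batch_size lines from the open file.
            batch = [line.rstrip("\n") for line in islice(handle, batch_size)]
            if not batch:
                break
            yield batch
```

Each batch can then be split and processed in memory before the next one is read, so the file is still traversed only once.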
    Last edited by Cimperiali; January 30th, 2012 at 09:19 AM.

  3. #3
    Join Date
    Aug 2011
    Posts
    23

    Re: Reading large Text Files

    Hi,

    Yeah, I have been thinking about splitting the file, but I still have to experiment with the idea. Could running the pieces side by side in two separate threads also help?

    Just as a guide: I am currently running the script unchanged, and after an hour it has processed 4,109,500 records (lines).
    Last edited by martind132; January 30th, 2012 at 10:22 AM.

  4. #4
    DataMiser (Super Moderator, Power Poster)
    Join Date
    Jul 2008
    Location
    WV
    Posts
    4,851

    Re: Reading large Text Files

    You should definitely be reading the file in much larger chunks. You may also want to take a look at how you process the file. When dealing with files this large, there are many simple little things that can cause a very noticeable speed difference.

    I remember back in VB5 we had a program that processed some text files, and on a 500k file it was taking 16 minutes to complete, running line by line. Each line was read and then passed as a string to a function. I changed the passed parameter from String to Variant, and after that little change the process completed in 15-20 seconds.

    Most things will not be that extreme but every tick adds up when you are repeating a process thousands or millions of times.
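
As a rough sketch of reading in larger chunks, the Python below reads a fixed number of bytes at a time and carries any incomplete trailing line over to the next chunk. The name `iter_lines_chunked`, the 8 MB chunk size, and the UTF-8 encoding are assumptions for illustration only.

```python
def iter_lines_chunked(path, chunk_bytes=8 * 1024 * 1024):
    """Yield the lines of a file, reading chunk_bytes at a time."""
    leftover = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_bytes)
            if not chunk:
                break
            pieces = (leftover + chunk).split(b"\n")
            # The last piece may be an incomplete line; keep it for later.
            leftover = pieces.pop()
            for piece in pieces:
                yield piece.rstrip(b"\r").decode("utf-8")
    # A final line with no trailing newline is still a record.
    if leftover:
        yield leftover.rstrip(b"\r").decode("utf-8")
```

Whether this beats a plain buffered line reader depends on the runtime and the disk, so it is worth timing both on a sample of the real file.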

  5. #5
    Join Date
    Aug 2011
    Posts
    23

    Re: Reading large Text Files

    Yeah, I've been going through the line processing and looking at what can be improved.
    Some of my values have to be integers, but they come in the file as empty. What is the best way to make these zero?

    I am currently using a function which checks for an empty string and returns zero if it's empty.
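
A helper of the kind described might look like the sketch below, shown in Python. The name `to_int_or_zero` and the `|` delimiter in the sample record are hypothetical; the actual record layout is not given in the thread.

```python
def to_int_or_zero(field):
    """Return 0 for an empty or whitespace-only field, else int(field)."""
    field = field.strip()
    # int() still raises ValueError for non-numeric text such as "abc".
    return int(field) if field else 0


# Hypothetical record with two empty integer fields.
record = "17||42|".split("|")
values = [to_int_or_zero(f) for f in record]
```

Because the empty check is a cheap truth test, the cost per field stays small even when it runs millions of times.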

  6. #6
    DataMiser (Super Moderator, Power Poster)
    Join Date
    Jul 2008
    Location
    WV
    Posts
    4,851

    Re: Reading large Text Files

    I haven't really tested the various methods to compare their speed, and most of my experience is in VB6, but my first thought would be to just use Val() on the string. I think this will return 0 if the string is empty. It will also return 0 if the string consists of alphabetic characters.
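
To make the behavior concrete, here is a rough Python approximation of that idea: parse a leading number and fall back to 0 otherwise. `val_like` is a hypothetical name, and the real Val() has further rules (for instance around embedded whitespace and radix prefixes) that this sketch does not reproduce.

```python
import re

# Optional leading whitespace, optional sign, then digits with an
# optional decimal part (or a decimal part alone, such as ".5").
_LEADING_NUMBER = re.compile(r"\s*([+-]?(?:\d+\.?\d*|\.\d+))")


def val_like(text):
    """Parse a leading number from text; return 0.0 if there is none."""
    match = _LEADING_NUMBER.match(text)
    return float(match.group(1)) if match else 0.0
```

One thing to keep in mind with this style of conversion: a malformed field such as "12abc" quietly becomes 12 instead of raising an error, so bad data can pass unnoticed.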

  7. #7
    Join Date
    Aug 2011
    Posts
    23

    Re: Reading large Text Files

    Ah, cool. I replaced them with Val() (all 56 of them, lol).

  8. #8
    Join Date
    Jan 2006
    Location
    Chicago, IL
    Posts
    14,999

    Re: Reading large Text Files

    It can probably be sped up. If you show some of your code... Async methods that process all the files at once might be the answer.
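
A minimal sketch of processing several files (or split pieces of one file) concurrently, in Python. `count_records` is a placeholder for the real per-file work, and `process_all` and the worker count of 4 are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def count_records(path):
    """Placeholder per-file job: count the non-empty lines."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def process_all(paths, max_workers=4):
    """Run count_records on every path concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(count_records, paths))
```

In Python, threads mainly help while the program is waiting on the disk; if the per-line parsing dominates, separate processes would be the closer fit. Either way, the gain depends on whether the disk or the CPU is the bottleneck, so it needs measuring.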
    David
