  1. #1
    Join Date
    Aug 2005
    Posts
    132

    Large file seek slow under Win32

    Hi,

    I have a large file (well, 17 MB) and I want to randomise the lines in the file. The process I have loads the file, creates a list of file pointers to the first character of each line, randomises that list, then loads each line and places it into a new text file.

    The problem section is the one that loads each line and places it into the new text file. I use fseek() to find each line from the randomised list and fgets() to load the line into a buffer.

    Under Unix (AIX) this process runs in less than 30 seconds. On Windows it is now taking more than 30 minutes. If I remove the fseek and just load each line in order, it takes less than 30 seconds on Windows too. I tried using fsetpos instead of fseek and it makes no difference.
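
    A minimal sketch of the process described above, not the original code. The function name `shuffle_lines`, the 4096-byte line limit, and the seeded `std::mt19937` are all assumptions made for illustration; the file is opened in plain text mode, as older C code often does.

```cpp
#include <algorithm>
#include <cstdio>
#include <cstring>
#include <random>
#include <vector>

// Hypothetical helper, not the original code: copies the lines of in_path to
// out_path in random order. Returns the number of lines written, or -1 on error.
// Assumes every line (including its newline) fits in the buffer below.
long shuffle_lines(const char* in_path, const char* out_path, unsigned seed)
{
    std::FILE* in = std::fopen(in_path, "r");
    if (!in) return -1;
    std::FILE* out = std::fopen(out_path, "w");
    if (!out) { std::fclose(in); return -1; }

    const int kBufSize = 4096;   // assumed maximum line length
    char buf[kBufSize];
    std::vector<long> offsets;

    // Pass 1: remember the file position at which each line starts.
    long pos = std::ftell(in);
    while (std::fgets(buf, kBufSize, in)) {
        offsets.push_back(pos);
        pos = std::ftell(in);
    }

    // Randomise the list of file positions.
    std::mt19937 rng(seed);
    std::shuffle(offsets.begin(), offsets.end(), rng);

    // Pass 2: one fseek() + fgets() per line; this is the step reported as slow.
    long written = 0;
    for (long off : offsets) {
        if (std::fseek(in, off, SEEK_SET) != 0 || !std::fgets(buf, kBufSize, in)) {
            written = -1;
            break;
        }
        std::fputs(buf, out);
        // The last input line may lack a newline; add one so lines stay separate.
        std::size_t len = std::strlen(buf);
        if (len == 0 || buf[len - 1] != '\n') std::fputc('\n', out);
        ++written;
    }

    std::fclose(in);
    std::fclose(out);
    return written;
}
```

    A call such as `shuffle_lines("orig.txt", "randomized.txt", 42u)` would exercise the same fseek-then-fgets pattern on a test file.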

    I'm using Borland Developer Studio 2006, which uses the Dinkumware standard library. This section of code is very old and unfortunately not written by me; it uses old C-style functions to load the data and write to the files, and I do not have the time to fix that.

    Anyone have any idea why it might take so long to run when I'm seeking to a new line on Windows and run so **** fast on Unix?

    Cheers
    Dan

  2. #2
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,721

    Re: Large file seek slow under Win32

    1) make sure you are running in Release mode, not Debug

    2) Another possibility is to read all the lines into a container
    and write them out to the new file in random order.

    The following code takes about 10 seconds on my computer
    (2.5 GHz, 512 MB RAM)

    Code:
    #include <algorithm>
    #include <cstddef>
    #include <fstream>
    #include <random>
    #include <string>
    #include <vector>

    using namespace std;

    int main()
    {
        ifstream in("orig.txt");  // 16.4 MB ... 400000 lines
        if (!in) return 1;        // input file could not be opened

        vector<string> v;
        v.reserve(500000);   // if you have an idea on the number of lines

        // read input file and store in the vector
        string line;

        while (getline(in,line))
            v.push_back(line);

        // create a vector of indices to randomize the lines
        vector<size_t> randomize( v.size() );
        for (size_t i=0; i<v.size(); ++i) randomize[i] = i;

        // std::random_shuffle was removed in C++17; std::shuffle replaces it
        mt19937 rng( random_device{}() );
        shuffle(randomize.begin(),randomize.end(),rng);

        // output to new file in random order

        ofstream out("randomized.txt");
        for (size_t j=0; j<v.size(); ++j)
        {
            out << v[ randomize[j] ] << "\n";
        }

        return 0;
    }

  3. #3
    Join Date
    Aug 2005
    Posts
    132

    Re: Large file seek slow under Win32

    Hi,

    Thanks for the reply, yes this is certainly an option and one I will have to investigate if I cannot find a quicker solution to my problem. However this code is very complicated and it would take me a lot of time to extract it and modularise it so I could re-write it without breaking it entirely.

    I was hoping more for a solution to the immediate problem: why fgets is so slow when used in conjunction with fseek.

    Does anyone have any ideas?

    Cheers
    Dan

  4. #4
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588

    Re: Large file seek slow under Win32

    It could be because of buffering, so you could try using Win32 file functions instead of C runtime functions. So CreateFile, CloseHandle, SetFilePointer and ReadFile instead of fopen, fclose, fseek and fread.
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

  5. #5
    Join Date
    Aug 2005
    Posts
    132

    Re: Large file seek slow under Win32

    That is also an option, but it would mean maintaining two code sets: this code is cross-platform and runs smoothly and quickly under Unix (AIX), so I'd have to rewrite a large portion of it.

    You could well be right about a problem with the buffering, though. Any suggestions on where I should look to find the answer on this one?
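
    One portable place to look, if buffering is the suspect, is the size of the stdio buffer, which the standard `setvbuf` function can change. The sketch below is only an experiment: the helper name `open_with_buffer` is hypothetical, and the idea that a smaller buffer reduces the amount re-read after each seek is a hypothesis to measure, not a known property of this runtime.

```cpp
#include <cstdio>

// Hypothetical helper: opens `path` with a fully buffered stream whose buffer
// is `buf_size` bytes. setvbuf must be called before any other operation on
// the stream, so it is done immediately after fopen.
std::FILE* open_with_buffer(const char* path, const char* mode, std::size_t buf_size)
{
    std::FILE* f = std::fopen(path, mode);
    if (!f) return nullptr;

    // A null buffer pointer asks the runtime to allocate buf_size bytes itself.
    if (std::setvbuf(f, nullptr, _IOFBF, buf_size) != 0) {
        std::fclose(f);
        return nullptr;
    }
    return f;
}
```

    Running the seek loop once with, say, 512 bytes and once with a much larger buffer should show whether the buffer size matters at all.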

  6. #6
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588

    Re: Large file seek slow under Win32

    If you want you can look at the source for the runtime, but you can also just try writing a small test program with the win32 functions. If you encapsulate the file access functions inside a class, there isn't much of a problem with cross platform because you can then just easily #ifdef the two cases.
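
    A minimal sketch of that kind of encapsulation. The class name `RandomAccessFile` and its four methods are hypothetical; the Windows branch uses CreateFile, SetFilePointer, ReadFile and CloseHandle, the other branch uses the C runtime, and callers never see the difference.

```cpp
#include <cstddef>
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper class: the #ifdef blocks are confined to this one place.
class RandomAccessFile
{
public:
#ifdef _WIN32
    RandomAccessFile() : handle_(INVALID_HANDLE_VALUE) {}
#else
    RandomAccessFile() : file_(nullptr) {}
#endif
    ~RandomAccessFile() { close(); }

    RandomAccessFile(const RandomAccessFile&) = delete;
    RandomAccessFile& operator=(const RandomAccessFile&) = delete;

    bool open(const char* path)
    {
        close();
#ifdef _WIN32
        handle_ = ::CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                                OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        return handle_ != INVALID_HANDLE_VALUE;
#else
        file_ = std::fopen(path, "rb");
        return file_ != nullptr;
#endif
    }

    // Moves to an absolute byte offset from the start of the file.
    bool seek(long offset)
    {
#ifdef _WIN32
        return ::SetFilePointer(handle_, offset, nullptr, FILE_BEGIN)
               != INVALID_SET_FILE_POINTER;
#else
        return file_ != nullptr && std::fseek(file_, offset, SEEK_SET) == 0;
#endif
    }

    // Reads up to count bytes; returns how many were actually read.
    std::size_t read(char* buffer, std::size_t count)
    {
#ifdef _WIN32
        DWORD got = 0;
        if (!::ReadFile(handle_, buffer, static_cast<DWORD>(count), &got, nullptr))
            return 0;
        return got;
#else
        return file_ != nullptr ? std::fread(buffer, 1, count, file_) : 0;
#endif
    }

    void close()
    {
#ifdef _WIN32
        if (handle_ != INVALID_HANDLE_VALUE) {
            ::CloseHandle(handle_);
            handle_ = INVALID_HANDLE_VALUE;
        }
#else
        if (file_ != nullptr) { std::fclose(file_); file_ = nullptr; }
#endif
    }

private:
#ifdef _WIN32
    HANDLE handle_;
#else
    std::FILE* file_;
#endif
};
```

    The 32-bit offset limits this sketch to files under 2 GB, which is ample for a 17 MB file.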

  7. #7
    Join Date
    Jun 2002
    Location
    Germany
    Posts
    1,557

    Re: Large file seek slow under Win32

    Dan,

    Did you get a chance to look at Philip's first comment---the one about Debug / Release mode?

    We have found that VC8 runs about 50-100 times slower than expected in Debug mode. We have even experienced unexpected slowdowns on certain workstations even if the application is merely started from the VC8 GUI. When started from the console, they run as expected.

    Maybe your slowdown is really simple and you just need to build the Release version. AFAIK the release STL stuff in VC8 is highly optimized, it just needs to get a chance to run properly...

    Sincerely, Chris.
    Last edited by dude_1967; February 2nd, 2007 at 05:09 PM. Reason: clarity
    You're gonna go blind staring into that box all day.

  8. #8
    Join Date
    Jan 2007
    Posts
    69

    Re: Large file seek slow under Win32

    The problem with the C runtime file functions is that seeking is very slow when a text file is open in 'text mode'.

    The reason is that, when the stream is open in text mode, a seek does more than just move to a position.

    The solution would be to just treat the file as binary, since a sequential text file is just a binary file interpreted differently.
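
    With the C runtime, treating the file as binary only means opening it with mode "rb" and stripping the line ending by hand. A minimal sketch, assuming the stream was opened with "rb" and that each line fits in the caller's buffer; the helper name `read_line_at` is hypothetical.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: reads the line that starts at byte `offset` from a
// stream opened in binary mode ("rb"), then strips a trailing line feed and,
// if present, the carriage return before it. Returns false on seek or read failure.
bool read_line_at(std::FILE* f, long offset, char* buf, int size)
{
    if (std::fseek(f, offset, SEEK_SET) != 0) return false;
    if (!std::fgets(buf, size, f)) return false;

    std::size_t len = std::strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') buf[--len] = '\0';
    if (len > 0 && buf[len - 1] == '\r') buf[--len] = '\0';
    return true;
}
```

    In binary mode the offsets are plain byte counts, so the list of line starts has to be built from the same "rb" stream.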

    You could use the Win32 API file I/O functions, and create a function that will read a string line by line. It's not that difficult.

    Code:
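    #include <windows.h>

    // Reads one line (ended by "\r\n", "\n", or a lone "\r") from hFile into
    // pBuffer, null-terminates it, and returns the number of characters stored.
    unsigned long ReadLineFromFile(HANDLE hFile,char * pBuffer,unsigned long dwMaxLength) throw()
    {
        if (pBuffer == 0 || dwMaxLength == 0)
        {
            return 0;
        }

        unsigned long dwCharsRead(0),dwBytesRead(0);
        char chTemp;

        while (dwCharsRead < (dwMaxLength - 1)) // need one for the null char
        {
            // ReadFile succeeds with zero bytes read at end of file,
            // so check the byte count as well as the return value
            if (::ReadFile(hFile,&chTemp,1,&dwBytesRead,0) == 0 || dwBytesRead == 0)
            {
                break;
            }

            if (chTemp == '\r')
            {
                // swallow the line feed of a "\r\n" pair
                if (::ReadFile(hFile,&chTemp,1,&dwBytesRead,0) != 0
                    && dwBytesRead == 1 && chTemp != '\n')
                {
                    // for some reason, the carriage return is there
                    // but not the line feed, so step back one character
                    ::SetFilePointer(hFile,-1,0,FILE_CURRENT);
                }

                break;
            }
            else if (chTemp == '\n')
            {
                break;
            }

            pBuffer[dwCharsRead] = chTemp;
            dwCharsRead += 1;
        }

        pBuffer[dwCharsRead] = 0;

        return dwCharsRead; // means VALID chars, not actual chars read from file
    }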

  9. #9
    Join Date
    Aug 2005
    Posts
    132

    Re: Large file seek slow under Win32

    Thanks for your help guys. I finally found a setting in the compiler that made it run at full speed or close enough to it again.

    Cheers
    Dan

  10. #10
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588

    Re: Large file seek slow under Win32

    Quote Originally Posted by dude_1967
    We have even experienced unexpected slowdowns on certain workstations even if the application is merely started from the VC8 GUI. When started from the console, they run as expected.
    Sorry for replying so late in this thread, but I wanted to point something out. The reason why applications (even in release mode) are slower when started from VC8's IDE is because memory allocation is still debugged and traced by the IDE. Some time ago I made a performance comparison between Dinkumware and STLPort on different versions of VC and couldn't explain the bad results Dinkumware+VC8 were giving. After an exchange with PJ Plauger, the culprit turned out to be memory tracing by the IDE.

  11. #11
    Join Date
    Jun 2002
    Location
    Germany
    Posts
    1,557

    Re: Large file seek slow under Win32

    Quote Originally Posted by Yves M
    ...The reason why applications (even in release mode) are slower when started from VC8's IDE is because memory allocation is still debugged and traced by the IDE. Some time ago I made a performance comparison between Dinkumware and STLPort on different versions of VC and couldn't explain the bad results Dinkumware+VC8 were giving. After an exchange with PJ Plauger, the culprit turned out to be memory tracing by the IDE.
    Thanks Yves! That explains a lot. I'll pass it on to the guys in my team.

    Do you know if this situation is the desired state of affairs or is it something that might be corrected in a service pack or something like that?

    Sincerely, Chris.

  12. #12
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588

    Re: Large file seek slow under Win32

    AFAIK it's desired, because in VS 2003 onwards you can still debug release builds, something which you couldn't really do in VC6. And I think as long as you know this fact there is no drawback since you can run timing code from outside the IDE and that's not the majority of SW development.
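
    For timing runs outside the IDE, a small helper like the following is usually enough. It is only a sketch: the name `time_ms` is hypothetical, and it assumes a compiler with `<chrono>`.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Work>
double time_ms(Work work)
{
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

    Building the release configuration and starting the executable from a console, rather than from the IDE, keeps the IDE's memory tracing out of the measurement.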
