  1. #1
    Join Date
    Jan 2006
    Posts
    35

    Using LOTS of memory - Suggestions Please

    I have an application in which I do some pre-processing of a flat file. The result of the pre-processing is a ton of data. Next, the application works on this data. This part is very time sensitive and is the reason I pre-process the flat file.... so that I can simply blow through the data. Each piece of data is stored as a structure consisting of a float and 2 chars. After pre-processing a typical file I could potentially have 40,800,000 dynamically allocated structures.

    I initially started off using vectors. It worked decently with smaller files, but as the file size increased it became extremely slow.

    I don't know exactly how memory is allocated when requesting such large amounts. I assume that at some point (when?) the contents of physical memory are paged out to virtual memory.

    My next thought is to simply output the results of the pre-processing to another flat file. So, all I would have to do when the time comes to execute, is open the file and read each chunk of data and act on it.
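A minimal sketch of that idea, assuming the records can be dumped as raw bytes and read back by the same program built with the same compiler (the raw layout, including padding and byte order, is not portable). The names Record, write_records and process_records are hypothetical and do not come from the original application.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical record type: one float and two chars, as described above.
struct Record {
    float value;
    char a;
    char b;
};

// Writes all records to `path` as raw bytes. Returns false on I/O failure.
bool write_records(const std::string& path, const std::vector<Record>& records) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(records.data()),
              static_cast<std::streamsize>(records.size() * sizeof(Record)));
    return static_cast<bool>(out);
}

// Reads `path` in chunks of `chunk_records` records and calls `handle` on
// each record in file order. Returns the number of records processed.
template <typename Handler>
std::size_t process_records(const std::string& path, std::size_t chunk_records,
                            Handler handle) {
    std::ifstream in(path, std::ios::binary);
    if (!in || chunk_records == 0) return 0;
    std::vector<Record> buffer(chunk_records);
    std::size_t total = 0;
    while (in) {
        in.read(reinterpret_cast<char*>(buffer.data()),
                static_cast<std::streamsize>(buffer.size() * sizeof(Record)));
        // gcount() reports how many bytes the last read actually delivered,
        // which may be less than a full chunk at the end of the file.
        const std::size_t got = static_cast<std::size_t>(in.gcount()) / sizeof(Record);
        for (std::size_t i = 0; i < got; ++i) handle(buffer[i]);
        total += got;
    }
    return total;
}
```

Only one chunk is held in memory at a time, so the working set stays small no matter how large the pre-processed file is.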

    Any suggestions?

  2. #2
    Join Date
    Feb 2005
    Location
    Normandy in France
    Posts
    4,590

    Re: Using LOTS of memory - Suggestions Please

    Quote Originally Posted by eeboy
    I initially started off using vectors. It worked decently with smaller files, but as the file size increased it became extremely slow.
    I assume you overused push_back.

    push_back should seldom be used on a vector whose memory has not previously been reserved with std::vector<T>::reserve.

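    A vector is a bit like a dynamic array allocated with new[]: the underlying block of memory cannot be resized in place.
    When push_back is called and the allocated capacity is insufficient, the vector allocates a second, larger block of memory, copies the contents of the first block into it, and then frees the first block.

    Calling push_back many times without reserving therefore makes the vector reallocate repeatedly, and each reallocation copies every element already stored.
    How often this happens depends on the growth policy of the STL implementation. Standard-conforming implementations grow the capacity geometrically, so the number of reallocations is roughly logarithmic in the final size rather than one per push_back, but with tens of millions of elements the copying, and the moment when the old and new blocks coexist in memory, are still expensive.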

    To avoid that problem, you have two alternatives:
    • Use a std::deque, a container that is very efficient for push_back and that uses far less memory than std::list, about the same amount as std::vector.
    • Call reserve on the vector before any push_back operation, passing the final size of the vector as the argument.
      This second option assumes you know (at least approximately) the final size of the vector.
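The effect of the second alternative can be observed directly. The sketch below counts how often capacity() changes during a run of push_back calls, with and without a prior reserve. Record and count_reallocations are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record type: one float and two chars.
struct Record {
    float value;
    char a;
    char b;
};

// Performs `count` push_back calls and returns how many times the vector's
// capacity changed along the way, i.e. how many reallocations happened.
// When `reserve_first` is true, the final size is reserved up front.
std::size_t count_reallocations(std::size_t count, bool reserve_first) {
    std::vector<Record> records;
    if (reserve_first) {
        records.reserve(count);  // single allocation for the whole run
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = records.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        records.push_back(Record{static_cast<float>(i), 'x', 'y'});
        if (records.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = records.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set to true the function returns 0, because push_back never exceeds the reserved capacity. Without it, the exact count depends on the implementation's growth factor.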

    Quote Originally Posted by eeboy
    I don't know exactly how memory is allocated when requesting such large amounts. I assume that at some point (when?) the contents of physical memory are paged out to virtual memory.
    On 386 and later CPUs, in 32-bit protected mode, the whole memory space addressed by programs is "virtual".
    This means that the program's memory is divided into 4096-byte blocks (pages) aligned on 4096-byte boundaries.
    Each page can be dynamically mapped to, or unmapped from, any 4096-byte block of physical memory (also aligned on a 4096-byte boundary).
    Pages that are contiguous in the virtual address space may be non-contiguous in physical memory.
    The OS can even unmap a virtual memory page (that is, a 4096-byte block) and save its contents in the swap file.
    Then, when the process accesses that page, it triggers a "page fault", which is handled entirely by the OS: the OS loads the page from the swap file into a physical memory page and maps the virtual page to it.
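To get a feel for the numbers, the page arithmetic described above can be sketched as follows. The 4096-byte page size comes from the explanation above; the helper names are hypothetical, and the 8-byte record size used in the note after the code is an assumption about how the compiler pads the structure.

```cpp
#include <cstdint>

// Page size used in the explanation above.
constexpr std::uint64_t kPageSize = 4096;

// Number of pages needed to hold `bytes` bytes, rounded up.
constexpr std::uint64_t pages_needed(std::uint64_t bytes) {
    return (bytes + kPageSize - 1) / kPageSize;
}

// Index of the page that contains a given virtual address.
constexpr std::uint64_t page_index(std::uint64_t address) {
    return address / kPageSize;
}

// Offset of a given virtual address inside its page.
constexpr std::uint64_t page_offset(std::uint64_t address) {
    return address % kPageSize;
}
```

For 40,800,000 records of 8 bytes each (326,400,000 bytes), pages_needed returns 79,688. Each of those pages may be resident in physical memory or swapped out at any given time.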

    Google is your friend:
    http://www.google.com/search?q=%22vi...memory%22&btnG
    "inherit to be reused by code that uses the base class, not to reuse base class code", Sutter and Alexandrescu, C++ Coding Standards.
    Club of lovers of the C++ typecasts cute syntax: Only recorded member.

    Out of memory happens! Handle it properly!
    Say no to g_new()!

  3. #3
    Join Date
    Feb 2005
    Location
    Normandy in France
    Posts
    4,590

    Re: Using LOTS of memory - Suggestions Please

    Since there are so many objects, you should probably pack the structure, that is, make it unaligned.
    Most compilers provide a flag or a pragma to align structures on 1-byte boundaries.
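One such pragma is `#pragma pack`, which MSVC, GCC and Clang all accept. The sketch below compares a packed and an unpacked version of the structure. Record, PackedRecord and array_bytes are hypothetical names, and the sizes quoted after the code assume a 4-byte float on a typical x86 target.

```cpp
#include <cstddef>

// Default layout: the compiler is free to add trailing padding.
struct Record {
    float value;
    char a;
    char b;
};

// Packed layout: members are placed on 1-byte boundaries, so no padding.
#pragma pack(push, 1)
struct PackedRecord {
    float value;
    char a;
    char b;
};
#pragma pack(pop)

// Bytes occupied by a contiguous array of `count` objects of type T.
template <typename T>
constexpr unsigned long long array_bytes(unsigned long long count) {
    return count * sizeof(T);
}
```

Under those assumptions sizeof(Record) is 8 and sizeof(PackedRecord) is 6, so 40,800,000 records shrink from 326,400,000 to 244,800,000 bytes. The price is that most floats in a packed array sit at unaligned addresses, which can be slower to access and is not supported at all on some architectures.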

  4. #4
    Join Date
    Jan 2006
    Posts
    35

    Re: Using LOTS of memory - Suggestions Please

    Thanks for your help!

    What do you mean by unaligned? From what you said... I take it to mean that by making the structure unaligned they are packed "tight" in memory. So, within the 4096 byte block they are placed continuously.... if they were not unaligned they would be aligned on some other basis (every 8 bytes or something like that) which creates wasted 'usable' space? Am I close?

    Thanks!

  5. #5
    Join Date
    Feb 2005
    Location
    Normandy in France
    Posts
    4,590

    Re: Using LOTS of memory - Suggestions Please

    Quote Originally Posted by eeboy
    if they were not unaligned they would be aligned on some other basis (every 8 bytes or something like that) which creates wasted 'usable' space? Am I close?
    Yes, the compiler may "align" the structure, that is, insert dummy padding fields so that its size is a multiple of 4 (or perhaps 8 if the structure contained a double).
    Code:
    struct S
    {
    float aFloat;
    char c1,c2;
    };
    May be replaced by the compiler with:
    Code:
    struct S
    {
    float aFloat;
    char c1,c2;
    char dummy[2];
    };
    The compiler adds this padding because accessing a value at an address that is a multiple of its size is faster on modern CPUs.
    Thus, assuming that malloc/new[]/std::vector return memory whose base address is a multiple of 4 or 8, if the structure is "aligned", every structure in the array/vector will also start on a 4- or 8-byte boundary.
    Accesses to the floating-point number will therefore be fast (for the characters it makes no difference).

    Furthermore, the declaration order of the fields in a structure has an effect.
    The compiler may align each field within the structure, which means that:
    Code:
    struct S
    {
    char c1;
    float Value;
    char c2;
    };
    May be interpreted as:
    Code:
    struct S
    {
    char c1;
    char Dummy1[3];
    float Value; // if the base address of an instance of this structure is a multiple of 4, the address of Value is also a multiple of 4
    char c2;
    char Dummy2[3]; // if an array of S starts at an address that is a multiple of 4, every item in the array, and therefore every floating-point value, also sits at an address that is a multiple of 4
    };
    Note that this structure occupies 12 bytes instead of the expected 6, whereas declaring the float first, as in the earlier example, needs only 8.
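The effect of declaration order can be checked with sizeof and offsetof. This is a sketch with hypothetical names (Interleaved, Grouped, padding_bytes). The figures quoted after the code assume a 4-byte float with 4-byte alignment, as on typical x86 compilers.

```cpp
#include <cstddef>

// float between the two chars: padding after c1 and after c2.
struct Interleaved {
    char c1;
    float value;
    char c2;
};

// float first, chars grouped at the end: only trailing padding.
struct Grouped {
    float value;
    char c1;
    char c2;
};

// Bytes of real data in either structure: one float and two chars.
constexpr std::size_t kPayloadBytes = sizeof(float) + 2 * sizeof(char);

// Bytes of padding the compiler added to T.
template <typename T>
constexpr std::size_t padding_bytes() {
    return sizeof(T) - kPayloadBytes;
}
```

Under those assumptions Interleaved is 12 bytes with value at offset 4 and c2 at offset 8, while Grouped is 8 bytes. Reordering the fields alone recovers 4 of the 6 wasted bytes without giving up alignment.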

  6. #6
    Join Date
    Nov 2002
    Location
    Los Angeles, California
    Posts
    3,863

    Re: Using LOTS of memory - Suggestions Please

    You may consider using the small object allocator in the loki library.

    LOKI
    Wakeup in the morning and kick the day in the teeth!! Or something like that.

    "i don't want to write leak free code or most efficient code, like others traditional (so called expert) coders do."
