C++ Memory Mapped Files Trouble
Hello everybody,
I have a data buffer project on Windows 7 x64 Embedded (using Visual Studio 2008) that should work like this:
One writer application will send data (no network protocols, just procedure calls on the same machine) to my app at about 20 packages per second. Each data package will be approximately 3 MB in size and comes with a timestamp tag.
My application will store each data item for 100 minutes and then remove it. (So I can calculate the total size from the beginning; there is no need for dynamic allocation etc.)
Meanwhile there will be up to 5 reader applications which will query data from my app via the timestamp tag and retrieve data (no updates or deletions of data by reader apps).
So since the data in my buffer app can grow over 50GB I don't think that shared memory is going to work for my case.
I'm thinking about using Boost Memory Mapped Files or Windows API Memory Mapped Files.
So theoretically I will create a fixed-size file on the hard disk (let's say 50GB), map some portion of it into RAM, and write the data to that portion. If the reader applications want data that is currently mapped in memory, they will use it directly; otherwise they will map some other portion of the file into their address spaces and read from there...
My problem is that I haven't used the memory-mapped file concept at all, and I'm not a C++ developer, so I'm not very familiar with the language fundamentals. I've searched the Boost tutorials and MSDN for Windows as well, but there are not many code examples.
So I don't know how to map some portion of the data to memory and manage it, also how to search data in file or memory according to the timestamp tag. Yes there are codes for creating files and mapping the whole file to memory but none for dividing it into regions, aligning or padding the data which I need for my case.
Any help with some code or sample projects would be extremely useful...
Thank you very much for reading and for any further help...
Re: C++ Memory Mapped Files Trouble
Quote:
Originally Posted by
vecihi
My problem is that I haven't used the memory-mapped file concept at all, and I'm not a C++ developer, so I'm not very familiar with the language fundamentals. I've searched the Boost tutorials and MSDN for Windows as well, but there are not many code examples.
You failed to mention what you can do. Do you program at all? If so, which language?
Quote:
Originally Posted by
vecihi
So I don't know how to map some portion of the data to memory and manage it, also how to search data in file or memory according to the timestamp tag. Yes there are codes for creating files and mapping the whole file to memory but none for dividing it into regions, aligning or padding the data which I need for my case.
Last time I checked (more than a year ago), the boost library does not support regions for memory mapped files, but the win32 API does. So you can write your own abstraction layer using the win32 API that gives you a more useful interface for your specific problem.
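A minimal sketch of what mapping a region with the Win32 API involves, assuming a file-mapping handle has already been created with CreateFileMapping. The offset passed to MapViewOfFile must be a multiple of the system allocation granularity (reported by GetSystemInfo, typically 64 KB), so a record that starts at an arbitrary offset needs its view rounded down and the difference skipped. The names ViewWindow, computeViewWindow and mapRecord are made up for illustration; they are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#endif

typedef unsigned long long u64;

// Describes the view that has to be mapped to reach one record.
struct ViewWindow {
    u64 alignedOffset;   // file offset to pass to the mapping call
    std::size_t delta;   // bytes between the view base and the record start
    std::size_t length;  // number of bytes to map (delta + record size)
};

// Rounds the record offset down to the allocation granularity.
ViewWindow computeViewWindow(u64 recordOffset, std::size_t recordSize, u64 granularity)
{
    assert(granularity != 0);
    ViewWindow w;
    w.alignedOffset = (recordOffset / granularity) * granularity;
    w.delta = static_cast<std::size_t>(recordOffset - w.alignedOffset);
    w.length = w.delta + recordSize;
    return w;
}

#ifdef _WIN32
// Maps one record read-only. Returns a pointer to the record, or NULL on failure.
// *viewBase receives the address that must later be passed to UnmapViewOfFile.
const void* mapRecord(HANDLE mapping, u64 recordOffset, std::size_t recordSize, void** viewBase)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    ViewWindow w = computeViewWindow(recordOffset, recordSize, si.dwAllocationGranularity);
    void* base = MapViewOfFile(mapping, FILE_MAP_READ,
                               static_cast<DWORD>(w.alignedOffset >> 32),
                               static_cast<DWORD>(w.alignedOffset & 0xFFFFFFFFu),
                               w.length);
    *viewBase = base;
    if (base == NULL)
        return NULL;
    return static_cast<const char*>(base) + w.delta;
}
#endif
```

The arithmetic part is portable; only mapRecord depends on Windows. With fixed-size records, an abstraction layer of the kind suggested above could be built around these two functions, unmapping each view once the reader is done with it.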
Re: C++ Memory Mapped Files Trouble
Quote:
Originally Posted by
D_Drmmr
You failed to mention what you can do. Do you program at all? If so, which language?
Yes I was doing development on Java and VB 6.0 and VB.NET.
Quote:
Originally Posted by
D_Drmmr
Last time I checked (more than a year ago), the boost library does not support regions for memory mapped files, but the win32 API does. So you can write your own abstraction layer using the win32 API that gives you a more useful interface for your specific problem.
As far as I know it is possible in Boost as well, at least the tutorials say so. But the problem is they don't explain how. That's why I'm asking; maybe some developers experienced with Boost can help.
Do you know how it can be done with Windows API? Any sample code?
Re: C++ Memory Mapped Files Trouble
Using the Windows memory-mapped files API requires some familiarity with both C/C++ and the Windows API. Programming your requirements in C/C++ when you are not very familiar with the language fundamentals is a big ask. If you are not familiar with C/C++, why does this program need to be written in this language? Are there other programming languages with which you are familiar?
For the windows memory mapped files api set, have a look at
http://msdn.microsoft.com/en-us/libr...=vs.85%29.aspx and its links.
Also
http://www.codeproject.com/Articles/...y-Mapped-Files
which provides some code as a starter
http://social.msdn.microsoft.com/For...orum=vcgeneral
http://www.codeguru.com/cpp/article....using-RAII.htm
Good luck! :)
Re: C++ Memory Mapped Files Trouble
Thank you for the reply. I wrote it in my previous reply, but I left it inside the "QUOTE" tag :)
I have developed with Java, VB 6.0, VB.NET and a little bit with C# before, but this project must be in C++ (according to the project specs).
I saw the MSDN examples before, but they only cover general stuff. I mean the code is just too low level. I wondered if anyone has divided a mapped file into regions and managed it in a project; maybe a real-world example would help more.
Thanks anyway.
Re: C++ Memory Mapped Files Trouble
Have a look at the link to the codeproject site I gave in my previous reply. It provides the code for a generic C++ class for using memory-mapped files, which might make your task easier.
Re: C++ Memory Mapped Files Trouble
Quote:
Originally Posted by
vecihi
I have developed with Java, VB 6.0, VB.NET and a little bit with C# before, but this project must be in C++ (according to the project specs).
Then either you need to change the project specs or you need to learn C++ if you want to achieve anything. You cannot bluff your way through C++. It is completely unlike the languages you mention, even if the syntax looks familiar.
Re: C++ Memory Mapped Files Trouble
For a book covering the Windows areas you will need, have a look at
Windows via C/C++ by Jeffrey Richter
http://www.amazon.co.uk/Windows-Via-...ichter+windows
Note that earlier versions of this book also cover memory mapped files and can be obtained very cheaply if you can't stretch to the cost of the current new one eg
Advanced Windows by Jeffrey Richter
http://www.amazon.co.uk/Advanced-Win...ichter+windows
These will give you the details you need from the windows side of things, but (as stated by D_Drmmr in post #7) you must become somewhat proficient with c++ before you start with this otherwise you will get into a heap of problems.
Re: C++ Memory Mapped Files Trouble
You cannot map a view larger than the available virtual address space.
For Win32 that means there is a hard limit of at most 3 GB (2 GB by default), and since your program and the Windows DLLs take part of that, you're lucky if you can even allocate a contiguous region of 1.5 GB (in practice even 1 GB will be an issue on some PCs).
Making a Win64 app pretty much removes that limit, but it's still not a very good method because you're hogging a lot of virtual memory.
I would look at a solution with either an actual database (if you can find one that allows 3 MB data chunks) or alternatively... store the data in regular files, and maintain a database of the files or maintain a list in memory of references to the files (rather than keeping all the data in memory).
Re: C++ Memory Mapped Files Trouble
Quote:
Originally Posted by
OReubens
You cannot map a view larger than the available virtual address space.
For Win32 that means there is a hard limit of at most 3 GB (2 GB by default), and since your program and the Windows DLLs take part of that, you're lucky if you can even allocate a contiguous region of 1.5 GB (in practice even 1 GB will be an issue on some PCs).
Making a Win64 app pretty much removes that limit, but it's still not a very good method because you're hogging a lot of virtual memory.
I would look at a solution with either an actual database (if you can find one that allows 3 MB data chunks) or alternatively... store the data in regular files, and maintain a database of the files or maintain a list in memory of references to the files (rather than keeping all the data in memory).
Thank you. Yes, as you mentioned, I know the limitations of 32-bit systems, but the project will definitely be 64-bit. So at least for the addressing there won't be any problems.
As I said before, what I'm trying to implement is to map just a portion (let's say 1 GB) of the file instead of the whole file. While data is being written to that portion, the reader applications can theoretically reach the written data immediately (since the data is available in RAM).
But for the older data, yes, you're right once again: they have to map that portion of the data into their address spaces and read from there.
Reading the data can wait a little bit; I mean a little delay is acceptable. But writing is critical because we cannot miss any data package.
As far as I know, operations like flushing and mapping other portions of the file are handled by the kernel, so I don't need to deal with them.
Also, Boost and Microsoft offer this method for sharing data between separate processes, especially for extremely large files. That's why I'm going this way.
Using a third-party database would create some problems with the customers, so I have to develop it myself, and unfortunately this is the best way that I have found up to now.
Every piece of documentation says that "it can be done very easily", but there is no example or code for it. That's the reason I wrote here.
Thank you again for your time and ideas. Further ideas also highly appreciated :)
Re: C++ Memory Mapped Files Trouble
May I do a little math?
20 packages per sec at 3MB a piece is 60MB of data per second.
This is 3,600MB per minute, or 360GB for the 100 minutes you want to keep your data (not 50GB).
What kind of storage do you have on that embedded system? Is it both large and fast?
How frequently will your 5 readers request the data?
Writing 60MB/sec will exhaust the throughput of a low-end HDD, leaving nothing for reading.
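The arithmetic above can be written down as a small helper, so the file size for each build configuration can be checked before anything is allocated. This is only a sketch: the function names are made up, and 1 MB is taken as 1,000,000 bytes to match the round numbers in the post.

```cpp
#include <cassert>

typedef unsigned long long u64;

// Sustained write rate in bytes per second.
u64 bytesPerSecond(u64 packetsPerSecond, u64 packetBytes)
{
    return packetsPerSecond * packetBytes;
}

// Number of records that are alive at the same time.
u64 recordCount(u64 packetsPerSecond, u64 retentionMinutes)
{
    return packetsPerSecond * retentionMinutes * 60;
}

// Total bytes that must be kept to cover the retention period.
u64 retentionBytes(u64 packetsPerSecond, u64 packetBytes, u64 retentionMinutes)
{
    assert(retentionMinutes > 0);
    return recordCount(packetsPerSecond, retentionMinutes) * packetBytes;
}
```

For 20 packets per second of 3 MB kept for 100 minutes, retentionBytes(20, 3000000, 100) gives 360,000,000,000 bytes, the 360GB worked out above, spread over 120,000 records.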
Re: C++ Memory Mapped Files Trouble
Quote:
Originally Posted by
VladimirF
May I do a little math?
20 packages per sec at 3MB a piece is 60MB of data per second.
This is 3,600MB per minute, or 360GB for the 100 minutes you want to keep your data (not 50GB).
What kind of storage do you have on that embedded system? Is it both large and fast?
How frequently will your 5 readers request the data?
Writing 60MB/sec will exhaust the throughput of a low-end HDD, leaving nothing for reading.
Yeah, you're right. These numbers are the theoretical maximum limits; in the real application they would be a little lower.
I will definitely use a solid state drive that is big enough to store the data in the real application. But I guess the real file size would vary between 50GB and 150GB.
And for the reader applications frequency, there is nothing much to say, we have to try and see the performance in real system with the real equipment.
Re: C++ Memory Mapped Files Trouble
Quote:
Originally Posted by
vecihi
Yeah, you're right. These numbers are the theoretical maximum limits; in the real application they would be a little lower.
I will definitely use a solid state drive that is big enough to store the data in the real application. But I guess the real file size would vary between 50GB and 150GB.
So you plan to write variable-length records to a file? Then recycling becomes tricky. If you want to avoid copying the entire multi-gigabyte file, you would have to replace records in place, dealing with a new record covering more than one old record. The maintenance becomes a chore.
Also, searching that kind of file for your timestamp is not trivial.
I would certainly use fixed-length records, which allow easy random access to the file. Then the question is: why would you need to memory-map it? Your writer would just write sequentially, keeping a pointer to the next record. Let Windows worry about caching recently accessed records.
One more question - you don't have 150GB of RAM on that system, do you?
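A minimal sketch of the fixed-length record idea, assuming the file is reused as a ring: every record gets a running sequence number, the writer remembers the next one, and the slot position follows from simple arithmetic. RingLayout, slotOffset and stillStored are hypothetical names, and the sketch leaves out the actual file I/O.

```cpp
#include <cassert>

typedef unsigned long long u64;

struct RingLayout {
    u64 recordBytes;  // fixed size of one record, including its timestamp header
    u64 slotCount;    // number of records the file holds before it wraps around
};

// Byte offset in the file of the slot used by the record with this sequence number.
u64 slotOffset(const RingLayout& layout, u64 sequence)
{
    assert(layout.slotCount != 0);
    return (sequence % layout.slotCount) * layout.recordBytes;
}

// True if the record has been written and not yet overwritten by a newer one.
// nextSequence is the sequence number the writer will use for its next record.
bool stillStored(const RingLayout& layout, u64 sequence, u64 nextSequence)
{
    return sequence < nextSequence && nextSequence - sequence <= layout.slotCount;
}
```

If packets arrive at a fixed rate, a reader could estimate the sequence number from the timestamp directly; otherwise some index from timestamp to sequence number is still needed.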
Re: C++ Memory Mapped Files Trouble
Quote:
Originally Posted by
VladimirF
So you plan to write variable-length records to a file? Then recycling becomes tricky. If you want to avoid copying the entire multi-gigabyte file, you would have to replace records in place, dealing with a new record covering more than one old record. The maintenance becomes a chore.
Also, searching that kind of file for your timestamp is not trivial.
I would certainly use fixed-length records, which allow easy random access to the file. Then the question is: why would you need to memory-map it? Your writer would just write sequentially, keeping a pointer to the next record. Let Windows worry about caching recently accessed records.
One more question - you don't have 150GB of RAM on that system, do you?
Oh, I'm sorry, I thought I wrote it in the first post but I think I forgot.
This application is going to be a DLL. For each different data type we're going to build another DLL. So to answer your question: the data package sizes are exactly the same within one DLL and known at compile time. There are no variable-length records; they're all the same size.
When I said the file size is going to vary from 50GB to 150GB, I meant that, for example, for one DLL the data packages will be 1 MB each at 10 packages per second, with 100 minutes to store the data. For another instance, the packages will be 3 MB at 3 packages per second, but with, let's say, 60 minutes for storing data, and so on...
So these will be different applications that will be built separately. So in our case the file size, the frequency of the data, the size of each data item and the time to store each data item are fixed and known at compile time, no doubt about that.
My purpose in using memory-mapped files is to increase the I/O performance, since we're trying to implement a performance-critical app. As far as I know, memory-mapped files are the fastest way of sharing data like huge files between multiple processes.
Am I wrong?
And yes, I don't have that much RAM. The hardware would be something like Intel I7 processor, with 8GB - 16GB RAM and a SSD which has enough capacity for the real numbers.
Thanks.
Re: C++ Memory Mapped Files Trouble
Quote:
My purpose in using memory-mapped files is to increase the I/O performance, since we're trying to implement a performance-critical app. As far as I know, memory-mapped files are the fastest way of sharing data like huge files between multiple processes.
Am I wrong?
Why don't you write some simple programs with files on this hardware configuration and find out? Why have all the data packages in one file? Why not a file per data package with the name as the time stamp?
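A rough sketch of the file-per-package idea using only standard C++ file streams, so it can be timed on whatever hardware is at hand. The function names and the ".pkt" extension are invented for the example; the timestamp is assumed to be an unsigned 64-bit tick count, and zero-padding it to 20 digits makes the alphabetical order of the file names match the chronological order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

typedef unsigned long long u64;

// Builds "<directory>/<20-digit timestamp>.pkt".
std::string packetFileName(const std::string& directory, u64 timestamp)
{
    std::ostringstream name;
    name << directory << '/' << std::setw(20) << std::setfill('0') << timestamp << ".pkt";
    return name.str();
}

// Writes the whole payload to a new file; returns false on any I/O error.
bool writePacket(const std::string& path, const std::vector<char>& payload)
{
    std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
    if (!out)
        return false;
    if (!payload.empty())
        out.write(&payload[0], static_cast<std::streamsize>(payload.size()));
    out.close();
    return !out.fail();
}

// Reads the whole file into payload; returns false if it cannot be read.
bool readPacket(const std::string& path, std::vector<char>& payload)
{
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in)
        return false;
    std::streamoff size = in.tellg();
    if (size < 0)
        return false;
    payload.resize(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    if (size > 0)
        in.read(&payload[0], size);
    return !in.fail();
}

// Deletes an expired packet file (for example once it is older than the retention time).
bool removePacket(const std::string& path)
{
    return std::remove(path.c_str()) == 0;
}
```

Timing writePacket in a loop with 3 MB payloads on the target drive would answer the throughput question more reliably than any estimate.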
Re: C++ Memory Mapped Files Trouble
Quote:
Originally Posted by
2kaud
Why don't you write some simple programs with files on this hardware configuration and find out? Why have all the data packages in one file? Why not a file per data package with the name as the time stamp?
Hmm, that's an interesting idea actually. Creating one folder including thousands of files...
But won't creating one file for each data item, writing the data into it and saving it be costly from a performance point of view? I mean, is it possible to meet a performance requirement of, let's say, 15 packages per second with each package 2 MB in size?
And what about searching the files according to names (timestamp) for the reader applications?...
I don't have this hardware setup now, it will be in the real system, so I have to develop it first on my personal computer.
Do you have any estimation for the performance?
Thanks.
Edit:
Furthermore, if writing and reading a file on the hard drive doesn't cost too much and I decide to go that way, I can also create a circular array in memory (shared memory perhaps). While writing the data to the file, I can add a new element to this "metadata array" with timestamp and unique id attributes, and use the unique id as the filename. So the reader applications can search this metadata array first (without going to the hard drive), and if the data is found they can then look up the file on the hard drive by this unique id. (How much time does it take to find a file by id (name) in, for example, a repository of 100000 files???)
Yeah it seems really cool, only question is can I match that writing speed??? Yeah I have to try and see I think...
Thank you very much again.
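A minimal sketch of such a "metadata array", assuming timestamps never decrease and a single process owns the index. PacketIndex and IndexEntry are hypothetical names. The array is circular, so the oldest entry is overwritten once it is full, and the lookup is a binary search over the entries in chronological order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned long long u64;

struct IndexEntry {
    u64 timestamp;
    u64 fileId;
};

class PacketIndex {
public:
    explicit PacketIndex(std::size_t capacity)
        : entries_(capacity), head_(0), count_(0)
    {
        assert(capacity > 0);
    }

    // Adds the newest entry; once the array is full the oldest entry is overwritten.
    void push(u64 timestamp, u64 fileId)
    {
        std::size_t pos = (head_ + count_) % entries_.size();
        entries_[pos].timestamp = timestamp;
        entries_[pos].fileId = fileId;
        if (count_ < entries_.size())
            ++count_;
        else
            head_ = (head_ + 1) % entries_.size();
    }

    // Finds the oldest entry whose timestamp is >= wanted. Returns false if there is none.
    bool findAtOrAfter(u64 wanted, IndexEntry& result) const
    {
        std::size_t lo = 0, hi = count_;
        while (lo < hi) {
            std::size_t mid = lo + (hi - lo) / 2;
            if (at(mid).timestamp < wanted)
                lo = mid + 1;
            else
                hi = mid;
        }
        if (lo == count_)
            return false;
        result = at(lo);
        return true;
    }

    std::size_t size() const { return count_; }

private:
    // Entry at a logical position, where 0 is the oldest stored entry.
    const IndexEntry& at(std::size_t logical) const
    {
        return entries_[(head_ + logical) % entries_.size()];
    }

    std::vector<IndexEntry> entries_;
    std::size_t head_;   // physical position of the oldest entry
    std::size_t count_;  // number of valid entries
};
```

With 120,000 entries a lookup takes about 17 comparisons. To share the index between processes, the std::vector would have to be replaced by a fixed-size array placed in shared memory, and access would need to be synchronized.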
Re: C++ Memory Mapped Files Trouble
Quote:
Originally Posted by
vecihi
Thank you. Yes, as you mentioned, I know the limitations of 32-bit systems, but the project will definitely be 64-bit. So at least for the addressing there won't be any problems.
It won't be a problem to allocate the memorymapping.
That doesn't mean there are no problems associated with it at all. Memory mapping does not come for free either. Don't use memory mapping unless you have a specific need for it. From your description, you don't really "need" the memory mapping, and in effect it may make your total problem worse: you're hogging virtual memory, which may keep physical RAM busy, and you could end up causing excessive paging, which will worsen your already tight throughput issue.
Quote:
As I said before, what I'm trying to implement is to map just a portion (let's say 1 GB) of the file instead of the whole file. While data is being written to that portion, the reader applications can theoretically reach the written data immediately (since the data is available in RAM).
There is no immediate benefit to mapping portions. Setting up a memory mapping takes some OS interaction and, again, if you don't really need the mapping nature, it's a bad idea.
Don't use memory mapping on the assumption that it will work better than straightforward linear streaming (reading/writing) of the entire packet.
Memory mapping doesn't pay off unless you have recurring I/O operations in a random access pattern. If the access pattern is always linear/sequential, then memory mapping is not the way to go.
Even with memory mapping, you WILL need to synchronize the readers and writers. It's not "RAM" as you seem to think. The reads/writes are not synchronized by the OS, and the reader and writer can "catch up" to each other (in fact they often will due to caching). You need to fully protect/synchronize the entire mapped file write/read.
Quote:
Reading the data can wait a little bit; I mean a little delay is acceptable. But writing is critical because we cannot miss any data package.
This strengthens my observation that memorymapping is NOT the proper way to solve your problem. You need a read/writer logic that prioritizes the writer over the readers.
The need to synchronize the entire mapped file will block the writer. Which you're claiming you can't afford.
Quote:
As far as I know, operations like flushing and mapping other portions of the file are handled by the kernel, so I don't need to deal with them.
Correct. But synchronization of simultaneous access from multiple threads is NOT provided by the OS. It is possible for one thread to read "half written" data.
Quote:
Also, Boost and Microsoft offer this method for sharing data between separate processes, especially for extremely large files. That's why I'm going this way.
It is a good way to share randomly accessed data. It is not so good for sequential patterns (it works, but there are better alternatives). You still need to synchronize access.
Quote:
Every piece of documentation says that "it can be done very easily", but there is no example or code for it. That's the reason I wrote here.
"Easy" is the claim for just about every new technology I've seen emerge over the last few decades.
Nobody likes to claim that their solution is "difficult" (no matter how powerful it is).
Memory mapping as a whole is easy to get into (for a single thread).
Multithreading adds complexity to just about everything. Memorymapping is a common pitfall where proper synchronisation is hard (or even forgotten entirely) because devs tend to forget that the OS manages the memorymap at the virtual page level, but your app accesses it at the byte level (or word, dword, qword depending on the datatype). I've seen many failures because of forgetting about that little detail.
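One way to keep readers away from "half written" data, sketched here for threads inside one process: the single writer copies the record first and only then publishes it by increasing a counter, and readers never touch a record at or beyond that counter. This is an illustration under simplifying assumptions, not a complete solution. It uses std::atomic, which needs a C++11 compiler (Visual Studio 2008 does not have it, so the Win32 Interlocked functions would be needed there), slots are never overwritten, and PublishedLog is a made-up name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

class PublishedLog {
public:
    PublishedLog(std::size_t recordBytes, std::size_t capacity)
        : recordBytes_(recordBytes), capacity_(capacity),
          storage_(recordBytes * capacity), committed_(0)
    {
        assert(recordBytes > 0 && capacity > 0);
    }

    // Called by the single writer. data must point to recordBytes bytes.
    bool append(const char* data)
    {
        std::size_t n = committed_.load(std::memory_order_relaxed);
        if (n == capacity_)
            return false;                                    // log is full
        std::memcpy(&storage_[n * recordBytes_], data, recordBytes_);
        committed_.store(n + 1, std::memory_order_release);  // publish after the copy
        return true;
    }

    // Called by any reader. Fails if the record has not been published yet.
    bool read(std::size_t index, char* out) const
    {
        if (index >= committed_.load(std::memory_order_acquire))
            return false;
        std::memcpy(out, &storage_[index * recordBytes_], recordBytes_);
        return true;
    }

private:
    std::size_t recordBytes_;
    std::size_t capacity_;
    std::vector<char> storage_;
    std::atomic<std::size_t> committed_;  // number of fully written records
};
```

The writer never waits for readers in this scheme. As soon as slots are reused, as they would be in a ring that drops data after 100 minutes, a reader can be overtaken by the writer, and the stronger synchronization described above is required.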
Re: C++ Memory Mapped Files Trouble
OReubens, thank you very much for your detailed answer. Yes, I think you're right: since I don't have full command of this language and the memory mapping concept, it is a risk to go that way.
What do you think about the other idea that I wrote just above your last post? Does it make sense to you?
Thanks again.
Re: C++ Memory Mapped Files Trouble
I already suggested single files per packet in an earlier post.
Just make a file per packet, and maintain an array/list/some_container in memory with the references (filenames) to those files. The readers/writers then only need synchronized access to the array/list/container to fetch a packet (file) to process or to add a newly written packet (file).
Re: C++ Memory Mapped Files Trouble
Side note: the main issue here is the size of the data packets. If the packets were only a few KB in size, then a solution where you keep everything in memory would be more obvious / make more sense.
But the sheer volume (50-150 GB) makes a memory-based solution problematic.