  1. #1
    Join Date
    Dec 1999
    Location
    North Sydney, NS
    Posts
    445

    How do database repository files work?

    Does a database rewrite its entire file every time new data is stored? Is it able to selectively remove parts of its file when data is to be deleted, or does it need to rewrite the whole file? Is this done with fstream?

    Paul
    I know how to build. What to build is a completely different story.

  2. #2
    Join Date
    Feb 2005
    Location
    Normandy in France
    Posts
    4,590

    Re: How do database repository files work?

    Quote Originally Posted by Paul Rice
    Does a database rewrite its entire file every time new data is stored? Is it able to selectively remove parts of its file when data is to be deleted, or does it need to rewrite the whole file? Is this done with fstream?
    Databases are usually very highly optimized for all SQL operations, and more.
    So they won't rewrite the entire file every time new data is stored.

    Databases written in C++ may use fstream.
    Databases written in C are more likely to use the OS-specific API or the standard C library file streams.
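
    To illustrate the point about not rewriting the whole file, here is a minimal sketch of updating one fixed-size record in place with fstream. The Record layout and the function names are assumptions made up for this example; real database engines are far more involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical fixed-size record; the layout is an assumption for illustration.
// It contains only built-in types, so copying its raw bytes is safe.
struct Record {
    std::uint32_t id;
    std::uint32_t value;
};

// Overwrite record number `index` in place; the rest of the file is untouched.
// The file must already exist and contain at least index + 1 records.
bool update_record(const std::string& path, std::size_t index, const Record& rec) {
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f) return false;
    f.seekp(static_cast<std::streamoff>(index * sizeof(Record)));
    f.write(reinterpret_cast<const char*>(&rec), sizeof(rec));
    f.flush();
    return static_cast<bool>(f);
}

// Read record number `index` without reading anything before or after it.
bool read_record(const std::string& path, std::size_t index, Record& rec) {
    std::ifstream f(path, std::ios::binary);
    if (!f) return false;
    f.seekg(static_cast<std::streamoff>(index * sizeof(Record)));
    f.read(reinterpret_cast<char*>(&rec), sizeof(rec));
    return static_cast<bool>(f);
}
```

    Because every record has the same size, the byte offset of record N is simply N * sizeof(Record), which is what makes seeking straight to it possible.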
    "inherit to be reused by code that uses the base class, not to reuse base class code", Sutter and Alexandrescu, C++ Coding Standards.
    Club of lovers of the C++ typecasts cute syntax: Only recorded member.

    Out of memory happens! Handle it properly!
    Say no to g_new()!

  3. #3
    Join Date
    Dec 1999
    Location
    North Sydney, NS
    Posts
    445

    Re: How do database repository files work?

    My problem is I have a class with a vector as a member. I want to save the object to disk (including the vector). The size of the vector will vary, and things will constantly be added and deleted. Just writing the whole object to disk seems OK when the vector is small, but what if there are thousands of nodes in the vector? I thought maybe looking at how databases manage large amounts of data might offer a solution. But then again, maybe my concern is a non-issue.

    To read and write the object I've been using:

    Info a;
    f.write( reinterpret_cast<const char*>(&a), sizeof(a) );

    Info z;
    g.read((char*)(&z), sizeof(z));

  4. #4
    Join Date
    Feb 2005
    Location
    "The Capital"
    Posts
    5,306

    Re: How do database repository files work?

    Databases are not that easy. They are very advanced in terms of storage and more.

    They may have their own file system and/or store data in pages. I know that Sybase can even be installed on a raw disk (but having multiple databases on the same raw disk makes it less efficient).

    Also, not every read or write operation necessarily goes to a file. There can be in-memory transactions as well, which the database engine periodically writes back to its storage.

    You would be better off asking this question on a good database forum, or finding a book or article that covers it.

  5. #5
    Join Date
    Feb 2005
    Location
    "The Capital"
    Posts
    5,306

    Re: How do database repository files work?

    Thought of linking with the same topic in db forum - http://www.codeguru.com/forum/showthread.php?t=403645

    Please don't consider it a duplicate post. I don't think CG has good database administrators available in the forums (or at least they don't visit frequently). You can try other forums and articles: some good ones are dbforums and the MySQL forums, and there are a bunch for SQL Server as well, such as SQL Server Central/online and the MSDN articles and forums.

  6. #6
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588

    Re: How do database repository files work?

    An accessible book about database design is "Managing Gigabytes", so if you are interested in that topic, check it out.

    Database storage is usually very similar to filesystem storage: there is an index, followed by the data in no particular order. The index grows, but relatively slowly, so it is usually sufficient to allocate a "page" for it and then add pages as needed. What a "page" is depends on your own implementation; it could be 1 KB, 1 MB or something else. The index then references the data, a bit like pointers. The data is also usually allocated in "pages" (which can, and often do, have a different size from the index pages) and simply follows the index. If your data is smaller than a whole page, you typically waste the rest of the page. If it is bigger, you split it across several pages.

    For example, suppose an index page is 16 bytes (4 unsigned longs of 4 bytes each) and the associated IDs also take up 16 bytes. You need to identify each object with an ID, since otherwise you cannot tell which pages belong together when some data overflows a data page. Say the data pages are each 8 bytes long. How would you then store "Hello"?
    Code:
    // First the 4 longs for the offset to the actual data
    32 0 0 0
    // Now the 4 longs for the IDs
    1 0 0 0
    // Now the first data page
    'H' 'e' 'l' 'l' 'o' 0 0 0
    Of course this needs to be a binary file, since you want to be able to seek and read anywhere from the middle as needed.

    Now, let's say you want to add "World" to your database. Then you'll end up with:
    Code:
    // First the 4 longs for the offset to the actual data
    32 40 0 0
    // Now the 4 longs for the IDs
    1 2 0 0
    // Now the first data page
    'H' 'e' 'l' 'l' 'o' 0 0 0
    // Second data page
    'W' 'o' 'r' 'l' 'd' 0 0 0
    And now you want to delete "Hello". This just means that you need to mark it in the index as free.
    Code:
    // the 4 longs for the offset to the actual data
    0 40 0 0
    // the 4 longs for the IDs
    0 2 0 0
    // Now the first data page (it's garbage, since you don't reference it anymore)
    'H' 'e' 'l' 'l' 'o' 0 0 0
    // Second data page
    'W' 'o' 'r' 'l' 'd' 0 0 0
    And now insert "Hello there". For this you need two data pages. So if you grab the first two that are available, you'll get:
    Code:
    // the 4 longs for the offset to the actual data
    32 40 48 0
    // the 4 longs for the IDs
    3 2 3 0
    // first data page (belongs to ID 3)
    'H' 'e' 'l' 'l' 'o' ' ' 't' 'h'
    // Second data page (belongs to ID 2)
    'W' 'o' 'r' 'l' 'd' 0 0 0
    // third data page (belongs to ID 3)
    'e' 'r' 'e' 0 0 0 0 0
    This is the gist of how it works. However, there are many issues that need to be addressed in a real database system (or filesystem), and they complicate the whole thing a lot.
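
    The walkthrough above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class name ToyStore and its member functions are made up for this example, index slot N is tied to data page N (as happens to be the case in the walkthrough), ID 0 is reserved to mean "free", and nothing is actually written to disk.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy model of the layout above: 4 index slots (offset + owner ID) followed by
// 8-byte data pages. All names and sizes are illustrative assumptions.
class ToyStore {
public:
    static constexpr std::size_t kSlots = 4;
    static constexpr std::size_t kPageSize = 8;
    static constexpr std::uint32_t kHeaderSize = 32;  // 16 bytes of offsets + 16 bytes of IDs

    // Store `text` under `id` (non-zero) in the first free slots.
    // Returns false if the text is empty or there are not enough free pages.
    bool insert(std::uint32_t id, const std::string& text) {
        const std::size_t needed = (text.size() + kPageSize - 1) / kPageSize;
        std::vector<std::size_t> free_slots;
        for (std::size_t s = 0; s < kSlots && free_slots.size() < needed; ++s)
            if (ids_[s] == 0) free_slots.push_back(s);
        if (needed == 0 || free_slots.size() < needed) return false;
        for (std::size_t i = 0; i < needed; ++i) {
            const std::size_t s = free_slots[i];
            ids_[s] = id;
            offsets_[s] = kHeaderSize + static_cast<std::uint32_t>(s * kPageSize);
            pages_[s].fill(0);                                      // zero-pad the page
            text.copy(pages_[s].data(), kPageSize, i * kPageSize);  // copy up to 8 bytes
        }
        return true;
    }

    // Deleting only clears the index entries; the page bytes stay behind as garbage.
    void erase(std::uint32_t id) {
        for (std::size_t s = 0; s < kSlots; ++s)
            if (ids_[s] == id) { ids_[s] = 0; offsets_[s] = 0; }
    }

    // Reassemble a value by collecting its pages in slot order,
    // then cutting at the first padding byte.
    std::string read(std::uint32_t id) const {
        std::string out;
        for (std::size_t s = 0; s < kSlots; ++s)
            if (ids_[s] == id) out.append(pages_[s].data(), kPageSize);
        const std::size_t end = out.find('\0');
        if (end != std::string::npos) out.resize(end);
        return out;
    }

    // Offset stored in an index slot (0 means the slot is free).
    std::uint32_t offset(std::size_t slot) const { return offsets_[slot]; }

private:
    std::array<std::uint32_t, kSlots> offsets_{};
    std::array<std::uint32_t, kSlots> ids_{};
    std::array<std::array<char, kPageSize>, kSlots> pages_{};
};
```

    Replaying the steps (insert "Hello" as ID 1, insert "World" as ID 2, erase ID 1, insert "Hello there" as ID 3) leaves the offsets at 32 40 48 0, matching the last listing. A file-backed version would seek to those offsets instead of indexing an array.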
    Get this small utility to do basic syntax highlighting in vBulletin forums (like Codeguru) easily.
    Supports C++ and VB out of the box, but can be configured for other languages.

  7. #7
    Join Date
    Dec 1999
    Location
    North Sydney, NS
    Posts
    445

    Re: How do database repository files work?

    Thanks Yves. I'm wondering if this is considered a more efficient method to manage large amounts of volatile data than what I've been doing?

    Paul

  8. #8
    Join Date
    Feb 2005
    Location
    "The Capital"
    Posts
    5,306

    Re: How do database repository files work?

    Well, let me jump in between you and Yves to ask: why can't you use a ready-made database rather than implementing your own storage mechanism?

  9. #9
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588

    Re: How do database repository files work?

    It depends on exactly what you are doing.
    - Is there one object in the file, or multiple ones?
    - When you write the object to an existing file, has everything changed, is it an update, or is it most likely just adding/removing data at the end?

  10. #10
    Join Date
    Dec 1999
    Location
    North Sydney, NS
    Posts
    445

    Re: How do database repository files work?

    Quote Originally Posted by Yves M
    It depends on exactly what you are doing.
    - Is there one object in the file, or multiple ones?
    - When you write the object to an existing file, has everything changed, is it an update, or is it most likely just adding/removing data at the end?
    I'm looking at storing a vector that's a member of an object. I'd say, at this point, the contents of the vector look rather volatile.

  11. #11
    Join Date
    Dec 1999
    Location
    North Sydney, NS
    Posts
    445

    Re: How do database repository files work?

    Quote Originally Posted by exterminator
    Well, let me jump in between you and Yves to ask: why can't you use a ready-made database rather than implementing your own storage mechanism?
    I'm hoping I don't need the extra overhead. It would be great if I could integrate all this into my own app.

  12. #12
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588

    Re: How do database repository files work?

    Quote Originally Posted by Paul Rice
    I'm looking at storing a vector that's a member of an object. I'd say, at this point, the contents of the vector look rather volatile.
    OK, so you have a vector that's 1 MB (say). When you change a single byte, do you write the changes to the disk immediately?

    I.e. how much of the vector has changed between saves? Are the saves frequent (as in automatically whenever something changes, or every few seconds) or infrequent (as in the user clicks on "Save")?

  13. #13
    Join Date
    Oct 2000
    Location
    London, England
    Posts
    4,773

    Re: How do database repository files work?

    Storing a vector object to disk bytewise is highly unlikely to achieve anything useful. It is not even useful to copy vectors bytewise.

    If you want to write a vector of objects to disk, then do one of the following:
    - Write a "header" section first indicating the number of items.
    - Have some kind of "terminator" that tells you when you've reached the end of the sequence (not recommended).
    - Use a whole file so that EOF indicates the end of the sequence.

    Ensure that each object in the vector is output in a way that can be read back.
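
    A minimal sketch of the first option (a count header followed by each element written field by field) might look like the following. The Item type and the save_items/load_items names are hypothetical, and the file format uses the machine's native byte order, so it is not portable between different architectures.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type; the std::string member is what makes a raw
// byte copy of the object unsafe.
struct Item {
    std::string name;
    std::int32_t value;
};

// Write a count header, then each item as: name length, name bytes, value.
bool save_items(const std::string& path, const std::vector<Item>& items) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    const std::uint32_t count = static_cast<std::uint32_t>(items.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    for (const Item& it : items) {
        const std::uint32_t len = static_cast<std::uint32_t>(it.name.size());
        out.write(reinterpret_cast<const char*>(&len), sizeof(len));
        out.write(it.name.data(), len);
        out.write(reinterpret_cast<const char*>(&it.value), sizeof(it.value));
    }
    out.flush();
    return static_cast<bool>(out);
}

// Read the count header, then rebuild each item through its normal interface.
bool load_items(const std::string& path, std::vector<Item>& items) {
    std::ifstream in(path, std::ios::binary);
    std::uint32_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof(count))) return false;
    items.clear();
    for (std::uint32_t i = 0; i < count; ++i) {
        std::uint32_t len = 0;
        if (!in.read(reinterpret_cast<char*>(&len), sizeof(len))) return false;
        Item it;
        it.name.resize(len);
        if (len > 0 && !in.read(&it.name[0], len)) return false;
        if (!in.read(reinterpret_cast<char*>(&it.value), sizeof(it.value))) return false;
        items.push_back(std::move(it));
    }
    return true;
}
```

    The same length-prefix idea is applied twice here: once for the vector as a whole and once for each string inside it.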

  14. #14
    Join Date
    Aug 2002
    Location
    Madrid
    Posts
    4,588

    Re: How do database repository files work?

    Oh yes, very good point NMTop. I hadn't noticed the cast, so that means that the vector actually holds something other than just raw bytes. If there is anything in the vector other than a POD, you really should not just cast it to bytes and write it, because each object may have some dynamic storage associated with it, may hold pointers, may contain objects that have these, or may have other requirements that the constructor has to take care of.
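
    One way to have the compiler catch this is a type-trait check, sketched below. PlainPoint and the Info definition here are hypothetical stand-ins (the real Info class was not posted), and can_write_raw is a made-up helper name. Note that std::is_trivially_copyable is a necessary condition, not a sufficient one: a struct holding a raw pointer passes the check, yet the address it stores is still meaningless once read back from a file.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical type with only built-in members: its bytes are its whole state.
struct PlainPoint {
    int x;
    int y;
};

// Hypothetical stand-in for the class discussed above: the vector member owns
// heap memory, so the object's bytes contain pointers, not the elements.
struct Info {
    int id;
    std::vector<int> nodes;
};

// True only for types whose raw bytes may be copied with write()/read()/memcpy.
template <typename T>
constexpr bool can_write_raw() {
    return std::is_trivially_copyable<T>::value;
}

// Compile-time checks: the build fails if either assumption is wrong.
static_assert(can_write_raw<PlainPoint>(), "PlainPoint can be written as raw bytes");
static_assert(!can_write_raw<Info>(), "Info must be serialized member by member");
```

    It also explains why sizeof(Info) stays the same no matter how many nodes the vector holds: the elements live elsewhere on the heap.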

  15. #15
    Join Date
    Dec 1999
    Location
    North Sydney, NS
    Posts
    445

    Re: How do database repository files work?

    Quote Originally Posted by Yves M
    OK, so you have a vector that's 1 MB (say). When you change a single byte, do you write the changes to the disk immediately?
    The assumption at the moment is that this won't be necessary.

    I.e. how much of the vector has changed between saves?
    This, I'm sure, will vary.

    Are the saves frequent (as in automatically whenever something changes, or every few seconds) or infrequent (as in the user clicks on "Save")?
    An autosave feature is possible but isn't necessary. I'd imagine new data will be buffered, while saving will likely be by request.
