CodeGuru Forums
  1. #1
    Join Date
    Mar 2009
    Location
    Riga, Latvia
    Posts
    128

    [RESOLVED] Recommend database engine for Java program

    Hello!

    I'm implementing a multilingual dictionary. (In fact, I plan to extend it into a machine translation system.) I need a database engine for it. Here is a list of my requirements for the database engine:

    1) It should be capable of handling hundreds of megabytes of data quite fast (words + grammar info).
    2) The database should be a single file. (As far as I know, this is not the case with Apache Derby, aka Java DB.)
    3) The database license shouldn't require me to release my application under a copyleft license.

    Points 1 and 2 have a higher priority for me.

  2. #2
    Join Date
    Feb 2008
    Posts
    966

    Re: Recommend database engine for Java program

    Sounds like you had better get started. YOU have a lot of work to get done.

    On a side note, why the hell would you want the database, which will be several if not tens of gigabytes large, to be stored in one single file?

  3. #3
    Join Date
    Mar 2009
    Location
    Riga, Latvia
    Posts
    128

    Re: Recommend database engine for Java program

    First, not gigabytes, but just about 200-500MB. (I'm writing this app for fun and to learn Java, so even this is a very optimistic estimate of how complete the dictionary will get.) I don't think the database size will exceed 2GB. (This is the maximum file size for FAT16, as far as I know.) Any modern file system (NTFS, ext4) supports files larger than 2GB.

    I believe a single file database is less confusing for a user. Even Java DB now supports archived databases, but unfortunately only for reading.

  4. #4
    dlorde (Elite Member, Power Poster)
    Join Date
    Aug 1999
    Location
    UK
    Posts
    10,163

    Re: Recommend database engine for Java program

    Quote (originally posted by andrey_zh):
    I believe a single file database is less confusing for a user. Even Java DB now supports archived databases, but unfortunately only for reading.
    The user should not need to be aware of how the database is implemented.
    An archived database is what it says - an archive; it isn't a usable database.

    Even for a single record set with no indexes or relations, a single file is a poor choice if you have hundreds of megabytes of data - and it won't be 'quite fast', so it won't meet your criteria. If you have anything more complex than a single record set, a single file is likely to be a complete waste of time.
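    To make the point about indexes concrete, here is a minimal Java sketch (the class and method names are hypothetical, and an in-memory array stands in for records stored in a file). It contrasts a full scan with a lookup through a sorted key index. Real engines keep their indexes on disk, typically as B-trees, so this shows only why an index matters, not how any particular database implements one.

```java
import java.util.Arrays;

// Hypothetical illustration: records are plain strings held in memory,
// standing in for rows stored in a database file.
final class IndexLookupDemo {

    // Full scan: inspects records one by one until a match is found,
    // so the cost grows linearly with the number of records.
    static int linearSearch(String[] records, String key) {
        for (int i = 0; i < records.length; i++) {
            if (records[i].equals(key)) {
                return i;
            }
        }
        return -1;
    }

    // "Index" lookup: the keys are kept sorted, so binary search needs only
    // about log2(n) comparisons. Returns -1 when the key is absent.
    static int indexedSearch(String[] sortedKeys, String key) {
        int pos = Arrays.binarySearch(sortedKeys, key);
        return pos >= 0 ? pos : -1;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig", "kiwi", "banana"};
        String[] index = words.clone();
        Arrays.sort(index); // build the sorted index once, up front

        System.out.println(linearSearch(words, "kiwi"));   // position among unsorted records
        System.out.println(indexedSearch(index, "kiwi"));  // position in the sorted index
        System.out.println(indexedSearch(index, "grape")); // absent key
    }
}
```

    For a million headwords, the scan may compare against every entry, while the binary search needs about 20 comparisons (log2 of 1,000,000 is roughly 20).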

    Learning results from what the student does and thinks, and only from what the student does and thinks. The teacher can advance learning only by influencing the student to learn...
    H. Simon
    Please use [CODE]...your code here...[/CODE] tags when posting code. If you get an error, please post the full error message and stack trace, if present.

  5. #5
    Join Date
    Feb 2008
    Posts
    966

    Re: Recommend database engine for Java program

    Quote (originally posted by andrey_zh):
    First, not gigabytes, but just about 200-500MB. I don't think the database size will exceed 2GB.
    I think you are missing the point. Even if you only have 100 MB worth of data, reading and writing it as a single-file DB will result in extremely slow CRUD operations.
    Quote (originally posted by andrey_zh):
    (This is the maximum file size for FAT16, as far as I know.) Any modern file system (NTFS, ext4) supports files larger than 2GB.
    Nobody is concerned with whether or not your OS can handle a 2GB file. If you are developing this on Windows 95 service pack A (alpha), then you might be concerned. Once again, you are missing the big picture. I never said that your OS can't handle large files; I am suggesting that using large files for a lot of reads and writes is a bad idea.
    Quote (originally posted by andrey_zh):
    I believe a single file database is less confusing for a user.
    dlorde already hit on this one, but to add to this: even if your "users" are developers writing Java, THEY should not be concerned either. I don't go around asking my Oracle DBAs what their indexing schemes are and how they have implemented file distribution. I don't care, nor should I.
    Quote (originally posted by andrey_zh):
    Even Java DB now supports archived databases, but unfortunately only for reading.
    I think that you need to do some fundamental reading and research on databases before you start such a large project. I am not trying to be rude by saying this, but I think that your understanding of how these things work is far from where it should be for a project of this caliber.

    My suggestion would be to stop worrying about the database implementation and just go with one like MySQL that will handle the file I/O for you. All you have to do is install the DB, configure a few things and start the services. You won't have to worry about how it stores files.

  6. #6
    Join Date
    Jul 2010
    Posts
    17

    Re: Recommend database engine for Java program

    If the aim of your project is to learn Java (and not RDBMS administration, SQL, etc.) and nothing else, then you may want to start with a simpler project.

    If you really want to use this as your project, then I would recommend NOT using an RDBMS, and instead going for an OODBMS. db4o is quite good, and it meets all three of your requirements.

  7. #7
    Join Date
    Mar 2009
    Location
    Riga, Latvia
    Posts
    128

    Re: Recommend database engine for Java program

    Quote:
    I think that you need to do some fundamental reading and research on databases before you start such a large project. I am not trying to be rude by saying this, but I think that your understanding of how these things work is far from where it should be for a project of this caliber.
    You are generally right in your evaluation of my skills. I have written only one (!) program that uses databases so far. (That was a web "page" in PHP with an SQL database.) I already know some things about Java (inheritance, generics and how to use them, and so on). All of the programs I have written so far (in C, C++, Java, ...) are focused on algorithms. Suddenly I've realized that my programming skills are very far from the real world. (I'm almost a complete noob in networking and databases.)

    I have never learned the inner workings of databases, but I believe that defragmentation and consolidation procedures could be run on a database file. Of course, this will take time, and later writes and deletions will gradually ruin the order again. But it should decrease the database size and access time.
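    The consolidation idea can be illustrated with a hypothetical in-memory Java sketch: an append-only log, in which a null value marks a deletion, is rewritten so it keeps only the latest live value per key, in key order. The `CompactionDemo` class and its log format are made up for this example; real engines compact on-disk pages, and some expose it as an explicit operation (Derby, for instance, provides the `SYSCS_UTIL.SYSCS_COMPRESS_TABLE` system procedure).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: an append-only log of entries where a null value
// marks a deletion ("tombstone"). Real engines do this with disk pages.
final class CompactionDemo {

    static final class Entry {
        final String key;
        final String value; // null means "deleted"

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Rewrites the log so it holds only the latest live value for each key,
    // sorted by key. Superseded versions and tombstones are dropped.
    static List<Entry> compact(List<Entry> log) {
        Map<String, String> latest = new TreeMap<>(); // keeps keys sorted
        for (Entry e : log) {
            if (e.value == null) {
                latest.remove(e.key);       // tombstone removes any earlier value
            } else {
                latest.put(e.key, e.value); // later write replaces an earlier one
            }
        }
        List<Entry> compacted = new ArrayList<>();
        for (Map.Entry<String, String> m : latest.entrySet()) {
            compacted.add(new Entry(m.getKey(), m.getValue()));
        }
        return compacted;
    }

    public static void main(String[] args) {
        List<Entry> log = new ArrayList<>();
        log.add(new Entry("house", "Haus"));
        log.add(new Entry("dog", "Hund"));
        log.add(new Entry("house", "Heim")); // update: supersedes "Haus"
        log.add(new Entry("cat", "Katze"));
        log.add(new Entry("dog", null));     // delete "dog"

        for (Entry e : compact(log)) {
            System.out.println(e.key + " -> " + e.value);
        }
    }
}
```

    Compaction reclaims the space taken by superseded and deleted entries, but new writes and deletes start fragmenting the data again, so it has to be repeated periodically.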

    Imagine I have just 100MB of (any) data. If the database file is 1GB, that means 900% overhead. That seems very inefficient and hard to believe to me.
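    The 900% figure follows from treating 1GB as 1000MB: overhead = (file size - data size) / data size = (1000 - 100) / 100 = 9, i.e. 900%. A tiny hypothetical Java helper doing the same arithmetic:

```java
// Hypothetical helper: storage overhead of a file relative to its payload,
// as a percentage: (fileBytes - dataBytes) / dataBytes * 100.
final class OverheadDemo {

    static double overheadPercent(long dataBytes, long fileBytes) {
        if (dataBytes <= 0) {
            throw new IllegalArgumentException("dataBytes must be positive");
        }
        return (fileBytes - dataBytes) * 100.0 / dataBytes;
    }

    public static void main(String[] args) {
        long mb = 1_000_000L; // decimal megabytes, matching the rough figures above
        System.out.println(overheadPercent(100 * mb, 1_000 * mb)); // 900.0
    }
}
```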

    Quote:
    I think you are missing the point. Even if you only have 100 MB worth of data, reading and writing it as a single-file DB will result in extremely slow CRUD operations.
    A typical operation on a dictionary is reading.

    Quote:
    An archived database is what it says - an archive; it isn't a usable database.
    H2 database: http://www.h2database.com/html/featu...atabase_in_zip
    Derby: http://db.apache.org/derby/docs/10.5...ploy11201.html
    Last edited by andrey_zh; July 2nd, 2010 at 03:07 PM.

  8. #8
    Join Date
    Jul 2010
    Posts
    17

    Re: Recommend database engine for Java program

    If you just need a 'file' to read from, use db4o (or some other lightweight non-SQL database).

    Relational modeling, CRUD optimisations, etc. will not add anything to what you're attempting to do.

    If you plan on scaling it massively, look at key-value stores. You can apparently get 30,000 reads per second with some of them on desktop workstation hardware.
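    For a read-mostly dictionary, the key-value idea can be sketched with nothing but the JDK. The `KeyValueDictionary` class below is hypothetical and is not db4o or any real key-value store: it keeps the whole map in memory and persists it to a single file with standard Java serialization, so it is only reasonable while the data fits comfortably in the heap.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a read-mostly key-value dictionary: the whole map
// lives in memory and is saved to / loaded from one file via serialization.
final class KeyValueDictionary {

    private final HashMap<String, String> entries = new HashMap<>();

    void put(String word, String translation) {
        entries.put(word, translation);
    }

    String get(String word) {
        return entries.get(word); // average O(1); null if the word is absent
    }

    // Writes the entire map to the file, replacing any previous contents.
    void save(Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeObject(entries);
        }
    }

    @SuppressWarnings("unchecked")
    static KeyValueDictionary load(Path file) throws IOException, ClassNotFoundException {
        KeyValueDictionary dict = new KeyValueDictionary();
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            dict.entries.putAll((Map<String, String>) in.readObject());
        }
        return dict;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("dict", ".bin");
        KeyValueDictionary dict = new KeyValueDictionary();
        dict.put("house", "Haus");
        dict.put("dog", "Hund");
        dict.save(file);

        KeyValueDictionary reloaded = KeyValueDictionary.load(file);
        System.out.println(reloaded.get("dog"));   // Hund
        System.out.println(reloaded.get("horse")); // null
        Files.delete(file);
    }
}
```

    Lookups are plain `HashMap` gets, so reads are fast once the file is loaded; the trade-offs are a full rewrite of the file on every save and a memory footprint that grows with the dictionary.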

    Thanks,
    Vackar

  9. #9
    dlorde (Elite Member, Power Poster)
    Join Date
    Aug 1999
    Location
    UK
    Posts
    10,163

    Re: Recommend database engine for Java program

    Quote (originally posted by andrey_zh):
    Quote:
    An archived database is what it says - an archive; it isn't a usable database.
    H2 database: http://www.h2database.com/html/featu...atabase_in_zip
    Derby: http://db.apache.org/derby/docs/10.5...ploy11201.html
    Well, clearly it depends on what I mean by 'usable' - maybe I should have said 'practical'. Your requirement was for performance, and a single zip archive of the hundreds of megabytes you specified is unlikely to meet it. Additionally, databases in a jar or zip file are generally read-only, which makes for a pretty limited kind of use.
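    The read-only limitation is easy to see with the JDK's own `java.util.zip` classes. This hypothetical `ZipReadDemo` is not how H2 or Derby implement their zip/jar support; it only shows the general constraints: `ZipFile` offers no way to modify an archive in place, and a deflated entry has to be decompressed from its start, so reaching data near the end of a large entry means inflating everything before it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class ZipReadDemo {

    // Writes one compressed (deflated, the default) entry into a new archive.
    // "Updating" an archive means writing a whole new one like this.
    static void writeArchive(Path zip, String entryName, String content) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(content.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
    }

    // Reads an entry back through the read-only ZipFile API. The entry is
    // inflated sequentially from its beginning. Returns null if absent.
    static String readEntry(Path zip, String entryName) throws IOException {
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            ZipEntry entry = zf.getEntry(entryName);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zf.getInputStream(entry)) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = Files.createTempFile("dictionary", ".zip");
        writeArchive(zip, "words.txt", "house=Haus\ndog=Hund\n");
        System.out.print(readEntry(zip, "words.txt"));
        Files.delete(zip);
    }
}
```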

    But hey, if you think it will do the job, go for it.

    There are features that should not be used. There are concepts that should not be exploited. There are problems that should not be solved. There are programs that should not be written...
    R. Harter

  10. #10
    Join Date
    Jun 2007
    Location
    Aurora CO USA
    Posts
    137

    Re: Recommend database engine for Java program

    Quote (originally posted by andrey_zh):
    You are generally right in your evaluation of my skills. I have written only one (!) program that uses databases so far. (That was a web "page" in PHP with an SQL database.) I already know some things about Java (inheritance, generics and how to use them, and so on). All of the programs I have written so far (in C, C++, Java, ...) are focused on algorithms. Suddenly I've realized that my programming skills are very far from the real world. (I'm almost a complete noob in networking and databases.)
    Which is why you came here to ask questions. Recognizing your skill level and not letting your ego interfere are good traits. My question is, if you realized your noob status and asked for advice, why are you so resistant to taking the advice from experts in the forum who (obviously) know more than you?

    Quote:
    I have never learned the inner workings of databases, but I believe that defragmentation and consolidation procedures could be run on a database file. Of course, this will take time, and later writes and deletions will gradually ruin the order again. But it should decrease the database size and access time.
    You should have a working knowledge of these topics. But you definitely should not design your program around these constraints. That would lock you into one software vendor and limit your choices later on.
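    One common way to avoid designing around a single engine is to hide storage behind an interface. The sketch below is hypothetical (`DictionaryStore`, `InMemoryDictionaryStore` and `StoreDemo` are names made up for this example): application code depends only on the interface, and an implementation backed by MySQL, H2, or anything else could later replace the in-memory one without touching the callers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Application code: depends only on the DictionaryStore interface.
final class StoreDemo {

    static String lookup(DictionaryStore store, String word) {
        return store.translate(word).orElse("(no translation for " + word + ")");
    }

    public static void main(String[] args) {
        DictionaryStore store = new InMemoryDictionaryStore();
        store.add("dog", "Hund");
        System.out.println(lookup(store, "dog"));
        System.out.println(lookup(store, "cat"));
    }
}

// Hypothetical storage abstraction; swapping engines means writing
// another implementation of this interface.
interface DictionaryStore {
    Optional<String> translate(String word);

    void add(String word, String translation);
}

// One possible implementation, backed by a HashMap. A JDBC-based one
// could implement the same two methods.
final class InMemoryDictionaryStore implements DictionaryStore {
    private final Map<String, String> entries = new HashMap<>();

    @Override
    public Optional<String> translate(String word) {
        return Optional.ofNullable(entries.get(word));
    }

    @Override
    public void add(String word, String translation) {
        entries.put(word, translation);
    }
}
```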
    Quote:
    Imagine I have just 100MB of (any) data. If the database file is 1GB, that means 900% overhead. That seems very inefficient and hard to believe to me.

    Quote:
    A typical operation on a dictionary is reading.
    One thing you can count on in SW development is that data almost always grows. Your system that works correctly today will be completely useless in a couple of years because the data will have outgrown the code. I am dealing with this very issue in a legacy system I support. It worked fine with the data set of 5-7 years ago. It completely stalls with the current amount of data. Plan for the future!

    Again, you came here looking for advice. Don't ignore all the good advice you're being given just because you're afraid of adding a little complexity for the end-user. Your installation program/instructions can easily take care of that for them.
