CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 13 of 13
  1. #1
    Join Date
    Sep 2008
    Location
    Netherlands
    Posts
    865

    very large amount of data: how to speed up? or what databse to choose

    Hi all,

    With a very large amount of data I mean 100 million records!

    When I try to select some data, it needs quite a time to execute.

    Currently I use MySQL5.1. I also created indices where necessary. Is there anyway to speed up things? Or should I go for a different sql server (MS SQL server, Oracle, ..?).

    I read some articles about different SQL servers and its speed, but all articles have a different opinion about what the fastest is.

    What is the best thing to do in this case?

  2. #2
    Join Date
    Jun 2006
    Posts
    437

    Re: very large amount of data: how to speed up? or what databse to choose

    Hi all

    Well, this post seems to be related the previous one I saw.
    100 million of record is really "very large amount of data". I suggest Oracle (yes, I love it ), but beyond the db engine, it's extremely important that your data model is correct. A bad design with few data which runs on a powerful db engine can result slower than a good design with large amount of data that runs on a poor database.

  3. #3
    Join Date
    Apr 2009
    Posts
    598

    Re: very large amount of data: how to speed up? or what databse to choose

    Database engines are all based on more or less the same principles: balanced trees for indexes. These trees can have a lot of branches, much more than 100 millions. If the tree has 10 branches per node, then the engine needs to check only 8 nodes to reach any record, since 10 power 8 is 100 millions. If the tree has only two branches per node, then the engine needs only 27 comparisons to find any record. So this should be very quick if your table has the correct index. Don't be afraid of adding secondary indexes, except if you don't have enough space.

  4. #4
    Join Date
    Sep 2008
    Location
    Netherlands
    Posts
    865

    Re: very large amount of data: how to speed up? or what databse to choose

    Quote Originally Posted by davide++ View Post
    100 million of record is really "very large amount of data".
    I know it is a lot of data, it could be even more. Analyzations so far prodicted that there will be about 500K-1Million new records each day.

    To explain the situation, we want to build our own adserver. Currently we are connected to a 3th party adserver. Each day we have 70 million(!) ads shown over the world and we are not the only company conntect to this adserver. So I'm really curious about how they handle that much data, but offcourse they won't reveal the technique behind this to us.

    We want to build a small version of this.

    Quote Originally Posted by olivthill2
    Database engines are all based on more or less the same principles: balanced trees for indexes. These trees can have a lot of branches, much more than 100 millions
    I created indexes, see the script below
    Code:
    CREATE TABLE  `my_db`.`my_table` (
      `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `vurl_id` bigint(20) unsigned NOT NULL,
      `url` varchar(3000) DEFAULT NULL,
      `creative_id` bigint(20) unsigned DEFAULT NULL,
      `publisher_entity_id` bigint(20) unsigned NOT NULL,
      `seller_line_item_id` bigint(20) unsigned NOT NULL,
      `ip_address` varchar(255) DEFAULT NULL,
      `user_city` varchar(255) DEFAULT NULL,
      `datetime` datetime NOT NULL,
      PRIMARY KEY (`id`),
      KEY `Index_vurl_id` (`vurl_id`),
      KEY `Index_publisher_id` (`publisher_entity_id`),
      KEY `Index_date` (`datetime`),
      KEY `Index_seller_line_item` (`seller_line_item_id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=35115 DEFAULT CHARSET=latin1;
    The test data I'm working with now, contains 30 million records. If the daterange doesn't get too big, it executes within about 10 seconds (depending on the kind of query). But I don't use any daterange, it is simply too much data to process and takes too long.

  5. #5
    Join Date
    Apr 2009
    Posts
    598

    Re: very large amount of data: how to speed up? or what databse to choose

    I don't know know very well MySQL, but I think that if one of the fields of the index is not specified, e.g. the date, in the WHERE clause of your SELECT, then the index is not used at all. This is the reason why secondary indexes, with fewer fields than in the primary key, are often very useful.

  6. #6
    Join Date
    Sep 2008
    Location
    Netherlands
    Posts
    865

    Re: very large amount of data: how to speed up? or what databse to choose

    I took a look at my queries I run on this table, and all columns in my WHERE clause are defined in the index.


    What I forgot to ask
    Quote Originally Posted by davide++ View Post
    I suggest Oracle (yes, I love it )
    Why would you suggest Oracle?


    Anyone else with suggestions?

  7. #7
    Join Date
    Sep 2008
    Location
    Netherlands
    Posts
    865

    Re: very large amount of data: how to speed up? or what databse to choose

    Btw, is there any difference between creating 2 indexes like
    KEY `Index_vurl_id` (`vurl_id`)
    KEY `Index_date` (`datetime`)

    Or create 1 index like
    KEY `Index_combined` (`datetime`,`vurl_id`)


    Or use them all 3 together?

  8. #8
    Join Date
    Jun 2006
    Posts
    437

    Re: very large amount of data: how to speed up? or what databse to choose

    Well.
    Indexing is one of the most important element of a good design, but when data is so big it cannot solve all problems. Yes, it's true, searches based on indexes are fast and their speed tend to increase very slowly when data grows; but sometimes you cannot use indexes, or indexes don't work because the values aren't distinct, they are almost the same, so indexes cannot get the benefits by search tree-based.
    You can think to other solutions; for example, you can divide your enormous table into smaller
    tables according some criteria, or you can implement some "history policy", ie copy the records that isn't searched very often into another table, and remove then from the main table. In both cases the idea is to perform queries in small tables instead of big tables.

    I suggested Oracle because probably it's the most powerful database engine, and usually it's used when there are big amount of data to manage. It's hard to say in few words what are the advantages using Oracle; nevertheless it's a very complex product, much more complex than SQLServer or other database programs. And it's more expensive too.

  9. #9
    Join Date
    May 2009
    Location
    Lincs, UK
    Posts
    298

    Re: very large amount of data: how to speed up? or what databse to choose

    I also love Oracle. Its advantages over the others: robustness, PL/SQL, its SQL has some extensions that make life much easier (the outer joins syntax for instance), it's not Microsoft... Yes, price is an issue, but they have some lightweight versions that are free (though I haven't try them, except for Personal Oracle, but that was a very long time ago).

  10. #10
    Join Date
    Sep 2000
    Location
    FL
    Posts
    1,452

    Re: very large amount of data: how to speed up? or what databse to choose

    1. Is there a reason you are using InnoDB? You might see some increases with MyISAM.
    2. With that large a dataset, I would definitely be looking at the hardware too. What does the disk IO look like? If you are having disk IO issues on the hardware, you will still end up with these issues on Oracle, IBM or MS databases as well.
    3. Creating a multi-field index can increase speed if used on fields that are used in the WHERE statement a lot.
    4. Are you running replication? Are you logging queries? If so, you might want to look at moving the log files to a seperate disk.

    If you want to stay with MySQL, check out the forums. You should be able to get some help with optimization.

  11. #11
    Join Date
    Sep 2008
    Location
    Netherlands
    Posts
    865

    Re: very large amount of data: how to speed up? or what databse to choose

    Quote Originally Posted by sotoasty View Post
    1. Is there a reason you are using InnoDB? You might see some increases with MyISAM.
    2. With that large a dataset, I would definitely be looking at the hardware too. What does the disk IO look like? If you are having disk IO issues on the hardware, you will still end up with these issues on Oracle, IBM or MS databases as well.
    3. Creating a multi-field index can increase speed if used on fields that are used in the WHERE statement a lot.
    4. Are you running replication? Are you logging queries? If so, you might want to look at moving the log files to a seperate disk.

    If you want to stay with MySQL, check out the forums. You should be able to get some help with optimization.
    The answers to your questions

    1: MyISAM doesn't support foreign keys
    2: 64-bit Windows Server 2008, Intel Xeon CPU (quad-core 2.83GHz), 8 GB RAM. This shouldn't be the problem I suppose.
    4: No replication and logging is running..

    Don't necessarely needs to be MySQL, need to find a balance between cost en performance.

  12. #12
    Join Date
    Sep 2000
    Location
    FL
    Posts
    1,452

    Re: very large amount of data: how to speed up? or what databse to choose

    Quote Originally Posted by dannystommen View Post
    1: MyISAM doesn't support foreign keys
    I thought that might be the case. Just thought I would ask.

    Quote Originally Posted by dannystommen View Post
    2: 64-bit Windows Server 2008, Intel Xeon CPU (quad-core 2.83GHz), 8 GB RAM. This shouldn't be the problem I suppose.
    This is not bad for a DB server. You might see some improvement if you upgraded the memory. Also, if you do get the 100 million records, I would definately say upgrade. Many times, the real issue is the hard drives. For example, if you run your query in the Query Browser, once the query returns it gives you 2 times at the bottom. The first is total query time, the second (in parens) is just the query time. If your query time is tiny, but the total query time is large, you probably have an IO issue. This is an issue you will see whether you are running MySQL, MS or Oracle.


    Quote Originally Posted by dannystommen View Post
    4: No replication and logging is running..
    Then that won't be an issue As I mentioned, check out the MySQL forums, there are lots of tweaks you can try to speed up your queries.

  13. #13
    Join Date
    Sep 2008
    Location
    Netherlands
    Posts
    865

    Re: very large amount of data: how to speed up? or what databse to choose

    Thanks for the advice so far.

    If anyone has even more suggestion, please let met know.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured