Click to See Complete Forum and Search --> : Database design confusion


navinkaus
February 18th, 2010, 08:51 AM
I am creating a database that's going to have millions of records. I would like to get answers for following questions:

1. Is there any dis-advantage of normalized database (like performance de-gradation because of foreign key etc)?
2. Is there any disadvantage of having varchar field of size 256 and in most of the case I will utilize only 10 bytes ?
3. What should I take care of in order to writing highly optimized queries.
4. Is there any disadvantage of having extra columns(reserved), will it increase size on disk even if there is no data?
5. should I avoid join queries ?
6. any other thing which I should take care of ?

-Thanks,
navin

hspc
February 18th, 2010, 05:18 PM
1. Is there any dis-advantage of normalized database (like performance de-gradation because of foreign key etc)?
Yes, sometimes denormalization can make huge benefits. Like avoiding to join with a huge table.
2. Is there any disadvantage of having varchar field of size 256 and in most of the case I will utilize only 10 bytes ?
SQL server stores the actual occupied size with each varchar field. And it's 2 or 4 bytes. So this can be a waste. Also there is a slight performance hit. But you'll not lose the whole unused 246 bytes.
3. What should I take care of in order to writing highly optimized queries.Lots of things:

Use proper indexes
Avoid unwanted casts
Select only what you need
Make sure db statistics are updated
avoid per-record function calls or sub-queries if possible
consider indexed views when appropriate

4. Is there any disadvantage of having extra columns(reserved), will it increase size on disk even if there is no data?It will make a small disk utilization. a bit flag for each nullable field is stored to indicate if it's null or not. I don't prefer this design anyway regardless of space issue.
5. should I avoid join queries ?Can you avoid them totally?
6. any other thing which I should take care of ?Test on real life data sizes and consider archiving old data.

navinkaus
February 18th, 2010, 09:25 PM
Thanks for your reply, further questions to your answers:

1. How to decide what to normalize what not as initially you don't know much table will grow ?
2. As I understand in case of varchar If I give size 256 or 2032 it will still take same size on disk as it keeps size in 2 or 4 bytes, only difference is during input/output operation performance might hit as server will have to keep 2032 buffer, right ?
3. what are db statistics ?
4. Sub query should be avoided....means should I use join oftenly ?

hspc
February 19th, 2010, 03:48 AM
1. How to decide what to normalize what not as initially you don't know much table will grow ?
First, design the db to the 3rd normal form as usual. then start denormalizing. If you don't know what tables will grow then this is a problem. What you can do is to monitor the performance regularly and decide to refactor later.
2. As I understand in case of varchar If I give size 256 or 2032 it will still take same size on disk as it keeps size in 2 or 4 bytes, only difference is during input/output operation performance might hit as server will have to keep 2032 buffer, right ?
I don't think that the buffer allocated is always the maximum size of the field. Logically it should decide the buffer size based on the size indicator. But of course this is only my assumption.
3. what are db statistics ? SQL server keeps statistics about data to decide the best execution plan. For example, what is the distribution of values in an index? should the search make index seek or scan? check: SQL Server UPDATE STATISTICS (http://www.sql-server-performance.com/tips/update_statistics_p1.aspx)
4. Sub query should be avoided....means should I use join oftenly ?
What I meant actually is correlated subquery (http://msdn.microsoft.com/en-us/library/ms187638.aspx). Where the query is executed per each row. This should be avoided if possible.