Database design confusion
I am creating a database that's going to have millions of records. I would like to get answers for following questions:
1. Is there any dis-advantage of normalized database (like performance de-gradation because of foreign key etc)?
2. Is there any disadvantage of having varchar field of size 256 and in most of the case I will utilize only 10 bytes ?
3. What should I take care of in order to writing highly optimized queries.
4. Is there any disadvantage of having extra columns(reserved), will it increase size on disk even if there is no data?
5. should I avoid join queries ?
6. any other thing which I should take care of ?
-Thanks,
navin
Re: Database design confusion
Quote:
Originally Posted by
navinkaus
1. Is there any dis-advantage of normalized database (like performance de-gradation because of foreign key etc)?
Yes, sometimes denormalization can make huge benefits. Like avoiding to join with a huge table.
Quote:
Originally Posted by
navinkaus
2. Is there any disadvantage of having varchar field of size 256 and in most of the case I will utilize only 10 bytes ?
SQL server stores the actual occupied size with each varchar field. And it's 2 or 4 bytes. So this can be a waste. Also there is a slight performance hit. But you'll not lose the whole unused 246 bytes.
Quote:
Originally Posted by
navinkaus
3. What should I take care of in order to writing highly optimized queries.
Lots of things:
- Use proper indexes
- Avoid unwanted casts
- Select only what you need
- Make sure db statistics are updated
- avoid per-record function calls or sub-queries if possible
- consider indexed views when appropriate
Quote:
Originally Posted by
navinkaus
4. Is there any disadvantage of having extra columns(reserved), will it increase size on disk even if there is no data?
It will make a small disk utilization. a bit flag for each nullable field is stored to indicate if it's null or not. I don't prefer this design anyway regardless of space issue.
Quote:
Originally Posted by
navinkaus
5. should I avoid join queries ?
Can you avoid them totally?
Quote:
Originally Posted by
navinkaus
6. any other thing which I should take care of ?
Test on real life data sizes and consider archiving old data.
Re: Database design confusion
Thanks for your reply, further questions to your answers:
1. How to decide what to normalize what not as initially you don't know much table will grow ?
2. As I understand in case of varchar If I give size 256 or 2032 it will still take same size on disk as it keeps size in 2 or 4 bytes, only difference is during input/output operation performance might hit as server will have to keep 2032 buffer, right ?
3. what are db statistics ?
4. Sub query should be avoided....means should I use join oftenly ?
Re: Database design confusion
Quote:
Originally Posted by
navinkaus
1. How to decide what to normalize what not as initially you don't know much table will grow ?
First, design the db to the 3rd normal form as usual. then start denormalizing. If you don't know what tables will grow then this is a problem. What you can do is to monitor the performance regularly and decide to refactor later.
Quote:
Originally Posted by
navinkaus
2. As I understand in case of varchar If I give size 256 or 2032 it will still take same size on disk as it keeps size in 2 or 4 bytes, only difference is during input/output operation performance might hit as server will have to keep 2032 buffer, right ?
I don't think that the buffer allocated is always the maximum size of the field. Logically it should decide the buffer size based on the size indicator. But of course this is only my assumption.
Quote:
Originally Posted by
navinkaus
3. what are db statistics ?
SQL server keeps statistics about data to decide the best execution plan. For example, what is the distribution of values in an index? should the search make index seek or scan? check: SQL Server UPDATE STATISTICS
Quote:
Originally Posted by
navinkaus
4. Sub query should be avoided....means should I use join oftenly ?
What I meant actually is correlated subquery. Where the query is executed per each row. This should be avoided if possible.