Click to See Complete Forum and Search --> : Premature Optimization: Drawing the Line
MrDoomMaster
April 3rd, 2008, 03:41 PM
I'm trying to draw a line between what is considered premature optimization and what isn't. My main concern right now is Big O Notation. If I make a design decision in a C++ application based on Big O Notation (Say I choose O(1) over O(n) complexity), does that make it a premature optimization decision? Where can the line be drawn between what is premature and what is not?
Again, the main question is: Can Big O Notation be considered premature optimization when considered at design time?
All opinions are welcome!
Lindley
April 3rd, 2008, 03:50 PM
No. Every design decision should be based around big-Oh notation. That's actually the reason big-Oh exists.
Premature optimization is anything that significantly reduces code readability for an insignificant improvement in performance.
MrDoomMaster
April 3rd, 2008, 04:22 PM
I always thought premature optimization extended beyond the realm of code readability. Concentrating on small optimization details during implementation is wrong to me, since your first goal should be *functionality*. Once you've gotten something functional and readable, you then go back and do *iterative* profiling then (and ONLY then) optimize based on those results, if needed. For example, a lot of premature optimization decisions I see are "Well, lets use object A instead of object B, because object A should be faster...". However, this decision was made due to speculation instead of through benchmarks or profiling.
People can write readable code and *still* be making premature optimization decisions, IMHO. Your overall design of the application may be affected due to premature optimization.
Lindley
April 3rd, 2008, 04:38 PM
Your overall design of the application should be affected by optimization decisions, at least to a point. No one would write something the naive, exponential-time way when a linear-time algorithm is easily available, for instance----even as a first pass.
In the end it's a trade-off between efficiency and coding difficulty. Where the proper balance lies should be decided on a case-by-case basis.
Now, little things like using assembly or C rather than C++ constructs in inner loops, that you probably don't need to worry about until you do some profiling.
Graham
April 3rd, 2008, 05:14 PM
There are a couple of articles on this subject in C++ Coding Standards by Sutter and Alexandrescu.
7stud
April 3rd, 2008, 10:44 PM
I'm trying to draw a line between what is considered premature optimization and what isn't.
I think you should always strive to write efficient code. Whether you should spend 100 hours rewriting code is another consideration. Maybe waiting and profiling your code once you are done might be the better option.
On the other hand, if you're going to be doing your Big O calculation in a separate thread while you are waiting to connect to a webisite, then it probably doesn't matter how slow the calculation is, so fretting about the details is a waste of time.
_uj
April 3rd, 2008, 11:09 PM
All opinions are welcome!
There's a complement to "don't optimize prematurely" which, half jokingly, states "don't pessimize prematurely". The first principle says you shouldn't spend lots of time improving code that doesn't influence the overall efficiency of a program. Before you put in an extensive optimizing effort make sure you're removing a performance bottleneck.
The second principle says that you shouldn't lay down unnecessarily inefficient code. While the first principle can be interpreted as that it doesn't matter at all what kind of code you produce as long as it works the second puts you on track again. You should produce code that's as efficient as possible all the time.
So what about Big O? In my view, having a choise you should pick the lowest Big O that's readily available. "Don't pessimize prematurely" tells you that. BUT on the other hand you shouldn't spend lots of time finding a lower Big O if one isn't readily available. "Don't optimize prematurely" tells you that.
STLDude
April 3rd, 2008, 11:13 PM
And avoid n squared algorithms like a plague, specially in games.
MrDoomMaster
April 4th, 2008, 09:45 AM
And avoid n squared algorithms like a plague, specially in games.
You mean like hash tables? :)
Lindley
April 4th, 2008, 09:56 AM
....hash tables shouldn't be O(n^2). If they are you're doing something wrong.
Well, not O(n^2) speed-wise. Conceivably they could be storage-wise.
STLDude
April 4th, 2008, 10:16 AM
Not hash table
for (int i = 0; i < 10; i++)
{
for (int y = 0; y < 10; y++)
{
// do something here
}
}
This is n squared algorithms.
There are exemptions to it, like walking over pixels in your buffer, where you need to touch every single one of them.
MrDoomMaster
April 4th, 2008, 10:35 AM
Not hash table
for (int i = 0; i < 10; i++)
{
for (int y = 0; y < 10; y++)
{
// do something here
}
}
This is n squared algorithms.
There are exemptions to it, like walking over pixels in your buffer, where you need to touch every single one of them.
Yes, a hash table! But not in regards to accessing elements. This is in regards to construction. Specifically I'm talking about boost::unordered_map. Go here (http://unordered.nfshost.com/doc/html/unordered/comparison.html) and look at table 1.5. Notice the 2nd row in the table. "Worst case is O(n^2)".
Lindley
April 4th, 2008, 10:48 AM
Oh sure, hash tables do have worst case O(n^2) now that I think about it. But the probability of that case happening with a decent hash function is vanishingly small, so it's not something you'd typically think about when describing them.
With high probability, hash table insertions and lookups are constant time; so the cost of inserting or looking up n objects will be O(n) WHP.
In order for the n^2 case to happen, a lot of the inputs must all have the same hash value. So long as periodic table expansions (and associated re-hashing) are allowed, this isn't usually a problem.
STLDude
April 4th, 2008, 11:20 AM
Lindley summed up very nicely here. When you deciding on containers, you also have to look at your data set, how are you going to use it, allot inserts, removal, iteration, how big, which key to use for lookup, etc. All those issues should be taken into account when deciding which container to use. If I have 100K objects and I'm pretty sure key collision is not an issue, then hash might be appropriate container, etc.
There are may factors to go into decision to pick container. Look up this (http://www.linuxsoftware.co.nz/containerchoice.png), it will show you nice flow chart.
One more think about your original question, deciding which container to use it's not a premature optimization it is solid software engineering.
codeguru.com
Copyright Internet.com Inc., All Rights Reserved.