Apgar score for software?

**sammy8** · July 27th, 2007, 10:11 PM

In the 1950s, the Apgar test for newborn infants was introduced. It allowed doctors to evaluate the health of a newborn baby in seconds using only a few simple criteria, like heart rate, the color of the baby's skin, etc.

Can we do the same thing for software? Are there simple criteria that a person could quickly use to evaluate the technical quality of a software project? I am looking for suggestions for these technical criteria.

Let me set some ground rules.

1) If possible, they should be binary, either yes or no. If not, they should be as easy to determine as possible. We are looking for a quick estimate of quality, not a deep evaluation.

2) They should be purely technical. We don't want to know if the software is on schedule. We don't want to know if it is following a process. We don't want to know if they are using configuration management. We want to look solely at the source code, and determine its technical quality, regardless of how it was developed.

3) I will mostly apply these to C and C++, but if possible they should be applicable to any language, Python, Ruby, Matlab, Java, etc.

Here's some examples I am considering to get the ball rolling:

1) Does the software successfully compile with -Wall -Werror?
2) What is the ratio of comments to code?
3) Is there error handling?
4) Is there error logging?
5) Is there a coding standard/meaningful variable names?
6) Are there automated tests written for it?
7) Are any functions over 500 lines?
8) Is there any dead (uncallable) code?
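
As a rough illustration of how such a checklist could be tallied, here is a minimal sketch. The `Check` struct and `apgarScore` function are hypothetical names, not from any existing tool, and the sketch assumes every question has been rephrased so that "yes" is the healthy answer (for example, "No function over 500 lines?").

```cpp
#include <string>
#include <vector>

// Hypothetical checklist entry (illustrative only).
struct Check {
    std::string question;  // phrased so that "yes" is the healthy answer
    bool passed;           // the reviewer's yes/no answer
};

// One point per "yes", in the spirit of the Apgar score.
int apgarScore(const std::vector<Check>& checks) {
    int score = 0;
    for (const Check& check : checks) {
        if (check.passed) ++score;
    }
    return score;
}
```

A non-binary criterion such as the comment ratio would first have to be turned into a yes/no answer by choosing a threshold, and that threshold is a judgment call this sketch does not make.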

I am interested in opinions, comments and, of course, question suggestions. I will mention that this idea is certainly not originally my own; I came across it in Daniel Read's blog. We are thinking of actually implementing this at work and I was looking to this community for suggestions.

**sammy8** · July 29th, 2007, 11:34 PM

Ah! Not one response? I thought this community would have a lot of opinions and input on this topic. Maybe I chose the wrong forum? I picked C++ because that is the primary language being evaluated, but maybe this forum is focused on direct coding issues?

Could a moderator move this to General Developer Topics or another forum you think would be more appropriate?

Thanks,
Sammy

**DHillard** · July 30th, 2007, 07:27 AM

But in the end, what would it (the score) tell you?

Can't you imagine a scenario where you could get a perfect score, yet the software still be a big piece of crap?

**TheCPUWizard** · July 30th, 2007, 07:33 AM

Over the years there have been many metrics developed. I have personally used quite a few of them in the 30+ years I have been developing software.

Alas, due to the nature of software, it is very hard to get "absolute" meaningful numbers. Consider an old metric of measuring the ratio of comments to code (very typical for Assembly in the 1970s). There was no way, however, to tell if the text in the comments was at all meaningful or correct.

The best metrics I use today (which require source access) are that I have (nearly) 100% code coverage in my unit tests, no compiler warnings, and no warnings from an analysis tool like Visual Studio's "Code Analysis".

**sammy8** · July 31st, 2007, 10:24 AM

Can't you imagine a scenario where you could get a perfect score, yet the software still be a big piece of crap?

Well, I am thinking of it like the Apgar score for babies - if the score is low, there is something wrong with the code (or baby), guaranteed. If the score is high, there still might be something less obvious wrong with the code, but at least you know some things are ok.

I'll also say that this quick scoring technique is meant to be used on code written in 'good faith'. Of course a programmer could always write code specifically to get a good Apgar score and still be crap, but that's not what this is meant to detect. It is meant to examine code that was written without a thought to how it would be evaluated.

**PredicateNormative** · July 31st, 2007, 12:32 PM

Tools like Lint are available to tell you if they think there is a possibility that things are wrong with the code... Lint (and no doubt other tools) is not always correct though. If you want some figure on code quality you could count the number of Lint errors.

However, as DHillard has implied and TheCPUWizard has pointed out, this doesn't mean the code does what it is intended to do. Neither does it mean it's easy to follow, well designed, well implemented or scalable. Basically, it could pass all the tests you could ever want it to, but that doesn't mean it's good code, worth reusing, or more to the point, worth using at all.

I have seen code written in my company that passes all unit tests and doesn't give any compiler warnings or lint errors, but is not exception safe and still leaks a ton of memory.
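
To make that last point concrete, here is a hedged sketch of how code can pass a happy-path test cleanly and still leak. All names (`validate`, `sumLeaky`, `sumSafe`) are hypothetical; `validate` simply stands in for any call that might throw.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical helper that may throw; stands in for any call that can fail.
void validate(int value) {
    if (value < 0) throw std::invalid_argument("negative value");
}

// Not exception safe: if validate() throws, delete[] is never reached
// and the buffer leaks. A test that only passes valid input never sees this.
int sumLeaky(int value) {
    int* buffer = new int[4]{value, value, value, value};
    validate(value);
    int total = buffer[0] + buffer[1] + buffer[2] + buffer[3];
    delete[] buffer;
    return total;
}

// Exception safe: std::vector frees its storage during stack unwinding.
int sumSafe(int value) {
    std::vector<int> buffer(4, value);
    validate(value);
    return buffer[0] + buffer[1] + buffer[2] + buffer[3];
}
```

Both functions return the same result for valid input, so a unit test that never feeds in a negative value cannot tell them apart.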

**TheCPUWizard** · July 31st, 2007, 01:57 PM

Originally Posted by PredicateNormative

I have seen code written in my company that passes all unit tests and doesn't give any compiler warnings or lint errors, but is not exception safe and still leaks a ton of memory.

Same here.

But I do have to say that, statistically speaking, the odds are better than with code that fails some/all unit tests, generates tons of warnings, etc.

**PredicateNormative** · August 1st, 2007, 04:48 AM

When I get handed or find poorly written code (i.e lots of unnecessary pointers, more unnecessary pointers, pointers to pointers, pointers for the sake of pointers, pointer returns to memory declared and dynamically allocated within some function in one dll and passed out to a function in another dll that may or may not call delete on the memory when it is finished....); I sometimes wonder what to do:

1) Spend hours trying to figure out how the code is meant to work, and also check that every 'new' has an appropriate 'delete' (and every malloc has an appropriate free) by checking about 30 files over which the monstrosity is written - only to realise that someone could very easily break the code even if it isn't broken already.

2) Spend hours rewriting the code in a safer, more understandable, robust and object orientated way.

**TheCPUWizard** · August 1st, 2007, 06:44 AM

Originally Posted by PredicateNormative

When I get handed or find poorly written code (i.e lots of unnecessary pointers, more unnecessary pointers, pointers to pointers, pointers for the sake of pointers, pointer returns to memory declared and dynamically allocated within some function in one dll and passed out to a function in another dll that may or may not call delete on the memory when it is finished....);

That's the whole point (sorry, could not resist).

Originally Posted by PredicateNormative

I sometimes wonder what to do:

1) Spend hours trying to figure out how the code is meant to work, and also check that every 'new' has an appropriate 'delete' (and every malloc has an appropriate free) by checking about 30 files over which the monstrosity is written - only to realise that someone could very easily break the code even if it isn't broken already.

2) Spend hours rewriting the code in a safer, more understandable, robust and object orientated way.

My approach (80%+ of the time): develop tests to validate the existing software without making any modifications to the software. This is a NO-risk way to make sure you understand what the software does without necessarily needing to figure out all of the details of how.

Once you have a solid set of tests, then architect, design and implement a new, well-thought-out code base. Make sure this code base runs 100% against the existing tests.

When you have a set of tests that runs 100% against both the old and new implementations, try running "system level" tests. If the application passes a QA cycle, then you are good to go.

Only after this happens do you think about making changes to things like APIs or other functional issues.

This approach has worked extremely well for me, and actually accounts for about 40% of my company's gross revenue from consulting projects.
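
The test-first rewrite described above can be sketched roughly as follows. Everything here is an assumption made for illustration: `legacyCountWords` stands in for the untouched existing code, `newCountWords` for the rewrite, and `recordedCases` for expectations captured by running the legacy code.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the existing code: pointer-walking, left unmodified.
int legacyCountWords(const std::string& text) {
    int count = 0;
    bool inWord = false;
    for (const char* p = text.c_str(); *p != '\0'; ++p) {
        if (*p == ' ') {
            inWord = false;
        } else if (!inWord) {
            inWord = true;
            ++count;
        }
    }
    return count;
}

// Stand-in for the rewrite: splits on spaces and skips empty tokens.
int newCountWords(const std::string& text) {
    std::istringstream in(text);
    std::string token;
    int count = 0;
    while (std::getline(in, token, ' ')) {
        if (!token.empty()) ++count;
    }
    return count;
}

struct Case {
    std::string input;
    int expected;  // recorded from the legacy code's observed behaviour
};

std::vector<Case> recordedCases() {
    return {{"", 0}, {"one", 1}, {"  two  words ", 2}, {"a b c", 3}};
}

// The same recorded cases are run against either implementation.
bool passesAll(const std::function<int(const std::string&)>& impl) {
    for (const Case& c : recordedCases()) {
        if (impl(c.input) != c.expected) return false;
    }
    return true;
}
```

The rewrite is only considered a candidate once `passesAll` holds for both functions; how many cases are enough to pin the behaviour down is left open here.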

**sammy8** · August 1st, 2007, 02:54 PM

Sure, one of my examples is "Are there automated unit tests?" which is a quick basic check to see if the code does what it should. Could the unit tests be bad and not test the right things? Of course, but again, you are kind of missing the point of having a quick Apgar score. It is the exception and not the rule that unit tests someone took time to write are useless.

this doesn't mean the code does what it is intended to do. Neither does it mean it's easy to follow, well designed, well implemented or scalable.

Now we're getting somewhere! So how can you tell quickly if code is scalable? Is there a binary check that is right most of the time?

My suggestion of measuring comments/total SLOC is meant to determine if the code is easy to follow. Is there a way to tell if code is well-designed? Well-implemented?
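
For what it's worth, a comments/total SLOC figure could be computed along these lines. This is a deliberately naive sketch: `LineCounts`, `countLines` and `commentRatio` are hypothetical names, only whole-line `//` comments are recognized, and block comments, trailing comments and string literals are ignored.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

struct LineCounts {
    int comment = 0;  // non-blank lines whose first non-space text is "//"
    int code = 0;     // every other non-blank line
};

// Classify each non-blank line of the given source text.
LineCounts countLines(const std::string& source) {
    LineCounts counts;
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;  // blank line
        if (line.compare(first, 2, "//") == 0) {
            ++counts.comment;
        } else {
            ++counts.code;
        }
    }
    return counts;
}

// Fraction of non-blank lines that are comment lines (0.0 for empty input).
double commentRatio(const LineCounts& counts) {
    const int total = counts.comment + counts.code;
    return total == 0 ? 0.0 : static_cast<double>(counts.comment) / total;
}
```

Note that the number says nothing about whether the comments are accurate or useful; it only measures how many there are.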

<edit:>

Let me expand a little on the list of criteria some people have suggested:

Works: successful compile, successful automated unit tests
Well-implemented: functions less than 300 SLOC? coding standard? Passes LINT?
Easy to follow: comments/total SLOC
Well-designed: no dead code? Error handling?
Scalable: ????

Any other criteria?

**TheCPUWizard** · August 1st, 2007, 03:03 PM

Originally Posted by sammy8

Sure, one of my examples is "Are there automated unit tests?" which is a quick basic check to see if the code does what it should. Could the unit tests be bad and not test the right things? Of course, but again, you are kind of missing the point of having a quick Apgar score. It is the exception and not the rule that unit tests someone took time to write are useless.

Now we're getting somewhere! So how can you tell quickly if code is scalable? Is there a binary check that is right most of the time?

My suggestion of measuring comments/total SLOC is meant to determine if the code is easy to follow. Is there a way to tell if code is well-designed? Well-implemented?

To repeat myself (and remember this is just based on my 30+ years of dealing with software written by other people)...

You can NOT quickly tell if the code is scalable, but you may quickly determine that it is not.

Well written code should require few if any comments within the code. So comment ratios are useless. In fact, the majority of code that has been in use (and frequently modified) for a few years tends to have more comments that are WRONG. Since it is known that comments are NOT maintained (the big one being copy/paste, where the header comment actually contains the wrong file/class/method name).

There is no way to tell if code is well designed. Two pieces of code which are intended to perform the exact same function (i.e. given a set of inputs, produce identical outputs) may very well be implemented in completely different fashions, and if they were swapped, both would fail to meet the usage requirements.

EVERY "metric" based tool I have used has simply been able to point out what is utter crap. Items which give "good" scores, in reality, range from crap to excellent. A quick view by a knowledgeable developer is much more accurate.

ps: I have found that about 80% of unit tests that people have written are useless. They fail to handle simple things like out of bound parameters, simulation of exceptions within lower levels, etc. When I used to write validation tests for military systems, the average was 5-10 hours of test development per hour of production software development. Very rarely is this level of testing done on Business type systems.
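
As an illustration of the kind of cases that tend to be missed, here is a small sketch. The functions `elementAt` and `readOrDefault` and the helper `throwsOutOfRange` are hypothetical, made up for this example; the point is only that the tests exercise out-of-bounds parameters and a simulated failure in a lower level, not just the happy path.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical function under test: bounds-checked element access.
int elementAt(const std::vector<int>& data, int index) {
    if (index < 0 || static_cast<std::size_t>(index) >= data.size()) {
        throw std::out_of_range("index out of range");
    }
    return data[static_cast<std::size_t>(index)];
}

// Hypothetical higher-level function: the lower-level reader is injected
// so that a test can make it throw on purpose.
int readOrDefault(const std::function<int()>& reader, int fallback) {
    try {
        return reader();
    } catch (const std::runtime_error&) {
        return fallback;
    }
}

// Test helper: returns true if calling f() throws std::out_of_range.
template <typename F>
bool throwsOutOfRange(F&& f) {
    try {
        f();
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

A test suite built on this would check both ends of the valid range, one index below and one above it, and a reader that throws `std::runtime_error`.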

**sammy8** · August 1st, 2007, 09:18 PM

CPU said (emphasis mine):

You can NOT quickly tell if the code is scalable, but you may quickly determine that it is not.

How? This would be a great metric for the Apgar score.

As to the rest of your comments, I am starting to see the disconnect here. I should have mentioned that I work on military and space applications, so the level of rigor is quite high, and the process is well-defined and carefully followed. You can assume all development is performed at CMMI level 3 or higher. Useful peer reviews, requirements traceability and accurate, maintained documentation are a given. I will not be evaluating some kid's homework assignment with this Apgar score. In that case I would agree with your position. But for the software I will be evaluating, the problems you mention largely don't exist.

For example,

Well written code should require few if any comments within the code. So comment ratios are useless. In fact, the majority of code that has been in use (and frequently modified) for a few years tends to have more comments that are WRONG. Since it is known that comments are NOT maintained

We will have to agree to disagree here. Comments are incredibly useful and important. Of course they aren't useful if they aren't maintained, but that hardly ever happens on projects I review. Could it happen? Sure, maybe one time out of 20 - but a quick metric of comment-to-code ratio that's right 95% of the time is just what I am looking for.

I have found that about 80% of unit tests that people have written are useless. They fail to handle simple things like out of bound parameters, simulation of exceptions within lower levels, etc.

That can happen if requirements are not traced to the tests and if the tests aren't peer reviewed, but again, I am assuming a basic software process is in place and being followed. If unit tests are written for my projects, you can assume those two steps have occurred, which means the chance of them being worthless has probably dropped to less than 1%.

Does knowing the level of software I review change your mind as to the usefulness of a quick scoring technique?

**TheCPUWizard** · August 1st, 2007, 09:33 PM

Originally Posted by sammy8

Does knowing the level of software I review change your mind as to the usefulness of a quick scoring technique?

Yes it does, quite considerably (I did almost exclusively mil spec software from 1977-1992).

Still, it is a non-trivial task, and I would not be confident that it can be achieved (translation: if someone offered me a very lucrative contract to write such a piece of software, I would not take the contract).

Regarding comments, it really depends on what they are (and this is very difficult to parse). The following are completely useless (and likely to end up wrong at some point):

Code:

int i; // declare integer variable
i = 3; // initialize i to 3
i += j; // increment j by the current value of j.

The documentation methodology used by tools such as Doxygen, nDoc, jDoc, and SandCastle is very useful (XML formatted comments at the start of each element that can be parsed and processed). Still, I find (at least in commercial/industrial software) many cases where the information is not kept up to date.

The manual review processes are really your best bet. Utilizing a good static analysis tool that can be "tuned" to your exact requirements will take some of the work out of the process, but the final reviews will still be done by a human.
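
For contrast with the useless comments above, a structured header comment might look something like this sketch. It uses Doxygen's `@brief`, `@param` and `@return` tags rather than XML, and the function `clampToRange` is a hypothetical example, not taken from any code discussed in this thread.

```cpp
/**
 * @brief Clamp a value to a closed range.
 *
 * @param value The value to clamp.
 * @param low   Lower bound, inclusive.
 * @param high  Upper bound, inclusive; assumed to be >= low.
 * @return value if it lies within [low, high], otherwise the nearest bound.
 */
int clampToRange(int value, int low, int high) {
    if (value < low) return low;
    if (value > high) return high;
    return value;
}
```

The comment documents the contract (bounds are inclusive, `high` is assumed not to be below `low`), which a tool can extract, but nothing checks that it stays true when the code changes.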
