I'm updating and maintaining some very old code. The original program suite was written for Windows 1.0, and except for a port to Win32 and the work I've done over the last year, it has seen few updates. The system does many things, but one weak area is its curve comparison function.

Basically, it compares two XY curves using a tolerance and spits out a pass/fail. The user can enter the tolerance or use an automatic tolerance. The automatic tolerance has problems: the comparison can produce false passes when the curves are obviously very different (by visual inspection), and it sometimes produces false fails on curves that look the same on the graph.

Eventually I'm going to be able to do a major rewrite of the whole system, but before I can start on that, we need to get to a stable spot with what they are selling now.

I need a better algorithm than the one we're using now. A search of the net turned up a lot of comparison logic for sorting, and some material on comparing two points, but I need to be able to test two curves against one another. The data presented to the comparison is basically two arrays of floats with X and Y data. The two arrays can differ in size, as the user can change the number of data points in a test.

This system is collecting data via I/O. The way it's done now you always have a Control sample that is known working and that is compared against a unit under test.

In the future, my customer would like to be able to test many samples, do statistical analysis on the data, and then come up with a statistical Control sample. In this scenario you'd have, say, 100 units. You know most of them are good, but some may be counterfeit, and you don't know whether a specific unit is real or counterfeit. By running a statistical analysis on all the data, we can hopefully develop a middle-of-the-bell-curve Control sample that future units will be compared against.

That's coming sometime next year. Right now the scenario is the simpler one with just two curves. One is the Control from a known good unit. The other curve is compared against it. If both curves always had the same number of data points and the same start and end points, that would be very simple. What confounds me in making an algorithm is that curves that should still pass could start at different points and have a different number of data points.

For example, one curve could start at -10 and go to +10 with 200 samples. The other curve could go from -7 to +7 with 100 samples. The comparison is essentially trying to do what the eyeball can do intuitively looking at the curves on the screen. If they cover each other close enough for the tolerance, it passes. If they don't have the same shape (within tolerance), it fails.


Are there any curve comparison algorithms out there? Can anyone recommend any books, articles, websites, or even sample code?

This is where my EE background instead of CompSci background is failing me. I'm an embedded programmer who has moved into a mix of embedded and the Windows app world. The math I took in school (20+ years ago) was more practically focused than the more theoretical stuff the CompSci and Math majors got (at my school back then the CompSci department was under the same umbrella as the Math department).

How it currently works - In case anybody cares…

The auto tolerance is set by getting the sample size (range/numSamples) and multiplying by a factor. In some cases, this results in a tolerance wider than the entire range.
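
A minimal sketch of that calculation, in Python purely for illustration. The function and parameter names are hypothetical, the factor values are made up, and it assumes "range" means Xmax minus Xmin, which the description above does not spell out.

```python
def auto_tolerance(x_min, x_max, num_samples, factor):
    """Auto tolerance = (range / numSamples) * factor.

    Assumption: 'range' is taken to be x_max - x_min.
    """
    sample_size = (x_max - x_min) / num_samples
    return sample_size * factor


# Curve from -10 to +10 with 200 samples; both factors are invented values.
tol_small = auto_tolerance(-10.0, 10.0, 200, 5)
tol_large = auto_tolerance(-10.0, 10.0, 200, 250)

# With a factor larger than the sample count, the tolerance exceeds the range.
range_width = 10.0 - (-10.0)
print(tol_small, tol_large, tol_large > range_width)
```

Under these assumptions the tolerance is wider than the whole range whenever the factor is larger than the number of samples.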

Then it narrows the compare by looking at the endpoint X values (Xmin and Xmax) of both curves and taking the narrower of the two ranges. For the above example with ranges of -10 to +10 and -7 to +7, the range for the compare would be -7 to +7.
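
A small sketch of that narrowing step, in Python for illustration only. The helper name `compare_range` is hypothetical, and it reads the description as "use the overlap of the two X ranges", which is an assumption that happens to agree with the -7 to +7 example.

```python
def compare_range(xs_a, xs_b):
    """Return (lo, hi): the X interval covered by both curves.

    Assumption: the compare range is the overlap of the two X ranges.
    """
    lo = max(min(xs_a), min(xs_b))
    hi = min(max(xs_a), max(xs_b))
    return lo, hi


# 201 points from -10 to +10 and 101 points from -7 to +7 (made-up grids).
xs_control = [-10.0 + 20.0 * i / 200 for i in range(201)]
xs_test = [-7.0 + 14.0 * i / 100 for i in range(101)]

lo, hi = compare_range(xs_control, xs_test)
print(lo, hi)
```

If the two ranges did not overlap at all, `lo` would come out greater than `hi`, and a caller would need to treat that case separately.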

Then it takes the starting point in one curve and compares the X and Y to every single point in the other curve. If any point in the other curve falls within the box created by the tolerance, that point passes and it goes on to the next point.
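
The point-by-point check can be sketched roughly as follows, in Python for illustration only. The name `curves_match` is hypothetical, and several details are assumptions not stated above: the same tolerance is used for X and Y, the box extends `tol` on each side of the point, only points inside the compare range are checked, and the whole compare fails at the first point with no match.

```python
def curves_match(curve_a, curve_b, tol, x_lo, x_hi):
    """Pass if every point of curve_a within [x_lo, x_hi] has at least one
    point of curve_b inside its tolerance box (assumed +/- tol in X and Y)."""
    for xa, ya in curve_a:
        if not (x_lo <= xa <= x_hi):
            continue  # outside the compare range
        in_box = any(
            abs(xa - xb) <= tol and abs(ya - yb) <= tol
            for xb, yb in curve_b
        )
        if not in_box:
            return False  # no point of curve_b falls in this point's box
    return True


# Two visibly different curves: y = x and y = -x (made-up data).
rising = [(float(x), float(x)) for x in range(-10, 11)]
falling = [(float(x), float(-x)) for x in range(-7, 8)]

same_curve = curves_match(rising, rising, 0.5, -7.0, 7.0)
tight_tol = curves_match(rising, falling, 0.5, -7.0, 7.0)
huge_tol = curves_match(rising, falling, 25.0, -7.0, 7.0)
print(same_curve, tight_tol, huge_tol)
```

In this sketch a tolerance of 25 on data that only spans 20 units lets the two opposite-sloped lines pass, because every box then contains some point of the other curve.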

The point by point comparison method tends to work OK when the tolerance is something reasonable, but the way auto tolerance is done, that can allow false passes through. I haven't figured out why we get false fails though. My customer is trying to get some real world data from one of his customers in which false fails are happening.

Intuitively, I don't like the compare one point to everything method. It just doesn't seem right, though it might be among the better choices if the tolerance could be set more intelligently.

Thanks,

Bill