mop
March 24th, 2003, 09:36 AM
The smallest positive normalized value and the largest finite value for IEEE 754 single precision are 1.1754E-38 and 3.4028E+38 respectively.
This leads one to believe that as long as I stay within the constraints specified, the program should, for the most part, produce the correct result.
In the program below, (2.0e20 + 1) - 2.0e20 does not produce the correct result (i.e. 1.00000), whereas (2.0e6 + 1) - 2.0e6 does. I did some reading and realized that in order to add two floats the exponents must be the same, and that the values go through a normalization process such that if the difference between the exponents is greater than the number of digits of precision, the smaller number will have dropped to 0 by the time the exponents are the same. The question then becomes: how do I avoid situations in my program where the difference between the exponents is greater than the number of digits of precision?
I'm using a 32-bit fixed-point processor that has libraries for doing floating-point math. For benchmarking purposes, I suspect the largest floating-point values I could multiply and still get the correct result are 3.4028E+38 * 3.4028E+38?
Thanks for the assistance
#include "stdafx.h"
#include <stdio.h>
#include <math.h>

int main(int argc, char* argv[])
{
    float a, b, rel_diff;

    // b = 2.0e7 + 1;       -- doesn't work
    // a = b - 2.0e7;
    // b = 2.0e8 + 1.0;     -- doesn't work
    // a = b - 2.0e8;
    b = 2.0e20 + 1;         // doesn't work
    a = b - 2.0e20;
    // b = fabs(1.0e20) + 1.0;
    // a = fabs(b) - fabs(1.0e20);
    // b = 2.0e6 + 1.0e6;   // works
    // a = b - 2.0e6;
    // rel_diff = (b - a) / (a + b);   // or MAX(a,b)
    printf("%f \n", a);
    return 0;
}