
July 31st, 2013, 01:32 AM
#1
double vs int64
You'd think this would be a simple thing to find out from the internet, but I must admit I've struggled to find the answer.
1) What is the range of numbers covered by a 64-bit double?
2) Ignoring fractions, is the above range wider or narrower than the range covered by an int64?
"A problem well stated is a problem half solved." - Charles F. Kettering

July 31st, 2013, 01:57 AM
#2
Re: double vs int64
Best regards,
Igor

July 31st, 2013, 02:28 AM
#3
Re: double vs int64
All advice is offered in good faith only. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/
C, C++ Compiler: Microsoft VS2017.2

July 31st, 2013, 03:47 AM
#4
Re: double vs int64
Thanks guys. If I'm using my rusty old calculator correctly, it looks like double has got a MUCH wider range than int64, and even the humble float is almost comparable to int64. float seems to be roughly +/- 1.1 x 10^17, int64 approx. +/- 9.25 x 10^18 and int32 +/- 2.1 x 10^9.
Last edited by John E; July 31st, 2013 at 04:01 AM.
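For reference, the limits can also be read straight from the standard library rather than worked out by hand. This is only a minimal sketch, assuming an IEEE 754 platform; the constant and function names are made up for illustration. Note that it reports the largest finite float as about 3.4 x 10^38, so float also reaches well beyond int64 in range (though not in precision).

```cpp
#include <cstdint>
#include <limits>

// Largest finite values of each type, taken from <limits>.
// Names are hypothetical, chosen for this example only.
constexpr double       max_double = std::numeric_limits<double>::max();       // about 1.8 x 10^308
constexpr float        max_float  = std::numeric_limits<float>::max();        // about 3.4 x 10^38
constexpr std::int64_t max_int64  = std::numeric_limits<std::int64_t>::max(); // 9223372036854775807

// True when double reaches further than int64 (range only, not precision).
bool double_range_exceeds_int64()
{
    return max_double > static_cast<double>(max_int64);
}

// True when even float reaches further than int64.
bool float_range_exceeds_int64()
{
    return max_float > static_cast<float>(max_int64);
}
```

Both functions return true on typical desktop compilers, which answers the range question but says nothing about which integers in that range are stored exactly.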

July 31st, 2013, 04:40 AM
#5
Re: double vs int64
Originally Posted by John E
Thanks guys. If I'm using my rusty old calculator correctly, it looks like double has got a MUCH wider range than int64, and even the humble float is almost comparable to int64. float seems to be roughly +/- 1.1 x 10^17, int64 approx. +/- 9.25 x 10^18 and int32 +/- 2.1 x 10^9.
But you realize that there is a big difference between using integral and floating point values, correct? That difference being accuracy.
Floating point variables are not exact (unless the value is a sum of powers of 2, including inverse powers, that fits within the mantissa). An int64 is always exact, since it is an integer. Calculations that require exact math cannot be done reliably using floats and doubles. So the reasons for choosing float/double versus int64 go well beyond range.
For example for money calculations, it is advantageous to use integers representing the smallest unit of currency (example, for USA it would be cents instead of dollars). Then the int64 can be used to represent purely cents instead of a dollar.cents.
Regards,
Paul McKenzie
Last edited by Paul McKenzie; July 31st, 2013 at 04:48 AM.
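To make the money example concrete, here is a small sketch comparing the two representations. The helper names are hypothetical, and the floating-point behaviour described assumes ordinary IEEE 754 double arithmetic.

```cpp
#include <cstdint>

// Hypothetical helper: add the same amount `count` times, held as whole cents.
std::int64_t total_cents(std::int64_t cents_each, int count)
{
    std::int64_t total = 0;
    for (int i = 0; i < count; ++i)
        total += cents_each;   // integer addition is exact (barring overflow)
    return total;
}

// The same sum held as fractional dollars in a double.
double total_dollars(double dollars_each, int count)
{
    double total = 0.0;
    for (int i = 0; i < count; ++i)
        total += dollars_each; // each addition may round to the nearest double
    return total;
}
```

Ten payments of 10 cents give exactly 100 from total_cents(10, 10), whereas total_dollars(0.10, 10) typically does not compare equal to 1.0, because 0.10 has no exact binary representation. An amount such as 0.25 is an inverse power of 2, so total_dollars(0.25, 4) does come out as exactly 1.0.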

July 31st, 2013, 07:03 AM
#6
Re: double vs int64
Originally Posted by John E
Thanks guys. If I'm using my rusty old calculator correctly, it looks like double has got a MUCH wider range than int64, and even the humble float is almost comparable to int64. float seems to be roughly +/- 1.1 x 10^17, int64 approx. +/- 9.25 x 10^18 and int32 +/- 2.1 x 10^9.
Yes, it has a wider range, but that range isn't continuous: beyond a certain size, not every integer can be stored. Try storing 144.115.188.075.855.873 in a double, then reading it back out.
Also note that most calculators don't work with a "double", but with a floating point type that is larger than a double. So even your rusty old calculator probably exceeds the capabilities of a double.

July 31st, 2013, 10:00 AM
#7
Re: double vs int64
Originally Posted by OReubens
try storing 144.115.188.075.855.873 in a double, then reading it back out.
Presumably I was supposed to remove all the periods?
Interestingly, the compiler told me the number would get truncated from int64 to double. But according to the debugger it looked like the right number.

July 31st, 2013, 11:24 AM
#8
Re: double vs int64
Originally Posted by John E
Presumably I was supposed to remove all the periods?
The dot is the thousands separator in Belgium (where I presume OReubens is posting from).
Regards,
Paul McKenzie

July 31st, 2013, 05:24 AM
#9
Re: double vs int64
Hi Paul,
Yes, I understand about the inherent inaccuracies with float and double. Here's the problem I'm considering:
Code:
void some_func(int64_t a, int64_t b)
{
    printf ("%u\n", abs( a - b ));
}
I'm working on a program (originally written for Linux) which consistently sends 64-bit values to abs(). That's just a simple example above. The actual functions are usually more convoluted. The problem is that VC++ doesn't seem to have a version of abs() that accepts int64_t. The only versions available accept float, double, int or long. I'm trying to figure out which type I should use so that I don't lose accuracy (or at least, I lose as little accuracy as possible).

July 31st, 2013, 06:54 AM
#10
Re: double vs int64
Originally Posted by John E
The problem is that VC++ doesn't seem to have a version of abs() that accepts int64_t.
Make your own...
Code:
int64_t abs(int64_t val)
{
    if (val < 0)
        return -val;
    else
        return val;
}
Depending on need, you may have to do something special in case val is -2^{63}, because its magnitude can't be represented as a positive int64_t. A potential solution is returning a uint64_t, but that may not fit your problem domain.
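The edge case mentioned above can be handled by doing the negation in unsigned arithmetic. The following is a sketch only; the name abs_u64 is invented here and is not part of any library.

```cpp
#include <cstdint>

// Hypothetical helper: magnitude of a signed 64-bit value as an unsigned one.
// Negating INT64_MIN as a signed value overflows, so the value is converted
// to unsigned first; unsigned arithmetic wraps modulo 2^64 and is well defined.
std::uint64_t abs_u64(std::int64_t val)
{
    const std::uint64_t u = static_cast<std::uint64_t>(val);
    // "0 - u" rather than "-u" avoids the MSVC warning about applying
    // unary minus to an unsigned type.
    return (val < 0) ? (0 - u) : u;
}
```

With this version the most negative int64_t maps to 9223372036854775808, which fits in a uint64_t. Whether an unsigned result suits the surrounding code is a separate question, as noted above.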

July 31st, 2013, 09:47 AM
#11
Re: double vs int64
Originally Posted by OReubens
Make your own...
Code:
int64_t abs(int64_t val)
{
    if (val < 0)
        return -val;
    else
        return val;
}
Good suggestion, Thanks.
I also realised that for 64-bit values on Linux, they should really be calling llabs(), rather than abs(). A convenience macro can then be used to map llabs() to _abs64(), which is the Windows equivalent.
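A sketch of what such a mapping might look like is shown here. The macro name ABS64 and the function distance64 are invented for this example; _abs64 is the Microsoft CRT function and llabs is the standard one. Compilers with C++11 library support also provide std::llabs and a long long overload of std::abs, so on those the macro may be unnecessary.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical portability macro: pick the 64-bit absolute-value function
// that the toolchain provides.
#if defined(_MSC_VER)
  #define ABS64(x) _abs64(x)        // Microsoft CRT, declared in <stdlib.h>
#else
  #define ABS64(x) std::llabs(x)    // standard C99 / C++11
#endif

// Example use: absolute difference of two 64-bit values.
// Assumes a - b itself does not overflow.
std::int64_t distance64(std::int64_t a, std::int64_t b)
{
    return ABS64(a - b);
}
```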

July 31st, 2013, 06:50 AM
#12
Re: double vs int64
An int64 has an effective accurate range from -2^{63} all the way to +2^{63} - 1.
A double has a 53-bit mantissa (with an implied leading 1) and it has a separate sign bit, so it has an effective accurate integer range from -2^{54} all the way to +2^{54}.
Or to put it another way, a double can accurately represent any value an int55 (assuming such a thing existed) can.
Now, a double can store larger values (and it can store fractions), but those are not guaranteed to be accurate integer values, nor are they in a continuous range. Or put another way, values outside the "int55" range will generally be approximations.

August 1st, 2013, 07:21 AM
#13
Re: double vs int64
Yes, sorry about that. Thousands separator.
Also, a correction. I initially looked up the value of DBL_MANT_DIG to post the above, and DBL_MANT_DIG is defined as 53.
I knew a double has an implied 1 in front, so I added this on, but DBL_MANT_DIG apparently already has it built in as well. (Doh!)
So change my above to:
A double has a 52-bit stored mantissa (plus an implied leading 1) and it has a separate sign bit, so it has an effective accurate integer range from -2^{53} all the way to +2^{53}.
Or to put it another way, a double can accurately represent any value an int54 (assuming such a thing existed) can.
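The 2^{53} limit can be derived from the mantissa width and checked directly. This is a rough sketch assuming an IEEE 754 double; kDigits, kMaxExact and is_exact_in_double are names made up for the example.

```cpp
#include <cstdint>
#include <limits>

// Mantissa width including the implied leading 1 (same value as DBL_MANT_DIG).
constexpr int kDigits = std::numeric_limits<double>::digits;   // 53 for IEEE 754 doubles

// Every integer with magnitude up to 2^kDigits is representable exactly.
constexpr std::int64_t kMaxExact = std::int64_t{1} << kDigits; // 2^53 = 9007199254740992

// Hypothetical check: does n survive a trip through a double unchanged?
// Only meaningful for |n| well below 2^63, since converting an out-of-range
// double back to int64_t is undefined behaviour.
bool is_exact_in_double(std::int64_t n)
{
    volatile double d = static_cast<double>(n); // volatile forces a real double store
    return static_cast<std::int64_t>(d) == n;
}
```

Here is_exact_in_double(kMaxExact) holds while is_exact_in_double(kMaxExact + 1) does not, because above 2^{53} the spacing between adjacent doubles becomes 2.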
Interestingly, the compiler told me the number would get truncated from int64 to double. But according to the debugger it looked like the right number
Well yes, storing it in a double right away as in
double x = 144115188075855873;
I would have expected the compiler to output a warning (which in and of itself should already have been a clue).
What you were getting is probably the compiler seeing that it is a constant and displaying the full constant value without stuffing it into an actual double.
Any sort of "simple" code is probably going to need some form of "don't optimize this" to actually prove the point I was trying to make.
Code:
double x = 144115188075855873;
__int64 i = (__int64)x;
Running this in a debug build or with all optimisations off results in i being equal to 144115188075855872 on VC2010 (and I would expect the same result on any compiler, given how truncating/rounding should work).
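One possible way to get a "don't optimize this" effect in an optimised build is to route the value through a volatile double. This is just a sketch under that assumption, with an invented function name; it is not the only way to defeat constant folding.

```cpp
#include <cstdint>

// Hypothetical helper: store an integer in a double and read it back.
// The volatile qualifier makes the compiler perform the store and the load
// instead of carrying the original 64-bit constant straight through.
std::int64_t round_trip_through_double(std::int64_t value)
{
    volatile double d = static_cast<double>(value);
    return static_cast<std::int64_t>(d);
}
```

Passing 144115188075855873 (that is, 2^{57} + 1) returns 144115188075855872 on an IEEE 754 platform, since adjacent doubles near 2^{57} are 32 apart and the nearest one is 2^{57} itself.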
Last edited by OReubens; August 1st, 2013 at 07:23 AM.

August 1st, 2013, 08:20 AM
#14
Re: double vs int64
Thanks again for that full explanation OReubens. I tried that assignment, like you suggested (double to int64_t) and you were absolutely right. The int64_t was 1 less than the original number.
Actually I think there's something else I haven't fully understood in all this (the meaning of the letter 'E'). Looking at that web page that Igor linked to, I noticed that type float can hold a maximum (positive) number of 3.4E38. I originally thought that 'E' meant 'e', the base of the natural logarithm (i.e. 2.7182818). So I calculated 3.4E38 to mean:
3.4 x e^38, or in other words, ((2.7182818^38) * 3.4)
But my debugger suggests that my assumption was completely wrong! It gives the impression that 3.4E38 actually means (3.4 * 10^38)
How confusing...

August 1st, 2013, 08:42 AM
#15
Re: double vs int64
E stands for exponent and is used for scientific notation. Your impression is right: 3.4E38 means 3.4 * 10^38. See
https://en.wikipedia.org/wiki/Scientific_notation
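The two readings of 3.4E38 can be compared numerically. This is only an illustrative sketch with made-up names; it also shows where the earlier figure of roughly 1.1 x 10^17 for float came from.

```cpp
#include <cmath>

// The literal as the compiler reads it: 3.4 times ten to the power 38.
constexpr double written_with_E = 3.4E38;

// Reading 'E' as a power of ten (the correct interpretation).
double decimal_reading()
{
    return 3.4 * std::pow(10.0, 38.0);
}

// Reading 'E' as Euler's number e = 2.7182818... (the mistaken interpretation).
double euler_reading()
{
    return 3.4 * std::exp(38.0);   // roughly 1.08 x 10^17
}
```

decimal_reading() agrees with the literal to within rounding, while euler_reading() is smaller by about 21 orders of magnitude.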
