double vs int64

**John E** · July 31st, 2013, 01:32 AM

You'd think this would be a simple thing to find out from the internet, but I must admit I've struggled to find the answer.

1) What is the range of numbers covered by a 64-bit double?
2) Ignoring fractions, is the above range wider or narrower than the range covered by an int64?

**Igor Vartanov** · July 31st, 2013, 01:57 AM

Data Type Ranges

**2kaud** · July 31st, 2013, 02:28 AM

Also look at limits.h

**John E** · July 31st, 2013, 03:47 AM

Thanks guys. If I'm using my rusty old calculator correctly it looks like double has got a MUCH wider range than int64 - and even the humble float is almost comparable to int64. float seems to be roughly +/-1.1 x 10^17. int64 approx. +/-9.25 x 10^18. int32 is +/-2.1 x 10^9.
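For anyone who would rather not rely on a calculator, the limits can be read straight from the compiler. The following is only a minimal sketch, assuming a C++11 compiler and IEEE 754 float/double; `range_covers_int64` and `check_ranges` are hypothetical helper names. It compares ranges only and says nothing about precision.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True when every int64_t value lies within the finite range of Float.
// This is about range only: most large integers are still not exactly
// representable in a float or a double.
template <typename Float>
bool range_covers_int64() {
    typedef std::numeric_limits<std::int64_t> I64;
    typedef std::numeric_limits<Float> F;
    return F::max() >= static_cast<Float>(I64::max()) &&
           F::lowest() <= static_cast<Float>(I64::min());
}

// Sanity checks: float's largest finite value is about 3.4e38 and
// double's is about 1.8e308, both far beyond int64's roughly 9.22e18.
inline void check_ranges() {
    assert(range_covers_int64<float>());
    assert(range_covers_int64<double>());
    assert(std::numeric_limits<float>::max() > 3.4e38f);
    assert(std::numeric_limits<double>::max() > 1.79e308);
}
```

Calling `check_ranges()` from `main` should pass silently on a typical desktop compiler.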

**Paul McKenzie** · July 31st, 2013, 04:40 AM

Originally Posted by John E

Thanks guys. If I'm using my rusty old calculator correctly it looks like double has got a MUCH wider range than int64 - and even the humble float is almost comparable to int64. float seems to be roughly +/-1.1 x 10^17. int64 approx. +/-9.25 x 10^18. int32 is +/-2.1 x 10^9.

But you realize that there is a big difference between using integral and floating point values, correct? That difference being accuracy.

Floating point variables are not exact in general: a value is stored exactly only if it is a sum of powers of 2 (including inverse powers such as 1/2 or 1/4) that fits in the mantissa. An int64 is always exact, since it is an integer. Calculations that require exact math cannot be done reliably using floats and doubles. So the reasons for choosing float/double versus int64 go well beyond range.

For example, for money calculations it is advantageous to use integers representing the smallest unit of currency (for the USA, cents instead of dollars). An int64 can then hold a whole number of cents rather than a dollars.cents value.
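To make the accuracy point concrete, here is a small sketch that adds ten 10-cent amounts twice: once as double dollars and once as integer cents. The function names are hypothetical, and the double result assumes IEEE 754 arithmetic evaluated in double precision (as on typical x64 builds).

```cpp
#include <cassert>
#include <cstdint>

// Add 0.10 dollars ten times using a double. 0.1 has no exact binary
// representation, so the rounding errors accumulate.
double sum_dollars_as_double() {
    double total = 0.0;
    for (int i = 0; i < 10; ++i) {
        total += 0.1;
    }
    return total;
}

// Add 10 cents ten times using an integer count of cents. Every step is exact.
std::int64_t sum_cents_as_int64() {
    std::int64_t total = 0;
    for (int i = 0; i < 10; ++i) {
        total += 10;
    }
    return total;
}

// The double total ends up just short of 1.0, while the cent total is
// exactly 100.
inline void check_money_sums() {
    assert(sum_dollars_as_double() != 1.0);
    assert(sum_cents_as_int64() == 100);
}
```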

Regards,

Paul McKenzie

**John E** · July 31st, 2013, 05:24 AM

Hi Paul,

Yes, I understand about the inherent inaccuracies with float and double. Here's the problem I'm considering:-

Code:

void some_func(int64_t a, int64_t b)
{
      printf ("%u\n", abs( a-b ));
}

I'm working on a program (originally written for Linux) which consistently sends 64-bit values to abs(). That's just a simple example above. The actual functions are usually more convoluted. The problem is that VC++ doesn't seem to have a version of abs() that accepts int64_t. The only types available support float, double, int or long. I'm trying to figure out which type I should use so that I don't lose accuracy (or at least, I lose as little accuracy as possible).

**OReubens** · July 31st, 2013, 06:50 AM

an int64 has an effective accurate range from - 2⁶³ all the way to + 2⁶³-1

a double has an 53bit mantissa (with an implied leading 1) and it has a separate sign bit so it has an effective accurate integer range from - 2⁵⁴ all the way to + 2⁵⁴.
Or to put it another way, a double can accurately represent any value an int55 (assuming such a thing existed) can.

now, a double can store larger values (and it can store fractions), but none of those are guaranteed to be exact integer values, nor do they form a continuous range. or put another way, values outside the "int55" range will generally be approximations.

**OReubens** · July 31st, 2013, 06:54 AM

Originally Posted by John E

The problem is that VC++ doesn't seem to have a version of abs() that accepts int64_t.

make your own...

Code:

int64_t abs(int64_t val)
{
  if (val<0)
       return -val;
  else 
       return val;
}

depending on need, you may have to do something special in case val is -2⁶³, because its absolute value can't be represented in a signed int64_t. A potential solution is returning a uint64_t, but that may not fit your problem domain.
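One way the unsigned-return idea could look is sketched below. `abs_u64` is a hypothetical name, and whether an unsigned result suits the calling code is an assumption that has to be checked case by case.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Absolute value of a 64-bit signed integer, returned as unsigned so that
// INT64_MIN (-2^63) has a representable result (2^63).
std::uint64_t abs_u64(std::int64_t val) {
    // Convert to unsigned first, then subtract from zero. Unsigned
    // arithmetic wraps modulo 2^64, so this is well defined even for
    // INT64_MIN, where negating the signed value would overflow.
    const std::uint64_t u = static_cast<std::uint64_t>(val);
    return val < 0 ? (std::uint64_t{0} - u) : u;
}

inline void check_abs_u64() {
    assert(abs_u64(5) == 5u);
    assert(abs_u64(-5) == 5u);
    assert(abs_u64(std::numeric_limits<std::int64_t>::min()) ==
           (std::uint64_t{1} << 63));
}
```

Note that this only protects the abs step itself. In an expression like abs(a-b), the subtraction can still overflow before abs is ever called.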

**OReubens** · July 31st, 2013, 07:03 AM

Originally Posted by John E

Thanks guys. If I'm using my rusty old calculator correctly it looks like double has got a MUCH wider range than int64 - and even the humble float is almost comparable to int64. float seems to be roughly +/-1.1 x 10^17. int64 approx. +/-9.25 x 10^18. int32 is +/-2.1 x 10^9.

Yes, it has a wider range, but that range isn't continuous. try storing 144.115.188.075.855.873 in a double, then reading it back out.

also note that most calculators don't work with a "double", but work with a floating point type that is larger than a double. So even your rusty old calculator probably exceeds the capabilities of a double.

**John E** · July 31st, 2013, 09:47 AM

Originally Posted by OReubens

make your own...

Code:

int64_t abs(int64_t val)
{
  if (val<0)
       return -val;
  else 
       return val;
}

Good suggestion, Thanks.

I also realised that for 64-bit values on Linux, they should really be calling llabs(), rather than abs(). A convenience macro can then be used to map llabs() to _abs64(), which is the Windows equivalent.
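A convenience macro along those lines might look like the sketch below. `ABS64` and `abs_diff` are hypothetical names chosen for illustration; the only library calls assumed are `std::llabs` from `<cstdlib>` (C++11) and MSVC's `_abs64` from `<stdlib.h>`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// ABS64 is a hypothetical wrapper name. On MSVC it forwards to _abs64,
// elsewhere to std::llabs. Both take and return a 64-bit signed value.
#if defined(_MSC_VER)
  #define ABS64(x) _abs64(x)
#else
  #define ABS64(x) std::llabs(x)
#endif

// Distance between two 64-bit values. Assumes a - b does not overflow and
// that the difference is not INT64_MIN.
std::int64_t abs_diff(std::int64_t a, std::int64_t b) {
    return ABS64(a - b);
}

inline void check_abs_diff() {
    assert(abs_diff(3, 10) == 7);
    // A difference that would not fit in a 32-bit int.
    assert(abs_diff(-4000000000LL, 4000000000LL) == 8000000000LL);
}
```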

**John E** · July 31st, 2013, 10:00 AM

Originally Posted by OReubens

try storing 144.115.188.075.855.873 in a double, then reading it back out.

Presumably I was supposed to remove all the periods?

Interestingly, the compiler told me the number would get truncated from int64 to double. But according to the debugger it looked like the right number.

**Paul McKenzie** · July 31st, 2013, 11:24 AM

Originally Posted by John E

Presumably I was supposed to remove all the periods?

The dot is the thousands separator in Belgium (where I presume OReubens is posting from).

Regards,

Paul McKenzie

**OReubens** · August 1st, 2013, 07:21 AM

yes, sorry about that.

thousand separator.

Also, a correction. I initially looked up the value of DBL_MANT_DIG to post the above, and DBL_MANT_DIG is defined as 53.
I knew a double has an implied 1 in front, so I added this on, but DBL_MANT_DIG apparently already has it built in as well. (Doh!)

so change my above to:

a double has a 52-bit stored mantissa (plus an implied leading 1, giving 53 bits of precision) and it has a separate sign bit, so it has an effective accurate integer range from - 2⁵³ all the way to + 2⁵³.
Or to put it another way, a double can accurately represent any value an int54 (assuming such a thing existed) can.
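The corrected ±2⁵³ boundary can be checked with a round trip through a double. This is just a sketch, assuming an IEEE 754 double with round-to-nearest; `survives_double_round_trip` is a hypothetical helper name, and the `volatile` is there to force a real 64-bit store instead of letting the compiler fold the constant.

```cpp
#include <cassert>
#include <cstdint>

// True when n survives an int64 -> double -> int64 round trip unchanged.
// Assumes |n| is small enough that the rounded double still fits in an
// int64 (values near INT64_MAX can round up to 2^63, which does not).
bool survives_double_round_trip(std::int64_t n) {
    // volatile forces the value to be stored as an actual double.
    volatile double d = static_cast<double>(n);
    return static_cast<std::int64_t>(d) == n;
}

inline void check_exact_integer_range() {
    const std::int64_t limit = std::int64_t{1} << 53;  // 2^53
    assert(survives_double_round_trip(limit - 1));
    assert(survives_double_round_trip(limit));
    assert(survives_double_round_trip(-limit));
    // 2^53 + 1 needs 54 bits of precision and rounds back to 2^53.
    assert(!survives_double_round_trip(limit + 1));
    // The value from earlier in the thread, 2^57 + 1.
    assert(!survives_double_round_trip(144115188075855873LL));
}
```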

Originally Posted by John E

Interestingly, the compiler told me the number would get truncated from int64 to double. But according to the debugger it looked like the right number.

well yes, storing it in a double right away as in
double x = 144115188075855873;
I would have expected the compiler to output a warning (which in and of itself should already have been a clue).

what you were seeing is probably the compiler recognising the value as a constant and displaying the full constant without ever storing it in an actual double.
any sort of "simple" code is probably going to need some form of "don't optimize this" to actually prove the point I was trying to make.

Code:

	double x = 144115188075855873;
	__int64 i = (__int64)x;

Running this in a debug build or with all optimisations off results in i being equal to 144115188075855872 on VC2010 (and I would expect the same result on any compiler, given how truncating/rounding should work).

**John E** · August 1st, 2013, 08:20 AM

Thanks again for that full explanation OReubens. I tried that assignment, like you suggested (double to int64_t) and you were absolutely right. The int64_t was 1 less than the original number.

Actually I think there's something else I haven't fully understood in all this (the meaning of the letter 'E'). Looking at that web page that Igor linked to, I noticed that type float can hold a maximum (positive) number of 3.4E38. I originally thought that 'E' meant 'e', the base of the natural logarithm (i.e. 2.7182818). So I calculated 3.4E38 to mean:-

3.4 x E^38 - or in other words, ((2.7182818^38) * 3.4)

But my debugger suggests that my assumption was completely wrong! It gives the impression that 3.4E38 actually means (3.4 * 10^38)

How confusing...

**2kaud** · August 1st, 2013, 08:42 AM

E stands for exponent and is used for scientific notation. Your impression is right: 3.4E38 means 3.4 * 10^38. See
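The two readings of 3.4E38 can be compared numerically. The sketch below uses hypothetical function names and assumes only `std::pow`, `std::exp` and `std::fabs` from `<cmath>`. The power-of-e reading comes out at roughly 1.08 x 10^17, which is where the earlier float estimate in this thread came from.

```cpp
#include <cassert>
#include <cmath>

// How the compiler reads the literal: 3.4 times ten to the power 38.
double power_of_ten_reading() {
    return 3.4 * std::pow(10.0, 38);
}

// The mistaken reading: 3.4 times e (2.7182818...) to the power 38.
double power_of_e_reading() {
    return 3.4 * std::exp(38.0);
}

inline void check_e_notation() {
    const double literal = 3.4E38;
    // The literal agrees with the power-of-ten reading up to rounding.
    assert(std::fabs(literal - power_of_ten_reading()) <= literal * 1e-12);
    // The power-of-e reading is about 1.08e17, over twenty orders of
    // magnitude smaller.
    assert(power_of_e_reading() > 1.0e17);
    assert(power_of_e_reading() < 1.1e17);
}
```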

https://en.wikipedia.org/wiki/Scientific_notation
