sizeof(char)


dude_1967
October 17th, 2002, 03:15 PM
Gurus,

Is sizeof(char) == 1 specified in ANSI C/C++ or can there be compiler-dependent aspects?

Chris.



char* pc = reinterpret_cast<char*>(&mem_area);

pc += 128;




:)

Andreas Masur
October 17th, 2002, 03:40 PM
They are not specified by ANSI. The sizes of the standard data types depend on the specific machine. Most likely a 'char' will always be one byte nowadays, but it is not guaranteed...

BaroloMan
October 17th, 2002, 03:45 PM
sizeof (char) = 1. I've used DSP processors (TMS320C44 et al.) that are strict 32-bit engines (no byte selects in the hardware). For these processors, the sizeof (char) = sizeof (int) = sizeof (long) = 1, because all are implemented in 32 bits.

We got burned trying to port some algorithms from a DOS (Intel) application because the memcpy routines that were used assumed that a "long" was four times the size of a "char". We rewrote these routines to observe the ratio of

sizeof (long) / sizeof (char)
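
A minimal sketch of that kind of rewrite, not the original DSP code: the helper name copy_longs and the constant chars_per_long are made up for illustration. The only point is that the byte count comes from sizeof(long) instead of a hard-coded 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy `count` long values from src to dst.
// The size passed to memcpy is derived from sizeof(long), so the code
// still works where sizeof(long) is 1, 4, or 8.
inline void copy_longs(long* dst, const long* src, std::size_t count)
{
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, count * sizeof(long));
}

// The ratio mentioned above. Since sizeof(char) is 1, this is just
// sizeof(long): 4 or 8 on typical desktop targets, 1 on a target where
// char and long are both a single 32-bit word.
constexpr std::size_t chars_per_long = sizeof(long) / sizeof(char);
```

Calling copy_longs(dst, src, 3) copies three longs on any of those targets, without the caller knowing how many chars a long occupies.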

jfaust
October 17th, 2002, 03:53 PM
Bjarne Stroustrup, The C++ Programming Language, section 4.6:
Sizes of C++ objects are expressed in terms of multiples of the size of a char, so by definition, the size of a char is 1.


Jeff

Andreas Masur
October 17th, 2002, 04:22 PM
Originally posted by jfaust
Bjarne Stroustrup, The C++ Programming Language, section 4.6:
Sizes of C++ objects are expressed in terms of multiples of the size of a char, so by definition, the size of a char is 1.
Jeff
Well...yes and no. In the same chapter some sentences later...

"Additionally it will be guaranteed that a 'char' has at least 8 bit, a 'short' at least 16 bit..."

There also exist machines where a 'char' consists of 32 bits, as he also mentions later...

jfaust
October 17th, 2002, 04:31 PM
A char can be of different sizes, as far as bits are concerned, but in all cases, sizeof(char) will equal 1. This may be what you were saying, in which case I'm merely expanding on your point.
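
The distinction between "size in chars" and "size in bits" can be sketched with CHAR_BIT from <climits>. The helper bit_width_of is a name invented for this example, not a standard function.

```cpp
#include <climits>   // CHAR_BIT: number of bits in a char
#include <cstddef>

// Both hold on every conforming implementation, whatever the width of a char.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(CHAR_BIT >= 8, "a char has at least 8 bits");

// Hypothetical helper: storage width of a type in bits rather than in chars.
template <typename T>
constexpr std::size_t bit_width_of()
{
    return sizeof(T) * CHAR_BIT;
}
```

On a PC, bit_width_of<char>() would be 8. On a 32-bit-char DSP it would presumably be 32, while sizeof(char) stays 1 in both cases.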

Jeff

stober
October 17th, 2002, 06:39 PM
Originally posted by BaroloMan
For these processors, the sizeof (char) = sizeof (int) = sizeof (long) = 1, because all are implemented in 32 bits.

That is interesting -- Is that correct -- for a 32-bit char, the sizeof operator returns 1?? I'll bet that would break a whole lot of programs written for PC and ported to that operating system.

jfaust
October 17th, 2002, 06:45 PM
The size of a 32-bit char is one. There's nothing else to return.

Some more from Stroustrup:


This is what is guaranteed about sizes of fundamental types:
1 = sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)
1 <= sizeof(bool) <= sizeof(long)
sizeof(char) <= sizeof(wchar_t) <= sizeof(long)
sizeof(float) <= sizeof(double) <= sizeof(long double)
sizeof(N) = sizeof(signed N) = sizeof(unsigned N)


So, having char, int, and long all the same size is perfectly legal. Any code that breaks due to this is in error, since it is not standard-compliant.
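
As a rough sketch, the quoted relations can be restated as a compile-time check. The function name size_relations_hold is invented here. The bool and wchar_t lines follow the quoted text; whether the ISO standard spells them out in exactly this form is an assumption worth checking.

```cpp
// Hypothetical helper: true if this implementation satisfies the
// relations quoted above. Every term is a compile-time constant.
constexpr bool size_relations_hold()
{
    return sizeof(char) == 1
        && sizeof(char) <= sizeof(short)
        && sizeof(short) <= sizeof(int)
        && sizeof(int) <= sizeof(long)
        && 1 <= sizeof(bool) && sizeof(bool) <= sizeof(long)
        && sizeof(char) <= sizeof(wchar_t) && sizeof(wchar_t) <= sizeof(long)
        && sizeof(float) <= sizeof(double)
        && sizeof(double) <= sizeof(long double)
        && sizeof(int) == sizeof(signed int)
        && sizeof(int) == sizeof(unsigned int);
}

// Fails to compile on an implementation that violates any relation.
static_assert(size_relations_hold(), "fundamental type size relations");
```

Note that every comparison is "<=", so a target with sizeof(char) == sizeof(int) == sizeof(long) == 1 passes this check.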

Jeff

Andreas Masur
October 17th, 2002, 11:49 PM
Originally posted by jfaust
This may be what you were saying, in which case I'm merely expanding on your point.

Jeff
Errrm....yes that is what I wanted to say basically. I mixed up a little bit between the actual size and what 'sizeof()' is supposed to return...it looks like I should stop writing posts after 10 pm.... :cool:

Thank you for paying attention...

stober
October 18th, 2002, 04:35 AM
Doesn't that make it very difficult, if not impossible, to transfer data from one OS to another? If, in a socket program, the OS on one end sends an 8-bit character, how would that be received by the program running on the other end that expects a 32-bit character? And how about data files, are they affected by this too?

BaroloMan
October 18th, 2002, 06:42 AM
To stober,

The sizeof operator deals with the storage size for a given type. When one operating system sends a packet to another operating system, each data character is defined as eight bits, according to the protocol specifications. It just turns out that the system that declares a char to be 32 bits "wastes" 24 bits of storage for each character it internally manages, because the compiler design properly determined that processor utilization was more important than memory utilization.
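
One way to picture this, as a sketch only: protocol code can build each 8-bit octet with shifts and masks, so the wire format does not depend on how wide a char is in memory. The names put_u32_be and get_u32_be are hypothetical, and most-significant-octet-first order is an assumption chosen for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: split a 32-bit value into four protocol octets,
// most significant first. Each octet sits in its own unsigned char; on a
// 32-bit-char target the upper 24 bits of each element simply stay zero.
inline void put_u32_be(std::uint_least32_t value, unsigned char out[4])
{
    assert(out != nullptr);
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(value & 0xFFu);
}

// Hypothetical inverse: rebuild the value from four received octets.
inline std::uint_least32_t get_u32_be(const unsigned char in[4])
{
    assert(in != nullptr);
    return (static_cast<std::uint_least32_t>(in[0] & 0xFFu) << 24)
         | (static_cast<std::uint_least32_t>(in[1] & 0xFFu) << 16)
         | (static_cast<std::uint_least32_t>(in[2] & 0xFFu) << 8)
         |  static_cast<std::uint_least32_t>(in[3] & 0xFFu);
}
```

Because the octets are computed from the value rather than copied out of memory, the same code gives the same octet sequence on an 8-bit-char and a 32-bit-char machine.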

dude_1967
October 18th, 2002, 07:36 AM
Gurus,

I studied up on this one a bit.

ISO/IEC 9899 "Programming Languages -- C" specifies that the minimum size of char is 8 bits. The internal storage size of a character may be larger.

I have concluded that sizeof(char) is compiler-dependent. Fortunately, I think most common compilers for most common platforms store chars in single bytes.

The interesting topics relating to communications software must certainly be handled within the protocol definitions and implementations (interesting comments from BaroloMan, stober).

The sizeof(char) topic always seems to arise when one wants to access the single bytes of some larger data type as individual, adjacent characters. Look at the following code sequence: it would seem that the 4 bytes of the float will be properly stored in the character array. However, this code is not proper, since its correct function hinges on the assumption that characters are 1 byte in size.

I have never found a platform-independent implementation for this kind of function. One could static_cast the address to a DWORD beforehand, then increment by one, etc. However, this relies on the fact that addresses are no wider than 32 bits. I think this is one of the several dark alleys of C/C++.

Chris.



int main(int argc, char* argv[])
{
    float the_float = static_cast<float>(1.23456789);

    // Reads the first four chars of the float's storage.
    // The literal 4 assumes sizeof(float) == 4.
    char float_data[4] =
    {
        *(reinterpret_cast<char*>(&the_float) + 0),
        *(reinterpret_cast<char*>(&the_float) + 1),
        *(reinterpret_cast<char*>(&the_float) + 2),
        *(reinterpret_cast<char*>(&the_float) + 3)
    };

    return 0;
}
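
A possible variant of the code above, offered as a sketch rather than a definitive answer: size the buffer with sizeof(float) and copy the object representation with std::memcpy, so no literal 4 appears. FloatBytes, bytes_of, and float_from are names invented for this example.

```cpp
#include <cstring>

// Hypothetical wrapper: a buffer of exactly sizeof(float) chars, which is
// 4 on a typical PC and may be 1 on a target with 32-bit chars.
struct FloatBytes
{
    unsigned char data[sizeof(float)];
};

// Copy the object representation of a float into the buffer.
inline FloatBytes bytes_of(float value)
{
    FloatBytes result;
    std::memcpy(result.data, &value, sizeof(float));
    return result;
}

// Rebuild the float from a buffer produced on the same platform.
inline float float_from(const FloatBytes& bytes)
{
    float value;
    std::memcpy(&value, bytes.data, sizeof(float));
    return value;
}
```

This only removes the size assumption. The order of the chars in the buffer still depends on the platform, so the buffer is not a portable exchange format by itself.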

BaroloMan
October 18th, 2002, 09:33 AM
dude_1967:

The only correct conclusion is sizeof(char) = 1. Please review all comments by jfaust.

The confusion may stem from the fact that the overwhelming majority of processors support byte accesses of memory, so the overwhelming majority of compilers implement one char per byte. However, the sizeof operator does not literally refer to the number of physical bytes.

If a compiler does not evaluate sizeof(char) to 1, it is non-compliant.

About your example of looking at components of a float via multiple chars, let's talk about little endian/big endian :)
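
For the endianness point, a small probe along these lines shows which end of a multi-char object sits at the lowest address. The function name is_little_endian is hypothetical, and the sketch ignores mixed-endian layouts.

```cpp
#include <cstring>

// Hypothetical helper: true if the least significant char of an
// unsigned int is stored at the lowest address (little endian).
// Where sizeof(unsigned int) == 1 the question is moot and this
// returns true trivially.
inline bool is_little_endian()
{
    const unsigned int probe = 1u;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);   // read the char at the lowest address
    return first == 1;
}
```

The same ordering question applies to the chars of a float, which is why copying them out one by one gives different sequences on different machines.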