Strings in C#, Unicode or ANSI?


JonnyPoet
October 2nd, 2009, 04:19 PM
Hi Friends!

I have a very basic, but somewhat theoretical, question.
How is a string in C# stored in memory? I know it is an immutable reference type. OK. But the chars of the immutable string need to be stored somewhere in memory, where the reference points to. How does the compiler know where the string ends? Is it terminated with '\0', or is it, as I have read, an ANSI string whose length is stored in an integer?

Elsewhere on the web I have read that all strings in .NET (and C# is a .NET language) are automatically in UTF8 format (Unicode), which would mean that the word 'Hallo' needs 10 bytes in memory.

So what's behind that, or where can I find references about it?
The reason is: I simply want to get a deeper understanding of what's going on behind the scenes.
Thanks in advance

Arjay
October 2nd, 2009, 04:49 PM
They're Unicode, encoded as UTF-16.

http://msdn.microsoft.com/en-us/library/system.string.aspx
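
To make that concrete, here is a minimal sketch (my own illustration, not taken from the MSDN page) that counts the bytes the characters of 'Hallo' occupy in UTF-16 and, for comparison, in UTF-8. The class and method names are made up for the example, and the byte counts cover only the character data, not the object header or the length field.

```csharp
using System;
using System.Text;

public static class StringEncodingDemo
{
    // Bytes needed for the characters in UTF-16 (what Encoding.Unicode means in .NET).
    public static int Utf16ByteCount(string s)
    {
        return Encoding.Unicode.GetByteCount(s);
    }

    // Bytes needed for the same characters in UTF-8, for comparison only.
    public static int Utf8ByteCount(string s)
    {
        return Encoding.UTF8.GetByteCount(s);
    }

    public static void Main()
    {
        string word = "Hallo";
        Console.WriteLine(word.Length);           // 5 chars
        Console.WriteLine(sizeof(char));          // 2 bytes per char
        Console.WriteLine(Utf16ByteCount(word));  // 10
        Console.WriteLine(Utf8ByteCount(word));   // 5
    }
}
```

So the 10 bytes mentioned in the question match UTF-16, not UTF-8.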

BigEd781
October 2nd, 2009, 05:13 PM
So, if we break out good ol' Reflector we can glean some insights here. String is defined in mscorlib; here are its fields:


private const int alignConst = 3;
private const int charPtrAlignConst = 1;
public static readonly string Empty;

[NonSerialized]
private int m_arrayLength;

[NonSerialized]
private char m_firstChar;

[NonSerialized]
private int m_stringLength;

private const int TrimBoth = 2;
private const int TrimHead = 0;
private const int TrimTail = 1;
internal static readonly char[] WhitespaceChars;


So, it looks like the length of the string is held in an int, and the array size in a separate int. Unfortunately, as you would probably guess, the rest of the implementation is not in a .NET assembly, so Reflector does us no good with that.
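
One consequence of the length being kept in a field that you can observe from managed code: a '\0' inside a string does not end it. A small sketch, assuming nothing beyond the public String API (the class and helper names are mine, not from mscorlib):

```csharp
using System;

public static class StringLengthDemo
{
    // Hypothetical helper: returns a string with a NUL character in the middle.
    public static string WithEmbeddedNul()
    {
        return "ab\0cd";
    }

    public static void Main()
    {
        string s = WithEmbeddedNul();
        Console.WriteLine(s.Length);         // 5: the '\0' counts like any other char
        Console.WriteLine(s.IndexOf('\0'));  // 2
        Console.WriteLine(s.Substring(3));   // cd: the text after the NUL is still there
    }
}
```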

JonnyPoet
October 2nd, 2009, 05:32 PM
BigEd781 wrote: "So, if we break out good ol' Reflector we can glean some insights here. String is defined in mscorlib; here are its fields:
....
So, it looks like the length of the string is held in an int, and the array size in a separate int. Unfortunately, as you would probably guess, the rest of the implementation is not in a .NET assembly, so Reflector does us no good with that."

Both are very interesting, Arjay's MS reference as well as this. So it seems there could be a difference between what we get from the Length property of the string and the size of the array it really occupies in memory.

Thanks to both of you for the info.