Re: What is the major drawback of C# .NET?
Quote:
Originally Posted by TheCPUWizard
100% true
I think you're wrong when you claim that the GC has to be managed by the programmer to be efficient. On the contrary I think the GC is most efficient when the programmer just relies on it and uses sound programming techniques.
But, as general advice, don't finalize. This is guaranteed to be inefficient.
Also note that what the GC does is memory management. Resource management is your responsibility.
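To illustrate the split between memory management and resource management, here is a minimal sketch, assuming a hypothetical LogWriter class that owns a file handle. The class name and its members are made up for illustration. It releases the resource deterministically through IDisposable and declares no finalizer.

```csharp
using System;
using System.IO;

// Hypothetical wrapper around a file handle; not taken from the thread.
public sealed class LogWriter : IDisposable
{
    private readonly StreamWriter _writer;
    private bool _disposed;

    public LogWriter(string path)
    {
        _writer = new StreamWriter(path);
    }

    public void Write(string message)
    {
        if (_disposed) throw new ObjectDisposedException(nameof(LogWriter));
        _writer.WriteLine(message);
    }

    // Deterministic cleanup. No finalizer is declared, so instances
    // never have to pass through the finalization queue.
    public void Dispose()
    {
        if (_disposed) return;
        _writer.Dispose();
        _disposed = true;
    }
}
```

A caller would wrap it in a using statement, e.g. using (var log = new LogWriter(path)) { log.Write("hello"); }, so the file is closed as soon as the block exits, regardless of when the GC reclaims the memory.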
Re: What is the major drawback of C# .NET?
Quote:
Originally Posted by _uj
I think you're wrong when you claim that the GC has to be managed by the programmer to be efficient. On the contrary I think the GC is most efficient when the programmer just relies on it and uses sound programming techniques.
But, as general advice, don't finalize. This is guaranteed to be inefficient.
Also note that what the GC does is memory management. Resource management is your responsibility.
My point is that "sound programming techniques" in a GC'ed environment are radically different at fundamental levels from "sound programming techniques" in a programmatic memory management system.
Re: What is the major drawback of C# .NET?
Let me just wade in here as a guy who has done *extensive* optimisations regarding memory and cpu usage in C#.
As a general rule of thumb in managed languages, allocating on the heap is bad. Every object you allocate on the heap is memory that you cannot reclaim until a garbage collection occurs.
So you say "But what if i null out my references as soon as i'm done with them". That doesn't matter. You've still allocated the object and you've still created it on the heap.
Code:
SomeType *info;
info = new SomeType();
// do some things with info
delete info;
....
// do some things that do not use info
...
info = new SomeType();
// do some things with info (that do not require state of the info instance from above)
...
delete info;
If you can't tell me why this code is *terrible* in most use cases, then you can't really talk about optimisation in a serious manner.
Let me talk about the general case. Suppose you do what this code suggests. You dump your reference as soon as you finish using the object and then create a new one when you need it again. Now, imagine that this bit of code is being called 100,000 times a second. Instead of only allocating 100,000 objects, you are now allocating 200,000 objects. You're now forcing your application to garbage collect twice as frequently.
Ok, so gen 0 collections are cheap, but they still take time.
Take this example as something i have hard tangible documented proof of:
Suppose you are writing a high performing socket based application. So, following the advice above, what you'd do is you'd instantiate a byte[] for each message you want to send and then null it out as soon as you're done with it.
That's fine, the byte[] will only live for a very short time, a few ms at the most. Suppose the average message is 16kB in size and you have 20 clients communicating at 80kB/sec each, that's 100 byte arrays being allocated each second. That's 1600kB a second. That's a lot of memory.
So, you decide to optimise things properly. You implement a buffer pool so instead of nulling out the references, you instead put the byte[] in a pool where you can retrieve it the next time you need one. As a result, you have 100 buffers allocated which are either in use or in the pool. You have no allocations going on every second.
What effect does this have on memory usage? Amazingly enough, you'll find a reduction of over 10% in memory usage. Why? Because you are not allocating useless crap over and over again.
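For anyone wondering what such a pool might look like, here is a minimal sketch under my own assumptions. The BufferPool name, its members, and the fixed buffer size are hypothetical and this is not the actual BufferManager class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fixed-size buffer pool; names are illustrative only.
public sealed class BufferPool
{
    private readonly Stack<byte[]> _free = new Stack<byte[]>();
    private readonly object _lock = new object();
    private readonly int _bufferSize;

    public BufferPool(int bufferSize)
    {
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));
        _bufferSize = bufferSize;
    }

    // Total number of buffers this pool has ever created.
    public int Allocated { get; private set; }

    // Hands out a pooled buffer if one is free, otherwise allocates a new one.
    public byte[] Rent()
    {
        lock (_lock)
        {
            if (_free.Count > 0) return _free.Pop();
            Allocated++;
        }
        return new byte[_bufferSize];
    }

    // Puts a buffer back so the next Rent() can reuse it instead of allocating.
    public void Return(byte[] buffer)
    {
        if (buffer == null || buffer.Length != _bufferSize)
            throw new ArgumentException("Buffer does not belong to this pool.", nameof(buffer));
        lock (_lock) { _free.Push(buffer); }
    }
}
```

With 16kB buffers the arrays stay well below the large object heap threshold, and once the pool has warmed up the send/receive path stops allocating. On newer .NET versions, System.Buffers.ArrayPool<byte>.Shared provides a ready-made pool with similar Rent/Return semantics.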
EDIT:
Code:
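using System.Collections;

class MyBadClass
{
    // Holds 1,999 arrays of 7,500 chars each for the lifetime of the object.
    private ArrayList m_MyList = new ArrayList();

    public MyBadClass()
    {
        for (int i = 1; i < 2000; ++i)
            m_MyList.Add(new char[7500]);
    }
}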
There's nothing inherently wrong with that class. As far as i can see, it's perfect. That bears a striking resemblance to my BufferManager class actually. The big difference is that i store about 1/7th of that, a mere 2 megabytes of byte[] i'm afraid. Amazingly enough, it improves my performance too.
The problem is not that you are storing char[], it's that you may be *needlessly* storing char[]. However that's not a fault of C# and the GC (or any other managed language), that's a fault of a stupid programmer. If that code was in C you'd have the exact same problem.
EDIT: Accidentally wrote Gen1 collections are cheap instead of Gen0
Re: What is the major drawback of C# .NET?
Mutant_Fruit,
Not doubting your experience in this area [if you updated your profile to allow PM's some of this could be covered in more detail privately, then brought back to the list, but you have not chosen to do this...] but having worked almost exclusively with managed code for 5+ years, having been a MSFT employee (with direct access to the code base and the developers), I think I can "hold my own".
Quote:
Ok, so gen 1 collections are cheap, but they still take time.
Yes this is true, but I am talking about GEN0 collections. Objects which do not have any rooted references at time of GEN0 have NO incremental cost [meaning the time is exactly the same to clean up 1 object as it is to clean up 2,000,000,000!]
Quote:
As far as i can see, it's perfect. That bears a striking resemblance to my BufferManager class actually. The big difference is that i store about 1/7th of that, a mere 2 megabytes of byte[] i'm afraid. Amazingly enough, it improves my performance too.
Based on your statement, it is NOTHING like your class. Your class will use the LOH where fragmentation can (and often does) become a big problem. It is the number 1 reason for having to periodically recycle service programs (COM+, IIS, etc.). My sizes were explicitly chosen to avoid the LOH, but I guess your "experience" did not make that immediately obvious.
Quote:
That's 1600kB a second. That's a lot of memory.
I don't think that is much at all. If that was the only thing going on, and your program otherwise used 1/2 of the default process space for IIS, then this equates to a GC every 4.266 MINUTES. Assuming you have everything else stable (i.e. no GEN0 objects being promoted), you are about a 0.0390625% load being placed on the application by the GC....
Quote:
You're now forcing your application to garbage collect twice as frequently.
Frequent GEN0 collections with little or no information being promoted (after application stabilization) have never been a performance problem in any of the .Net applications I have worked on, many of them LOB applications in the financial, insurance, and medical verticals....
Re: What is the major drawback of C# .NET?
Quote:
Yes this is true, but I am talking about GEN0 collections.
Aye, my mistake. I meant to write Gen0. Those are cheap. If you follow the Allocate -> Null it -> Allocate it again pattern you are going to increase the rate at which Gen0's happen. Then, as a direct result of this you increase the likelihood of a Gen1/2 as you are inducing more frequent GC's which could easily make the GC decide that it's time to promote objects (but that depends on how your other code is running).
Quote:
Based on your statement, it is NOTHING like your class. Your class will use the LOH where fragmentation can (and often does) become a big problem......My sizes were explicitly chosen to avoid the LOH, but I guess your "experience" did not make that immediately obvious.
If i remember correctly, 16kB is less than 85kB, which means my class doesn't use the LOH, exactly the same way as yours doesn't. In fact, i can also prove it with profiling graphs if you so wish.
Quote:
Assuming you have everything else stable (ie no GEN0 objects being promoted), you are about a 0.0390625% load being placed on the application by the GC....
That's not the point i was making. The point i was making is that (in certain circumstances) when you remove ongoing allocations, you reduce the working set of your application. In those same circumstances if you used the Allocate -> Null -> Allocate again pattern, you'd end up increasing your working set.
Quote:
Frequent GEN0 collections with little or no information being promoted (after application stabilization) have never been a performance problem
That's the key phrase right there. That can be a hard thing to guarantee, especially as the application codebase gets larger, especially when it comes to GUI based apps. I fully agree though that Gen 0 is cheap, provided you aren't bumping objects to Gen1/2. The mid-life crisis situation is a perfect example of when you are allocating too many temporary objects and are artificially promoting objects which should've died in Gen0.
Anyway, the point i'm making is this: Don't try to optimise your allocations by nulling them out unless you have a *very* good reason to do so. The only good reason is that you expect your method to take a considerable amount of time to run and you only need your really large object for the first 3 lines of code.
The only way to truly find out if you need to optimise would be if you profiled your application extensively and noticed that you were allocating X amount of ClassA and you want to reduce that, or that Object Y kept being promoted to Gen1 where it would then die.
EDIT: The Gen0 size is about 2 megabytes. Therefore if i were to allocate 1600kB a second from just byte[], i'd be inducing a Gen0 collection every second (nearly). This significantly increases the chances of objects being promoted to Gen 1 as compared to the case where i do not allocate 1600kB a second for buffers.
Re: What is the major drawback of C# .NET?
Quote:
Originally Posted by TheCPUWizard
My point is that "sound programming techniques" in a GC'ed environment are radically different at fundamental levels from "sound programming techniques" in a programmatic memory management system.
What I feel you have suggested so far is that one should second-guess the GC. One should base one's code on what one think gives the most favourable GC treatment.
Is this what you propose?
Re: What is the major drawback of C# .NET?
Quote:
Originally Posted by _uj
What I feel you have suggested so far is that one should second-guess the GC. One should base one's code on what one think gives the most favourable GC treatment.
Is this what you propose?
Not "second guessing" but rather a thorough understanding of what the GC does and does not do. So many times I get asked: "I just 'released' a bunch of memory, but it still shows my program as very large." People who make statements like this are misunderstanding what the GC is and does.
Also one needs to look carefully at object lifetime. The "mid-life crisis" has already been mentioned in this thread. One should weigh the difference between create/use/destroy and create/use/keep/reuse(...)/destroy design patterns. If state is not being maintained between the uses, then what are the initialization costs?
If you are doing transactional processing (in the most general sense), do you group your allocations for objects with longer scopes close to the beginning of the transaction processing phase? If you are allocating a number of objects of radically different sizes/complexity, do you order them properly?
Since this is such a big topic, I am actually putting together a complete article on it (will be a few weeks though). In there I have a stress test that continually allocates 50 KB objects and holds no references to them. I end up with about 900 GEN0 collections per second, but it still is only taking 5-7% of the CPU. This indicates that the GEN0 collection is taking somewhere on the order of 100 microseconds. Granted this is an extreme case (ideal in some respects, worst case in others), but it forms the basis for some very interesting experiments with more complex scenarios.
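Until the article is available, a stress test of this kind could be sketched roughly as follows. This is an assumption about its shape, not the actual test: the Gen0StressTest name and its parameters are hypothetical, and the only GC API used is GC.CollectionCount.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stress test: allocate short-lived arrays and count gen 0 collections.
public static class Gen0StressTest
{
    // Allocates arrays of objectSize bytes for the given duration, dropping each
    // one immediately, and returns how many times generation 0 was collected.
    public static int Run(int objectSize, TimeSpan duration)
    {
        int before = GC.CollectionCount(0);
        Stopwatch clock = Stopwatch.StartNew();
        long touched = 0;

        while (clock.Elapsed < duration)
        {
            byte[] garbage = new byte[objectSize];
            touched += garbage.Length; // use the array so the allocation is not pointless to the compiler
        }

        if (touched < 0) throw new InvalidOperationException("byte counter overflowed");
        // Note: this count also includes gen 0 passes done as part of gen 1/gen 2 collections.
        return GC.CollectionCount(0) - before;
    }
}
```

Calling Gen0StressTest.Run(50 * 1024, TimeSpan.FromSeconds(1)) gives a collections-per-second figure for the machine and runtime in use. The result depends heavily on the gen 0 budget the runtime picks, so the numbers will not necessarily match the ones quoted above.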
Re: What is the major drawback of C# .NET?
Sorry, off topic. But if you don't mind sharing, where are you employed cpu wizard? You know what you're talking about and I was just curious if anyone was taking advantage of that :)
Re: What is the major drawback of C# .NET?
Quote:
Originally Posted by TheCPUWizard
Since this is such a big topic, I am actually putting together a complete article on it (will be a few weeks though).
This is indeed a *huge* topic in which there are *no* hard and fast rules. My advice of reusing byte[] as opposed to constantly reallocating is only useful in certain circumstances, such as programs making heavy use of sockets (more than a dozen send/receives a second).
Implementing the same logic in a program which barely uses sockets could result in the byte[] needlessly being retained in memory when it really should be GCed.
Your pattern of
Allocate
Null out
Reallocate
Null out again
Reallocate again
may work in certain circumstances (i have no examples though) but in the general case it may just serve to promote objects to a higher generation needlessly.
The only hard and fast rule i can think of is this: Don't second guess the garbage collector until you have hard evidence that a certain section of code is responsible for a significant proportion of your allocations. That means profiling your code.
Reading a "rule" from some guy on the internet who says keeping byte[] in a BufferManager class is a great idea and then going ahead and doing that is just stupid, unless you are suffering the exact same problem that guy had.
EDIT: If you were just allocating 50kB segments constantly, you should sit at 100% cpu usage. It's a bit weird that you're stuck at less than 10%. What OS/.NET version are you on?
Re: What is the major drawback of C# .NET?
Quote:
Originally Posted by mariocatch
Sorry, off topic. But if you don't mind sharing, where are you employed cpu wizard? You know what you're talking about and I was just curious if anyone was taking advantage of that :)
Mario, this is what private messages are for, please go back and re-read the FAQ's so you know how to enable them.
The answer to your question is in my signature....
Re: What is the major drawback of C# .NET?
Quote:
Originally Posted by Mutant_Fruit
This is indeed a *huge* topic in which there are *no* hard and fast rules. My advice of reusing byte[] as opposed to constantly reallocating is only useful in certain circumstances, such as programs making heavy use of sockets (more than a dozen send/receives a second).
Implementing the same logic in a program which barely uses sockets could result in the byte[] needlessly being retained in memory when it really should be GCed.
Your pattern of
Allocate
Null out
Reallocate
Null out again
Reallocate again
may work in certain circumstances (i have no examples though) but in the general case it may just serve to promote objects to a higher generation needlessly.
The only hard and fast rule i can think of is this: Don't second guess the garbage collector until you have hard evidence that a certain section of code is responsible for a significant proportion of your allocations. That means profiling your code.
Reading a "rule" from some guy on the internet who says keeping byte[] in a BufferManager class is a great idea and then going ahead and doing that is just stupid, unless you are suffering the exact same problem that guy had.
EDIT: If you were just allocating 50kB segments constantly, you should sit at 100% cpu usage. It's a bit weird that you're stuck at less than 10%. What OS/.NET version are you on?
Mutant_Fruit,
I completely agree with you in principle, especially about profiling code. Whenever I am running any code during the development phase, I have at least Perfmon running on a second monitor showing me what is happening. I am consistently amazed at the number of people writing code who have no idea how to use perfmon, or even what it is. For serious metrics, I prefer the "Ants profiler".
Regarding the "constant allocation", yes the process is pegged at nearly 100% of CPU (measurement aliasing and other items cause it to sometimes only register in the very high 90's). What I was referring to was the GC taking 5-7% of the time [I will go back and edit to make clearer], and this with 900 GCs per second.
Re: What is the major drawback of C# .NET?
Emphasis added
Quote:
Implementing the same logic in a program which barely uses sockets could result in the byte[] needlessly being retained in memory when it really should be GCed.
Your pattern of
Allocate
Null out
Reallocate
Null out again
Reallocate again
may work in certain circumstances (i have no examples though) but in the general case it may just serve to promote objects to a higher generation needlessly.
I do not see how this is possible unless there are other objects being allocated that have a longer lifecycle and thus have rooted references at the point where the allocations in question are.
The basis of the GC is that "Newer Objects have shorter lifespans" (oversimplified). Constructs like the following defeat that basis (AGAIN OVERSIMPLIFIED):
Code:
ClassA instanceA = new ClassA();
...
ClassB instanceB = new ClassB();
...
instanceA = null; // or instanceA goes out of scope
... // KeyPoint
instanceB = null; // or instanceB goes out of scope
In this case instanceB is a "newer object", but it has a longer scope than instanceA. Therefore if a GC occurs during KeyPoint, then instanceB will be promoted to GEN1.
A minor reorganization may be possible where this situation can be avoided.
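The promotion itself can be observed with a small sketch like the one below. It is an illustration under stated assumptions, not production code: the PromotionDemo name is hypothetical, and GC.Collect(0) is called only to stand in for a collection happening at KeyPoint.

```csharp
using System;

// Hypothetical demo: a still-referenced object survives a gen 0 collection.
public static class PromotionDemo
{
    // Returns the generation of an object that was still referenced
    // when a gen 0 collection ran.
    public static int GenerationAfterGen0Collection()
    {
        byte[] instanceB = new byte[1024];   // freshly allocated, normally in gen 0

        GC.Collect(0);                       // stands in for "a GC occurs during KeyPoint"

        int generation = GC.GetGeneration(instanceB);
        GC.KeepAlive(instanceB);             // make sure the reference is live across the collection
        return generation;
    }
}
```

On the runtimes I am aware of this typically reports a generation above 0, i.e. the survivor has been promoted. Had the reference been released before the collection, the array would simply have been reclaimed in gen 0.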
In my previous response I mentioned perfmon. For my applications I have created 4 custom counters:
- TransactionIdle
- TransactionActivate
- TransactionExecute
- TransactionDeactivate
These counters are appropriately maintained by my operational classes. They give a good set of performance (and scalability) metrics not related to GC, but also provide insight into the application's use of memory.
In a single-threaded (or at least single active Transaction) scenario, only GEN0 collections should occur during the Idle, Execute, and Deactivate phases. If (significant) GEN1 collections (or worse, any GEN2 collections) occur during these phases, I can often track it back to a correctable mid-life crisis.
If GEN2 collections occur once the program is past the startup phase (granted, some will in larger applications, but even then the count should be extremely low compared to GEN1, IMHO by at least 1-2 orders of magnitude), then I look again for root causes.
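A rough way to get per-phase numbers without setting up custom perfmon counters is to diff GC.CollectionCount around each phase. The sketch below does that. It is a simplified substitute for the counters described above, and the PhaseGcTracker name and its methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: records how many collections of each generation
// happened while a named phase was running.
public sealed class PhaseGcTracker
{
    private readonly Dictionary<string, int[]> _totals = new Dictionary<string, int[]>();

    // Runs the phase body and adds the gen 0/1/2 collection deltas to the phase totals.
    public void Measure(string phase, Action body)
    {
        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);
        try
        {
            body();
        }
        finally
        {
            int[] totals;
            if (!_totals.TryGetValue(phase, out totals))
            {
                totals = new int[3];
                _totals[phase] = totals;
            }
            totals[0] += GC.CollectionCount(0) - gen0;
            totals[1] += GC.CollectionCount(1) - gen1;
            totals[2] += GC.CollectionCount(2) - gen2;
        }
    }

    // Returns a copy of the totals for a phase, indexed by generation.
    public int[] Totals(string phase)
    {
        int[] totals;
        return _totals.TryGetValue(phase, out totals) ? (int[])totals.Clone() : new int[3];
    }
}
```

The counts are process-wide, so this only attributes collections to a phase correctly in the single-active-transaction scenario. Also note that the gen 0 figure includes collections that were triggered for gen 1 or gen 2.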
So my approach is definitely profile/measurement based when it comes to addressing GC issues. The flip side is that, having spent significant time programming (conservatively 12,000 hours) in a .Net environment, I am aware of patterns that tend to lead to problems, and patterns that tend to work well, as well as the use cases (especially with my overall architectural style).
But as you said, it is impossible to create a set of hard and fast general rules that apply to all circumstances...
Re: What is the major drawback of C# .NET?
Quote:
Originally Posted by TheCPUWizard
Code:
ClassA instanceA = new ClassA();
...
ClassB instanceB = new ClassB();
...
instanceA = null; // or instanceA goes out of scope
... // KeyPoint
instanceB = null; // or instanceB goes out of scope
What about all the other objects you have? Suppose 1/4 of your allocations are from the Allocate/Null/Reallocate pattern. That means you are going to induce Gen0 collections 25% faster as you have 25% more allocations. This means that you will bump up objects to Gen1 which would normally have fallen out of scope by the time a Gen0 would have been performed. Gen1's are more expensive than Gen0, so you'd actually reduce performance.
Secondly, 95% of objects in your typical application are small, and methods complete quickly so you'll drop all method level references very quickly whether or not you null them out.
The final thing i have to say is that GC's only occur when you allocate a new object and don't have space or you call GC.Collect(). As a result, you're more likely to have a GC on the call to ClassB obj = new ClassB(); than at the point where you null out the reference.
Different applications have wildly different allocation patterns, which need different techniques for maximizing performance. But in 95% of applications i'd say you won't need specialised logic to solve GC issues. Those 5% of cases would primarily be server applications dealing with large amounts of traffic.
Re: What is the major drawback of C# .NET?
Taking the points out of order....
Quote:
Originally Posted by Mutant_Fruit
Different applications have wildly different allocation patterns, which need different techniques for maximizing performance. But in 95% of applications i'd say you won't need specialised logic to solve GC issues. Those 5% of cases would primarily be server applications dealing with large amounts of traffic.
I agree with this, especially in the case of applications where you want continuous uptime on the scale of weeks/months/years (implying very careful attention to the LOH). About 80% of the applications I am involved with (the kind that warrant the services of an outside consultant) fall into this category. I rarely see codebases of under 100,000 lines.
FYI: the project I am involved in right now has 2500+ classes in 175+ assemblies totalling over 4 million lines of code. The highest throughput application was an online betting system which had upwards of 2 million hits per minute at peak (immediately before the start of a large international sporting event). Granted that application was on a huge farm....
If you are only going to be writing small applications, then yes you can avoid this issue 99% [giving it a higher number than even you ;)] of the time. However if you are involved in all types of projects, then the habits and design patterns that can make or break a large application tend to filter down to the smaller ones. Although the benefits may not be large or even significant, there is no downside.
Quote:
What about all the other objects you have? Suppose 1/4 of your allocations are from the Allocate/Null/Reallocate pattern. That means you are going to induce Gen0 collections 25% faster as you have 25% more allocations. This means that you will bump up objects to Gen1 which would normally have fallen out of scope by the time a Gen0 would have been performed. Gen1's are more expensive than Gen0, so you'd actually reduce performance.
If you are following the patterns I mentioned earlier (last alloc = shortest life) then there will be one set of objects which get promoted from GEN0 to GEN1; whether this happens on the first, second, or thousandth new allocation really does not matter.
Additionally the promotion of objects to GEN1 causes three potential types of impact (in increasing order of severity):
- The cost of moving the object itself.
- The cost of updating references to the moved object
- The probability that this will cause a GEN1 collection.
I have never seen the first point cause a measurable impact. The second point typically occurs when developers design a class which will typically be used in a collection, and put references to the collection into the object itself...
Quote:
Secondly, 95% of objects in your typical application are small, and methods complete quickly so you'll drop all method level references very quickly whether or not you null them out.
I don't know if I agree with this assessment. For most operations there is at least one method which has a duration as long as the overall transaction. What matters most (in my mind) is how many methods are invoked from within the method you are analyzing. Their size and complexity directly relate to the probability that a GC will occur while your method is active.
Quote:
The final thing i have to say is that GC's only occur when you allocate a new object and don't have space or you call GC.Collect(). As a result, you're more likely to have a GC on the call to ClassB obj = new ClassB(); than at the point where you null out the reference.
Agreed (with the caveat that GC.Collect() should never be called except in some very rare and specialized circumstances). My point was that if additional code was called which did significant allocations at the point I labeled, then instanceB would be promoted. If there was no need for continuance of state for instanceB, then releasing the reference prior to calling the code would prevent that promotion in the event that the called code did an allocation which triggered a GEN0 GC.