.NET Framework General: What is Garbage Collection?
Q: What is Garbage Collection?
Garbage collection in the Microsoft .NET Common Language Runtime (CLR) relieves the developer of tracking memory usage and deciding when to free memory. The CLR's Garbage Collector manages the allocation and release of memory for an application running in a managed environment; it attempts to reclaim the memory used by objects that the application will never access again. The .NET Garbage Collector provides a high-speed allocation service with good use of memory and no long-term fragmentation problems.
Garbage collection was invented by John McCarthy around 1959 to solve the problems of manual memory management in the Lisp programming language. Languages that implement garbage collection are known as garbage-collected languages; Java, C#, and VB.NET are some examples.
- Managed Heap & Resource Allocation
The .NET CLR requires that all resources be allocated from the managed heap. When a process is initialized, the runtime reserves a contiguous region of address space that initially has no storage allocated for it. This address space region is known as the managed heap. The heap also maintains a pointer that indicates where the next object will be allocated. At first, this pointer is set to the base address of the region.
When an object is created with the new operator, the operator first makes sure that the memory required by the new object fits in the reserved address space. If it fits, the object is placed at the address the pointer currently holds, and the pointer is then incremented by the size of the object so that it points to where the next object will go. Because the pointer always advances by exactly the size of the object just created, there are never any gaps between objects allocated on the managed heap.
Allocation from the managed heap is much faster than allocation from the C runtime heap, because allocating an object only requires advancing a pointer. The managed heap gains this speed by making one really big assumption: that address space and storage are infinite. This is obviously not true in the real world, so the managed heap needs a mechanism to reclaim the memory of objects that the application no longer uses. This mechanism is called Garbage Collection.
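The pointer arithmetic described above can be sketched with a small toy model. This is only an illustration in Python, not how the CLR is implemented: the class name `ManagedHeapModel`, its fields, and the example addresses and sizes are all assumptions made for the sketch.

```python
class ManagedHeapModel:
    """Toy model of a bump-pointer heap; addresses and sizes are plain integers."""

    def __init__(self, base, size):
        self.base = base            # start of the reserved region
        self.limit = base + size    # one past the end of the reserved region
        self.next_obj = base        # where the next object will be placed

    def allocate(self, nbytes):
        """Return the address for a new object, or None if it does not fit."""
        if self.next_obj + nbytes > self.limit:
            return None             # region exhausted; a real runtime would collect here
        address = self.next_obj
        self.next_obj += nbytes     # advance the pointer past the new object
        return address


heap = ManagedHeapModel(base=0x1000, size=64)
a = heap.allocate(24)   # placed at the base address
b = heap.allocate(16)   # placed directly after a, with no gap
c = heap.allocate(32)   # 24 + 16 + 32 > 64, so this request does not fit
```

In this model the second object always starts exactly where the first one ends, which is the "no gaps" property the text describes.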
- Releasing/Reclaiming Memory
The Garbage Collector’s optimizing engine determines the best time to perform a collection based on the allocations being made. When the Garbage Collector performs a collection it releases the memory associated with the objects that are no longer used by the application.
When an application calls the 'new' operator to create an object, there may not be enough address space left in the region to allocate to the object. The heap detects this by adding the size of the new object to the pointer. If the pointer is beyond the end of the address space region, then the heap is full and a collection must be performed.
Every application has a set of roots. Roots identify storage locations that either refer to objects on the managed heap or are set to null. Global and static object pointers, local variable and parameter object pointers on a thread's stack, and CPU registers containing pointers to objects in the managed heap are all considered part of the application's roots. The list of active roots is maintained by the just-in-time (JIT) compiler and the Common Language Runtime, and is made available to the Garbage Collector's algorithm.
When the Garbage Collector (GC) starts, it assumes that every object in the heap is garbage; in other words, that none of the application's roots refer to any objects in the heap. The GC then walks the roots and builds a graph of all objects reachable from them, following the references held by each reachable object in turn. Once all the roots have been checked, the graph contains every object that the application can still reach. Any object that is not part of the graph is considered garbage, and its memory needs to be reclaimed.
The GC then walks the heap linearly, looking for contiguous blocks of memory occupied by garbage objects. It shifts the non-garbage objects down in memory, removing all the gaps in the heap (in simpler terms, the heap is compacted). Because moving objects invalidates the existing pointers to them, the GC must update the application's roots, and any pointers held inside other objects, so that they point to the objects' new locations. After the garbage objects have been identified and all non-garbage objects have been compacted, the heap's pointer is positioned just after the last non-garbage object.
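The mark and compact steps can be illustrated with a toy model. This is a sketch under stated assumptions, not the CLR's algorithm: the heap is a Python dictionary, objects are identified by made-up names rather than real pointers, and the helper names `mark` and `compact` are hypothetical. Because references are stored as names, looking up an object's new `addr` stands in for the pointer fix-up step.

```python
def mark(roots, heap):
    """Return the set of object names reachable from the roots."""
    reachable = set()
    pending = [name for name in roots if name is not None]  # null roots refer to nothing
    while pending:
        name = pending.pop()
        if name in reachable:
            continue                          # already visited
        reachable.add(name)
        pending.extend(heap[name]["refs"])    # follow this object's references
    return reachable


def compact(heap, reachable, base=0):
    """Slide surviving objects down in address order; return (new_heap, next_obj)."""
    new_heap = {}
    next_obj = base
    for name, obj in sorted(heap.items(), key=lambda item: item[1]["addr"]):
        if name in reachable:
            new_heap[name] = {"addr": next_obj, "size": obj["size"], "refs": obj["refs"]}
            next_obj += obj["size"]
    return new_heap, next_obj


heap = {
    "A": {"addr": 0,  "size": 16, "refs": ["C"]},
    "B": {"addr": 16, "size": 32, "refs": []},      # referenced only by D
    "C": {"addr": 48, "size": 8,  "refs": []},
    "D": {"addr": 56, "size": 24, "refs": ["B"]},   # not reachable from any root
    "E": {"addr": 80, "size": 8,  "refs": ["A"]},
}
roots = ["A", None, "E"]

live = mark(roots, heap)               # A, C and E are reachable
heap, next_obj = compact(heap, live)   # B and D are dropped, survivors slide down
```

After the collection, the survivors occupy one contiguous block starting at the base address, and `next_obj` sits just after the last surviving object. Note that B is garbage even though D still refers to it, because D itself cannot be reached from a root.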
The GC in the .NET Framework is a generational garbage collector. In a generational collection, the heap is divided into multiple generations. Objects are created in the base (youngest) generation and are promoted to higher (older) generations after meeting some criterion, usually related to the age of the object. Garbage collection can be done at different intervals, and even with different techniques, depending on the generation an object is in.
It is always faster to compact a portion of the managed heap than the whole heap. So, to optimize the performance of the Garbage Collector, the managed heap is divided into generations. In .NET the managed heap is divided into three generations: 0, 1 and 2. The Garbage Collector stores newly created objects in generation 0. Objects created early in the application's lifetime that survive collections are promoted and stored in generations 1 and 2.
A collection occurs when generation 0 is full. After each collection the Garbage Collector promotes the surviving objects to generation 1 and continues to allocate memory for new objects in generation 0 until generation 0 is full again. The garbage collector's optimizing engine determines whether it is necessary to examine the objects in older generations. If a collection of generation 0 does not reclaim enough memory for the application to successfully complete its attempt to create a new object, the garbage collector can perform a collection of generation 1, then generation 2. Objects in generation 1 that survive collections are promoted to generation 2. The Garbage Collector supports only three generations, so objects in generation 2 that survive a collection remain in generation 2. The objects in generation 2 are always older than the objects in generation 1, and the objects in generation 1 are older than the objects in generation 0.
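The promotion rules can be sketched with a toy model. This is an illustration only: the function `collect`, the dictionary of lists, and the object names are assumptions made for the example, and the set of live objects is passed in directly rather than computed from roots.

```python
MAX_GENERATION = 2   # generations are numbered 0, 1 and 2


def collect(generations, live, up_to):
    """Collect generations 0..up_to; survivors move up one generation, capped at 2."""
    for gen in range(up_to, -1, -1):          # oldest first, so each object moves once
        survivors = [obj for obj in generations[gen] if obj in live]
        generations[gen] = []
        target = min(gen + 1, MAX_GENERATION)
        generations[target].extend(survivors)
    return generations


generations = {0: ["a", "b", "c"], 1: [], 2: []}

# Generation 0 fills up: collect it alone. "b" is garbage; "a" and "c" are promoted.
collect(generations, live={"a", "c"}, up_to=0)
after_first = {gen: list(objs) for gen, objs in generations.items()}

# New allocations land in generation 0, then generations 0 and 1 are collected.
generations[0] = ["d", "e"]
collect(generations, live={"a", "d"}, up_to=1)
after_second = {gen: list(objs) for gen, objs in generations.items()}

# A collection of all generations: survivors already in generation 2 stay there.
generations[0] = ["f"]
collect(generations, live={"a", "d", "f"}, up_to=2)
```

Processing the oldest generation first is a choice made for this sketch so that a survivor is promoted by only one generation per collection.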
Because garbage collection is non-deterministic, you cannot know exactly when an object's memory will be reclaimed. If you are used to releasing other system resources (file handles, database connections, etc.) in the same block of code that frees the object's memory, you will need to adopt a new way of coding: explicitly release the other system resources associated with an object, and let the garbage collector release the memory.
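In .NET this explicit release is usually expressed through the IDisposable interface and, in C#, the using statement. The same idea can be sketched in Python with a context manager, purely as an analogy. The class `FileHandleModel`, the file name, and the `log` list are hypothetical; no real file is opened.

```python
class FileHandleModel:
    """Stand-in for an object that owns a non-memory resource (hypothetical)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.is_open = True
        log.append(f"open {name}")

    def close(self):
        """Release the resource explicitly; safe to call more than once."""
        if self.is_open:
            self.is_open = False
            self.log.append(f"close {self.name}")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()     # runs when the block exits, even if an exception was raised
        return False     # do not swallow exceptions


log = []
with FileHandleModel("data.txt", log) as handle:
    log.append("use data.txt")
# The resource has been released at this known point. Reclaiming the
# object's memory is a separate matter, left to the runtime.
```

The point of the pattern is the separation of concerns: the code decides when the resource is released, while the runtime decides when the memory is reclaimed.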
Here are some of the links from MSDN that contain more information on Garbage Collection: