Every Java program runs as a process and is allocated some memory. The memory allocated to the Java process is called the process heap.
The Java heap, i.e. the Java object heap, is part of the process heap. It stores Java objects, i.e. object instances. The size of the Java heap can be configured using several command-line parameters.
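To make the heap limits concrete, here is a minimal sketch that asks the running JVM how much heap it has. The class name HeapInfo and its method names are made up for illustration; the Runtime calls are standard.

```java
// Prints the heap limits that the running JVM reports for itself.
class HeapInfo {
    /** Upper bound the heap may grow to (Long.MAX_VALUE if there is no limit). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap memory the JVM currently holds; it may grow later. */
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Approximate part of the held heap that is occupied right now. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("max heap:       " + maxHeapBytes() / mb + " MB");
        System.out.println("committed heap: " + committedHeapBytes() / mb + " MB");
        System.out.println("used heap:      " + usedHeapBytes() / mb + " MB");
    }
}
```

Running it as `java -Xms64m -Xmx256m HeapInfo` (initial and maximum heap size) and then again with different values should show the reported maximum change accordingly; the exact numbers vary by JVM.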
Garbage collection (GC) is the mechanism Java uses to reclaim the space occupied by objects that are no longer reachable from the program. This is necessary to ensure that the application does not run out of memory.
An object that cannot be reached from any reference within the running program is eligible for garbage collection.
The simplest form of GC walks all the references in the running program and marks every object that is reachable through them, directly or indirectly. At the end of this process, any object that is not marked is considered garbage and becomes eligible for collection.
With this particular algorithm, the duration of a collection is proportional to the number of objects reachable from the program, i.e. the live objects.
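The marking step described above can be sketched on a toy object graph. This is only an illustration, not how the JVM implements it: Node, MarkSketch, and the idea of passing the "heap" and the "roots" as lists are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class MarkSketch {
    /** Hypothetical stand-in for a heap object: a name plus outgoing references. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Marks every node reachable, directly or indirectly, from the roots. */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {          // first visit: follow its references
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    /** Every node on the toy heap that was not marked is garbage. */
    static List<Node> garbage(List<Node> heap, List<Node> roots) {
        Set<Node> marked = mark(roots);
        List<Node> dead = new ArrayList<>();
        for (Node n : heap) {
            if (!marked.contains(n)) dead.add(n);
        }
        return dead;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);   // a -> b, and a is a root
        c.refs.add(d);   // c and d reference each other,
        d.refs.add(c);   // but nothing reachable from a root points to them
        for (Node n : garbage(Arrays.asList(a, b, c, d), Arrays.asList(a))) {
            System.out.println("garbage: " + n.name);   // prints c, then d
        }
    }
}
```

The work done by mark() grows with the number of reachable nodes, which is why the pause depends on how many objects are live. Note that c and d are collected even though they reference each other, because neither is reachable from a root.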
In most applications, the majority of objects are short-lived; relatively few are created when the application initializes and live until it ends. It therefore makes sense for the GC to focus on the short-lived objects.
To take advantage of this property, the JVM manages memory in generations, i.e. it divides the object memory into groups according to the age of the objects. New objects are allocated in the young generation, and objects that have survived a good number of collections are moved to the tenured (old) generation.
There is also a permanent generation, where metadata for classes and methods is stored. The default size is generally enough, but applications that load a large number of classes may need to increase it. This can be done using the -XX:MaxPermSize option. (The permanent generation exists in HotSpot JVMs up to Java 7; Java 8 replaced it with Metaspace, which is sized with -XX:MaxMetaspaceSize.)
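One way to see these areas on a live JVM is to list its memory pools through the standard java.lang.management API. The sketch below is only illustrative (the class name MemoryPools is made up), and the pool names it prints depend on the JVM version and the collector in use, so they may not match the terms used in this article word for word.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class MemoryPools {
    /** One line per memory pool: its type (heap or non-heap), name, and current usage. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();   // null if the pool is no longer valid
            long usedKb = (usage == null) ? 0 : usage.getUsed() / 1024;
            lines.add(pool.getType() + " | " + pool.getName() + " | used=" + usedKb + " KB");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```

On a generational collector, the heap pools typically correspond to eden, the survivor space, and the old generation, while class metadata shows up as a non-heap pool.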
But how does this grouping help?
As we have already seen, the time spent in GC depends on the number of live objects. Since most objects in the young generation are short-lived (they die soon after creation), running a GC on this area should not take long: only a few live objects need to be copied, so the GC pause is short. The young generation also needs to be sized sensibly. If it is too small, it fills up quickly and collections run too often, while many of the objects in it are still alive, which defeats the purpose. The young generation should therefore be big enough to benefit from infant mortality (most objects die soon, but not immediately). A common rule of thumb is to make it around one third of the total heap.
The young generation is divided into an area called ‘eden’, where new objects are initially allocated, and two ‘survivor spaces’, which the copying collector uses to hold the objects that survive. When eden becomes full, a minor collection takes place.
The minor collection discards the dead objects in eden and copies any live objects from eden into one of the survivor spaces. When eden fills up again and the next minor collection runs, the live objects from eden and from the first survivor space are copied to the second survivor space, and so on.
Thus every minor collection copies the live objects from eden and the occupied survivor space into the other, empty survivor space. In addition, live objects that have survived a certain number of collections are promoted to the tenured generation.
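The copying between eden and the two survivor spaces can be modelled with a few lists. This is a toy simulation, not JVM code: YoungGenSketch, Obj, the live flag (standing in for reachability), and the tenuringThreshold parameter are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

class YoungGenSketch {
    /** Toy object: 'live' stands in for "still reachable", 'age' counts survived collections. */
    static final class Obj {
        final String name;
        boolean live = true;
        int age = 0;
        Obj(String name) { this.name = name; }
    }

    final int tenuringThreshold;                 // collections to survive before promotion
    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();          // survivor space holding earlier survivors
    List<Obj> to = new ArrayList<>();            // empty survivor space, target of the next copy
    final List<Obj> tenured = new ArrayList<>();

    YoungGenSketch(int tenuringThreshold) { this.tenuringThreshold = tenuringThreshold; }

    /** New objects always start in eden. */
    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    /** Copies live objects out of eden and 'from', then swaps the survivor roles. */
    void minorCollect() {
        copyLive(eden);
        copyLive(from);
        eden.clear();                            // whatever was not copied is garbage
        from.clear();
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }

    private void copyLive(List<Obj> space) {
        for (Obj o : space) {
            if (!o.live) continue;               // dead objects are simply left behind
            o.age++;
            if (o.age >= tenuringThreshold) tenured.add(o);
            else to.add(o);
        }
    }

    public static void main(String[] args) {
        YoungGenSketch heap = new YoungGenSketch(2);
        heap.allocate("a");
        heap.allocate("b").live = false;         // b dies before the first collection
        heap.minorCollect();                     // a moves to a survivor space, b is dropped
        heap.allocate("c");
        heap.minorCollect();                     // a is promoted, c moves to the other survivor space
        System.out.println("survivors: " + heap.from.size() + ", tenured: " + heap.tenured.size());
    }
}
```

After each collection eden and one survivor space are empty, which is the point of the scheme: only live objects are ever touched, and the dead ones cost nothing to discard.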
Note: In such a generational collection, the collector does not visit the tenured-generation objects referenced by young-generation objects. There are ways to track inter-generational references, but they are out of scope for this article.
If a minor collection cannot reclaim a significant amount of memory, a major collection takes place. A major collection covers both generations, i.e. the young as well as the tenured.
Note: Calling System.gc() requests a major collection. Since major collections take longer to run, System.gc() should be used with great care because it can hurt performance.
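To observe what a System.gc() call actually does on a given JVM, the collection counters exposed by the standard GarbageCollectorMXBean interface can be read before and after the call. The sketch below is an illustration only; the class name GcCounts is made up, and the collector names and counts it prints depend on the JVM and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcCounts {
    /** Total number of collections reported so far by all collectors of this JVM. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 if the collector does not report it
            if (count > 0) total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc();                                // a request; the JVM may ignore it
        long after = totalCollections();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("collections during System.gc(): " + (after - before));
    }
}
```

Comparing the per-collector times in a long-running application gives a rough idea of how much more expensive the major collections are than the minor ones.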
There are a couple of GC strategies that you can choose from to improve the performance/throughput.
1. Serial collector: What we have discussed so far is the serial way of collecting, also known as ‘stop-the-world’ collection. As the name suggests, program execution is halted while this kind of collector runs. The advantage is that no new objects are allocated and object reachability does not change until the collection finishes. The disadvantage is that program execution is paused. This is the simplest collection strategy, and it is useful for applications that are not interactive.
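- -XX:NewRatio: ratio of the size of the old generation to the size of the young generation. For example, -XX:NewRatio=2 makes the old generation twice as large as the young generation, so the young generation gets one third of the heap.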
- -XX:NewSize: initial size of the young generation in bytes.
- -XX:SurvivorRatio: ratio of the size of eden to the size of one survivor space.
- -XX:GCTimeRatio: specifies a throughput goal. With -XX:GCTimeRatio=N, the goal is to spend no more than 1/(1+N) of the total time in garbage collection; for example, N=19 means at most 5% of the time in GC and 95% in program execution.
- -XX:MaxGCPauseMillis: specifies a maximum pause goal, i.e. the longest time, in milliseconds, that program execution should be halted for a collection. The JVM treats this as a target, not a guarantee.
- -XX:+DisableExplicitGC: if this option is enabled, calls to System.gc() do not trigger a garbage collection cycle.
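The ratio options are easier to follow with numbers. The sketch below works through the arithmetic for -XX:NewRatio, -XX:SurvivorRatio, and -XX:GCTimeRatio. GenSizing and its method names are made up for the example, and the results are nominal values: a real JVM aligns and may adapt the sizes, so the actual figures can differ.

```java
class GenSizing {
    /** -XX:NewRatio=N means old:young = N:1, so the young generation is heap / (N + 1). */
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    /** -XX:SurvivorRatio=N means eden:survivor = N:1; with two survivor spaces, each is young / (N + 2). */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Eden is what remains of the young generation after the two survivor spaces. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    /** -XX:GCTimeRatio=N sets the goal for time spent in GC to 1 / (1 + N) of the total. */
    static double gcTimeGoalPercent(int gcTimeRatio) {
        return 100.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 768 * mb;                       // example heap size
        long young = youngBytes(heap, 2);           // NewRatio=2      -> 256 MB
        long survivor = survivorBytes(young, 6);    // SurvivorRatio=6 -> 32 MB each
        long eden = edenBytes(young, 6);            //                 -> 192 MB
        System.out.println("young=" + young / mb + " MB, eden=" + eden / mb
                + " MB, survivor=" + survivor / mb + " MB (x2)");
        System.out.println("GC time goal with GCTimeRatio=19: " + gcTimeGoalPercent(19) + "%");
    }
}
```

With NewRatio=2 the young generation comes out at one third of the heap, which matches the rule of thumb given earlier.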