JVM Series - Garbage Collection in JVM#
Content from:
1. Determining Garbage Collection#
Types of References#
-
Strong Reference: As long as a strong reference exists, the garbage collector will never reclaim the referenced object. -
Soft Reference: When the JVM believes memory is insufficient, it will reclaim the object pointed to by the soft reference before throwing an OutOfMemoryError. -
Weak Reference: Whenever the JVM performs garbage collection, regardless of whether memory is sufficient, it will reclaim the object associated with the weak reference. -
Phantom Reference: Also known as ghost reference or phantom reference, it is the weakest type of reference. The existence of a phantom reference does not affect the lifetime of an object. It merely provides a mechanism to perform certain actions after the object has been finalized, such as typically used for a so-called post-mortem cleanup mechanism.
Determining Object Reachability#
If an object is not referenced by any other object or variable, it is considered an invalid object and needs to be reclaimed.
Reference Counting Method#
A counter is maintained in the object header; each time the object is referenced, the counter is incremented by 1; if the reference becomes invalid, the counter is decremented by 1. When the counter reaches 0, the object is considered invalid.
The implementation of the reference counting algorithm is simple, and its determination efficiency is high, making it a good algorithm in most cases. However, mainstream Java virtual machines do not use the reference counting algorithm to manage memory, mainly because it is difficult to solve the problem of circular references between objects:
For example, if object objA and objB both have a field instance, setting objA.instance = objB and objB.instance = objA means they reference each other, resulting in their reference counts not being 0, and thus the reference counting algorithm cannot notify the GC collector to reclaim them.
Reachability Analysis Method#
All objects directly or indirectly associated with GC Roots are valid objects, while objects not associated with GC Roots are invalid objects.
GC Roots refers to:
- Objects referenced in the Java Virtual Machine stack (local variable table in stack frames)
- Objects referenced in the native method stack
- Objects referenced by constants in the method area
- Objects referenced by static properties of classes in the method area
GC Roots do not include objects referenced by objects in the heap, thus avoiding the problem of circular references.
Reclaiming Invalid Objects in the Heap#
For objects that are unreachable in reachability analysis, it does not mean they cannot be alive.
Determining Whether finalize() Needs to Be Executed#
The JVM will determine whether it is necessary to execute the finalize() method for this object. If the object does not override the finalize() method or if the finalize() method has already been called by the virtual machine, it is considered "not necessary to execute." Thus, the object is essentially reclaimed.
If the object is determined to need to execute the finalize() method, it will be placed in an F-Queue queue, and the virtual machine will execute these finalize() methods with lower priority, but it will not ensure that all finalize() methods will complete execution. If the finalize() method has time-consuming operations, the virtual machine will directly stop pointing to that method and clear the object.
Object Resurrection or Death#
If during the execution of the finalize() method, this is assigned to a certain reference, then the object is resurrected. If not, it will be cleared by the garbage collector.
Note:
The finalize() method of any object will only be automatically called once by the system. If the object faces the next collection, its finalize() method will not be executed again, making any attempt to save itself in finalize() ineffective.
Reclaiming Memory in the Method Area#
The method area stores class information, constants, and static variables with longer lifetimes, and only a small amount of garbage is cleared during each garbage collection. The method area mainly clears two types of garbage:
- Abandoned constants
- Useless classes
Determining Abandoned Constants#
As long as the constants in the constant pool are not referenced by any variables or objects, these constants will be cleared. For example, if a string "bingo" enters the constant pool, but currently there is no String object referencing the "bingo" constant in the constant pool, and no other place references this literal, if necessary, the "bingo" constant will be cleaned out of the constant pool.
Determining Useless Classes#
Determining whether a class is a "useless class" has stringent conditions.
- All objects of the class have been cleared
- The ClassLoader that loaded the class has been reclaimed
- The java.lang.Class object of the class is not referenced anywhere, and the class's methods cannot be accessed through reflection anywhere.
When a class is loaded into the method area by the virtual machine, an object representing that class will exist in the heap. This object is created when the class is loaded into the method area and is cleared when the class is deleted from the method area.
Memory Leak#
In Java, if a long-lived object holds a reference to a short-lived object, it is very likely to cause a memory leak. For example, in the singleton pattern, we can often consider the lifecycle of this singleton object to be roughly the same as that of the entire program, making it a long-lived object. If this object holds references to other objects, a memory leak may occur.
For more detailed content, please refer to: Detailed Explanation of JAVA Memory Leak (Causes, Examples, and Solutions)
2. Garbage Collection Algorithms#
After learning how to determine invalid objects, useless classes, and abandoned constants, the remaining task is to reclaim this garbage. Common garbage collection algorithms include the following:
Mark-and-Sweep Algorithm#
Determine which data needs to be cleared, mark them, and then clear the marked data.
This method has two disadvantages:
- Efficiency issue: The efficiency of both marking and clearing processes is not high.
- Space issue: Mark-and-sweep will produce a large amount of discontinuous memory fragmentation, and too much fragmentation may lead to the inability to find enough contiguous memory for allocating larger objects in the future, necessitating an early trigger of another garbage collection action.
Copying Algorithm (Young Generation)#
To solve the efficiency problem, the "copying" collection algorithm was introduced. It divides the available memory into two equal-sized blocks, using only one block at a time. When this block of memory is used up and garbage collection is needed, the surviving objects are copied to the other block, and then the first block of memory is completely cleared. This algorithm has its pros and cons:
- Advantages: There will be no memory fragmentation issues.
- Disadvantages: Memory is reduced to half of the original, wasting space.
To solve the space utilization problem, memory can be divided into three blocks: Eden, From Survivor, and To Survivor, with a ratio of 8:1:1, using Eden and one of the Survivor blocks each time. During collection, the surviving objects in Eden and Survivor are copied all at once to the other Survivor space, and finally, Eden and the recently used Survivor space are cleaned up. This way, only 10% of the memory is wasted.
However, we cannot guarantee that no more than 10% of the objects will survive during each collection. When the Survivor space is insufficient, it needs to rely on other memory (referring to the old generation) for allocation guarantees.
Allocation Guarantee#
By clearing abandoned data in the old generation to expand the free space in the old generation, it provides guarantees for the young generation.
Before JDK 6 Update 24:
Before a Minor GC occurs, the virtual machine will first check whether the maximum available contiguous space in the old generation is greater than the total space of all objects in the young generation. If this condition is met, Minor GC can ensure safety; if not, the virtual machine will check whether the HandlePromotionFailure value is set to allow guarantee failures. If so, it will continue to check whether the maximum available contiguous space in the old generation is greater than the average size of objects promoted to the old generation in previous collections. If greater, it will attempt a Minor GC, even though this Minor GC is risky; if less, or if HandlePromotionFailure is set to not allow risks, then a Full GC will be performed.
After JDK 6 Update 24:
As long as the contiguous space in the old generation is greater than the total size of young generation objects or the average size of previous promotions, a Minor GC will be performed; otherwise, a Full GC will be performed.
This process is the allocation guarantee.
Mark-and-Compact Algorithm (Old Generation)#
Before reclaiming garbage, first mark the abandoned objects, then move the unmarked objects to one side, and finally clear the other side.
This is a garbage collection algorithm for the old generation. Objects in the old generation generally have longer lifetimes, so there will be a large number of surviving objects during each garbage collection. If the copying algorithm is used, a large number of surviving objects need to be copied each time, which is inefficient.
Generational Collection Algorithm#
Based on the different lifetimes of objects, memory is divided into several blocks. Generally, the Java heap is divided into young generation and old generation, and the most appropriate collection algorithm is used for each generation's characteristics.
- Young Generation: Copying Algorithm
- Old Generation: Mark-and-Sweep Algorithm, Mark-and-Compact Algorithm
3. Garbage Collectors#

Young Generation Garbage Collectors#
Serial Garbage Collector (Single-threaded)#
Only one GC thread is opened for garbage collection, and all user threads are stopped during the garbage collection process (Stop The World).
Generally, client applications require less memory, do not create too many objects, and the heap memory is not large, so the garbage collector's collection time is short. Even during this time, stopping all user threads does not feel significantly laggy. Therefore, the Serial garbage collector is suitable for client use.
Since the Serial collector only uses one GC thread, it avoids the overhead of thread switching, making it simple and efficient.
ParNew Garbage Collector (Multi-threaded)#
ParNew is the multi-threaded version of Serial. Multiple GC threads perform garbage cleanup in parallel. However, the cleanup process still requires Stop The World.
ParNew pursues "low pause time," and the only difference from Serial is that it uses multiple threads for garbage collection. In a multi-CPU environment, its performance is somewhat improved compared to Serial; however, thread switching incurs additional overhead, so it performs worse than Serial in a single CPU environment.
Parallel Scavenge Garbage Collector (Multi-threaded)#
Parallel Scavenge, like ParNew, is also a multi-threaded young generation garbage collector. However, there are significant differences between the two:
- Parallel Scavenge: Pursues CPU throughput and can complete specified tasks in a shorter time, making it suitable for non-interactive background calculations.
- ParNew: Aims to reduce user pause time, making it suitable for interactive applications.
Pursuing high throughput can be achieved by reducing the time spent on actual GC work; however, infrequent GC means that whenever GC runs, there will be a lot of work to do because the number of objects accumulated in the heap during this time is high. A single GC will take more time to complete, leading to higher pause times. Considering low pause times, it is better to run GC frequently to complete it more quickly, which in turn leads to reduced throughput.
- Use the parameter
-XX:GCTimeRatioto set the percentage of garbage collection time in total CPU time. - Use the parameter
-XX:MaxGCPauseMillisto set the maximum pause time during garbage processing. - Use the command
-XX:+UseAdaptiveSizePolicyto enable adaptive policy. We only need to set the heap size and MaxGCPauseMillis or GCTimeRatio, and the collector will automatically adjust the size of the young generation, the ratio of Eden and Survivor, and the age at which objects enter the old generation to maximize the proximity to our set MaxGCPauseMillis or GCTimeRatio.
Old Generation Garbage Collectors#
Serial Old Garbage Collector (Single-threaded)#
The Serial Old collector is the old generation version of Serial, both being single-threaded collectors that only enable one GC thread, suitable for client applications. Their only difference is that Serial Old works in the old generation using the "mark-compact" algorithm, while Serial works in the young generation using the "copying" algorithm.
Parallel Old Garbage Collector (Multi-threaded)#
The Parallel Old collector is the old generation version of Parallel Scavenge, pursuing CPU throughput.
CMS Garbage Collector#
The CMS (Concurrent Mark Sweep) collector is designed to achieve the shortest collection pause time (pursuing low pause). It allows user threads and GC threads to execute concurrently during garbage collection, so users do not feel significant lag during garbage collection.
The CMS collector implements a "mark-sweep" algorithm, and the entire process is divided into four steps:
Initial Mark: Stop The World, using only one initial marking thread to mark all objects directly associated with GC Roots.Concurrent Marking: Using multiple marking threads that execute concurrently with user threads. This process performs reachability analysis and marks all abandoned objects. It is slow.Remark: Stop The World, using multiple marking threads to execute concurrently, marking newly abandoned objects that appeared during the concurrent marking process.Concurrent Sweep: Using only one GC thread that executes concurrently with user threads to clear the marked objects. This process is very time-consuming.
The concurrent marking and concurrent sweeping processes take the longest time and can work alongside user threads, so overall, the memory reclamation process of the CMS collector is executed concurrently with user threads.
Disadvantages
- Sensitive to CPU resources, low throughput;
- Cannot handle floating garbage, leading to frequent Full GC;
- The recycling algorithm it uses - "mark-sweep" - can lead to a large amount of space fragmentation at the end of the collection.
G1 Garbage Collector#
G1 (Garbage-First) is a garbage collector aimed at server applications. It does not have the concept of young and old generations but divides the heap into independent Regions. The G1 collector maintains a priority list in the background, and during garbage collection, it first estimates the amount of garbage in each Region, then selects the Region with the highest value for collection based on the allowed collection time. This method of dividing memory space into Regions and prioritizing region collection ensures that the G1 collector can achieve the highest collection efficiency possible within a limited time (breaking memory into smaller pieces).
Overall, G1 is a collector based on the "mark-compact" algorithm, while locally (between two Regions), it is implemented based on the "copying" algorithm, meaning that no memory space fragmentation will occur during operation.
Each Region has a Remembered Set, which records the regions of all objects referenced in this area. During reachability analysis, simply adding the Remembered Set to GC Roots prevents the need to traverse the entire heap memory. Therefore, even if an object and the objects it references may not be in the same Region, when garbage collection occurs, it does not need to scan the entire heap memory to perform a complete reachability analysis.
If we do not account for the maintenance of the Remembered Set, the work process of the G1 collector is divided into the following steps:
Initial Mark: Stop The World, using only one initial marking thread to mark all objects directly associated with GC Roots.Concurrent Marking: Using one marking thread that executes concurrently with user threads. This process performs reachability analysis and is slow.Final Mark: Stop The World, using multiple marking threads to execute concurrently.Filtering Collection: Collecting abandoned objects, which also requires Stop The World and uses multiple filtering collection threads to execute concurrently.
Additional Extensions#
Situations that Trigger JVM to Perform Full GC#
Calling the System.gc() Method#
The invocation of this method is a suggestion to the JVM to perform Full GC; note that this is only a suggestion and not a certainty, but in many cases, it will trigger Full GC, thereby increasing the frequency of Full GC. Generally, we only need to let the virtual machine manage memory by itself, and we can disable the call to System.gc() using -XX:+DisableExplicitGC.
Insufficient Old Generation Space#
Insufficient old generation space will trigger Full GC operations, and if space remains insufficient after this operation, the following error will be thrown: java.lang.OutOfMemoryError: Java heap space.
Insufficient Permanent Generation Space#
In the JVM specification, the method area in the runtime data area, also known as the permanent generation in the HotSpot virtual machine, stores some class information, constants, static variables, etc. When there are many classes to be loaded, reflected classes, and invoked methods, the permanent generation may become full, triggering Full GC. If Full GC still cannot reclaim space, the JVM will throw the following error message: java.lang.OutOfMemoryError: PermGen space.
Others#
-
Promotion failed and concurrent mode failure during CMS GC
Promotion failed refers to the allocation guarantee failure mentioned above, while concurrent mode failure occurs when objects need to be placed into the old generation during the execution of CMS GC, but the old generation space is insufficient. -
The average size of Minor GC promotions to the old generation exceeds the remaining space in the old generation.