The Java Virtual Machine (JVM) is a crucial component of the Java ecosystem, responsible for loading Java bytecode and executing it on the host system, either by interpreting it or by compiling it to machine code. In this blog post, we'll explore the various components of the JVM, their roles in the execution of Java applications, and how to optimize garbage collection for improved application performance.
Components of the Java Virtual Machine
- Class Loader: The class loader is responsible for loading and verifying the bytecode of Java classes. It ensures that the classes are properly formatted and adhere to the class file format defined by the Java Virtual Machine Specification.
- Runtime Data Areas: These are memory spaces where the JVM stores class data, memory allocations, and instructions during the execution of a Java application.
- Execution Engine: The execution engine is the core component of the JVM that interprets and executes Java bytecode once it's loaded into the main memory.
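To make the class loader's role a little more concrete, here is a minimal sketch that asks two classes which loader defined them. The names ClassLoaderDemo and loaderName are illustrative, not part of any API. The sketch assumes the common behavior that getClassLoader() returns null for classes loaded by the bootstrap loader.

```java
class ClassLoaderDemo {
    // Returns a readable name for the loader that defined the given class.
    static String loaderName(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        // Assumption: null stands for the bootstrap loader, as it does on HotSpot.
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("String loaded by: " + loaderName(String.class));
        System.out.println("ClassLoaderDemo loaded by: " + loaderName(ClassLoaderDemo.class));
    }
}
```

Core library classes such as String typically report the bootstrap loader, while application classes report whichever loader the launcher used.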
Understanding Runtime Data Areas
The Heap: Stores objects (class instances and arrays) created at runtime. Managed by the Garbage Collector (GC). If the application needs more heap space than is available, an OutOfMemoryError is thrown.
The Stack: Stores primitive values and object references for each method invocation.
Each thread in an application possesses its own individual stack, which is composed of frames. Whenever a method is invoked, a new frame is added to the stack. Once the method execution concludes, the frame is subsequently removed.
The term "stack" is derived from the fact that only the top frame can be accessed, similar to how one can safely remove plates only from the top of a stack of plates. The top frame is referred to as the current frame because it is associated with the method being executed at that moment.
When an executing method calls another method, a new frame is placed atop the existing one. This newly added frame becomes the current frame, as the recently invoked method is now the one being executed.
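The frame-per-invocation behavior described above can be observed with a small sketch. It counts the frames in the current thread's stack trace before and after a few nested calls. StackDepthDemo and depthAfter are hypothetical names chosen for this example.

```java
class StackDepthDemo {
    // Recurses 'calls' times, then reports how many frames the current thread's stack trace holds.
    static int depthAfter(int calls) {
        if (calls == 0) {
            return Thread.currentThread().getStackTrace().length;
        }
        return depthAfter(calls - 1); // each nested call pushes one more frame
    }

    public static void main(String[] args) {
        int base = depthAfter(0);
        int nested = depthAfter(3);
        System.out.println("Extra frames after 3 nested calls: " + (nested - base));
    }
}
```

Each additional nested call should add exactly one frame, and those frames are gone again once the calls return.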
The Method Area (Metaspace): Stores the runtime representation of classes, such as method code, static variables, and runtime constant pools. HotSpot has implemented this area as Metaspace, in native memory, since Java 8.
The Program Counter (PC) Register: Each thread has its own PC register, which holds the address of the JVM instruction that thread is currently executing.
The Native Method Stack: Stores values for native code (e.g., C or C++).
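Some of these areas can be inspected from inside a running application. The following sketch lists the memory pools that the JVM exposes through the standard java.lang.management API, grouped into heap and non-heap. MemoryPoolsDemo and poolNames are illustrative names, and the pool names printed depend on the JVM and the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class MemoryPoolsDemo {
    // Names of the memory pools the running JVM reports for the given type.
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```

Thread stacks and PC registers are not reported as memory pools, so this view covers only the heap and the non-heap areas such as the method area.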
Primitives and Objects
- Primitives are stored on the stack when declared as local variables.
- Primitives are stored on the heap when declared as instance variables.
- Objects are stored on the heap, and their references are stored on the stack.
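A short sketch can illustrate the practical consequence of these rules. A local primitive is a private copy in the method's frame, while two references can point at the same heap object. StorageDemo, Counter, bumpLocal, and bumpField are hypothetical names, and the comments describe the conceptual model rather than what an optimizing JIT compiler might do.

```java
class StorageDemo {
    // 'value' is an instance variable, so it lives inside the Counter object on the heap.
    static class Counter {
        int value;
    }

    // 'n' is a local primitive: a copy in this method's frame, discarded when the method returns.
    static void bumpLocal(int n) {
        n++;
    }

    // 'c' is a copy of the reference; both copies point at the same heap object.
    static void bumpField(Counter c) {
        c.value++;
    }

    public static void main(String[] args) {
        int local = 5;
        bumpLocal(local);
        System.out.println("local after bumpLocal: " + local);

        Counter counter = new Counter();
        bumpField(counter);
        System.out.println("counter.value after bumpField: " + counter.value);
    }
}
```

The caller's primitive is unaffected by bumpLocal, whereas the change made through the reference in bumpField is visible to the caller.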
Garbage Collector (GC)
Garbage collection is a crucial part of managing memory in modern programming languages. To identify objects that are no longer being used by an application, the garbage collector starts from a set of roots (such as local variables on thread stacks and static fields) and traverses the object graph on the heap, following references from one object to another. This process lets the GC identify live objects, which are still in use by the application. Objects that are not marked as live during this traversal are considered unreachable and can be safely removed from memory. By automating memory management in this way, garbage collection reduces the likelihood of memory leaks and other common programming errors.
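The traversal described above can be sketched as a toy model. This is not how any real collector is implemented. Node stands in for a heap object, and mark follows references from a set of starting objects. All names here (ReachabilityDemo, Node, mark, liveNames) are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class ReachabilityDemo {
    // Toy stand-in for a heap object: a name plus outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Marks every node reachable from the starting nodes by following references.
    static Set<Node> mark(Collection<Node> roots) {
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (live.add(node)) {          // first visit: mark it and queue its references
                pending.addAll(node.refs);
            }
        }
        return live;
    }

    // Sorted names of the live nodes, for easy printing.
    static List<String> liveNames(Collection<Node> roots) {
        List<String> names = new ArrayList<>();
        for (Node node : mark(roots)) {
            names.add(node.name);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);   // a -> b is reachable from the starting node a
        c.refs.add(d);   // c and d reference each other but nothing live points at them
        d.refs.add(c);
        System.out.println("Live: " + liveNames(Collections.singletonList(a)));
    }
}
```

Note that c and d form a cycle yet are still unreachable. Tracing from a starting set handles this case naturally, because only objects visited during the traversal count as live.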
Heap Spaces: Divided into Young Generation and Old Generation spaces.
Young Generation Space: Consists of Eden Space and Survivor Space (S0 and S1).
Old Generation Space (Tenured Space): Contains longer-lived objects.
Minor Garbage Collection (GC)
Minor garbage collection (GC) involves cleaning the young generation area within the Java heap memory, which consists of the Eden space and two survivor spaces (S0 and S1). The majority of new objects are allocated in the Eden space, and when it becomes full, a minor garbage collection is initiated.
Here's a simplified scenario:
- New objects are created by your application and allocated in the Eden space.
- As object creation continues, the Eden space begins to fill.
- When the Eden space reaches capacity, a minor GC is triggered.
- Using a "mark and copy" technique, the garbage collector identifies live objects and relocates them to one of the survivor spaces (S0).
- The Eden space is subsequently cleared, providing room for new object allocation.
- During the next minor GC, live objects from both Eden and the populated survivor space (S0) are moved to the other survivor space (S1). Objects that persist through several minor GC cycles are eventually promoted to the old generation space.
In this scenario, minor GC efficiently cleans the young generation area, where objects tend to have shorter lifetimes. This is based on the generational hypothesis, which posits that most objects have brief lifespans. By focusing on the young generation, the garbage collector minimizes the impact on application performance.
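The scenario above can be mimicked with a deliberately simplified model. The sketch below represents Eden and the two survivor spaces as lists of object names and takes the set of live objects as an input instead of computing it. SurvivorFlipDemo and its fields "from" and "to" (the populated and empty survivor space, respectively) are hypothetical names. Real collectors work on raw memory, not lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SurvivorFlipDemo {
    final List<String> eden = new ArrayList<>();
    List<String> from = new ArrayList<>(); // survivor space that currently holds objects
    List<String> to = new ArrayList<>();   // survivor space that is currently empty

    void allocate(String obj) {
        eden.add(obj);
    }

    // Copies live objects from Eden and 'from' into 'to', clears the rest, then swaps roles.
    void minorGc(Set<String> live) {
        for (String obj : eden) {
            if (live.contains(obj)) to.add(obj);
        }
        for (String obj : from) {
            if (live.contains(obj)) to.add(obj);
        }
        eden.clear();
        from.clear();
        List<String> swap = from;
        from = to;
        to = swap;
    }

    public static void main(String[] args) {
        SurvivorFlipDemo heap = new SurvivorFlipDemo();
        heap.allocate("a");
        heap.allocate("b");
        heap.minorGc(new HashSet<>(Arrays.asList("a")));      // b is unreachable
        System.out.println("After GC 1, survivors: " + heap.from);
        heap.allocate("c");
        heap.minorGc(new HashSet<>(Arrays.asList("a", "c")));
        System.out.println("After GC 2, survivors: " + heap.from);
    }
}
```

After every collection one survivor space is empty again, which is what lets the next cycle copy into a clean target.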
To determine when an object should be promoted from the young to the old generation, the JVM uses a configurable value called the "tenuring threshold." It specifies how many minor garbage collection cycles an object must survive before it is promoted.
During a minor GC, live objects from the Eden space and the occupied survivor space are transferred to the other survivor space (e.g., from S0 to S1 or vice versa). Each time an object is copied to a survivor space, its age increases. If an object's age meets or surpasses the tenuring threshold, it advances to the old generation.
Object promotion after multiple minor GC cycles is an optimization based on the generational hypothesis. As older objects are more likely to have extended lifetimes, moving them to the old generation, which is collected less frequently than the young generation, is sensible. This reduces the overhead of repeatedly copying long-lived objects between survivor spaces and enhances the overall performance of the garbage collector.
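The aging and promotion rule can be sketched in a few lines. This is a toy model under simplifying assumptions: the young generation is a single list, liveness is supplied by the caller, and an object is promoted as soon as its age reaches the threshold. TenuringDemo and its members are hypothetical names, and actual JVM behavior differs in detail.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class TenuringDemo {
    static final class Obj {
        final String name;
        int age; // number of minor GCs survived so far
        Obj(String name) { this.name = name; }
    }

    final int threshold;
    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    TenuringDemo(int threshold) {
        this.threshold = threshold;
    }

    void allocate(String name) {
        young.add(new Obj(name));
    }

    // One minor GC: drop dead objects, age the survivors, promote those that reach the threshold.
    void minorGc(Set<String> liveNames) {
        List<Obj> survivors = new ArrayList<>();
        for (Obj obj : young) {
            if (!liveNames.contains(obj.name)) continue; // unreachable, so reclaimed
            obj.age++;
            if (obj.age >= threshold) old.add(obj);
            else survivors.add(obj);
        }
        young.clear();
        young.addAll(survivors);
    }

    static List<String> names(List<Obj> objects) {
        List<String> result = new ArrayList<>();
        for (Obj obj : objects) result.add(obj.name);
        return result;
    }

    public static void main(String[] args) {
        TenuringDemo heap = new TenuringDemo(2);
        heap.allocate("cache");
        heap.allocate("temp");
        for (int cycle = 1; cycle <= 2; cycle++) {
            heap.minorGc(Collections.singleton("cache"));
            System.out.println("GC " + cycle + ": young=" + names(heap.young) + " old=" + names(heap.old));
        }
    }
}
```

With a threshold of 2, the short-lived object disappears in the first cycle and the long-lived one moves to the old generation in the second, so it is no longer copied on later minor collections.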
The tenuring threshold can be configured using JVM options such as -XX:InitialTenuringThreshold and -XX:MaxTenuringThreshold. By adjusting the tenuring threshold, you can fine-tune the garbage collection behavior to accommodate your application's unique requirements and object allocation patterns.
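To check which values are actually in effect, one option is to read the flags from inside the running JVM. The sketch below assumes a HotSpot-based JVM, since it relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean. On other JVMs it may not work. TenuringFlagsDemo and flag are illustrative names.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class TenuringFlagsDemo {
    // Reads the current value of a HotSpot VM option by name (HotSpot-only assumption).
    static String flag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("InitialTenuringThreshold = " + flag("InitialTenuringThreshold"));
        System.out.println("MaxTenuringThreshold     = " + flag("MaxTenuringThreshold"));
    }
}
```

Starting the program with an option such as -XX:MaxTenuringThreshold=5 (an arbitrary example value) and comparing the output against a run without it is a quick way to confirm that a setting was picked up.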
Major Garbage Collection (GC)
Strictly speaking, major garbage collection (GC) cleans the old generation space, while a "full GC" cleans the entire Java heap, which includes both the young generation (Eden, S0, and S1 spaces) and the old generation. In practice the two terms are often used interchangeably, and this post uses "major GC" in the broader sense of a collection that examines the entire heap rather than just a portion of it.
Let's consider a basic scenario for major GC:
- Your application creates new objects, initially allocating them in the Eden space within the young generation.
- As minor GC cycles occur, objects that survive multiple cycles are promoted to the old generation.
- Over time, the old generation space starts filling up with long-lived objects.
- When the old generation reaches a certain occupancy threshold, a major GC is triggered.
- The garbage collector identifies live objects in both the young and old generations using a technique called "mark and sweep."
- Dead objects are removed, and memory is reclaimed. Live objects in the old generation may be compacted to reduce memory fragmentation.
- After the major GC is complete, the memory in the old generation is freed up, allowing for more promotions from the young generation.
Major GC cycles are more time-consuming and resource-intensive than minor GC cycles because they involve scanning and cleaning the entire heap, including long-lived objects in the old generation. A major GC can cause longer "stop-the-world" pauses, during which the JVM halts application threads to perform the garbage collection. This can lead to noticeable performance degradation, especially in applications with strict latency requirements.
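To see how much collection work a running application has done, the standard java.lang.management API exposes one bean per collector with a collection count and an approximate accumulated time. The sketch below prints them. GcStatsDemo and totalCollections are illustrative names, and the collector names (and which of them cover young or old generation work) depend on the JVM and the selected collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStatsDemo {
    // Sums the collection counts of all collectors; a count of -1 means "undefined" and is skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, about %d ms in total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("Total collections so far: " + totalCollections());
    }
}
```

Sampling these numbers periodically gives a rough picture of how often collections run and how much time they consume, which is a reasonable first step before turning to GC logs.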
Java's Garbage Collectors
The JVM provides several garbage collectors, each with different trade-offs in throughput, latency, and memory usage. These include Serial, Parallel, CMS, G1, ZGC, and Shenandoah GC. JMX monitoring can be used to track GC performance, and GC tuning can optimize application performance by adjusting heap size, tenuring threshold, and survivor space ratio. GC logs and monitoring tools like VisualVM and JConsole can help identify bottlenecks and optimization areas. The java.lang.ref package provides reference types (SoftReference, WeakReference, and PhantomReference) for more granular control over object lifetimes and garbage collection behavior.
- Serial GC: A simple, single-threaded garbage collector suitable for small applications or environments with limited resources.
- Parallel GC: A multi-threaded garbage collector that improves throughput by using multiple threads for young generation collections and, in current JDK versions, for old generation collections as well.
- Concurrent Mark and Sweep (CMS) GC: A low-latency garbage collector that performs most of its work concurrently with the application threads, minimizing "stop-the-world" pauses. CMS was deprecated in JDK 9 and removed in JDK 14.
- G1 GC (Garbage-First Collector): A modern, low-latency garbage collector that divides the heap into smaller regions and prioritizes collecting regions with the most garbage to meet specified pause time goals.
- Z Garbage Collector (ZGC): A low-latency, concurrent garbage collector designed for systems with large amounts of memory that uses a combination of concurrent marking, relocation, and remapping of memory pages to minimize GC pause times while maintaining high throughput.
- Shenandoah Garbage Collector: A low-latency garbage collector, introduced as an experimental feature in JDK 12, that uses concurrent marking and compaction to perform most of its work concurrently with the application threads.
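Whichever collector is selected, the reference types from java.lang.ref mentioned earlier behave the same way at the API level. The following sketch shows a WeakReference. The referent is guaranteed to stay visible while a strong reference exists, but once that reference is dropped it may be cleared at the collector's discretion. WeakReferenceDemo and visibleWhileStronglyHeld are hypothetical names, and System.gc() is only a request, so the second line of output is deliberately not predicted here.

```java
import java.lang.ref.WeakReference;

class WeakReferenceDemo {
    // While a strong reference exists, the weak reference must still return the object.
    static boolean visibleWhileStronglyHeld() {
        Object payload = new Object();
        WeakReference<Object> ref = new WeakReference<>(payload);
        return ref.get() == payload;
    }

    public static void main(String[] args) {
        Object payload = new Object();
        WeakReference<Object> ref = new WeakReference<>(payload);
        System.out.println("Visible while strongly held: " + (ref.get() == payload));

        payload = null; // drop the only strong reference
        System.gc();    // a request only; the JVM may ignore it
        System.out.println(ref.get() == null ? "Referent was cleared" : "Referent not cleared yet");
    }
}
```

Code that uses weak or soft references should therefore always handle a null result from get() rather than rely on when a collection happens.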