JVM Series - Class Loader Mechanism#
I. Brief Introduction#
The virtual machine loads class data from the class file into memory, verifies, transforms, parses, and initializes the data, ultimately forming a Java type that can be directly used by the virtual machine. This is the class loading mechanism of the virtual machine.
II. Class Loading Process and Lifecycle#
The class loading process is divided into three steps (five phases): Loading -> Linking (Verification, Preparation, Resolution) -> Initialization.
Loading#
Description of the loading process:
- Locate the .class file using the fully qualified name of the class and obtain its binary byte stream.
- Convert the static storage structure represented by the byte stream into a runtime data structure in the method area.
- Generate a java.lang.Class object for this class in the Java heap, serving as the access entry for these data in the method area.
Linking#
Linking: includes three steps: verification, preparation, and resolution.
Verification#
Verification is the first step of the linking phase, used to ensure that the information in the Class byte stream meets the requirements of the virtual machine.
Specific forms of verification:
- File format verification: Verifies whether the byte stream conforms to the Class file format specification. For example: whether it starts with the magic number 0xCAFEBABE, whether the major and minor version numbers are within the range the current virtual machine can handle, and whether the constant pool contains constants of unsupported types.
- Metadata verification: Performs semantic analysis on the information described by the bytecode (note: compare with the semantic analysis during the javac compilation phase) to ensure that it meets the requirements of the Java language specification. For example: whether this class has a superclass (every class except java.lang.Object must have one).
- Bytecode verification: Uses data flow and control flow analysis to determine that the program semantics are legal and logical.
- Symbolic reference verification: Ensures that the resolution action can be executed correctly.
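The first checks of file format verification can be reproduced by hand. The following is a minimal sketch, not the virtual machine's actual verifier: it reads the first ten bytes of a class file and exposes the magic number, the version numbers, and the constant pool count. The class name ClassFileHeader and its methods are hypothetical names chosen for this illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only.
class ClassFileHeader {
    final int magic;             // u4, expected to be 0xCAFEBABE
    final int minorVersion;      // u2
    final int majorVersion;      // u2, for example 52 for Java 8
    final int constantPoolCount; // u2, number of constant pool entries plus one

    ClassFileHeader(int magic, int minorVersion, int majorVersion, int constantPoolCount) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Reads the fixed-size header of the class file that defines {@code type}. */
    static ClassFileHeader read(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int count = in.readUnsignedShort();
            return new ClassFileHeader(magic, minor, major, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean hasValidMagic() {
        return magic == 0xCAFEBABE;
    }

    public static void main(String[] args) {
        ClassFileHeader h = read(String.class);
        System.out.printf("magic=0x%08X major=%d minor=%d constant_pool_count=%d%n",
                h.magic, h.majorVersion, h.minorVersion, h.constantPoolCount);
    }
}
```

The version numbers printed depend on the JDK that runs the example, so no fixed output is shown here.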
Preparation#
Allocate memory for the class's static variables and initialize them to default values. The preparation process typically allocates a structure to store class information, which includes the member variables, methods, and interface information defined in the class.
Specific behaviors:
- Memory is allocated at this point only for class variables (static), not for instance variables, which are allocated in the Java heap along with the object during instantiation.
- The initial values set here are usually the default zero values of the data types (such as 0, 0L, null, false), rather than the values explicitly assigned in Java code. The exception is constants (static final fields with a compile-time constant value), which receive their explicitly assigned value during preparation.
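Preparation itself cannot be observed from Java code, but its effect can: a static field that is read before its own initializer has run still holds the zero value. The sketch below relies on static initializers running in textual order; PreparationDemo and its members are hypothetical names used only for this illustration.

```java
// Hypothetical demo class for illustration only.
class PreparationDemo {
    // These two initializers run first, in textual order, while 'value'
    // still holds the zero default assigned during preparation.
    static int seenBeforeAssignment = peekValue();
    static int seenConstant = peekConstant();

    static int value = 123;          // assigned during initialization
    static final int CONSTANT = 456; // compile-time constant

    static int peekValue() {
        return value;
    }

    static int peekConstant() {
        return CONSTANT;
    }

    public static void main(String[] args) {
        System.out.println(seenBeforeAssignment); // prints 0
        System.out.println(seenConstant);         // prints 456
        System.out.println(value);                // prints 123
    }
}
```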
Resolution#
Resolution: Converts the symbolic references in the constant pool into direct references.
Symbolic References: Symbolic references describe the referenced target with a set of symbols, which can be any form of literal as long as it can uniquely locate the target. Symbolic references are independent of memory layout, so the referenced object does not necessarily need to be loaded into memory. Various virtual machine implementations may have different memory layouts, but the accepted symbolic references must be consistent, as the literal form of symbolic references is clearly defined in the Class file format.
Direct References: A direct reference is a pointer straight to the target, a relative offset, or a handle that can indirectly locate the target. Direct references depend on the memory layout of the virtual machine, so the same symbolic reference generally does not translate to the same direct reference on different virtual machine instances. If a direct reference exists, its target must already be loaded in memory.
Types of constants in the constant pool:
- The number of constants in the constant pool is not fixed, so a u2 type unsigned number is placed at the beginning of the constant pool to store the current capacity of the constant pool.
- Each constant in the constant pool is a table, with the first position of the table being a u1 type flag (tag) that represents the type of the current constant.
Type | Tag | Description |
---|---|---|
CONSTANT_Utf8_info | 1 | UTF-8 encoded string |
CONSTANT_Integer_info | 3 | Integer literal |
CONSTANT_Float_info | 4 | Float literal |
CONSTANT_Long_info | 5 | Long literal |
CONSTANT_Double_info | 6 | Double literal |
CONSTANT_Class_info | 7 | Symbolic reference to a class or interface |
CONSTANT_String_info | 8 | String literal |
CONSTANT_Fieldref_info | 9 | Symbolic reference to a field |
CONSTANT_Methodref_info | 10 | Symbolic reference to a method in a class |
CONSTANT_InterfaceMethodref_info | 11 | Symbolic reference to a method in an interface |
CONSTANT_NameAndType_info | 12 | Symbolic reference to a field or method |
CONSTANT_MethodHandle_info | 15 | Represents a method handle |
CONSTANT_MethodType_info | 16 | Identifies method type |
CONSTANT_InvokeDynamic_info | 18 | Represents a dynamic method call site |
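To make the layout concrete, here is a rough sketch that walks the constant pool of a class file and counts how many entries carry each tag. It assumes the entry sizes defined by the Class file format; besides the tags listed in the table it also skips tags 17, 19, and 20, which newer class file versions define. ConstantPoolTags is a hypothetical name for this illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
class ConstantPoolTags {
    /** Returns a map from tag value to the number of entries with that tag. */
    static Map<Integer, Integer> count(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw);
            in.readInt();                        // magic (u4)
            in.readUnsignedShort();              // minor version (u2)
            in.readUnsignedShort();              // major version (u2)
            int count = in.readUnsignedShort();  // constant pool capacity (u2)
            Map<Integer, Integer> tags = new TreeMap<>();
            // Valid indexes run from 1 to count - 1.
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte(); // u1 tag at the start of each entry
                tags.merge(tag, 1, Integer::sum);
                switch (tag) {
                    case 1: skip(in, in.readUnsignedShort()); break; // Utf8: u2 length + bytes
                    case 3: case 4: skip(in, 4); break;              // Integer, Float
                    case 5: case 6: skip(in, 8); i++; break;         // Long, Double use two slots
                    case 7: case 8: case 16: case 19: case 20: skip(in, 2); break;
                    case 9: case 10: case 11: case 12: case 17: case 18: skip(in, 4); break;
                    case 15: skip(in, 3); break;                     // MethodHandle
                    default: throw new IllegalStateException("unknown tag " + tag);
                }
            }
            return tags;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]); // readFully guarantees that exactly n bytes are consumed
    }

    public static void main(String[] args) {
        System.out.println(count(String.class));
    }
}
```

The exact counts vary between JDK builds, so the output is not reproduced here.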
The resolution action mainly targets seven types of symbolic references: class or interface, field, class method, interface method, method type, method handle, and call site qualifier.
Initialization#
Initialization: Assigns correct initial values to class static variables.
Goals of Initialization#
- To implement the initialization of initial values specified when declaring class static variables;
- To implement the initialization of initial values set using static code blocks.
Steps of Initialization#
- If this class has not been loaded or linked, first load and link this class;
- If the direct superclass of this class has not been initialized, first initialize its direct superclass;
- If there are initialization statements in the class, execute the initialization statements in order.
Timing of Initialization#
The most common trigger is one of the four bytecode instructions new, getstatic, putstatic, or invokestatic. In Java code, these instructions are most commonly generated when:
- Creating a new object
- Setting or getting a class's static field (excluding static fields marked as final that are placed in the constant pool)
- Calling a static method of a class
Order of Initialization for Parent and Child Classes in Java#
- Static member variables and static code blocks in the parent class
- Static member variables and static code blocks in the child class
- Ordinary member variables and code blocks in the parent class, and the constructor of the parent class
- Ordinary member variables and code blocks in the child class, and the constructor of the child class
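The order above can be checked with a small experiment. This is only a sketch: InitOrderDemo, Parent, and Child are hypothetical names, and each block simply records a message when it runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for illustration only.
class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("parent static block"); }
        { LOG.add("parent instance block"); }
        Parent() { LOG.add("parent constructor"); }
    }

    static class Child extends Parent {
        static { LOG.add("child static block"); }
        { LOG.add("child instance block"); }
        Child() { LOG.add("child constructor"); }
    }

    /** Creates one Child and returns the events recorded while doing so. */
    static List<String> run() {
        LOG.clear();
        new Child();
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        // On the first call this prints, one per line:
        // parent static block, child static block,
        // parent instance block, parent constructor,
        // child instance block, child constructor
        run().forEach(System.out::println);
    }
}
```

Static blocks run only once per class, so a second call to run() in the same JVM records just the four instance-level events.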
Active and Passive References to Classes#
In the Java Virtual Machine specification, it is strictly stipulated that only active references to a class will trigger its initialization method. Other forms of references are called passive references, which do not trigger the class's initialization method.
Active Reference
Active reference: A reference that triggers initialization of the class, such as creating an instance, reading or writing a non-constant static field, or calling a static method.
Passive Reference
All reference situations other than active references are called passive references. They do not trigger initialization; at most, loading and linking are performed.
Forms of passive references include:
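- Referencing a static field of a parent class through a subclass initializes only the parent class; it does not cause the subclass to initialize;
- Defining an array of a class type (for example, an expression of the form new T[10]) does not trigger the initialization of the element class;
- Accessing compile-time constants (static final fields) defined in a class does not trigger the initialization of that class, because the constant value is copied into the constant pool of the referencing class at compile time.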
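The three passive forms can be tried out in one place. In this sketch (all class names are hypothetical), every static block records a message, so the log shows which classes were actually initialized.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for illustration only.
class PassiveRefDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Base {
        static int counter = 1;
        static { LOG.add("Base initialized"); }
    }

    static class Derived extends Base {
        static { LOG.add("Derived initialized"); }
    }

    static class Element {
        static { LOG.add("Element initialized"); }
    }

    static class Consts {
        static final String GREETING = "hello"; // compile-time constant
        static { LOG.add("Consts initialized"); }
    }

    static List<String> run() {
        int c = Derived.counter;           // field is declared in Base: only Base initializes
        Element[] array = new Element[2];  // array creation does not initialize Element
        String g = Consts.GREETING;        // value was inlined at compile time
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [Base initialized]
    }
}
```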
III. Three Types of Class Loaders#
- The Bootstrap ClassLoader is initialized after the Java virtual machine starts.
- The Bootstrap ClassLoader is responsible for loading the ExtClassLoader, whose parent loader is the Bootstrap ClassLoader (represented as null in Java code, because the Bootstrap ClassLoader is not a Java object).
- After the Bootstrap ClassLoader loads the ExtClassLoader, it loads the AppClassLoader and sets the parent loader of the AppClassLoader to the ExtClassLoader.
Bootstrap ClassLoader#
Bootstrap ClassLoader: Responsible for loading the libraries that the virtual machine recognizes (such as rt.jar) from JDK\jre\lib (JDK represents the installation directory of the JDK, the same below) or from paths specified by the -Xbootclasspath parameter. All classes starting with java. are loaded by the Bootstrap ClassLoader. The Bootstrap ClassLoader cannot be directly referenced by Java programs.
Extension ClassLoader#
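Extension ClassLoader: This loader is implemented by sun.misc.Launcher$ExtClassLoader and is responsible for loading all libraries in the JDK\jre\lib\ext directory or in paths specified by the java.ext.dirs system property (optional extension libraries; note that most javax. classes shipped with the JDK are in rt.jar and are loaded by the Bootstrap ClassLoader). Developers can directly use the extension class loader. This layout applies to JDK 8 and earlier; since JDK 9 the extension class loader has been replaced by the platform class loader.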
Application ClassLoader#
Application ClassLoader: This class loader is implemented by sun.misc.Launcher$AppClassLoader and is responsible for loading classes on the user class path (classes under the program's own classpath). Developers can directly use this class loader. If the application does not define its own class loader, this is generally the default class loader in the program.
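The hierarchy can be inspected at runtime by following getParent() until it returns null, which stands for the Bootstrap ClassLoader. A minimal sketch (LoaderChainDemo is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for illustration only.
class LoaderChainDemo {
    /** Lists the loader of {@code type} and all of its parents, ending at bootstrap. */
    static List<String> chain(Class<?> type) {
        List<String> names = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            names.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        names.add("bootstrap (null)"); // the bootstrap loader has no Java object
        return names;
    }

    public static void main(String[] args) {
        System.out.println(chain(LoaderChainDemo.class)); // application class
        System.out.println(chain(String.class));          // core class
    }
}
```

The loader class names in the first line depend on the JDK version: on JDK 8 they should be the sun.misc.Launcher classes named above, while newer JDKs report different internal classes. For String.class the list contains only the bootstrap entry.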
Class Loader Isolation Issues#
Each class loader has its own namespace to store loaded classes. When a class loader loads a class, it searches for the class using the fully qualified name stored in its namespace to check whether the class has already been loaded.
A class is uniquely identified in the JVM and Dalvik by ClassLoader id + PackageName + ClassName, so two classes with the same package name and class name can exist in a running program. Moreover, if these two classes are not loaded by the same ClassLoader, an instance of one cannot be cast to the other. This is the isolation of ClassLoaders.
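Isolation can be demonstrated by defining the same bytes twice with two loader instances. The sketch below makes some assumptions: the class file of the nested Payload class must be readable as a resource (true for a normal compile-and-run setup), and IsolationDemo, Payload, and BytesLoader are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class for illustration only.
class IsolationDemo {
    static class Payload { }

    /** Loader whose only parent is bootstrap; it defines one class from raw bytes. */
    static class BytesLoader extends ClassLoader {
        private final String name;
        private final byte[] bytes;

        BytesLoader(String name, byte[] bytes) {
            super(null); // null parent: application classes are not visible through delegation
            this.name = name;
            this.bytes = bytes;
        }

        @Override
        protected Class<?> findClass(String requested) throws ClassNotFoundException {
            if (!requested.equals(name)) {
                throw new ClassNotFoundException(requested);
            }
            return defineClass(requested, bytes, 0, bytes.length);
        }
    }

    static byte[] classBytes(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns a fresh copy of {@code type}, defined by a brand-new loader. */
    static Class<?> loadIsolated(Class<?> type) {
        try {
            return new BytesLoader(type.getName(), classBytes(type)).loadClass(type.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> a = loadIsolated(Payload.class);
        Class<?> b = loadIsolated(Payload.class);
        System.out.println(a.getName().equals(b.getName())); // true: same fully qualified name
        System.out.println(a == b);                          // false: different defining loaders
        System.out.println(a.isAssignableFrom(b));           // false: the types are unrelated
    }
}
```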
To solve the isolation problem of class loaders, the JVM introduces the parent delegation mechanism.
IV. Parent Delegation Model#
Core idea: First, check from the bottom up whether the class has already been loaded; second, try to load the class from the top down.
The workflow of the parent delegation model is: when a class loader receives a class loading request, it does not attempt to load the class itself at first, but delegates the request to its parent loader, and so on up the hierarchy. All class loading requests therefore eventually reach the top-level Bootstrap ClassLoader. Only when a parent loader cannot find the required class within its search scope does the child loader attempt to load the class itself.
Specific Loading Process#
- When the AppClassLoader is asked to load a class, it does not attempt to load the class itself at first, but delegates the request to its parent class loader, the ExtClassLoader.
- The ExtClassLoader likewise does not attempt to load the class itself at first, but delegates the request to the Bootstrap ClassLoader.
- If the Bootstrap ClassLoader fails to load the class (for example, because it is not found in %JAVA_HOME%/jre/lib), the ExtClassLoader attempts to load it;
- If the ExtClassLoader also fails, the AppClassLoader attempts to load it, and if the AppClassLoader fails as well, a ClassNotFoundException is thrown.
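The steps above correspond to the logic of ClassLoader.loadClass. The following is a simplified sketch of that logic, not the JDK source: it overrides loadClass in a hypothetical DelegationSketch loader and records each delegation step. It assumes a non-null parent; the real implementation asks the bootstrap loader internally when the parent is null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical loader for illustration only.
class DelegationSketch extends ClassLoader {
    final List<String> trace = new ArrayList<>();

    DelegationSketch(ClassLoader parent) {
        super(Objects.requireNonNull(parent, "this sketch needs a non-null parent"));
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // 1. Has this loader already loaded the class?
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // 2. Delegate upward before trying anything ourselves.
                    trace.add("delegate " + name + " to parent");
                    c = getParent().loadClass(name);
                } catch (ClassNotFoundException e) {
                    // 3. The parents could not find it: try our own search scope.
                    trace.add("parent failed, findClass " + name);
                    c = findClass(name); // default implementation throws ClassNotFoundException
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    /** Convenience wrapper that returns null instead of throwing. */
    static Class<?> load(DelegationSketch loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        DelegationSketch loader = new DelegationSketch(ClassLoader.getSystemClassLoader());
        System.out.println(load(loader, "java.lang.String") == String.class); // true
        System.out.println(load(loader, "no.such.Type"));                      // null
        loader.trace.forEach(System.out::println);
    }
}
```

A custom loader that wants to keep parent delegation intact normally overrides only findClass and leaves loadClass alone; loadClass is overridden here purely to make the steps visible.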
Significance of the Parent Delegation Model#
- Prevents multiple copies of the same bytecode from appearing in memory, creating a hierarchical classification of classes.
- Ensures that Java programs run safely and stably.
Take java.lang.Object as an example: when you load it, the request is delegated layer by layer until the Bootstrap ClassLoader looks for java.lang.Object in rt.jar under <JAVA_HOME>\lib and loads it into the JVM. Thus, if someone maliciously creates their own java.lang.Object containing bad code, the parent delegation model ensures that only the version in rt.jar is loaded into the JVM, thereby protecting the core foundational classes.
Expansion
Why does Java SPI break the parent delegation model?
V. Class Loading Methods#
- Initialized and loaded by the JVM when starting the application from the command line.
- Dynamically loaded via the Class.forName() method.
- Dynamically loaded via the ClassLoader.loadClass() method.
Class.forName() and ClassLoader.loadClass()
- Class.forName(): Loads the .class file into the JVM and also initializes the class, so its static code blocks are executed. The three-argument overload Class.forName(name, initialize, loader) lets the caller decide whether the class is initialized.
- ClassLoader.loadClass(): Only loads the .class file into the JVM without executing the static code blocks; they run only when the class is first initialized, for example when newInstance is called.
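The difference is easy to observe with a class whose static block leaves a trace. A small sketch (ForNameDemo and Lazy are hypothetical names; the nested class is addressed through its binary name, which uses a $ separator):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for illustration only.
class ForNameDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Lazy {
        static { LOG.add("Lazy initialized"); }
    }

    /** Loads Lazy in three ways and records the log after each step. */
    static List<String> run() {
        String name = ForNameDemo.class.getName() + "$Lazy";
        ClassLoader loader = ForNameDemo.class.getClassLoader();
        List<String> steps = new ArrayList<>();
        try {
            loader.loadClass(name);             // load only
            steps.add("after loadClass: " + LOG);
            Class.forName(name, false, loader); // initialize = false
            steps.add("after forName(initialize=false): " + LOG);
            Class.forName(name);                // load, link, and initialize
            steps.add("after forName: " + LOG);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return steps;
    }

    public static void main(String[] args) {
        // prints:
        // after loadClass: []
        // after forName(initialize=false): []
        // after forName: [Lazy initialized]
        run().forEach(System.out::println);
    }
}
```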
VI. Custom Loaders#
Applications are loaded by the cooperation of these three types of class loaders, and if necessary, we can also add custom class loaders. Since the built-in ClassLoader of the JVM only knows how to load standard Java class files from the local file system, writing your own ClassLoader allows you to achieve the following:
- Automatically verify digital signatures before executing untrusted code.
- Dynamically create customized classes that meet specific user needs.
- Retrieve Java classes from specific locations, such as from a database or over the network.
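As a rough illustration of the first and third points, the sketch below overrides findClass to fetch bytes from an arbitrary source and to reject them unless their SHA-256 digest matches a trusted value. A plain digest check is only a stand-in for real digital signature verification, and VerifyingLoader, its source function, and its digest map are all hypothetical names. The demo loads a second copy of the loader's own class file, which assumes that file is readable as a resource.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.Map;
import java.util.function.Function;

// Hypothetical loader for illustration only.
class VerifyingLoader extends ClassLoader {
    private final Function<String, byte[]> source;   // stand-in for a database or network fetch
    private final Map<String, String> trustedSha256; // class name -> expected hex digest

    VerifyingLoader(Function<String, byte[]> source, Map<String, String> trustedSha256) {
        super(null); // bootstrap as parent, so application classes reach findClass
        this.source = source;
        this.trustedSha256 = trustedSha256;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = source.apply(name);
        if (bytes == null) {
            throw new ClassNotFoundException(name);
        }
        // Reject the bytes before they are ever defined as a class.
        if (!sha256Hex(bytes).equals(trustedSha256.get(name))) {
            throw new ClassNotFoundException("digest mismatch for " + name);
        }
        return defineClass(name, bytes, 0, bytes.length);
    }

    static String sha256Hex(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Convenience wrapper that returns null instead of throwing. */
    static Class<?> tryLoad(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    static byte[] bytesOf(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = VerifyingLoader.class.getName();
        byte[] bytes = bytesOf(VerifyingLoader.class);
        Map<String, String> trusted = Collections.singletonMap(name, sha256Hex(bytes));
        Class<?> accepted = tryLoad(new VerifyingLoader(n -> bytes, trusted), name);
        Class<?> rejected = tryLoad(new VerifyingLoader(n -> bytes, Collections.emptyMap()), name);
        System.out.println(accepted != null && accepted != VerifyingLoader.class); // true
        System.out.println(rejected);                                               // null
    }
}
```

Overriding findClass rather than loadClass keeps parent delegation in place: core classes still come from the Bootstrap ClassLoader, and only names the parents cannot resolve reach the custom lookup.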