Disassembling Hello World in Java
last updated 2021-09-30 21:48:08 by met
Writing The Program
Software written in Java is usually compiled into Java bytecode, which is then executed by the Java Virtual Machine (JVM). Let's write the Hello World example. First, create a file named "HelloWorld.java" with the content below.
class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}
Then compile it with javac:
$ javac HelloWorld.java
The .class File Format
A Java class file contains the actual bytecode of a class along with its constant pool, access flags, version metadata, superclass and interface indexes (the actual superclass and interface names are stored in the constant pool) and various attributes. The class file format is described in detail at https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html
A class file is identified by its first 4 bytes. Printing them shows the following output:
$ xxd -l 4 HelloWorld.class
00000000: cafe babe  ....
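The same check can be sketched in Java. This is a minimal illustration rather than a full class file parser: the class name ClassFileHeader and the built-in sample header are assumptions made for the example. It reads the 4-byte magic number and the two version fields that follow it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ClassFileHeader {
    /** Describes the first 8 bytes of a class file: u4 magic, u2 minor, u2 major. */
    public static String describe(byte[] bytes) {
        if (bytes.length < 8) {
            return "not a class file";
        }
        ByteBuffer header = ByteBuffer.wrap(bytes);   // big-endian by default, like the class file format
        if (header.getInt() != 0xCAFEBABE) {
            return "not a class file";
        }
        int minor = header.getShort() & 0xFFFF;       // u2 minor_version
        int major = header.getShort() & 0xFFFF;       // u2 major_version
        return "class file, version " + major + "." + minor;
    }

    public static void main(String[] args) throws IOException {
        // Hand-built sample header (major version 52 is Java 8); pass a path to inspect a real file.
        byte[] sample = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34};
        byte[] bytes = args.length > 0 ? Files.readAllBytes(Paths.get(args[0])) : sample;
        System.out.println(describe(bytes));
    }
}
```

Running it with HelloWorld.class as the argument should report the version of the compiler target that produced the file.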
Disassembling
When a Java Virtual Machine starts up, it first looks for a main method in the specified class. In the Java language, the main method is usually declared as:
public static void main(String[] args)
Inside the JVM, that method is looked up in the class file by the method name "main" and the method descriptor ([Ljava/lang/String;)V, which means a method that takes an array of String instances as its parameter and returns void.
Details about descriptors can be found at: https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.3.2
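To see how a Java signature maps to a descriptor, the standard java.lang.invoke.MethodType class can produce the descriptor string for us. The sketch below uses a helper named descriptor, which is an assumption made for this example.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    /** Builds the JVM method descriptor for the given return and parameter types. */
    public static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // void main(String[] args)
        System.out.println(descriptor(void.class, String[].class));   // ([Ljava/lang/String;)V
        // void println(String x)
        System.out.println(descriptor(void.class, String.class));     // (Ljava/lang/String;)V
    }
}
```

The leading "[" marks an array dimension, "L...;" wraps a class name, and the trailing "V" is the void return type.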
We can disassemble a class file with the javap tool:
$ javap -c HelloWorld
Compiled from "HelloWorld.java"
class HelloWorld {
  HelloWorld();
    Code:
       0: aload_0
       1: invokespecial #1    // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #2    // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #3    // String Hello, World!
       5: invokevirtual #4    // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return
}
Since the main method is static, the constructor of our class is not invoked during the execution of our HelloWorld program.
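A small experiment can make this visible. The class below is a hypothetical variant of HelloWorld (the name StaticMainDemo and the counter field are assumptions for this sketch) that counts constructor calls and reports the count from main.

```java
public class StaticMainDemo {
    // Incremented only when an instance is created.
    public static int constructorCalls = 0;

    StaticMainDemo() {
        constructorCalls++;
    }

    public static void main(String[] args) {
        // No "new StaticMainDemo()" appears anywhere, so the counter stays at zero.
        System.out.println("constructor calls during main: " + constructorCalls);
    }
}
```

The JVM invokes main directly on the class, so no instance is ever created unless the program asks for one.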
Interpreting The Bytecodes
The bytecode is obviously doing something that prints "Hello, World!". Let's follow the instructions in light of the JVM instruction set specification.
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
The getstatic instruction pushes the value of a static field onto the operand stack, after initializing the field's class through its <clinit> method if it has not been initialized already.
The two bytes that follow the opcode are combined into an index into the constant pool, which is used to fetch the field reference. In our case, the static field is System.out.
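Both points about getstatic can be illustrated with a short sketch. The class and member names below (GetstaticDemo, Lazy, EVENTS) are assumptions made for the example. It shows how the two operand bytes form a constant pool index, and that a class's static initializer runs on the first read of one of its static fields.

```java
import java.util.ArrayList;
import java.util.List;

public class GetstaticDemo {
    public static final List<String> EVENTS = new ArrayList<>();

    static class Lazy {
        // Not a compile-time constant, so reading it compiles to a getstatic instruction.
        static String greeting = "Hello";

        static {
            EVENTS.add("Lazy.<clinit>");
        }
    }

    /** Combines the two operand bytes that follow the opcode into a constant pool index. */
    public static int constantPoolIndex(int indexByte1, int indexByte2) {
        return ((indexByte1 & 0xFF) << 8) | (indexByte2 & 0xFF);
    }

    /** Reads a static field of Lazy and records what happens around the access. */
    public static List<String> readField() {
        EVENTS.add("before access");
        String value = Lazy.greeting;   // first use triggers Lazy's static initializer
        EVENTS.add("after access: " + value);
        return EVENTS;
    }

    public static void main(String[] args) {
        // "getstatic #2" is encoded as opcode 0xb2 followed by the bytes 0x00 0x02.
        System.out.println(constantPoolIndex(0x00, 0x02));
        System.out.println(readField());
    }
}
```

The static initializer entry is recorded between the "before" and "after" entries, and only on the first access.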
3: ldc #3 // String Hello, World!
ldc is the instruction for loading an item from the constant pool and pushing it onto the operand stack.
In our case, the constant pool entry is a String literal, so a reference to the String "Hello, World!" is pushed onto the stack.
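One observable consequence is that a string literal is a reference to a shared, interned String object rather than a fresh copy. The sketch below (class and method names are assumptions for this example) compares references to show that.

```java
public class LdcDemo {
    /** Two identical literals resolve to the same interned String instance. */
    public static boolean sameLiteralReference() {
        String a = "Hello, World!";
        String b = "Hello, World!";
        return a == b;   // reference comparison on purpose
    }

    /** An explicitly created String is a different object until it is interned. */
    public static boolean copyDiffersUntilInterned() {
        String literal = "Hello, World!";
        String copy = new String(literal);
        return copy != literal && copy.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(sameLiteralReference());       // true
        System.out.println(copyDiffersUntilInterned());   // true
    }
}
```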
Detailed information about the constant pool can be found in the class file format chapter of the JVM specification (section 4.4).
5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
invokevirtual invokes the instance method referenced by the constant pool entry, taking its arguments from the operand stack.
The println method we call takes one argument. Since it is an instance method, it also needs a reference to the receiving object, which is already on the operand stack because we pushed it earlier with getstatic.
After these two values are popped from the operand stack, the println method can run.
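The "receiver plus one argument" shape can also be seen from Java code through a method handle. This is only an analogy to the bytecode, not what javac generates for HelloWorld, and the class and helper names are assumptions for this sketch. A handle looked up with findVirtual takes the receiver as its first parameter, followed by the declared arguments.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class InvokeVirtualDemo {
    /** Looks up PrintStream.println(String) as a virtual method handle. */
    public static MethodHandle printlnHandle() {
        try {
            return MethodHandles.lookup().findVirtual(
                    PrintStream.class, "println", MethodType.methodType(void.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Calls println on an in-memory PrintStream and returns what was written. */
    public static String callPrintln(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream receiver = new PrintStream(buffer, true);
        try {
            printlnHandle().invoke(receiver, text);   // receiver first, then the single argument
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
        return buffer.toString();
    }

    public static void main(String[] args) throws Throwable {
        // Two parameters: the PrintStream receiver and the String argument.
        System.out.println(printlnHandle().type().parameterCount());
        printlnHandle().invoke(System.out, "Hello, World!");
    }
}
```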
Tracing through the java.io library, the println method eventually calls a method named FileOutputStream.writeBytes, which is declared as private native void.
So there is no implementation for it in Java.
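Reflection can show which methods of a class are declared native. In the sketch below, hypotheticalWrite is an invented native method with no library behind it, declared purely for illustration. The list printed for FileOutputStream depends on the JDK version, so no particular output is assumed.

```java
import java.io.FileOutputStream;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NativeMethodDemo {
    // Hypothetical native method: declared for illustration only, nothing implements it.
    private static native void hypotheticalWrite(byte[] bytes);

    /** Returns the sorted names of all methods declared native in the given class. */
    public static List<String> nativeMethodNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            if (Modifier.isNative(method.getModifiers())) {
                names.add(method.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Calling a native method that no loaded library provides fails at link time. */
    public static boolean callFailsWithoutLibrary() {
        try {
            hypotheticalWrite(new byte[0]);
            return false;
        } catch (UnsatisfiedLinkError expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(nativeMethodNames(FileOutputStream.class));
        System.out.println(callFailsWithoutLibrary());
    }
}
```

On a JDK where writeBytes is declared as described above, it should appear in the printed list.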
Java Native Interface & System Calls
The JVM is isolated from the underlying system by design, so any action that requires access to the underlying system is performed by native methods through the Java Native Interface (JNI). In our case, we need to print text to the console, which is done by writing to file descriptor 1, the standard output as defined by POSIX and exposed in Java as java.io.FileDescriptor#out.
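As a rough illustration, the same text can be written to standard output without System.out by opening a FileOutputStream directly on FileDescriptor.out. The class name and the message helper are assumptions for this sketch.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class StdoutDescriptorDemo {
    /** The bytes to write: 13 characters of text plus a newline. */
    public static byte[] message() {
        return "Hello, World!\n".getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Deliberately not closed: closing this stream would close standard output itself.
        FileOutputStream stdout = new FileOutputStream(FileDescriptor.out);
        stdout.write(message());
    }
}
```

FileOutputStream is unbuffered, so the write call is handed straight to the native layer.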
Even a simple Java program requires many native methods to be registered and linked. For example, running our Hello World example with the command below shows the native methods relevant to printing the text.
Some output is omitted for convenience.
$ java -verbose:jni HelloWorld
…
[Dynamic-linking native method java.io.FileOutputStream.writeBytes … JNI]
Hello, World!
The native side of FileOutputStream.writeBytes can be found in the JDK sources:
JNIEXPORT void JNICALL
Java_java_io_FileOutputStream_writeBytes(JNIEnv *env, jobject this, jbyteArray bytes,
                                         jint off, jint len, jboolean append) {
    writeBytes(env, this, bytes, off, len, append, fos_fd);
}
To confirm the path we traced, we can use strace to dump all the system calls that the JVM and its child processes make during execution.
Some output is omitted for convenience.
# strace -f java HelloWorld
...
...
[pid 29907] write(1, "Hello, World!", 13) = 13
[pid 29907] write(1, "\n", 1)
Conclusion
Even though the JVM instruction set has a wide range of instructions, any action that requires communicating with the kernel or the underlying host needs native code to be executed through the Java Native Interface.
Digging into a simple Hello World program can give us hints about how the JVM works and how it communicates beyond its isolated space.
Discussion is encouraged.
References
JVM Specifications
Java Language Specifications
JDK Source
Wikipedia Write SysCall
Write Syscall Man Page
Originally written for Sahibinden Technology