Disassembling Hello World in Java

last updated 2021-09-30 21:48:08 by met

image showing github search results
1.6M Repository Results About Hello World.
For many, "Hello World" has been the go-to test phrase when starting to learn a new programming language. It's widely used in examples and tutorials. A quick search on GitHub returns over 1.6M repository results, so there's no doubt about its widespread use. Java was the second most used language in those results, after HTML.

Writing The Program

Software written in Java is usually compiled into Java bytecode, which is then executed by the Java Virtual Machine (JVM). Let's write the Hello World example. First, create a file named "HelloWorld.java" with the content below.
class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}
Hello World Example
javac is the tool that converts source code conforming to the Java Language Specification into JVM-compatible bytecode. Running the command below generates a file named "HelloWorld.class".
$ javac HelloWorld.java

The .class File Format

A Java class file contains the actual bytecode of a class along with its constant pool, access flags, version metadata, superclass and interface indices (the actual superclass and interface names are stored in the constant pool), and various attributes. The class file format is described in detail at https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html

A class file is identified by its first 4 bytes. Printing them gives the following output:

$ xxd -l 4 HelloWorld.class
00000000: cafe babe                                ....
0xCAFEBABE is the magic number that the JVM uses to identify class files. Let's disassemble the class file we generated to see more.
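The same check can be done from Java. The following is a minimal sketch, assuming the class file sits in the current directory as "HelloWorld.class" (or is passed as the first argument); the class name ClassFileHeader and its methods are hypothetical helpers written for illustration. It reads the magic number and the two version fields that follow it in the class file layout.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the JDK: holds the first 8 bytes of a class file.
class ClassFileHeader {
    final int magic;
    final int minorVersion;
    final int majorVersion;

    ClassFileHeader(int magic, int minorVersion, int majorVersion) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    /** Parses u4 magic, u2 minor_version, u2 major_version (all big-endian). */
    static ClassFileHeader parse(byte[] classBytes) {
        ByteBuffer buffer = ByteBuffer.wrap(classBytes); // big-endian by default
        int magic = buffer.getInt();
        int minor = buffer.getShort() & 0xFFFF;          // treat u2 as unsigned
        int major = buffer.getShort() & 0xFFFF;
        return new ClassFileHeader(magic, minor, major);
    }

    boolean hasValidMagic() {
        return magic == 0xCAFEBABE;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: HelloWorld.class was produced by javac in the current directory.
        String path = args.length > 0 ? args[0] : "HelloWorld.class";
        ClassFileHeader header = parse(Files.readAllBytes(Paths.get(path)));
        System.out.printf("magic=%08x minor=%d major=%d valid=%b%n",
                header.magic, header.minorVersion, header.majorVersion, header.hasValidMagic());
    }
}
```

For a valid class file the magic field should print as cafebabe; the version numbers depend on the compiler that produced the file.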

Disassembling

When a Java Virtual Machine starts up, it first looks for a main method in the specified class.

The main method is usually defined as public static void main(String[] args) in the Java language. The JVM looks it up in the class file by the method name "main" and the method descriptor ([Ljava/lang/String;)V, which means a method that takes an array of String instances as its parameter and returns void.

Details about descriptors can be found at: https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.3.2
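To get a feel for descriptors, the standard java.lang.invoke.MethodType class can render one from a return type and parameter types. This is a small sketch; DescriptorDemo and its descriptor helper are names made up for this example.

```java
import java.lang.invoke.MethodType;

// Hypothetical demo class: prints JVM method descriptors for a few signatures.
class DescriptorDemo {
    /** Builds the JVM method descriptor for the given return and parameter types. */
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // void main(String[] args)
        System.out.println(descriptor(void.class, String[].class)); // ([Ljava/lang/String;)V
        // void println(String x)
        System.out.println(descriptor(void.class, String.class));   // (Ljava/lang/String;)V
    }
}
```

The first line is the descriptor the JVM searches for when locating main; the second is the one attached to the println call in the disassembly.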

We can disassemble a class file with the javap tool:
$ javap -c HelloWorld
Compiled from "HelloWorld.java"
class HelloWorld {
  HelloWorld();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return
  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #3                  // String Hello, World!
       5: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return
}
The disassembled output shows that there are two methods in our class file. The first one is the constructor, named <init> in JVM notation; the other is the entry point of our program, the main method. Even though we didn't write a constructor, the compiler added a default one that invokes the superclass constructor. In our case, HelloWorld has no explicit superclass, but every class in Java ultimately derives from Object, so the <init> method of Object is invoked.

Since main is a static method, the constructor of our class is never invoked while the HelloWorld program runs.
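Both observations can be checked with a short experiment. The sketch below uses three made-up classes (ImplicitConstructor, CountingMain and ConstructorDemo are illustrative names, not part of the original program): one class with no constructor in its source, and one whose constructor counts how often it runs.

```java
// No constructor is written here; the compiler is expected to add a default one.
class ImplicitConstructor {
}

class CountingMain {
    static int constructed = 0;

    CountingMain() {
        constructed++; // runs only when an instance is created
    }

    static String greet() {
        return "Hello, World!"; // static: callable without any instance
    }
}

class ConstructorDemo {
    public static void main(String[] args) {
        // Reflection sees exactly one constructor, even though none was written.
        System.out.println(ImplicitConstructor.class.getDeclaredConstructors().length); // 1
        System.out.println(ImplicitConstructor.class.getSuperclass().getName());        // java.lang.Object

        // Calling a static method does not run the constructor.
        System.out.println(CountingMain.greet());     // Hello, World!
        System.out.println(CountingMain.constructed); // 0
    }
}
```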

Interpreting The Bytecodes

The bytecode clearly does something that prints "Hello, World!". Let's follow the instructions in light of the JVM instruction set specification.

0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
The getstatic instruction pushes the value of a static field onto the operand stack. If the class that declares the field has not been initialized yet, it is initialized first, which runs its <clinit> method.
The two bytes following the opcode are combined into an index into the constant pool, and the entry at that index identifies the field. In our case, the static field is System.out.
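The way the two operand bytes form the index can be sketched in a few lines. The byte array below is an assumption: it is hand-assembled from the javap listing using the opcode values from the JVM specification (getstatic 0xb2, ldc 0x12, invokevirtual 0xb6, return 0xb1), not read from the actual class file. OperandIndexDemo and wideIndex are illustrative names.

```java
// Hypothetical demo: decodes constant pool indices from hand-assembled bytecode.
class OperandIndexDemo {
    // Assumed encoding of main, reconstructed from the javap output.
    static final byte[] MAIN_CODE = {
        (byte) 0xb2, 0x00, 0x02, // 0: getstatic     #2
        0x12, 0x03,              // 3: ldc           #3
        (byte) 0xb6, 0x00, 0x04, // 5: invokevirtual #4
        (byte) 0xb1              // 8: return
    };

    /** Combines the two bytes after the opcode at pc into an unsigned 16-bit index. */
    static int wideIndex(byte[] code, int pc) {
        int indexbyte1 = code[pc + 1] & 0xFF; // mask to undo Java's signed bytes
        int indexbyte2 = code[pc + 2] & 0xFF;
        return (indexbyte1 << 8) | indexbyte2;
    }

    public static void main(String[] args) {
        System.out.println("getstatic     #" + wideIndex(MAIN_CODE, 0)); // getstatic     #2
        // ldc carries a single index byte, so no combining is needed.
        System.out.println("ldc           #" + (MAIN_CODE[4] & 0xFF));   // ldc           #3
        System.out.println("invokevirtual #" + wideIndex(MAIN_CODE, 5)); // invokevirtual #4
    }
}
```

The instruction lengths also explain the offsets in the listing: getstatic occupies bytes 0 to 2, ldc bytes 3 and 4, invokevirtual bytes 5 to 7, and return sits at byte 8.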

3: ldc #3 // String Hello, World!
ldc loads an item from the constant pool and pushes it onto the operand stack. In our case, the constant pool entry is a String literal, so a reference to the String "Hello, World!" is pushed onto the stack.
Detailed information about the constant pool can be found in the class file format chapter of the JVM specification.

5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
invokevirtual invokes the instance method referenced by the constant pool entry, taking its arguments from the operand stack. The println method we call takes one argument. Since it is an instance method, it also needs an object reference to be invoked on; that reference is already on the operand stack, because we pushed it earlier with getstatic.
After both values are popped from the operand stack, the println method can run.
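The three instructions can be replayed against a hand-rolled stack to make the push and pop order concrete. This is only a toy model under stated assumptions, not how a real JVM is implemented: OperandStackSketch is a made-up class, and the PrintStream is passed in as a parameter instead of being resolved through the constant pool.

```java
import java.io.PrintStream;
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the operand stack during main; names are illustrative.
class OperandStackSketch {
    /** Replays getstatic, ldc and invokevirtual using an explicit stack. */
    static void run(PrintStream out) {
        Deque<Object> operandStack = new ArrayDeque<>();

        operandStack.push(out);             // getstatic: push the PrintStream reference
        operandStack.push("Hello, World!"); // ldc: push the String reference

        // invokevirtual: the argument is on top, the receiver sits below it.
        String argument = (String) operandStack.pop();
        PrintStream receiver = (PrintStream) operandStack.pop();
        receiver.println(argument);

        // println returns void, so nothing is pushed back.
        if (!operandStack.isEmpty()) {
            throw new IllegalStateException("operand stack should be empty before return");
        }
    }

    public static void main(String[] args) {
        run(System.out);
    }
}
```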

Tracing through the java.io library, println eventually calls FileOutputStream.writeBytes, which is declared as private native void, so it has no implementation in Java.
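Reflection offers a quick way to see which methods of a class are declared native. The sketch below uses a hypothetical helper class, NativeMethodLister; the exact set of names it prints for FileOutputStream depends on the JDK version, so no output is shown.

```java
import java.io.FileOutputStream;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: lists the methods a class declares with the native modifier.
class NativeMethodLister {
    /** Returns the sorted names of all native methods declared directly by the class. */
    static Set<String> nativeMethodNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method method : type.getDeclaredMethods()) {
            if (Modifier.isNative(method.getModifiers())) {
                names.add(method.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The result varies between JDK versions; writeBytes is the one traced above.
        System.out.println(nativeMethodNames(FileOutputStream.class));
    }
}
```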

Java Native Interface & System Calls

The JVM is isolated from the underlying system by design, so any action that requires access to the underlying system is carried out by native methods through the Java Native Interface (JNI). In our case, we need to print text to the console, which is done by writing to file descriptor 1, the standard output defined by POSIX and exposed in Java as java.io.FileDescriptor#out.
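To see that System.out is not special, the same text can be written straight to the standard output descriptor. A minimal sketch, with RawStdoutDemo and greetingBytes as made-up names, assuming a platform where standard output is available:

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: bypasses System.out and writes to the stdout descriptor directly.
class RawStdoutDemo {
    /** The bytes that make up the greeting plus a trailing newline. */
    static byte[] greetingBytes() {
        return "Hello, World!\n".getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Deliberately not closed: closing this stream would close standard output
        // for the whole process.
        FileOutputStream stdout = new FileOutputStream(FileDescriptor.out);
        byte[] bytes = greetingBytes();
        stdout.write(bytes, 0, bytes.length);
        stdout.flush();
    }
}
```

Here the greeting and the newline are handed over in a single write call, whereas println goes through PrintStream and its buffering layers first.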

Even a simple Java program requires many native methods to be registered and linked. For example, running our Hello World example with the command below shows the native methods involved in printing the text.
Some output is omitted for brevity.
$ java -verbose:jni HelloWorld
…
[Dynamic-linking native method java.io.FileOutputStream.writeBytes … JNI]
Hello, World!
We traced the call down to writeBytes and found that it is declared as a native method. Tracing further, the implementation of FileOutputStream.writeBytes can be found in the JDK sources.
JNIEXPORT void JNICALL
Java_java_io_FileOutputStream_writeBytes(JNIEnv *env,
    jobject this, jbyteArray bytes, jint off, jint len, jboolean append) {
    writeBytes(env, this, bytes, off, len, append, fos_fd);
}
Native method writeBytes
User-level programs usually communicate with the kernel through system calls. In our case, the system call for writing data to a file descriptor is write. The writeBytes method is a wrapper around the write function, which is in turn a wrapper around the write system call.

To confirm the path we traced, we can use strace to dump all the system calls that the JVM and its children make during execution.
Some output is omitted for brevity.
# strace -f java HelloWorld
...
...
[pid 29907] write(1, "Hello, World!", 13) = 13
[pid 29907] write(1, "\n", 1)   
A write system call writing to STDOUT (fd 1 as per POSIX) can be seen. Detailed information about the write system call and its wrapper can be found in the man page.

Conclusion

Even though the JVM instruction set has a wide range of instructions, any action that requires communicating with the kernel or the underlying host needs native code, executed through the Java Native Interface.

Digging into a simple Hello World program gives us hints about how the JVM works and how it communicates beyond its isolated space.

Discussion is encouraged.

References

JVM Specifications
Java Language Specifications
JDK Source
Wikipedia Write SysCall
Write Syscall Man Page


Originally written for Sahibinden Technology