Demystifying Java Bytecode: An Easy Intro

October 7, 2023 · 1212 words · 6 minutes

Introduction

Java bytecode serves as the crucial intermediate representation of Java source code, enabling the famous “write once, run anywhere” feature. Let’s explore the underlying language of the Java Virtual Machine (JVM).

Components of Java

Three essential components power the world of Java: the JDK, JRE, and JVM. The JDK (Java Development Kit) is the developer’s toolkit, incorporating the Java compiler and development tools. The JRE (Java Runtime Environment) is for end-users: it bundles the JVM and the standard class libraries needed to run Java applications. The JVM, our focus here, is the runtime engine that executes Java bytecode, which makes compiled programs independent of the underlying platform.

How is bytecode created?

It all begins with Java source code, which is written in a human-readable format. The javac tool, part of the Java Development Kit (JDK), transforms the source code into bytecode. If we compile the following code with the javac Main.java command, javac produces a Main.class file that contains the bytecode. The process involves the steps described after the listing:

public class Main {
    public static void main(String[] args) {
        int a = 3, b = 5;
        System.out.println(a + b);
    }
}

Syntax Checking

The javac compiler first checks the source code for syntax errors. If it finds any, it aborts and reports a compilation error.
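
To make the syntax check concrete, here is a minimal sketch that runs the compiler in-process through the standard javax.tools API and counts the errors it reports. The class SyntaxCheckDemo and the helpers StringSource and countErrors are hypothetical names invented for this illustration, and the sketch assumes it runs on a JDK (on a plain JRE no system compiler is available).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class SyntaxCheckDemo {

    /** Hypothetical helper: a Java source file held in memory instead of on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles the source in-process and returns how many errors javac reported. */
    static long countErrors(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler found; run this on a JDK");
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        try {
            // Send any generated .class files to a temporary directory.
            String outDir = Files.createTempDirectory("javac-demo").toString();
            compiler.getTask(null, null, diagnostics, List.of("-d", outDir), null,
                    List.of(new StringSource(className, code))).call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return diagnostics.getDiagnostics().stream()
                .filter(d -> d.getKind() == Diagnostic.Kind.ERROR)
                .count();
    }

    public static void main(String[] args) {
        String valid = "public class Main { void f() { int x = 3; } }";
        String broken = "public class Main { void f() { int x = 3 } }"; // missing semicolon
        System.out.println("errors in valid source:  " + countErrors("Main", valid));
        System.out.println("errors in broken source: " + countErrors("Main", broken));
    }
}
```

The valid source compiles without errors, while the source with the missing semicolon produces at least one error diagnostic and no class file.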

Bytecode Generation

Assuming there are no syntax errors, javac translates the Java source code into bytecode instructions. These instructions are platform-independent, meaning they can be executed on any device that has a Java Virtual Machine (JVM).
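
One way to see that a .class file is a platform-independent binary format is to read its first bytes. Every class file starts with the magic number 0xCAFEBABE, followed by a minor and a major version number. The sketch below reads that header for a class that is already loaded; ClassFileHeader and readHeader are hypothetical names chosen for this example.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {

    /** Returns {magic, minorVersion, majorVersion} read from the class file of the given class. */
    static int[] readHeader(Class<?> type) {
        // Absolute resource name of the class file, e.g. "/java/lang/String.class".
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IllegalArgumentException("Class file not found: " + resource);
            }
            // Class files are big-endian, which matches DataInputStream.
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader(ClassFileHeader.class);
        System.out.printf("magic=0x%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

The major version depends on the JDK that compiled the class (52 for Java 8 and 61 for Java 17, for example), so the exact output varies between installations.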

Digging Deeper into Bytecode

Disassembling bytecode

The JDK ships another command-line tool, javap, which disassembles compiled class files. It lets us view the bytecode instructions of compiled Java classes in a human-readable format. Running javap -c src/Main.class prints the bytecode instructions:

Compiled from "Main.java"
public class Main {
  public Main();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: iconst_3
       1: istore_1
       2: iconst_5
       3: istore_2
       4: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
       7: iload_1
       8: iload_2
       9: iadd
      10: invokevirtual #3                  // Method java/io/PrintStream.println:(I)V
      13: return
}

Opcode

Opcode is short for operation code. Opcodes are the instructions used in bytecode to define the actions that the JVM should perform. They are the fundamental building blocks that dictate how the JVM executes a Java program. Each opcode represents a specific operation, such as loading a value onto the stack, performing arithmetic, invoking a method or controlling program flow. The full list of supported opcodes can be found in the Oracle docs.

Opcode Breakdown

In the bytecode of the main method above, we can see the instructions iconst, istore, getstatic, iload, iadd and invokevirtual. Let’s go through them:

iconst pushes an int constant onto the operand stack. The operand stack is a runtime data structure, private to each method invocation, that holds the operands and results of computations. In our code, we defined two constants, 3 and 5 (int a = 3, b = 5;). The istore instruction pops an int from the operand stack and stores it in a local variable. Because main is static, local variable slot 0 holds the args parameter, so a and b occupy slots 1 and 2. Consider the first four instructions:

0: iconst_3  // Push integer constant 3
1: istore_1  // Pops integer 3 from the operand stack and stores it in local variable slot 1
2: iconst_5  // Push int constant 5
3: istore_2  // Pops integer 5 from the operand stack and stores it in local variable slot 2

Next, at offset 4, the getstatic instruction retrieves the value of a static field (class variable) from a class. Our code calls System.out.println, and in the System source code, out is a static final field of type PrintStream. Note that the numbers in front of each instruction are byte offsets within the method, not source line numbers.

4: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;

The instruction fetches the static field java/lang/System.out, whose type descriptor is Ljava/io/PrintStream;. The leading L marks a class type, the trailing semicolon ends the class name, and java/io is the package that contains PrintStream. The #2 references the constant pool entry that describes this field.
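
The claims about System.out can be checked with reflection. The sketch below looks up the field, inspects its modifiers, and rebuilds the descriptor shown in the javap comment. SystemOutField and its helper methods are hypothetical names used only for this illustration; referenceDescriptor handles ordinary class types only, not arrays or primitives.

```java
import java.io.PrintStream;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class SystemOutField {

    /** Looks up the public field System.out via reflection. */
    static Field outField() {
        try {
            return System.class.getField("out");
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True when the field is declared both static and final. */
    static boolean isStaticFinal(Field field) {
        int modifiers = field.getModifiers();
        return Modifier.isStatic(modifiers) && Modifier.isFinal(modifiers);
    }

    /** Builds the descriptor of a class type: 'L', the name with '/' separators, then ';'. */
    static String referenceDescriptor(Class<?> type) {
        return "L" + type.getName().replace('.', '/') + ";";
    }

    public static void main(String[] args) {
        Field out = outField();
        System.out.println("modifiers:  " + Modifier.toString(out.getModifiers()));
        System.out.println("type:       " + out.getType().getName());
        System.out.println("descriptor: " + referenceDescriptor(out.getType()));
        System.out.println("is PrintStream: " + (out.getType() == PrintStream.class));
    }
}
```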

Next, at offsets 7 and 8, iload loads a local variable onto the operand stack. iload_1 pushes the value in local variable slot 1 (3, stored there by istore_1) and iload_2 pushes the value in local variable slot 2 (5, stored there by istore_2).

7: iload_1
8: iload_2

iadd is the integer addition instruction. It pops the top two ints from the stack, here 3 and 5, adds them, and pushes the result, 8, back onto the stack.

9: iadd
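
The interplay between the operand stack and the local variable slots can be imitated in a few lines of Java. This is only a toy model of the four instructions discussed so far, not how a real JVM is implemented; TinyStackMachine and run are hypothetical names, and the instruction strings are parsed in a simplified way that only works for the mnemonic_number form used here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TinyStackMachine {

    /** Executes the given instructions and returns the value left on top of the operand stack. */
    static int run(String... instructions) {
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack
        int[] locals = new int[4];                 // local variable slots 0..3

        for (String instruction : instructions) {
            String[] parts = instruction.split("_");
            String opcode = parts[0];
            int operand = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;

            switch (opcode) {
                case "iconst": // push the constant
                    stack.push(operand);
                    break;
                case "istore": // pop into a local variable slot
                    locals[operand] = stack.pop();
                    break;
                case "iload":  // push the value of a local variable slot
                    stack.push(locals[operand]);
                    break;
                case "iadd": { // pop two ints, push their sum
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left + right);
                    break;
                }
                default:
                    throw new IllegalArgumentException("Unsupported instruction: " + instruction);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        int result = run("iconst_3", "istore_1", "iconst_5", "istore_2",
                         "iload_1", "iload_2", "iadd");
        System.out.println(result); // prints 8
    }
}
```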

invokevirtual invokes an instance method, selecting the implementation based on the runtime class of the object. This is the normal method dispatch in the Java programming language. At offset 10, it invokes the println method of the PrintStream class, which pops the PrintStream reference and the int 8 from the stack and prints the value to standard output. The #3 references the constant pool entry that specifies the method and its descriptor, (I)V: one int parameter and a void return type.

10: invokevirtual #3                  // Method java/io/PrintStream.println:(I)V
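
The following sketch illustrates both ideas from this step: dispatch on the runtime class of the receiver, and the (I)V descriptor. Shape, Circle and the surrounding VirtualDispatchDemo class are hypothetical names invented for the example; MethodType is a standard class in java.lang.invoke.

```java
import java.lang.invoke.MethodType;

class VirtualDispatchDemo {

    static class Shape {
        String name() { return "shape"; }
    }

    static class Circle extends Shape {
        @Override
        String name() { return "circle"; }
    }

    /** The call s.name() is an ordinary instance call; the runtime class of s picks the method. */
    static String describe(Shape s) {
        return s.name();
    }

    /** Descriptor of a method that takes one int and returns void, like println(int). */
    static String printlnIntDescriptor() {
        return MethodType.methodType(void.class, int.class).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(describe(new Shape()));  // prints shape
        System.out.println(describe(new Circle())); // prints circle
        System.out.println(printlnIntDescriptor()); // prints (I)V
    }
}
```

Even though the parameter of describe is declared as Shape, passing a Circle runs the overriding method, which is the behaviour invokevirtual provides.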

Why is bytecode slower than machine code?

Bytecode is typically slower than machine code because of the additional layer of interpretation and runtime checks. Machine code consists of instructions that the CPU can execute directly, without any external help. Bytecode, on the other hand, is an intermediate form: the JVM interprets it or translates it to machine code at runtime.

The JVM incorporates runtime checks for memory safety, security, and language rules (array bounds checks, null pointer checks, etc.). These checks are essential for security and stability but come at a cost in execution speed. Machine code doesn’t typically include these checks, as it assumes the programmer has already handled such cases.
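
These checks are easy to observe from Java code. In the sketch below, an out-of-range array index and a null dereference are both caught by the JVM and surface as exceptions instead of corrupting memory. RuntimeChecksDemo, readAt and lengthOf are hypothetical names used only for this example.

```java
class RuntimeChecksDemo {

    /** Reads data[index]; returns the value, or the name of the exception the JVM raised. */
    static String readAt(int[] data, int index) {
        try {
            return String.valueOf(data[index]);
        } catch (ArrayIndexOutOfBoundsException e) {
            return e.getClass().getSimpleName();
        }
    }

    /** Returns the length of the text, or the name of the exception raised for a null reference. */
    static String lengthOf(String text) {
        try {
            return String.valueOf(text.length());
        } catch (NullPointerException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[3];
        System.out.println(readAt(data, 1));  // prints 0
        System.out.println(readAt(data, 5));  // prints ArrayIndexOutOfBoundsException
        System.out.println(lengthOf("abc"));  // prints 3
        System.out.println(lengthOf(null));   // prints NullPointerException
    }
}
```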

While JIT compilation can optimize bytecode, it may not achieve the same level of optimization as a dedicated native compiler that generates machine code specific to the target platform. Native compilers can take advantage of low-level details and hardware-specific features, resulting in more efficient code.

Advantages of bytecode over machine code

Although bytecode is slower than machine code, Java uses it for portability and security. Bytecode is designed to be platform-independent: since it is an intermediate representation, it can run without modification on any platform that has a compatible JVM.

The JVM provides a layer of security. It enforces security checks, like access control and memory management, to prevent unauthorized or harmful actions. Bytecode doesn’t have direct access to system resources, which makes it harder for malicious code to compromise a system.

Bytecode allows for dynamic class loading: classes can be loaded into the JVM on demand at runtime, which works hand in hand with features like reflection. Bytecode also abstracts away many low-level details of memory management and hardware interaction, making Java easier to learn and program in than languages that compile directly to machine code.
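
As a small illustration of loading a class on demand, the sketch below picks a class by its name at runtime, loads it with Class.forName, and instantiates it through reflection. DynamicLoadingDemo and newListByName are hypothetical names, and the example assumes the named class implements List and has a public no-argument constructor.

```java
import java.util.List;

class DynamicLoadingDemo {

    /** Loads the named class at runtime and creates an instance via its no-argument constructor. */
    @SuppressWarnings("unchecked")
    static List<String> newListByName(String className) {
        try {
            Class<?> type = Class.forName(className);                      // load on demand
            Object instance = type.getDeclaredConstructor().newInstance(); // reflective instantiation
            return (List<String>) instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name is just a string here; it could come from a config file or user input.
        List<String> list = newListByName("java.util.ArrayList");
        list.add("loaded at runtime");
        System.out.println(list.getClass().getName() + " -> " + list);
    }
}
```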

Conclusion

In this article, we’ve taken a brief look at bytecode. For a deeper understanding of the mechanisms and structures of the JVM, you can explore the Java Virtual Machine Specification.