My new favourite thing: ASM

[This could also have been titled ANTLR4 project with Maven – Tutorial (episode 4)]

[The full source code is here on github]

Introduction

ASM is:

an all purpose Java bytecode manipulation and analysis framework. It can be used to modify existing classes or dynamically generate classes, directly in binary form. Provided common transformations and analysis algorithms allow to easily assemble custom complex transformations and code analysis tools.

It has several uses but the most remarkable is the ability to easily output Java Bytecode and dump byte array representations of class files.

In order to allow this, ASM has a set of handy APIs and a couple of tools that guides you by examples, rather that teaching you up front a mass of notions. I’ve recently used ASM to build the next step of my ANTLR4 Maven tutorial: an essential compiler that translates parsed expressions into java classes (i.e. .class files). I point you to this branch if you want to take a look at the complete source code.

A lot of cool staff out there uses ASM, in particular Groovy and Clojure, just to mention two main representatives of the JVM languages world, use ASM to compile to class files.

Before starting using ASM a couple of preparatory activities are needed. The first is intalling the Bytecode Outline plugin for Eclipse. This will become your main educational tool. The most useful thing that it can do is generating the source code of a Java class that will output the bytecode of another Java class. To be more clear, if you want to know how to generate the bytecode for a certain class, or method, or block, etc…, you write the source code of the Java class whose bytecode you would like to create, then you inspect it with the Bytecode Outline plugin, and it generates the Java code that you should write in order to builds the corresponding bytecode.

Imagine that I want to output the bytecode that corresponds to the following Java class:


public class SimulatedCompiledExpression
{
	public double compute(){
		return compute_0();
	}
	
	public double compute_0(){
		return compute_01() * compute_02();
	}
	
	public double compute_01(){
		return 2.0D;
	}
	
	public double compute_02(){
		return 3.0;
	}
}

once I have written and compiled the class in Eclipse, I open the Bytecode view (Window -> Show View)

show-view

and it tells me this:

bytecode

If you take the Java class generated on the right panel, compile it and call its dump method, what you get is the bytecode corresponding to the SimulatedCompiledExpression class.

The second preparatory step is specific to my example. Since you want to be able to test the compiled classes on the fly, a custom class loader is useful that could load a class directly from its raw byte array representation. It did something very basic but it’s enough to allow unit tests:

package org.merka.arithmetic.classloader;

import org.apache.commons.lang.StringUtils;

public class ByteArrayClassLoader extends ClassLoader
{
	private byte[] rawClass;
	private String name;
	private Class<?> clazz; 
			
	public ByteArrayClassLoader(byte[] rawClass, String name)
	{
		if(StringUtils.isBlank(name)){
			throw new IllegalArgumentException("name");
		}
		if(rawClass == null){
			throw new IllegalArgumentException("rawClass");
		}
		this.rawClass = rawClass;
		this.name = name;
	}
	
	@Override
	protected Class<?> findClass(String name) throws ClassNotFoundException
	{
		if(this.name.equals(name)){			
			return defineClass(this.name, this.rawClass, 0, this.rawClass.length);
		}
		return super.findClass(name);
	}
}

Maven dependency for ASM

To have the full power of ASM at your disposal in a Maven project, add the following dependency in the pom.xml file:


<dependency>
    <groupId>org.ow2.asm</groupId>
    <artifactId>asm-all</artifactId>
    <version>${asm.version}</version>
</dependency>

The latest version (and the one I used in this tutorial) is 5.0.4.

Compilation

The idea for this example is to translate every production of the language in a method that returns the result of the evaluation of the correspondig subtree. I know it’s totally useless but, again, this is just a tutorial to learn how ASM works. I’ve never claimed that the entire arithmetic example was practically usefull in the first place.

Having an expression like “2 * 3”, I would like to create a class that corresponds to the one I’ve just reported previously (look at the SimulatedCompiledExpression above). Every time I did not know how to use the ASM APIs to accomplish my task, I just wrote the Java code correspondig to the ideal result I wanted, checked with Bytecode Outline and then went back to my bytecode generation code.

The actual work of translating expressions into bytecode is done by the NaiveCompilerVisitor. For each significant production, it creates a method that computes and returns the value of its subtree. The visitor is defined as a subclass of ArithmeticBaseVisitor because each visit method returns the name of the method it just created, so that it can be used by the parent level.

Let’s see some code:

	public String visitProgram(ProgramContext ctx)
	{ 
		// builds the prolog of the class
//		FieldVisitor fv;
		MethodVisitor mv;
//		AnnotationVisitor av0;
		
		traceClassVisitor.visit(V1_7, ACC_PUBLIC + ACC_SUPER,
				getQualifiedName(), null, "java/lang/Object",
				null);

		traceClassVisitor.visitSource(className + ".java", null);
		
		// builds the default constructor
		{
			// [here goes the code obtained from the bytecode outline,
			//   slightly modified to fit our needs]
			mv = traceClassVisitor.visitMethod(ACC_PUBLIC, "<init>", "()V", null, null);
			mv.visitCode();
			Label l0 = new Label();
			mv.visitLabel(l0);
			//mv.visitLineNumber(3, l0);
			mv.visitVarInsn(ALOAD, 0);
			mv.visitMethodInsn(INVOKESPECIAL, "java/lang/Object", "<init>", "()V", false);
			mv.visitInsn(RETURN);
			Label l1 = new Label();
			mv.visitLabel(l1);
			mv.visitLocalVariable("this", getStackQualifiedName(),
					null, l0, l1, 0);
			mv.visitMaxs(1, 1);
			mv.visitEnd();
		}
		
		// passes itself into the child node
		String innerMethodName = ctx.expression().accept(this);
		
		// creates a top level method named "compute"
		// that internally calls the previous generated innerMethodName
		{	
			// [here goes the code obtained from the bytecode outline,
			//   slightly modified to fit our needs]
			mv = classWriter.visitMethod(ACC_PUBLIC, "compute", "()D", null, null);
			mv.visitCode();
			Label l0 = new Label();
			mv.visitLabel(l0);
			//mv.visitLineNumber(14, l0);
			mv.visitVarInsn(ALOAD, 0);
			mv.visitMethodInsn(INVOKEVIRTUAL, getQualifiedName(), innerMethodName, "()D", false);
			mv.visitInsn(DRETURN);
			Label l1 = new Label();
			mv.visitLabel(l1);
			mv.visitLocalVariable("this", getStackQualifiedName(), null, l0, l1, 0);
			mv.visitMaxs(2, 1);
			mv.visitEnd();
		}
		
		// build the epilog of the class
		traceClassVisitor.visitEnd();
		return "compute";
	}

The code in the top level visit method shown here, writes the bytecode that defines a class, a default constructor and a public method named “compute”. The result of this code alone, translated in Java, would look like this:

public class <TheClassName>;
{
	public double compute(){
		return compute_0(); // "compute_0" is the name returned by ctx.expression().accept(this), see line 35 of the previous snippet
        } 
} 

At line 35 you see the visitor starting the recursion into the subtree. Each subnode, once visited, enriches the class with a new method and returns the name of it, so that it can be employed by the parent level. At the end of the visit, the getRawClass method of the NaiveCompilerVisitor returns the raw byte representation of the class: it can be saved as a .class file (then it becomes a totally legitimate class) or loaded on the fly by the ByteArrayClassLoader.

Let’s see another visit method. From now on you can realize that the code is really similar to that of NaiveInterpreterVisitor:

public String visitAlgebraicSum(AlgebraicSumContext ctx)
	{
		int byteCodeOp = -1;
		String operand = ctx.getChild(1).getText();
		if(operand.equals("+")){
			byteCodeOp = DADD;
		}
		else if(operand.equals("-")){
			byteCodeOp = DSUB;
		}
		else
		{
			throw new ArithmeticException("Something has really gone wrong");
		}
		
		String leftArgumentMethod = ctx.expression(0).accept(this);
		String rightArgumentMethod = ctx.expression(1).accept(this);
		
		// builds a method whose body is
		// 'return <leftArgumentMethod>() + rightArgumentMethod()'
		
		String currentMethodName = getNextMethodName();
		MethodVisitor methodVisitor;
		{
			methodVisitor = classWriter.visitMethod(ACC_PUBLIC, currentMethodName, "()D", null, null);
			methodVisitor.visitCode();
			Label l0 = new Label();
			methodVisitor.visitLabel(l0);

			methodVisitor.visitVarInsn(ALOAD, 0);
			methodVisitor.visitMethodInsn(INVOKEVIRTUAL, getQualifiedName(), leftArgumentMethod, "()D", false);
			methodVisitor.visitVarInsn(ALOAD, 0);
			methodVisitor.visitMethodInsn(INVOKEVIRTUAL, getQualifiedName(), rightArgumentMethod, "()D", false);
			methodVisitor.visitInsn(byteCodeOp);
			methodVisitor.visitInsn(DRETURN);
			Label l1 = new Label();
			methodVisitor.visitLabel(l1);
			methodVisitor.visitLocalVariable("this", getStackQualifiedName(), null, l0, l1, 0);
			methodVisitor.visitMaxs(4, 1);
			methodVisitor.visitEnd();
		}
		
		return currentMethodName;
	}

The idea is the same: first we visit each subtree of the AlgebraicSumContextNode. Each visit creates a method in the output bytecode and returns its name to the parent level. Then we use those names in the generation of the current method (line 31 and 33). As the comment states, the goal here is to have a bytecode method whose body is equivalent to the Java statement:

return <leftArgumentMethod>() (+ | -) <rightArgumentMethod>();

Test

A unit test might help understand how such a visitor can be used by client code:

	@Test
	public void testWriteClass() throws Exception
	{
		String program = "1 + 1 + 1 * 2 * (4+2) * 2 - (1 + 1 - 4 + 1 +1 ) * 2 / 3 / 3 / 3"; // "4 + 1";
		TestArithmeticParser.ArithmeticTestErrorListener errorListener = new TestArithmeticParser.ArithmeticTestErrorListener();
		ProgramContext parseTreeRoot = TestArithmeticParser.parseProgram(program, errorListener);

		NaiveCompilerVisitor visitor = new NaiveCompilerVisitor("org.merka.onthefly",
				"CompiledExpression");

		visitor.visit(parseTreeRoot);
		byte[] rawClass = visitor.getRawClass();
		
		File file = new File("target/org/merka/onthefly/CompiledExpression.class");
		FileUtils.writeByteArrayToFile(file, rawClass);
	}

As usual, first we parse the program (line 6) then we create an instance of the compiler visitor that takes as parameter the name of the package and the simple name of the class to be generated (line 8). We visit the parse tree (line 11), then we get the resulting bytecode as a byte array (line 12). This is the actual content of a class file. We can save it to a file inside the expected folder structure (line 15): now we can use this class as we would do with any other class. In fact, you can also try and open it in Eclipse, and this is what you get:

bytecode2

nice and valid java bytecode.

On the other side, you can generate and load classes on the fly. To do this, I use my custom ByteArrayClassLoader and a bit of reflection, since none of the generated types are known at compile time:

	@Test
	public void testOnTheFly() throws Exception
	{
		String tempPackage = "org.merka.onthefly";
		String program = "2 + 3";
		double result = evaluateClassOnTheFly(program, tempPackage, "CompiledSum");
		assertEquals("result of current program: '" + program + "'", 5, result, 0.00001);
	}

	public double evaluateClassOnTheFly(String program, String packageName, String className) throws Exception
	{
		TestArithmeticParser.ArithmeticTestErrorListener errorListener = new TestArithmeticParser.ArithmeticTestErrorListener();
		ProgramContext parseTreeRoot = TestArithmeticParser.parseProgram(program, errorListener);

		NaiveCompilerVisitor visitor = new NaiveCompilerVisitor(packageName,
				className);

		visitor.visit(parseTreeRoot);
		byte[] rawClass = visitor.getRawClass();
		String name = packageName + "." + className;
		ByteArrayClassLoader classLoader = new ByteArrayClassLoader(rawClass, name);
		Class<?> compiledClass = classLoader.loadClass(name);

		assertNotNull(compiledClass);

		Object instance = compiledClass.newInstance();
		Class<?>[] parameterTypes = new Class<?>[0];
		Method computeMethod = compiledClass.getMethod("compute", parameterTypes);
		Object[] args = new Object[0];
		double result = (double) computeMethod.invoke(instance, args);
		return result;
	}

Simple Camel route in JBoss Fuse

For this tutorial and the following ones, I am going to use a Fuse instance on a Lubuntu Linux box. My IDE is Eclipse Luna and the build system is Apache Maven v. 3.5.2. Eclipse is actually running on Java 8 but it does not matter since I’m going to run all the builds from a terminal.

Let’s see how we can implement and deploy a simple Camel route in JBoss Fuse. We start by defining our requirements that, in this case, are going to be very trivial: we want to read messages from a queue, process them and the write them back to another queue. The processing step is going to be simple too: we will just log the message body.

First of all, we need two queues. Start Fuse and reach the console (see the previous post). In the menu bar, click on “ActiveMQ”. Then click on “+Create” on the submenu, select “queue” in the radio buttons group, type the appropriate name in the text box (I’m choosing “source-queue”) and finally click the “Create Queue” button (see pictures below). Repeat the process and create a “sink-queue” queue, it will be our destination queue. Later, we will be going to need to queue a message in the source queue through the web panel. In order to be able to do that, we have to set the username and password. In the web console, go to the drop down menu in the top right corner (where your username appears). Select “preferences”. Go to the “ActiveMQ” tab and insert your administrative username and password. The settings are automatically saved. Now that we have the infrastructure ready, let’s switch to the development side.

activemq-menu

create-queue

If you are going to develop projects for Fuse, Maven is the way to go. The default way you install new components in Fuse, is by telling it to get the artifacts from a Maven repository. For this tutorial I assume that you are quite familiar and comfortable with Maven and that you have a working installation of Maven on your system.

We can create the initial skeleton of the project starting from a Maven archetype. Open a Terminal and type:

mvn archetype:generate

choose the archetype named camel-archetype-blueprint (in my case is the number 449) and complete the process. Here is a dump of my terminal:

Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): 510: 449
Choose org.apache.camel.archetypes:camel-archetype-blueprint version:
1: 2.8.0
[...]
42: 2.14.1
Choose a number: 42: 42
Downloading: https://repo.maven.apache.org/maven2/org/apache/camel/archetypes/camel-archetype-blueprint/2.14.1/camel-archetype-blueprint-2.14.1.jar
Downloaded: https://repo.maven.apache.org/maven2/org/apache/camel/archetypes/camel-archetype-blueprint/2.14.1/camel-archetype-blueprint-2.14.1.jar (17 KB at 14.1 KB/sec)
Downloading: https://repo.maven.apache.org/maven2/org/apache/camel/archetypes/camel-archetype-blueprint/2.14.1/camel-archetype-blueprint-2.14.1.pom
Downloaded: https://repo.maven.apache.org/maven2/org/apache/camel/archetypes/camel-archetype-blueprint/2.14.1/camel-archetype-blueprint-2.14.1.pom (3 KB at 10.9 KB/sec)
Define value for property 'groupId': : org.merka
Define value for property 'artifactId': : fuse-demo-blueprint
Define value for property 'version':  1.0-SNAPSHOT: : 1.0
Define value for property 'package':  org.merka: : org.merka.demo
[INFO] Using property: camel-version = 2.14.1
[INFO] Using property: log4j-version = 1.2.17
[INFO] Using property: maven-bundle-plugin-version = 2.3.7
[INFO] Using property: maven-compiler-plugin-version = 2.5.1
[INFO] Using property: maven-resources-plugin-version = 2.6
[INFO] Using property: slf4j-version = 1.7.7
Confirm properties configuration:
groupId: org.merka
artifactId: fuse-demo-blueprint
version: 1.0
package: org.merka.demo
camel-version: 2.14.1
log4j-version: 1.2.17
maven-bundle-plugin-version: 2.3.7
maven-compiler-plugin-version: 2.5.1
maven-resources-plugin-version: 2.6
slf4j-version: 1.7.7
Y: : Y
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype: camel-archetype-blueprint:2.14.1
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: org.merka
[INFO] Parameter: artifactId, Value: fuse-demo-blueprint
[INFO] Parameter: version, Value: 1.0
[INFO] Parameter: package, Value: org.merka.demo
[INFO] Parameter: packageInPathFormat, Value: org/merka/demo
[INFO] Parameter: maven-bundle-plugin-version, Value: 2.3.7
[INFO] Parameter: maven-resources-plugin-version, Value: 2.6
[INFO] Parameter: groupId, Value: org.merka
[INFO] Parameter: maven-compiler-plugin-version, Value: 2.5.1
[INFO] Parameter: slf4j-version, Value: 1.7.7
[INFO] Parameter: version, Value: 1.0
[INFO] Parameter: log4j-version, Value: 1.2.17
[INFO] Parameter: camel-version, Value: 2.14.1
[INFO] Parameter: package, Value: org.merka.demo
[INFO] Parameter: artifactId, Value: fuse-demo-blueprint
[INFO] project created from Archetype in dir: /home/merka/workspace/fuse-demo-blueprint
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 05:20 min
[INFO] Finished at: 2014-12-23T20:29:24+01:00
[INFO] Final Memory: 13M/60M
[INFO] ------------------------------------------------------------------------

Now we can import the project in our IDE of choice and open it. Just to be sure that everything is OK, run a first build with the command mvn -DskipTests=true clean package from the shell and wait for the required dependencies to be downloaded.

The archetype project comes with a couple of “hello world” files that you can completely ignore at this point: they only serves as an example. It’s actually better if you delete all the hello-world-related stuff, unit test included.  Open the blueprint.xml file in the folder OSGI-INF/blueprint and comment out the existing route. You should also delete the “helloBean” definition. The file should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:camel="http://camel.apache.org/schema/blueprint"
       xsi:schemaLocation="
       http://www.osgi.org/xmlns/blueprint/v1.0.0 http://www.osgi.org/xmlns/blueprint/v1.0.0/blueprint.xsd
       http://camel.apache.org/schema/blueprint http://camel.apache.org/schema/blueprint/camel-blueprint.xsd">

  <camelContext id="blueprintContext" trace="false" xmlns="http://camel.apache.org/schema/blueprint">
<!--     <route id="timerToLog"> -->
<!--       <from uri="timer:foo?period=5000"/> -->
<!--       <setBody> -->
<!--           <method ref="helloBean" method="hello"/> -->
<!--       </setBody> -->
<!--       <log message="The message contains ${body}"/> -->
<!--       <to uri="mock:result"/> -->
<!--     </route> -->
  </camelContext>

</blueprint>

Before the camelContext element, add the following bean definition: it provides Camel with the credentials to access the ActiveMQ Broker:

	<!-- connect to the local ActiveMQ broker -->
	<bean id="activemq" class="org.apache.activemq.camel.component.ActiveMQComponent">
		<property name="brokerURL" value="tcp://localhost:61616" />
		<property name="userName" value="merka" />
		<property name="password" value="merka" />
	</bean>

Then, add the following snippet inside the camelContext element. It defines a route that satisfies our requirements:

    <route id="queue2queue">
        <from uri="activemq:source-queue"></from>
        <log message="the message body is: ${body}" loggingLevel="INFO"></log>
        <to uri="activemq:sink-queue"></to>
    </route>

It is quite readable. The from element defines a consumer endpoint: it will read messages from the queue named source-queue (that we have just created), as soon as they are available. The message body is logged, thanks to the log element, and then it is written to sink-queue.

Build the project with Maven (mvn clean install). This puts the artifact in the local Maven repository, thus making it available to Fuse. To install the artifact in Fuse, go to the Karaf prompt and type

osgi:install mvn:org.merka/fuse-demo-blueprint/1.0

With this command you tell Karaf to search for the artifact with the given groupId/artifactId/version. If found, it answers with the id of the installed osgi bundle:

JBossFuse:karaf@root> osgi:install mvn:org.merka/fuse-demo-blueprint/1.0
Bundle ID: 251

You can see all the installed components typing list. Start the bundle with the commnand start <bundle-id>. Check the log for possible errors with log:display or log:tail. If no errors are found, go to the web console and manually send a message into the source-queue queue. The message is immediately consumed from the queue, the message body appears in the log and a brand new message is sent to sink-queue.

 send-msg

Where do you come from?

Sometimes it’s hard to figure out where a Java class has been loaded from. In a typical Enterprise System, there could be tens of web applications, maybe contained in an enterprise portal, running on several kinds of web and application servers, clustered over hundreds of servers. The common library used at run time can contain thousands of jars. A Java web application can load classes from several sources (his private WEB-INF/lib, the web server common library, some custom shared library, etc…), each of which can be associated with a different ClassLoader. Then it is no surprise that, if you work for years in an environment like this, you come to hate violently the class loading mechanism of Java, and every NoClassDefFoundError is seen as a voodoo curse.

Here’s a trick that I usually use when I work with Eclipse or an IDE that derives from it (the infamous RAD, for example). You have to be able to debug your program in order to perform the trick on the fly.

Place a breakpoint in an appropriate position (for example, just after some “new” instruction of an object whose class you want to know about). Then open the “Display” view. To do so, go to Windows ->Show view -> Other… . Then select the “Display” view (see figures below).

Cattura

Cattura2

The “Display” view lets you input Java instructions that are interpreted interactively and affect the execution of the program. Now you can discover where a certain class definition is taken from. Just type something like

objectIWantToKnowAbout.getClass().getProtectionDomain().getCodeSource().getLocation()

ctrl+a (select all) and ctrl+shift+d (Display), and you will get the output of the invokation:

(java.net.URL) file:/C:/Users/currentUser/workspace/myProject/bin/

This always gives you the exact location (i.e. the folder, as in this example, or the jar file) from which the class definition of the object has been loaded. You can also discover a bunch of fancy information about your object, such as the ClassLoader that has been used to load the class.

It might look like a very basic thing that every Java developer should know, but it is not. In fact, I cannot recall all the times that this activity saved me a lot of troubles and made me look like a smartass.

If you cannot debug the application that gives you problems then, well, i guess your only option is to insert some smart debug trace and wait for the next production release.