ANTLR4 project with Maven – Tutorial (episode 1)

Introduction

I’ve always been fascinated by Language Theory and the related technologies. Since I have been prevalently a Java guy, I used to use Javacc and JJTree to build parsers and interpreters. Nowadays it seems that the big name in the field of language recognition is ANTLR. I have wanted to learn more about ANTLR for a long time and lately I finally had the opportunity to spend some time on it. I thought it would have been a good idea to share the sample projects I created in the process.

I plan to write at least three parts:

  1. Setup of the project with all the basic pieces working.
  2. Implementation of a visitor.
  3. Grammar refinement to include self-embedding and implementation of a second visitor.

Through this series I will design a language to give the specification of the position of some geometrical shapes that will be used later to add shapes to the gravity simulator 3D scene (at least this is the idea).

The whole source code is available for download at https://github.com/Rospaccio/learnantlr. The project contains some tags that are related to the various episodes of the tutorial (unfortunately not always with the corresponding sequence number, but I will make sure to reference the right tag for each episode).

Disclaimer: this is not a comprehensive guide to ANTLR and I am not an expert in the field of Language Theory nor in ANTLR. It’s just a sharing of my (self) educational process. The focus is almost entirely on the setup of a build process through Maven, not on the internals of ANTLR itself nor the best practices to design a grammar (though I will occasionally slip on that topics).

Project setup (Git tag: v0.1)

Outline:

  1. first version of the language;
  2. basic pom.xml;
  3. specification of the grammar;
  4. first build.

First version of the language

the first version of the language is going to be very trivial and it is supposed to be just a pretext to show how a possible pom looks like. We want to be able to recognize strings like the following:

cube 0 0 0
sphere 12 2 3
cube 1 1 1
cube 4 3 10
<etc...>

where the initial keyword (“cube” or “sphere”) specifies the nature of the shape and the following three number specify the coordinates of the shape in a three dimensional space.

Basic POM

ANTLR has a very good integration with Maven: every necessary compile dependency is available from the central repository. Plus, there’s a super handful plugin that invokes the antlr processor, thus it is possible to define and tune the entire build process through the pom.xml file. But enough of these words, let’s vomit some code.

You can start the project as a default empty Maven project with jar packaging. First, you need to add the ANTLR dependency in your pom.xml file. Here’s the fragment:

<properties>
	<antlr4.plugin.version>4.5</antlr4.plugin.version>
	<antlr4.version>4.5</antlr4.version>
</properties>
<dependencies>
	<dependency>
		<groupId>org.antlr</groupId>
		<artifactId>antlr4-runtime</artifactId>
		<version>${antlr4.version}</version>
	</dependency>

	<dependency>
		<groupId>org.antlr</groupId>
		<artifactId>antlr4-maven-plugin</artifactId>
		<version>${antlr4.plugin.version}</version>
	</dependency>
</dependencies>

Here I am using version 4.5, which is the latest available at the time I am writing, because it supports Javascript as a target language, a feature that I am going to use later in the tutorial.

The first dependency, antlr4-runtime, as the name suggests, is the runtime support for the code generated by ANTLR (basically it’s what you need to compile the generated code and execute it). It contains the base types and classes used by the generated parsers.

The second, antlr4-maven-plugin, is the plugin that can be used in the “generate-sources” phase of the build. To actually use it, the following fragment is also needed:

<build>
	<plugins>
		<plugin>
			<groupId>org.antlr</groupId>
			<artifactId>antlr4-maven-plugin</artifactId>
			<version>${antlr4.plugin.version}</version>
			<configuration>
				<arguments>
					<argument>-visitor</argument>
					<!-- <argument>-Dlanguage=JavaScript</argument> -->
				</arguments>
			</configuration>
			<executions>
				<execution>
					<goals>
						<goal>antlr4</goal>
					</goals>
				</execution>
			</executions>
		</plugin>
	</plugins>
</build>

Note that you can pass argument to the ant command: I user the -visitor option because it generates a handful interface that you can implement in a Visitor class for the parse tree.

Specification of the grammar

In order to have something that makes sense, let’s add a grammar file in the appropriate folder. Create the file (in this case ShapePlacer.g4) inside src/main/antlr4. Make also sure to build a folder structure that mimic the package structure that you want for the generated classes. For example, if you place the grammar file inside src/main/antlr4/org/my/package, the generated classes will belong to the package with name org.my.package.

Here’s our first grammar:

grammar ShapePlacer;
program : (shapeDefinition)+ ;
shapeDefinition : sphereDefinition | cubeDefinition ;
sphereDefinition : SPHERE_KEYWORD coordinates ;
cubeDefinition : CUBE_KEYWORD coordinates ;
coordinates : NUMBER NUMBER NUMBER ;
SPHERE_KEYWORD : 'sphere' ;
CUBE_KEYWORD : 'cube' ;
NUMBER : [1-9]+ ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines ;

Not a very interesting language, but we must only try and see if the build works.

First build

In order to do that, type mvn clean package in a terminal window and see what happens.

What’s happend

During the generate-sources phase of the Maven build (i.e. certainly before compile), the ANTLR plugin is activated and its default goal (“antlr4”) is called. It invokes the antlr4 processor on the grammar file (by default, it looks recursively inside src/main/antlr4 and compiles every .g4 files it finds). If the goal executes with no error, the generated source files are placed in target/generated-sources/antlr4, and they are automatically taken into account for the compile phase.

As we do not have any manually-written source file yet, only the generated files are compiled and included in the jar.

Test

Let’s try and test the parser. To do that we can add a JUnit test case with a test that looks like the following (please browse the source code to find out more about details like the TestErrorListener class):

@Test
	public void testExploratoryString() throws IOException {

		String simplestProgram = "sphere 12 12 12 cube 2 3 4 cube 4 4 4 sphere 3 3 3"

		CharStream inputCharStream = new ANTLRInputStream(new StringReader(simplestProgram));
		TokenSource tokenSource = new ShapePlacerLexer(inputCharStream);
		TokenStream inputTokenStream = new CommonTokenStream(tokenSource);
		ShapePlacerParser parser = new ShapePlacerParser(inputTokenStream);

		parser.addErrorListener(new TestErrorListener());

		ProgramContext context = parser.program();

		logger.info(context.toString());
	}

Lubuntu on VMWare Tutorial – “Behind a Proxy” Edition

A Lubuntu virtual machine on VMWare Player is currently one of my standard tools for developing and testing software. I had a Hard Time (c) the first time I installed and fully configured such an instance because I was (suspense, suspense…) behind a proxy! The need for this setup has been triggered by the fact that I like my VMWare machines to be able to resize the desktop as I resize the VMWare window that contains them. That feature is available in most Linux distros only if you install the VMWare-tools package on the guest machine. That, in turn, requires the build-essential package (gcc, make, and the like).

Here’s the outline:

  1. set the system-wide proxy;
  2. tell apt to use the proxy (’cause no, it won’t use the fucking system proxy);
  3. update and upgrade apt;
  4. install build-essential with apt-install;
  5. install VMWare-tools.

Now that I read it, it seems like an easy thing to do, but it took me some time to have all the pieces set up, so I guess it’s worth to write a brief tutorial to share what I have learned. Of course, I did not find out all these things by myself: this a just a summary of the knowledge that I gathered from the Internet (thank you, Internet).

1. Set the system-wide proxy.

Interestingly, Lubuntu does not have a nice GUI to let you set the proxy, so you have to do it some other way. For me, the best way is to just set the appropriate variable inside the /etc/environment file, so that it is shared across all users. Since it is going to be a development box I do not care about which user logs in at all. I just want the variable available and I want it fast.

to do that, open the aforementioned file with:

sudo nano /etc/environment

or with your favourite text editor (I am sorry if it’s vim), and make sure that the following lines are added:

http_proxy=http://101.101.101.101:3456
https_proxy=http://101.101.101.101:3456
ftp_proxy=http://101.101.101.101:3456
no_proxy="localhost,127.0.0.1"

(As you might imagine, you have to replace the fake IP addresses and ports with your proxy’s ones). To make the change effective you must log out and log in again.

2. Tell apt to use the proxy.

Because otherwise it won’t. Open the file /etc/apt/apt.conf for edit. If it does not exist, create it. Add the following lines:

acquire::http::proxy "http://101.101.101.101:3456"
acquire::https::proxy "https://101.101.101.101:3456"
acquire::socks::proxy "socks://101.101.101.101:3456"
acquire::ftp::proxy "ftp://101.101.101.101:3456"

again, replace the “101” and “3456” placeholders with your actual addresses and ports, and save it.

3. update and upgrade apt.

Run the following command to make apt up to date:

sudo apt-get update
sudo apt-get upgrade

4. Install build-essential with apt-install.

Now you are ready to use apt commands behind your proxy. Type

sudo apt-get install build-essential

to install the necessary build tools.

5. Install VMWare-tools

Now that you have all the prerequisites, you are ready to install the VMWare-tools. Select the menu item as in the picture below, follow the instructions that VMWare Player prompts to you, and you should be fine.

player-tools

Stress reducing tools and methodologies

A brief list of things that made me a better developer and a less anxious person.

Test Driven Development: Even if nobody in your team is doing TDD and your manager thinks it is just building a servlet for each back end Web Service, you can start applying TDD today and become a better developer. Of all the good natural consequences of TDD, what I like most is its stress reducing effect. If I feel afraid that something might be broken or it might fail in production, I just add more and more test cases, until every edge case is covered. I no longer wake up in the middle of the night worried of what could happen tomorrow when the servers will be restarted. Everything can still go wrong like it used to, but your level of confidence in your code and your reaction speed are greatly improved. You just feel better. And a fix is usually much easier to implement than without tests. Let alone the fact that stupid bugs will actually appear far less frequently than before…

Git: I hated the beast and avoided it like hell until I found an illuminating video on Youtube where a clever guy managed to explain with great clearness how Git actually works and how you can use it effectively. That has been a turning point. I realized that I was unable to use it because of lack of understanding. And once you see what branching and merging really means, you feel powerful, and a whole new world of possibilities unfolds before your eyes. It’s like living in three dimensions after having spent your whole life in Flatland. As of TDD, you do not have to wait until your company understands that leaving SVN and transitioning to Git is the right thing to do (Here I don’t even want to take into consideration SourceSafe, ClearCase or other hideous abominations): you can start using it today. Just “git init” a repository inside the root directory of a project; it does not matter if it’s under SVN source control, if you “gitignore” the right things the two do not interfere which each other. And your are ready to go. Now I wonder how could I have lived so long without Git.

Maven: you can say that it is verbose, it is slow, it eats up a lot of disk space, it’s ugly… I don’t care. I have personally seen what a build system without proper dependency management could be and what can cost in terms of time, money and stress: it’s not even a build system, it’s a destruction system. Maven is currently my default. There is only one thing that pisses me off more than a project not using Maven: one that uses it badly. If a person is not able to checkout a project and run a clean build at his first try, you are doing something wrong.

Sonarqube: A Wonderful free tool that helps you improve your code. It’ a bunch of tools that perform static analysis of code, integrated in a web application that keeps track of how the various parameters of the projects evolve from build to build. You can always learn something from the issues detected by Sonarqube and their relative descriptions. And it feels good to see how the color of a project shifts from red, to yellow, to green as you become a better programmer.

Virtual Machines: This is incredibly important, fundamental, if you happen to work in a hybrid environment. A usual situation for me is having a development machine running Windows and a deployment environment (for test, UAT, production, etc…) completely based on Linux. This is not so strange if you work with JavaEE: most applications and systems actually behave in the same way in Windows and Linux… almost… That is why you always want to give them a spin in a Linux box, before releasing it. After trying almost every virtualization software, my current choice is VMWare Player + Lubuntu. The first is available free of charge for non commercial use and works surprisingly well, the second is a lightweight Linux distro based on Ubuntu that gets rid of the super ugly desktop environment of Canonical and replaces it with LXDE, which requires few resources and performs well in virtual machines and older computers.

Fear Driven Development – Enterprise Induced Worst Practices (part 0)

The Internationalization Antipattern

Some years ago I was still working for Big Time Consulting but I was not even a proper employee. I was a contractor from a company owned by BTC. Well, you figure out the real names. I was sort of an in shore indian developer. We had this huge system built upon Liferay. The system was composed of hundreds of portlets, scattered across tens of WARs. The portal was heavily customized. The language strings for every portlet were all defined in a single Language.properties file at the portal level. That’s right: WARs did not have their own Language file: everything was defined centrally. That meant that if you needed to change the label of a button, you had to modify the portal Language file, build an artifact that belonged to the architectural sector of the system (i.e. it impacted ALL the portlets) and then, once deployed, restart the servers involved in the process.

Nowhere along this path there was an automated test.

As you might imagine, quite often things went wrong. The less severe issue that you could get was the total replacement of the Language strings with their corresponding keys (that was the default behavior in that version of Liferay: if the string was not found, it was simply set to its key). So, after the reboot, everything on every page looked something like “add.user.button.text”, “customer.list.title”, “user.not.found.error.message” and so on. Everywhere. In every single application. The default reaction in the team was “The Strings have been fucked up. Again.”

On the extreme end of the spectrum there was a funny set of poltergeist bugs. Mysterious NoClassDefFoundError, ClassCastException, Liferay showing a bare white page, Liferay showing every graphical component in the wrong place, portlets not deploying, etc…

After being forced to spend a couple of looong evenings to fix this issue (Did I mention that the entire process of compiling, packaging and deploying was excruciatingly long?) I learned my lesson: never mess with the strings again. I decided to apply my personal internationalization antipattern: always include a default value for every string with

LanguageUtil.get(Locale locale, String key, String defaultValue)

and don’t even try to package the architectural artifact (AA, from now on). Just modify and commit the Language file. Then deploy the WAR: the next day the strings magically appear on the screen, and nobody will ever notice that they are hardcoded. Wait until the next release cycle of the AA to have the strings file available. Luckily you won’t be the one needing to deploy it so, if something goes wrong, you can blame someone else and save your evenings.