ANTLR4 project with Maven – Tutorial (episode 1)

Introduction

I’ve always been fascinated by Language Theory and the related technologies. Since I have been prevalently a Java guy, I used to use Javacc and JJTree to build parsers and interpreters. Nowadays it seems that the big name in the field of language recognition is ANTLR. I have wanted to learn more about ANTLR for a long time and lately I finally had the opportunity to spend some time on it. I thought it would have been a good idea to share the sample projects I created in the process.

I plan to write at least three parts:

  1. Setup of the project with all the basic pieces working.
  2. Implementation of a visitor.
  3. Grammar refinement to include self-embedding and implementation of a second visitor.

Through this series I will design a language to give the specification of the position of some geometrical shapes that will be used later to add shapes to the gravity simulator 3D scene (at least this is the idea).

The whole source code is available for download at https://github.com/Rospaccio/learnantlr. The project contains some tags that are related to the various episodes of the tutorial (unfortunately not always with the corresponding sequence number, but I will make sure to reference the right tag for each episode).

Disclaimer: this is not a comprehensive guide to ANTLR and I am not an expert in the field of Language Theory nor in ANTLR. It’s just a sharing of my (self) educational process. The focus is almost entirely on the setup of a build process through Maven, not on the internals of ANTLR itself nor the best practices to design a grammar (though I will occasionally slip on that topics).

Project setup (Git tag: v0.1)

Outline:

  1. first version of the language;
  2. basic pom.xml;
  3. specification of the grammar;
  4. first build.

First version of the language

the first version of the language is going to be very trivial and it is supposed to be just a pretext to show how a possible pom looks like. We want to be able to recognize strings like the following:

cube 0 0 0
sphere 12 2 3
cube 1 1 1
cube 4 3 10
<etc...>

where the initial keyword (“cube” or “sphere”) specifies the nature of the shape and the following three number specify the coordinates of the shape in a three dimensional space.

Basic POM

ANTLR has a very good integration with Maven: every necessary compile dependency is available from the central repository. Plus, there’s a super handful plugin that invokes the antlr processor, thus it is possible to define and tune the entire build process through the pom.xml file. But enough of these words, let’s vomit some code.

You can start the project as a default empty Maven project with jar packaging. First, you need to add the ANTLR dependency in your pom.xml file. Here’s the fragment:

<properties>
	<antlr4.plugin.version>4.5</antlr4.plugin.version>
	<antlr4.version>4.5</antlr4.version>
</properties>
<dependencies>
	<dependency>
		<groupId>org.antlr</groupId>
		<artifactId>antlr4-runtime</artifactId>
		<version>${antlr4.version}</version>
	</dependency>

	<dependency>
		<groupId>org.antlr</groupId>
		<artifactId>antlr4-maven-plugin</artifactId>
		<version>${antlr4.plugin.version}</version>
	</dependency>
</dependencies>

Here I am using version 4.5, which is the latest available at the time I am writing, because it supports Javascript as a target language, a feature that I am going to use later in the tutorial.

The first dependency, antlr4-runtime, as the name suggests, is the runtime support for the code generated by ANTLR (basically it’s what you need to compile the generated code and execute it). It contains the base types and classes used by the generated parsers.

The second, antlr4-maven-plugin, is the plugin that can be used in the “generate-sources” phase of the build. To actually use it, the following fragment is also needed:

<build>
	<plugins>
		<plugin>
			<groupId>org.antlr</groupId>
			<artifactId>antlr4-maven-plugin</artifactId>
			<version>${antlr4.plugin.version}</version>
			<configuration>
				<arguments>
					<argument>-visitor</argument>
					<!-- <argument>-Dlanguage=JavaScript</argument> -->
				</arguments>
			</configuration>
			<executions>
				<execution>
					<goals>
						<goal>antlr4</goal>
					</goals>
				</execution>
			</executions>
		</plugin>
	</plugins>
</build>

Note that you can pass argument to the ant command: I user the -visitor option because it generates a handful interface that you can implement in a Visitor class for the parse tree.

Specification of the grammar

In order to have something that makes sense, let’s add a grammar file in the appropriate folder. Create the file (in this case ShapePlacer.g4) inside src/main/antlr4. Make also sure to build a folder structure that mimic the package structure that you want for the generated classes. For example, if you place the grammar file inside src/main/antlr4/org/my/package, the generated classes will belong to the package with name org.my.package.

Here’s our first grammar:

grammar ShapePlacer;
program : (shapeDefinition)+ ;
shapeDefinition : sphereDefinition | cubeDefinition ;
sphereDefinition : SPHERE_KEYWORD coordinates ;
cubeDefinition : CUBE_KEYWORD coordinates ;
coordinates : NUMBER NUMBER NUMBER ;
SPHERE_KEYWORD : 'sphere' ;
CUBE_KEYWORD : 'cube' ;
NUMBER : [1-9]+ ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines ;

Not a very interesting language, but we must only try and see if the build works.

First build

In order to do that, type mvn clean package in a terminal window and see what happens.

What’s happend

During the generate-sources phase of the Maven build (i.e. certainly before compile), the ANTLR plugin is activated and its default goal (“antlr4”) is called. It invokes the antlr4 processor on the grammar file (by default, it looks recursively inside src/main/antlr4 and compiles every .g4 files it finds). If the goal executes with no error, the generated source files are placed in target/generated-sources/antlr4, and they are automatically taken into account for the compile phase.

As we do not have any manually-written source file yet, only the generated files are compiled and included in the jar.

Test

Let’s try and test the parser. To do that we can add a JUnit test case with a test that looks like the following (please browse the source code to find out more about details like the TestErrorListener class):

@Test
	public void testExploratoryString() throws IOException {

		String simplestProgram = "sphere 12 12 12 cube 2 3 4 cube 4 4 4 sphere 3 3 3"

		CharStream inputCharStream = new ANTLRInputStream(new StringReader(simplestProgram));
		TokenSource tokenSource = new ShapePlacerLexer(inputCharStream);
		TokenStream inputTokenStream = new CommonTokenStream(tokenSource);
		ShapePlacerParser parser = new ShapePlacerParser(inputTokenStream);

		parser.addErrorListener(new TestErrorListener());

		ProgramContext context = parser.program();

		logger.info(context.toString());
	}

4 thoughts on “ANTLR4 project with Maven – Tutorial (episode 1)

  1. Pingback: ANTLR4 project with Maven – Tutorial (episode 3) | CodeVomit

  2. i am getting errors: line 1:16 token recognition error at: ‘cub ‘
    line 1:20 extraneous input ‘2’ expecting {, ‘sphere’, ‘cube’, ‘region’}

Leave a comment