ANTLR4 project with Maven – Tutorial (episode 2)

[The reference tag for this episode is step-0.2.]

At the end of the previous episode we have been able to feed sentences to the parser and find out if they are valid (i.e. belong to the language) or not. In this post I will show you how you can implement a visitor to interpret the language.

At the end of every successful parsing operation, the parser produces a concrete syntax tree. The function parse<StartSymbol> of the parser returns an object of type <StartSymbol>Context, which represents the root node of the concrete syntax tree (it is a structure that follows the classic Composite pattern). In our case, take a look at the testJsonVisitor test method (forget about the “Json” part of the name, the method is named like this by mistake):


 @Test
 public void testJsonVisitor() throws IOException{
    String program = "sphere 0 0 0 cube 5 5 5 sphere 10 1 3";
    TestErrorListener errorListener = new TestErrorListener(); 
    ProgramContext context = parseProgram(program, errorListener);
 
    assertFalse(errorListener.isFail());
 
    BasicDumpVisitor visitor = new BasicDumpVisitor();
 
    String jsonRepresentation = context.accept(visitor);
    logger.info("String returned by the visitor = " + jsonRepresentation);

...
 
 }

After parsing the string, the test method instantiates a visitor object (BasicDumpVisitor) and provides it with the ProgramContext object as the input.

Let’s take a closer look at the visitor. If you use the -visitor option when calling the preprocessor, ANTLR, alongside with the parser, generates for you a basic interface for a visitor that can walk the concrete syntax tree. All you have to do is implementing that interface and make sure that the tree nodes are visited in the right order.

I created the BasicDumpVisitor class, the simplest visitor that I could think of: it walks the tree and creates a string (also known as “a program”) that, once parsed, gives back the visited tree. In other words it just dumps the original program that created the current contrete syntax tree.

The base visitor interface is declared as follows:


public interface ShapePlacerVisitor<T> extends ParseTreeVisitor<T>

The name’s prefix (ShapePlacer) is taken from the name of the grammar, as defined in the grammar source file. The interface contains a “visit” method for each node type of the parse tree, as expected. Moreover, it has a bunch of extra methods inherited by the base interface ParseTreeVisitor: see the source code to get an idea, they are quite self-explanatory. I report here one of the “visit” methods as an example. The other ones in the class follow a similar logic:


	public String visitShapeDefinition(ShapeDefinitionContext ctx) {
		StringBuilder builder = new StringBuilder();
		for (ParseTree tree : ctx.children){
			builder.append(tree.accept(this) + " ");
		}
		builder.append("\n");
		return builder.toString();
	} 

The interface is parametric: when you implement it, you have to specify the actual type: that will be the return type of each “visit” method.

So far, I have always written about “concrete syntax tree”. When we deal with interpreted languages, we usually want to manipulate an Abstract Syntax Tree (AST), that is, a tree structure that omits every syntactic detail that is not useful to the interpreter and can be easily inferred by the structure of the tree itself. In the case of our language, if we have, say, the string “sphere 1 1 1”, the parser creates for us a tree that looks like this:

  • program
    • shapeDefinition
      • sphereDefinition
        • SPHERE_KEYWORD
        • coordinates
          • NUMBER [“1”]
          • NUMBER [“1”]
          • NUMBER [“1”]

That is not ideal since, when it comes down to interpretation, we may want to work with something that looks like this:

  • sphereDefinition
    • coordinates
      • 1
      • 1
      • 1

or, depending on your needs, something even simpler, for example:

  • sphere definition
    • (1, 1, 1)

Unfortunately ANTLR 4, unlike its previous versions, does not allow for tree rewriting in the grammar file. As far as I know, this is a precise design decision. So, if you want to work with an AST of your invention, you have to build it yourself, i. e. you have to write the tree node classes and the code that traverses the concrete syntax tree and builds the AST. Hopefully, I will cover this case in the next episode.

One thought on “ANTLR4 project with Maven – Tutorial (episode 2)

  1. Pingback: ANTLR4 project with Maven – Tutorial (episode 3) | CodeVomit

Leave a comment