End of Line blog

Thoughts on software development, by Adam Ruka

Graal Truffle tutorial part 5 – global variables

This article is part of a tutorial on GraalVM's Truffle language implementation framework.


At the end of part 4 of the series, a program in our EasyScript language consisted entirely of a single expression, built from addition and numeric literals, like 1 + 2 + 3.0. In this part of the series, we will move one step closer towards making EasyScript a real language by adding support for variables. This will require changing EasyScript programs from a single expression to a list of statements.

Our goal is to be able to execute the following JavaScript program:

var a = 0;
let b;
const c = 2.0;
b = 1;
a + b + c

Since we still don’t have a way to print anything to the screen in our language, as we don’t have function calls yet, we will say that executing a program like that returns the result of executing the last statement – in the above case, that would be 3.0.

Implementing variables will require quite a large number of changes – our grammar will need expanding, we will learn about implementing values used in GraalVM polyglot bindings, and add support for the JavaScript undefined concept to EasyScript. Let’s dive right in, because there’s a lot to do!

Note: JavaScript supports a concept called hoisting, which means it’s legal to use a variable before it’s defined, in code like:

var a = b;
var b = 3;

However, this feature is confusing, and considered more of a historical accident than anything else; given it complicates the implementation considerably, and I can’t imagine anyone wanting to create a new language that includes it, we will skip supporting it in EasyScript.

Grammar

Our language’s grammar will need a few more elements – most importantly, we will have to introduce the concept of ‘statements’ to it. It will also need a new type of expression, as assignment is an expression in JavaScript.

Our ANTLR grammar looks as follows:

grammar EasyScript ;

@header{
package com.endoflineblog.truffle.part_05;
}

start : stmt+ EOF ;

stmt : kind=('var' | 'let' | 'const') binding (',' binding)* ';'?     #DeclStmt
     |                                                 expr1 ';'?     #ExprStmt
     ;
binding : ID ('=' expr1)? ;

expr1 : ID '=' expr1               #AssignmentExpr1
      | expr2                      #PrecedenceTwoExpr1
      ;
expr2 : left=expr2 '+' right=expr3 #AddExpr2
      | expr3                      #PrecedenceThreeExpr2
      ;
expr3 : literal                    #LiteralExpr3
      | ID                         #ReferenceExpr3
      | '(' expr1 ')'              #PrecedenceOneExpr3
      ;

literal : INT | DOUBLE | 'undefined' ;

fragment DIGIT : [0-9] ;
INT : DIGIT+ ;
DOUBLE : DIGIT+ '.' DIGIT+ ;

fragment LETTER : [a-zA-Z$_] ;
ID : LETTER (LETTER | DIGIT)* ;

// skip all whitespace
WS : (' ' | '\r' | '\t' | '\n' | '\f')+ -> skip ;

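A few notes about the grammar: a program is now one or more statements instead of a single expression; the semicolon that ends a statement is optional; a single declaration can introduce multiple comma-separated bindings, each with an optional initializer; assignment sits at the lowest precedence level (expr1) and, because its right-hand side is again expr1, it is right-associative, so a = b = 1 assigns 1 to both variables; and undefined is treated as a literal.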
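One thing worth seeing in action is how the three expression rules encode precedence and associativity. The following is a minimal hand-written sketch, not the ANTLR-generated parser: the PrecedenceSketch class and its parenthesize() method are hypothetical names, it expects tokens separated by whitespace, and it performs almost no validation. It only mirrors the shape of the expr1, expr2 and expr3 rules, and returns the input with all of the implied parentheses made explicit.

```java
import java.util.List;

final class PrecedenceSketch {
    private final List<String> tokens;
    private int pos = 0;

    private PrecedenceSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Parses whitespace-separated tokens and returns a fully parenthesized form. */
    static String parenthesize(String source) {
        PrecedenceSketch parser = new PrecedenceSketch(List.of(source.trim().split("\\s+")));
        String result = parser.expr1();
        if (parser.pos != parser.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + parser.peek());
        }
        return result;
    }

    private String peek() {
        return this.pos < this.tokens.size() ? this.tokens.get(this.pos) : null;
    }

    private String next() {
        if (this.pos >= this.tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        return this.tokens.get(this.pos++);
    }

    // expr1 : ID '=' expr1 | expr2
    private String expr1() {
        if (this.pos + 1 < this.tokens.size() && this.tokens.get(this.pos + 1).equals("=")) {
            String name = next();
            next(); // consume '='
            // recursing into expr1 on the right is what makes assignment right-associative
            return "(" + name + " = " + expr1() + ")";
        }
        return expr2();
    }

    // expr2 : expr2 '+' expr3 | expr3
    private String expr2() {
        String left = expr3();
        // the loop is the usual way of writing a left-recursive rule by hand
        while ("+".equals(peek())) {
            next(); // consume '+'
            left = "(" + left + " + " + expr3() + ")";
        }
        return left;
    }

    // expr3 : literal | ID | '(' expr1 ')'
    private String expr3() {
        String token = next();
        if (token.equals("(")) {
            String inner = expr1();
            if (!")".equals(peek())) {
                throw new IllegalArgumentException("Expected ')'");
            }
            next(); // consume ')'
            return inner;
        }
        return token;
    }

    public static void main(String[] args) {
        System.out.println(parenthesize("a = b = 1 + 2 + 3"));
    }
}
```

Running main() prints (a = (b = ((1 + 2) + 3))): addition groups to the left and binds tighter than assignment, while assignment groups to the right.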

Parsing

Similarly to what we did in the previous article, we introduce a class that performs parsing by first invoking ANTLR, and then translating the received parse tree to the Truffle AST nodes:

public enum DeclarationKind {
    VAR, LET, CONST;

    public static DeclarationKind fromToken(String token) {
        switch (token) {
            case "var": return DeclarationKind.VAR;
            case "let": return DeclarationKind.LET;
            case "const": return DeclarationKind.CONST;
            default: throw new EasyScriptException("Unrecognized variable kind: '" + token + "'");
        }
    }
}

public final class EasyScriptTruffleParser {
    public static List<EasyScriptStmtNode> parse(Reader program) throws IOException {
        var lexer = new EasyScriptLexer(CharStreams.fromReader(program));
        // remove the default console error listener
        lexer.removeErrorListeners();
        var parser = new EasyScriptParser(new CommonTokenStream(lexer));
        // remove the default console error listener
        parser.removeErrorListeners();
        // throw an exception when a parsing error is encountered
        parser.setErrorHandler(new BailErrorStrategy());
        return parseStmtList(parser.start().stmt());
    }

    private static List<EasyScriptStmtNode> parseStmtList(List<EasyScriptParser.StmtContext> stmts) {
        return stmts.stream()
                .flatMap(stmt -> stmt instanceof EasyScriptParser.ExprStmtContext
                        ? Stream.of(parseExprStmt((EasyScriptParser.ExprStmtContext) stmt))
                        : parseDeclStmt((EasyScriptParser.DeclStmtContext) stmt))
                .collect(Collectors.toList());
    }

    private static ExprStmtNode parseExprStmt(EasyScriptParser.ExprStmtContext exprStmt) {
        return new ExprStmtNode(parseExpr1(exprStmt.expr1()));
    }

    private static Stream<EasyScriptStmtNode> parseDeclStmt(EasyScriptParser.DeclStmtContext declStmt) {
        DeclarationKind declarationKind = DeclarationKind.fromToken(declStmt.kind.getText());
        return declStmt.binding()
                .stream()
                .map(binding -> {
                    String variableId = binding.ID().getText();
                    var bindingExpr = binding.expr1();
                    EasyScriptExprNode initializerExpr;
                    if (bindingExpr == null) {
                        if (declarationKind == DeclarationKind.CONST) {
                            throw new EasyScriptException("Missing initializer in const declaration '" + variableId + "'");
                        }
                        initializerExpr = new UndefinedLiteralExprNode();
                    } else {
                        initializerExpr = parseExpr1(bindingExpr);
                    }
                    return GlobalVarDeclStmtNodeGen.create(initializerExpr, variableId, declarationKind);
                });
    }

    private static EasyScriptExprNode parseExpr1(EasyScriptParser.Expr1Context expr1) {
        // the parts dealing with expressions omitted for brevity...

We introduce an enum that represents each kind of variable in JavaScript (var / let / const). When we encounter a variable declaration statement, we return a Stream of Truffle AST Nodes, to handle a single declaration containing multiple variables – we transform code like let a, b; to the equivalent let a; let b;. When a variable declaration does not have an initializer, we create it with the undefined literal as the initializer (except if it’s a const, in which case we error out).
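To make the flattening step concrete, here is a minimal plain-Java sketch that does not depend on ANTLR or Truffle. The DeclarationFlatteningSketch class and its Binding type are hypothetical stand-ins for the parse tree classes, and the declarations are rendered as strings instead of AST Nodes; only the shape of the flatMap() pipeline mirrors parseStmtList() and parseDeclStmt() above.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DeclarationFlatteningSketch {
    /** Hypothetical stand-in for one parsed binding; initializer is null when absent. */
    static final class Binding {
        final String name;
        final String initializer;

        Binding(String name, String initializer) {
            this.name = name;
            this.initializer = initializer;
        }
    }

    /** Expands one declaration statement into one single-variable declaration per binding. */
    static Stream<String> flatten(String kind, List<Binding> bindings) {
        return bindings.stream().map(binding -> {
            if (binding.initializer == null && kind.equals("const")) {
                throw new IllegalArgumentException(
                        "Missing initializer in const declaration '" + binding.name + "'");
            }
            // a missing initializer defaults to the 'undefined' literal
            String init = binding.initializer == null ? "undefined" : binding.initializer;
            return kind + " " + binding.name + " = " + init + ";";
        });
    }

    /** Corresponds to the source text: let a, b = 1; var c; */
    static List<String> flattenProgram() {
        return Stream.of(
                        flatten("let", List.of(new Binding("a", null), new Binding("b", "1"))),
                        flatten("var", List.of(new Binding("c", null))))
                .flatMap(declarations -> declarations)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        flattenProgram().forEach(System.out::println);
    }
}
```

The two input statements become three single-variable declarations: let a = undefined; let b = 1; and var c = undefined;.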

TruffleLanguage

Our implementation of TruffleLanguage will need to store the global variables somewhere. The storage itself will be pretty simple – we create a new class, GlobalScopeObject, that saves the variables in a private Map<String, Object> field, and exposes an API for creating, updating and reading the variables (which corresponds to declarations, assignment expressions, and reference expressions, respectively):

public final class GlobalScopeObject {
    private final Map<String, Object> variables = new HashMap<>();
    private final Set<String> constants = new HashSet<>();

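    public boolean newVariable(String name, Object value, boolean isConst) {
        // putIfAbsent() keeps the existing value when the name was already declared,
        // so a rejected redeclaration cannot overwrite the variable or mark it as const
        Object existingValue = this.variables.putIfAbsent(name, value);
        if (existingValue != null) {
            return false;
        }
        if (isConst) {
            this.constants.add(name);
        }
        return true;
    }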

    public boolean updateVariable(String name, Object value) {
        if (this.constants.contains(name)) {
            throw new EasyScriptException("Assignment to constant variable '" + name + "'");
        }
        Object existingValue = this.variables.computeIfPresent(name, (k, v) -> value);
        return existingValue != null;
    }

    public Object getVariable(String name) {
        return this.variables.get(name);
    }
}
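To see how the three operations fit together, here is a small self-contained sketch that replays our goal program against a simplified copy of the class. The names GlobalScopeSketch, UNDEFINED and runGoalProgram() are made up for this example, IllegalStateException stands in for EasyScriptException, and the sketch assumes that a rejected duplicate declaration should leave the first value untouched (which is why it uses putIfAbsent()).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class GlobalScopeSketch {
    // placeholder for the Undefined.INSTANCE singleton; the Map never stores null,
    // so a null result from getVariable() always means "not declared"
    static final Object UNDEFINED = new Object();

    private final Map<String, Object> variables = new HashMap<>();
    private final Set<String> constants = new HashSet<>();

    boolean newVariable(String name, Object value, boolean isConst) {
        if (this.variables.putIfAbsent(name, value) != null) {
            return false; // the name was already declared
        }
        if (isConst) {
            this.constants.add(name);
        }
        return true;
    }

    boolean updateVariable(String name, Object value) {
        if (this.constants.contains(name)) {
            // IllegalStateException stands in for EasyScriptException here
            throw new IllegalStateException("Assignment to constant variable '" + name + "'");
        }
        return this.variables.computeIfPresent(name, (k, v) -> value) != null;
    }

    Object getVariable(String name) {
        return this.variables.get(name);
    }

    /** Replays: var a = 0; let b; const c = 2.0; b = 1; a + b + c */
    static double runGoalProgram() {
        GlobalScopeSketch scope = new GlobalScopeSketch();
        scope.newVariable("a", 0, false);
        scope.newVariable("b", UNDEFINED, false);
        scope.newVariable("c", 2.0, true);
        scope.updateVariable("b", 1);
        double sum = 0;
        for (String name : new String[]{"a", "b", "c"}) {
            sum += ((Number) scope.getVariable(name)).doubleValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(runGoalProgram());
    }
}
```

Running main() prints 3.0, the value we expect from the goal program. Declaring the same name twice returns false, updating an undeclared name returns false, and updating c throws.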

The interesting question is: how do we surface this GlobalScopeObject instance to the AST Nodes that will read and write to it?

One way could be to store this GlobalScopeObject instance in the TruffleLanguage instance itself. Another would be to use the Context type parameter of TruffleLanguage, which we have not used up to this point, leaving it as Void. Because of how GraalVM language interoperability works (which we discuss in detail below), the latter option is preferable. So, we introduce an EasyScriptLanguageContext class that contains GlobalScopeObject as a public final field. We return an instance of this class from the createContext() method in our TruffleLanguage class:

public final class EasyScriptLanguageContext {
    public final GlobalScopeObject globalScopeObject;

    public EasyScriptLanguageContext() {
        this.globalScopeObject = new GlobalScopeObject();
    }
}

@TruffleLanguage.Registration(id = "ezs", name = "EasyScript")
public final class EasyScriptTruffleLanguage extends
        TruffleLanguage<EasyScriptLanguageContext> {
    @Override
    protected CallTarget parse(ParsingRequest request) throws Exception {
        List<EasyScriptStmtNode> stmts = EasyScriptTruffleParser.parse(request.getSource().getReader());
        var rootNode = new EasyScriptRootNode(this, stmts);
        return rootNode.getCallTarget();
    }

    @Override
    protected EasyScriptLanguageContext createContext(Env env) {
        return new EasyScriptLanguageContext();
    }
}

Truffle AST Nodes

Let’s now see what the Truffle AST Nodes for these variable operations look like.

Code organization

Before we show the code of the actual Nodes, a quick note about organizing the code of your language implementation.

We will need to introduce a new kind of Nodes representing Statements, and also a few new types of expression Nodes. While up to this point, we simply kept the Nodes in the same Java package as the rest of the classes in the language implementation, this would become very messy with all of these new Node classes.

I would like to show you a way of organizing your language implementation code into Java packages that makes it very easy to tell where everything is, and which is quickly becoming a standard in the Truffle world:

basepackage
     |--- TruffleLanguage class
     |--- parser class
     |--- TypeSystem class
     |--- ...
     |--- basepackage.runtime
               |--- Undefined class
               |--- other runtime classes...
     |--- basepackage.nodes
               |--- RootNode class
               |--- basepackage.nodes.exprs
                         |--- expression Node classes...
               |--- basepackage.nodes.stmts
                         |--- statement Node classes...

This layout keeps everything organized and easy to find, and also scales nicely when you start supporting built-in functions (which we will get to soon, I promise!).

Statement Nodes

We will introduce a new abstract base class that all statements will extend:

public abstract class EasyScriptStmtNode extends Node {
    public abstract Object executeStatement(VirtualFrame frame);
}

It only has a single execute*() method, unlike the expression Nodes. To make sure we don’t confuse it with the execute*() methods from the expression Nodes, we call it executeStatement(), but, of course, the name can be anything you want (as long as it starts with the word “execute”).

The simplest kind of statement is the expression statement:

public final class ExprStmtNode extends EasyScriptStmtNode {
    @SuppressWarnings("FieldMayBeFinal")
    @Child
    private EasyScriptExprNode expr;

    public ExprStmtNode(EasyScriptExprNode expr) {
        this.expr = expr;
    }

    @Override
    public Object executeStatement(VirtualFrame frame) {
        return this.expr.executeGeneric(frame);
    }
}

It simply returns the result of executing the expression it wraps.

The second kind of statement is the variable declaration statement:

import com.oracle.truffle.api.dsl.NodeChild;
import com.oracle.truffle.api.dsl.NodeField;
import com.oracle.truffle.api.dsl.Specialization;

@NodeChild(value = "initializerExpr", type = EasyScriptExprNode.class)
@NodeField(name = "name", type = String.class)
@NodeField(name = "declarationKind", type = DeclarationKind.class)
public abstract class GlobalVarDeclStmtNode extends EasyScriptStmtNode {
    protected abstract String getName();
    protected abstract DeclarationKind getDeclarationKind();

    @Specialization
    protected Object createVariable(Object value) {
        String variableId = this.getName();
        boolean isConst = this.getDeclarationKind() == DeclarationKind.CONST;
        if (!this.currentLanguageContext().globalScopeObject.newVariable(variableId, value, isConst)) {
            throw new EasyScriptException(this, "Identifier '" + variableId + "' has already been declared");
        }
        // we return 'undefined' for statements that declare variables
        return Undefined.INSTANCE;
    }
}

This class uses the Truffle DSL that we learned about in part 3, but utilizes a few things we haven’t seen before.

The first is the @NodeField annotation. It’s similar to @NodeChild – it allows us to tell the DSL that the generated Node class should have a field with the given name and type. The difference between @NodeField and @NodeChild is that @NodeField does not result in the generated field being annotated with @Child, and so it can be marked final.

The fields will be populated by getting their value from the generated create() static factory method. The parameters for @NodeFields will be added to the create() method after all of the @NodeChild parameters – so, in our case, create() will have the signature:

public final class GlobalVarDeclStmtNodeGen extends GlobalVarDeclStmtNode {
    public static GlobalVarDeclStmtNode create(
            EasyScriptExprNode initializerExpr,
            String name, DeclarationKind declarationKind) {
        // ...

To use the field in the abstract superclass of the generated class, we declare an abstract getter for it, like we do here with getName() and getDeclarationKind(). The DSL will implement these methods in the generated subclass. In fact, the same ‘getter’ trick works for @NodeChild fields as well – we could use it here if we declared an abstract getInitializerExpr() method.
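If the relationship between the abstract getters and the generated fields feels a bit magical, the following plain-Java sketch shows the general idea without any Truffle dependency. It is hand-written and heavily simplified: the class names are hypothetical, and the code the DSL actually generates looks different and does much more. The point is only that the abstract class declares getters, while the subclass owns the final fields and the static create() factory.

```java
final class NodeFieldSketch {
    /** Plays the role of the hand-written abstract Node class. */
    abstract static class VarDeclNode {
        // abstract getters, implemented by the "generated" subclass
        protected abstract String getName();

        protected abstract boolean isConst();

        String describe() {
            return (this.isConst() ? "const " : "var ") + this.getName();
        }
    }

    /** Plays the role of the DSL-generated subclass (hypothetical shape only). */
    static final class VarDeclNodeGen extends VarDeclNode {
        // unlike @Child fields, these can be final
        private final String name;
        private final boolean isConst;

        private VarDeclNodeGen(String name, boolean isConst) {
            this.name = name;
            this.isConst = isConst;
        }

        // the factory takes one parameter per declared field, in declaration order
        static VarDeclNode create(String name, boolean isConst) {
            return new VarDeclNodeGen(name, isConst);
        }

        @Override
        protected String getName() {
            return this.name;
        }

        @Override
        protected boolean isConst() {
            return this.isConst;
        }
    }

    public static void main(String[] args) {
        System.out.println(VarDeclNodeGen.create("c", true).describe());
    }
}
```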

The second new element used here is the currentLanguageContext() method. Traditionally, the way to get a reference to a TruffleLanguage Context in @Specialization methods was the @CachedContext annotation. However, that annotation was removed in version 22 of GraalVM. Given that, we have to use a different way: the ContextReference class. The typical way of using this class is storing an instance of it in a private static final field of the Context class, and accessing it through a static helper method that takes a Node as an argument:

public final class EasyScriptLanguageContext {
    private static final TruffleLanguage.ContextReference<EasyScriptLanguageContext> REF =
          TruffleLanguage.ContextReference.create(EasyScriptTruffleLanguage.class);

    public static EasyScriptLanguageContext get(Node node) {
        return REF.get(node);
    }

    // ...
}

With this in place, we can introduce a common ancestor of all EasyScript Nodes, and add a helper method to it that calls that EasyScriptLanguageContext.get() method:

public abstract class EasyScriptNode extends Node {
    protected final EasyScriptLanguageContext currentLanguageContext() {
        return EasyScriptLanguageContext.get(this);
    }
}

And make EasyScriptStmtNode extend it, instead of the Truffle Node:

public abstract class EasyScriptStmtNode extends EasyScriptNode {
    // ...
}

This allows us to use the currentLanguageContext() method from any Node that needs access to the Context in its implementation, like GlobalVarDeclStmtNode above.

Expression Nodes

We need to add new expression classes to our language. The first, and simplest, is the undefined literal expression:

public final class UndefinedLiteralExprNode extends EasyScriptExprNode {
    @Override
    public int executeInt(VirtualFrame frame) throws UnexpectedResultException {
        throw new UnexpectedResultException(Undefined.INSTANCE);
    }

    @Override
    public double executeDouble(VirtualFrame frame) throws UnexpectedResultException {
        throw new UnexpectedResultException(Undefined.INSTANCE);
    }

    @Override
    public Object executeGeneric(VirtualFrame frame) {
        return Undefined.INSTANCE;
    }
}

We return Undefined.INSTANCE in executeGeneric(), and throw UnexpectedResultException for the remaining execute*() methods.
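In case the role of the exception is not obvious, here is a minimal plain-Java sketch of the pattern. Everything in it is a simplified stand-in written for this example (UnexpectedResult is not the real Truffle class, and the node types are hypothetical): a parent first speculates that its child produces an int, and when the child cannot, the exception carries the actual value back so that nothing has to be executed twice.

```java
final class UnexpectedResultSketch {
    /** Simplified stand-in for Truffle's UnexpectedResultException: it carries the actual result. */
    static final class UnexpectedResult extends Exception {
        final Object result;

        UnexpectedResult(Object result) {
            super("unexpected result");
            this.result = result;
        }
    }

    interface ExprNode {
        int executeInt() throws UnexpectedResult;

        Object executeGeneric();
    }

    // placeholder for the Undefined.INSTANCE singleton
    static final Object UNDEFINED = new Object();

    static final class IntLiteral implements ExprNode {
        private final int value;

        IntLiteral(int value) {
            this.value = value;
        }

        @Override
        public int executeInt() {
            return this.value;
        }

        @Override
        public Object executeGeneric() {
            return this.value;
        }
    }

    static final class UndefinedLiteral implements ExprNode {
        @Override
        public int executeInt() throws UnexpectedResult {
            // 'undefined' can never be an int, so hand the real value to the caller
            throw new UnexpectedResult(UNDEFINED);
        }

        @Override
        public Object executeGeneric() {
            return UNDEFINED;
        }
    }

    /** Speculates on an int result, and falls back to the value carried by the exception. */
    static Object evaluate(ExprNode node) {
        try {
            return node.executeInt();
        } catch (UnexpectedResult e) {
            return e.result;
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(new IntLiteral(42)));
        System.out.println(evaluate(new UndefinedLiteral()) == UNDEFINED);
    }
}
```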

The second new expression node is the assignment expression:

@NodeChild(value = "assignmentExpr")
@NodeField(name = "name", type = String.class)
public abstract class GlobalVarAssignmentExprNode extends EasyScriptExprNode {
    protected abstract String getName();

    @Specialization
    protected Object assignVariable(Object value) {
        String variableId = this.getName();
        if (!this.currentLanguageContext().globalScopeObject.updateVariable(variableId, value)) {
            throw new EasyScriptException(this, "'" + variableId + "' is not defined");
        }
        return value;
    }
}

It’s very similar to the GlobalVarDeclStmtNode, but updates the variable in context.globalScopeObject instead of creating it.

The third new expression node is the reference to a variable:

@NodeField(name = "name", type = String.class)
public abstract class GlobalVarReferenceExprNode extends EasyScriptExprNode {
    protected abstract String getName();

    @Specialization
    protected Object readVariable() {
        String variableId = this.getName();
        var value = this.currentLanguageContext().globalScopeObject.getVariable(variableId);
        if (value == null) {
            throw new EasyScriptException(this, "'" + variableId + "' is not defined");
        }
        return value;
    }
}

We also need to change our addition node, to account for the presence of undefined:

@NodeChild("leftNode")
@NodeChild("rightNode")
public abstract class AdditionExprNode extends EasyScriptExprNode {
    @Specialization(rewriteOn = ArithmeticException.class)
    protected int addInts(int leftValue, int rightValue) {
        return Math.addExact(leftValue, rightValue);
    }

    @Specialization(replaces = "addInts")
    protected double addDoubles(double leftValue, double rightValue) {
        return leftValue + rightValue;
    }

    @Fallback
    protected double addWithUndefined(Object leftValue, Object rightValue) {
        return Double.NaN;
    }
}

We add a third specialization that’s annotated with @Fallback, which means it uses the negation of all the other specialization activation conditions (you can only have a single @Fallback specialization). In that specialization, we return Double.NaN, which is how JavaScript addition behaves when at least one of its operands is undefined and the other is a number or undefined.

Note that replacing @Fallback with @Specialization would not have worked here. A specialization with all Object arguments is the most generic one possible, which means that once a Node activates it, the Node will never attempt to activate another one. So, if a given addition in your program was first passed undefined as one of its operands, it would then always return NaN, even if later passed an integer or double, which is obviously not the correct behavior.
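The behavior we want from the three cases can be written down as an ordinary Java method. This is only a sketch of the intended results, under the assumption that every call is dispatched from scratch: the AdditionDispatchSketch class is hypothetical, and it does not model how a real Truffle Node remembers which specializations it has activated. What it does show is that the fallback case is checked on every call, so an earlier undefined operand does not affect later additions.

```java
final class AdditionDispatchSketch {
    // placeholder for the Undefined.INSTANCE singleton
    static final Object UNDEFINED = new Object();

    static Object add(Object left, Object right) {
        if (left instanceof Integer && right instanceof Integer) {
            try {
                // the int case: addExact() throws on overflow instead of wrapping around
                return Math.addExact((Integer) left, (Integer) right);
            } catch (ArithmeticException overflow) {
                // similar in spirit to rewriteOn: continue with the double case below
            }
        }
        if (left instanceof Number && right instanceof Number) {
            // the double case, which also covers mixed int and double operands
            return ((Number) left).doubleValue() + ((Number) right).doubleValue();
        }
        // the fallback case: none of the typed cases matched
        return Double.NaN;
    }

    public static void main(String[] args) {
        System.out.println(add(UNDEFINED, 1));
        System.out.println(add(1, 2));
        System.out.println(add(Integer.MAX_VALUE, 1));
        System.out.println(add(1, 2.5));
    }
}
```

Running main() prints NaN, 3, 2.147483648E9 and 3.5, in that order. The second line is the important one: the addition still produces an integer after it has seen undefined.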

RootNode

And finally, we have our RootNode. It takes an instance of TruffleLanguage and a list of statements in its constructor. It passes the TruffleLanguage to its RootNode superclass with a super() call. In the execute() method, it evaluates all of the statements, and returns the result of executing the last one:

public final class EasyScriptRootNode extends RootNode {
    @Children
    private final EasyScriptStmtNode[] stmtNodes;

    public EasyScriptRootNode(EasyScriptTruffleLanguage truffleLanguage,
            List<EasyScriptStmtNode> stmtNodes) {
        super(truffleLanguage);

        this.stmtNodes = stmtNodes.toArray(new EasyScriptStmtNode[]{});
    }

    @Override
    public Object execute(VirtualFrame frame) {
        Object ret = Undefined.INSTANCE;
        for (EasyScriptStmtNode stmtNode : this.stmtNodes) {
            ret = stmtNode.executeStatement(frame);
        }
        return ret;
    }
}

The one new Truffle thing about this class is the @Children annotation. It’s basically identical to @Child, but used in case the Node has a variable number of subnodes, like in our case.

You might be surprised to see an array used for storing the children, but Truffle actually requires that! Arrays are much easier to convert to native code than collections like List. Because the elements of a Java array can still be replaced when the field holding the array is final, we can also mark the entire field as final (which you can’t do for @Child fields).

The way I’ve dealt with this in EasyScriptRootNode is a pretty common pattern in Truffle: the class takes a collection in its constructor, but converts it to an array internally.
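Stripped of everything Truffle-specific, the pattern looks like the following sketch. The StatementListSketch class and its Stmt interface are hypothetical names used only for illustration, and the "undefined" string is a placeholder for Undefined.INSTANCE.

```java
import java.util.List;

final class StatementListSketch {
    /** Hypothetical stand-in for a statement Node. */
    interface Stmt {
        Object execute();
    }

    // the field is final, but the elements of the array remain replaceable
    private final Stmt[] stmts;

    StatementListSketch(List<Stmt> stmts) {
        // convenient collection in the constructor, array internally
        this.stmts = stmts.toArray(new Stmt[0]);
    }

    Object execute() {
        // a program without statements evaluates to 'undefined'
        Object ret = "undefined";
        for (Stmt stmt : this.stmts) {
            ret = stmt.execute();
        }
        return ret;
    }

    static Object runExample() {
        List<Stmt> program = List.of(() -> 0, () -> "undefined", () -> 3.0);
        return new StatementListSketch(program).execute();
    }

    public static void main(String[] args) {
        System.out.println(runExample());
    }
}
```

Running main() prints 3.0, the result of the last statement in the list.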

Undefined polyglot class

The JavaScript undefined value is represented by the Undefined class. Since there’s only ever a single member of the undefined type, the class is a singleton – that’s why we always refer to it as Undefined.INSTANCE. But that’s not the end of the story with this class.

Since it can now be returned as the result of evaluating EasyScript (in a program like let a; a), we need to make it a language interop value, so that it can be handled correctly by the GraalVM polyglot API.

We do that by implementing the TruffleObject interface from the com.oracle.truffle.api.interop package (it’s a marker interface, so doesn’t have any methods), and annotating the class with the @ExportLibrary annotation from the com.oracle.truffle.api.library package, passing it the class of InteropLibrary from the com.oracle.truffle.api.interop package, and then implementing the messages from that library.

The complete list of messages can be found in the documentation for InteropLibrary. You implement messages by adding instance methods to the class, and annotating them with @ExportMessage. Note that the receiver object is implied to be the instance of your implementing class, so your implementations should skip the first argument compared to the library methods.

The name of the method in the implementing class must match the name from the library, or you can use the name attribute of @ExportMessage to change it. For example, the isNull() message can be implemented by a method declared as @ExportMessage boolean isNull(), or by @ExportMessage(name = "isNull") boolean representsNull().

Note that the methods implementing the messages do not have to be public. It’s common practice to make them package-private, to not pollute the public API of the class.

In our Undefined class, we need to implement the isNull() message, to return true. We also implement the toDisplayString() message, which is what the Value class that wraps our polyglot instance uses when toString() is called on it; we just return the "undefined" string from that method.

In summary, the code for our class looks as follows:

import com.oracle.truffle.api.interop.InteropLibrary;
import com.oracle.truffle.api.interop.TruffleObject;
import com.oracle.truffle.api.library.ExportLibrary;
import com.oracle.truffle.api.library.ExportMessage;

@ExportLibrary(InteropLibrary.class)
public final class Undefined implements TruffleObject {
    public static final Undefined INSTANCE = new Undefined();

    private Undefined() {
    }

    @ExportMessage
    boolean isNull() {
        return true;
    }

    @ExportMessage
    Object toDisplayString(@SuppressWarnings("unused") boolean allowSideEffects) {
        return this.toString();
    }

    @Override
    public String toString() {
        return "undefined";
    }
}

This allows us to write the following unit test:

    @Test
    public void correctly_returns_undefined() {
        Context context = Context.create();
        Value result = context.eval("ezs",
                "var a; " +
                "a"
        );

        assertTrue(result.isNull());
        assertEquals("undefined", result.toString());
    }

Surfacing the global bindings

With all of the above code in place, we can execute the program we set as our goal at the beginning of the article:

    @Test
    public void evaluates_statements() {
        Context context = Context.create();
        Value result = context.eval("ezs",
                "var a = 0; " +
                "let b; " +
                "const c = 2.0; " +
                "b = 1; " +
                "a + b + c"
        );

        assertEquals(3.0, result.asDouble(), 0.0);
    }

However, there’s one more thing we should do to make EasyScript a good citizen of the GraalVM polyglot ecosystem. The Context class allows retrieving the global variables of a given language with the getBindings(String languageId) method. We should allow this capability for EasyScript as well; in order to do that, we have to add a few elements to our implementation.

First of all, we need to override the getScope() method in our TruffleLanguage class. We need to return an object allowing access to the global variables from it, which is GlobalScopeObject in our case. Conveniently, the getScope() method receives the language context as its argument, so, because of the way we designed our classes, we can just return context.globalScopeObject from it:

@TruffleLanguage.Registration(id = "ezs", name = "EasyScript")
public final class EasyScriptTruffleLanguage
        extends TruffleLanguage<EasyScriptLanguageContext> {
    // ...

    @Override
    protected Object getScope(EasyScriptLanguageContext context) {
        return context.globalScopeObject;
    }
}

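The second set of changes required is implementing the interop library in GlobalScopeObject, similarly to what we did in the Undefined class. We start with implementing the TruffleObject marker interface. Because we return this object from the getScope() method in TruffleLanguage, the first message that we have to implement is isScope(), which returns true. That in turn requires implementing a few other messages: hasMembers(), isMemberReadable(), readMember() and getMembers(), which expose the variables as members of the scope; toDisplayString(), which returns the name of the scope; and hasLanguage() together with getLanguage(), which associate the scope with EasyScript.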

Taking it all together, our class now looks as follows:

@ExportLibrary(InteropLibrary.class)
public final class GlobalScopeObject implements TruffleObject {
    private final Map<String, Object> variables = new HashMap<>();
    private final Set<String> constants = new HashSet<>();

    // ...

    @ExportMessage
    boolean isScope() {
        return true;
    }

    @ExportMessage
    boolean hasMembers() {
        return true;
    }

    @ExportMessage
    boolean isMemberReadable(String member) {
        return this.variables.containsKey(member);
    }

    @ExportMessage
    Object readMember(String member) throws UnknownIdentifierException {
        Object value = this.variables.get(member);
        if (null == value) {
            throw UnknownIdentifierException.create(member);
        }
        return value;
    }

    @ExportMessage
    Object getMembers(@SuppressWarnings("unused") boolean includeInternal) {
        return new GlobalVariableNamesObject(this.variables.keySet());
    }

    @ExportMessage
    Object toDisplayString(@SuppressWarnings("unused") boolean allowSideEffects) {
        return "global";
    }

    @ExportMessage
    boolean hasLanguage() {
        return true;
    }

    @ExportMessage
    Class<? extends TruffleLanguage<?>> getLanguage() {
        return EasyScriptTruffleLanguage.class;
    }
}

@ExportLibrary(InteropLibrary.class)
final class GlobalVariableNamesObject implements TruffleObject {
    private final List<String> names;

    GlobalVariableNamesObject(Set<String> names) {
        this.names = new ArrayList<>(names);
    }

    @ExportMessage
    boolean hasArrayElements() {
        return true;
    }

    @ExportMessage
    long getArraySize() {
        return this.names.size();
    }

    @ExportMessage
    boolean isArrayElementReadable(long index) {
        return index >= 0 && index < this.names.size();
    }

    @ExportMessage
    Object readArrayElement(long index) throws InvalidArrayIndexException {
        if (!this.isArrayElementReadable(index)) {
            throw InvalidArrayIndexException.create(index);
        }
        return this.names.get((int) index);
    }
}

With this code in place, we can write the following unit test retrieving EasyScript’s global bindings:

    @Test
    public void surfaces_global_bindings() {
        this.context.eval("ezs",
                "var a = 1; " +
                "let b = 2 + 3; " +
                "const c = 4.0; "
        );

        Value globalBindings = this.context.getBindings("ezs");
        assertFalse(globalBindings.isNull());
        assertTrue(globalBindings.hasMembers());
        assertTrue(globalBindings.hasMember("a"));
        assertEquals(Set.of("a", "b", "c"), globalBindings.getMemberKeys());

        Value b = globalBindings.getMember("b");
        assertEquals(5, b.asInt());
    }

Summary

Phew! Something seemingly as simple as global variables turned out to be a lot of work, but we finally managed to power through it.

As usual, the full working code from the article is available on GitHub.

In the next part of the series, we will finally add function calls to the language, so make sure you don’t miss it!

