A universal interpreter

A universal interpreter

In the previous article I explained the basic idea behind a technique called Böhm-Berarducci encoding of algebraic data types, and showed a way to implement this technique in Raku. Unless you are already familiar with this formalism, I recommend you read that article first.

In this article I want to illustrate how the Böhm-Berarducci (BB) encoding of a data structure based on algebraic data types can be considered as a universal interpreter. What this means is that it is easy to perform computations that turn the data structure into something else. As an example, I will demonstrate how to create an evaluator and pretty-printer for a parsed polynomial expression.

A parse tree type

Consider expressions of the form a*x^2+b*x+c or x^3+1 or x*y^2-x^2*y. Let’s assume we have a parser for such an expression, for example built using parser combinators. Let’s also assume that this parser returns the parsed data as an algebraic data type, defined in Haskell as:

data Term = 
      Var String
    | Par String 
    | Const Int
    | Pow Term Int
    | Add [Term]
    | Mult [Term]

and in Raku:

role Term {}
role Var [Str \v] does Term {
    has Str $.var = v;
}
role Par [Str \p] does Term {
    has Str $.par = p;
}
role Const [Int \c] does Term {
    has Int $.const = c;
}
role Pow [Term \t, Int \n] does Term {
    has Term $.term = t;
    has Int $.exp = n;
}
role Add [Array[Term] \ts] does Term {
    has Array[Term] $.terms = ts;
}
role Mult [Array[Term] \ts] does Term {
    has Array[Term] $.terms = ts;
}

The additional complexity compared to the types discussed in the previous article is that this type is recursive: the Pow, Add and Mult roles take parameters of type Term.

Before we look at the BB encoding, let’s first write a pretty-printer for this type, using recursive multi subs.

# Pretty-print a Term 
multi sub ppTerm(Var \t) { t.var }
multi sub ppTerm(Par \c) { c.par }
multi sub ppTerm(Const \n) { "{n.const}" }
multi sub ppTerm(Pow \pw){ 
    ppTerm(pw.term) ~ '^' ~ "{pw.exp}" 
}
multi sub ppTerm(Add \t) { 
    my @pts = map {ppTerm($_)}, |t.terms;
    "("~join( " + ", @pts)~")"
}
multi sub ppTerm(Mult \t){ 
    my @pts = map {ppTerm($_)}, |t.terms;
    join( " * ", @pts)
}

In the same way we can write an evaluator for this type:

# Evaluate a Term 
multi sub evalTerm(%vars,  %pars, Var \t) { %vars{t.var} }
multi sub evalTerm(%vars,  %pars,Par \c) { %pars{c.par} }
multi sub evalTerm(%vars,  %pars,Const \n) { n.const }
multi sub evalTerm(%vars,  %pars,Pow \pw){ 
    evalTerm(%vars,  %pars,pw.term) ** pw.exp 
}
multi sub evalTerm(%vars,  %pars,Add \t) { 
    my @pts = map {evalTerm(%vars,  %pars,$_)}, |t.terms;
    [+] @pts
}
multi sub evalTerm(%vars,  %pars,Mult \t){ 
    my @pts = map {evalTerm(%vars,  %pars,$_)}, |t.terms;
    [*] @pts
}

Example parse trees

As an example, let’s create the parse tree for a few expressions using the Term type.

# a*x^2 + b*x + x
my \qterm1 = Add[ 
    Array[Term].new(
    Mult[ Array[Term].new(
        Par[ "a"].new, 
        Pow[ Var[ "x"].new, 2].new) 
        ].new,
    Mult[
        Array[Term].new(
            Par[ "b"].new, 
            Var[ "x"].new) 
        ].new,
    Par[ "c"].new
    )
    ].new;

#   x^3 + 1    
my \qterm2 = Add[ 
    Array[Term].new(
        Pow[ Var[ "x"].new, 3].new, 
        Const[ 1].new
    )
    ].new;

#   qterm1 * qterm2    
my \qterm = Mult[ 
    Array[Term].new(
        qterm1, qterm2
    )
    ].new;

Calling the pretty-printer and evaluator on this term:

say ppTerm( qterm); # => (a * x^2 + b * x + c) * (x^3 + 1)

say evalTerm(
    {"x" => 2}, {"a" =>2,"b"=>3,"c"=>4},  qterm
); # => 162

BB encoding of the parse tree type

The BB encoding of the Term algebraic data type in Raku is pleasingly compact:

role TermBB[&f] {
    method unTermBB(
        &var:(Str --> Any),
        &par:(Str --> Any),
        &const:(Int --> Any),
        &pow:(Any,Int --> Any),
        &add:(Array[Any] --> Any),
        &mult:(Array[Any] --> Any) 
        --> Any
    ) {
        f(&var,&par,&const,&pow,&add,&mult);
    }
}

It would of course be even more compact without the signatures, but then we’d have no information about the encoded type.

We could of course use this type directly, but instead I want to look at how we can convert between Term and TermBB.

As before, we create our little helpers. Each of the functions below is a constructor which generates the TermBB instance for the corresponding alternative in the Term algebraic data type. (When Raku’s macro language is more developed, we will be able to generate these automatically.)

sub VarBB(Str \s --> TermBB) { 
    TermBB[ 
        sub (\v, \c, \n, \p, \a, \m) { v.(s) }
    ].new;
    }
sub ParBB(Str \s --> TermBB) { 
    TermBB[ 
        sub (\v, \c, \n, \p, \a, \m) { c.(s) }
    ].new;
    }
sub ConstBB(Int \i --> TermBB) { 
    TermBB[ 
        sub (\v, \c, \n, \p, \a, \m) { n.(i) }
    ].new;
    }    
sub PowBB( TermBB \t, Int \i --> TermBB) {
    TermBB[  sub (\v, \c, \n, \p, \a, \m) { 
        p.( t.unTermBB( v, c, n, p, a, m ), i);
    }
    ].new;
}
sub AddB( Array[TermBB] \ts --> TermBB) {
    TermBB[  sub (\v, \c, \n, \p, \a, \m) { 
        a.( map {.unTermBB( v, c, n, p, a, m )}, ts )
    }
    ].new;
}
sub MultBB(  Array[TermBB] \ts --> TermBB) { 
    TermBB[  sub (\v, \c, \n, \p, \a, \m) { 
        m.( map {.unTermBB( v, c, n, p, a, m )}, ts )
    }
    ].new;
}

The interesting generators are PowBB, AddBB and MultBB because they are recursive. In PowBB, the function passed as parameter to the TermBB role constructor calls p which has a signature of :(Any,Int --> Any), but actually requires an argument of the same type as the return value (we need a -> Int -> a). The argument t is of type TermBB which is a wrapper around a function which, when applied, will return the right type. In the Raku implementation, this function is the method unTermBB. So we need to call t.unTermBB( ... ). In AddBB and MultBB, we have an Array[TermBB] so we need to call unTermBB on every element, hence the map call.

Using these generators we can write a single function to convert the algebraic data type into its BB encoding. Unsurprisingly, it is very similar to the pretty-printer and evaluator we wrote for Term instances:

# Turn a Term into a BB Term
multi sub termToBB(Var \t  ) { VarBB(t.var)}
multi sub termToBB(Par \c  ) { ParBB( c.par)}
multi sub termToBB(Const \n) {ConstBB(n.const)}
multi sub termToBB(Pow \pw ) { 
    PowBB( termToBB(pw.term), pw.exp)
}
multi sub termToBB(Add \t  ) { 
    AddBB( typed-map( TermBB, t.terms, &termToBB ))
}
multi sub termToBB(Mult \t ) { 
    MultBB( typed-map( TermBB, t.terms, &termToBB ))
}

# map &f and return in an Array of type T
sub typed-map (\T,\lst,&f) {
    Array[T].new(map {f($_) }, |lst )
}

Because PowBB, AddBB and MultBB require a TermBB, we need to call termToBB on the Term fields. And because AddBB and MultBB take an array of Term, we need a map. However, Raku’s map returns values of type Seq, so we need an explicit conversion into Array.

We can now convert any data structure of type Term into its BB encoding:

my \qtermbb = termToBB( qterm);

say qtermbb.raku; # => TermBB[Sub].new

Interpreter 1: Pretty-printer with BB encoding

To create a pretty-printer for the BB-encoded type, we write implementations for each alternative, and the unTermBB call magically combines these.

sub ppTermBB(TermBB \t --> Str){ 
    sub var( \x ) { x }
    sub par( \x ) { x }
    sub const(\x ) { "{x}" }
    sub pow( \t, \m ) { t ~ "^{m}" } 
    sub add( \ts ) { "("~join( " + ", ts)~")" }
    sub mult( \ts ) { join( " * ", ts) }
    t.unTermBB( &var, &par, &const, &pow, &add, &mult);
}

Compared with ppTerm (copied below for convenience), the main differences are that there is no recursion and no need to map anything. We also don’t need a multi sub to pattern match on the constructors, and there is no need to unpack the values stored in the type using attribute accessors. As a result, the BB version is markedly less cluttered.

multi sub ppTerm(Var \t --> Str) { t.var }
multi sub ppTerm(Par \c --> Str) { c.par }
multi sub ppTerm(Const \n --> Str) { "{n.const}" }
multi sub ppTerm(Pow \pw --> Str){ 
    ppTerm(pw.term) ~ '^' ~ "{pw.exp}" 
}
multi sub ppTerm(Add \t --> Str) { 
    my @pts = map {ppTerm($_)}, |t.terms;
    "("~join( " + ", @pts)~")"
}
multi sub ppTerm(Mult \t --> Str){ 
    my @pts = map {ppTerm($_)}, |t.terms;
    join( " * ", @pts)
}

Interpreter 2: Evaluator with BB encoding

And an evaluator is equally simple:

sub evalTermBB( %vars,  %pars, \t) {
    t.unTermBB( 
        -> \x { %vars{x} }, 
        -> \x { %pars{x} },
        -> \x {x},
        -> \t,\m { t ** m},
        -> \ts { [+] ts},
        -> \ts { [*] ts}
    );
}

As with evalTerm below, we pass hashes for variable and parameter definitions as arguments to provide context for the evaluation. In the BB version we need to do this only once, rather than for every multi variant, so I have written it below using a given/when. Even then, the BB version is a lot cleaner, for the same reasons as above.

sub evalTerm(%vars,  %pars, Term \t) {
    given t {
        when Var { %vars{t.var} }
        when Par { %pars{t.par} }
        when Const { t.const }
        when Pow { evalTerm(%vars,  %pars,t.term) ** t.exp }
        when Add {
            my @pts = map {
                evalTerm(%vars,  %pars,$_)
                }, |t.terms;
            [+] @pts
        }
        when Mult { 
            my @pts = map {
                evalTerm(%vars,  %pars,$_)
                }, |t.terms;
            [*] @pts
        }
    }
}

Interpreter 3: Converting TermBB to Term

Finally, let’s look at converting TermBB to Term. This is yet another type of interpreter so we can follow exactly the same approach as before:

sub toTerm(TermBB \t --> Term){ 
        sub var( \x ) { Var[x].new }
        sub par( \x ) { Par[x].new }
        sub const( $x ) { Const[$x].new }
        sub pow( \t, $m ) { Pow[ t, $m].new } 
        sub add( \ts ) { Add[ ts ].new }
        sub mult( \ts ) { Mult[ ts ].new) }
        t.unTermBB( &var, &par, &const, &pow, &add, &mult);
}

say toTerm(qtermbb).raku;

Using the BB type directly

In the examples above I have created the data structures using the Term type and converted the result to a TermBB type. We can of course also directly use the BB type. If we don’t use strict typing and make the argument of Add and Mult slurpy, we get a nice and clean representation:

# a*x^2 + b*x + x
my \qtermbb1 = AddBB(
    MultBB( 
        ParBB( "a"), 
        PowBB( VarBB( "x"), 2) 
        ),
    MultBB(
        ParBB( "b"), 
        VarBB( "x") 
        ),
    ParBB( "c")
);

#   x^3 + 1    
my \qtermbb2 = AddBB( 
    PowBB( VarBB( "x"), 3), 
    ConstBB(1)
);

#   qterm1 * qterm2    
my \qtermbb3 = MultBB( 
    qterm1, qterm2
);
    

This is structurally very similar to the examples using the Term type. We can obtain exactly the same representation by using a slurpy helper function to wrap the role constructors for Term. See the code in no-b-timing.raku` and ubb-timing.raku for details.

The code as presented above is not entirely correct: I have not always typed everything explicitly, but the explicit signatures in the role definition will cause type errors unless everything is explicitly typed. See the code in tbb-timing.raku for details.

The code in no-bb-timing and ubb-timing is comparable in terms of complexity. I ran a timing test, and the BB implementation of the algebraic data type is about 20% slower than the ‘ordinary’ implementation. However, the fully-typed version tbb-timing is three times slower. Types in Raku are clearly not zero-cost abstractions.

On the other hand, somewhat paradoxically, we don’t really need this explicit typing. It is useful to write down the function types for the BB encoding, and I think it helps with the explanations, but the actual type safety comes from the algebraic data types that we created.

Conclusion

In this article and the previous one I have shown another way to implement algebraic data types in Raku. As with the approach discussed in ‘Roles as Algebraic Data Types in Raku’, I use a role to create the type. However, in this approach the entire data structure is encoded as a function using the Böhm-Berarducci encoding. From a type theoretical perspective, both approaches are precisely equivalent. In terms of coding effort and performance, both approaches are comparable.

The advantage of the BB approach is that because the data is encoded as a function, it becomes easier to create interpreters for the data type, and I have illustrated this with a pretty-printer and evaluator for a parsed expression. All interpreters for BB types have the same structure, which is why I call it a universal interpreter. The key feature is that these interpreters do not require any explicit recursion.

The complete code for both articles is in universal-interpreter.raku