4 ways to build a simple calculator in Python - Part 3

In this concluding part of the series, I will build an abstract syntax tree, or AST. An AST holds the information present in the arithmetic expression and can be evaluated to yield the result. It is built from the grammar rules, just like the translator in part 2, and evaluated using the design pattern known as the visitor pattern. Note that Python's ast module can be used to build such a tree, as I will show below, but it is more instructive to build it from first principles. The ast module is a general tool for constructing ASTs for Python code, not just for arithmetic expressions.

One may argue that building an AST is overkill for simple arithmetic expressions - the method given in part 1 or in part 2 of this series should suffice. But the purpose here is to get a peek at how programming languages parse expressions (and code in general) behind the scenes.

What is an AST?

An AST is a tree-like structure that holds the essential information in an expression. The interior nodes are the operators of the expression while the leaves are the operands.

In contrast, a parse tree is constructed from the grammar, with the nonterminals as the inner nodes and the terminals, or tokens of the expression, as the leaves.

The diagram below shows an AST and a parse tree for the expression 5 * (2 + 4).

Example AST and parse tree

The diagram shows that the AST comprises only the operators and operands of the expression, without the parentheses, while the parse tree is more elaborate - it consists of the nonterminals of the grammar (expr, term and factor) along with the expression itself (reading the leaves from left to right).

The Python ast module

We can use the Python standard library module ast to build an AST. I will use the same expression as above to illustrate.

>>> import ast
>>> tree = ast.parse('5 * (2 + 4)', mode='eval')
>>> ast.dump(tree)
'Expression(body=BinOp(left=Num(n=5), op=Mult(), right=BinOp(left=Num(n=2), op=Add(), right=Num(n=4))))'

We see that the AST has, among other classes, two classes named BinOp and Num, which stand for the interior and leaf nodes of the AST respectively. (On Python 3.8 and newer, ast.dump shows Constant nodes in place of Num, but the structure is the same.) I will use these class names in the design of the parser below.
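
To get a feel for the structure, the nodes can also be inspected directly in the REPL; for the tree parsed above, the body attribute should hold the BinOp for the multiplication, and its op attribute the operator:

>>> type(tree.body).__name__
'BinOp'
>>> type(tree.body.op).__name__
'Mult'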

The tree above can be evaluated to yield a value. It first needs to be compiled into a code object with the compile builtin function; the code object is then passed to eval to obtain the result.

>>> co = compile(tree, filename='<string>', mode='eval')
>>> eval(co)
30

The lexer

The lexical analyzer or lexer for the translator is identical to what was built in part 2. I am repeating it below for convenience.

import re

class Num:
    def __init__(self, val):
        self.value = float(val) if '.' in val else int(val)

    def __repr__(self):
        return f'Num: {self.value}'

def lexan():
    '''Lexical analyzer - creates a stream of tokens'''

    inp = input('Enter the expression ')

    i = 0
    while i < len(inp):
        c = inp[i]
        if c.isdigit() or c == '.':
            val = re.match(r'[.0-9]+', inp[i:])[0]
            yield Num(val)
            i += len(val)
        else:
            if inp[i:i + 2] == '**':
                yield '**'
                i += 1
            elif c != ' ':
                yield c
            i += 1
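
As a quick check, feeding the running example to the lexer should produce a token stream like the following (the Num entries are printed using the __repr__ defined above):

>>> list(lexan())
Enter the expression 5 * (2 + 4)
[Num: 5, '*', '(', Num: 2, '+', Num: 4, ')']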

The parser

The grammar for the parser will be the same as in part 2. I have repeated it below:

Translator grammar
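
In EBNF-like notation (a reconstruction from the parser functions below, with braces denoting repetition and brackets an optional part), the grammar reads:

expr   -> term { ('+' | '-') term }
term   -> factor { ('*' | '/') factor }
factor -> base [ '**' factor ]
base   -> '(' expr ')' | '-' base | number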

However, instead of performing the operations while parsing, we build nodes of the tree. There are three kinds of nodes - Num, BinOp and UMinus. Num represents a number - a sequence of digits with an optional decimal point. BinOp stands for a binary operation and has three parts - left (the left operand), op (the operator) and right (the right operand). UMinus stands for a unary minus operation and has two parts - op (the - sign) and value (the operand being negated). The Python classes for BinOp and UMinus are shown below - Num was shown above:

class BinOp:
    def __init__(self, op, left, right):
        self.left = left
        self.op = op
        self.right = right

    def __repr__(self):
        return f'<BinOp: left={self.left} op={self.op} right={self.right}>'

class UMinus:
    def __init__(self, value):
        self.op = '-'
        self.value = value

    def __repr__(self):
        return f'<UMinus: value={self.op}{self.value}>'
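
As a small illustration (not part of the program), a node can be built by hand in the REPL and displayed via its __repr__:

>>> BinOp(op='*', left=Num('5'), right=Num('2'))
<BinOp: left=Num: 5 op=* right=Num: 2>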

The rest of the code for the parser is similar to what was built in part 2. The difference is that, in the places where operations were performed, we now build nodes instead.

def expr():
    '''Function to start the parsing. It builds the `+` and `-` nodes as
    appropriate.'''

    result = term()
    while True:
        if lookahead == '+':
            match('+')
            result = BinOp(left=result, op='+', right=term())
        elif lookahead == '-':
            match('-')
            result = BinOp(left=result, op='-', right=term())
        else:
            break

    return result


def term():
    '''This function builds the `*` and `/` nodes.'''

    result = factor()
    while True:
        if lookahead == '*':
            match('*')
            result = BinOp(left=result, op='*', right=factor())
        elif lookahead == '/':
            match('/')
            result = BinOp(left=result, op='/', right=factor())
        else:
            break

    return result


def factor():
    '''This recursive function builds the node for the right associative
    power operation.'''

    result = base()
    if lookahead == '**':
        match('**')
        result = BinOp(left=result, op='**', right=factor())

    return result


def base():
    '''This function builds a node for a parenthesized expression, a unary
    minus or a plain number.'''

    if lookahead == '(':
        match('(')
        result = expr()
        match(')')
    elif isinstance(lookahead, Num):
        result = lookahead
        match(lookahead)
    elif lookahead == '-':
        match('-')
        result = UMinus(value=base())
    else:
        match('number')

    return result


def match(t):
    '''Checks that the current token matches the expected one and advances
    to the next token.'''

    global lookahead
    if t == lookahead:
        try:
            lookahead = next(token_gen)
        except StopIteration:
            lookahead = ''
    else:
        raise RuntimeError(f'Malformed input. Token {lookahead}')

To test the parser, we can code a driver function as follows:

def main():
    '''This function sets a starting value for the `lookahead` and calls the
    parser.'''

    global token_gen, lookahead
    token_gen = lexan()
    lookahead = next(token_gen)

    tree = expr()
    match('')
    print(tree)

Running main with the string 5 * (2 + 4) gives the following output:

>>> main()
Enter the expression 5 * (2 + 4)
<BinOp: left=Num: 5 op=* right=<BinOp: left=Num: 2 op=+ right=Num: 4>>

Note the similarity with what was obtained above by a dump of the AST using Python's ast module.
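
A run with a leading minus should show the UMinus node in the tree as well:

>>> main()
Enter the expression -(1 + 2) / 2
<BinOp: left=<UMinus: value=-<BinOp: left=Num: 1 op=+ right=Num: 2>> op=/ right=Num: 2>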

Now we discuss the evaluation of the AST.

The visitor pattern

This is a design pattern commonly used in parsing and compiling, and there are two ideas inherent in it. First, it separates the manipulation of a complex data structure, like an AST, from the data structure itself - the classes Num, BinOp and UMinus do not carry any logic to manipulate the data they hold. Second, the visitor class, given below, uses a clever technique to dispatch actions to different methods based on the type of the node, recursing through the tree in the process. The visitor pattern can be used to evaluate an AST or to take some other action, such as generating target instructions.

class NodeVisitor:
    def visit(self, node):

        method_name = 'visit_' + type(node).__name__
        method = getattr(self, method_name, None)
        if method:
            return method(node)
        else:
            self.generic_visit(node)

    def generic_visit(self, node):

        raise RuntimeError(f'No method found for node {node}')

class Visitor(NodeVisitor):
    def visit_BinOp(self, node):

        if node.op == '+':
            return self.visit(node.left) + self.visit(node.right)
        elif node.op == '-':
            return self.visit(node.left) - self.visit(node.right)
        elif node.op == '*':
            return self.visit(node.left) * self.visit(node.right)
        elif node.op == '/':
            return self.visit(node.left) / self.visit(node.right)
        elif node.op == '**':
            return self.visit(node.left) ** self.visit(node.right)

    def visit_UMinus(self, node):

        return -self.visit(node.value)

    def visit_Num(self, node):

        return node.value

The NodeVisitor class dispatches a node of the AST to a method based on the type of the node. It uses the getattr builtin function to look up the method dynamically from its name. If no method is found, the generic_visit method is called to raise an exception. The Visitor class inherits from NodeVisitor and provides visit methods for the different node types. These methods call visit recursively on the child nodes to traverse the AST.
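
As a standalone check of the visitor, independent of the parser, a hand-built node can be evaluated directly:

>>> v = Visitor()
>>> v.visit(BinOp(op='+', left=Num('2'), right=Num('3')))
5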

To test this code, we can modify the driver function as shown below:

def main():
    '''This function sets a starting value for the `lookahead` and calls the
    parser.'''

    global token_gen, lookahead
    token_gen = lexan()
    lookahead = next(token_gen)

    tree = expr()
    match('')
    v = Visitor()
    result = v.visit(tree)
    print(result)

Following are some tests carried out in the REPL:

>>> main()
Enter the expression 5 * (2 + 4)
30
>>> main()
Enter the expression -(1 + 2) / 2
-1.5
>>> main()
Enter the expression 10 + ( 3 * 2 ) ** 2 ** 3 - 25 / 5
1679621.0

The last example shows the same result as obtained in parts 1 and 2 of the series.
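
As a sanity check, evaluating the same expression with Python's builtin eval should give the same answer:

>>> eval('10 + ( 3 * 2 ) ** 2 ** 3 - 25 / 5')
1679621.0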

Complete listing

The complete listing of the translator program, including the lexer, the parser, the AST node classes, the visitor and the driver:

import re


class Num:
    def __init__(self, val):
        self.value = float(val) if '.' in val else int(val)

    def __repr__(self):
        return f'Num: {self.value}'


class BinOp:
    def __init__(self, op, left, right):
        self.left = left
        self.op = op
        self.right = right

    def __repr__(self):
        return f'<BinOp: left={self.left} op={self.op} right={self.right}>'


class UMinus:
    def __init__(self, value):
        self.op = '-'
        self.value = value

    def __repr__(self):
        return f'<UMinus: value={self.op}{self.value}>'


class NodeVisitor:
    def visit(self, node):

        method_name = 'visit_' + type(node).__name__
        method = getattr(self, method_name, None)
        if method:
            return method(node)
        else:
            self.generic_visit(node)

    def generic_visit(self, node):

        raise RuntimeError(f'No method found for node {node}')


class Visitor(NodeVisitor):
    def visit_BinOp(self, node):

        if node.op == '+':
            return self.visit(node.left) + self.visit(node.right)
        elif node.op == '-':
            return self.visit(node.left) - self.visit(node.right)
        elif node.op == '*':
            return self.visit(node.left) * self.visit(node.right)
        elif node.op == '/':
            return self.visit(node.left) / self.visit(node.right)
        elif node.op == '**':
            return self.visit(node.left) ** self.visit(node.right)

    def visit_UMinus(self, node):

        return -self.visit(node.value)

    def visit_Num(self, node):

        return node.value


def lexan():
    '''Lexical analyzer - creates a stream of tokens'''

    inp = input('Enter the expression ')

    i = 0
    while i < len(inp):
        c = inp[i]
        if c.isdigit() or c == '.':
            val = re.match(r'[.0-9]+', inp[i:])[0]
            yield Num(val)
            i += len(val)
        else:
            if inp[i:i + 2] == '**':
                yield '**'
                i += 1
            elif c != ' ':
                yield c
            i += 1


def expr():
    '''Function to start the parsing. It builds the `+` and `-` nodes as
    appropriate.'''

    result = term()
    while True:
        if lookahead == '+':
            match('+')
            result = BinOp(left=result, op='+', right=term())
        elif lookahead == '-':
            match('-')
            result = BinOp(left=result, op='-', right=term())
        else:
            break

    return result


def term():
    '''This function builds the `*` and `/` nodes.'''

    result = factor()
    while True:
        if lookahead == '*':
            match('*')
            result = BinOp(left=result, op='*', right=factor())
        elif lookahead == '/':
            match('/')
            result = BinOp(left=result, op='/', right=factor())
        else:
            break

    return result


def factor():
    '''This recursive function builds the node for the right associative
    power operation.'''

    result = base()
    if lookahead == '**':
        match('**')
        result = BinOp(left=result, op='**', right=factor())

    return result


def base():
    '''This function builds a node for a parenthesized expression, a unary
    minus or a plain number.'''

    if lookahead == '(':
        match('(')
        result = expr()
        match(')')
    elif isinstance(lookahead, Num):
        result = lookahead
        match(lookahead)
    elif lookahead == '-':
        match('-')
        result = UMinus(value=base())
    else:
        match('number')

    return result


def match(t):
    '''Checks that the current token matches the expected one and advances
    to the next token.'''

    global lookahead
    if t == lookahead:
        try:
            lookahead = next(token_gen)
        except StopIteration:
            lookahead = ''
    else:
        raise RuntimeError(f'Malformed input. Token {lookahead}')


def main():
    '''This function sets a starting value for the `lookahead` and calls the
    parser.'''

    global token_gen, lookahead
    token_gen = lexan()
    lookahead = next(token_gen)

    tree = expr()
    match('')
    v = Visitor()
    result = v.visit(tree)
    print(result)

if __name__ == '__main__':
    main()


This concludes the post and the series. In this post, you learned about ASTs and the visitor pattern for evaluating them. ASTs are used by compilers as a representation of code from which target language instructions are generated; here I applied the concept to a simple calculator. Any of the techniques presented in this 3-part series can be used to build the calculator.

References

  1. Compilers: Principles, Techniques, and Tools by Aho, Sethi and Ullman

  2. Let's Build a Simple Interpreter, Part 7 - blog post by Ruslan Spivak

  3. Python Cookbook by David Beazley and Brian Jones
