- Savidu Dias

# Developing a Simple Programming Language - Part 4: Associativity, Precedence, and Grammars

Greetings, my child. In the last chapter, we learned about parsing arithmetic expressions with multiple additions and subtractions. In addition to that, we also learned about syntax diagrams, and how they can be used to design programming languages.

You might be wondering that you signed up to do a programming language, but we’re in part 4, and still making a calculator. Well, worry no further! What we are already doing is something that is going to be implemented in our Javascript interpreter, which is **math**. Implementing basic arithmetic operations is one of the most complex operations in developing a programming language. Once we get through this hurdle, the rest is going to be a piece of cake. I promise :)

Also, don’t forget that we have learned so many things apart from making a basic calculator. We learned the basic concepts of building an interpreter, like; its basic components, and Syntax Diagrams. Getting a hang of these basic concepts will help you a lot in the long run, instead of immediately diving into the deep end. Even if you’re not 100% sure of how everything works right now, there will come a time where things will “click” and all the pieces will come together!

The best piece of advice I could give anybody learning to develop an interpreter is to read the explanations in the code, go through the code, and write the code yourself!

Today, we will be learning how to parse, and interpret arithmetic expressions that have any number of addition, subtraction, division, and multiplication operators. We will be able to develop an interpreter that is able to evaluate expressions like “12 * 3 + 1 -13 / 2”.

Before getting into the good stuff, let me teach you some basic math. Let’s talk about **associativity** and **precedence** of operators. In ordinary arithmetic, and most programming languages, arithmetic operations are **left associative**. What does this mean?

1 + 2 + 3 is equivalent to (1 + 2) + 3

1 - 2 - 3 is equivalent to (1 - 2) - 3

1 * 2 * 3 is equivalent to (1 * 2) * 3

1 / 2 / 3 is equivalent to (1 / 2) / 3

Consider the operand “2” in the expression “1 + 2 + 3”. It has plus signs on both sides. We need to know which plus sign applies to 2. Also, “2” is known as the **operand**, while plus is known as the **operator**. Generally, we all agree that that an operand that has plus signs on both sides belongs to the operator on the left. Therefore, we say that the operator ‘+’ is left associative. That’s why an operation like 1 + 2 + 3 is equivalent to (1 + 2) - 3 by the associativity convention.

Now let’s talk about precedence. You may remember learning something called “BODMAS” back in school (PEMDAS for some people). Most programming languages such as Java and C++ tend to follow the rules of BODMAS, although both methods should yield the same results if you did the math right :)

If you are not already aware, BODMAS refers to the precedence of operators. Imagine an operation like 1 + 2 * 3, where there are different kinds of operators on both sides of the operand 2. Is the correct expression (1 + 2) * 3 or 1 + (2 * 3)? Here is where BODMAS comes to save the day, It stands for:

B - Brackets O - Order D - Division M - Multiplication A - Addition S - Subtraction

Coming back to the problem at hand, by looking at the BODMAS rule for the precedence of operations, we can see that 1 + (2 * 3) is the correct form of the expression. This rule says that multiplication operations (*) take the operand before + does, and therefore has a higher precedence. It is agreed upon that multiplications and divisions have a higher precedence than additions and subtractions.

To recap, the **associativity** rule is considered when the same operation is on both sides of an operand. The **precedence** rule is considered when different operations are there on both sides of the operand.

How do we develop a Syntax Diagrams for precedence? If you want to try it yourself, I’ll give you a hint. You’ll need two separate diagrams; one to represent additions and subtractions, and another to represent multiplications and divisions.

Here’s how we’re going to do it:

We have defined expr for addition and subtractions, and term for multiplications and divisions. Let’s see how we’re going to change our code for this.

Our Multiplication, and Division tokens should be added to our type enum.

enum Type { INTEGER, PLUS, MINUS, MUL, DIV, EOF }

Our lexer should identify multiplication, and division operators.

private Token getNextToken() { while (this.currentChar != Character.MIN_VALUE) { if (Character.isWhitespace(this.currentChar)) { skipWhitespace(); continue; } if (Character.isDigit(this.currentChar)) { return new Token(Type.INTEGER, integer()); } if (this.currentChar == '+') { advance(); return new Token(Type.PLUS, currentChar); } if (this.currentChar == '-') { advance(); return new Token(Type.MINUS, currentChar); } if (this.currentChar == '*') { advance(); return new Token(Type.MUL, currentChar); } if (this.currentChar == '/') { advance(); return new Token(Type.DIV, currentChar); } try { throw new Exception("Invalid token type"); } catch (Exception e) { System.err.printf("Invalid token type %c found", currentChar); System.exit(0); return null; } } return new Token(Type.EOF, null); }

Finally, all we have to do is to follow the syndax diagram we just designed, and create our Parser. Try doing it yourself. If you did, your Parser should look something like this.

/* factor : INTEGER */ int factor() { Token token = lexer.getCurrentToken(); lexer.eat(Type.INTEGER); return (int) token.getValue(); } /* term : factor ((MUL | DIV) factor)* */ int term() { int result = factor(); Token currentToken = lexer.getCurrentToken(); while (currentToken.getType() == Type.MUL || currentToken.getType() == Type.DIV) { if (currentToken.getType() == Type.DIV) { lexer.eat(Type.DIV); result = result / factor(); } if (currentToken.getType() == Type.MUL) { lexer.eat(Type.MUL); result = result * factor(); } currentToken = lexer.getCurrentToken(); } return result; } /* expr : term ((PLUS | MINUS) term)* term : factor ((MUL | DIV) factor)* factor : INTEGER */ int expr() { int result = term(); Token currentToken = lexer.getCurrentToken(); while (currentToken.getType() == Type.PLUS || currentToken.getType() == Type.MINUS) { if (currentToken.getType() == Type.PLUS) { lexer.eat(Type.PLUS); result = result + term(); } else if (currentToken.getType() == Type.MINUS) { lexer.eat(Type.MINUS); result = result - term(); } currentToken = lexer.getCurrentToken(); } return result; }

Let’s compile and run our program to see if everything works.

$ javac -d . Token.java $ javac -d . Lexer.java $ javac -d . Parser.java $ javac -d . Main.java $ java Main calc> 2 + 3 * 4 14 calc> 2 + 12 / 4 5 calc> 80 / 20 + 8 * 10 84

There you go! Not only can our calculator do additions and subtractions, it can now do divisions and multiplications as well. Now that that’s out of the way, let me teach you something new.

You would have noticed the gibberish written on as the method documentation in our example code. You may be wondering what that means, and rightfully so. What you see here are called **grammars**. Grammars are similar to Syntax Diagrams, except, they are a way to represent the programming language syntax in text, instead of having a diagram. There are many reasons to use grammars.

They serve as great documentation.

Good starting point, even if the parser is manually written from scratch. Grammars can be converted to code by following a set of simple rules.

Can use

**parser generator**tools, which accept a grammar as an input, and automatically generates a parser based on that grammar.

This is the grammar for our calculator so far. You may have noticed that this looks like an awfully similar representation of our Syntax Diagram. That’s because it is.

A grammar consists of a sequence to rules known as **productions**. A production consists of a non-terminal called the **head**, a colon, and a sequence of terminals or non terminals called the **body**.

In our grammar, tokens like PLUS, MINUS, MUL, DIV, and INTEGER are called **terminals**, and variables like expr, term, and factor are called **non-terminals**. Non terminals usually consist of a sequence of terminals and/or non-terminals. The head of the first rule is called the **start symbol**. In our case, the start symbol is expr.

Following are the symbols used in grammars, and their meanings:

| - Similar to an ‘or’ operator. So (MUL | DIV) means either MUL or DIV

( … ) - a grouping of terminals and/or non-terminals

( … )* - loop through contents within the group zero or more times

That’s it for today’s chapter. We have learned a lot today, so let’s do a quick recap, shall we? We started off with an introduction to basic arithmetic principles known as associativity, and precedence. The associativity rule is considered when the same operation is on both sides of an operand. The precedence rule is considered when different operations are there on both sides of the operand.

Then, we went on to design the syntax diagrams for implementing multiplication and division using principles of associativity, and precedence. After that, we developed our Parser based on out syntax diagram. Finally, we discussed Grammars, their similarity to Syntax Diagrams, and how they are useful.

I have some good news, fellow kids! “What is it?” you ask? The next chapter is the part where we’ll be wrapping up our discussion on arithmetic expressions, and completing our calculator. After that, we will get into the really good stuff :)

In the next chapter, we will be improving our interpreter that will be able to evaluate parenthesized expressions. So, we will be able to calculate arithmetic expressions like 6 + 2 * (10 - 3 / (12 / 4) + 1). Sounds scary? Worry not. Just like everything we have done so far, this is not as difficult as you may think it is. Looking forward to seeing you in the next part where we’ll wrap everything up!

**Previous Chapter**: Part 3: Syntax Diagrams

**Next Chapter**: Part 5: Parenthesized Expressions