Study Guide: Interpreters
Instructions
This is a study guide with links to past lectures, assignments, and handouts, as well as additional practice problems to assist you in learning the concepts.
Assignments
Important: For solutions to these assignments once they have been released, see the main website
Handouts
Lectures
Readings
Guides
How does the computer make sense of the questions we ask in the interpreter?
scm> (+ 1 2)
3
When we type the characters, (+ 1 2)
, how does a computer understand the
meaning of each character: the open parenthesis (
, the plus symbol, +
, the
space character , and so forth?
Interpreters
An interpreter is a program which can directly execute instructions in a
programming language like Python or Scheme. To do this, the interpreter needs
to understand how the rules of a language work, and to do that, the interpreter
needs to be designed (by a human, usually!) to follow those rules. In CS 61A,
we'd like to learn how the interpreter program works and understand the ideas
that go into an interpreter to reveal the magic behind how a computer makes
sense of (+ 1 2)
in Scheme and 1 + 2
in Python.
REPL
The interpreters we study in this course are designed around a Read-Eval-Print-Loop.
The interpreter waits for our input, looping until we type in a string of
characters, (+ 1 2)
.
In the Read stage, we take the string of characters and convert those characters into data structures that can be understood by the interpreter. In the Scheme interpreter, we'd like to take the string of characters
(+ 1 2)
and convert it into aPair
structure,Pair('+', Pair(1, Pair(2, nil)))
.- The Lexer takes the string
(+ 1 2)
and splits into meaningful tokens wherever there is whitespace or a syntax character like the open parenthesis(
. The lexer returns a structure that kind of resembles a list of individual characters like['(', '+', 1, 2, ')']
. - The Parser takes the list of smaller strings and converts them into
a data structure that understands the nesting in the language. It's not too
tricky to convert the example above from
['(', '+', '1', '2', ')']
toPair('+', Pair(1, Pair(2, nil)))
. But the pair structure is much more useful when we want to work on a more nested example like(= (+ 1 2) 3)
which the lexer converts into the flat list['(', '=', '(', '+', 1, 2, ')', 3, ')']
. As the expressions get more and more complicated, using a nested data structure helps the computer make sense of the expression in chunks, much like how we humans make sense of a Scheme expression as well.
- The Lexer takes the string
In the Eval stage, we evaluate the nested data structure to a value.
- If the expression is simple, like a number or a boolean, we can just
return the number or boolean itself without any further evaluation. Evaluating
1 in the global frame returns the number
1
. If we pass in a name like 'x', we'll try to evaluate the name 'x' in the current environment, looking up through the parent frames as necessary. - If we pass a call expression like
Pair('+', Pair(1, Pair(2, nil)))
into the evaluator, we'll follow the call expression evaluation rules to determine the value of the value of the expression. We'll evaluate the operator,'+'
, to the built-in procedure which adds values, and each operand, 1 and 2, before applying the procedure to the arguments. - if the expression is a special form like
Pair('define', Pair('x', Pair(1, nil)))
(after parsing(define x 1)
), then we'll follow the special form evaluation rules defined in the Scheme interpreter. In this example, we don't need to evaluate the name 'x' (it's just a name!) but we do need to evaluate the value, the number1
.
- If the expression is simple, like a number or a boolean, we can just
return the number or boolean itself without any further evaluation. Evaluating
1 in the global frame returns the number
- In the Print stage, we take the value we determined in the evaluation stage and figure out how to display the value to the user. Even if an expression evaluates to a number, the computer actually stores that number in its memory as a bunch of 0 and 1 binary digits! We take that number and convert it back to a more readable representation like '3'!
- Once we've displayed the value, we can wait in a Loop until the user asks another question to the interpreter.
There are some special rules in the parser to handle scenarios like the quote
operator '
. Run python3 scheme_reader.py
to explore its behavior.
read> '1
str : (quote 1)
repr: Pair('quote', Pair(1, nil))
read> '(1 2 3)
str : (quote (1 2 3))
repr: Pair('quote', Pair(Pair(1, Pair(2, Pair(3, nil))), nil))
Likewise, the evaluator also needs to handle the behavior of special forms differently from standard call expressions. The Scheme Specification contains more information, and the special form evaluation rules will also be briefly described by example here.
Counting Calls
One way of understanding what occurs in the Scheme interpreter is by counting
the number of calls to scheme_eval
that occur when evaluating an expression.
Atomic data types like numbers, booleans, strings, symbols, etc. only take a single step to evaluate. Primitive expressions like names evaluate to a value in one step as well.
Expression: 1
Eval: - (1 scheme_eval)
Expression: #t
Eval: -- (1 scheme_eval)
Expression: 'cs61a
Eval: ------ (1 scheme_eval)
Call Expressions
Call expressions follow the call expression evaluation procedure.
Expression: (+ 1 2)
Eval: -------
- - - (4 scheme_eval)
Apply: ~~~~~~~ (1 scheme_apply)
Expression: (= (quotient 5 1) (+ 1 2))
Eval: --------------------------
- --------------
-------- - -
Apply: ~~~~~~~~~~~~~~
Eval: -------
- - - (10 scheme_eval)
Apply: ~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~ (3 scheme_apply)
Special Forms
Special forms follow different evaluation rules. A more complete description of their behaviors is described in the Scheme Project and the Scheme Specification.
Unlike call expressions where we need to evaluate the operator, special forms
do not need to evaluate the first term. In scheme_eval
, the logic looks like
this.
SPECIAL_FORMS = {
'and': do_and_form,
'begin': do_begin_form,
'cond': do_cond_form,
'define': do_define_form,
'if': do_if_form,
'lambda': do_lambda_form,
'let': do_let_form,
'or': do_or_form,
'quote': do_quote_form,
}
first, rest = expr.first, expr.second
if scheme_symbolp(first) and first in SPECIAL_FORMS:
return SPECIAL_FORMS[first](rest, env)
Instead of continuing with the call expression evaluation procedure, we handle the special form with its own set of rules.
Expression: (if (= 1 1) #t #f)
Eval: ------------------
------- (note that if is not evaluated)
- - -
Apply: ~~~~~~~ (1 scheme_apply)
Eval: -- (6 scheme_eval)
(note that #f is not evaluated)
We can also trace through how our Scheme interpreter handles another special
form, cond
.
Expression: (cond ((= 1 2) #f) (else #t))
Eval: -----------------------------
-------
- - -
Apply: ~~~~~~~ (1 scheme_apply)
Eval: -- (6 scheme_eval)
Practice Problems
Q1: Eval/Apply
This question is from the CS 61A Summer 2017 Final Exam.
With your Project 4 implementation, how many total calls to scheme_eval
and
scheme_apply
would be made when evaluating the following two expressions?
Assume that you are not using the tail call optimized scheme_eval_optimized
function.
Suppose we have already evaluated the following definition in the current environment.
(define lst (list 1 2 3))
How many calls are made to scheme_eval
and scheme_apply
?
(+ (car lst) (- 5 (car (cdr lst))))
13 eval calls, 5 apply calls
How many calls are made to scheme_eval
and scheme_apply
?
((if (or (null? lst) (null? (cdr lst)))
(lambda (s) 0)
(lambda (s) (car (cdr s))))
lst)
18 eval calls, 6 apply calls