CSC/MAT 208 (Spring 2024)

Reading: Concrete Evaluation

We begin our journey into the foundations of computer science by first studying one of its key applications to computer programming: program correctness. Rigorously stating and proving program properties will require us to deep-dive into mathematical logic, the sub-field of mathematics devoted to modeling deductive reasoning. Deductive reasoning is the foundation for virtually all activities in computer science, whether you are designing an algorithm, verifying the correctness of a circuit, or assessing a program’s complexity. In this manner, this initial portion of our investigation of the foundations of computer science is, quite literally, the cornerstone of everything you do in this field moving forward.

Why Racket (Scheme Standard)?

In this course, we’ll use the Racket programming language as our vehicle for studying program correctness. We’ll state and prove properties of Racket programs rather than use a more conventional language like Python, C, or Java. One reason that we will do this is purely logistics: this course’s only pre-requisite is CSC 151, where we use mini Scheme language Camper. Because of this, Racket is the only programming language that I can assume you know.

However, there is a much deeper reason why we’ll use Racket. We will see shortly that pure, functional languages admit for a simple, substitution-based model of computation. By “pure,” we mean that the language does not allow functions to produce side-effects. A side-effect is some behavior observable to the caller of the function beyond the function’s return value.

The canonical example of a side-effect is the mutation of global variables. For example, in C:

int global = 0;   // a global variable

void increment_global(void) {
    global = global + 1;
}

int main(void) {
    printf("%d", global);   // 0
    increment_global();
    printf("%d", global);   // 1
    increment_global();
    printf("%d", global);   // 2
    increment_global();
    increment_global();
    printf("%d", global);   // 4
    return 0;
}

The increment_global function takes no parameters and produces no output (void). Its sole task is to produce a side-effect: change the state of global by incrementing its value. Since the function does not return a value, main can only use increment_global for its side-effect by calling the function and observing the change to global.

We’ll have more to say about side-effects and their relationship with the substitutive model of computation we introduce later in this reading. For now, note that ultimately it is not the language that is important here. What is important is whether we are considering a pure fragment of that language, instead. Indeed, the lessons we learn here regarding program correctness apply to any program that behaves in a pure manner, i.e., has no side-effects. For example, we can easily translate our model of computation to the pure fragment Python!

This fact supports a general maxim about programming you have hopefully heard at some point:

Minimize the number of side-effects in your code.

Intuitively, complicated code with side-effects is frequently tricky to reason about, e.g., explicit pointer manipulation in C. Our study of program correctness in this course will give us concrete reasons why this is indeed the case.

Program Correctness

Program correctness is a particularly topical application of logic to start with because every developer cares that their programs are correct. However, what do we mean by correctness? For example consider the following Racket function that appends two lists together:

(define list-append
  (lambda (l1 l2)
    (if (null? l1)
        l2
        (cons (car l1) (list-append (cdr l1) l2)))))

What does it mean for list-append to be “correct”? Of course, we have a strong intuition about how list-append should behave: the result of list-append should be a list that “glues” l2 onto the end of l1. But being more specific about what this means without saying the word “append” is tricky! As we try to crystallize what it means for list-append to be correct, we run into two issues:

Generality

A natural way to specify the correctness of list-append is through a test suite. For example, in Racket, we can write a simple collection of tests using rackunit:

(require rackunit)

(define expected (list 1 2 3 4 5))

(check-equal? expected (list-append (list 1 2) (list 3 4 5)))
(check-equal? expected (list-append null (list 1 2 3 4 5)))
(check-equal? expected (list-append (list 1 2 3 4 5) null))

Note that these tests exemplify very specific properties about the behavior of list-append. A test case demands that when a function is given a particular set of inputs, e.g., (list 1 2) and (list 3 4 5), that the function must produce the output (list 1 2 3 4 5). You can’t get more specific about the behavior of list-append than that!

However, are these tests enough? While they cover a wide variety of the potential behavior of list-append, they certainly don’t cover every possible execution of the function. For example, the tests don’t cover when list-append is given two empty (null) lists. They also don’t cover the case where the expected output is not (list 1 2 3 4 5)! In this sense, while tests are highly specific, they do not necessarily generalize to the full behavior of the function. This always leaves us with an inkling of doubt that our functions are truly correct.

Specificity

How might we arrive at a more general notion of program correctness? Rather than thinking about concrete test cases, we might consider more abstract propositions about the behavior of the function. Such abstract propositions are typically framed in terms of relationships between the inputs and outputs of the function. With list-append, we might think about the lengths of the input and output lists and how we might phrase our intuition about the function “gluing” its arguments together. This insight leads to the following property that we might check of the function:

For all lists l1 and l2, (+ (length l1) (length l2)) = (length (list-append l1 l2)).

To put another way, the lengths of l1 and l2 are preserved through list-append. That is, list-append doesn’t (a) add any elements to the output beyond those found in the input, and (b) does not remove any elements.

By virtue of its statement, this property applies to all possible inputs to list-append. So if we were able to prove that this property held, we knew that it holds for the function, irrespective of the inputs that you pass to the function (assuming that the inputs are lists, of course). This stands in contrast to test cases where we make claims over individual pairs of inputs and outputs.

However, unlike a test case which is highly indicative of our intuition of correctness for this function, this property is only an approximation of our intuition. To put it another way, even if we know this property holds of every input to list-append, it doesn’t mean that the function is correct! Here is an example implementation of list-append where the property holds, but the function is not “correct”:

(define bad-list-append
  (lambda (l1 l2)
    (if (null? l1)
        l2
        (cons 0 (bad-list-append (cdr l1) l2)))))

The lengths of the input lists are preserved with bad-list-append. However, the output is not the result of gluing together the input lists!

(bad-list-append (list 1 2 3) (list 4 5))
> '(0 0 0 4 5)

Every element of l1 is replaced with 0! In this sense, our length-property of list-append is general—it applies to every possible set of inputs to the function—but not specific—the property only partially implies correctness.

Balancing Generality and Specificity

In summary, we’ve seen that test cases and general program properties sit on opposite ends of the generality-specificity spectrum. When we think about program correctness and verifying the behavior of programs, we are always trying to find the balance between generality and specify in our verification techniques. Ultimately, you can never be 100% sure that your program works in all cases—after all, a stray gamma ray can ruin whatever verification you have done about your program. So when employing program verification techniques—testing, automated analysis, or formal proof—you must ultimately make an engineering decision based on context, needs, and available time and resources. We won’t have time to dedicate substantial time to these pragmatic concerns in this course, but keep them in the back of your mind as we introduce formal reasoning for program correctness in the sections that follow.

Exercise: Consider the Racket function (lcm x y) which returns the least-common multiple (LCM) of the numbers x and y. For example, the LCM of 3 and 4 is 12, and the LCM of 6 and 15 is 30.

  1. Write a few tests for this function.
  2. Write a few properties of this function’s behavior that implies its correctness.

A Substitutive Model of Computation

In order to rigorously prove that a property holds of a program, we must first have a rigorous model of how a program executes. You likely have some intuition about Racket programs operate. For example, try to deduce what this poorly named function does:

(define foo
  (lambda (bar)
    (if (null? bar)
        0
        (+ 1 (foo (cdr bar))))))

How might you proceed? You might imagine taking an example input, e.g., (list 9 1 2), and predicting how the function processes that input. With a few examples and prior knowledge of how conditionals and recursion work, you can likely deduce that this function calculates the length of the input list given to it. Our first goal, therefore, is to formalize this intuition, namely, explicate the rules that govern how Racket programs operate.

For a pure, functional programming language like the subset of Racket that we work with, we can use a simple substitutive model of computation that extends how we evaluate arithmetic expressions. While simple, this model is capable of capturing the behavior of any Racket program we might consider in this course.

Review: Arithmetic Expressions

First, let’s recall the basic definitions and rules surrounding arithmetic expressions. We can divide these into two sorts:

Syntax

Here is an example of a well-formed arithmetic expression:

\[ 8 × (10 - 5 ÷ 2) \]

In contrast, here is an ill-formed arithmetic expression:

\[ (4 + 5) - ( × 9) \]

In this ill-formed arithmetic expression, the multiplication operator (×) is missing a left-hand argument.

We can formalize this intuition about how to create a well-formed arithmetic expression with a grammar. Similarly to natural language, grammars concisely define syntactic categories as well as rules of formation for various textual objects. Here is a grammar for arithmetic expressions:

e ::= <number> | e1 + e2 | e1 - e2 | e1 × e2 | e1 ÷ e2 | (e)

To the left of the ::= symbol is a variable, e, that represents a particular syntactic category. To the right are the rules for forming an element of that category. Here, the syntactic category is “arithmetic expressions,” traditionally represented with the variable e. The rules are given as a collection of possible forms or alternatives, separated by pipes (|). The grammar says that a well-formed expression is either:

About Pronunciation: it may not seem like it, but pronunciation is an excellent tool for internalizing the meaning of a collection of mathematical symbols. Every collection of mathematical symbols is really a sentence in the natural language sense. In our grammar, we can pronounce ::= as “is” and | as “or,” so we obtain: “e is a number, or an addition, or … .” This makes it more intuitive and clear what the symbols represent. As we introduce more mathematical symbols throughout the course, always look for ways to pronounce these symbols as cohesive sentences.

Importantly, the grammar is recursive in nature: the various alternatives contain expressions inside of them. This allows us to systematically break down an arithmetic expression into smaller parts, a fact that we leverage when we evaluate expressions. For example:

Note that when interpreting symbols, there is sometimes ambiguity as to how different symbols group together. For example, consider the sub-expression from above:

\[ 10 - 5 ÷ 2. \]

Should we interpret the expression as \(10\) and \(5 ÷ 2\) as in the example or alternatively \(10 - 5\) and \(2\). Our knowledge of arithmetic tells us that the first interpretation is correct because the \((÷)\) operator takes precedence over the \(-\) operator. Rules of precedence are, therefore, additional syntactic rules that we must consider in some cases. There also exist rules of associativity for some grammars that govern the order in which we group symbols when the same operator is invoked multiple times. For example, we traditionally interpret subtraction as left-associative. That is \(1 - 2 - 3\) is understood to be \((1 - 2) - 3\) rather than \(1 - (2 - 3\), i.e., \(1 - 2\) goes first rather than \(2 - 3\).

Semantics

Once we have established what a well-formed expression looks like, we then need to define what it means. In general, the meaning of a well-formed object is dependent entirely on the purpose of said object. For arithmetic expressions, we care about computing the result of such an expression, so we take that to be the meaning or semantics of an expression.

Traditionally, we define the rules for computing or evaluating an arithmetic expression as follows:

  1. Find the next sub-expression to evaluate using the rules of associativity and precedence to resolve ambiguity.
  2. Substitute the value that the sub-expression evaluates to for that sub-expression in question.
  3. Repeat the process until you are left with a final value.

A value is simply an expression that can not be evaluated or simplified any further. For arithmetic expressions, any number is a value.

By repeating this process, we obtain a series of evaluation steps that an arithmetic expression takes before arriving at a final value. For example, here are the steps of evaluation taken for our sample expression from above:

    8 × (10 - 5 ÷ 2)
--> 8 × (10 - 2.5)
--> 8 × 7.5
--> 60

Note that at each step, the resulting expressions are equivalent, e.g., \(8 × (10 - 5 ÷ 2)\) and \(8 × 7.5\) are equivalent; both of these expressions are also equivalent to the value \(60\).

Exercise: For each of the following arithmetic expressions, determine if (a) they are well-formed and (b) if they are well-formed, what is their step-by-step evaluation to a final value.

  1. \(100 ÷ (25 × 5 - 3)\).
  2. \(38 - × 5 + 2)\).
  3. \(1 + 2 - 3 + 4 × 5 ÷ 6 - 7\).

Core Racket and the Substitutive Model

Our substitutive computational model for Racket extends the rules for evaluating arithmetic expressions. Like arithmetic expressions, we will think of Racket programs as an expression that evaluates to a final value. And like arithmetic, programs operate by stepwise evaluation, where we find the next sub-expression to evaluates and substitute. In fact, Racket simplifies the arithmetic quite a bit, at least as far as syntax is concerned! Recall that Racket only uses prefix operators, i.e., we write (+ 1 2) instead of 1 + 2. Because of this, there is no ambiguity in interpreting a well-formed Racket expression—we need no rules for precedence!

For our purposes, well-formed Racket programs can be defined by the following grammar:

e ::= x                               (variables)
    | <number>                        (numbers)
    | #t                              (true)
    | #f                              (false)
    | (lambda (x1 ... xk) e)          (lambdas)
    | (e e1 ... ek)                   (func. app.)
    | (if e1 e2 e3)                   (conditionals)
    | (let* ([x1 e1] ... [xk ek]) e)  (let-bindings)

In other words, a Racket expression is either:

As an example, the body of the our list-append function:

(lambda (l1 l2)
  (if (null? l1)
      l2
      (cons x (list-append (cdr l1) l2))))

Is a well-formed expression that consists of a lambda of two arguments, l1 and l2. The body of the lambda is a conditional whose guard is a function application applying the variable l1 to the variable null?.

Top-level Definitions

In addition to expressions, Racket also features top-level definitions of the form:

(define <var> <expr>).

In our mental model, we’ll think of top-level definitions as defining aliases for expressions. For example, with the following definition:

(define inc
  (lambda (n)
    (+ n 1)))

Whenever we see the variable inc, we can think of it as shorthand for (lambda (n) (+ n 1)).

Values

In arithmetic, numbers were our only kind of value. In Racket, we have multiple kinds of values corresponding to the different types of Racket:

Substitutive Evaluation for Racket

The heart of Racket evaluation is function application. Function application directly generalizes arithmetic evaluation for operators: In arithmetic evaluation:

  1. We first evaluate the left- and right-hand arguments to the operator to values.
  2. We then apply the operator to those values.

For example, for \(e_1 + e_2\), we first evaluate \(e_1\) to a value, call it \(v_1\) and then evaluate \(e_2\) to a value, call it \(v_2\). We then carry out the addition of \(v_1\) and \(v_2\).

To evaluate a function application, we first evaluate each of the sub-expressions of the function application to values. For example, if we have the expression, (inc (+ 3 (* 5 2))), it evaluates as follows:

    (inc (+ 3 (* 5 2)))
--> (inc (+ 3 10))
--> (inc 13)

(Recall that inc is a defined variable, so it is shorthand for (lambda (n) (+ n 1)).)

Once we arrive at this point, we need to evaluate the actual function application itself. In the case of primitive operations, we just carry them out directly and substitute their resulting values into the overall expression. However, arbitrary functions will perform arbitrary operations, so we cannot simply just “substitute the result”; we must calculate it by hand!

To do this, we perform the following steps:

  1. We substitute the body of the function be applied to for the overall function application.
  2. We substitute the actual arguments passed to the function for each of its parameters.
  3. We then continue evaluation of the resulting expression like normal.

In the example above, this means substituting the body of the inc function, (+ n 1) for the application (inc 13). However, we also replace every occurrence of the parameter n of the function for the value passed in, 13. This results in the following steps of evaluation:

    (inc 13)
--> (+ 13 1)
--> 14

Note that these rules require that the first expression of a function application evaluates to a lambda. Otherwise, we have no function to perform these steps on! This is why you receive the following error in Racket:

> (0 1)
application: not a procedure;
 expected a procedure that can be applied to arguments
  given: 0
  arguments...:

These rules apply to functions of multiple arguments as well. For example, consider the following definition of a function that averages three numbers:

(define avg-3
  (lambda (x1 x2 x3)
    (/ (+ x1 x2 x3)
       3)))

Then the expression (avg-3 (* 2 5) 8 (+ 1 1)) evaluates as follows:

    (avg-3 (* 2 5) 8 (+ 2 1))
--> (avg-3 10 8 (+ 2 1))
--> (avg-3 10 8 3)
--> (/ (+ 10 8 3) 3)
--> (/ 21 3)
--> 7

Note that we evaluate the arguments to avg-3 in left-to-right order, one argument at a time.

Exercise: Consider the following Racket top-level definitions:

(define f
  (lambda (x y)
    (+ x y)))

(define g
  (lambda (a b)
    (- (f a a) (f b b))))

Use your mental model of computation to give the step-by-step evaluation of the following Racket expression: (g 5 3). Check your work by evaluating this expression in DrRacket and verifying that you get the same final result.

Conditionals

Conditionals, while they look similar at first glance, behave differently from functions. Recall that conditional has the form (if e1 e2 e3) where e1 is the guard, e2 is the if-branch, and e3 is the else-branch. To evaluate a conditional:

  1. We first evaluate the guard to a value. This value ought to be a boolean value, i.e., #t or #f.
  2. If the guard evaluates to #t, we substitute e2 for the entire conditional.
  3. Otherwise, the guard must have evaluated to #f. We, therefore, substitute e3 for the entire conditional.
  4. We then continue evaluating the resulting expression.

Here is an example of a conditional within our substitutive model:

    (+ 1 (if (< 3 5) 2 (* 5 5)))
--> (+ 1 (if #t 2 (* 5 5)))
--> (+ 1 2)
--> 3

If we treated if as a function application, we would then evaluate e1, e2, and e3 in order. However, instead, we first evaluate e1 and then selectively evaluates e2 or e3 based on what e1 evaluates to. This is why if is considered a special form in Racket—it doesn’t follow the standard rules of function application.

Let-binding

In Racket, we use let*-binding to be able to name intermediate computations in an expression for the purposes of readability or performance (i.e., to avoid redundant work).

A let*-binding has the following form:

(let* ([x1 e1]
       ...
       [xk ek])
  e)

The ellipses indicate that there is a collection of bindings encased in square brackets of the form [x e] where x is a variable, and e is an expression. To evaluate this expression, we do the following:

  1. We first evaluate each binding of the let*:
    1. For each binding of the form [x e], we first evaluate e to a value, call it v.
    2. We then substitute value v for variable x in every successive binding and the body expression e of the let*.
  2. Once all bindings have been evaluated and substituted away, we substitute the body of the let* e (with all of its local bindings substituted away) the overall let* expression.
  3. We finally continue evaluation like normal.

Here’s an example of the step-by-step evaluation of a let*-binding:

    (let* ([x (+ 1 1)]
           [y (* x 2)]
           [z (* x y)])
     (+ x y z))
--> (let* ([x 2      ]
           [y (* x 2)]
           [z (* x y)])
     (+ x y z))
--> (let* ([y (* 2 2)]
           [z (* 2 y)])
     (+ 2 y z))
--> (let* ([y 4]
           [z (* 2 y)])
     (+ 2 y z))
--> (let* ([z (* 2 4)])
     (+ 2 4 z))
--> (let* ([z 8])
     (+ 2 4 z))
--> (+ 2 4 8)
--> 14

A note on precision: while performing these step-by-step derivations, you might be tempted to skip steps, e.g., evaluating all the arguments of a function call at once down to a single value. Resist this urge! One of the “hidden” skills we want to develop in working through these derivations is unfailing attention to detail. We want to be able to carry out mechanical processes like these derivations exactly as advertised.

This is useful for two purposes:

  1. Such attention to detail is a skill necessary for effective algorithmic design. By maintaining this level of precision, you are more likely to catch corner cases and oddities in your design that you can fix before they later become bugs in code.
  2. More immediate for our purposes, skipping steps in deductive reasoning is the number one way to introduce erroneous reasoning in your proofs! Mechanics don’t lie, so being able to explicate every step of a logical argument is necessary to be certain that your reasoning is sound.

Later in the course, we will discuss lifting our reasoning to more high-level concerns where we are free to skip low-level steps so that we don’t lose sight of the bigger picture of what we are trying to prove. However, our high-level reasoning is always backed by low-level mechanics. It is imperative your develop those mechanics now, otherwise, your high-level reasoning will be baseless!

Additional Exercises

Exercise (Additional Tracing): Consider the following Racket top-level definitions:

(define compute-3
  (lambda (x y z)
    (let* ([intermediate (+ x y)])
      (+ (* z intermediate) intermediate))))

(define triple
  (lambda (n) (compute-3 n n n)))

Give the step-by-step evaluation (i.e., evaluation traces) for each of the following expressions. Make sure to write down all steps of evaluation as required by our substitutive model of computation!

  1. (+ 3 (* 5 (/ 2 (- 10 5))))
  2. (compute-3 (* 2 3) (+ 8 3) (/ 1 2))
  3. (triple (+ 5 0.25))

Make sure to check your work by entering these expressions into Racket!