Lambda Calculus in K

a minimalist normal-order evaluator

by ngn, 27 IX 2023

Introduction

A term is one of:

a variable, such as x
an abstraction: λx.M where x is a variable (the parameter) and M is a term (the body)
an application: (M N) where M and N are terms (the function and argument)

An expression is a string generated by the following grammar:

<Var> ::= "A".."Z" | "a".."z"
<Atom> ::= <Var> | "^" <Var> "." <Expr> | "(" <Expr> ")"
<Expr> ::= <Atom> | <Expr> <Atom>

Note that applications are left-associative (abc = (ab)c) and abstractions have long right scope (^a.b^c.de = ^a.(b(^c.(de)))).

For simplicity, in expressions we:

ban whitespace
write ^ instead of λ
allow only single-letter case-sensitive variable names
allow only one parameter after the ^ (although traditionally ^xy.M is treated as short for ^x.^y.M)

Ignoring superfluous parentheses, an expression's parse tree naturally corresponds to the tree-like recursive structure of a term.

A binding is a pair of nodes in that tree:

a variable node
its closest ancestor λ-node whose parameter has a matching name

Depending on whether such a binding exists for a given variable node, we call it either a bound variable or a free variable.

In the following example x and y are are bound and z is free:

             binding
          ┌─┐
   ^x.^y.^x.xyz
       └─────┘ \
binding         free

α-conversion is a transformation of a term, in which a parameter and all variable nodes bound to it are given a new name that doesn't change any existing bindings or create new ones. We say that two terms are α-equivalent if a series of such transformations can lead from one to the other. Thus, parameter names are ignored in α-equivalence, and the only thing that matters about them is the structure of their bindings.

De Bruijn indexing is a technique to eliminate parameter names: we replace each bound variable with an integer - the 1-based index of its binding buddy in the list of ancestor λ-nodes in closest-to-farthest order. Free variables are not replaced.

Using the previous example: ^x.^y.^x.xyz becomes ^x.^y.^x.12z or in a possible alternative notation: ^^^12z.

β-reduction is a transformation of a term in which an abstraction node is applied to an argument (any node): (^x.M)N becomes M[x:=N], meaning M where all bindings of the parameter x replaced with N.

A β-reducibe expression - one of the form (^x.M)N - is called a redex. There may be multiple redexes among the descendants of a given term.

Evaluation is the transitive closure of β-reduction: while there is at least one redex in the term, we keep performing β-reductions. This process may or may not terminate, depending on the term and the strategy of choosing redexes to reduce. A term that cannot be β-reduced any further is the normal form of the original term. Two terms with the same normal form are β-equivalent. If it exists, the normal form is unique, regardless of the order in which we choose redexes for β-reduction, however some strategies may lead to an evaluation process that never terminates.

Normal order is a strategy for choosing a redex - always pick the leftmost outermost one; in other words, the first in depth-first search order. If a normal form exists, normal order evaluation is guaranteed to reach it.

Task

Let's implement a normal-order evaluator for λ-calculus that accepts an expression and returns some representation of its normal form.

We can split this into a parser p and a single evaluation step n. Then the full evaluator will be the composition of p and a repeated application of n until convergence: n/p@

In order to test, we can choose a pair of expressions: an arbitrary input expression and its expected normal form, for instance "(^x.yx)z" and "yz". Then we can evaluate the first with n/p@, parse the second with p, and compare the results for α-equivalence, which should be just ~ (K's "match" verb) if de Bruijn indices are used.

(n/p"(^x.yx)z")~p"yz" /test: 1=success 0=failure

Representation of terms

An important question about the implementation is: how do we represent λ-calculus terms in K? There are many valid ways. In this implementation we represent:

a free variable as a K symbol, e.g. "x" → `x
a bound variable as a positive integer - the de Bruijn index, e.g. the x in "^x.x" → 1
an abstraction as a list of length 1 containing the body, e.g. "^x.x" → ,1; the parameter name is not stored
an application as a list of length 2, e.g. "xy" → (`x;`y) or equivalently `x`y

This comment summarizes the above:

/λ:  fv  bv  ^x.M  (M N)
/k:  `x   1    ,M  (M;N)

For convenience, we declare a few functions to serve as predicates for the different types of terms:

F:`s=@:       /a free variable has k type symbol
V:{x~*x}      /a variable (free or bound) is a k atom, i.e. it matches its first item
A:2=#:        /an application has length 2
L:{V[x]<1=#x} /an abstraction (also called a λ) has length 1 but is not a variable

Tokenization

Thanks to the simplicity of the grammar, there isn't much to do here. All tokens are single-character tokens. The only thing we need to know about them is whether they are alphabetic and whether they are allowed in an expression at all.

We make a list that maps (the ASCII codes of) all letters to "a", and maps the four punctuation characters "()^." to themselves.

c:@[128#"";"()^.",,"Aa"+\:!26;:;"()^.a"]

Illegal characters are mapped to " ". That includes those outside the ASCII range 0..127, thanks to K's outdexing (out-of-bounds indexing) which returns typed nulls - spaces in this case.

Parsing

The core of the parser will consist of two mutually recursive functions:

q parses a sequence of applications, making sure they associate to the left
r parses an "atom" (in the λ-calculus sense) - a variable or abstraction or parenthesised application

q and r take two arguments: x - a stack of parameter names (because we want to compute de Bruijn indices as we parse), and y - an index in the expression string (where to start parsing). The result from q and r is a pair (a;i) where a is the parsed term and i is an index in the expression string indicating how far the parser went.

There will also be a separate function that serves as the entry point to the parser. It sets things up at the beginning and checks for incomplete parsing at the end:

p:{s::x            /put the expression string in a global variable s
   (a;i):q[0#`;0]  /start with an empty stack 0#` and call q starting from position 0
   :[i<#s;e i;a]}  /if we couldn't reach the end of s, signal an error (e) at position i, otherwise return the term

The function for error reporting prints s, tries to point to the problematic location with ASCII art, and throws a K error:

e:{`0:(s;"_^"@&x,1)
   `err"lc"}

Now let's look into q and r - the hardest part.

q starts by invoking r once. Then, while the current character is a letter or ( or ^, it tries to call r again and combine the two results. q never deals with abstraction nodes, so the parameter stack is always passed unchanged to r.

But how does q combine two results from r? Let's call them (a;i) and (b;j). Here a and b are terms that must form an application node, so we have to pair them up like this: (a;b) to make the resulting term. We also have to return how far we got in s - that is j. So, the result of the while loop's body should be ((a;b);j). With a bit of golfing trickery, we can transform that into ((a;i),(b;j))(0 2;3) which corresponds to the (y,r[x;y 1])(0 2;3) below:

      while loop's      while loop's      currying with x
       condition            body          │
           │                 │            │      initial call to r
   ┌───────┴───────┐┌────────┴──────────┐ │  ┌───┴┐
   │               ││                   │ │  │    │
q:{{|/"a(^"=c s x 1}{(y,r[x;y 1])(0 2;3)}[x]/r[x;y]}
                      │ │      │
                      │ └───┬──┘
                    (a;i) (b;j)

The implementation of r is larger but more straightforward:

r:{:["("=s y;:[")"=s@*|a:q[x;1+y];@[a;1;+;1];e y]
     "^a."~c s y+!3;@[;0;,:]q[x,`$s 1+y;3+y]
     "a"=c s y;(v^(`,|x)?v:`$s y;1+y)
     e y]}

It's a case analysis:

if the current char is (, then add 1 to y, delegate to q, and make sure there's a ) after it's done
if the next three chars are ^, followed by a letter, followed by ., append the parameter name to the stack x, call q, and enlist (,:) the resulting term using K's amend (@[;0;,:])
if the current char is a letter, convert it to a symbol (`$) and find its 1-based index in the reversed parameter stack ((`,|x)?v). If not found, fill with the variable name itself (v^)
otherwise, report an error

β-reduction

β-reduction is at the heart of λ-calculus evaluation.

The task of transforming the redex (^x.M)N into a copy of M where all bindings of x are replaced with N, without breaking any other bindings, is not as trivial as it sounds. M and N could be arbitrary expressions that contain their own hierarchies of abstraction nodes, so we must take care to readjust any affected de Bruijn indices in them.

But why bother with de Bruijn indices if they require adjustments? Wouldn't it be easier to keep the names? It turns out: no, it's actually a lot harder to deal with the possibility of name shadowing or the accidental binding of a variable that was meant to remain free, for instance the reduction of ^x.(^y.^x.xy)x would require renaming one of the x-s to avoid a name clash.

To a first approximation, the β-reduction of a redex as a tree looks something like this:

        │                         M.........................
        │                         .                        .
   ┌───app───┐                    .              .......N  .
  λx.     ...│..N                 .              .      .  .
   │      .  ╵  .        β        .  .......N    .      .  .
M..│...   .     .      ====>      .  .      .    ........  .
.  ╵  .   .     .                 .  .      .              .
.   x .   .......                 .  ........              .
. x   .                           .                        .
.......                           ..........................

Here M and N are subtrees that may contain their own internal λ-nodes. To build some intuition about the necessary adjustments, let's replace the x-s under M with de Bruijn indices and add some sample internal structure to M and N:

                                                            M..................│..............
                      │                                     .                  │             .
      redex →   ┌────app─────┐                              .           ┌─────app─────┐      .
   function →   λ            │                              .           λ             │      .
                │        ....│...N   ← argument             .           │             λ      .
body →   M......│......  .   │   .                          .           λ         ....│...N  .
         .   ┌─app──┐ .  .   │   .                          .           │         .   λ   .  .
         .   λ      │ .  .   λ   .                          .      ┌───app───┐    .   │   .  .
         .   │      λ .  .   │   .                          .      │         │    . ┌app┐ .  .
         .   λ      │ .  . ┌app┐ .                          .  ....│...N     │    . │   λ .  .
         .   │      2 .  . │   λ .             β            .  .   λ   .   ┌app┐  . 6   │ .  .
         . ┌app─┐     .  . 5   │ .           ====>          .  .   │   .   1   8  .     1 .  .
         . │    │     .  .     1 .                          .  . ┌app┐ .          .........  .
         . 3  ┌app┐   .  .........                          .  . │   λ .                     .
         .    1   9   .                                     .  . 7   │ .                     .
         ..............                                     .  .     1 .                     .
                                                            .  .........                     .
                                                            .                                .
                                                            ..................................

Here, under M on the left, the 2 and 3 are bound to the function's parameter (the former x) and will have to be replaced, the 1 is bound locally within M, and the 9 is bound to a λ-node above the redex.

Under N on the left, the 1 is local, and the 5 is bound to a λ above the redex.

In order to find all replaceable x-s in M, we traverse it recursively, while keeping track of the current λ-depth: the number of λs that are in M on the path to the currently visited node - let's call that ΔM. We'll start the recursion with ΔM=0.

When we encounter a bound variable, we have 3 possibilities:

if its de Bruijn index is equal to ΔM+1, we'll have to replace it with N (more about that - later)
if its de Bruijn index is less than ΔM+1, it's a local, so we leave it alone
if its de Bruijn index is more than ΔM+1, it's bound to a λ above the redex and, taking into account that the function node will disappear after the β-reduction, we have to decrement the de Bruijn index.

As for other types of nodes, we simply go through them recursively, only updating ΔM when passing through a λ-node.

The following code implements that recursive procedure:

/traversing the body M
/  x: the argument - N
/  y: λ-depth - ΔM
/  z: currently visited node
b:{:[~V z;o[x;y+L z]'z /if z is not a variable node, go through it recursively. if it's a λ-node, also add 1 to y
     F z;z             /if z is a free variable, return it as-is
     z=1+y;h[y;0;x]    /if z is a bound variable with de Bruijn index y+1, magic happens - it's replaced
     z-y<z]}           /if z>y (which also implies z>y+1 here), subtract 1 from z, otherwise keep the original z

When calling b to kick off the recursion, we will have to provide the argument node (N), an initial ΔM of 0, and the root of the body M (i.e. the body itself): b[argument;0;body].

Now let's go back to how the actual replacement should be done - the mysterious call to h in the implementation of b. It has to be another recursive procedure (say h, for "helper"), this time going through N and producing a copy of N suitable for putting at that particular place in M.

Like before, we recurse transparently through nodes that are not bound variables. Also like before, we keep track of the λ-depth in N as ΔN, starting from 0. But this time we have only two possibilities for a bound variable:

if its de Bruijn index is more than ΔN, it's a binding to a λ above the redex, so we have to increment the de Bruijn index by ΔM
if its de Bruijn index is less than or equal to ΔN, it's a local binding in N, so we leave it alone

Implementation of h:

/traversing the argument N
/  x: λ-depth in the body - ΔM (remains fixed for the duration of h's recursion)
/  y: λ-depth in the argument - ΔN
/  z: currently visited node
h:{:[~V z;o[x;y+L z]'z /if z is not a variable node, do like in b (note that here y is ΔN)
     F z;z             /if z is a free variable, return it as-is
     z+x*y<z]}         /if z is a bound variable, increment it by ΔM if and only if ΔN < the de Bruijn index

Evaluation

The final piece of the puzzle is a procedure to find a redex and apply β-reduction to it.

Function n goes through the descendants of a term recursively in depth-first search order. When it finds an application node that is also a redex (i.e. its first child is a λ-node), it calls b to perform β-reduction. When it finds an ordinary application, it recurses through its first child, and if the first child was left unchanged, it recurses into the second child too.

n:{:[V x;x                   /variable node - return as-is
     L x;o'x                 /λ node - recurse into its body
     L@*x;b[x 1;0;**x]       /app node, redex - β-reduction
     (v;(u~v:o u:*x)o/x 1)]} /app node, non-redex - recurse into the func; if unchanged, recurse into the arg too

(This is slightly different from the classic normal order, which would fall in an infinite loop if the β-reduction of a subterm is itself. But there's no harm - if classic normal order evalutation terminates, this will terminate too.)

All code again

F:`s=@:;V:{x~*x};A:2=#:;L:{V[x]<1=#x}
c:@[128#"";"()^.",,"Aa"+\:!26;:;"()^.a"]
e:{`0:(s;"_^"@&x,1);`err"lc"}
p:{s::x;(a;i):q[0#`;0];:[i<#s;e i;a]}
q:{{|/"a(^"=c s x 1}{(y,r[x;y 1])(0 2;3)}[x]/r[x;y]}
r:{:["("=s y;:[")"=s@*|a:q[x;1+y];@[a;1;+;1];e y]
     "^a."~c s y+!3;@[;0;,:]q[x,`$s 1+y;3+y]
     "a"=c s y;(v^(`,|x)?v:`$s y;1+y);e y]}
b:{:[~V z;o[x;y+L z]'z;F z;z;z=y+1;h[y;0;x];z-y<z]}
h:{:[~V z;o[x;y+L z]'z;F z;z;z+x*y<z]}
n:{:[V x;x;L x;o'x;L@*x;b[x 1;0;**x];(v;(u~v:o u:*x)o/x 1)]}

try online

Thanks

DiscoDoug for feedback and for showing how Apter trees can replace recursive data strucutures

Kamila for the blog post about λ-calculus in APL that piqued my interest in this topic

Suhr for sharing links to great educational material, including his own book

Garklein for finding typos