A recursive function that does not call itself

Consider the standard implementation of factorial:

>>> fact = lambda n: 1 if n < 2 else n * fact(n - 1)
>>> fact(5)
120

A lambda expression creates an anonymous, or inline, function in Python. Here, however, we are effectively naming the function by assigning it to the variable fact. The assignment is necessary because the body of the function refers to the name fact, which means we cannot use the function inline.
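
To see the problem concretely, here is a quick demonstration (our own, not part of the original argument): the body refers to the global name fact, so the function breaks if that name goes away.

>>> g = fact
>>> del fact
>>> g(5)
Traceback (most recent call last):
...
NameError: name 'fact' is not defined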

If we insist on a truly anonymous version of factorial, it helps to think of self-reference not as a defining feature of recursion, but as an implementation detail. It is merely a very convenient way to apply the same function to progressively smaller inputs. Without self-reference, we need a different way to talk about the same function. The trick is to create an enclosing scope and pass in the function itself:

>>> fact = lambda f: lambda n: 1 if n < 2 else n * f(f)(n - 1)

In the outer scope, f refers to fact itself, so f(f) evaluates to the inner function, i.e., the actual computation. Hence we have to call fact as follows:

>>> fact(fact)(5)
120

Having removed self-reference, we can replace fact with its value to arrive at an inline factorial:

>>> (lambda f: lambda n: 1 if n < 2 else n * f(f)(n - 1))(
...     lambda f: lambda n: 1 if n < 2 else n * f(f)(n - 1)
... )(5)
120

While the above example achieves the goal of a recursive function that does not call itself, it contains two copies of the factorial logic as well as the awkward f(f).

Let’s first remedy the latter problem by introducing a decorator function:

y_1 = lambda fact: lambda f: lambda n: fact(f(f))(n)

fact = y_1(lambda f: lambda n: 1 if n < 2 else n * f(n - 1))

The decorator y_1 allows us to write fact in a more natural way by factoring out the “unwrapping” of the enclosing scope. But as before,

>>> fact(fact)(5)
120

It is straightforward to eliminate the double reference by having the decorator do the “passing in” as well:

y_2 = lambda fact: (lambda f: lambda n: fact(f(f))(n))(
    lambda f: lambda n: fact(f(f))(n)
)

fact = y_2(lambda f: lambda n: 1 if n < 2 else n * f(n - 1))

fact now works as a normal function should:

>>> fact(5)
120

Note that y_2 has nothing to do with factorials (parameter names are arbitrary). To confirm this, let’s apply it to a version of quicksort:

>>> y_2(
...     lambda f: lambda lst: []
...     if not lst  # base case
...     else f([n for n in lst[1:] if n < lst[0]])
...     + [lst[0]]
...     + f([n for n in lst[1:] if n >= lst[0]])
... )([4, 2, 5, 1, 3]) == [1, 2, 3, 4, 5]
True
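
As one more check (our own example), y_2 also handles the doubly recursive Fibonacci function:

>>> y_2(lambda f: lambda n: n if n < 2 else f(n - 1) + f(n - 2))(10)
55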

We see that y_2 is part of a generic recipe to create recursive functions without self-reference. It is an implementation of the Y combinator, which was invented in the context of the lambda calculus, a formal language that has no assignment or named definitions - so every “program” is basically an inline function call. Remarkably, the Y combinator shows that even such a minimal language is capable of recursive computation.

A recursive main function

In languages such as C, there is a privileged function - main - that is the entry point for execution. Running the program is tantamount to calling main, which in turn calls other functions.

Since main is a name in the global namespace, there is nothing to prevent it from calling itself. Consider this recursive main function that calculates the factorial:

main.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {

  static int acc = 1;  /* persists across calls */

  if (argc < 2) {
    printf("error: need argument\n");
    return 1;
  }

  int n = atoi(argv[1]);

  if (n <= 1) {
    printf("%d\n", acc);
    return 0;
  } else {
    acc = n * acc;
    sprintf(argv[1], "%d", --n);  /* write the decremented n back into argv */
    return main(argc, argv);
  }
}

main takes two parameters. argc is the number of command line arguments, including the program name itself, and argv is an array of those arguments. The library function atoi converts a string to the integer it represents, and sprintf prints to a string rather than to standard output. So we decrement n with each recursive call until the base case is reached, then print out the result. The accumulator acc is a static variable, which means it persists across function calls.

Here is an example output:

$ gcc main.c
$ ./a.out 5
120

The fact that this program works is convincing evidence that main, other than being the entry point, is an ordinary function that can be called from another part of the code, including from itself.

Why is an omnivore a kind of vegetarian?

An important principle in object-oriented design is that you can replace an object of one class with an object of a subclass, and the code should still work. For example, suppose we are creating a calorie counter for various foods. We have a parent class

class Food:
    def __init__(self, name, calories=0):
        self.name = name
        self.calories = calories

and child classes

class Vegetable(Food):
    is_leafy = True

and

class Meat(Food):
    is_bloody = True

Consider a function that returns the calories of an item of food:

def get_calories(food):
    return food.calories

Because a child class inherits the methods and attributes from its parent class - possibly overriding some and adding others - any code that expects Food will also work with Vegetable and Meat.

>>> get_calories(Food('gruel', 1))
1
>>> get_calories(Vegetable('potato', 100))
100
>>> get_calories(Meat('steak', 500))
500

This ability to replace with a subclass is known as the “Liskov Substitution Principle”.

Now let’s turn to the consumers of food. Here is a possible class hierarchy:

class Omnivore:
    def eat(self, food):
        print(food.name, "YUM!")

class Vegetarian(Omnivore):
    def eat(self, food):
        if not isinstance(food, Vegetable):
            raise Exception(food.name + " EWW")
        super().eat(food)

class Carnivore(Omnivore):
    def eat(self, food):
        if not isinstance(food, Meat):
            raise Exception(food.name + " EWW")
        super().eat(food)

Vegetarian and Carnivore override the eat method of Omnivore to raise an exception when fed an argument that violates their respective dietary restrictions. An omnivore can consume both Vegetable and Meat:

>>> guest = Omnivore()
>>> guest.eat(Vegetable('potato'))
potato YUM!
>>> guest.eat(Meat('steak'))
steak YUM!

But a vegetarian cannot consume Meat:

>>> guest = Vegetarian()
>>> guest.eat(Vegetable('potato'))
potato YUM!
>>> guest.eat(Meat('steak'))
Traceback (most recent call last):
...
Exception: steak EWW

The code breaks when an Omnivore is replaced by a Vegetarian. Therefore, the Liskov Substitution Principle implies, perhaps counterintuitively, that Vegetarian is not a subclass of Omnivore.

This dilemma is resolved by observing that a Vegetarian can be replaced by an Omnivore, and hence the class hierarchy should be inverted - Omnivore being a subclass of Vegetarian. Of course, we can repeat the argument for Carnivore, so Omnivore inherits from both Vegetarian and Carnivore:

class Vegetarian:
    def eat(self, food):
        if not isinstance(food, Vegetable):
            return super().eat(food)
        print(food.name, "YUM!")

class Carnivore:
    def eat(self, food):
        if not isinstance(food, Meat):
            return super().eat(food)
        print(food.name, "YUM!")

class Omnivore(Vegetarian, Carnivore):
    pass

The key difference here is that, instead of raising an error, Vegetarian and Carnivore pass the call to eat up the inheritance chain, hoping that another class is able to accept the kind of food. The implementation of Omnivore is now trivial and intuitive - nothing more than the union of the consumers of Vegetable and Meat.

>>> guest = Vegetarian()
>>> guest.eat(Vegetable('potato'))
potato YUM!
>>> guest = Omnivore()
>>> guest.eat(Vegetable('potato'))
potato YUM!
>>> guest = Carnivore()
>>> guest.eat(Meat('steak'))
steak YUM!
>>> guest = Omnivore()
>>> guest.eat(Meat('steak'))
steak YUM!

We see that a Vegetarian or Carnivore can be replaced by an Omnivore, so our class hierarchy obeys Liskov Substitution. This example also illustrates a general rule: a method of the child class should accept an argument that is less restrictive (or not more restrictive) than the corresponding method of the parent class. An Omnivore can eat either Vegetable or Meat, whereas a Vegetarian or Carnivore can only eat one of the food classes. In other words, even though a child class is more specific than its parent, the child’s methods take arguments that are more general. This property of arguments is known as “contravariance”.
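
As an aside, this cooperative design relies on Python’s method resolution order (MRO): inside an Omnivore, super() in Vegetarian.eat resolves to Carnivore rather than to object. We can inspect the MRO directly:

>>> [cls.__name__ for cls in Omnivore.__mro__]
['Omnivore', 'Vegetarian', 'Carnivore', 'object']

One caveat: a standalone Vegetarian fed Meat would now raise AttributeError, since object has no eat method.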

Building a data structure with nothing but functions

A simple but far-reaching consequence of being able to nest functions is that data can be stored in the enclosing scope - or closure - of the inner function. Consider the following function:

def make_node(value, next_node):
    return lambda func: func(value, next_node)

When make_node is called with two arguments, they are attached to the closure of the returned function. Subsequently, we can access the two arguments via

def value(node):
    return node(lambda value, next_node: value)

and

def next_node(node):
    return node(lambda value, next_node: next_node)

For example,

>>> node = make_node("head", "tail")
>>> value(node)
'head'
>>> next_node(node)
'tail'

With these functions in hand, it is easy to implement a stack as a linked list. We define the usual stack operations

def push(value, stack):
    return make_node(value, stack)

and

def pop(stack):
    return value(stack), next_node(stack)

The behavior is as expected:

>>> stack = None
>>> stack = push(1, stack)
>>> stack = push(2, stack)
>>> stack = push(3, stack)
>>> val, stack = pop(stack)
>>> val
3
>>> val, stack = pop(stack)
>>> val
2
>>> val, stack = pop(stack)
>>> val
1

It is remarkable that this stack does not use any built-in data structure such as an array, nor does it build the linked list by instantiating node objects. Rather, there is a chain of functions wherein each function’s closure contains a value and a reference to the next function.
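
To make the chain concrete, here is a small helper (our own addition, not part of the original interface) that pops the entire stack into an ordinary list:

def to_list(stack):
    items = []
    while stack is not None:
        val, stack = pop(stack)
        items.append(val)
    return items

>>> to_list(push(3, push(2, push(1, None))))
[3, 2, 1]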

This discussion was inspired by the classic Structure and Interpretation of Computer Programs. In particular, section 2.1 introduces the notion of using closures to build data structures.

Stackless merge sort

Merge sort divides a list into smaller lists of one or zero items, which are trivially sorted, and then merges them pairwise until one list remains. Here is a typical recursive implementation:

def mergesort(lst):

    if len(lst) < 2:
        return lst[:]

    mid = len(lst) // 2

    return merge(mergesort(lst[:mid]), mergesort(lst[mid:]))

Note that the base case returns a copy, not the original list, to be consistent with the general behavior. The merge function is:

def merge(left, right):

    merged = []

    while left and right:
        if left[0] <= right[0]:
            merged.append(left.pop(0))
        else:
            merged.append(right.pop(0))

    merged.extend(left)
    merged.extend(right)

    return merged

For simplicity, we use pop(0) which is inefficient but easily remedied with indexing.
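
For reference, an indexed version might look like this (a sketch, since the text leaves the exact remedy open):

def merge_indexed(left, right):
    merged = []
    i = j = 0

    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1

    # one of the two lists still has items left over
    merged.extend(left[i:])
    merged.extend(right[j:])

    return merged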

Instead of dividing the list recursively, we can put each item into a list by itself, and then merge them iteratively:

def mergesort_stackless(lst):

    queue = [[item] for item in lst]

    while len(queue) >= 2:
        left = queue.pop()
        right = queue.pop()
        merged = merge(left, right)
        queue.insert(0, merged)

    return queue[0] if queue else []  # guard against an empty input

We maintain the individual lists in a queue. In each iteration, we take two lists from the queue, merge them, and put the combined list back in the queue. Eventually just one list remains, which is the desired result. The queue ensures that smaller lists are merged before larger ones, so that the pair being merged does not become too different in size - unbalanced pairs would degrade runtime.

Checking a test case:

>>> lst = [3, 2, 1, 4, 5]
>>> mergesort_stackless(lst)
[1, 2, 3, 4, 5]

Stackless quicksort

Quicksort is a famous example of a recursive algorithm. Here is a sub-optimal implementation:

def qsort(lst, start=0, end=None):
    if end is None:
        end = len(lst)

    if end - start < 2:
        return

    pivot_position = partition(lst, start, end)

    qsort(lst, start, pivot_position)
    qsort(lst, pivot_position + 1, end)

The partition function chooses a pivot value and places it at the correct position, with smaller values to its left and larger values to its right. It also returns the final position of the pivot value.

def partition(lst, start, end):
    pivot = lst[start]
    rest = lst[start + 1 : end]

    left = [item for item in rest if item <= pivot]
    right = [item for item in rest if item > pivot]

    lst[start:end] = left + [pivot] + right

    return start + len(left)

For brevity, we use list slicing instead of swaps, but the discussion does not depend on how the partitioning is done.
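
For comparison, a swap-based partition might look like the following Lomuto-style sketch (our own version; it places the pivot identically, though the order of items within each side can differ):

def partition_swaps(lst, start, end):
    pivot = lst[start]
    boundary = start  # last index of the "not greater than pivot" region

    for i in range(start + 1, end):
        if lst[i] <= pivot:
            boundary += 1
            lst[boundary], lst[i] = lst[i], lst[boundary]

    # move the pivot into its final position
    lst[start], lst[boundary] = lst[boundary], lst[start]

    return boundary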

In the above implementation, observe that each recursive call stands alone, simply sorting a segment of the list. As this article points out, the recursive call stack serves merely to ensure that the list is divided into smaller segments until every item is a pivot or belongs to a segment of one item.

Because the order in which different parts of the list are sorted is immaterial, we don’t need recursion, or even a stack for that matter. Here is an implementation of quicksort using a set to track which segments are still to be sorted:

def qsort_stackless(lst):
    # a list of fewer than two items is already sorted
    not_sorted = {(0, len(lst))} if len(lst) > 1 else set()

    while not_sorted:
        start, end = not_sorted.pop()

        pivot_position = partition(lst, start, end)

        if pivot_position - start > 0:
            not_sorted.add((start, pivot_position))

        if end - (pivot_position + 1) > 0:
            not_sorted.add((pivot_position + 1, end))

The set not_sorted contains the start and end indices of segments that remain to be sorted. Note that the pop method removes and returns an arbitrary element of the set. When no unsorted segments remain, the set becomes empty and the list is sorted. Let’s check a test case:

>>> lst = [3, 2, 1, 4, 5]
>>> qsort_stackless(lst)
>>> lst
[1, 2, 3, 4, 5]

Instantiating functions and modules

In Python, anything that you can assign to a variable is an object, including a function.

>>> def inc():
...     return x + 1

>>> type(inc)
<class 'function'>

Normally, Python creates a function object upon function definition, but we can also instantiate a function like other objects - by calling its class:

>>> Function = type(inc)
>>> another_inc = Function(inc.__code__, {'x': 2})
>>> another_inc()
3

The class Function takes two required arguments. First, a code object which is a data structure containing compiled bytecode. Second, a dictionary that is the global scope of the resulting function. Indeed, calling the original inc would raise an error since x is not defined, but another_inc works because its global scope does have x.
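
Incidentally, the standard library exposes this class as types.FunctionType:

>>> import types
>>> type(inc) is types.FunctionType
True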

More generally, the global scope of a function is the module that contains it. Let’s confirm this with a function from a standard module:

>>> import re
>>> from re import match
>>> match.__globals__ is re.__dict__
True

A module is a simple object whose main purpose is to hold objects that belong to it - in other words, provide a namespace. It turns out that we can also dynamically instantiate a module:

>>> Module = type(re)
>>> mod = Module('another module')
>>> dir(mod)
['__doc__', '__loader__', '__name__', '__package__', '__spec__']
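
Likewise, the module class is available in the standard library as types.ModuleType:

>>> import types
>>> Module is types.ModuleType
True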

Finally, we change the global scope of our existing function to the new module:

>>> mod.x = 42
>>> another_inc.__globals__.update(mod.__dict__)
>>> another_inc()
43

Determining endianness

“Endianness” refers to the order in which the digits of a number are arranged. Humans write numbers from the most significant digit to the least significant digit. This order is called “big-endian” – big end first. The reverse is called “little-endian”.

In computer memory, a number is represented as a consecutive sequence of bits. A long integer takes up 8 bytes (64 bits), for example. This raises a question: does the lowest address byte represent the most significant or the least significant digits of the number?

To decide this, we can run a short C program:

endian.c
#include <stdio.h>

int main(void) {

  unsigned long n = 256 * 256;
  unsigned char *p = (unsigned char *)&n;

  int i;
  for (i = 0; i < sizeof n; i++) {
    printf("%d ", *(p + i));
  }

  printf("\n");
  return 0;
}

For simplicity, we initialize an unsigned long and assign its address to a pointer to an unsigned char. This means that if we add an integer to the pointer, the resulting address will advance by that many bytes. Simply put, we print the base 256 representation of the long integer.

The output on an Intel x86 machine is:

$ gcc endian.c
$ ./a.out
0 0 1 0 0 0 0 0

The third and only nonzero byte is 1. This establishes that the integer is stored in memory in little-endian fashion.
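
As a cross-check (our own addition), Python exposes the byte order of the machine it runs on; on the same machine:

>>> import sys
>>> sys.byteorder
'little'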

Self-referential data structures

Most data structures in Python can hold arbitrary heterogeneous data. This is because the underlying C structures contain pointers to other locations in memory. For example, a Python list is implemented in C as an array of pointers.

A perhaps startling consequence is that a mutable data structure - such as a list - can contain itself as an element:

>>> lst = [1, 2, 3]
>>> lst.append(lst)
>>> lst
[1, 2, 3, [...]]
>>> lst[-1] is lst
True

A self-referential data structure, though somewhat obscure, can be more compact: each node can hold direct references to its successors, so traversing it requires no separate lookups. As a use case, let’s consider the familiar exercise of a Markov chain text generator.

Given a string, we construct a list of consecutive pairs - or bigrams - of words that appear. For each unique bigram, its second word may be the first word in multiple other bigrams. We randomly choose one such bigram, and repeat the process. The sequence of words encountered in this random series of jumps through the text constitutes the Markov text.

First, we define a function to make a data structure that allows for random traversal:

def make_chain(raw_text):
    words = raw_text.split()

    chain = {}

    # consecutive pairs of words, wrapping around at the end
    bigrams = list(zip(words[:-1], words[1:])) + [(words[-1], words[0])]

    for i, bigram in enumerate(bigrams):
        next_bigram = bigrams[(i + 1) % len(bigrams)]

        chain.setdefault(next_bigram, {})
        link = chain.setdefault(bigram, {})
        next_links = link.setdefault(bigram, [])
        next_links.append(chain[next_bigram])  # a reference, not a copy

    return chain

chain is self-referential because the append method, which adds to a list inside chain, takes as argument a component of chain itself. We can gain more insight with a small example:

>>> chain = make_chain('a a a b')
>>> bigram = ('a', 'a')
>>> link = chain[bigram]
>>> link
{('a', 'a'): [{...}, {('a', 'b'): [{('b', 'a'): [{...}]}]}]}
>>> link[bigram][0] is link
True

link is a dictionary of one key-value pair. The key is a bigram, and the value is a list of dictionaries, each having the same structure as link. Though the structure is finite, we can “descend” into it indefinitely. This suggests a simple strategy for random traversal:

import random

def make_text(chain):

    # start from an arbitrary link in the chain
    link = random.choice(list(chain.values()))

    text = ""

    while len(text) < 140:
        bigram = list(link.keys())[0]
        text = text + bigram[0] + " "
        link = random.choice(link[bigram])

    return text

Let’s generate a random text based on a famous work of literature:

>>> s = """
... Beautiful is better than ugly.
... Explicit is better than implicit.
... Simple is better than complex.
... Complex is better than complicated.
... Flat is better than nested.
... Sparse is better than dense."""

>>> chain = make_chain(s)
>>> make_text(chain)
'than ugly. Explicit is better than implicit. Simple is better than ugly. Explicit is better than nested. Sparse is better than complicated. '

To be sure, compared to the usual implementations of Markov text generation, this version is quite inscrutable. It is to be taken as a proof of principle. You should almost always avoid self-referential data structures in production code!

Tree traversal made hard

Tree traversal is a well known task for which recursion is simple and natural. We define a Node class.

class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def __repr__(self):
        return repr(self.value)

    @classmethod
    def make_tree(cls, values):
        if len(values) == 0:
            return None

        mid = len(values) // 2

        return Node(
            values[mid],
            left=cls.make_tree(values[:mid]),
            right=cls.make_tree(values[mid + 1 :]),
        )

The class method make_tree returns the root node of a balanced binary tree from a list of values. If the list is sorted, then the tree is a binary search tree (BST). A BST has the important property that if you do an in-order traversal of the nodes, you will visit them in order. A recursive implementation of in-order traversal is straightforward.

def traverse_recursive(node):
    if node.left:
        traverse_recursive(node.left)

    print(node)

    if node.right:
        traverse_recursive(node.right)

Let’s test it with a small sorted list of integers and verify that the nodes are printed out in order.

>>> tree = Node.make_tree([1, 2, 3, 4, 5, 6])

>>> traverse_recursive(tree)
1
2
3
4
5
6

Now, what if we insist on traversing without recursion? In lieu of the call stack, we need to maintain our own stack, where each item holds the same information as a call frame. In traverse_recursive there are two potential recursive calls, each adding another frame to the call stack. Since a recursive call suspends the calling frame, our stack must maintain a state - whether or not the left side of the node has been examined. Let’s designate the two possibilities for this state as "left" and "right".

def traverse_iterative(node):

    stack = []

    stack.append((node, "right"))
    stack.append((node, "left"))

    while stack:
        node, state = stack.pop()
        if state == "left":
            if node.left:
                stack.append((node.left, "right"))
                stack.append((node.left, "left"))
        elif state == "right":
            print(node)
            if node.right:
                stack.append((node.right, "right"))
                stack.append((node.right, "left"))

Each call frame becomes, in our stack, a pair of items - as the stack is popped off, the left side of each node is examined before the right. The analogue of a recursive call is pushing another pair to the stack. Let’s test traverse_iterative on the same BST.

>>> traverse_iterative(tree)
1
2
3
4
5
6

Note that an in-order traversal is different from the usual iterative depth-first search (DFS). To implement DFS iteratively, we simply maintain a stack of nodes without the additional state. This works because DFS checks whether the target node has been found (and if so, exits) before adding the child nodes to the stack, so there is no reason to “remember” the parent node. Formally, such a DFS on a binary tree amounts to a pre-order traversal.
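
For contrast, here is a minimal iterative pre-order sketch (our own illustration), which needs no state flags:

def traverse_preorder(node):
    stack = [node]

    while stack:
        node = stack.pop()
        print(node)  # visit the node before either child

        # push right first so that left is processed first
        if node.right:
            stack.append(node.right)
        if node.left:
            stack.append(node.left)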

The iterative implementation of in-order tree traversal is less intuitive than the recursive version, which nicely illustrates that recursion is sometimes much easier than iteration. Iteration, though, has the advantage of not growing the call stack, and it readily leads to tail recursion.

def traverse_tail_recursive(stack):

    if not stack:
        return

    node, state = stack.pop()

    if state == "left":
        if node.left:
            stack.append((node.left, "right"))
            stack.append((node.left, "left"))
    elif state == "right":
        print(node)
        if node.right:
            stack.append((node.right, "right"))
            stack.append((node.right, "left"))

    traverse_tail_recursive(stack)

Calling traverse_tail_recursive with the same tree but wrapped inside of a stack:

>>> traverse_tail_recursive(
...     [
...         (tree, "right"),
...         (tree, "left")
...     ]
... )
1
2
3
4
5
6

Python does not offer tail call optimization, but in languages that do, this example suggests a recipe for turning a recursive function into a tail-recursive one: go through an intermediate iterative implementation.