23. Axiomatic Foundations

In this final chapter, our story comes full circle. We started our journey with symbolic logic, using the propositional connectives to model logical terms like “and,” “or,” “not,” and “implies.” To that we added the quantifiers and function and relation symbols of first-order logic. From there, we moved to sets, functions, and relations, which are ubiquitous in modern mathematics; the natural numbers and induction; and then topics such as number theory, combinatorics, the real numbers, and the infinite. Here we return to symbolic logic, and see how it can be used to provide a formal foundation for all of mathematics.

Specifically, we will consider an axiomatic framework known as Zermelo-Fraenkel set theory, which was introduced early in the twentieth century. In the set-theoretic view of mathematics, every mathematical object is a set. The axioms assert the existence of sets with various properties. From the collection of all sets, we carve out the usual inhabitants of the mathematical universe, not just the various number systems we have considered, but also pairs, finite sequences, relations, functions, and so on. This provides us with an idealized foundation for everything we have done since Chapter 11.

At the end of this chapter, we will briefly describe another axiomatic framework, dependent type theory, which is the one used by Lean. We will see that it provides an alternative perspective on mathematical objects and constructions, but one which is nonetheless inter-interpretable with the set-theoretic point of view.

23.1. Basic Axioms for Sets

The axioms of set theory are expressed in first-order logic, for a language with a single binary relation symbol, \(\mathord{\in}\). We think of the entire mathematical universe as consisting of nothing but sets; if \(x\) and \(y\) are sets, we can express that \(x\) is an element of \(y\) by writing \(x \in y\). The first axiom says that two sets are equal if and only if they have the same elements.

\[\text{Extensionality:} \;\; \forall x, y \; (x = y \leftrightarrow \forall z (z \in x \leftrightarrow z \in y))\]

The next axiom tells us that there is at least one interesting set in the universe, namely, the set with no element.

\[\text{Empty set:} \;\; \exists x \; \forall y \; y \notin x\]

Here, of course, \(x \notin y\) abbreviates \(\neg (x \in y)\). By the axiom of extensionality, the set asserted to exist by this axiom is unique: in other words, if \(x_1\) and \(x_2\) each have no elements, then, vacuously, any element is in one if and only if it is in the other, so \(x_1 = x_2\). This justifies using the word the in the phrase the empty set. Given this fact, it should seem harmless to introduce a new symbol, \(\emptyset\), to denote the set matching that description. Indeed, one can show that this is case: in a precise sense, such expansions to a first-order language can be viewed as nothing more than a convenient manner of expression, and statements in the bigger language can be translated to the original language in a way that justifies all the expected inferences. We will not go into the details here, and, rather, take this fact for granted. Using the new symbol, the empty set axiom tells us the empty set satisfies the property \(\forall y \; y \notin \emptyset\).

The third axiom tells us that given two sets \(x\) and \(y\), we can form a new set \(z\) whose elements are exactly \(x\) and \(y\).

\[\text{Pairing:} \;\; \forall x, y \; \exists z \; \forall w \; (w \in z \leftrightarrow w = x \vee w = y)\]

There is a stealth usage of this axiom lurking nearby. The axiom does not require that \(x\) and \(y\) are different, so, for example, we can take them both to be the empty set. This tells us that the set \(\{ \emptyset \}\), whose only element is the empty set, exists. More generally, the axiom tells us that for any \(x\), we have the set \(\{ x \}\) whose only element is \(x\), and for any \(x\) and \(y\), we have \(\{x, y\}\), as described above. Once again, the axiom of extensionality tells us that the sets meeting these descriptions are unique, so it is fair to use the corresponding notation. We are now off and running! We now have all of the following sets, and more:

\[\emptyset, \;\; \{ \emptyset \}, \; \; \{ \{ \emptyset \} \}, \;\; \{ \emptyset, \{ \emptyset \} \}, \;\; \{ \{ \{ \emptyset \} \} \}, \;\; \ldots\]

Still, we can never form a set with more than two elements in this way. To that end, it would be reasonable to add an axiom that asserts for every \(x\) and \(y\), the set \(x \cup y\) exists. But we can do better. Remember that if \(x\) is any set, \(\bigcup x\) denotes the union of all the sets in \(x\). In other words, for any set \(z\), \(z\) is an element of \(\bigcup x\) if and only if \(z\) is in \(w\) for some set \(w\) in \(x\). The following axiom asserts that this set exists.

\[\text{Union:} \;\; \forall x \; \exists y \; \forall z \; (z \in y \leftrightarrow \exists w \; (w \in x \wedge z \in w))\]

Once again, this justifies the use of the \(\bigcup\) notation. We get the ordinary binary union using this axiom together with pairing, since we have \(x \cup y = \bigcup \{ x, y \}\).

At this stage, it will be useful to invoke some additional notation that was first introduced in our informal presentation of sets. If \(A\) is any first-order formula in the language of set theory, \(\forall x \in y \; A\) abbreviates \(\forall x \; (x \in y \rightarrow A)\) and \(\exists x \in y \; A\) abbreviates \(\exists x \; (x \in y \wedge A)\), relativizing the quantifiers as described in Section 7.4. The expression \(x \subseteq y\) abbreviates \(\forall z \in x \; (z \in y)\), as you would expect.

The next axiom asserts that for every set \(x\), the power set, \(\mathcal{P}(x)\) exists.

\[\text{Power Set:} \;\; \forall x \; \exists y \; \forall z \; (z \in y \leftrightarrow z \subseteq x)\]

We have begun to populate the universe with basic set constructions. It is the next axiom, however, that gives set theory its remarkable flexibility. Properly speaking, it is not a single axiom, but a schema, an infinite family of axioms given by a single template. The schema is meant to justify set-builder notation \(\{ w \mid \ldots \}\) that was ubiquitous in Chapter 11. The first question we need to address is what we are allowed to write in place of the ellipsis. In our informal presentation of set theory, we said that one can define a set using any property, but that only prompts the question here as to what counts as a “property.” Axiomatic set theory provides a simple but powerful answer: we can use any first-order formula in the language of set theory.

Another concern centers around Russell’s paradox, as discussed in Section 11.1. Any theory that allows us to define the set \(\{ w \mid w \notin w \}\) is inconsistent, since if we call this set \(z\), we can show \(z \in z\) if and only if \(z \notin z\), which is a contradiction. Once again, set theory offers a simple and elegant solution: for any formula \(A(z)\) and set \(y\), we can instead form the set \(\{ w \in y \mid A(w) \}\), consisting of the elements of \(y\) that satisfy \(A\). In other words, we have to first use the other axioms of set theory to form a set \(y\) that is big enough to include all the elements that we want to consider, and then use the formula \(A\) to pick out the ones we want.

The axiom schema we want is called separation, because we use it to separate the elements we want from those in a bigger collection.

\[\text{Separation:} \;\; \forall x_1, x_2, \ldots, x_n, y \; \exists z \; \forall w \; (w \in z \leftrightarrow w \in y \wedge A(w,x_1, x_2, \ldots, x_n))\]

Here, \(A\) can be any formula, and the list of variables \(x_1, \ldots, x_n\) that are shown indicate that the formula \(A\) can have some parameters, in which case the set we form depends on these values. For example, in ordinary mathematics, given a number \(m\) we can form the set \(\{ n \in \mathbb{N} \mid \mathit{prime}(n) \wedge n > m\}\). In this example, the description involves \(m\) and \(n\), and the set so defined depends on \(m\).

We could use the separation axiom to simplify the previous axioms. For example, as long as we know that any set \(x\) exists, we can define the empty set as \(\{ y \in x \mid \bot \}\). Similarly, in the pairing axiom, it is enough to assert that there is a set that contains \(x\) and \(y\) as elements, because then we can use separation to carve out the set whose elements are exactly \(x\) and \(y\).

These are only the first six axioms of set theory; we have four more to go. But these axioms alone provide a foundation for reasoning about sets, relations, and functions, as we did in Chapter 11, Chapter 13, and Chapter 15. For example, we have already defined the union operation, and we can define set intersection \(x \cap y\) as \(\{ z \in x \cup y \mid z \in x \wedge z \in y \}\). We cannot define arbitrary set complements; for example, the exercises ask you to show that in set theory we can prove that there is no set that contains all sets, and so the complement of the empty set does not exist. But given any two sets \(x\) and \(y\), we can define their difference \(x \setminus y\) as \(\{ z \in x \mid z \notin y \}\). The exercises below ask you to show that we can also define indexed unions and intersections, once we have developed the notion of a function.

We would like to define a binary relation between two sets \(x\) and \(y\) to be a subset of \(x \times y\), but we first have to define the cartesian product \(x \times y\). Remember that in Section 11.4 we defined the ordered pair \((u, v)\) to be the set \(\{ \{ u \}, \{ u, v \} \}\). As a result, we can use the separation axiom to define

\[x \times y = \{ z \in \ldots \mid \exists u \in x \; \exists v \in y \; (z = (u, v)) \}\]

provided we can prove the existence of a set big enough to fill the “….” In the exercises below, we ask you to show that the set \(\mathcal P (\mathcal P (x \cup y))\) contains all the relevant ordered pairs. A binary relation \(r\) on \(x\) and \(y\) is then just a subset of \(x \times y\), where we interpret \(r(u, v)\) as \((u, v) \in r\). We can think of ordered triples from the sets \(x\), \(y\), \(z\) as elements of \(x \times (y \times z)\) and so on. This gives us ternary relations, four-place relations, and so on.

Now we can say that a function \(f : x \to y\) is really a binary relation satisfying \(\forall u \in x \; \exists! v \in y \; f(u, v)\), and we write \(f(u) = v\) when \(v\) is the unique element satisfying \(f(u, v)\). A function \(f\) taking arguments from sets \(x\), \(y\), and \(z\) and returning an element of w can be interpreted as a function \(f : x \times y \times z \to w\), and so on.

With sets, relations, and functions, we have the basic infrastructure we need to do mathematics. All we are missing at this point are some interesting sets and structures to work with. For example, it would be nice to have a set of natural numbers, \(\mathbb{N}\), with all the properties we expect it to have. So let us turn to that next.

23.2. The Axiom of Infinity

With the axioms we have so far, we can form lots of finite sets, starting with \(\emptyset\) and iterating pairing, union, powerset, and separation constructions. This will give us sets like

\[\emptyset, \{ \emptyset \}, \{ \{ \emptyset \} \}, \{ \emptyset, \{ \emptyset \} \}, \{ \{ \{ \emptyset \} \} \}, \ldots\]

But the axioms so far do not allow us to define sets that are more interesting than these. In particular, none of the axioms gives us an infinite set. So we need a further axiom to tell us that such a set exists.

Remember that in Chapter 17 we characterized the natural numbers as a set with a distinguished element, \(0\), and an injective operation \(\mathit{succ}\), satisfying the principles of induction and recursive definition. In set theory, everything is a set, so if we want to represent the natural numbers in that framework, we need to identify them with particular sets. There is a natural choice for \(0\), namely, the empty set, \(\emptyset\). For a successor operation, we will use the function \(\mathit{succ}\) defined by \(\mathit{succ}(x) = x \cup \{ x \}\). The choice is a bit of a hack; the best justification for the definition is that it works. With this definition, the first few natural numbers are as follows:

\[0 = \emptyset, \;\; 1 = \{ \emptyset \}, \;\; 2 = \{ \emptyset, \{ \emptyset \} \}, \;\; 3 = \{ \emptyset, \{ \emptyset \}, \{ \emptyset, \{ \emptyset \} \} \}, \;\; \ldots\]

It is more perspicuous to write them as follows:

\[0 = \emptyset, \;\; 1 = \{ 0 \}, \;\; 2 = \{ 0, 1 \}, \;\; 3 = \{ 0, 1, 2 \}, \;\; 4 = \{ 0, 1, 2, 3 \}, \;\; \ldots\]

In general, \(n+1\) is represented by the set \(\{ 0, 1, \ldots, n \}\), in which case, \(m \in n\) is the same as \(m < n\). This is just an incidental property of our encoding, but it is a rather charming one.

Recall from Chapter 17 that we can characterize the set of natural numbers as follows:

  • There is an element \(0 \in \mathbb{N}\) and there is an injective function \(\mathit{succ} : \mathbb{N} \to \mathbb{N}\), with the additional property that \(\mathit{succ}(x) \ne 0\) for any \(x\) in \(\mathbb{N}\).

  • The set \(\mathbb{N}\) satisfies the principle of induction: if \(x\) is a subset of \(\mathbb{N}\) that contains \(0\) and is closed under \(\mathit{succ}\) (that is, whenever \(z\) is in \(\mathbb{N}\), so is \(\mathit{succ}\)), then \(x = \mathbb{N}\).

We have already settled on the definitions of \(0\) and \(\mathit{succ}\), but we don’t yet have any set that contains the first and is closed under applying the second. The axiom of infinity asserts precisely that there exists such a set.

\[\text{Infinity:} \;\; \exists x \; (\emptyset \in x \wedge \forall y \; (y \in x \rightarrow y \cup \{ y \} \in x))\]

Say a set \(x\) is inductive if it satisfies the property after the existential quantifier, namely, that it contains the empty set and is closed under our successor operation. Notice that the set of natural numbers, which we are still trying to define formally, has this property. The axiom of infinity asserts the existence of some inductive set, but not necessarily the natural numbers themselves; an inductive set can have other things in it as well. In a sense, the principle of induction says that the natural numbers is the smallest inductive set. So we need a way to separate that set from the one asserted to exist by the axiom of infinity.

Let \(x\) be any inductive set, as asserted to exist by the axiom of infinity. Let

\[y = \bigcap \{ z \subseteq x \mid \mbox{$z$ is inductive} \}.\]

Here \(z \subseteq x\) can also be written \(z \in \mathcal P(x)\), so the inside set exists by the separation axiom. According to this definition, \(y\) is the intersection of every inductive subset of \(x\), so an element \(w\) is in \(y\) if and only if \(w\) is in every inductive subset of \(x\). We claim that \(y\) itself is inductive. First, we have \(\emptyset \in y\), since the empty set is an element of every inductive set. Next, suppose \(w\) is in \(y\). Then \(w\) is in every inductive subset of \(x\). But since every inductive set is closed under successor, \(\mathit{succ}(w)\) is in every inductive subset of \(x\). So \(\mathit{succ}(w)\) is in the intersection of all inductive subsets of \(x\)—which is \(y\)!

It quickly follows that \(y\) is a subset of every inductive set. To see this, suppose that \(z\) is inductive. You can check that \(z \cap x\) is inductive, and thus \(y \subseteq z \cap x \subseteq z\).

The more interesting point is that \(y\) also satisfies the principle of induction. To see this, suppose \(u \subseteq y\) contains the empty set and is closed under \(\mathit{succ}\). Then \(u\) is inductive, and since \(y\) is a subset of every inductive set, we have \(y \subseteq u\). Since we assumed \(u \subseteq y\), we have \(u = y\), which is what we want.

To summarize, then, we have proved the existence of a set that contains \(0\) and is closed under a successor operation and satisfies the induction axiom. Moreover, there is only one such set: if \(y_1\) and \(y_2\) both have this property, then so does \(y_1 \cap y_2\), and by the induction principle, this intersection has to be equal to both \(y_1\) and \(y_2\), in which case \(y_1\) and \(y_2\) are equal. It then makes sense to call the unique set with these properties the natural numbers, and denote it by the symbol \(\mathbb{N}\).

There is only one piece of the puzzle missing. It is clear from the definition that \(0\) is not the successor of any number, but it is not clear that the successor function is injective. We can prove that by first noticing that the natural numbers, as we have defined them, have a peculiar property: if \(z\) is a natural number, \(y\) is an element of \(z\), and \(x\) is an element of \(y\), then \(x\) is an element of \(z\). This says exactly that the \(\in\) relation is transitive on natural numbers, which is not surprising, since we have noted that \(\in\) on the natural numbers, under our representation, coincides with \(<\). To prove this claim formally, say that a set \(z\) is transitive if it has the property just mentioned, namely, that every element of an element of \(z\) is an element of z. This is equivalent to saying that for every \(y \in z\), we have \(y \subseteq z\).


Lemma. Every natural number is transitive.

Proof. By induction on the natural numbers. Clearly, \(\emptyset\) is transitive. Suppose \(x\) is transitive, and suppose \(y \in \mathit{succ}(x)\) and \(z \in y\). Since \(\mathit{succ}(x) = x \cup \{ x \}\), we have \(y \in x\) or \(y \in \{x\}\). If \(y \in x\), then by the inductive hypothesis, we have \(z \in x\), and hence \(z \in \mathit{succ}(x)\). Otherwise, we have \(y \in \{ x \}\), and so \(y = x\). In that case, again we have \(z \in x\), and hence \(z \in \mathit{succ}(x)\).


The next lemma shows that, on transitive sets, union acts like the predecessor operation.


Lemma. If \(x\) is transitive, then \(\bigcup \mathit{succ}(x) = x\).

Proof. Suppose \(y\) is in \(\bigcup \mathit{succ}(x) = \bigcup (x \cup \{ x \})\). Then either \(y \in z\) for some \(z \in x\), or \(y \in x\). In the first case, also have \(y \in x\), since \(x\) is transitive.

Conversely, suppose \(y\) is in \(x\). Then \(y\) is in \(\bigcup \mathit{succ}(x)\), since we have \(x \in \mathit{succ}(x)\).

Theorem. \(\mathit{succ}\) is injective on \(\mathbb{N}\).

Proof. Suppose \(x\) and \(y\) are in \(\mathbb{N}\), and \(\mathit{succ}(x) = \mathit{succ}(y)\). Then \(x\) and \(y\) are both transitive, and we have \(x = \bigcup \mathit{succ}(x) = \bigcup \mathit{succ}(y) = y\).


With that, we are off and running. Although we will not present the details here, using the principle of induction we can justify the principle of recursive definition. We can then go on to define the basic operations of arithmetic and derive their properties, as done in Chapter 17. We can go on to define the integers, the rational numbers, and the real numbers, as described in Chapter Chapter 21, and to develop subjects like number theory and combinatorics, as described in Chapters Chapter 19 and Chapter 20. In fact, it seems that any reasonable branch of mathematics can be developed formally on the basis of axiomatic set theory. There are pitfalls, for example, having to do with large collections: for example, just as it is inconsistent to postulate the existence of a set of all sets, in the same way, there is no collection of all partial orders, or all groups. So when interpreting some mathematical claims, care has to be taken in some cases to restrict to sufficiently large collections of such objects. But this rarely amounts to more than careful bookkeeping, and it is a remarkable fact that, for the most part, the axioms of set theory are flexible and powerful enough to justify most ordinary mathematical constructions.

23.3. The Remaining Axioms

The seven axioms we have seen are quite powerful, and suffice to represent large portions of mathematics. We discuss the remaining axioms of Zermelo-Fraenkel set theory here.

So far, none of the axioms we have seen rule out the possibility that a set \(x\) can be an element of itself, that is, that we can have \(x \in x\). The following axiom precludes that.

\[\text{Foundation} \;\; \forall x \; (\exists y \; y \in x \to \exists y \in x \; \forall z \in x \; z \notin y)))\]

The axiom says that if \(x\) is a nonempty set, there is an element \(y\) of \(x\) with the property that no element of \(y\) is again an element of \(x\). This implies we cannot have a descending chain of sets, each one an element of the one before:

\[x_1 \ni x_2 \ni x_3 \ni \ldots\]

If we apply the axiom of foundation to the set \(\{x_1, x_2, x_3, \ldots\}\), we find that some element \(x_i\) does not contain any others, which is only possible if the sequence has terminated with \(x_i\). In other words, the axiom implies (and is in fact equivalent to) the statement that the elementhood relation is well founded, which explains the name.

The axioms listed in the previous section tell a story of how sets come to be: we start with the empty set, and keep applying constructions like power set, union, and separation, to build more sets. Set theorists often imagine the hierarchy of sets as forming a big V, with the empty set at the bottom and a set at any higher level comprising, as its elements, sets that appear in levels below. In a precise sense (which we will not spell out here), the axiom of foundation says that every set arises in such a way.

Now consider the following sequence of sets:

\[\mathbb{N}, \;\; \mathcal P(\mathbb{N}), \;\; \mathcal P(\mathcal P(\mathbb{N}), \;\; \mathcal P (\mathcal P (\mathcal P (\mathbb{N}))), \;\; \ldots\]

It is consistent with all the axioms we have seen so far that every set in the mathematical universe is an element of one of these. That still gives us a lot of sets, but, since we have described that sequence, we can just as well imagine a set that contains all of them:

\[\{ \mathbb{N}, \;\; \mathcal P(\mathbb{N}), \;\; \mathcal P(\mathcal P(\mathbb{N}), \;\; \mathcal P (\mathcal P (\mathcal P (\mathbb{N}))), \;\; \ldots \}.\]

The following axiom implies the existence of such a set.

\[\begin{split}\text{Replacement:} \;\; \forall x, y_1, \ldots, y_n \;\; (\forall z \in x \; \exists ! w \; A(z, w, y_1, \ldots, y_n) \rightarrow \\ \exists u \; \forall w \; (w \in u \leftrightarrow \exists z \in x \; A(z, w, y_1, \ldots, y_n)))\end{split}\]

Like the axiom of separation, this axiom is really a schema, which is to say, a separate axiom for each formula \(A\). Here, too, the variables \(y_1, y_2, \ldots, y_n\) are free variables that can occur in \(A\). To understand the axiom, it is easiest to think of them as parameters that are fixed in the background, and then ignore them. The axioms says that if, for every \(z\) in \(x\) there is a unique \(w\) satisfying \(A(z,w)\), then there is a single set, \(u\), that consists of the \(w\) values corresponding to every such \(z\). In other words, if you think of \(A\) as a function whose domain is \(x\), the axiom asserts that the range of that function exists. In the example above, \(x\) is the natural numbers, and \(A(z, w)\) says that \(w\) is the \(z\)-fold iterate of the power set of the natural numbers.

The nine axioms we have listed so far comprise what is known as Zermelo-Fraenkel Set Theory. There is on additional axiom, the axiom of choice, which is usually listed separately for historical reasons: it was once considered controversial, and in the early days, mathematicians considered it important to keep track of whether the axiom was actually used in a proof. There are many equivalent formulations, but this one is one of the most straightforward.

\[\text{Choice:} \;\; \forall x \; (\emptyset \notin x \rightarrow \exists f : x \to \bigcup x \; \forall y \in x \; f(y) \in y)\]

The axiom says that for any collection \(x\) of nonempty sets, there is a function \(f\) that selects an element from each one. We used this axiom, informally, in Section 15.2 to show that every surjective function has a right inverse. In fact, this last statement can be shown to be equivalent to the axiom of choice on the basis of the other axioms.

To summarize, then, the axioms of Zermelo-Fraenkel Set Theory with the axiom of choice are as follows:

  1. Extensionality:

    \[\forall x, y \; (x = y \leftrightarrow \forall z (z \in x \leftrightarrow z \in y))\]
  2. Empty set:

    \[\exists x \; \forall y \; y \notin x\]
  3. Pairing:

    \[\forall x, y \; \exists z \; \forall w \; (w \in z \leftrightarrow w = x \vee w = y)\]
  4. Union:

    \[\forall x \; \exists y \; \forall z \; (z \in y \leftrightarrow \exists w \; (w \in x \wedge z \in w))\]
  5. Power set:

    \[\forall x \; \exists y \; \forall z \; (z \in y \leftrightarrow z \subseteq y)\]
  6. Separation:

    \[\forall x_1, x_2, \ldots, x_n, y \; \exists z \; \forall w \; (w \in z \leftrightarrow w \in y \wedge A(w,x_1, x_2, \ldots, x_n))\]
  7. Infinity:

    \[\exists x \; (\emptyset \in x \wedge \forall y \; (y \in x \rightarrow y \cup \{ y \} \in x))\]
  8. Foundation:

    \[\forall x \; (\exists y \; y \in x \to \exists y \in x \; \forall z \in x \; z \notin y)))\]
  9. Replacement:

    \[\begin{split}\forall x, y_1, \ldots, y_n \;\; (\forall z \in x \; \exists ! w \; A(z, w, y_1, \ldots, y_n) \rightarrow \\ \exists u \; \forall w \; (w \in u \leftrightarrow \exists z \in x \; A(z, w, y_1, \ldots, y_n)))\end{split}\]
  10. Choice:

    \[\forall x \; (\emptyset \notin x \rightarrow \exists f : x \to \bigcup x \; \forall y \in x \; f(y) \in y)\]

23.4. Type Theory

As a foundation for mathematics, Zermelo-Fraenkel set theory is appealing. The underlying logic, first-order logic, provides the basic logical framework for quantifiers and the logical connectives. On top of that, the theory describes a single, intuitively natural concept, that of a set of elements. The axioms are plausible eminently reasonable. It is remarkable that virtually all of modern mathematics can be reduced to such simple terms.

There are other foundations on offer, however. These tend to be largely inter-interpretable with set theory. After all, set-theoretic language is now ubiquitous in everyday mathematics, so any reasonable foundation should be able to make sense of such language. On the other hand, we have already noted that set theory is remarkably expressive and robust, and so it should not be surprising that other foundational approaches can often be understood in set-theoretic terms.

This is, in particular, true of dependent type theory, which is the basis of the Lean theorem prover. The syntax of type theory is more complicated than that of set theory. In set theory, there is only one kind of object; officially, everything is a set. In contrast, in type theory, every well-formed expression in Lean has a type, and there is a rich vocabulary of defining types.

In fact, Lean is based on a version of an axiomatic framework known as the Calculus of Inductive Constructions, which provides all of the following:

  • A hierarchy of type universes, Type 0, Type 1, Type 2, … and a special type Prop. The expression Type abbreviates Type 0, and saying T : Type can be interpreted as saying that T is a datatype. The type Prop is the type of propositions.

  • Dependent function types Π x : A, B x. An element f of this type is a function which maps any element a of type A to an element f a of type B a. The fact that the type of the output depends on the type of the input is what makes the function “dependent.” In the case where the output type does not depend on the input, we have the simple function type A B.

  • Inductive types, like the natural numbers, specified by constructors, like zero and successor. Each such type comes with principles of induction and recursion.

These constructions account for both the underlying logic of assertions (that is, the propositions) as well as the objects of the universe, which are elements of the ordinary types.

It is straightforward to interpret type theory in set theory, since we can view each type as a set. The type universes are simply large collections of sets, and dependent function types and inductive types can be explained in terms of set-theoretic constructions. We can view Prop as the set \(\{ \top, \bot \}\) of truth values, just as we did when we described truth-table semantics for propositional logic.

Given this last fact, why not just use set theory instead of type theory for interactive theorem proving? Some interactive theorem provers do just that. But type theory has some advantages:

  • The fact that the rules for forming expressions are so rigid makes it easier for the system to recognize typographical errors and provide useful feedback. In type theory, if f has type it can be applied only to a natural number, and a theorem prover can flag an error if the argument has the wrong type. In set theory, anything can be applied to anything, whether or not doing so really makes sense.

  • Again, because the rules for forming expressions are so rigid, the system can infer useful information from the components of an expression, whereas set theory would require us to make such information explicit. For example, with f as above, a theorem prover can infer that a variable x in f x should have type , and that the resulting expression again has type . In set theory, \(x \in \mathbb{N}\) has to be stated as an explicit hypothesis, and \(f(x) \in \mathbb{N}\) is then a theorem.

  • By encoding propositions as certain kinds of types, we can use the same language for defining mathematical objects and writing mathematical proofs. For example, we can apply a function to an argument in the same way we apply a theorem to some hypotheses.

  • Expressions in a sufficiently pure part of dependent type theory have a computational interpretation, so, for example, the logical framework tells us how to evaluate the factorial function, given its definition. In set theory, the computational interpretation is specified independently, after the fact.

These facts hark back to the separation of concerns that we raised in Chapter 1: different axiomatic foundations provide different idealized descriptions of mathematical activity, and can be designed to serve different purposes. If you want a clean, simple theory that accounts for the vast majority of mathematical proof, set theory is hard to beat. If you are looking for a foundation that makes computation central or takes the notion of a function rather than a set as basic, various flavors of type theory have their charms. For interactive theorem proving, pragmatic issues regarding implementation and usability come into play. What is important to recognize is that what all these idealized descriptions have in common is that they are all designed to model important aspects of mathematical language and proof. Our goal here has been to help you reflect on those features of mathematical language and proof that give mathematics its special character, and to help you better understand how they work.

23.5. Exercises

  1. Use an argument similar Russell’s paradox to show that there is no “set of all sets,” that is, there is no set that contains every other set as an element.

  2. Suppose \(x\) is a nonempty set, say, containing an element \(y\). Use the axiom of separation to show that the set \(\bigcap x\) exists. (Remember that something is an element of \(\bigcap x\) if it is an element of every element of \(x\).)

  3. Justify the claim in Section 23.1 that every element of \(x \times y\) is an element of \(\mathcal P (\mathcal P (x \cup y))\).

  4. Given a set \(x\) and a function \(A : x \to y\), use the axioms of set theory to prove the existence of \(\bigcup_{i \in x} A(i)\).