10. First-Order Logic
Consider the statement “Bill and Tracy are married and all their children are smart.” Propositional logic will let us model this statement as a conjunction, but it doesn’t give us the means to model the fact that marriage is a relationship between two people, a relationship that in this case is claimed to hold between Bill and Tracy. It doesn’t allow us to express the fact that if a person, X, is married to another person, Y, then Y is married to X. It also doesn’t allow us to model the fact that the second conjunct quantifies over children, the fact that being a child of someone is an asymmetric relationship, or the fact that being smart is a property that someone may or may not have.
First-order logic will let us do all these things. In many ways, the syntax and semantics of first-order logic is similar to the syntax and semantics of propositional logic. One difference is that now we need two categories of expressions, terms and formulas. Terms name things in the intended interpretation (in the example above, people), whereas formulas say things about those objects. But in each case, the set of expressions is defined syntactically, and we use recursive definitions to specify how to evaluate them for a given interpretation, just as we did for propositional logic.
But there is a sense in which propositional logic and first-order logic are worlds apart. We have seen that for any given propositional formula, we can specify an interpretation by assigning truth values to its finitely many variables. In contrast, there are first-order formulas that are satisfiable, but only when interpreted with an infinite domain of objects. We also saw that the method of writing out a truth table provides an easy (though inefficient) decision procedure for propositional logic, and that a propositional formula is provable if and only if it is valid. This means that the question of provability for propositional logic is also decidable. In contrast, the question of provability for first-order logic is equivalent to the halting problem.
Even worse, the question of the truth of a first-order sentence in an intended interpretation is often even more undecidable than the halting problem. For example, questions as to the truth of a first-order statement about the natural numbers in a vocabulary that includes only basic arithmetic is undecidable, even giving an oracle for the halting problem, an oracle for the halting problem relative to the halting problem, or any finite iteration of that sort.
What is a poor computer scientist to do? We will see that there are at least two avenues we can pursue. The first thing we can do is develop decision procedures for fragments of first-order logic, or restricted interpretations of first-order logic. In particular, we will consider procedures for equational reasoning and procedures for reasoning about linear arithmetic on the real numbers. Such procedures are implemented by contemporary SMT solvers, on top of SAT-solving methods. The other thing we can do is develop means of searching for proofs from axioms, in such a way that we are guaranteed to find one if such a proof exists, even though the search may not terminate if there is none. This brings us to the domain of first-order theorem proving, which we will also explore.
10.1. Syntax
To specify a first-order language, \(L\), we start by specifying some constant symbols, some function symbols, and some relation symbols. Each of the function and relation symbols comes with an associated arity, namely, a specification of the number of arguments. For example, to design a language to reason about the integers, we might choose to have constants \(0\) and \(1\), a binary function \(f(x, y)\) to represent addition, a binary function \(g(x, y)\) to represent multiplication, a unary function \(h(x)\) to represent negation, and a binary relation \(R(x, y)\) to represent the relation \(x < y\).
The set of terms is defined to be the set of all things we can obtain using variables, constants, and function symbols. For example, \(f(g(x, 1), h(y))\) is a term. The following definition makes this more precise:
Definition
The set of terms of the language \(L\) is generated inductively as follows:
Each variable \(x, y, z, \ldots\) is a term.
Each constant symbol of \(L\) is a term.
If \(f\) is any \(n\)-ary function symbol of \(L\) and \(t_1, t_2, \ldots, t_n\) are terms of \(L\), then \(f(t_1, t_2, \ldots, t_n)\) is a term.
Keep in mind that a term is supposed to name an object, given an interpretation of the symbols and an assignment to the free variables. For example, with the interpretation above, \(f(g(x, 1), h(y))\) denotes the integer 4 when \(x\) is assigned to 6 and \(y\) is assigned to \(2\). The semantics we present below makes this precise. In contrast, a formula is supposed to make a statement, again given an interpretation of the symbols and an assignment to the free variables.
Definition
The set of formulas of the language \(L\) is generated inductively as follows:
If \(R\) is any \(n\)-ary relation symbol of \(L\) and \(t_1, t_2, \ldots, t_n\) are terms of \(L\), then \(R(t_1, t_2, \ldots, t_n)\) is a formula.
If \(s\) and \(t\) are terms, then \(s = t\) is a formula.
\(\top\) and \(\bot\) are formulas.
If \(A\) and \(B\) are formulas, so are \(\lnot A\), \(A \land B\), \(A \lor B\), \(A \limplies B\), and \(A \liff B\).
If \(A\) is a formula and x is any variable, then \(\fa x A\) and \(\ex x A\).
Most of the clauses should be familiar. In the last one, \(\fa x A\) (“for all \(x\), \(A\)”) expresses that \(A\) holds of every element in the intended interpretation, and \(\ex x A\) (“there exists an \(x\) such that \(A\)”) expresses that \(A\) holds of some element in the intended interpretation. Once again, the semantics we present below makes this precise. It is sometimes useful to think of constants as 0-ary function symbols, that is, function symbols that don’t take any arguments. In a similar way, a 0-ary relation symbol is just a propositional variable.
As we did in Chapter 4, we can define notions of depth and complexity for terms and formulas. We can say what it means for a term to be a subterm of another term, and say what it means for a formula to be a subformula of another formula. We can also say what it means to substitute a term \(t\) for a variable \(x\) in another term \(s\), which we denote \(s[t/x]\), and what it means to substitute a term \(t\) for a variable \(x\) in a formula \(A\), which we denote \(A[t/x]\). These operations are of central importance in first-order logic.
We will later need a notion of simultaneous substitution, which replaces multiple variables at once. In this context, a substitution \(\sigma\) is a mapping from a set of variables to terms in a language. (If we want, we can think of \(\sigma\) as a mapping from the set of all variables to terms which assigns the remaining variables to themselves.) Given a substitution \(\sigma\) and a term t, the substitution \(\sigma \, t\) is defined recursively as follows:
In other words, \(\sigma\) tells us what to plug in for the variables, and otherwise the substitution leaves the function symbols and constants the same. If \(A\) is a formula, the definition of the substitution \(\sigma \, A\) is similar, though it is slightly complicated by the fact that we might have to rename the bound variables when we carry out the substitution. Implementations of first-order logic have to deal with that appropriately, but we will not worry about the details here.
The set of variables that occur in a term can be defined by recursion on terms. Consider the formula \(\ex z (x < z \land z < y)\). Intuitively, this says “there is something between \(x\) and \(y\).” The variable \(z\) is said to be bound in this formula, whereas \(x\) and \(y\) are said to be free. The set of free variables of a formula can be defined by recursion on formulas. Intuitively, the formula above says the same thing as \(\ex w (x < w \land w < y)\). Logicians and computer scientists often “identify” formulas up to renaming of their bound variables, which is to say, they consider these formulas to be the same and rename bound variables freely.
You should take care to rename bound variables when carrying out substitution to avoid capture. For example, the formula \(\fa x \ex y y > x\) says that for every \(x\) there is some number greater than it. This is clearly true when we interpret the statement in the integers, but the statement is patently false when we substitute \(y\) for \(x\) in \(\ex y y > x\). If we rename the bound variable \(y\) to \(z\), we can substitute \(y\) for \(x\) without problems. You should never rename a free variable, however. Saying that there is something bigger than \(x\) is not the same as saying there is something bigger than \(z\). A formula without free variables is called a sentence.
Spelling out the nuances of bound variables precisely is one of the most annoying theoretical chores in mathematical logic and computer science. It can be done, but we will gloss over the details, and rely on your intuition and common sense to get by.
10.2. Using first-order logic
Learning to use the language of first-order logic takes some practice. Consider the following statements:
Every integer is even or odd, but not both.
A integer is even if and only if it is divisible by two.
If some integer, \(x\), is even, then so is \(x^2\).
A integer \(x\) is even if and only if \(x + 1\) is odd.
For any three integers \(x\), \(y\), and \(z\), if \(x\) divides \(y\) and \(y\) divides \(z\), then \(x\) divides \(z\).
Given the language of arithmetic described above, let’s write \(x + y\) instead of \(f(x, y)\) and \(x \cdot y\) instead of \(g(x, y)\). We can then write 2 for \(1 + 1\) and \(x^2\) for \(x \cdot x\), and we can define the following formulas:
\(\fn{even}(x) \equiv \ex y x = 2 \cdot y\)
\(\fn{odd}(x) \equiv \ex y x = 2 \cdot y + 1\)
\(x \mid y \equiv \ex z y = x \cdot z\).
With these, the statements above can be written as follows:
\(\fa x (\fn{even}(x) \lor \fn{odd}(x)) \land \neg (\fn{even}(x) \land \fn{odd}(x))\)
\(\fa x \fn{even}(x) \liff 2 \mid x\)
\(\fa x \fn{even}(x) \limplies \fn{even}(x^2)\)
\(\fa x \fn{even}(x) \liff \fn{odd}(x+1)\)
\(\fa x \fa y \fa z x \mid y \land y \mid z \limplies x \mid z\).
The statement with which we began this chapter might be written as follows:
When reading such formulas, we give the quantifiers the widest scope possible, and use parentheses to limit the scope. In other words, \(\fa x A \land B\) means \(\fa x (A \land B)\), and we write \((\fa x A) \land B\) if we want to limit the scope to \(A\). We can shorten the last example by writing \(\fa {x \, y \, z} x \mid y \land y \mid z \limplies x \mid z\).
Notice that quantifiers always range over the entire universe of objects, integers in the first set of examples and possibly people in the example involving Bill and Tracy. We can restrict the domain of a quantifier using propositional connectives:
To say “there is an even number between 1 and 3,” we write \(\ex x \fn{even}(x) \land 1 < x \land x < 3\).
To say “every even number greater than 1 is greater than 3,” we write \(\fa x \fn{even}(x) \land x > 1 \limplies x > 3\).
This process is known as relativization.
It is natural to consider variations on first-order logic with different sorts of variables. For example, a formal representation of Euclidean geometry might have variables \(p, q, r, \ldots\) ranging over points, other variables \(L, M, N, \ldots\) ranging over lines, and maybe even variables \(\alpha, \beta, \gamma, \ldots\) ranging over circles. A relation symbol \(\fn{on}(p, L)\) used to express that point \(p\) lies on the line \(L\) should come with a specification that the first argument is a point and the second argument is a line. This is known as many-sorted first-order logic. We will consider even more expressive generalizations of first-order logic later on. In the meanwhile, to keep the theoretical exposition simple, we will focus on first-order logic with only one variable sort.
10.3. Semantics
In order to evaluate a propositional formula \(A\), all we need to know is the assignment of truth values to variables occurring in \(A\). Given such an assignment, \(\tau\), we were able to define the truth value \(\tval{A}_\tau\).
In first-order logic, there are two things we need to evaluate, namely, terms and formulas. A term like \(f(x, g(y, z))\) has variables \(x\), \(y\), \(z\), that we think of as ranging over objects, like numbers, people, or whatever. So in this case our interpretation of the language has to specify the domain of objects that we have in mind. It also has to specify interpretations of all the function and relation symbols in the language as functions and relations on the corresponding set. Such a structure is known as a model.
Definition
A model \(\mdl M\) for a language consists of
A set of objects, \(|\mdl M|\), called the universe of \(\mdl M\).
For each function symbol \(f\) in the language, a function \(f^{\mdl M}\) from the universe of \(\mdl M\) to itself, with the corresponding arity.
For each relation symbol \(R\) in the language, a relation \(R^{\mdl M}\) on the universe of \(\mdl M\), with the corresponding arity.
Let \(\sigma\) be an assignment of elements of \(|\mdl M|\) to variables. Then every term \(t\) has a value \(\tval{t}_{\mdl M, \sigma}\) in \(|\mdl M|\) defined recursively as follows:
\(\tval{x}_{\mdl M, \sigma} = \sigma(x)\)
For every \(n\)-ary function symbol \(f\) and every tuple of terms \(t_1, \ldots, t_n\), \(\tval{f(t_1, \ldots, t_n)}_{\mdl M, \sigma} = f^{\mdl M}(\tval{t_1}_{\mdl M, \sigma}, \ldots, \tval{t_n}_{\mdl M, \sigma})\)
Remember that we can think of constant symbols as 0-ary function symbols, so \(\tval{c}_{\mdl M, \sigma} = c_{\mdl M}\) is implicit in the second clause.
We can also say what it means for a formula \(A\) to be true in \(\mdl M\) relative to the assignment \(\sigma\), which we write as \(\mdl M \models_\sigma A\):
\(\mdl M \models_\sigma t = t'\) if and only if \(\tval{t}_{\mdl M, \sigma} = \tval{t'}_{\mdl M, \sigma}\).
\(\mdl M \models_\sigma R(t_0, \ldots, t_{n-1})\) if and only if \(R^{\mdl M}(\tval{t_0}_{\mdl M, \sigma}, \ldots, \tval{t_{n-1}}_{\mdl M, \sigma})\).
\(\mdl M \models_\sigma \bot\) is always false.
\(\mdl M \models_\sigma \top\) is always true.
\(\mdl M \models_\sigma A \land B\) if and only if \(\mdl M \models_\sigma A\) and \(\mdl M \models_\sigma B\).
\(\mdl M \models_\sigma A \lor B\) if and only if \(\mdl M \models_\sigma A\) or \(\mdl M \models_\sigma B\).
\(\mdl M \models_\sigma A \limplies B\) if and only if \(\mdl M \not\models_\sigma A\) or \(\mdl M \models_\sigma B\).
\(\mdl M \models_\sigma A \liff B\) if and only if \(\mdl M \models_\sigma A\) and \(\mdl M \models_\sigma B\) either both hold or both don’t hold.
\(\mdl M \models_\sigma \fa x A\) if and only if for every \(a \in | \mdl M|\), \(\mdl M \models_{\sigma[x \mapsto a]} A\).
\(\mdl M \models_\sigma \ex x A\) if and only if for some \(a \in | \mdl M|\), \(\mdl M \models_{\sigma[x \mapsto a]} A\).
Most of the clauses are the same as for propositional logic. It’s the first two base cases and the clauses for the quantifiers that are new.
The fact that formulas can be interpreted in different models is central to modern logic. Take, for example, the sentence \(\fa x \ex y R(x, y)\). This is true of the real numbers with the less-than relation, but false of the natural numbers with the greater-than relation. What about the integers with the relation “\(x\) divides \(y\)?”
If \(\mdl M \models_\sigma A\), we also say that \(A\) is satisfied by \(\mdl M\) and \(\sigma\), and that \(A\) is satisfiable. Remember that saying that \(A\) is a sentence means that it has no free variables. In that case, we don’t have to talk about an assignment \(\sigma\), so a sentence is satisfiable if it is true in some model. If \(\mdl M \models A\), we also say that \(\mdl M\) is a model of \(A\).
We say that a sentence \(A\) is valid, written \(\models A\), if it is true in every model. (More generally, a formula with free variables is valid if it is true in every model with respect to every assignment of variables to elements in the universe of that model.) In analogy with propositional logic, if \(\Gamma\) is a set of sentences and \(A\) is a sentence, we say \(\Gamma\) entails \(A\) and write \(\Gamma \models A\) if every model of \(\Gamma\) is also a model of \(A\). This means that the symbol \(\models\) is overloaded, since \(\Gamma \models A\) and \(\mdl M \models A\) mean different things. Sometimes people also write \(\models_{\mdl M} A\) instead of \(\mdl M \models A\), and \(\models_{\mdl M, \sigma} A\) instead of \(\mdl M \models_\sigma A\).
10.4. Normal forms
The notion of a formula in negation normal form carries over to first-order logic, where now we allow quantifiers as well. Every formula can be put in negation normal form using the identities \((\lnot \fa x A) \liff \ex x \lnot A\) and \((\lnot \ex x A) \liff \fa x \lnot A\).
Is there anything similar to CNF? It helps that it is always possible to bring quantifiers to the front of a formula, using these identities, which hold when \(x\) is not free in \(B\):
We can ensure that these apply by renaming the bound variable if necessary. We can then put the inside formula in CNF if we want. But it turns out to be more useful to automated reasoning to eliminate the quantifiers entirely. We can do that using Skolem functions, which we will tell you about later on.