12. Decision Procedures for First-Order Logic

Given a propositional formula \(A\) and a truth assignment \(\tau\), we have seen that it is straightforward to test wither \(A\) is true under \(\tau\). We have also seen that a formula \(A\) is valid if and only if it is provable, and we have considered decision procedures for propositional logic that determine whether that is the case.

In a similar way, given a first-order sentence \(A\) and a model \(\mdl M\), we can ask whether \(A\) is true in \(\mdl M\). But for most fixed choices of \(\mdl M\), there is no algorithm to determine whether or not \(A\) is true. For example, there is no algorithm to determine whether a sentence in a language with two binary function symbols is true of the model \((\mathbb{Z}, +, \times)\).

Given a first-order sentence \(A\), we can also ask whether it is valid, that is, true in all models. Once we have suitable proof systems for first-order logic, this will be equivalent to the question as to whether \(A\) is provable. If the language has a binary relation symbol or two unary functions, this, too, is undecidable.

But all is not lost. For some interesting models, the question of truth is decidable. These include the theory of the real numbers \((\mathbb{R}, 0, 1, +, \times, <)\) with zero, one, addition, multiplication, and the less-than relation, and the theory of the integers \((\mathbb{Z}, 0, 1, +, <)\) in the same language except without multiplication.

We can also ask about validity for restricted classes of formulas. A formula is said to be quantifier-free if it has no quantifiers, and universal if it consists of any number of universal quantifiers \(\forall\) followed by a quantifier-free formula. Interestingly, the question as to whether a universal first-order formula is valid is decidable.

Another thing we can do is ask whether a formula \(A\) is provable from some axioms \(\Gamma\). For some fixed choices of \(\Gamma\), this question is decidable. For example, there are natural axioms that characterize truth in the two structures mentioned above, \((\mathbb{R}, 0, 1, +, \times, <)\) and \((\mathbb{Z}, 0, 1, +, <)\).

From the theoretical standpoint, before looking for a computational solution to a problem in logic, the first challenge is to determine whether or not the problem is decidable. If the answer is “yes,” we can look for implementations that are efficient in practice. If the answer is “no,” the best we can do is look for simplifications or approximations. We will see that SMT solvers focus on the case where the answer is positive. The goal of this chapter is to establish decidability in a few interesting cases, without worrying about efficiency and implementation.

12.1. Linear arithmetic

A linear expression is one of the form \(a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + b\), where each \(a_i\) is a rational number, \(b\) is a rational number, and each \(x_i\) is a variable. We think of the variables \(x_i\) as ranging over the real numbers. A linear constraint is one of the form \(s = t\) or \(s < t\), where \(s\) and \(t\) are linear expressions. (In practice, we usually include constraints of the form \(s \le t\) and sometimes \(s \ne t\) as well, but let’s keep it simple for now.)

Notice that any linear constraint is equivalent to one of the form \(t = 0\) or \(t > 0\), since we can move all the terms to one side. For example, the constraint \(3 x + 2 y < 3y + 4z\) is equivalent to \(-3x + y + 4z > 0\). An important observation that we will use below is that any linear constraint that involves a variable \(x\) can be written as \(x = t\), \(x < t\), or \(t < x\), where \(x\) does not occur in \(t\). We do this by simply solving for \(x\). For example, the previous constraint can be expressed as \(x < (1/3)y + (4/3)z\). Remember that dividing both sides of an inequality by a negative number reverses the direction.

A set \(\Gamma\) of linear constraints is satisfiable if there is an assignment of real values to the variables that makes them all true. Our first goal is to prove the following.


The question as to whether a finite set of linear constraints is satisfiable is decidable.


We use induction on the number of variables. If there are no variables at all, \(\Gamma\) contains only expressions of the form \(b_0 < b_1\) or \(b_0 = b_1\) where \(b_0\) and \(b_1\) are constants, and we only need to perform the comparisons to see whether they are true. Remember that if \(\Gamma\) is the empty set, we take it to be trivially satisfied.

In the inductive step, \(\Gamma\) contains a variable. If \(\Gamma\) contains any false constant equations, it is unsatisfiable, and it it contains any true constant equations, we can remove them without affecting satisfiability. If \(\Gamma\) contains a nontrivial equation with a variable \(x\), we put it in the form \(x = t\) and then substitute \(t\) for \(x\) everywhere. The resulting set of constraints has one fewer variable, and clearly it is equisatisfiable with the original one. Given an assignment to the new set of constraints, we just assign \(x\) the value of \(t\).

So we can now assume that there are no equations in \(\Gamma\). We can divide the inequalities in \(\Gamma\) intro three kinds:

  • those that don’t contain \(x\) at all

  • those that can be expressed in the form \(s_i < x\)

  • those that can be expressed in the form \(x < t_j\)

Let \(\Gamma'\) be the set that results from removing the inequalities in the last two categories and replacing them with inequalities of the form \(s_i < t_j\). We claim \(\Gamma'\) is equisatisfiable with \(\Gamma\). Clearly any assignment that satisfies \(\Gamma\) also satisfies \(\Gamma'\). Conversely, suppose \(\sigma\) is an assignment that satisfies \(\Gamma'\). Then, under that assignment, the value of each \(s_i\) is less than the value of every \(t_j\). We obtain an assignment satisfying \(\Gamma\) by mapping \(x\) to any value between the largest \(s_i\) and the smallest \(t_j\). (If one of the last two categories is empty, we remove the constraints in the other category entirely, since they can be satisfied by taking \(x\) sufficiently large or sufficiently small.)

The procedure implicit in this proof is known as the Fourier-Motzkin procedure, since an incipient presentation of the idea can be found in the work of Jean-Baptiste Joseph Fourier in the early nineteenth century. (This is the same Fourier who gave us Fourier analysis.) In the worst case, every elimination step divides the number of equations in half and then squares it, resulting in doubly exponential behavior. The procedure works well in practice, though, since in many applications each variable is contained in only a few equations. (There are obvious heuristics, like choosing a variable at each stage that minimizes the number of equations at the next stage.) There is an implementation of the procedure in the file FourierMotzkin.lean in the Examples folder, modulo two components that we ask you to supply. SMT solvers use much more efficient methods based on the simplex algorithm from linear programming.

What does this theorem have to do with logic? Suppose the variables are labeled \(x_1, x_2, \ldots, x_n\) and the constraints are labeled \(c_1, c_2, \ldots, c_m\). Then what we are really asking as to whether the formula \(\ex {x_1, \ldots, x_n} c_1 \land c_2 \land \cdots \land c_m\) is true of the real numbers when the constraints are interpreted in the expected way. To make this more precise, consider the structure \((\mathbb R, 0, 1, +, <)\) in a language with symbols \(0\), \(1\), \(+\), and \(<\). All the constraints can be expressed in this language, albeit in a clunky way. For example, we can write \(3 x\) as \(x + x + x\), and express a constraint like \(x -(1/2)y + (4/3)z < 0\) as \(6x + 8z < 3y\). A slight expansion of the proof of the theorem above yields the following:


The question as to whether a sentence \(A\) is true in \((\mathbb R, 0, 1, +, <)\) is decidable.

We will only sketch the details here. The algorithm uses an important method known as “elimination of quantifiers.” The idea is to successively eliminate quantifiers, one by one, until we are left with a quantifier-free sentence. We can determine the truth of that by simply calculating.

We will show that any formula \(\ex x A\), where \(A\) is quantifier-free, is equivalent to a quantifier-free formula \(A'\) that does not include \(x\). Repeating the process and using the fact that \(\fa x A\) is equivalent to \(\lnot \ex x \lnot A\), we can eliminate all the quantifiers.

Given a formula \(\ex x A\), put \(A\) into disjunctive normal form. (We are not worrying about efficiency now, only trying to establish decidability in principle.) We can replace \(s \not< t\) by \(t < s \lor s = t\), and we can replace \(s \ne t\) by \(s < t \lor t < s\). Putting the result into disjunctive normal form again, we can assume that all the atomic formulas are of the form \(s < t\) or \(s = t\).

If we write \(A\) as \(A_1 \lor A_2 \lor \cdots \lor A_n\), then \(\ex x A\) is equivalent to \((\ex x A_1) \lor (\ex x A_2) \lor \cdots \lor (\ex x A_n)\). So we only need to show how to eliminate an existential quantifier from a conjunction of constraints of the form \(s < t\) or \(s = t\). But that is exactly what the proof of the first theorem in this section does, so we are done.

It is possible to write down axioms that justify every step of the transformation. The resulting set of axioms is known as the theory of linear arithmetic. The argument shows that the resulting set of axioms characterizes the structure exactly, and that the question of provability from those axioms is decidable.

Interestingly, the theorem remains true if we add multiplication. The resulting theory is known as the theory of real closed fields. The proof is much harder, however. The theorem was proved by Alfred Tarski before World War II, but it wasn’t published until 1948, after the war.

12.2. Linear integer arithmetic

What happens if we replace the real numbers by the integers? It turns out that truth in the structure \((\mathbb Z, 0, 1, +)\) is also decidable. This was established in 1926 by Mojżesz Presburger, a student of Tarski’s, who later died in the Holocaust. (The story has it that Tarski did not think the result was enough for a dissertation, and made him do more work.) The resulting theorem is known as Presburger arithmetic or linear integer arithmetic. In contrast to the reals, the order on the integers is discrete, since there is nothing between a value \(x\) and \(x + 1\). The decision procedure is more complicated than that for linear arithmetic, and we will not discuss it here. SMT solvers, however, use efficient implementations of the existential fragment of the theory, which is to say, the satisfiability problem for quantifier-free formulas.

In contrast to the case with the real numbers, however, the result is false if we add multiplication. In other words, truth in the model \((\mathbb Z, 0, 1, +, \times)\) is undecidable. This follows from the methods that Gödel used to prove the incompleteness theorems, and it is also a consequence of Tarski’s theorem on the undefinability of truth.

12.3. Equality

Fix a language, \(L\). We will consider equations \(s = t\) and disequations \(s \ne t\) between closed terms. The fact that we are considering closed terms mean that there are no variables to substitute for; computer scientists sometimes call these ground terms. (As with unification, in some contexts we may want to treat some variables as constant. What is important here is not whether we call them variables or constants, but, rather, the fact that we are not considering substitutions.)

The problem we are addressing here is this: given a set of equations and disequations, is it satisfiable? Note that the word “satisfiable” is used in a different sense than it was used in the previous two sections. In those sections, we are interested in whether a set of formulas is satisfied by a variable assignment in a particular model, namely, the reals and the integers, respectively. Here we are asking whether a set of sentences is satisfied by any model.

For example, consider the following set of sentences:

  1. \(f(a, a) = b\)

  2. \(g(c, a) = c\)

  3. \(g(c, f(a, a)) = f(g(c, a), g(c, a))\)

  4. \(f(c, c) \ne g(c, b)\)

Is it satisfiable?

Before we answer that, let’s make some general observations. A set that only has equations and no disequations is easily satisfiable, namely, in a model with a single element, where every expression is equal to every other one. Similarly, a set that only has disequations is easily satisfiable, unless one of the disequations is of the form \(t \ne t\). For that purpose, we can use the term model, where every term is interpreted as itself. The interesting cases fall in between these two extremes, where the equations and disequations balance one another.

Coming back to the question, the following considerations show that the answer is “no.” Each of the following is a consequence of the equations above:

  1. \(g(c, f(a, a)) = g(c, b)\) from 1

  2. \(f(g(c, a), g(c, a)) = f(c, c)\) from 2

  3. \(f(c, c) = g(c, b)\) from 3, 5, and 6.

This contradicts the disequation 4 above. To understand what is going on, it is helpful to think of \(f\) as addition, \(g\) as multiplication, \(a\) as the number 1, and \(b\) as the number 2. But the argument is fully abstract, and shows that the disequation cannot hold in any model in which all the equations are satisfied.

These considerations encapsulate the main ideas behind the proof of the following theorem:


The question as to whether a finite set of ground equations and disequations is satisfiable is decidable.

The idea behind the proof is to use a saturation argument: starting from the equations in question, we derive new equations until no more equations are derivable. If we manage to contradict one of the disequations, the original set is not satisfiable. In the case where no contradiction is found, we will argue that the original set is satisfiable.

To make all this precise, we need a set of rules for deriving equations.

\[\begin{prooftree} \AXC{$t = t$} \end{prooftree} \quad \quad \begin{prooftree} \AXC{$s = t$} \UIC{$t = s$} \end{prooftree} \quad \quad \begin{prooftree} \AXC{$r = s$} \AXC{$s = t$} \BIC{$r = t$} \end{prooftree} \quad\quad \begin{prooftree} \AXC{$s_1 = t_1$} \AXC{$\ldots$} \AXC{$s_n = t_n$} \TIC{$f(s_1, \ldots, s_n) = f(t_1, \ldots, t_n)$} \end{prooftree} \]

The first three rules express the reflexivity, symmmetry, and transitivity of equality, respectively. The last rule is called the congruence rule. You should convince yourself that using these rules we can derive

\[\begin{prooftree} \AXC{$r = s$} \UIC{$t[r/x] = t[s/x]$} \end{prooftree} \]

for any terms \(r\), \(s\), and \(t\) and variable \(x\). If we add relation symbols and atomic formulas, we would add the following rule:

\[ \begin{prooftree} \AXC{$s_1 = t_1$} \AXC{$\ldots$} \AXC{$s_n = t_n$} \AXC{$R(s_1, \ldots, s_n)$} \QuaternaryInfC{$R(t_1, \ldots, t_n)$} \end{prooftree} \]

Returning to our proof plan, we want to show that if applying these rules successively does not result in a contradiction, then there is a model in which the original equations and disequations are all true. But a problem arises: what if the original set contains an equation \(a = f(a)\)? Then our algorithm falls into an infinite loop, deriving \(a = f(a) = f(f(a)) = f(f(f(a))) = \ldots\). The solution is to restrict attention to subterms of terms appearing in the original equations and disequations. The theorem follows from the following lemma.


Let \(\Gamma\) consist of a set of equations and disequations. Let \(S\) be the set of subterms of all the terms occurring in \(\Gamma\). Let \(\Gamma'\) be the set of al equations between elements of \(S\) that can be derived from the equations in \(\Gamma\) using the rules above. Then \(\Gamma\) is satisfiable if and only if no disequation in \(\Gamma\) is the negation of an equation in \(\Gamma'\).

The algorithm implicit in this lemma is called congruence closure. We will see that it can be implemented efficiently (and is implemented efficiently in SMT solvers) using union-find data structures.


One direction of the lemma is easy. Since the equational rules preserve truth in any model, if we can derive a contradiction from the equations and disequations in \(\Gamma\), then \(\Gamma\) is unsatisfiable. The other direction is harder. Since there are only finitely many pairs of terms in \(S\), the algorithm necessarily terminates. We need to show that if it terminates without deriving a contradiction, then there is a model that satisfies \(\Gamma\).

Say two elements \(s\) and \(t\) are equivalent, written \(s \equiv t\), if they are proved equal from the equations in \(\Gamma\). The rules guarantee that this is an equivalence relation, which is to say, it is reflexive, symmetric, and transitive. It is also a congruence, which means that applying a function symbol to equivalent terms results in equivalent terms.

To each element \(t\), we associate its equivalence class \([t]\), defined by

\[[t] = \{ s \in S \mid s \equiv t \}.\]

In words, \([t]\) is the set of terms equivalent to \(t\). Assuming the algorithm terminates without a contradiction, define a model \(\mdl M\) whose universe consists of all the equivalence classes of elements of \(S\) together with a new element, \(\star\). For elements \(t_1, \ldots t_n\) in \(S\), interpret each \(n\)-ary function symbol \(f\) by function

\[\begin{split}f^{\mdl M}([t_1], \ldots, [t_n]) = \begin{cases} [f(t_1, \ldots, t_n)] & \text{if $f(t_1, \ldots, t_n)$ is in $S$} \\ \star & \text{otherwise} \end{cases}\end{split}\]

In other words, what \(f^{\mdl M}\) does to each equivalence class is determined by what \(f\) does to each of the elements. The fact that \(\equiv\) is a congruence ensures that this makes sense. This is just a truncated version of the term model, in which provably equal terms are all glued together.

It is not hard to show that for every term \(t\) in \(S\), \(\tval{t}_{\mdl M}\) is equal to \([t]\). But this is what we need. For every equation \(s = t\) in \(\Gamma\), \(s\) and \(t\) are in the same equivalence class, so they are equal in the model. And if \(s\) and \(t\) are not provably equal, then \([s]\) and \([t]\) are not the same, so every disequation \(s \ne t\) in \(\Gamma\) is true in \(\mdl M\) as well.

For examples of the algorithm in action, first let us show that the set

\[f^3(a) = a, \, f^5(a) = a, \, f(a) \ne a\]

is unsatisfiable, where \(f^n(a)\) abbreviates \(n\)-fold application \(f(f(\cdots f(a)))\). The set of all subterms is

\[a, \, f(a), \, f^2(a), \, f^3(a), \, f^4(a), \, f^5(a).\]

We start with the equivalence classes \(\{ a, f^3(a) \}\) and \(\{ a, f^5(a)\}\) as well as all the others subterms in singleton sets. From \(a = f^3(a)\) we derive \(f(a) = f^4(a)\) by congruence, giving rise to the set \(\{ f(a), f^4(a) \}\). Applying congruence again gives rise to the set \(\{ f^2(a), f^5(a) \}\), which is merged with \(\{ a, f^5(a)\}\) to yield \(\{ a, f^2(a), f^5(a) \}\). Applying congruence again yields \(\{ f(a), f^3(a) \}\). (We ignore the term \(f^6(a)\).) This is merged with the set \(\{ a, f^3(a) \}\) to yield \(\{ a, f(a), f^3(a)\}\). Applying congruence again yields \(\{ f(a), f^2(a), f^4(a)\}\), which is merged with \(\{ a, f(a), f^3(a) \}\) and \(\{ f^2(a), f^5(a) \}\) to yield \(\{ a, f(a), f^2(a), f^3(a), f^5(a) \}\). At this point, we have derived \(f(a) = a\), contradicting the disequality in the original set. So the set is unsatisfiable.

Suppose we start instead with the set

\[f^2(a) = a, \, f^4(a) = a, \, f(a) \ne a, \, f(a) \ne b\]

You can check that in this case, the algorithm terminates with the following three equivalence classes:

  • \([a] = \{ a, f^2(a), f^4(a)\}\)

  • \([f(a)] = \{ f(a), f^3(a) \}\)

  • \([b] = \{ b \}\).

We now construct a model \(\mdl M\) with these elements and an additional element \(\star\), with

\[\begin{split}f^{\mdl M}([a]) & = [f(a)] \\ f^{\mdl M}([f(a)]) & = [a] \\ f^{\mdl M}([b]) & = \star \\ f^{\mdl M}(\star) & = \star\end{split}\]

You can check that this satisfies the original set of equations and disequations.

Suppose we allow atomic formulas \(R(t_1, \ldots, t_n)\) and negated atomic formulas in \(\Gamma\). To test satisfiability, we do not have to change much. Using the congruence rule for relations, whenever we have derived \(R(s_1, \ldots, s_n)\) and \(s_i = t_i\) for every \(i\), we can conclude \(R(t_1, \ldots, t_n)\). The algorithm terminates when we contradict a disequality or another negated atomic formula. If the algorithm terminates without a contradiction, we build a model as before, where we simply declare that \(R^{\mdl M}([t_1], \ldots, [t_n])\) holds if and only if we have determined that \(R(t_1, \ldots, t_n)\) in a consequence of the original set.

Now suppose we are given an existential sentence \(\ex {x_1, \ldots, x_n} A\) where \(A\) is quantifier-free, and suppose we want to determine whether it is satisfiable. Replace \(x_1, \ldots, x_n\) in \(A\) with new constants \(c_1, \ldots, c_n\). The resulting quantifier-free sentence is satisfiable if and only if the existential one is. Put that sentence in DNF, and use the fact \(A_1 \lor \cdots \lor A_n\) is satisfiable if and only if one of the sentences \(A_i\) is. That reduces the task to determining whether a conjunction of literals is satisfiable, and we have just explained how to do that.

Since a sentence is valid if and only if its negation is satisfiable, and since the negation of a universal sentence is an existential sentence, we have shown the following.


The satisfiability of an existential sentence in first-order logic is decidable. Equivalently, the validity of a universal sentence is decidable.