12. Decision Procedures for Equality

In the terminology of logic and computer science, a decision procedure is an algorithm that accepts a class of yes/no questions and answers them correctly. Fix a set of first-order sentences, \(\Gamma\), which we can think of as a set of axioms of as a description of a possible state of affairs. Here are some questions we can ask:

  • Given a first-order sentence \(A\), does \(\Gamma\) prove \(A\), that is, \(\Gamma \proves A\)?

  • Given a first-order sentence \(A\), does \(\Gamma\) entail \(A\), that is, \(\Gamma \models A\)?

  • Given a first-order sentence \(A\), is \(\Gamma \cup \{ A \}\) is satisfiable, that is, is there a model \(\mdl M\) of \(\Gamma\) in which \(A\) is true?

The first two questions are equivalent. We have not yet presented deductive systems for first-order logic, but we will do so in Chapter 16, and by the soundness and completeness theorems for those systems, we will have that a sentence \(A\) is provable from a set of hypotheses \(\Gamma\) if and only if it is entailed by \(\Gamma\). The third question is equivalent to the first two by translation: as with propositional logic, we have that \(\Gamma \models A\) if and only if \(\Gamma \cup \{ \lnot A \}\) is unsatisfiable. So questions about entailment are equivalent to questions about satisfiability, negating the formula in question.

Whether or not a problem of the form above is decidable depends on \(\Gamma\). For example, if \(\Gamma\) is empty, the first question boils down to the question as to whether a sentence \(A\) is provable in first-order logic. If the language in question has a binary relation symbol or two unary function symbols, the question is undecidable. In the special case where the language has only unary predicate symbols and a single function symbol, the question is decidable. Neither of these facts are easy to prove.

Interestingly, adding axioms to \(\Gamma\) can make provability decidable. One way to think of this is that doing so constrains the types of models that have to be considered. Given a set of axioms \(\Gamma\), deciding whether or not provability is decidable of often hard, and, for that reason, generally interesting. To show that the answer is “yes,” one should describe an algorithm, and then there is the further question as to whether the algorithm can be made to run efficiently on the specific questions we care about. A “no” answer usually proceeds to showing that a solution would lead to a solution to the halting problem, or another problem that has been shown to be reducible to it.

The second type question above asks about the truth of a sentence \(A\) in all models of \(\Gamma\), and the third type of question above asks about the truth of a sentence \(A\) in some model of \(\Gamma\). There are other types of decision procedures we might consider. Fixing a model \(\mdl M\), we can ask:

  • Given a first-order formula \(A\), possibly with free variables, is there a variable assignment \(\sigma\) such that \(A\) is true in \(\mdl M\) under \(\sigma\), that is, \(\models_{\mdl M, \sigma} A\)?

  • Given a first-order formula \(A\), possibly with free variables, is \(A\) is true in \(\mdl M\) under every assignment \(\sigma\)?

In the first case, \(A\) is said to be satisfiable in \(\mdl M\), and in the second case, it is said to be \(A\) valid in \(\mdl M\). But be careful: the words are being used in a slightly different sense than before, when satisfiable meant “true in some model” rather than “true for some assignment” and valid meant “true in all models” rather than “true for all assignments.”

Remember that a sentence is a formula without free variables. If \(A\) is a sentence, both questions boil down to the question as to whether \(A\) is true in \(\mdl M\). Notice that if \(A\) has free variables \(x_1, x_2, \ldots, x_n\), then the following are all equivalent:

  • \(A\) is satisfiable in \(\mdl M\).

  • \(\ex {x_1, x_2, \ldots, x_n} A\) is true in \(\mdl M\).

  • \(\fa {x_1, x_2, \ldots, x_n} \lnot A\) is false in \(\mdl M\).

  • \(\lnot A\) is not valid in \(\mdl M\).

It is confusing that there are so many ways to ask the same question! But you have to get used to it: logicians and computer scientists slip back and forth between the different ways of thinking about a problem, sometimes even in the same sentence.

The relationship between the first group of questions, having to do with entailment and satisfiability, and the second group of questions, having to do with truth in models, is subtle. Sometimes, for a given model \(\mdl M\), there is a natural set of axioms \(\Gamma\) such that a sentence is true in \(\mdl M\) if and only if it is provable from \(\Gamma\). Sometimes, instead, one can show that there is no computable set of axioms that has this property. Life is complicated! The set of all true sentences of a model \(\mdl M\) is called “the theory of \(\mdl M\), so we can express the last property by saying that the theory of \(\mdl M\) is not computable.

To muddy the waters even further, instead of asking questions about all first-order formulas, we can consider restricted problems where we are only allowed to ask about formulas of a certain kind. A formula is said to be quantifier-free if it has no quantifiers, universal if it consists of any number of universal quantifiers \(\forall\) (possibly none) followed by a quantifier-free formula, and existential if it consists of any number of existential quantifiers followed by a quantifier-free formula. In some cases, So we can ask, for a decision procedure for the provability of universal formulas from a set of axioms or for the satisfiability of an existential formula in a model, and in some cases, we may have a positive answer even though the full problem is undecidable.

In this chapter, we will describe, in detail, a decision procedure for the validity universal formulas in pure first-order logic, that is, first-order logic without any axioms. The central component to this is a decision procedure for equality, and so we start with that.

12.1. Equality

Fix a language, \(L\). We will consider equations \(s = t\) and disequations \(u \ne v\) between closed terms. The fact that we are considering closed terms mean that there are no variables to substitute for; computer scientists sometimes call these ground terms. (As with unification, in some contexts we may want to treat some variables as constant. What is important here is not whether we call them variables or constants, but, rather, the fact that we are not considering substitutions.)

The problem we are addressing here is this: given a set of equations and disequations, is it satisfiable? Notice that here we are asking about satisfiability in any model. In particular, the set

\[\{ s_1 = t_1, \ldots, s_n = t_n, u_1 \ne v_1, \ldots, u_m \ne v_m \}\]

is unsatisfiable if and only if we have

\[s_1 = t_1, \ldots, s_n = t_n \proves u_1 = v_1 \lor \cdots \lor u_m = v_m\]

So we can think of the problem in either way.

For example, consider the following set of sentences:

  1. \(f(a, a) = b\)

  2. \(g(c, a) = c\)

  3. \(g(c, f(a, a)) = f(g(c, a), g(c, a))\)

  4. \(f(c, c) \ne g(c, b)\)

Is it satisfiable?

Before we answer that, let’s make some general observations. A set that only has equations and no disequations is easily satisfiable, namely, in a model with a single element, where every expression is equal to every other one. Similarly, a set that only has disequations is easily satisfiable, unless one of the disequations is of the form \(t \ne t\). For that purpose, we can use the term model, where every term is interpreted as itself. The interesting cases fall in between these two extremes, where the equations and disequations balance one another.

Coming back to the question, the following considerations show that the answer is “no.” Each of the following is a consequence of the equations above:

  1. \(g(c, f(a, a)) = g(c, b)\) from 1

  2. \(f(g(c, a), g(c, a)) = f(c, c)\) from 2

  3. \(f(c, c) = g(c, b)\) from 3, 5, and 6.

This contradicts the disequation 4 above. To understand what is going on, it is helpful to think of \(f\) as addition, \(g\) as multiplication, \(a\) as the number 1, and \(b\) as the number 2. But the argument is fully abstract, and shows that the disequation cannot hold in any model in which all the equations are satisfied.

These considerations encapsulate the main ideas behind the proof of the following theorem:

Theorem

The question as to whether a finite set of ground equations and disequations is satisfiable is decidable.

The idea behind the proof is to use a saturation argument: starting from the equations in question, we derive new equations until no more equations are derivable. If we manage to contradict one of the disequations, the original set is not satisfiable. In the case where no contradiction is found, we will argue that the original set is satisfiable.

To make all this precise, we need a set of rules for deriving equations.

\[\begin{prooftree} \AXC{$t = t$} \end{prooftree} \quad \quad \begin{prooftree} \AXC{$s = t$} \UIC{$t = s$} \end{prooftree} \quad \quad \begin{prooftree} \AXC{$r = s$} \AXC{$s = t$} \BIC{$r = t$} \end{prooftree} \quad\quad \begin{prooftree} \AXC{$s_1 = t_1$} \AXC{$\ldots$} \AXC{$s_n = t_n$} \TIC{$f(s_1, \ldots, s_n) = f(t_1, \ldots, t_n)$} \end{prooftree} \]

The first three rules express the reflexivity, symmetry, and transitivity of equality, respectively. The last rule is called the congruence rule. You should convince yourself that using these rules we can derive

\[\begin{prooftree} \AXC{$r = s$} \UIC{$t[r/x] = t[s/x]$} \end{prooftree} \]

for any terms \(r\), \(s\), and \(t\) and variable \(x\).

Returning to our proof plan, we want to show that if applying these rules successively does not result in a contradiction, then there is a model in which the original equations and disequations are all true. But a problem arises: what if the original set contains an equation \(a = f(a)\)? Then our algorithm falls into an infinite loop, deriving \(a = f(a) = f(f(a)) = f(f(f(a))) = \ldots\). The solution is to restrict attention to subterms of terms appearing in the original equations and disequations. The theorem follows from the following lemma.

Lemma

Let \(\Gamma\) consist of a set of equations and disequations. Let \(S\) be the set of subterms of all the terms occurring in \(\Gamma\). Let \(\Gamma'\) be the set of all equations between elements of \(S\) that can be derived from the equations in \(\Gamma\) using the rules above. Then \(\Gamma\) is satisfiable if and only if no disequation in \(\Gamma\) is the negation of an equation in \(\Gamma'\).

The algorithm implicit in this lemma is called congruence closure.

Proof

One direction of the lemma is easy. Since the equational rules preserve truth in any model, if we can derive a contradiction from the equations and disequations in \(\Gamma\), then \(\Gamma\) is unsatisfiable. The other direction is harder. Since there are only finitely many pairs of terms in \(S\), the algorithm necessarily terminates. We need to show that if it terminates without deriving a contradiction, then there is a model that satisfies \(\Gamma\).

Say two elements \(s\) and \(t\) are equivalent, written \(s \equiv t\), if they are proved equal from the equations in \(\Gamma\). The rules guarantee that this is an equivalence relation, which is to say, it is reflexive, symmetric, and transitive. It is also a congruence, which means that applying a function symbol to equivalent terms results in equivalent terms.

To each element \(t\), we associate its equivalence class \([t]\), defined by

\[[t] = \{ s \in S \mid s \equiv t \}.\]

In words, \([t]\) is the set of terms equivalent to \(t\). Assuming the algorithm terminates without a contradiction, define a model \(\mdl M\) whose universe consists of all the equivalence classes of elements of \(S\) together with a new element, \(\star\). For elements \(t_1, \ldots t_n\) in \(S\), interpret each \(n\)-ary function symbol \(f\) by the function

\[\begin{split}f^{\mdl M}([t_1], \ldots, [t_n]) = \begin{cases} [f(t_1, \ldots, t_n)] & \text{if $f(t_1, \ldots, t_n)$ is in $S$} \\ \star & \text{otherwise} \end{cases}\end{split}\]

In other words, what \(f^{\mdl M}\) does to each equivalence class is determined by what \(f\) does to each of the elements. If any of the arguments to \(f^{\mdl M}\) is \(\star\), then \(f^{\mdl M}\) returns \(\star\). The fact that \(\equiv\) is a congruence ensures that this makes sense. This is just a truncated version of the term model, in which provably equal terms are all glued together.

It is not hard to show that for every term \(t\) in \(S\), \(\tval{t}_{\mdl M}\) is equal to \([t]\). But this is what we need. For every equation \(s = t\) in \(\Gamma\), \(s\) and \(t\) are in the same equivalence class, so they are equal in the model. And if \(s\) and \(t\) are not provably equal, then \([s]\) and \([t]\) are not the same, so every disequation \(s \ne t\) in \(\Gamma\) is true in \(\mdl M\) as well.

In the construction of the counter-model, we don’t really need to introduce a new element \(\star\). You can check that everything works if we pick an arbitrary equivalence class \([t]\) and use that as the default instead. We simply find it clearer to use \(\star\) as a catch-all for everything that falls outside the scope of the finite set of terms we started with.

For examples of the algorithm in action, first let us show that the set

\[f^3(a) = a, \, f^5(a) = a, \, f(a) \ne a\]

is unsatisfiable, where \(f^n(a)\) abbreviates \(n\)-fold application \(f(f(\cdots f(a)))\). The set of all subterms is

\[a, \, f(a), \, f^2(a), \, f^3(a), \, f^4(a), \, f^5(a).\]

We start with the equivalence classes \(\{ a, f^3(a) \}\) and \(\{ a, f^5(a)\}\) as well as all the others subterms in singleton sets. From \(a = f^3(a)\) we derive \(f(a) = f^4(a)\) by congruence, giving rise to the set \(\{ f(a), f^4(a) \}\). Applying congruence again gives rise to the set \(\{ f^2(a), f^5(a) \}\), which is merged with \(\{ a, f^5(a)\}\) to yield \(\{ a, f^2(a), f^5(a) \}\). Applying congruence again yields \(\{ f(a), f^3(a) \}\). (We ignore the term \(f^6(a)\).) This is merged with the set \(\{ a, f^3(a) \}\) to yield \(\{ a, f(a), f^3(a)\}\). Applying congruence again yields \(\{ f(a), f^2(a), f^4(a)\}\), which is merged with \(\{ a, f(a), f^3(a) \}\) and \(\{ f^2(a), f^5(a) \}\) to yield \(\{ a, f(a), f^2(a), f^3(a), f^5(a) \}\). At this point, we have derived \(f(a) = a\), contradicting the disequality in the original set. So the set is unsatisfiable.

Suppose we start instead with the set

\[f^2(a) = a, \, f^4(a) = a, \, f(a) \ne a, \, f(a) \ne b\]

You can check that in this case, the algorithm terminates with the following three equivalence classes:

  • \([a] = \{ a, f^2(a), f^4(a)\}\)

  • \([f(a)] = \{ f(a), f^3(a) \}\)

  • \([b] = \{ b \}\).

We now construct a model \(\mdl M\) with these elements and an additional element \(\star\), with

\[\begin{split}f^{\mdl M}([a]) & = [f(a)] \\ f^{\mdl M}([f(a)]) & = [a] \\ f^{\mdl M}([b]) & = \star \\ f^{\mdl M}(\star) & = \star\end{split}\]

You can check that this satisfies the original set of equations and disequations.

Our analysis establishes an interesting property of first-order logic: it is possible to prove a disjunction \(u_1 = v_1 \lor \cdots \lor u_m = v_m\) from a set of equation if and only if it is possible to prove \(u_i = v_i\) for some \(i\). This is a property known as convexity. It relies on the fact that we allow only positive equations on the right-hand side. For example, \(a = b \lor a \ne b\) is provable in first-order logic, but clearly neither disjunct is provable on its own.

We have described the algorithm as working on closed terms, that is, terms with no variables. Of course, there is no harm if we allow variables in the terms and simply treat them as constants. The point is that in this formulation of the problem, a hypothesis \(f(x) = a\) is interpreted as a statement about one particular \(x\) that never changes in the statement of a problem. It is an entirely different problem to consider hypotheses like \(\fa x f(x) = a\) for which we are allowed to substitute any term for \(x\). For example, we might want to add axioms like \(\fa {x, y, z} (x + y) + z\) and \(\fa {x, y}. x + y = y + x\) as axioms for the integers or real numbers. The problem of determining whether a single equation follows from a set of universally quantified equations is known as the word problem, and, in general it is undecidable.

12.2. Implementing congruence closure

A high-level overview of the congruence closure algorithm is as follows:

  • Given a set of equations and disequations, collect the set of all subterms.

  • Start with each subterm in its own equivalence class.

  • For each equation in the original problem:

    • Merge the equivalence classes specified by the equation.

    • Keep merging classes according to the congruence rule, until it can no longer be applied.

The resulting equivalence classes characterize all the equations that are implied by the set of the equations in the original problem. If they contradict one of the disequations, the original set is unsatisfiable, and otherwise it is satisfiable. Of course, one can stop the procedure early if a contradiction is found.

For example, given the set \(\{ f(a,b) = a, f(f(a,b),b) \neq a \}\), the set of all subterms is \(\{a,b,f(a,b),f(f(a,b),b)\}\). The algorithm proceeds as follows:

  • Initially, the equivalence classes are \(\{a\}, \{b\}, \{f(a,b)\}, \{f(f(a,b),b)\}\).

  • Processing the equation \(f(a,b) = a\), we merge the equivalence classes of \(\{f(a, b)\}\) and \(\{a\}\) to get \(\{a, f(a,b)\}, \{b\}, \{f(f(a,b),b)\}\).

  • Considering the terms \(f(a, b)\) and \(f(f(a,b), b)\), we see that the first immediate subterms of each, \(a\) and \(f(a, b)\), and the second immediate subterms, both \(b\), are pairwise in the same equivalence classes. This means we can merge the equivalence classes containing \(f(a, b)\) and \(f(f(a,b), b)\) to get \(\{a, f(a,b), f(f(a,b),b)\}, \{b\}\).

  • This contradicts \(f(f(a,b), b) \neq a\), so the original set is unsatisfiable.

Congruence closure can be implemented efficiently (and is implemented efficiently in SMT solvers) using a union-find data structure, a generic means of partitioning a set of nodes into equivalence classes. Each node has a pointer to its parent, which is another element in the equivalence class. Each class has a distinguished element, called the representative, with the property that it is the parent of itself. To initialize the algorithm with a set of elements, each in its own class, we create a new node for each element and set its parent to itself. The union-find structure supports two operations:

  • The find operation returns the representative of the class of a given element by following the parent pointers. In pseudocode:

    def find(x):
      if x.parent == x:
        return x
      else:
        return find(x.parent)
    

    We can then determine whether two elements x and y are equivalent by checking whether find(x) and find(y) are the same.

  • The merge operation takes two nodes, x and y and ensures that they are in the same class, by finding the representatives of each class, and, if they are not the same, making one point to the other:

    def merge(x, y):
      xr := find(x)
      yr := find(y)
      if xr != yr:
        yr.parent := xr
    

In an efficient implementation, there are two important optimizations. First, the find operation can be modified to update the parent pointers of all nodes on the path to the representative, so that future finds are faster. Second, the merge operation can be modified to make the smaller class a child of the larger class, which ensures that the depth of the tree is logarithmic in the number of elements.

Using the union-find data structure, we can straightforwardly implement the congruence closure algorithm outlined at the beginning of this section. Because the set that is being partitioned into equivalence classes consists of the subterms of the terms in the original problem, there are two structures that are relevant:

  • We need to keep track of the term structure, for example, the fact that \(f(a, b)\) is a subterm of \(f(f(a, b), b)\).

  • We need to keep track of the equivalence classes at each stage of the algorithm.

Every time a merge identifies the classes represented by s and t respectively, we should consider each subterm u that has an immediate subterm equivalent to s and each subterm v that has an immediate subterm equivalent to t, and ask whether u and v should be identified by the congruence rule; in other words, whether u and v have the same function symbol and the arguments have, pairwise, been identified as equivalent. In the example above, when we merge the equivalence classes of \(f(a, b)\) and \(a\), we need to check whether \(f(f(a, b), b)\) should be identified with \(f(a, b)\), because the first has \(f(a, b)\) as an immediate subterm and the second has \(a\) as an immediate subterm. One option is to iterate through all pairs of subterms in the original problem, but this is inefficient. Instead, we can maintain, for each class representative s, a list of all subterms that have an immediate subterm in the same class as s. These are called the predecessors of s. Whenever we merge classes with representatives s and t, we check each pair consisting of a predecessor of s and a predecessor of t to see if they should be merged, according to the congruence closure rule, and the predecessors of the representative of the new class should be set to the union of the previous predecessors of s and t.

In the course repository for this textbook, there is an implementation of the union-find data structure in LAMR/Util/DisjointSet.lean, and there is an implementation of the congruence closure algorithm in the file CongruenceClosure.lean associated with this chapter. The latter closely follows the implementation in Harrison’s book, Handbook of Practical Logic and Automated Reasoning. Zohar and Manna’s book, The Calculus of Computation, is another good reference.

12.3. Deciding universal sentences

Remember that the following problems are all intertranslatable:

  • Given a universal sentence \(A\), is \(A\) valid, that is, true in every model? (Equivalently: if \(A\) provable?)

  • Given a quantifier-free formula \(A\), does \(\models_{\mdl{M}, \sigma} A\) hold for every model \(\mdl M\) and every variable assignment \(\sigma\)?

  • Given a quantifier-free formula \(A\), does \(\models_{\mdl{M}, \sigma} A\) hold for some model \(\mdl M\) and variable assignment \(\sigma\)?

  • Given an existential sentence \(A\), is \(A\) satisfiable, that is, true in some model?

In particular, a universal formula \(\fa {\vec x} A\) is valid if and only if \(A\) holds for every model and every variable assignment, which happens if and only if \(\lnot A\) never holds for any model and variable assignment, which happens if and only if \(\ex {\vec x} \lnot A\) is not satisfiable. Notice that in each case we are asking questions about pure first-order logic, without any axioms \(\Gamma\).

Remember that in first-order logic, the atomic formulas include equations \(s = t\) and formulas \(R(t_1, \ldots, t_n)\), where \(R\) is a relation symbol. The set of literals include the negations of those as well. It is not hard to extend congruence closure to an algorithm to determine the satisfiability of any finite set of literals. To start with, we add the following rule to our equational proof system:

\[ \begin{prooftree} \AXC{$s_1 = t_1$} \AXC{$\ldots$} \AXC{$s_n = t_n$} \AXC{$R(s_1, \ldots, s_n)$} \QuaternaryInfC{$R(t_1, \ldots, t_n)$} \end{prooftree} \]

Suppose \(\Gamma\) is a set of literals. To test the satisfiability of \(\Gamma\), we do not have to change much in the previous algorithm. Using the congruence rule for relations, whenever we have derived \(R(s_1, \ldots, s_n)\) and we have also derived equations \(s_i = t_i\) for every \(i\), we can conclude \(R(t_1, \ldots, t_n)\). The algorithm terminates when we contradict a disequality or another negated atomic formula. If the algorithm terminates without a contradiction, we build a model as before, where we simply declare that \(R^{\mdl M}([t_1], \ldots, [t_n])\) holds if and only if we have determined that \(R(t_1, \ldots, t_n)\) in a consequence of the original set. Another way to think about the algorithm is that we can replace each atomic formula \(R(t_1, \ldots, t_n)\) by an equation \(f_R(t_1, \ldots, t_n) = \top\) and each negated atomic formula \(\lnot R(t_1, \ldots, t_n)\) by a disequation \(f_R(t_1, \ldots, t_n) \ne \top\) and run the usual congruence closure algorithm on that.

Now suppose we are given an existential sentence \(\ex {x_1, \ldots, x_n} A\) where \(A\) is quantifier-free, and suppose we want to determine whether it is satisfiable. Write \(A\) in disjunctive normal form, that is, as a disjunction \(A_1 \lor \cdots \lor A_n\) of conjunctions of literals. Then \(\ex {x_1, \ldots, x_n} A\) is satisfiable if and only if one of the formulas \(A_1, \ldots, A_n\) is satisfied by some model \(\mdl M\) and variable assignment \(\sigma\). That reduces the task to determining whether a conjunction of literals is satisfiable, and we have just explained how to do that.

Since a sentence is valid if and only if its negation is satisfiable, and since the negation of a universal sentence is an existential sentence, we have shown the following.

Theorem

The validity of universal sentences in pure first-order logic is decidable. Equivalently, the satisfiability of existential sentences is decidable.