.. highlight:: lean

.. _chapter_using_sat_solvers:

Using SAT Solvers
=================

A satisfiability (SAT) solver determines whether a propositional formula has a satisfying assignment.
The performance of SAT solvers has improved significantly in the last two decades. In the late 1990s,
only formulas with thousands of variables and thousands of clauses could be solved. Today,
many propositional formulas with millions of variables and millions of clauses can be solved.
In this chapter, we will explain how to use SAT solvers and how to encode problems into
propositional logic.

.. _section_first_examples:

First examples
--------------

Remember that contemporary SAT solvers determine that satisfiability of propositional formulas
in conjunctive normal form. Specifically, they use a format for specifying such formulas known
as the *DIMACS* format. Our *LAMR* library proves a function that converts any CNF formula
to that format, sends it to a SAT solver called *CaDiCaL*, and parses the answer.
The following can be found in `Examples/using_sat_solver/examples.lean`:

.. literalinclude:: ../../LAMR/Examples/using_sat_solvers/examples.lean
  :start-after: -- textbook: SAT example
  :end-before: -- end: SAT example

It uses the same example CNF formulas defined in :numref:`section_implementing_propositional_normal_forms`.
You can change `exCnf0` to `exCnf1` or `exCnf2`,
or use any other CNF formula you want.
If you uncomment the two lines that begin `IO.println`,
the Lean output will show you the raw output from CaDiCaL.


.. _section_encoding_problems:

Encoding problems
-----------------

All NP-complete problems can be transformed in polynomial time into a SAT problem
(i.e., into a propositional formula). For many problems, such a transformation
is quite natural. For some other problems, the transformation can be complicated.
The transformation is not unique. Frequently there exist many way to encode a
problem as a propositional formula. The encoding can have a big impact on the
runtime of SAT solvers. Generally, the smallest encoding for a problem (in terms
of the number of variables and the number of clauses) results in relatively strong
performance. In this section we will describe a few encodings.

One way to encode a problem into propositional logic is to describe it first using some high-level constraints.
Let's consider a couple of high-level constraints: Constrain a sequence of literals such that
at least one of them is true (`atLeastOne`), or that at most one of the is true (`atMostOne`),
or that an odd number of them is true (`XOR`). Afterwards these constraints are encoded
into propositional logical to obtain, so a SAT solver can be used to solve the resulting
formula.

How to encode `atLeastOne`, `atMostOne`, and `XOR` as a set of clauses? The constraint
`atLeastOne` is easy: simply use the disjunction of all the literals in the sequence. The second
constraint is requires multiple clauses. The naive way generates a quadratic number of
clauses: for each pair of literals :math:`(l_i, l_j)` in the sequence, include the clause :math:`\lnot l_i \lor \lnot l_j`.
The naive way of encoding the `XOR` constraint results in an exponential number of clauses:
all possible clauses over the literals such that an odd number of them are negated. For example,
the encoding of `XOR`:math:`(l_1,l_2,l_3)` produces the following clauses:
:math:`l_1 \lor l_2 \lor \lnot l_3, l_1 \lor \lnot l_2 \lor l_3, \lnot l_1 \lor l_2 \lor l_3, \lnot l_1 \lor \lnot l_2 \lor \lnot l_3`

Although a quadratic number of clauses produced by can be acceptable `atMostOne` for a
reasonable small sequence of literals, the exponential number of clauses produced by `XOR`
would result in formulas that are hard to solve solely due to the size of the formula.
Fortunately, one can encode both `atMostOne` and `XOR` using a linear number of
clauses using the following trick: In case the sequence consists of more than four literals,
split the constraint into two such that the first uses the first three literals of the sequence appended by a new literal :math:`y`, while
the second uses the remainder of the sequence appended by the literal :math:`\lnot y`.
For example, `atMostOne`:math:`(l_1,\dots,l_n)` is split into `atMostOne`:math:`(l_1,l_2,l_3,y)`
and `atMostOne`:math:`(l_4,\dots, l_n,\lnot y)`. The splitting is repeated until none of the
constraints has a sequence longer than four.

Another approach to encode a problem into propositional logic is to express it
first as another NP-complete problem and afterwards transform the result into
propositional logic. Let's demonstrate this approach for graph coloring.
The graph coloring problem asks whether a given graph can be colored with
a given number of colors such that adjacent vertices have different colors. Graph coloring
problems can be easily encoded into a propositional formula, and SAT solvers
can frequently solve these formulas efficiently.

Given a graph :math:`G=(V,E)` and :math:`k` colors, the encoding uses :math:`k|V|` Boolean
variables :math:`x_{i,j}` with :math:`i \in \{1,\dots,|V|\}` and :math:`j \in \{1,\dots,k\}`
and :math:`|V| +k|E|` clauses.
If :math:`x_{i,j}` is assigned to true it means that vertex :math:`i` is assigned color :math:`j`.
The clauses encode two constraints: each vertex has a color and adjacent vertices have
a different color. The first constraint can be encoded using a single clause per vertex. For example,
for vertex :math:`i`, we have the following clause: :math:`x_{i,1} \lor \dots \lor x_{i,k}`.
The second constraint requires :math:`k`
binary clauses. For example, for an edge between vertices :math:`h` and :math:`i`, we have the
following binary clauses: :math:`(\lnot x_{h,1} \lor \lnot x_{i,1}) \land \dots \land (\lnot x_{h,k} \lor \lnot x_{i,k})`.

The CNF formulas for a triangle (a fully connected graph with three vertices) for two colors and
three colors is shown below. The first one is unsatisfiable, while the second one is satisfiable.

.. code-block:: lean

  def triangleCnf2 := !cnf{
    x11 x12,
    x21 x22,
    x31 x32,
    -x11 -x21, -x12 -x22,
    -x11 -x31, -x12 -x32,
    -x21 -x31, -x22 -x32
  }

  def triangleCnf3 := !cnf{
    x11 x12 x13,
    x21 x22 x23,
    x31 x32 x33,
    -x11 -x21, -x12 -x22, -x13 -x23,
    -x11 -x31, -x12 -x32, -x13 -x33,
    -x21 -x31, -x22 -x32, -x23 -x23
  }

Many problems, such as scheduling and planning, can naturally be expressed as a graph coloring problem. We can then transform the graph coloring
problem into a SAT problem using the encoding described above.

An example of a problem that can be expressed as a graph coloring problem is the popular puzzle Sudoku:
Place number is a grid consisting of nine squares subdivided into a further nine smaller squares in such a way that every
number appears once in each horizontal line, vertical line, and square. This puzzle can be seen as a graph coloring
where each small square is a vertex, and vertices are connected if and only if the corresponding small squares occur
the the same horizontal line, vertical line, or square. Below is one of the hardest sudoku puzzles with only 17 given numbers.

.. image:: ../figures/sudoku.*
  :width: 200 px
  :alt: sudoku puzzle
  :align: center

If you take a look at the file `sudoku.lean` in the `Examples` folder,
you will see that it is easily solved by a SAT solver.

Exercise: grid coloring
-----------------------

Ramsey Theory deals with patterns that cannot be avoided indefinitely. In
this exercise we focus on a pattern of coloring a :math:`n \times m` grid with :math:`k` colors:
Consider all possible rectangles within the grid whose length and width are
at least 2. We try to color the grid using :math:`k` colors so that no such rectangle
has the same color for its four corners. When this is possible, we say that
the :math:`n \times m` grid is :math:`k`-colorable while avoiding monochromatic rectangles. When
using :math:`k` colors, it is relatively easy to construct a valid :math:`k^2 \times k^2` grid.
However, only few valid grids that are larger than :math:`k^2 \times k^2` are known. An
example of a valid :math:`3`-coloring of the :math:`9 \times 9` grid is shown below.

.. code-block:: none

    0 0 1 1 2 2 0 1 2
    2 0 0 1 1 2 2 0 1
    1 2 0 0 1 1 2 2 0
    0 1 2 0 0 1 1 2 2
    2 0 1 2 0 0 1 1 2
    2 2 0 1 2 0 0 1 1
    1 2 2 0 1 2 0 0 1
    1 1 2 2 0 1 2 0 0
    0 1 1 2 2 0 1 2 0

.. previous text
  Step 1. Encode whether there exists a coloring of the grid using three colors
  so that no such rectangle has the same color for its four corners. The
  encoding requires two types of constraints. First, each square needs to have
  at least one color. Second, if four squares form the corners of a rectangle,
  then they cannot have the same color.

  Step 2. Solve the encoding for a :math:`10 \times 10` grid using a SAT solver and decode the
  solution into a valid coloring. Show the output of the SAT solver and a valid
  3-coloring similar to the one above of the :math:`9 \times 9` grid.

  Note that any valid coloring can be turned into another valid coloring by
  permuting the rows, columns, or colors. However, such valid colorings are
  isomorphic.

  Step 3. Use the tool Shatter to (partially) break the symmetries of the
  encoding in Step 1 and count the number of solutions of the resulting
  formula.

Write a Lean function that takes as input three natural numbers
:math:`n`, :math:`m`, and :math:`k`, which returns a CNF formula of
Lean data type ``CnfForm`` which is satisfiable if an only if there
exists a valid :math:`k`-coloring of the :math:`n \times m` grid, that is,
a coloring without monochromatic rectangles. (Hint: The
encoding requires two types of clauses. First, each square needs to have one color. Second, if four squares form the corners of a rectangle,
then they cannot have the same color.)

Next, use the Lean interface to CaDiCaL to solve the formula
with :math:`n=10`, :math:`m=10`, and :math:`k=3` and the formula with :math:`n=9`, :math:`m=12`, and :math:`k=3`.
Both formulas should be satisfiable. The answer should consist of two lists (one for each formula)
using Lean data type ``List Lit`` containing only all positive literals assigned to true (:math:`\top`).

Finally, given a ``List Lit`` containing only all positive literals assigned to true
for a grid-coloring problem, decode it into a grid of numbers similar to the :math:`9 \times 9` grid shown above.
Use the function to display the solutions of :math:`n=10`, :math:`m=10`, and :math:`k=3` and of :math:`n=9`, :math:`m=12`, and :math:`k=3`.
You can assume that the decoding function knows :math:`n`, :math:`m`, and :math:`k` of the grid-coloring problem.


Exercise: NumberMind
--------------------

The game Number Mind is a variant of the well known game Master Mind.
Instead of colored pegs, you have to guess a secret sequence of digits.
After each guess you're only told in how many places you've guessed the
correct digit. So, if the sequence was 1234 and you guessed 2036, you'd
be told that you have one correct digit; however, you would *not* be told
that you also have another digit in the wrong place.

For instance, given the following guesses for a 5-digit secret sequence,

.. code-block:: none

    90342 ;2 correct
    70794 ;0 correct
    39458 ;2 correct
    34109 ;1 correct
    51545 ;2 correct
    12531 ;1 correct

the correct sequence 39542 is unique.

Given the following guesses,

.. code-block:: none

    5616185650518293 ;2 correct
    3847439647293047 ;1 correct
    5855462940810587 ;3 correct
    9742855507068353 ;3 correct
    4296849643607543 ;3 correct
    3174248439465858 ;1 correct
    4513559094146117 ;2 correct
    7890971548908067 ;3 correct
    8157356344118483 ;1 correct
    2615250744386899 ;2 correct
    8690095851526254 ;3 correct
    6375711915077050 ;1 correct
    6913859173121360 ;1 correct
    6442889055042768 ;2 correct
    2321386104303845 ;0 correct
    2326509471271448 ;2 correct
    5251583379644322 ;2 correct
    1748270476758276 ;3 correct
    4895722652190306 ;1 correct
    3041631117224635 ;3 correct
    1841236454324589 ;3 correct
    2659862637316867 ;2 correct

find the unique 16-digit secret sequence, as follows.

First, write a Lean function that takes as input a ``List Lit`` and a natural number :math:`k`
and returns a propositional formula of Lean data type ``CnfForm`` that is satisfiable if and
only if exactly :math:`k` literals of list are satisfied. Use propositional variables :math:`p_{i, j}` which are
true if and only if position :math:`i` has the digit :math:`j`. No other variables are required.
(Hint: The easiest way to encode that exactly :math:`k` literals are true is by splitting it in at-least
:math:`k` are true and at-most :math:`k`` are true. No auxiliary propositional variables are required.)

Then write a Lean program that can solve Number Mind puzzles. The input is a list of
exactly-:math:`k` constraints with a constraint being a list of literals and a number :math:`k`. The meaning
of a constraint is that exactly :math:`k` of the literals are true. The program should
return a formula of Lean data type ``CnfForm`` that is satisfiable if and only if
all constraints are satisfied. Note that Number Mind puzzles require i) constraints enforcing how many
digits are correct *and* ii) constraints enforcing that for each position exactly one digit can be
present in a solution.

Finally, run the program to solve the Number Mind puzzle shown above with 16 digits
using the CaDiCaL interface. Transform the solution into a sequence of digits.

The solution of the Number Mind puzzle shown above is claimed to be unique.
use CaDiCaL to show that this claim is correct.


.. previous text
  Find the unique 16-digit secret sequence.

  Step 1. Encode finding the correct sequence as a SAT problem.
  Use Boolean variables :math:`x_{i,j}` which are true if and only if
  at position i there is the digit j. The encoding consists of
  two parts: i) at each position there is exactly one digit; and
  ii) the correct number of digits from each line is matched.
  The encoding should *only* use these :math:`x_{i,j}` variables.

  Step 2. Show that the correct sequence is unique. Which clause
  do you need to add to the encoding of Step 1?

  Step 3. Reduce the size of the encoding by replacing the
  cardinality constraints from ii) in Step 1 by using the
  Sinz encoding.

..
  Exercise: Strassen
  ------------------

  In linear algebra, the Strassen algorithm, named after Volker Strassen, is
  an algorithm for matrix multiplication. It is faster than the standard
  matrix multiplication algorithm and it is useful in practice for large matrices.

  Naive matrix multiplication of two :math:`n \times n` matrices requires :math:`n^3` operations.
  Strassen observed that this is not optimal. More specifically, he showed
  how to multiply two :math:`2 \times 2` matrices using :math:`7` multiplications (instead of :math:`8`).
  His construction can be used for larger matrices as well. In this exercise,
  we going to reproduce Strassen's construction using satisfiability solving.


  TODO: Explain the Brent equations

  Step 1. Encode the Brent equations for :math:`2 \times 2` matrices as a SAT problem.
  The Tseitin transformation is required to limit the size of the encoding.

  Step 2. Decode the solution from the SAT formula to a multiplication
  scheme.

  The scheme obtained from solving the Brent equations only holds for Boolean
  matrices, while Strassen's construction hold for integer matrices as well.

  Step 3. Prove in Lean that the scheme from Step 2 is also a valid scheme
  for integer matrices.