Introduction

The P vs. NP problem is undoubtedly the most important open problem in the theory of computer science today. Loosely speaking, it can be stated as follows: \(\def\poly{\text{poly}}\)

Is there an efficient algorithm that solves the satisfiability problem?

In this context, an algorithm is said to be efficient if its worst-case running time is bounded by a polynomial in the input size. While most researchers feel that the answer to the question is “no”, this feeling is not backed up with a rigorous mathematical proof, and in fact, the field seems to be mysteriously far away from such a proof.

In this course, we will focus on a more subtle and more modest question:

Is there an algorithm that’s faster than exhaustive search?

We will answer this question affirmatively for different variants of the satisfiability problem. We will also be stunned by the beauty of theoretical computer science when we encounter this amazing (and informally phrased) mantra: Faster-than-exhaustive-search algorithms for the satisfiability problem can actually provide evidence that there is no polynomial-time algorithm for the satisfiability problem! Finally, we will also hypothesize about possible limitations that algorithms for the satisfiability problem may face.

Boolean circuits

Computation can be modeled by Boolean circuits. A Boolean circuit is a directed acyclic graph \(C=(V,E)\) whose vertices are called gates and whose directed edges are called wires. Input gates are each labeled by a variable \(x_i\) and have no incoming wires. NOT-gates have exactly one incoming wire, whose value they negate, and AND-gates have two or more incoming wires, of which they compute the logical AND; we may also allow other gates to be present, such as OR-gates. Finally, each gate that has no outgoing wires is an output gate; we generally assume that there is only one output gate.

A Boolean function is a function \(f:\{0,1\}^n\to\{0,1\}\). Let \(C\) be a Boolean circuit on the variables \(x_1,\dots,x_n\). The circuit computes a Boolean function, which we also denote by \(C:\{0,1\}^n\to\{0,1\}\), in the following way: For any particular input \(x\in\{0,1\}^n\), the input gates are initialized with the given bits, then the information moves through the acyclic graph, bits interact according to the specifications of the individual gates, and the bit that ends up at the output gate is taken to be the value \(C(x)\) of the function computed by the circuit.

The satisfiability problem

A Boolean circuit \(C\) is satisfiable if there is a vector \(x\in\{0,1\}^n\) such that \(C(x)=1\). The satisfiability problem is the decision problem in which we want to decide whether a given Boolean circuit \(C\) is satisfiable. We may denote this problem by Circuit-SAT or simply SAT.

There is an \(O(2^n \cdot m \cdot \poly\log m)\)-time algorithm for the satisfiability problem: For each \(x\in\{0,1\}^n\), we check whether \(C(x)=1\) holds. The latter check can be done with an algorithm that runs in time polynomial in the number \(m=|C|\) of wires, and in fact, it can be done in quasilinear time \(O(m \cdot \poly\log m)\).
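To make this concrete, here is a minimal Python sketch of the exhaustive-search algorithm; the gate-list representation of circuits is an assumption chosen for illustration, not a convention of the course.

```python
from itertools import product

# One possible circuit representation (for illustration only):
# a list of gates in topological order, where a gate is
#   ("input", j)         -- reads variable x_j (0-indexed),
#   ("not", g)           -- negates the value of gate g,
#   ("and", g1, g2, ...) or ("or", g1, g2, ...) -- combine earlier gates.
# The last gate in the list is the output gate.

def evaluate(circuit, x):
    """Evaluate the circuit on the 0/1-vector x in time O(m)."""
    value = []
    for op, *args in circuit:
        if op == "input":
            value.append(x[args[0]])
        elif op == "not":
            value.append(1 - value[args[0]])
        elif op == "and":
            value.append(int(all(value[g] for g in args)))
        elif op == "or":
            value.append(int(any(value[g] for g in args)))
    return value[-1]

def exhaustive_sat(circuit, n):
    """Try all 2^n assignments; total time O(2^n * m) up to log factors."""
    for x in product((0, 1), repeat=n):
        if evaluate(circuit, x) == 1:
            return x        # a satisfying assignment
    return None             # the circuit is unsatisfiable
```

For example, `exhaustive_sat([("input", 0), ("input", 1), ("and", 0, 1)], 2)` returns `(1, 1)`.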

It is open whether there is an algorithm for Circuit-SAT that is significantly faster than exhaustive search. In this course, we study restricted classes of circuits for which such algorithms are known. Ideally, we would like to find algorithms of the form \(O(c^n)\) for some constant \(c\in(1,2)\); however, even algorithms that achieve a seemingly tiny improvement over exhaustive search, with running times such as \(O(2^n / m^{10})\), will turn out to have profound implications.

3-CNF-SAT

A Boolean formula \(F\) is a Boolean circuit in which all non-input gates have at most one outgoing wire. Such a formula is in conjunctive normal form (CNF) if it is an AND of clauses, where each clause is an OR of literals and a literal is a variable \(x_i\) or its negation \(\neg x_i\).

In other words, \(F\) can be written as \(F=C_1\wedge C_2 \wedge \dots \wedge C_m\), where each \(C_i\) is a clause of the form \(C_i= (\ell_{i1}\vee\dots\vee\ell_{id})\). When each clause consists of exactly \(d=3\) literals, \(F\) is a 3-CNF formula.
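In code, we may represent a 3-CNF formula as a list of clauses, each clause a triple of signed variable indices, the sign encoding negation (as in the DIMACS format). The following minimal sketch fixes this illustrative representation and shows how to evaluate a formula; the later sketches reuse these helpers.

```python
# A literal is a nonzero integer: +i stands for x_i and -i for its
# negation (1-indexed, as in the DIMACS format). A 3-CNF formula is
# a list of 3-tuples of literals.
F = [(1, -2, 3), (-1, 2, 4)]   # (x1 OR NOT x2 OR x3) AND (NOT x1 OR x2 OR x4)

def literal_value(lit, a):
    """Value of the literal lit under the 0/1-assignment a (a[0] is x_1)."""
    v = a[abs(lit) - 1]
    return v if lit > 0 else 1 - v

def formula_value(F, a):
    """F(a): 1 if every clause contains at least one true literal."""
    return int(all(any(literal_value(l, a) for l in c) for c in F))
```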

Randomized local search for 3-SAT

We now turn to a randomized algorithm by Schöning (2002), which solves 3-CNF-SAT in time \(O(1.334^n)\). To achieve this significant improvement over exhaustive search, the algorithm performs a kind of randomized gradient descent toward a satisfying assignment. Our measure of how “close” the current candidate is to a solution is the Hamming distance \(d_H\) between two strings \(x,y\in\{0,1\}^n\), which is defined as follows:

\[d_H(x,y) = \#\big\{ i\in\{1,\dots,n\} \,\colon\, x_i \neq y_i \big\}\,.\]

The local search algorithm receives as input a 3-CNF formula \(F\) and a current candidate solution \(a\in\{0,1\}^n\). The algorithm only ever declares that \(F\) is satisfiable if it has actually found a satisfying assignment; therefore, we can focus on the case in which there exists a satisfying assignment \(a^*\in\{0,1\}^n\) for \(F\), that is, we have \(F(a^*)=1\).

The main idea for the local search algorithm is to iteratively flip individual bits of \(a\) in order to find the global minimum of the cost function \(a\mapsto d_H(a,a^*)\). This minimum is achieved when \(d_H(a,a^*)=0\), which means that \(a=a^*\). There are two problems with this approach:

  1. The cost function \(d_H(a,a^*)\) is far from convex, and so the algorithm may get stuck in a local minimum.
  2. We don’t know how \(a\mapsto d_H(a,a^*)\) could possibly be computed efficiently, since the algorithm doesn’t know \(a^*\) in advance.

To overcome the first problem, we will restart the local search several times, and each time we will start from a uniformly chosen assignment \(a\in\{0,1\}^n\). For the second problem, we make the following critical observation.

Observation. Let \(F=C_1\wedge\dots\wedge C_m\) be a 3-CNF formula, let \(a^*\) be a satisfying assignment of \(F\), and let \(a\) be an assignment with \(F(a)=0\). Then there exists an unsatisfied clause \(C_i\), that is, we have \(C_i(a)=0\). Since \(C_i=(\ell_{i1}\vee\ell_{i2}\vee\ell_{i3})\), this implies \(\ell_{i1}=\ell_{i2}=\ell_{i3}=0\) under the assignment \(a\). On the other hand, since \(a^*\) is a satisfying assignment, we have \(\ell_{i1}=1\) or \(\ell_{i2}=1\) or \(\ell_{i3}=1\) under \(a^*\). This implies that we can flip the value of one of the literals \(\ell_{i1}\), \(\ell_{i2}\), or \(\ell_{i3}\) in \(a\) (that is, flip the underlying variable) in order to decrease the Hamming distance between \(a\) and \(a^*\)! We don’t know which literal we should flip, but one of the three choices will bring us closer to \(a^*\).

With this observation in mind, we can now formulate the main idea of the randomized local search algorithm: We simply select one of the three literals uniformly at random and flip its value in \(a\). Before we think about how likely it is that we will actually find a solution, and how long it will take, let us formally state the local search algorithm LocalSearch\((F,a)\):
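The following minimal Python sketch states the procedure, reusing the helpers `literal_value` and `formula_value` from the 3-CNF example above:

```python
import random

def local_search(F, a, n):
    """LocalSearch(F, a): returns 1 only if a satisfying assignment
    was actually found within at most 3n bit flips."""
    a = list(a)
    for _ in range(3 * n):
        # Find a clause that the current assignment leaves unsatisfied.
        unsat = next((c for c in F
                      if not any(literal_value(l, a) for l in c)), None)
        if unsat is None:
            return 1                  # F(a) = 1: a is satisfying
        lit = random.choice(unsat)    # one of its 3 literals, uniformly
        a[abs(lit) - 1] ^= 1          # flip that variable's bit in a
    return formula_value(F, a)        # final check after iteration 3n
```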

As you can see, we loop at most \(3n\) times; the rationale is that, if we have not found a solution within \(3n\) steps, we assume that we are stuck in a local minimum.

Due to the cut-off after iteration \(3n\), the running time of LocalSearch is polynomial. However, since 3-CNF-SAT is NP-complete, we do not expect it to have a polynomial-time algorithm, not even a randomized one with large success probability. Therefore, we expect the probability \(p\) that LocalSearch succeeds to be exponentially small. In fact, we will prove that it is roughly \((3/4)^n\).

Lemma. Let \(F\) be a fixed 3-CNF formula with a satisfying assignment \(a^*\). We choose the initial \(a\) uniformly at random from \(\{0,1\}^n\). Then we have:

\[p:=\Pr( {\sf LocalSearch} (F,a) = 1) \geq (3/4)^n / \poly(n)\,.\]

Proof. Let \(X_t = d_H(a_t,a^*)\) where \(a_t\) is the value of the variable \(a\) after iteration \(t\). Due to the initial random choice of \(a_0\) and the random choices throughout the algorithm, \(a_t\) and \(X_t\) are random variables. Note that the expectation of \(X_0\) is \(E[X_0]=n/2\) and that \(X_0\) is binomially distributed, that is, we have

\[\Pr( X_0=j ) = 2^{-n} \binom{n}{j}\,.\]

By the observation that we made above, we have

\[\Pr( X_{t+1} = X_t - 1 \;|\; X_t>0 ) \geq \frac{1}{3}\]

and

\[\Pr( X_{t+1} = X_t + 1 \;|\; X_t>0 ) \leq \frac{2}{3}\,.\]

Note that, because the latter inequality is an upper bound on the probability, it holds regardless of the conditioning. For simplicity, we analyze the random process where the two inequalities hold with equality:

\[\Pr( X_{t+1} = X_t - 1 \;|\; X_t>0 ) = \frac{1}{3}\]

and

\[\Pr( X_{t+1} = X_t + 1 \;|\; X_t>0 ) = \frac{2}{3}\,.\]

Exercise. Explain why we can do this simplification and where we are using it below.

We want to lower bound \(p\) by the probability that the random walk reaches \(a^*\) within \(3n\) iterations; splitting according to the first iteration \(t\) at which this happens, we get

\[ p\geq\sum_{t=0}^{3n}\Pr\big( X_{t} = 0 \,\wedge\, \forall k<t.\, X_k>0 \big)\,. \]

For this, we condition on \(X_0=j\) in each term, which yields

\[ \begin{align*} p&\geq\sum_{t=0}^{3n}\sum_{j=0}^n \Pr(X_0=j)\cdot \Pr\big( X_{t} = 0 \,\wedge\, \forall k<t.\, X_k>0 \;\big|\; X_0=j \big)\\ &=\sum_{t=0}^{3n}\sum_{j=0}^n 2^{-n}\binom{n}{j}\cdot q(t,j)\\ &=\sum_{j=0}^n 2^{-n}\binom{n}{j}\cdot \sum_{t=0}^{3n} q(t,j)\,. \end{align*} \]

Here we defined \(q(t,j)\) as the probability that, for a fixed \(a\) with \(d_H(a,a^*)=j\), the algorithm LocalSearch finds the solution \(a^*\) after exactly \(t\) iterations and not any earlier. As an example, consider \(q(j,j)\): The only way in which LocalSearch can find the solution after \(j\) iterations is to decrement the distance from \(a\) to \(a^*\) in every iteration; since the probability of decrementing is exactly \(1/3\) in each iteration, independently, this happens with probability \(q(j,j) = 3^{-j}\). For the general case, note that the distance from \(a\) to \(a^*\) must be decremented in \((t+j)/2\) iterations and incremented in \((t-j)/2\) iterations (in particular, \(t-j\) must be even), and it must always stay bigger than \(0\) before iteration \(t\). Thus we have \(q(t,j)= (\frac{1}{3})^{(t+j)/2} (\frac{2}{3})^{(t-j)/2} \cdot Q(t,j)\), where \(Q(t,j)\) is the number of such sequences of decrements and increments.

Exercise. Let \(t,j\in\mathbf{N}\). Prove that the number \(Q(t,j)\) of sequences \(s\in\{-1,1\}^t\) such that \(j + \sum_{i=1}^k s_i>0\) for all \(k<t\) and \(j + \sum_{i=1}^t s_i=0\) hold is \(\binom{t}{(t-j)/2} \cdot \frac{j}{t}\).
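While the proof is left as an exercise, the claimed formula can at least be sanity-checked by brute force for small parameters:

```python
from itertools import product
from math import comb

def Q_bruteforce(t, j):
    """Count s in {-1,1}^t with j + sum(s[:k]) > 0 for all k < t
    and j + sum(s) == 0."""
    count = 0
    for s in product((-1, 1), repeat=t):
        partial, ok = j, True
        for k in range(t - 1):
            partial += s[k]
            if partial <= 0:        # the walk hit 0 too early
                ok = False
                break
        if ok and partial + s[-1] == 0:
            count += 1
    return count

for t in range(1, 15):
    for j in range(1, t + 1):
        if (t - j) % 2 == 0:        # otherwise no such sequence exists
            assert Q_bruteforce(t, j) == comb(t, (t - j) // 2) * j // t
```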

Exercise. Is the corresponding section on Wikipedia wrong? If so, fix it.

We now continue our estimation of \(p\) by first proving \(\sum_t q(t,j) \sim 2^{-j}\), where \(\sim\) indicates equality up to a factor that is polynomial in \(t\):

\[ \begin{align*} \sum_{t=0}^{3n} q(t,j) &= \sum_{t=j}^{3n} q(t,j)\\ &\sim \sum_{t\geq j} \binom{t}{(t-j)/2} \frac{2^{(t-j)/2}}{3^{(t+j)/2 + (t-j)/2}}\\ &=\sum_{t\geq j} \binom{t}{\alpha t} \cdot 2^{\alpha t} \cdot 3^{-t}\\ &\sim \max_{t\geq j} \binom{t}{\alpha t} \cdot 2^{\alpha t} \cdot 3^{-t}\\ &\sim \max_{t\geq j} 2^{H(\alpha)t} \cdot 2^{\alpha t} \cdot 3^{-t}\,. \end{align*} \]

Here we wrote \(\alpha = \frac{1}{2} - \frac{j}{2t}\) and used the following important fact about the binomial coefficient:

Exercise. Use Stirling’s approximation of \(n!\) to prove that \(\binom{t}{\alpha t} \sim 2^{H(\alpha) t}\) holds for all \(\alpha\in(0,\frac{1}{2})\) and \(t\in\mathbb N\), where \(H(\alpha)\) is the binary entropy of \(\alpha\):

\[H(\alpha) =-\alpha\log \alpha - (1-\alpha)\log (1-\alpha)\,.\]
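Without spoiling the exercise, the approximation is easy to observe numerically:

```python
from math import comb, log2

def H(alpha):
    """Binary entropy of alpha, in bits."""
    return -alpha * log2(alpha) - (1 - alpha) * log2(1 - alpha)

alpha = 0.3
for t in (100, 1_000, 10_000):
    print(log2(comb(t, int(alpha * t))) / t, H(alpha))
# the first column approaches H(0.3) = 0.8813... as t grows
```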

By setting \(\frac{\partial}{\partial t} \big(H(\alpha) t + \alpha t - t\big)= 0\), we find the value of \(t\) at which the maximum is achieved, and it turns out that the value of the maximum is \(\sim 2^{-j}\).

Exercise. Perform this calculation to find the maximum.

Finally, for our estimate of \(p\), we get

\[ \begin{align*} p &\geq 2^{-n} \cdot \sum_{j=0}^n \binom{n}{j} \sum_t q(t,j)\\ &\sim 2^{-n} \cdot \sum_{j=0}^n \binom{n}{j}\, 2^{-j} \cdot 1^{n-j}\\ &= 2^{-n} \cdot \left(\tfrac{1}{2}+1\right)^n\\ &= (3/4)^n\,. \end{align*} \]

This finishes the proof of the lemma.
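As a sanity check of the lemma, we can simulate the simplified random walk (start at \(X_0\sim\text{Bin}(n,1/2)\), step down with probability exactly \(1/3\), cut off after \(3n\) steps) and compare the empirical success frequency with \((3/4)^n\); this Monte Carlo experiment is only an illustration, not part of the proof.

```python
import random

def walk_succeeds(n):
    """One run of the simplified walk, starting from X_0 ~ Bin(n, 1/2)."""
    x = sum(random.randint(0, 1) for _ in range(n))
    for _ in range(3 * n):
        if x == 0:
            return True
        x += -1 if random.random() < 1 / 3 else 1   # down w.p. 1/3
    return x == 0

n, trials = 12, 200_000
hits = sum(walk_succeeds(n) for _ in range(trials))
print(hits / trials, (3 / 4) ** n)   # same order of magnitude
```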

Probability amplification

We conclude with a general technique. Let \(A\) be a randomized algorithm for SAT such that \(A\) succeeds in finding a satisfying assignment with probability \(p>0\). Then we can boost the success probability by repeating \(A\) several times, each time with independent random bits. Let us denote the new algorithm by \(A^r\), where \(r\in\mathbf N\) is the number of repetitions. Of course, if \(A\) has worst-case running time \(T\), then the running time of \(A^r\) is \(O(r\cdot T)\).

We make the following observation about the success probability: \(A^r\) fails to find a satisfying assignment if and only if all runs of \(A\) fail to do so. This happens with probability \((1-p)^r\leq e^{-p r}\), where we used the important inequality \(1+x\leq\exp(x)\).

Exercise. Prove \(1+x \leq \exp(x)\) using a suitable definition of \(\exp(x)\).

To boost the success probability to at least \(1-e^{-20}\), we set \(r=\lceil 20/p\rceil\).
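Putting everything together, here is a minimal sketch of the boosted algorithm for 3-CNF-SAT, reusing `local_search` from above; for simplicity it ignores the \(\poly(n)\) factor from the lemma when choosing \(r\).

```python
import math, random

def schoening(F, n):
    """Repeat LocalSearch with fresh uniform starting assignments.
    With r = ceil(20/p) repetitions, the failure probability on a
    satisfiable formula is at most e^{-20} (the poly(n) slack in
    the lemma is ignored in this sketch)."""
    r = math.ceil(20 * (4 / 3) ** n)
    for _ in range(r):
        a = [random.randint(0, 1) for _ in range(n)]
        if local_search(F, a, n) == 1:
            return 1        # found a satisfying assignment
    return 0                # declare unsatisfiable
```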

Theorem. There is a randomized algorithm for 3-CNF-SAT that has running time \((4/3)^n\cdot \poly(n)\) and success probability at least \(1-e^{-20}\).