Recap

Last time, we saw a randomized polynomial-time algorithm for 3-CNF-SAT with success probability roughly \(p \sim (3/4)^n\). The algorithm was based on randomized local search. In particular, if the formula is unsatisfiable, the algorithm will correctly report this fact, and if it is satisfiable, the algorithm finds a satisfying assignment with probability \(p\). We then used a general strategy for turning a polynomial-time algorithm with exponentially small success probability into an exponential-time algorithm with large success probability: by repeating the algorithm \(t=\ln (1/\epsilon) / p\) times, we can reduce the error probability to \((1-p)^t \leq \exp(-pt) = \epsilon\). Note that even if we aim for a negligibly small error probability \(\epsilon=2^{-n^2}\), the running time of the amplified algorithm is still dominated by \(t\sim 1/p\sim (4/3)^n\), up to factors that are polynomial in \(n\).

The algorithm for \(3\)-CNF-SAT can be extended to \(k\)-CNF-SAT in a straightforward manner. The running time obtained after probability amplification becomes \((2-\frac{2}{k})^n\).
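
To make the recap concrete, here is a minimal Python sketch of randomized local search with the repetition step on top. The formula representation (clauses as lists of signed 1-based variable indices) and all function names are choices made for this sketch, and the success-probability estimate is only used to set the number of repetitions.

```python
import math
import random

def satisfies(assignment, clause):
    """Does the 0/1 assignment satisfy this clause (a list of signed 1-based variable indices)?"""
    return any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)

def local_search(clauses, n, rng=random):
    """One run of randomized local search: random start, then up to 3n random correction steps."""
    a = [rng.random() < 0.5 for _ in range(n)]
    for _ in range(3 * n):
        unsatisfied = [c for c in clauses if not satisfies(a, c)]
        if not unsatisfied:
            return a                                  # found a satisfying assignment
        lit = rng.choice(rng.choice(unsatisfied))     # random literal of a random unsatisfied clause
        a[abs(lit) - 1] = not a[abs(lit) - 1]         # flip the corresponding variable
    return None

def amplified(clauses, n, eps=2.0 ** -64):
    """Repeat local_search t = ln(1/eps)/p times, with p a rough single-run success probability."""
    p = (3.0 / 4.0) ** n                              # heuristic estimate, ignoring polynomial factors
    t = math.ceil(math.log(1.0 / eps) / p)
    for _ in range(t):
        a = local_search(clauses, n)
        if a is not None:
            return a
    return None   # report "unsatisfiable"; errs with probability <= eps if each run succeeds w.p. >= p
```

For \(k\)-CNF-SAT, only the estimate for \(p\) changes, to roughly \((2-\frac{2}{k})^{-n}\), which gives the \((2-\frac{2}{k})^n\) running time stated above.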

The Strong Exponential Time Hypothesis

Let \(L\subseteq \{0,1\}^*\) be a language, equipped with a structural parameter \(n\colon\{0,1\}^*\to\mathbf N\); in the case of \(3\)-CNF-SAT, we usually take the parameter \(n=n(F)\) to be the number of variables of the formula \(F\), and we have an additional parameter \(m=m(F)\), the number of clauses. Note that the number of bits needed to encode a \(3\)-CNF formula is \(O(m \cdot \log n)\) or \(O(n^3)\), whichever is smaller – the first encoding is preferable when the formula is sparse and the second when it is very dense.

Exercise. Describe the two encodings.

An algorithm has growth rate \(c\) (with respect to \(n\)) if its worst-case running time is bounded from above by a function of the form \(c^n \cdot \text{poly}(m)\). The \(k\)-CNF-SAT algorithm we described in the previous lecture has growth rate \(2-\frac{2}{k}\), which tends to \(2\) as \(k\to\infty\) for this particular sequence of algorithms. It is an important open problem whether this phenomenon is inherent to the complexity of \(k\)-CNF-SAT, or whether there exist algorithms that can do better. For lack of progress on this question, Impagliazzo and Paturi (2001) introduced the Strong Exponential Time Hypothesis (SETH). To state it, they define \(c_k\) to be the infimum of the growth rates that algorithms for \(k\)-CNF-SAT can achieve:

\[c_k := \inf\Big\{ c \;\Big|\; k\text{-CNF-SAT has an algorithm of growth rate } c \Big\}\,.\]

Note that \(c_k \leq 2\) for every \(k\), since exhaustive search over all \(2^n\) assignments has growth rate \(2\), and that \(c_k\) is non-decreasing in \(k\), since every \(k\)-CNF formula can be viewed as a \((k+1)\)-CNF formula. SETH is the hypothesis that \(\lim_{k\to\infty} c_k = 2\) holds. In contrast, the (“weak”) Exponential Time Hypothesis (ETH) states that \(c_3>1\), that is, that the complexity of \(3\)-CNF-SAT is strictly exponential.
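
Unpacking the limit (using that the sequence \(c_k\) is non-decreasing and bounded by \(2\)), SETH can equivalently be stated as

\[ \forall \epsilon>0\ \exists k\colon\quad c_k > 2^{1-\epsilon}\,, \]

that is, for every \(\epsilon>0\) there is a \(k\) such that \(k\)-CNF-SAT has no algorithm with running time \(2^{(1-\epsilon)n}\cdot\text{poly}(m)\); this is the formulation most often used in the literature.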

Exercise. As far as we know, why does ETH not follow directly from the hypothesis P\(\neq\)NP?

We will prove in a later lecture the non-trivial fact that SETH implies ETH. Note that one has to be a bit careful since one may obtain different hypotheses based on whether one allows the algorithms to be randomized or not.

Deterministic Local Search for \(3\)-CNF-SAT

Despite the fact that the error probability of the randomized local search algorithm can be made negligible, it is still interesting to ask whether there is an algorithm that doesn’t use any randomness at all. It turns out that such an algorithm indeed exists; however, the deterministic algorithm we obtain has a slightly larger growth rate: \(3/2\) for \(k=3\), and \(2-\frac{2}{k+1}\) for general \(k\).

Recall that randomized local search starts with some assignment, finds an unsatisfied clause, and flips one of the (at most) three variables it contains, chosen uniformly at random. The idea for the deterministic algorithm is to make this choice deterministically by branching on the three cases exhaustively:

Algorithm. Branch\((F, a)\)
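
A minimal Python sketch of this branching procedure, using the same formula representation as in the earlier sketch (clauses as lists of signed integers); making the search radius \(r\) an explicit third parameter is a choice made here for concreteness.

```python
def find_unsatisfied_clause(clauses, a):
    """Return some clause that the 0/1 assignment a violates, or None if a satisfies the formula."""
    for clause in clauses:
        if not any((lit > 0) == a[abs(lit) - 1] for lit in clause):
            return clause
    return None

def branch(clauses, a, r):
    """Look for a satisfying assignment at Hamming distance at most r from a.
    Returns such an assignment if the search finds one, and None otherwise."""
    clause = find_unsatisfied_clause(clauses, a)
    if clause is None:
        return list(a)                 # a itself satisfies the formula
    if r == 0:
        return None                    # no flips left in this branch
    for lit in clause:                 # branch on the (at most three) variables of the clause
        i = abs(lit) - 1
        a[i] = not a[i]                # flip variable i ...
        result = branch(clauses, a, r - 1)
        if result is not None:
            return result
        a[i] = not a[i]                # ... and undo the flip before the next branch
    return None
```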

Trivially, the algorithm Branch runs in time \(\sim 3^r\): its recursion tree has branching factor at most three and depth at most \(r\), and the work per node is polynomial.

Exercise. Show that \(F\) has a satisfying assignment at Hamming distance at most \(r\) from \(a\) if and only if Branch\((F,a)\) finds a satisfying assignment in at least one of its branches.

To get a simple deterministic algorithm running in time \(c^n\) for some \(c<2\), note that every satisfying assignment has Hamming distance at most \(r=n/2\) to either \(a_1=0^n\) or \(a_2=1^n\), since the two distances sum to \(n\). Therefore, we can solve \(3\)-CNF-SAT in time \(2\cdot 3^{n/2} \sim \sqrt{3}^n\).

In general, let \(A\subseteq\{0,1\}^n\) be a set such that every point of \(\{0,1\}^n\) is at Hamming distance at most \(r\) from at least one point in \(A\). Such a set \(A\) is called a covering code of radius \(r\). The running time of our deterministic \(3\)-CNF-SAT algorithm becomes \(|A| \cdot 3^r\). For each \(r\in[n] := \{1,\dots,n\}\), we will choose a set \(A\) that is essentially as small as possible. To get the best possible algorithm, we will then choose \(r\) so as to minimize the running time.
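
A short sketch of how the pieces fit together, reusing the branch procedure sketched above; the construction of the covering code itself is left abstract here.

```python
def solve_with_covering_code(clauses, code, r):
    """Run the branch procedure from every center of a covering code of radius r.
    Any satisfying assignment lies within distance r of some center, so a satisfying
    assignment is found whenever one exists; the total time is roughly |code| * 3^r."""
    for center in code:
        result = branch(clauses, list(center), r)
        if result is not None:
            return result
    return None

# The trivial covering code {0^n, 1^n} of radius floor(n/2) recovers the sqrt(3)^n algorithm.
def two_center_solver(clauses, n):
    return solve_with_covering_code(clauses, [[False] * n, [True] * n], n // 2)
```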

As often happens in coding theory, random sets already yield codes with good parameters. In the following lemma, we use a probabilistic argument to prove the existence of good covering codes. We will later see how to construct a good covering code deterministically. The parameters obtained are essentially optimal, i.e., no code can achieve the same covering radius with a much smaller size.

Lemma. Let \(r\leq n\). There exists a covering code \(A\subseteq\{0,1\}^n\) of radius \(r\) and size \(\vert A \vert \sim 2^{(1-H(r/n))n}\), where \(H\) denotes the binary entropy function and \(\sim\) hides factors polynomial in \(n\).

Proof. The proof uses the probabilistic method. Let \(A=\{a_1,\dots,a_t\}\), where we sample all \(a_i\in_U \{0,1\}^n\) independently and uniformly at random and where we set \(t:=\lceil n/q\rceil \sim 2^{(1-H(r/n)) n}\) for the probability \(q\) computed below. We bound the probability that \(A\) fails to have covering radius \(r\): Let \(x\in\{0,1\}^n\) and \(i\in[t]\) be fixed. The probability that \(a_i\) is at distance at most \(r\) from \(x\) is

\[ q:=\Pr_{a_i}[d_H(x,a_i) \leq r]= 2^{-n} \cdot \sum_{j=0}^r \binom{n}{j} \sim 2^{(H(r/n)-1)n} \]

where the estimate for the sum of binomial coefficients follows from an exercise in Lecture 1. Then the probability that \(A\) is not a covering code of radius \(r\) is

\[ \begin{align*} &\Pr_{A}[\exists x\in\{0,1\}^n.\ \forall a_i \in A.\ d_H(x,a_i) > r]\\ & \leq 2^n \cdot \Pr_{A}[\forall a_i \in A.\ d_H(x,a_i) > r]\\ & = 2^n \cdot (1-q)^t \leq 2^n \cdot e^{-qt} \leq 2^n \cdot e^{-n} = (2/e)^n < 1\,, \end{align*} \]

where we used \(1-q\leq e^{-q}\) and \(qt\geq n\), which holds by our choice of \(t=\lceil n/q\rceil\).

Thus, there exists a covering code of radius \(r\) and size \(t\sim 2^{(1-H(r/n))n}\).
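
As an aside, the random construction from this proof is easy to test for small parameters. The following sketch (with helper names made up here, and using exhaustive search over \(\{0,1\}^n\), so it is only feasible for small \(n\)) samples a code of the size used in the proof and computes its covering radius; by the bound above, the radius is at most \(r\) except with probability at most \((2/e)^n\).

```python
import math
import random
from itertools import product

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def random_code(n, r, rng=random):
    """Sample the code from the proof: t = ceil(n/q) uniformly random centers, where
    q = 2^(-n) * sum_{j<=r} C(n, j) is the probability of covering a fixed point."""
    q = sum(math.comb(n, j) for j in range(r + 1)) / 2 ** n
    t = math.ceil(n / q)
    return [tuple(rng.randrange(2) for _ in range(n)) for _ in range(t)]

def covering_radius(code, n):
    """Exhaustive computation of the covering radius; only feasible for small n."""
    return max(min(hamming(x, c) for c in code) for x in product((0, 1), repeat=n))

if __name__ == "__main__":
    n, r = 12, 3
    code = random_code(n, r)
    print(f"|A| = {len(code)}, covering radius = {covering_radius(code, n)}, target radius = {r}")
```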

Theorem. For each \(r\in[n]\), there is a deterministic algorithm for \(3\)-CNF-SAT that runs in time \(\sim 2^{(1-H(r/n))n}\cdot 3^r\). Optimizing over \(r\) yields a growth rate of \(1.5\).
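
For completeness, here is the short calculation behind the constant \(1.5\) (ignoring polynomial factors and the integrality of \(r\)). Writing \(\rho := r/n\), the running time above is \(2^{f(\rho)n}\) with exponent

\[ f(\rho) = \big(1-H(\rho)\big) + \rho\log_2 3\,, \qquad f'(\rho) = \log_2\frac{\rho}{1-\rho} + \log_2 3\,. \]

Since \(f\) is convex, the minimum is attained where \(f'(\rho)=0\), that is, at \(\rho=\frac14\), and there

\[ f\big(\tfrac14\big) = 1 - H\big(\tfrac14\big) + \tfrac14\log_2 3 = 1 - \big(2-\tfrac34\log_2 3\big) + \tfrac14\log_2 3 = \log_2 3 - 1 = \log_2\tfrac32\,, \]

so the optimal choice is \(r=n/4\) and the running time is \((3/2)^n\) up to polynomial factors.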

Exercise. Prove that the growth rate of the above algorithm adapted to \(k\)-CNF-SAT is \(2-\frac{2}{k+1}\).