## Recap

Last time we discussed an algorithm for CNF-SAT that runs in time \(2^{n(1-\frac{1}{2\log m})}\). Today we will use this algorithm to improve upon exhaustive search for Quantified Boolean Formulas. The lecture closely follows a paper of Santhanam and Williams (2015).

## Quantified Boolean Formulas

The Quantified Boolean Formula Problem \(q\)-QBF is the complete problem for the \(q\)-th level of the polynomial-time hierarchy, that is, either \(\Sigma_q^p\) or \(\Pi_q^p\) depending on the order of quantification. For all \(i\in [q]\), let \(X_i\) be a tuple of \(n_i\) variables, and let \(F(X_1,\dots,X_q)\) be a formula in these variables. We write \(n=n_1+\dots+n_q\) for the total number of variables. Then the input is a sentence

\[Q_1 X_1 Q_2 X_2 \dots Q_q X_q . F(X_1,\dots,X_q)\,,\]

where \(Q_i \in \{\exists, \forall\}\) for all \(i\in[q]\). The task is to determine whether the sentence is true or false. Restricted to sentences with \(Q_1=\exists\), the problem is \(\Sigma_q^p\)-complete. We let \(q\)-QB-CNF be \(q\)-QBF restricted to formulas \(F\) in CNF, and \(q\)-QB-\(k\)-CNF be \(q\)-QBF restricted to formulas \(F\) in \(k\)-CNF.

In this lecture, we will devise algorithms for \(q\)-QBF that are significantly faster than exhaustive search. But first we observe how the structure of the formula interacts with the quantifiers.
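As a baseline for what "exhaustive search" means here, the following minimal Python sketch evaluates a quantified CNF sentence by trying all assignments block by block. The encoding is an assumption made for illustration and is not fixed by the lecture: a clause is a list of nonzero integers (`v` for a variable, `-v` for its negation), and the prefix is a list of `(quantifier, variables)` pairs with `'E'` for \(\exists\) and `'A'` for \(\forall\). The names `eval_cnf` and `eval_qbf` are hypothetical.

```python
from itertools import product


def eval_cnf(clauses, assignment):
    """Evaluate a CNF under a full assignment {variable: bool}."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def eval_qbf(prefix, clauses, assignment=None):
    """Exhaustive search: try every assignment of each quantifier block.

    prefix is a list of (quantifier, variables) pairs, outermost first,
    with quantifier 'E' (exists) or 'A' (for all). This encoding is an
    assumption of the sketch.
    """
    assignment = dict(assignment or {})
    if not prefix:
        return eval_cnf(clauses, assignment)
    (quantifier, block), rest = prefix[0], prefix[1:]
    results = (
        eval_qbf(rest, clauses, {**assignment, **dict(zip(block, values))})
        for values in product([False, True], repeat=len(block))
    )
    # 'E' needs one true branch, 'A' needs all branches to be true.
    return any(results) if quantifier == 'E' else all(results)
```

For example, `eval_qbf([('A', [1]), ('E', [2])], [[1, 2], [-1, -2]])` evaluates \(\forall x_1 \exists x_2. (x_1\vee x_2)\wedge(\neg x_1\vee\neg x_2)\), which is true because \(x_2=\neg x_1\) works; swapping the two quantifier blocks makes the sentence false. The running time is \(2^n\) up to polynomial factors, which is the bound the algorithms in this lecture improve on.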

**Observation.** \(\exists\forall\)-\(3\)-CNF is in NP.

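To see this, let \(F=C_1\wedge\dots\wedge C_m\) be a CNF formula, and let \(x\) be a variable of the innermost quantifier block, which is universal. Any clause \(C\) that contains a literal of \(x\) must be satisfied for \(x=0\) and for \(x=1\), in particular for the value that makes the literal false. But this means that we can remove the literal from the clause (a clause that contains both \(x\) and \(\neg x\) is always satisfied and can be dropped). After all universal literals have been removed, an \(\exists\forall\)-sentence is an ordinary satisfiability instance over the existential variables, which is in NP. Thus the hard case of \(q\)-QB-CNF occurs when the innermost quantifier is existential.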
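The reduction behind the observation can be sketched in a few lines of Python. This is an illustration under assumptions, not code from the lecture: clauses are lists of nonzero integers (`v` for a variable, `-v` for its negation), all universal variables are assumed to be in the innermost block, and the function name `strip_universal_literals` is hypothetical. The sketch also drops clauses that contain a variable together with its negation, since such clauses are always satisfied.

```python
def strip_universal_literals(clauses, universal_vars):
    """Turn an exists-forall CNF sentence into a plain CNF-SAT instance.

    Returns the reduced clause list over the existential variables, or
    None if some clause consists only of universal literals (in which
    case the sentence is false).
    """
    reduced = []
    for clause in clauses:
        literals = set(clause)
        # A clause containing both v and -v is always satisfied.
        if any(-lit in literals for lit in literals):
            continue
        # Universal literals can be falsified by the adversary: remove them.
        rest = [lit for lit in clause if abs(lit) not in universal_vars]
        if not rest:
            return None
        reduced.append(rest)
    return reduced
```

With universal variable `3`, the call `strip_universal_literals([[1, 3], [-1, 2, -3]], {3})` returns `[[1], [-1, 2]]`, a CNF over the existential variables only; any SAT algorithm can then decide the sentence.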

**Exercise.** Prove that \(\forall\exists\)-\(3\)-CNF is \(\Pi_2^p\)-complete.

## Two Quantifier Block 3-CNF

We will now construct an algorithm for \(\forall\exists\)-sentences over \(3\)-CNF formulas. The algorithm makes use of a fast \(3\)-SAT algorithm (e.g. the deterministic \(1.5^n\) time algorithm from Lecture 2) as well as of a fast algorithm for CNF-SAT (e.g. the randomized \(2^{n\left(1-\frac{1}{2\log m}\right)}\) algorithm from Lecture 5).

**Theorem.** \(2\)-QB-\(3\)-CNF can be solved in time \(2^{n-\gamma\sqrt{n}}\) for some constant \(\gamma>0\).

### The algorithm

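The input of the algorithm is a formula \(F\) with \(u\) universally quantified variables and \(e\) existentially quantified variables. Throughout, \(n\) denotes the number of variables of the original input, so that \(u+e\leq n\) holds in every recursive call.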

- If a clause is empty or contains only universally quantified variables, then the sentence \(\forall X_u \exists X_e. F(X_u,X_e)\) is false.
- If \(e>.001\cdot\sqrt{n}\), then try all \(2^{u}\) assignments \(A\) for \(X_u\) and solve the \(3\)-CNF-SAT instance \(\exists X_e. F(A,X_e)\). This takes time \(2^{u} 1.5^{e} \leq 2^{n-e+\log(1.5) e} \leq 2^{n-\Omega(\sqrt{n})}\).
- Let \(C=(u_1\vee u_2 \vee y)\) be a clause with two universal literals \(u_1,u_2\) and one existential literal \(y\). We branch on the clause, that is, we run the algorithm recursively on the following three restricted formulas, and we accept if and only if all three recursive calls report that their sentence is true. The three cases cover all values of the two universal literals, and when both are false the clause forces \(y=1\):
  - \(F\vert_{u_1=1}\)
  - \(F\vert_{u_1=0,\; u_2=1}\)
  - \(F\vert_{u_1=u_2=0,\; y=1}\)

- If all clauses contain at most one universal literal, we transform the \(\exists X_e.F(X_u,X_e)\) part of the sentence into a DNF formula \(F'\) over \(X_u\) by expanding the existential quantifier into a big OR: \[\exists X_e.F(X_u,X_e) \equiv F'(X_u) := \bigvee_{A \in \{0,1\}^e} F(X_u,A)\,.\] Since every clause contains at most one universal literal, each \(F(X_u, A)\) is either constant false or an AND of literals of \(X_u\); that is, \(F'(X_u)\) is a \(u\)-DNF formula. The sentence \(\forall X_u.F'(X_u)\) is true if and only if the CNF formula \(\neg F'\) is unsatisfiable, so we can decide it using the CNF-SAT algorithm from last lecture. Because \(F'\) has \(m'\leq 2^e\) terms, this takes time \[2^e\cdot 2^{u\left(1-\frac{1}{2\log m'}\right)}\leq 2^{e+u-\frac{u}{2e}}\,.\] The factor \(2\) in the denominator only affects constants; the analysis below is written with the slightly simplified expression \(2^{e+u-u/e}\) and goes through in the same way with \(u/(2e)\) in place of \(u/e\).
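The last step of the algorithm can be illustrated with a short Python sketch. It is a sketch under assumptions rather than the lecture's implementation: clauses are lists of nonzero integers (`v` for a variable, `-v` for its negation), the names `expand_to_dnf` and `is_tautology` are hypothetical, and the tautology check is done by brute force as a stand-in for running the fast CNF-SAT algorithm on \(\neg F'\).

```python
from itertools import product


def expand_to_dnf(clauses, universal_vars, existential_vars):
    """Expand 'exists X_e. F' into a set of terms over the universal variables.

    Assumes every clause has at most one universal literal. Each term is
    a frozenset of universal literals (signed integers).
    """
    universal_vars = set(universal_vars)
    existential_vars = sorted(existential_vars)
    for clause in clauses:
        if sum(abs(lit) in universal_vars for lit in clause) > 1:
            raise ValueError("clause with more than one universal literal")
    terms = set()
    for values in product([False, True], repeat=len(existential_vars)):
        a = dict(zip(existential_vars, values))
        term, alive = set(), True
        for clause in clauses:
            if any(abs(lit) in a and a[abs(lit)] == (lit > 0) for lit in clause):
                continue  # A already satisfies this clause
            rest = [lit for lit in clause if abs(lit) in universal_vars]
            if not rest:
                alive = False  # A falsifies the clause: F(X_u, A) is false
                break
            term.add(rest[0])
        # Skip contradictory terms (containing v and -v): they are never true.
        if alive and not any(-lit in term for lit in term):
            terms.add(frozenset(term))
    return terms


def is_tautology(terms, universal_vars):
    """Brute-force check that the DNF is true under every assignment.

    Stand-in for the CNF-SAT algorithm applied to the negated DNF.
    """
    universal_vars = sorted(universal_vars)
    for values in product([False, True], repeat=len(universal_vars)):
        x = dict(zip(universal_vars, values))
        if not any(all(x[abs(l)] == (l > 0) for l in t) for t in terms):
            return False
    return True
```

For \(\forall x_1\exists x_2.(x_1\vee x_2)\wedge(\neg x_1\vee\neg x_2)\) the expansion yields the two terms \(x_1\) and \(\neg x_1\), whose OR is a tautology, so the sentence is true. The expansion produces at most \(2^e\) terms, matching the bound \(m'\leq 2^e\) used in the running time.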

### Analysis

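We analyze the branching tree of the algorithm. We only consider the three branches of the step that branches on a clause with two universal literals. First note that \(e\leq .001\sqrt{n}\) holds whenever the algorithm branches, since otherwise the exhaustive-search step applies. The third branch (the one that sets both universal literals to \(0\) and the existential literal to \(1\)) fixes an existential variable, so any path in the branching tree contains at most \(.001\sqrt{n}\) branches of the third type. We measure the depth of a node by the number of variables that have been fixed on the path from the root; the three branches fix one, two, and three variables, respectively. Let \(f(d,i)\) be the number of leaves at depth \(d\) that are reached via exactly \(i\) branches of the third type. We have: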

- \(f(0,0)=1\).
- \(f(d,0)=f(d-1,0)+f(d-2,0)\).
- \(f(d,i)=f(d-1,i)+f(d-2,i)+f(d-3,i-1)\).

**Exercise.** Prove that \(f(d,i)\leq \binom{d}{i} O(\phi^d)\) holds, where \(\phi=\frac{1+\sqrt{5}}{2}\) is the golden ratio.
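A numerical sanity check of the recurrence can help before attempting the proof. The sketch below is an illustration under assumptions: it uses the boundary convention \(f(d,i)=0\) whenever \(d<0\) or \(i<0\), and \(f(0,i)=0\) for \(i>0\), which the recurrence above leaves implicit; the helper `worst_ratio` is a hypothetical name.

```python
from functools import lru_cache
from math import comb, sqrt

PHI = (1 + sqrt(5)) / 2  # golden ratio


@lru_cache(maxsize=None)
def f(d, i):
    """Leaf-count recurrence; out-of-range arguments count as 0 (assumption)."""
    if d < 0 or i < 0:
        return 0
    if d == 0:
        return 1 if i == 0 else 0
    # For i == 0 the last summand vanishes, giving the Fibonacci recurrence.
    return f(d - 1, i) + f(d - 2, i) + f(d - 3, i - 1)


def worst_ratio(max_depth):
    """Largest f(d, i) / (C(d, i) * PHI**d) over 0 <= i <= d <= max_depth."""
    return max(
        f(d, i) / (comb(d, i) * PHI ** d)
        for d in range(max_depth + 1)
        for i in range(d + 1)
    )
```

Calling `worst_ratio(60)` reports the largest observed ratio between \(f(d,i)\) and \(\binom{d}{i}\phi^d\) in that range; a value that stays bounded as `max_depth` grows is consistent with the claimed estimate, though of course it does not replace the proof.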

The number \(N_d\) of leaves at depth \(d\) in the branching tree is \[N_d\leq\sum_{i=0}^{i_\max} f(d,i) \leq O(\phi^d)\cdot \sum_{i=0}^{i_\max} \binom{d}{i}\,,\] where \(i_\max=\min(d,.001\sqrt{n})\). Note that we always have \(N_d\leq 3^d\), and for \(i_\max\leq d/2\) we have \(N_d\leq 2^{H(i_\max/d)\cdot d}\cdot O(\phi^d)\) using the binary entropy estimate for the binomial coefficient. The total number of leaves is \(N=\sum_d N_d \leq (\phi+.01)^{n}\) for \(n\) large enough.

The total running time is a combination of the number of leaves and the running time \(2^{e+u-u/e}\) of the algorithm executed at the leaves. Simply multiplying these two numbers gives an upper bound on the total running time, but this estimate is too rough. Instead, we perform a more careful bookkeeping where we consider leaves level by level.

*Leaves that are close to the root.* Let us consider leaves at depth \(d\leq 100\sqrt{n}\). By the trivial estimate, there are at most \(N_{d}\leq 3^d\) of them. Moreover, at such leaves we have \(u+e \leq n-d\), \(u \geq n-201\sqrt{n}\), and \(e \leq .001\sqrt{n}\); the bound on \(u\) holds because \(u\geq n-.001\sqrt{n}\) at the root and a path to depth \(d\) fixes at most \(2d\) universal variables. Thus \(u/e\geq 1000\sqrt{n}-O(1)\geq 100 \sqrt{n}- O(1)\), and the total running time contribution of these leaves is \[N_{d}\cdot 2^{e+u-u/e}\leq 3^d \cdot2^{n-d-100\sqrt{n}+O(1)} = O(1.5^d 2^{n-100\sqrt{n}})\leq 2^{n-\Omega(\sqrt{n})}\,,\] where the latter inequality holds for \(d \leq 100\sqrt{n}\) since \(1.5^{100}<2^{100}\). Summing over the at most \(100\sqrt{n}\) values of \(d\) costs only a polynomial factor.

*Leaves that are far away from the root.* Let us consider leaves at depth \(d>100\sqrt{n}\). For these leaves we have \(i_\max/d\leq .001\sqrt{n}/(100\sqrt{n})=10^{-5}\), so the binary entropy estimate applies, and for \(n\) large enough we get \[N_d\leq 2^{H(10^{-5})\cdot d} \cdot O(\phi^d) \leq (\phi+0.01)^d\,.\] Moreover, we have \(e+u \leq n-d\) since the leaf is at depth \(d\) and every branching removes one or more variables. Therefore, the running time of all leaves at depth \(d\) is bounded by \[N_d \cdot 2^{u+e} \leq (\phi+0.01)^d 2^{n-d} \leq 2^{n-\Omega(\sqrt{n})}\,,\] where we use the fact that \(d\geq \Omega(\sqrt{n})\); note that we do not necessarily need to use the fast CNF-SAT algorithm here since trivial exhaustive search suffices in this case.
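The constants in the far-from-root case can be checked numerically. The following Python sketch is only a sanity check under the bounds stated above: with \(i_\max\leq .001\sqrt{n}\) and \(d>100\sqrt{n}\), the ratio \(i_\max/d\) is at most \(10^{-5}\), and the sketch evaluates the binary entropy at that point. The variable names are illustrative.

```python
from math import log2, sqrt

PHI = (1 + sqrt(5)) / 2  # golden ratio


def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


# Largest possible value of i_max / d for leaves at depth d > 100 * sqrt(n).
ratio = 0.001 / 100

# Base of the leaf-count bound 2^{H(ratio) d} * phi^d, ignoring the O(1) factor.
leaf_base = PHI * 2 ** binary_entropy(ratio)

# Exponent saved per level: (phi + 0.01)^d * 2^{n-d} = 2^{n - saving * d}.
saving_per_level = 1 - log2(PHI + 0.01)
```

Here `leaf_base` is the quantity that has to stay below \(\phi+0.01\), and `saving_per_level` is the positive constant hidden in the \(\Omega(\sqrt{n})\) saving once \(d>100\sqrt{n}\).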

This concludes the analysis of the algorithm.

## Many Quantifiers

We also mention the following observation.

**Theorem.** \(q\)-QB-CNF can be solved in time \(2^{n-\Omega(q)}\).

This observation is useful when \(q=\omega(1)\), and in particular when \(q=\omega(\log n)\), where the saving \(2^{\Omega(q)}\) over exhaustive search becomes superpolynomial. We give the algorithm here:

- If the sentence is trivial, we return its truth value.
- If the sentence has the form \(\exists x_e. S(x_e)\) where \(S\) is a sentence and \(x_e\) is a single variable, then we do the following:
  - Set \(x_e\) to a random value \(b_e\in\{0,1\}\). If we recursively find \(S(b_e)\) to be true, we return true, and otherwise, we recurse on \(S(1-b_e)\) and return its return value.
- If the sentence has the form \(\forall x_u. S(x_u)\) where \(S\) is a sentence and \(x_u\) is a single variable, then we do the following:
  - Set \(x_u\) to a random value \(b_u\in\{0,1\}\). If we recursively find \(S(b_u)\) to be false, we return false, and otherwise, we recurse on \(S(1-b_u)\) and return its return value.
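The steps above can be sketched in Python as follows. This is a minimal illustration under assumptions, not a definitive implementation: clauses are lists of nonzero integers (`v` for a variable, `-v` for its negation), the prefix lists one `(quantifier, variable)` pair per variable with `'E'` for \(\exists\) and `'A'` for \(\forall\), and "trivial" is interpreted as "no clauses left" or "some clause is empty". The names `simplify` and `solve` are hypothetical.

```python
import random


def simplify(clauses, var, value):
    """Restrict a CNF to var=value: drop satisfied clauses, delete false literals."""
    true_literal = var if value else -var
    return [
        [lit for lit in clause if abs(lit) != var]
        for clause in clauses
        if true_literal not in clause
    ]


def solve(prefix, clauses, rng=random):
    """Randomized evaluation of a quantified CNF sentence.

    prefix: list of (quantifier, variable) pairs, outermost first.
    rng: any object with a random() method (the random module by default).
    """
    if not clauses:
        return True   # trivial: every clause is already satisfied
    if any(len(clause) == 0 for clause in clauses):
        return False  # trivial: an empty clause cannot be satisfied
    if not prefix:
        raise ValueError("formula has variables that are not quantified")
    (quantifier, var), rest = prefix[0], prefix[1:]
    first = rng.random() < 0.5
    # For 'E' a true branch decides the sentence, for 'A' a false branch does.
    decisive = quantifier == 'E'
    if solve(rest, simplify(clauses, var, first), rng) == decisive:
        return decisive
    return solve(rest, simplify(clauses, var, not first), rng)
```

The answer does not depend on the random choices, only the running time does: whenever the first branch is decisive, the second recursive call is skipped. Passing `random.Random(seed)` as `rng` makes a run reproducible.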

Clearly the algorithm is correct. Moreover, the expected running time is reduced by a constant factor for every other quantifier alternation.

**Exercise.** Prove this formally.