Notes on NP Completeness

So far we have focused on what problems computers can solve efficiently. Now we look at those problems that can’t be solved efficiently. Note, this seems to be essentially Chapter 8 from Tardos et al.’s Algorithm Design so I should probably go through and take notes from that.

Polynomial time reductions

A common theme in this course (and 251) has been to take a new problem and formulate it as a problem we already know how to solve.


You can formulate this problem ⇝ That problem

Bipartide matching ⇝ Max Fow Bipartide Vertex Cover ⇝ Min Cuit Max Flow ⇝ LP Knapsack, Longest common subsequence ⇝ Dynamic Program

We found a polynomial reduction from the new problem to an old problem we could solve in polynomial time. The problem on the left is essentially a special case of the problem on the left with roughly the same size (i.e. the new problem isn’t exponentially larger in size).

Claim 1 If A ⇝ B then A is a polytime solvable if B is polytime solvable.

Proof If x is an instance of A, then f(x) is an instance of B. Solve f(x) in polytime and map it back to a solution of A.

More specifically, we showed that every problem we can solve in polytime reduces to an LP (or, in fact, a Dynamic Program). But, we can also use reductions to show a problem is hard.

Showing something is hard with reduction

Claim 2 If A ⇝ B, then B cannot be solved in polytime if A cannot be solved in polytime.

Proof By contradiction if B can be solved in polytime then A can be too by Claim 1.

Also if B ⇝ A and A is hard, this doesn’t tell us anything about B. You cannot make the assumption that B is easy or hard.

Reduction I - Example Problem

See p.454 of Tardos et al.

Independent (Stable) Set ⇝ Vertex Cover

Independent Set: Does G cover a set of $\ge k$ vertices that are mutually non-adjacent? i.e. Is there a set of size k such that none of the vertices in the set are connected by an edge. YES/NO PROBLEM

Vertex Cover: Does G contain $\le c$ vertices s.t. every edge touches one of those vertices (c)? Can also be formulated as: Given a graph G and a number k, does G contain a vertex cover of size at most k?

Now, suppose $ S \le V $ is an independent set in G then $V-S$ is a vertex cover and vice versa. You can see this in any graph the nodes not an independent set must form a vertex cover. Think about if you just take an edge (u, v). If u is in S (the independent set), then v cannot be in S because of the definition of an independent set, and thus must be in V-S. Both, however can be in V-S, but then this just ensures coverage.


$\exists$ ind. set of size $\ge k$

$\exists$ a vertex cover of size $\le n-k$

So if we can solve V.C., we can solve Ind. Set. (and vice versa). The point of this (see the book for more details on the proofs) is to show that the two are of relatively equal complexity.

Reduction II - Example

p. 456 of Tardos et al.

Vertex Cover ⇝ Set Cover

Set Cover: Given n set $S_1, S_2, \dots, S_n \le W$. Is there a collection of k sets that cover every element in W?

So, we’re going to take each vertex and represent it as a set in the Set Cover problem. Thus we have a set $S_v$ for each $v \in V$

W is going to be the set of edges and $S_v = { e : e \in \delta({v})}$ Then a set cover of size k is a VC of size k. $\delta(v)$ are the edges which connect the sets.

Reduction with SAT

p. 459 of Tardos et al.

3-SAT ⇝ Ind. Set.


  • Given n boolean variables $x_1, \dots, x_n$ (that can be true or false)
  • $\bar x_i$ is false or $x_i$ is true
  • A clause C is a disjunction of terms:
    • e.g. $ C = (x_1 \lor {\bar x_3} \lor {\bar x_7} \lor x_9)$
  • Given a collection $C_1, C_2, \dots, C_m$ of clauses is there a T/F assignment to the variables s.t. every clause is satisfied?

In 3-SAT every clause has 3 literals as follows:

We reduce 3-Sat to Ind. Set. as follows…

We build a graph G with a triangle for each clause:

You have edges between any vertex corresponding to $x_i$ and any vertex corresponding to ${\bar x_i}$

There is a satisfying assignment iff $\exists$ independent set of size m = # clauses.

We have a triangle for each clause in an independent set. We can pick $\le 1$ vertex from each triangle. So the maximum independent set $\le m$. If you can get one of size m, then it’s actually going to be a satisfying assignment.

If we choose $x_i$ once, then we can choose no ${\bar x_i}$ and vice versa. I.e. a stable set gives no conflicts. I.e. It is a valid truth assignment.

This valid assignment is satisfying iff the stable set has size m.

Integer (Linear) Programming

An IP is an LP in which the feasible solutions are required to be integral. I kind of missed this, so should probably look it up, it’s not in Tardos… go figure

Efficient Certification

Say the input to a computational problem is encoded as a finite binary string s. Length of s = s . Decision problem X is a set of strings on which the answer is “yes”. We say that B is an efficient certifier for a problem X if the following properties hold:
  • B is a polytime algorithm that takes two input arguments s and t
  • There is a polynomial function p so that for every string s, we have $s \in X$ if and only if there exists a string $t$ such that $|t| \le p(|s|)$ and $B(s,t)=\text{ yes}$

This basically is just stating that there is some efficient check B which can validate a solution is correct in polytime. You still cannot find the solution itself in polytime.

NP: A class of problems

NP problems are the set of all problems for which there exits and efficient certifier. This we can observe:

$$ \cal{P} \subseteq \cal{NP}$$

Ah, but here is the interesting part. Note the equal part of this law. We can’t actually prove that $\cal{P} \ne \cal{NP}$ because it actually could. Theoretically there could be a way to solve every single NP problem in polytime, we just don’t know it yet and can’t prove otherwise.

NP-Complete Problems

Theorem: Suppose X is an NP-complete problem. Then X is solvable in polynomial time if and only if $\cal{P} = \cal{NP}$. Where X is the problem which all NP-Complete problems can be reduced to.

An important consequence of this theorem means, with previous theorems. That if there is any NP-complete problem which cannot be solved in polytime then there is no NP-complete problem that can be solved in polytime.

Non-Deterministic Algorithms

The original way to define NP is via non-deterministic algorithms. In a deterministic algorithm, every operation is uniquely defined. In a non-deterministic algorithm some operations are non-unique. Specifically:

  • Given a set of choices, the non-deterministic algorithm matches the BEST choice
    • i.e. the choice that leads to a YES answer in the quickest time (if such a choice exists)
    • i.e. unbounded parallel computation?

In particular, given a finite set $S$ of choices, we have an operation:

$$ x \leftarrow \text{Choice}(S)$$

which sets x to be the best choice $s \in S$ in O(1) time.

The total time required by a N-D algorithm is the minimum time to obtain a YES answer if one exists.

e.g. The following is an O(n+m) time N-D algorithm for SAT.

For i=1 to n
      $x_i \leftarrow \text{Choice}(\lbrace0,1\rbrace)$

if $\lbrace x_1, \dots, x_n \rbrace$ is satisfying
      output YES
      output NO

This is more powerful than a quantum computer. The problem with quantum computers is that it takes ages to extract an answer, but in this you can read the answers quickly and do the task in parallel.

So now we have:

IP : class of problems solvable in polytime by a deterministic algorithm
NP (Non-Deterministic Polynomial) : class of problems solvable in polytime by an N-D algorithm

Now SAT is NP-Complete, hence we give a “proof”.

Cooke’s Theorem    SAT is NP-Complete

We have to show every problem $Q \in NP$ has a $Q \rightsquigarrow SAT$

By definition, if $Q \in NP$, then $\exists$ non-deterministic algorithm $A$ that solves $Q$ in polytime (on any instance $x \in Q$). We will show that we can mimic any non-deterministic algorithm using poly-sized SAT formulation! Specifically, given $A$ and an instance $x \in Q$ find a SAT instance $S$ s.t. $S$ is satisfiable iff $A$ outputs YES on $x \in Q$

How can we possibly prove this?

First, we create a simple programming language for NP algorithms and show SAT can implement it. The language has the following properties:

  1. It is word based
    • Physical memory requirements
      • There are W words
      • There are B bits/word
    • The running time is $T \le poly(n)$
    • There are I instructions
  2. Operations
    • Logical operations
      • e.g. -, +, /, AND, OR, NOT, etc.
    • Relational operators
      • e.g. $ \le, \ge, = $
  3. Assignment Statements
    • variable assignment
    • simple expression
    • $ \text{Choice}(S) $
  4. Conditional statements
    • if ____ then (goto) ____ else endif
    • Using this we can get while, repeat, switch/case, etc.
  5. Output Statements
    • We just need to be able to output YES/NO in this case.

How does this relate to SAT?!?! (the prof wrote this on the board, not me)

Given such an algorithm $A$ we need to describe what are the variables and what are the clauses.

  • Boolean Variables
    1. $X(w, b, t)$ (memory)
      • $1 \le w \le W$ = word
      • $1 \le b \le B$ = bit
      • $0 \le t \le T$ = time
      • $X(w, b, t) =$ ${T \text{ Bit of word w is 1 after the t timestep}}\brace{F \text{ bit of word w is 0 after the t timestep}}$
    2. $Y(i,t)$
      • $1 \le i \le I$
      • $0 \le t \le T$
      • $Y(i,t) = $ ${T \text{ instruction i executed at time t}}\brace{F \text{ otherwise}}$
  • Clauses
    1. Input Clauses
      • $Z(w, b, 0)$ where $X(w, b, 0) = $ ${X(w,b,0) \text{ if input 1}}\brace{ {\bar X } (w, b, 0) \text{ if input 0}}$
      • Make sure to set all inputs to correct value
    2. Instruction Clause
      • start instruction 1 at time 1
        • $Y(1,1) \land {\bar Y} (2, 1) \land \dots \land {\bar Y} (I, 1)
      • only one instruction per time period
        • at least one
          • $[ Y(1, t) \lor Y(2, t) \lor \dots \lor Y(I, t) ]$
        • at most once
          • $[ {\bar Y} (i, t) \lor {\bar Y}(i’, t) ]$
    3. Conditional Statements
      • $Z(i, t)$
        • $ Z(i, t) = {\bar Y}(i, t) \lor [ X(w\ast, b\ast, t-1) \land Y(y\ast, t+1) ] \lor [ {\bar X} (w\ast, b\ast, t-1) \land Y(i+1, t+1) ] $
        • where instruction instruction i if $(w\ast, b\ast) $ then goto $ i\ast $
        • Use distributive laws, etc. to write in Conjuctive normal form (i.e. SAT form)
    4. Assignment clauses
      • Let $ V(w, b, t) = [ X(w, b, t-1) \land X(w, b, t) ] \lor [ {\bar X}(w, b, t-1) \land {\bar X} (w, b, t) ] $
      • i.e. (w, b) doesn’t change at time t.
      • e.g. $(w, b, t) \leftarrow \text{Choice}(\lbrace 0, 1 \rbrace)$
        • $ [ X(w, b, t) \lor {\bar X} (w, b, t) ] \land [ \all_{b\prime \ne b} \and V(w, b\prime , t) ] \and [ \all_{w\prime \ne w} \land V(w\prime , b\prime , t) $
        • here the first part (the oring of the X’s) is the choice and the rest of the ands are so the rest of the bits don’t change?
    5. Output clauses
      • $ Z(\widehat{i}, t) = {\bar Y}(\widehat{i}, t) \lor Y(\widehat{i}, t+1)$
      • If you’re running an instruction at time t, then you’re running it at time t+1?

So we can implement A using a polysized SAT instance $S$. So you can implement almost any problem as a SAT, so SAT is NP-Complete.

Written on October 16, 2014