The shortest proof for every theorem

Posted on June 15, 2013 by Aaron Rotenberg

Suppose we want to show that there is a number \(n\) such that no upper bound on the busy beaver number \(\Sigma(n)\) is provable from, say, the axioms of ZFC. One possible proof is as follows: Assume that no such \(n\) exists. That is, assume that for every \(n\), there is a proof in first-order logic that uses only the axioms of ZFC and derives a sentence \(\Sigma(n) \leq m\) for some \(m\). Then we could compute an upper bound for \(\Sigma(n)\) by exhaustively searching all of the possible valid ZFC proofs until we found one that deduced a theorem of the form \(\Sigma(n) \leq m\); by assumption, we must eventually find such a proof. But we know that we cannot compute an upper bound on \(\Sigma(n)\) for arbitrary \(n\); thus, we have derived a contradiction, and the original assumption must be false.

Enumerating proofs until you find one with the desired properties is one of those grand CS traditions that, like calling an \(O(n^{12})\) algorithm “efficient” while dismissing an \(O(1.001^n)\) algorithm as “intractable”, is handy for theoretical purposes but has essentially no bearing on the real world whatsoever. After all, no one would really write a program that loops through all possible proofs and tests each one.


I decided to experimentally test what happens if you actually try to enumerate proofs, starting with the shortest and working up to ever-more-complicated ones. To keep things simple, I’ll restrict the allowed sentences to propositional logic (no first-order predicates or quantifiers). Applying the Curry–Howard correspondence, every natural deduction proof corresponds to a lambda calculus term, with the type of each term being the theorem it proves. So all we have to do is enumerate well-formed lambda calculus expressions, applying Hindley–Milner type inference to determine whether each term is well-typed and, if it is, the most general theorem that it proves.

The lambda calculus by itself only gives you the implicational fragment of intuitionistic propositional logic. To get all of classical propositional logic, you have to add either additional syntax elements or additional axioms. I’ll use the following set of axioms:

\[\text{Unit} : \top\] \[\text{DoubleNegElim} : \forall p.\ ((p \rightarrow \bot) \rightarrow \bot) \rightarrow p\] \[\text{Pair} : \forall p, q.\ p \rightarrow q \rightarrow p \wedge q\] \[\text{Fst} : \forall p, q.\ p \wedge q \rightarrow p\] \[\text{Snd} : \forall p, q.\ p \wedge q \rightarrow q\] \[\text{Left} : \forall p, q.\ p \rightarrow p \vee q\] \[\text{Right} : \forall p, q.\ q \rightarrow p \vee q\] \[\text{Case} : \forall p, q, r.\ p \vee q \rightarrow (p \rightarrow r) \rightarrow (q \rightarrow r) \rightarrow r\]

(Note that I’m assuming that \(\neg p\) is a synonym for \(p \rightarrow \bot\).)

I’ll use Haskell for this, since it’s my favorite programming language and it’s particularly well-suited to this sort of task. It isn’t too hard to write a corecursive function that spits out a list containing each lambda calculus term exactly once (up to alpha-renaming):

newtype VarID = VarID String
    deriving (Eq, Ord)
data Expr
    = Var VarID
    | Expr `Ap` Expr
    | Lambda VarID Expr

allExprs :: [Expr]
allExprs = exprs [] (map VarID goodNames)
exprs :: [VarID] -> [VarID] -> [Expr]
exprs boundVars availableVars@(var:vars) = atomicExprs ++ interleave apExprs lambdaExprs
  where atomicExprs = map Var $ boundVars ++ Map.keys defaultVarTypes
        apExprs = [(exprs boundVars availableVars !! li) `Ap`
            (exprs boundVars availableVars !! ri) | (li, ri) <- pairingList]
        lambdaExprs = map (Lambda var) $ exprs (var:boundVars) vars
exprs _ [] = error "Shouldn't happen."

goodNames :: [String]
goodNames = [[base] | base <- ['a'..'z']] ++ [base : show num | num <- [1..], base <- ['a'..'z']]

pairingList :: [(Int, Int)]
pairingList = [(a, b - a) | b <- [0..], a <- [0..b]]

interleave :: [a] -> [a] -> [a]
interleave (x:xs) (y:ys) = x : y : interleave xs ys
interleave _ _ = error "Shouldn't happen."

This code is easier to understand from its results than it is from the code itself:

> mapM_ print allExprs
Case Case
\a. a
Case DoubleNegElim
\a. Case
DoubleNegElim Case
\a. DoubleNegElim
Case Fst

The position of each expression in this list is a Gödel numbering of the well-formed lambda terms. It loosely corresponds with the “length” of the expression, so you could say that the first proof in this list proving a particular theorem is the “shortest” proof of that theorem, under a particular definition of “shortest”. “Least complex” might be a better description.

One thing we can do with this function is generate every proof up to a certain complexity, then sift through to see if there are any interesting theorems. We might call an “interesting” theorem one with a short description. If there are multiple valid proofs of a theorem, we pick the least-complex one. That’s what this next bit of code does:

listTheorems :: Int -> IO ()
listTheorems maxIndex = do
    let coveredExprs = take maxIndex allExprs
        theoremsList = foldl' (\best e ->
            case principalType defaultEnv e of
                Left _ -> best
                Right t
                    | isNew (map fst best) t -> (t, e):best
                    | otherwise -> best
          ) [] coveredExprs
        sortedTheorems = sortBy (comparing (typeLength . fst)) theoremsList
    forM_ sortedTheorems $ \(t, e) ->
        putStrLn $ show e ++ " : " ++ show (prettify t)

isNew :: [Polytype] -> Polytype -> Bool
isNew existing maybeNew = not $ any (\pt -> isMoreGeneralThan defaultEnv pt maybeNew) existing

Notice how we omit a proof if we’ve already given a simpler proof of a more general result, but include a proof of a more general theorem even if it’s more complex to show than previously-discovered restricted cases.

The complete Haskell source code for this toy is available here. I compiled it with ghc -O2, which helps a lot speed-wise relative to just running it in GHCi. Running listTheorems 4000, here are the top 15 most elegant tautologies (according to the program) and their simplest proofs in this particular axiom system:

Unit : True
\a. DoubleNegElim (\b. a) : forall a. False -> a
Right Unit : forall a. a || True
Left Unit : forall a. True || a
\a. Unit : forall a. a -> True
\a. a : forall a. a -> a
\a. \b. DoubleNegElim (\c. b) : forall a b. a -> (False -> b)
Right (\a. Unit) : forall a b. b || (a -> True)
Left (\a. Unit) : forall a b. (a -> True) || b
\a. Right Unit : forall a b. a -> (b || True)
\a. Left Unit : forall a b. a -> (True || b)
Right (\a. a) : forall a b. b || (a -> a)
\a. a Unit : forall a. (True -> a) -> a
Left (\a. a) : forall a b. (a -> a) || b
Pair Unit : forall a. a -> (True && a)

The proof of the principle of explosion on the second line is particularly nice, but the program’s ranking of what constitutes a “nice” theorem starts getting weird after about line 6. The ranking rule of shortest theorem is very simplistic, not taking into account things like how each theorem has an impact on making others easier to prove. In fact, the program never adds theorems to its “library”, which means that finding a proof of a sentence that depends on several other theorems with complex proofs is very hard, because it has to re-prove them every time.

Another interesting thing to try with this code is to search for the least complex proofs for a given theorem. This is easy enough to do:

listProofs :: Monotype -> IO ()
listProofs mt = do
    let targetTheorem = universalClosure defaultEnv mt
    forM_ allExprs $ \e ->
        when (isProofOf targetTheorem e) $
            print e

isProofOf :: Polytype -> Expr -> Bool
isProofOf pt e =
    case principalType defaultEnv e of
        Left _ -> False
        Right t -> isMoreGeneralThan defaultEnv t pt

Running listProofs on an easy theorem rapidly generates an infinite list of proofs of increasingly-unnecessary complexity. For example, running it on \(\bot \rightarrow \bot\) produces a list that starts with:

\a. a
DoubleNegElim DoubleNegElim
\a. (\b. b) a
(\a. a) (\a. a)
\a. (\b. a) a
\a. (\b. a) Case
\a. DoubleNegElim (\b. a)
\a. (\b. a) DoubleNegElim
\a. (\b. a) Fst
\a. (\b. a) Left
(\a. a) (DoubleNegElim DoubleNegElim)

On the other hand, I still haven’t gotten a search for the law of the excluded middle \(p \vee (p \rightarrow \bot)\) or Peirce’s law \(((p \rightarrow q) \rightarrow p) \rightarrow p\) to succeed from this axiom set. Since the axioms are complete for classical logic, such a search must terminate eventually, but the shortest proof may be way down the list.

Get the source code and play with it yourself!