Here is a description of an abstract data type: a confluently persistent set data type containing a subset of the integers in a range \([1..n]\) and supporting \(O(\log(n)^c)\) time creation, \(O(\log(n)^c)\) time size query, \(O(n \log(n)^c)\) time conversion to a duplicate-free list, and \(O(\log(n)^c)\) time merging, for some constant \(c\). Here, “creation” takes as arguments the range of the element domain \([1..n]\) and an optional single element in this range, and “merging” takes two sets with the same domain and returns a new set representing the union of the two input sets. In Haskell, the type signatures for these operations might look like this:

```haskell
data MergeableSet = ...

type Elem = Int

empty     :: (Elem, Elem) -> MergeableSet
singleton :: (Elem, Elem) -> Elem -> MergeableSet
size      :: MergeableSet -> Int
toList    :: MergeableSet -> [Elem]
union     :: MergeableSet -> MergeableSet -> MergeableSet
```

Seems fairly reasonable, right? I’m going to show that **it is likely that no such data structure exists**.

First, note that some very similar data structures do in fact exist. Haskell’s Data.Set can be used to implement this interface with \(O(1)\) `singleton` and `size`, \(O(\log(n))\) membership testing (which is obviously much more powerful than `toList`), and \(O(n)\) `union`. Brodal, Makris, and Tsichlas (2006) presented a purely functional data structure that has \(O(1)\) `singleton`, \(O(\log(n))\) membership testing, and \(O(1)\) “`join`”, which is the same as `union` but requires every element in the first set to be strictly less than every element in the second set.
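To make the comparison concrete, here is a straw-man implementation of the interface on top of Data.Set (the wrapper type and its bounds field are my own scaffolding, not part of the hypothetical structure). Everything meets the required bounds except `union`, which is \(O(n)\) rather than polylogarithmic — and that one gap is exactly what the arguments below exploit.

```haskell
import qualified Data.Set as Set

type Elem = Int

-- The domain bounds are carried along for interface compatibility,
-- although Data.Set itself never needs them.
data MergeableSet = MS (Elem, Elem) (Set.Set Elem)

empty :: (Elem, Elem) -> MergeableSet
empty bnds = MS bnds Set.empty

singleton :: (Elem, Elem) -> Elem -> MergeableSet
singleton bnds x = MS bnds (Set.singleton x)

size :: MergeableSet -> Int
size (MS _ s) = Set.size s          -- O(1): Data.Set caches its size

toList :: MergeableSet -> [Elem]
toList (MS _ s) = Set.toList s      -- O(n), duplicate-free and sorted

union :: MergeableSet -> MergeableSet -> MergeableSet
union (MS bnds a) (MS _ b) = MS bnds (Set.union a b)  -- O(n): the bottleneck
```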

So why is the variant above so implausible?

## Argument 1: Fast transitive closure

If a `MergeableSet` data structure with the given time bounds exists (even without the `size` operation), then **it is possible to find the transitive closure of an \(n\)-vertex graph in near-optimal time \(O(n^2 \log(n)^c)\)**.

The algorithms for computing the transitive closure of a graph with the best known worst-case runtime are based on algorithms for fast matrix multiplication. In particular, the transitive closure of an \(n\)-vertex graph can be computed in time \(O(n^\omega)\), where \(\omega < 2.373\) is the best known exponent for matrix multiplication. A faster algorithm for transitive closure would actually give a faster algorithm for Boolean matrix multiplication as well, as noted by Fischer and Meyer (1971).

Now, the problem of finding the transitive closure of a general graph can be reduced to the problem of finding the transitive closure of a directed acyclic graph. We can just compute the strongly connected components of the graph using any of the several linear-time algorithms, then compute the transitive closure of the resulting kernel DAG. Looping over the pairs of vertices in the original graph to move back to the starting domain takes \(O(n^2)\) time, but since the size of the output is \(n^2\) bits anyway, there’s no additional asymptotic cost.
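This reduction needs nothing hypothetical: Data.Graph already computes strongly connected components in linear time, and condensing to the kernel DAG is a short fold over the edges. A possible sketch (the name `kernelDAG` is mine, and `nub` makes this step quadratic; deduplicating edges more carefully is routine):

```haskell
import qualified Data.Map as Map
import Data.Graph (Graph, Vertex, buildG, edges, scc)
import Data.List (nub)
import Data.Tree (flatten)

-- Condense a graph to its kernel DAG: one vertex per strongly connected
-- component, with an edge between two components whenever some original
-- edge crosses them. Also returns the vertex -> component mapping needed
-- to translate the answer back to the original graph.
kernelDAG :: Graph -> (Graph, Map.Map Vertex Int)
kernelDAG g = (buildG (0, nComps - 1) kernelEdges, comp)
  where
    comps  = map flatten (scc g)     -- linear-time SCCs, one list per component
    nComps = length comps
    comp   = Map.fromList [(v, i) | (i, vs) <- zip [0 ..] comps, v <- vs]
    kernelEdges = nub [ (comp Map.! u, comp Map.! v)
                      | (u, v) <- edges g
                      , comp Map.! u /= comp Map.! v ]
```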

Suppose then that `MergeableSet` exists and we want to find the transitive closure of a DAG. We can associate to each vertex the set of vertices reachable from it, stored as a `MergeableSet`. By traversing the graph in reverse topological order and using `union` to combine the sets of all of the vertices adjacent to each vertex, we can compute `MergeableSet`s of reachable vertices for all vertices in \(O(n^2 \log(n)^d)\) time, for some constant \(d\). Then we just loop over all \(n\) vertices and obtain their lists of reachable vertices using `toList`, which also takes \(O(n^2 \log(n)^d)\) time. A Haskell implementation of this idea (adding a slight \(O(\log(n))\) overhead by using `Data.Map` so that I don’t have to get mutable arrays involved) looks like this:

```haskell
import qualified Data.Array as Array
import Data.Graph
import qualified Data.Map as Map

dagTransitiveClosure :: Graph -> Graph
dagTransitiveClosure g = buildG (Array.bounds g) transitiveClosureEdges
  where
    rs = reachableSets g
    transitiveClosureEdges =
      [(v1, v2) | v1 <- vertices g, v2 <- toList (rs Map.! v1), v1 /= v2]

type ReachableSets = Map.Map Vertex MergeableSet

reachableSets :: Graph -> ReachableSets
reachableSets g = foldl addVertex Map.empty $ topSort $ transposeG g
  where
    addVertex :: ReachableSets -> Vertex -> ReachableSets
    addVertex rs v = Map.insert v reachableSet rs
      where
        reachableSet = foldl union (singleton (Array.bounds g) v) $
                       map (rs Map.!) $ g Array.! v
```

## Argument 2: Faster Boolean satisfiability

If a `MergeableSet` data structure with the given time bounds exists (even without the `toList` operation), then **Cnf-Sat, the Boolean satisfiability problem for formulas in conjunctive normal form, has a \(2^{\delta n} \cdot \text{poly}(m)\) algorithm for some \(\delta < 1\)**.

Pătrașcu and Williams (2010) gave several hypotheses under which Cnf-Sat would have substantially faster algorithms than brute-force search. One of their theorems is as follows: if a certain problem 2Sat+2Clauses can be solved in time \(O((n + m)^{2 - \epsilon})\) for any \(\epsilon > 0\), then Cnf-Sat with \(n\) variables and \(m\) clauses can be solved in time \(2^{\delta n} \cdot \text{poly}(m)\) for some \(\delta < 1\). They note in passing that 2Sat+2Clauses reduces in linear time to the following problem:

Given a directed graph \(G = (V, E)\) and subsets \(S, T \subseteq V\), determine if there is some \(s \in S\) and \(t \in T\) with no path from \(s\) to \(t\).

By computing the strongly-connected components of \(G\), we can again without loss of generality assume that \(G\) is acyclic.

Now suppose that `MergeableSet` exists. Then it is possible to solve this problem in time \(O((n + m) \cdot \log(n)^c)\) for a graph with \(n\) vertices and \(m\) edges. First, we compute, for each vertex, the set of vertices in \(T\) reachable from it, using essentially the same algorithm as the one for transitive closure from before — the only change is that we seed a vertex’s set with a singleton only if that vertex is in \(T\), and with the empty set otherwise. Then we loop over each vertex in \(S\) and use `size` to test whether the size of its reachable set is less than \(|T|\). If we find a vertex \(s\) where this is the case, then we return true; otherwise, we return false. (We can also find a specific vertex \(t\) with no path from \(s\) to \(t\) by depth-first search from \(s\).)
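The reachable-set pass can be sketched as follows, with Data.Set standing in for the hypothetical `MergeableSet` (so the semantics are right but the union cost is \(O(n)\) rather than polylogarithmic); `existsUnreachablePair` is my own name for the decision procedure, and the input graph is assumed to already be a DAG:

```haskell
import qualified Data.Array as Array
import Data.Graph
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Is there some s in S and t in T with no path from s to t?
-- Assumes g is a DAG (condense to the kernel DAG first otherwise).
-- Each vertex's reachable set is seeded with itself only if it lies in T,
-- so Set.size counts exactly the reachable members of T.
existsUnreachablePair :: Graph -> [Vertex] -> [Vertex] -> Bool
existsUnreachablePair g ss ts =
    any (\s -> Set.size (rs Map.! s) < length ts) ss
  where
    tSet = Set.fromList ts
    -- Reverse topological order, so successors are processed before v.
    rs = foldl addVertex Map.empty $ topSort $ transposeG g
    addVertex m v = Map.insert v reach m
      where
        seed | v `Set.member` tSet = Set.singleton v
             | otherwise           = Set.empty
        reach = foldl Set.union seed $ map (m Map.!) $ g Array.! v
```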

So, to summarize, `MergeableSet` would dramatically improve upon the known upper bounds for graph reachability problems. It’s probably too good to be true.