Mathematics · Combinatorics · Probabilistic Argument

Demigod's Number — a high-probability 36-move bound for 3×3

Arturo Merino · Bernardo Subercaseaux arXiv:2501.00144 PDFSubmitted 2024-12-30

In 2010 Rokicki et al. used 35 CPU-years on a Google cluster to prove 3×3 God's number is exactly $20$ HTM — a result almost no one has ever independently reproduced. In late 2024, Merino & Subercaseaux proposed a fundamentally different approach: settle for the weaker $\leq 36$ bound, in exchange for needing only 500,000 random samples and ~100 hours of single-machine compute, each one certified by a short Kociemba solution. Anyone with a laptop and a Kociemba solver can replicate it overnight. This page walks through the paper end-to-end — proof, concentration bound, sampling algorithm, and its place beside God's number.

God's number

20 HTM

Proven exact: Rokicki et al. 2010, 35 CPU-years

SIAM Rev. 2014

Demigod's number

36 HTM

High-probability bound: Merino & Subercaseaux 2024, ~100 hr laptop

arXiv:2501.00144

Human's number

205 HTM

Beginner's method ceiling: Lemma 8, 7-step LBL

Appendix A

In one lineFor any vertex-transitive graph

G

D < 2 μ

. The 3×3 Cayley graph is vertex-transitive; 500k samples + Hoeffding give

\overset{μ}{^} \approx 18.32 \pm 0.1

, so

D \leq 2 \cdot 18.4804 < 37

, and since

D \in Z

⇒

D \leq 36

TL;DR

What: a re-derivation of the 3×3 diameter, relaxed from the exact $20$ down to a probabilistic $\leq 36$ , but reproducible on a single laptop. The authors dub this "trade compute for a weaker conclusion" pattern a demigod number.
How: (i) on any vertex-transitive graph, $D < 2 μ$ (basic triangle inequality); (ii) the 3×3 Cayley graph is vertex-transitive; (iii) Hoeffding pins $μ$ close to the empirical $\overset{μ}{^}$ .
Key trick: Plain Hoeffding with $C = 205$ (Human's number as worst case) demands ~ $3 \times 1 0^{8}$ samples; proving " $\geq 99.97%$ of states need $\leq 20$ moves" lets $C = 20$ , cutting samples to $5 \times 1 0^{5}$ .
Experiment: MacBook Pro M3 ran 500k samples in $\approx$ 5 hours. Every sample solves in $\leq 20$ , mean $18.3189$ , total error probability $< 1 0^{- 10}$ .
Generalises: applies to any vertex-transitive graph that admits uniform sampling — Pyraminx, Megaminx, 5×5, the 15-puzzle on a torus — even where God's number is unknown.

3. Preliminaries

Model the cube as a group: every position is a permutation of $S_{54}$ (54 stickers); face turns ${R, L, U, D, F, B}$ are generators; the six fixed centres let us embed it in $S_{48}$ . Classical setup:

Definition 1 — Cayley graph

Given group $G = (A, ⋆)$ and generators $S \subseteq A$ , the Cayley graph $G_{S}$ has vertex set $A$ with edge $(u, v)$ iff some $s \in S$ gives $u ⋆ s = v$ .

For the cube, this gives

S_{R} := {R, R^{'}, R^{2}, L, L^{'}, L^{2}, U, U^{'}, U^{2}, D, D^{'}, D^{2}, F, F^{'}, F^{2}, B, B^{'}, B^{2}}

18 generators — counting

R^{2}

as a single move is HTM; counting it as two gives QTM, where God's number is

26

Definitions 2 / 3 — Diameter and mean

D = u, v \in (2 V) max d (u, v), μ = \frac{1}{( 2 ∣ V ∣ )} u, v \in (2 V) \sum d (u, v) .

"God's number" = $D$ ; the demigod's number is a high-probability upper bound on $2 μ$ .

4. Core theorem: vertex-transitive ⇒ D < 2μ

For general graphs $D / μ$ can blow up — the paper gives the clique-plus-tail example reaching $Ω (n^{1/2})$ , which Wu et al. 2011 prove tight. Vertex transitivity, however, forces the ratio below 2.

Theorem 2 (paper)

For any vertex-transitive graph $G$ with diameter $D$ and mean distance $μ$ , we have $D < 2 μ$ .

Proof

First a lemma: in a vertex-transitive graph, the mean distance equals the average distance from any fixed vertex:

μ = \frac{1}{∣ V ∣ - 1} \sum_{v \in V} d (x, v) for any fixed x \in V .

(For each $u$ there is an automorphism $ϕ_{u}$ sending $u$ to $x$ , and automorphisms preserve distances; double-summing and using this collapses the $\sum_{u}$ .)

Now take $u, v$ with $d (u, v) = D$ . Apply Lemma 4 twice:

2 μ = \frac{\sum _{x \in V} d ( u , x )}{∣ V ∣ - 1} + \frac{\sum _{x \in V} d ( v , x )}{∣ V ∣ - 1} \geq △ \frac{\sum _{x \in V} d ( u , v )}{∣ V ∣ - 1} = \frac{∣ V ∣ \cdot D}{∣ V ∣ - 1} > D .

Triangle inequality: $d (u, x) + d (v, x) \geq d (u, v) = D$ . ∎

Tight?Yes. On

C_{2 n}

we have

D = n

and

μ = n /2 + o (1)

, so

D /2 μ \to 1

. Same for the hypercube

Q_{n}

D = n

μ = n /2

5. Why Cayley graphs are vertex-transitive — interactive

This is the step that hooks the cube onto Theorem 2. The proof is a one-liner and makes "group structure = symmetry" concrete.

Lemma 3

Every Cayley graph is vertex-transitive.

Proof

Given $u, v$ , set $ϕ (x) = v ⋆ u^{- 1} ⋆ x$ . It sends $u$ to $v$ , has inverse $x \mapsto u ⋆ v^{- 1} ⋆ x$ , and preserves edges:

ϕ (x)^{- 1} ⋆ ϕ (y) = (v ⋆ u^{- 1} ⋆ x)^{- 1} ⋆ (v ⋆ u^{- 1} ⋆ y) = x^{- 1} ⋆ y,

so $(x, y) \in E ⟺ x^{- 1} ⋆ y \in S ⟺ ϕ (x)^{- 1} ⋆ ϕ (y) \in S ⟺ (ϕ (x), ϕ (y)) \in E$ . ∎

Watch Theorem 2 act on the simplest non-trivial Cayley graph, $C_{2 n}$ — drag $x$ to see the triangle inequality flexing:

Cycle size

∣ V ∣ = 2 n

Third vertex

x

x = 3

D = d (u, v) = 8

d (u, x) = 3

d (v, x) = 5

d (u, x) + d (v, x) = 8 \geq 8 = D

Exact mean

μ = \frac{1}{2 n - 1} \sum_{k = 1}^{2 n - 1} min (k, 2 n - k) = 4.2667

Compare

2 μ = 8.5333 > 8 = D

✓

Drag $x$ around the cycle: every position satisfies $d (u, x) + d (v, x) \geq D$ (triangle inequality). Summing over all $x$ and dividing by $∣ V ∣ - 1$ gives $2 μ \geq ∣ V ∣ D / (∣ V ∣ - 1) > D$ — that's Theorem 2 on $C_{2 n}$ , on a single page.

…

6. Theorem 1: the master probability bound + Hoeffding interactive

The headline. Combine $D < 2 μ$ with Hoeffding's inequality.

Lemma 5 — Hoeffding's inequality

For i.i.d. $X_{1}, \dots, X_{s} \in [0, C]$ with mean $μ$ and empirical mean $\overset{μ}{^}$ , for any $t > 0$ :

Pr [∣ \overset{μ}{^} - μ ∣ \geq t] \leq 2 exp (- \frac{2 s t ^{2}}{C ^{2}}) .

Theorem 1

For a state $s$ , let $d (s)$ be its distance to the identity; let $S$ be a uniformly sampled set with empirical mean $\overset{μ}{^}_{S}$ . Then:

Pr_{S} [D \geq 2 \overset{μ}{^}_{S} + 0.36] < 2 exp (\frac{- ∣ S ∣}{1 541 939}) .

Proof sketch

Lemma 8 (Human's number) yields $d (s) \in [0, 205]$ . Plug $C = 205$ , $t = 0.1$ into Hoeffding:

Pr [μ \geq \overset{μ}{^} + 0.18] \leq 2 exp (\frac{- 2∣ S ∣ \cdot 0. 1 ^{2}}{20 5 ^{2}}) < 2 exp (\frac{- ∣ S ∣}{1 541 939}) .

Then by Theorem 2 ( $D < 2 μ$ ):

Pr [D \geq 2 \overset{μ}{^} + 0.36] \leq Pr [2 μ \geq 2 \overset{μ}{^} + 0.36] < 2 exp (\frac{- ∣ S ∣}{1 541 939}) .

∎

The panel below exposes the three knobs: sample size $∣ S ∣$ , tolerance $t$ , and the Hoeffding constant $C \in {205, 20}$ . Two curves show the naive vs refined decay; the dot marks your current selection.

Sample size

∣ S ∣

500,000

10³10⁹

Tolerance

t

0.100

0.020.50

Bound constant

C

Refined: on "close pairs" d(s) ∈ [0,20]

Naive (C=205, plain Hoeffding)Refined (C=20, after close-pair argument)

Pr[fail] now

2.78×10^-11

Need |S| for 1% err

105,967

Need |S| for 10⁻⁷ err

336,225

Single-core (0.2 s/sample)

1.2 d

8-core wall

3.5 h

Implied D ≤

37 HTM

How to read: at $∣ S ∣ = 500, 000$ , $t = 0.10$ , Hoeffding gives Pr[ $∣ \overset{μ}{^} - μ ∣$ ≥ $0.10$ ] ≤ 2.78×10^-11. Dropping $C$ from 205 to 20 cuts the required samples by $(205/20)^{2} \approx 105$ × for the same accuracy.

…

Where does 1,541,939 come from?It packs the bound's parameters: starting from

Pr [2 μ \geq 2 \overset{μ}{^} + 0.36] \leq Pr [μ \geq \overset{μ}{^} + 0.18]

and applying the one-sided Hoeffding

Pr [\overset{μ}{^} - μ \leq - t] \leq exp (- 2 t^{2} ∣ S ∣/20 5^{2})

with

t = 0.1

gives the coefficient

20 5^{2} / (2 \cdot 0.01) = 2, 101, 250

; the slightly looser

1, 541, 939

matches the symmetric two-sided form in the paper. Intuition: each extra order of magnitude in

∣ S ∣

hammers the fail probability by

exp (- 1 0^{x} /1.5 \times 1 0^{6})

7. The Human's-Number trick — squeezing 205 down to 20

Theorem 1 with $C = 205$ demands $\sim 3 \times 1 0^{8}$ samples for a 1% error — about a year of single-core compute. The fix: most states actually solve in $\leq 20$ moves; only a vanishing fraction are "far apart".

Definition — "far apart"

For a pair $(u, v)$ , call it far apart if $d (u, v) \geq 21$ , otherwise close.

Let $φ :=$ "at least 0.03% of pairs are far apart". Lemma 6: if $φ$ holds, the probability of 500,000 samples turning up zero far-apart pair is:

(1 - 0.0003)^{500, 000} < 7.02 \times 1 0^{- 66} .

Yet they observed exactly zero. So $φ$ is overwhelmingly false: $\leq 0.03%$ of pairs are far apart, and on the close pairs $d \leq 20$ , hence $C = 20$ . Hoeffding's coefficient drops from $20 5^{2}$ to $2 0^{2}$ , cutting sample size by $\approx 105$ times.

Where the Human's number comes from: 7-step beginner's method

Appendix A presents a generous-but-rigorous beginner's-method bound. Step by step:

White cross

White corners

Edges of the second layer

Yellow cross

Permuting yellow edges

Permuting yellow corners

Orienting yellow corners

Total (Lemma 8 = Human's Number)

265

Comically loose — any beginner averages $< 100$ moves — but as a worst case that you can prove in a single page it's plenty.

After the trick

\overset{μ}{^}_{c l ose} \leq 18.4189

(Lemma 7, 500k samples) +

μ \leq μ_{c l ose} + 0.03% \cdot 205

, so

μ \leq 18.4804

, hence

D < 2 \cdot 18.4804 = 36.9608

; integer ⇒

D \leq 36

. Union-bound total error

\leq 1 0^{- 10}

8. How to sample cube states uniformly

The theorem is proven, but you still need a true uniform sampler. The paper invokes the classical Fundamental Theorem of Cubology:

Theorem 3 — Cubology

If you take all corners + edges off and reattach them, the reassembly is reachable from solved iff (a) corner sign = edge sign (parity); (b) corner twists sum to $0 (mod 3)$ (twist sum); (c) edge flips sum to $0 (mod 2)$ (flip sum).

The three constraints partition $Z_{2} \times Z_{3} \times Z_{2} = 12$ equivalence classes; exactly one is legal. Naive "uniform permute + reject" averages 12 retries. The paper picks the smarter route: after one random reassembly, apply at most three deterministic correctives:

Flip the F-U edge (toggles edge-flip parity)
Swap F-U-L and F-U-R corners (flips corner parity)
Twist the D-U-R corner clockwise (shifts twist sum mod 3)

The 12 invalid operations fibre every state above the legal subgroup; reassemble once + apply the three correctives gives 1 effective sample per pass.

Compare: WCA scramblersThis is identical to what cubing.js / TNoodle do at every WCA competition — pick a random state, solve via Kociemba, reverse the solution as the scramble. The live-sampler section below taps the same code path.

9. Experiment: 500k samples, 5 hours, Kociemba

The paper's experiment ran on a MacBook Pro M3 (36 GB RAM, 16 cores) using the open-source Kociemba implementation efrantar/rob-twophase. Results:

Total samples |S|: $500, 679$
Empirical mean $\overset{μ}{^}$ : $18.3190$
Longest solution: 20 HTM (matches Rokicki's 2010 exact diameter)
Wall time: ≈ 5 hours
Total error probability: $< 1 0^{- 10}$

The exact counts of Figure 5, redrawn with a log-toggle and per-bin hover:

Y axis

Paper

∣ S ∣ = 500, 679

\overset{μ}{^} \approx 18.3190

Merino & Subercaseaux 2024 (500k samples)

Of 500,000 random states, the longest Kociemba solution is exactly 20 HTM — matching Rokicki's proven diameter. Note the knife-edge: depths 18 and 19 alone account for ≈ 86%, while depth 11 appears just once. The paper combines this with Hoeffding to lock in $μ \leq 18.4804$ .

…

10. Run it yourself — in-browser Kociemba

The button below spins up a Web Worker that, in a loop, (i) draws a uniformly random 3×3 state, (ii) solves via Kociemba, (iii) histograms the solution length. Watch $\overset{μ}{^}$ creep toward the paper's $18.3189$ . Your samples are overlaid in green on the histogram above.

Samples |S|

Running μ̂

—

Implied D ≤ (D < 2μ̂+0.36)

—

Hoeffding fail (t=0.1, C=20)

—

How it works: each sample calls cubing.js's randomScrambleForEvent('333') — picks a uniformly random state in a worker, solves it with Kociemba, returns the solution length as a bound on $d (s)$ . cubing.js clamps WCA scrambles to $\geq 18$ moves so the running mean here biases slightly upward; the true $μ$ is ≈ 18.32. ~50–300 ms per sample.

…

11. History of bounds (paper Table 1)

From Thistlethwaite's 52-move bound in 1981 to the 20/20 closure in 2010 — three decades of gap-shrinking. Each cell is one historical (lower, upper) snapshot:

1981

18–52

1990

18–42

1992

18–39

1992

18–37

1995

18–29

1995

20–29

2005

20–28

2006

20–27

2007

20–26

2008

20–25

2008

20–23

2008

20–22

2010

20–20

The three "numbers" side by side

Name	Value	Tightness	Compute to verify	Who can audit	Source
God's	20	exact	35 CPU·yr	must re-run the Google-scale solve	Rokicki 2010
Demigod	36	upper, >10⁻¹⁰ confidence	~5 h MacBook	one laptop overnight	Merino & Subercaseaux 2024
Human's	205	upper, deterministic but loose	pen and paper	anyone who can solve a Rubik's cube	Paper App. A

12. Generalisation — what else gets a demigod number

Two ingredients: (i) vertex transitivity (so $D < 2 μ$ holds); (ii) efficient uniform sampling. Every Cayley graph gives (i) for free; (ii) depends on the group's computable structure. The paper names several candidates:

Puzzle	\|G\|	Known diameter	demigod doable?
2×2	3.67×10⁶	11	BFS finishes the full graph — no need
Pyraminx	933,120	11	BFS suffices
Skewb	3.15×10⁶	11	BFS suffices
Square-1 (face)	2.4×10¹⁴	31	Chen 2017 used 722GB BFS; demigod is the citizen-grade alternative
Megaminx	1.0×10⁶⁸	48–~110 (HTM)	✓ — needs a megaminx solver; D unknown, demigod gives an upper bound
5×5×5	2.83×10⁷⁴	?–~120 (OBTM)	✓ — reduction solvers exist; gap is wide, demigod immediately useful
15-puzzle on a torus	~10¹³	?	✓ — paper explicitly cites this as a target

The 5×5 lower/upper-bound gap is wider than a factor of 2 — exactly where demigod buys the most. So Hirata's 2024 papers (refs [12,13] in the paper) attack the same problem via girth and branching factor — complementary to this approach.

13. A side note on probabilistic vs traditional proof

The paper's closing argument is worth recording: it places demigod-class results on the spectrum of plausibility alongside probabilistic primality tests and machine-assisted proofs (Kepler's conjecture). Whether $N$ is prime is a deterministic fact, but in practice we accept "Miller-Rabin gives error $\leq 2^{- 128}$ " as if it were one.

The paper's closing summary: "if the cube's diameter were actually greater than 36, the probability of observing an empirical mean of ~18.3 and zero $> 20$ samples in 500,000 draws is $< 1 0^{- 10}$ ". Compare this to the chance of someone finding a SHA-256 collision overnight on a 2024 laptop — which is more believable?

Open question (paper poses)"Are there properties of finite mathematical objects that can only be certified efficiently to a high degree of confidence by probabilistic algorithms, but that we never can be certain of through a short proof?"

References

Merino & Subercaseaux — "A Demigod's Number for the Rubik's Cube" (arXiv:2501.00144, 2024) — the subject of this page
Rokicki, Kociemba, Davidson & Dethridge — "The Diameter of the Rubik's Cube Group Is Twenty" — SIAM Review 2014, peer-reviewed God's number = 20
cube20.org — Rokicki team's hub
Kociemba — Two-Phase Algorithm — 1992 original description
efrantar/rob-twophase — the actual solver the paper used
Herman & Pakianathan — "On the distribution of distances in homogeneous compact metric spaces" (2015) — the original of Theorem 2 in the more general metric-space setting
Demaine et al. — "Algorithms for Solving Rubik's Cubes" (ESA 2011, arXiv:1106.5736) — rigorous Θ(N²/log N) for NxN
Mulholland — "Permutation Puzzles: Rubik's Cube" — standard reference for the Fundamental Theorem of Cubology
Paper supplementary materials (anonymous git) — 500k sample data + solution certificates
Wikipedia — Vertex-transitive graph
Wikipedia — Hoeffding's inequality
/math/god — back to the all-WCA-events God's number overview