How I broke AES (Advanced Encryption Standard) – if I did it

by ~~O.J. Simpson~~ Warren D. Smith http://rangevoting.org/

The paper: http://math.temple.edu/~wds/homepage/works.html #100

We describe a new simple but more powerful form of linear cryptanalysis. It appears to break AES-256 (and undoubtably other cryptosystems too, e.g. SKIPJACK).

But the break is "nonconstructive."
Even if this break is bogus (due to the underlying models inadequately approximating the real world) we explain how AES still could contain "trapdoors" which would make cryptanalysis unexpectedly easy for anybody who knew the trapdoor.

The fix: ("fundamental theorem of cryptography"?)

We then discuss how to use the theory of BLECCs to build cryptosystems provably
1. not containing trapdoors of this sort,
2. secure against our strengthened form of linear cryptanalysis,
3. secure against "differential" cryptanalysis,
4. secure against D.J.Bernstein's timing attack.
Using this technique we prove a fundamental theorem: it is possible to thus-encrypt N bits with security (against known attacks) 2^cN, via an circuit Q_N containing ≤cN two-input logic gates and operating in ≤c·logN gate-delays, where Q_N is constructible in polynomial(N) time.

"Nonconstructive break" of AES?? What the heck is that?

We make it plausible* break-algorithm exists – but do not construct it. More precisely: construct parameterized class of algorithms A(p), & make it plausible some values of params p exist, causing A(p) to efficiently break AES.
A(p) & p both have small bit-length.
I.e: If God told you p once, you could use A(p) to break arbitrary AES instances quickly from then on.
Be better if there were no efficient small algorithm to break AES – then still helpless even with hint from God – and this is usual way security goal is worded.

*Asterisk: We do not prove, merely make plausible. Why? Proving security of cryptosystems will be impossible at least until can prove P≠NP. Proving crypto-breaking algorithms work also usually is not feasible:

You can program break, try it out, but that will not be feasible if break-time takes 2¹⁰⁰ steps, thus breaking supposed 2¹²⁸ security; also will not be feasible if nonconstructive break;
(Our approach): You can prove break works in low expected runtime in probabilistic model; but these models are not genuinely true and you just have to hope are true enough.

History of DES & AES, brought to you by your government

IBM & NSA 1974: DES proposed. During proposal, NSA reduces key length to 56 bits.
EFF 1998: DES broken by brute force (≤1 day to break; one-time $250,000 machine cost).
M.Matsui 1993/4: DES broken by intelligence (known plaintext attack, prob. model). Theory verified by programming & testing. About 2⁴⁰ DES evals now suffice. Also the same method can provide ciphertext-only break if have huge amount of ciphertext.
1993: Clipper chip sinks...
2001: J.Daemen & V.Rijmen win NIST competition with AES proposal.
2005: D.J.Bernstein breaks large number of implementations of AES via "cache timing" attack. Attack employs (plaintext, ciphertext, time) triples, deduces key in ≤1 day. DJB argues his attacks should apply versus essentially every cryptosystem employing "S-boxes" & implemented in reasonably efficient and straightforward manner. I.e. almost all proposals ever. But attack always against specific hardware & software and depends on timing info; underlying algorithm unbroken.
2007: AES algorithm broken by W.D.Smith's theoretical "nonconstructive break" (known-plaintext attack, same properties as Matsui). Highly general attack method applies against large number of cryptosystems.

My verdict: AES should be replaced. Does not live up to goals as cryptosystem suitable for all purposes. Bernstein/Smith's attacks devastate that goal in practice/theory. New approaches needed.

BLECCs = Binary Linear Error Correcting Codes (quick review)

[n,k,d] Binary linear code = set of 2^k binary words, each n bits long
Code is closed under mod-2 vector addition ("GF2-linear"), i.e. wordwide XOR.
Such that any two words differ in at least d positions ("Hamming distance≥d")
Equivalently, the min-weight (nonzero) word has weight d. ("Weight"≡number of 1-bits.)
Generated by the k rows of a k×n Boolean "generator matrix."
"Dual code" is the set of 2^n-k words which have dot product zero with words in original code. "Geometrically" is orthogonal subspace hence also linear code.

About AES-256

Secret key cryptosystem, supposedly 2²⁵⁶-secure in all imaginable ways
Encrypts 128-bit plaintext to yield 128-bit ciphertext
14 rounds successively transform the 128 message bits
Each round ≡
1. XOR with next 128-bit chunk of expanded key.
2. 16 parallel invocations of 8-to-8-bit bijective "S-box." (Always same S-box: one 256-byte table).
3. Do some GF2-linear stuff (always same stuff).
Step 2 (Sbox) ≡ the only nonlinear steps.
Employs 1920-bit "expanded key" generated once from 256-bit actual secret key.
1920=15·128. 15=14+1. 128=8·16.
Break will deduce the 1920-bit expanded key from numerous plaintext-ciphertext pairs.

Ingredients of our attack

Matsui's idea: "linear cryptanalysis" (we re-build it, but M's idea)
Combine with theory of BLECCs, introduce "code of the code" concept
The weight w of the min-weight words in Code of the Code is crucial; attack's expected runtime depends exponentially on w
Probabilistic analysis of w for the "worst" code of the code suggests sufficiently small w (highly probably) exists for AES.
The "hint from God" you need to be able to break AES efficiently is: this BLECC (described by generator words) and some of its low weight words.

Linear Cryptanalysis simplified – Matsui ingredients (blackboard-aid)

Unbalance=U ≡ Boolean signal mean value is M with U=2|M-½|; convenient since 0≤U≤1.
Example: a signal that is one 75% of the time and zero 25% has M=0.75 and U=0.5.
A "balanced" (50-50) signal has U=0; a constant bit has unbalance U=1.
Logic Circuits: Any boolean function can be built from 2-input NAND gates...
GF2-linear functions ≡ circuits made of XOR(+), NOT(¬), and constant-bits only.
"Linear approximation" circuits: N-input 1-output Boolean function B is well "approximated" by another A if outputs disagree few (out of the 2^N) times. Equivalently: A+B has large unbalance (and mean value<½). Interested in good & best approximations by GF2-linear functions.
Reversibility lemma: NOT and XOR gates are reversible; "inputs" & "outputs" = artificial labelings.
Noise gates also reversible.
Noise gate mobility lemma.
Noise gate unbalance lemma (O.Rothaus 1976): For best linear approx A of (≤2n)-input Boolean function B, unbalance(A+B)≥2^-n.
Piling up lemma: For N noise gates in series with unbalances U₁, U₂, ..., U_N, net effect is U=∏U_j if statist'ly independent; if all noise gates have change-prob<½ then so does whole shebang. (If pos'ly correlated noises, then U≥∏U_j.)
Statistics: to learn bit (distorted by unbalance=U noise with change-prob<½) it suffices to do c·U^-2 experiments to get confidence exponentially(-c) near 100%.

Linear Cryptanalysis – Algorithm

Write down the equivalent circuit of the cryptosystem.
Replace all nonlinear S-boxes by "noise"+{best linear approx circuit}.
Use "noise gate mobility" to slide all "noise gates" along wires until reach key-bit inputs.
Use "reversibility" – view plaintext & ciphertext bits as "inputs," key bits + "noise" as "outputs".
From plaintext-ciphertext pairs, deduce linear relations (over GF2) satisfied by the K_n+R_n.
Characteristic vectors of the linear equation LHSs generate "code of the code" BLECC by vector XORing. The min-weight binary vectors in this code (fewest #1s) correspond to the sparsest linear relations among the K_n+R_n.
The sparsest relations involve the fewest number of "noise"-bits R_n.
Use "piling-up lemma" & statistics: deduce sparse linear relations among the K_n by averaging out the noise using many plaintext-ciphertext pairs. (If #bits involved in relation=weight=w, noise-unbalance=U, then need order U^-2w pairs to get high confidence.)
Deduced enough GF2 linear relations among the key-bits K_n (using, perhaps, many different "codes of the code")? And they are lin-independent enough? Now deduce key itself by Gaussian elimination over GF2.
Declare victory. Code is cracked – we know the (extended) key.

"Code of the code," & Boolean "noise" that actually is deterministic

The "noise" bits R_n for good cryptosystems act random, but actually are deterministic (1 if linear approx circuit for Sbox got wrong value disagreeing with true Sbox's bit-value; 0 if got it right). Hence they can be canceled out in linear relations over GF2, by row operations. (I.e. R+R=0 is true if R is deterministic bit, would not have been true for repeated instances of genuine "noise.")

Some linear relations like

K1+R1  +  K3+R3  +  K7+R7            = 1    (weight=3)
           K3+R3  +  K7+R7  +  K9+R9 = 0    (weight=3)

imply by row-ops others such as

K1+R1  +                        K9+R9 = 1    (weight=2)

The left hand sides can be viewed as generator vectors

1 0 1 0 0 0 1 0 0              (weight=3)
0 0 1 0 0 0 1 0 1              (weight=3)

of a BLECC, and the min-weight word in the BLECC is

1 0 0 0 0 0 0 0 1              (weight=2)

For AES we get a [1920, 128, w] binary linear code. Different sets of lin-approximation circuits ⇒ different codes. The minimum distance w is crucial; governs runtime for crack.

Combining GF2-linear relations on key bits to deduce the key

Each plaintext-ciphertext pair yields 128 linear relations among key+noise bits using any given set of Sbox lin-approximation circuits. That's 128 generator words for a BLECC with 2¹²⁸ codewords, each 1920 bits long. Some of those 2¹²⁸ binary words hopefully have small weight, i.e. correspond to sparse linear relations involving few "noise bits." We need 1920 (or more) independent linear relations, with noise averaged out of each (and it is only feasibly-fast to average-out noise if lin-reln is sparse) to deduce entire 1920-bit expanded key.

What weight do we expect? Model as "random code." Other randomness assumptions.

Good news:

"Noise" bits really do (& must) look random for good cryptosystems.
Indeed, good crypto-designers make sure of that with stat'l tests.
Noise bits tend if anything to be positively correlated (one approx'n bit-error makes it more likely a second happens) because AES S-boxes are bijective... this tends to make piling-up lemma work better than for true random noise, empirically need fewer plaintext-ciphertext pairs to crack.
No explicit BLECC family has ever been found that is better than (or that even equals) random codes. Big open problem in coding theory. So I doubt the AES code-of-code is better than random codes. Probably worse, i.e. probably min-weight(AES c.o.c.) < min-weight(random code with same parameters).
AES Sbox has 8 "inputs" & 8 "outputs." Fact (found by my computer): Each linear combo of outputs has ≥5 different best-lin-approx functions, each of which gets it right 144/256 times for unbalance U≥1/8.
So there are at least 5¹⁷⁹²≈2⁴¹⁶¹ ways cryptanalyst could replace the 224 Sboxes in AES-encoder by best-lin-approximations. I.e. there are at least this many different AES "codes of the code" cryptanalyst could consider using; naturally he chooses the "worst" of all these codes (i.e. with smallest min-distance) to get easiest cracking.

The crux (but this one is more dubious) conjecture:

All of these enormous number ≥2⁴¹⁶¹ of different BLECCs, all act "enough like" independently chosen random [1920, 128, ?] BLECCs.

If so: then (trivially since 2⁴¹⁶¹>>>2¹⁹²⁰) tons of them will exist with tiny min-weights w, and their min-weight codewords will have tons of linear independence, and AES is dead meat.

Why is this a "nonconstructive" crack?

I do not know (but God does know) the "hint" ≡

which exact combination of lin-approx circuits we should use for each of the AES S-boxes, to get a code-of-the-code, which magically happens to have small min-distance w.
I don't even know what that min-distance w is, and what codeword achieves that min-weight.
I'd really have to know this not just for one code-of-code and min-weight codeword, but for at least 15 codes and at least 1920 min-weight (or low-weight) codewords.

But if ever did know that hint, then I could, from then on, crack arbitrary AES instances by rapidly deducing expanded-keys from order 64^w sample plaintext-ciphertext pairs.

And I would know the hint if I did a huge one-time pre-computation – or if God told me – or if I were the AES-designer and I'd started with this "trapdoor" knowledge – and my knowledge then could be written down in at most a few Mbytes

Trapdoor?

A cryptosystem which is easy for its designers to break, but hard for everybody else, is said to have a "trapdoor."

If I give you a generator matrix for a BLECC, it can be very hard for you to know if the BLECC contains a low-weight word, or find it! (NP-hard!) But it would be very easy for me to give you that word, if I felt like revealing it. I.e. if the AES designers have a trapdoor,

It might be very hard for us to find it, or to know whether it exists.
If trapdoor exists then would easy for AES designers to convince us they have it.
If trapdoor does not exist, then I currently see no feasible way for AES designers to convince us of that.

How can we build cryptosystems immune to my (& Bernstein's...) nasty attacks? BLECC theory to the rescue!

A way to build a big (but secure) S-box.

2N input bits. K output bits (1≤K<N).

Compute N "intermediate bits" by taking AND (or OR or NAND or NOR...) of disjoint input-bit pairs.
Compute K GF2-linear combinations of these N bits according to the rows of K×N Boolean generator matrix for [N,K,D] binary linear code.
Output them.

Security Theorem:

Totally immune to "differential cryptanalysis."
Totally immune to Bernstein "timing attacks" since always same runtime regardless of data.
Fast (can do with wordwide bitshifting, XORing, and ANDing, and ORing) if "cyclic" or "multicyclic" binary code. Also asymptotically fast if use "linear time information theory" BLECC constructions.
Resistant (but not immune) to our "improved linear" cryptanalysis (here) since U<2^-D for any linear-approximant to any linear combination of output bits.

Proof: Any nonconstant GF2-linear combo of output bits is a GF2-linear combo of the intermediate bits with weight≥D; every intermediate bit has unbalance=1/2 and all are independent; "piling up lemma" does rest.

"Fundamental theorem of cryptography"

There is an ∞family of polynomial(N)-time constructible logic-circuits Q_N, where Q_N encrypts 5N plaintext bits into 5N ciphertext bits, in O(N) bit-operations requiring O(logN) gate-delays, using a 46N-bit enlarged secret key, which achieves security level ≥2^2D where D>0.04N against all the attacks we've mentioned (and conjecturally we get security against all attacks for some appropriate values of the constants 5, 46, 0.04).

Proof employs the secure S-box above and "linear time info theory" (Justesen, Spielman, Zemor, Barg, etc). Argue even if attacker gets plaintext-ciphertext pairs that come with, as a free extra bonus, all the internal bits at the outputs of all Sboxes inside the encryptor, then still secure.

Is this practical? Sample design.

185749DE CF747EFF 4749F011 8BE9914C 0ADC7233
3B272FC6 EEB7F0FB 7604D328 59457C7E C54C7B2E
86905       (in hexadecimal)

is the generator of a [511, 175, 95] cyclic (BCH) code. Extend it with overall parity check bit to get [512, 175, 96] code. That yields 1024-input, 175-output, Sbox with all unbalances≤2^-96 and ≤8 gate delays between each input & output. (Fast in software too since program using wordwide shifts and XORs since cyclic code.) Use in 21 rounds, each round alters 175 message-bits and XORs with 1195 expanded-key-bits, to get cryptosystem for encrypting 1195 bits at once and presumed security level≥2¹⁹².

Other stuff (in the paper but not in this talk)

Big tables of useful-for-crypto binary codes
More theorems... numerical data on random codes... the dual binary code also is useful for getting other kinds of security guarantees...
Another attack ("bounded polynomial degree attack") which actually breaks the "secure Sbox"-based cryptosystems described above if they are implemented too simply (but we have a fancier way to use these Sboxes that avoids this attack, so OK).

New (upcoming paper!) results by W.D. Smith

Have "universal" secret-key and public-key cryptosystems. These are provably as hard to crack (up to polynomial factors – also will work versus crackers equipped with "quantum computers") as any secret-key (or public-key) crytosystem. The basic idea to build the universal secret-key cryptosystem is you compose every secret-key cryptosystem with algorithm-length≤L. (Make L grow slowly with N.) The resulting function-composition "chain" is at least as strong as its "strongest link." (Warning: actually tougher than it just sounded to make this work, need to be careful on the definitions and the construction. But I've done it so it can be done.)