Proofs as Programs

From Curry-Howard to a Deal with the Devil and the Future of Mathematics

Jun 01, 2026

Dependency Graph for Liquid Tensor Experiment - Xenaproject

A short disclaimer before we begin. I am very much a student of this subject, and this essay reflects my current understanding—including the edges where that understanding is still forming. Mathematicians talk a lot about intuition, yet their texts often feel designed to prioritize rigor over teaching. I believe there is value in an essay that could conceivably be read in an hour, serving as a companion before, during, or after the formal learning process. Finally, writing this was an exercise in the Feynman technique; I built this essay as a personal tool to make myself understand the material. Interspersed within this essay is some of my cognitive scaffolding that a more experienced student might have torn down. Perhaps it will be of use.

I. The History of the Curry-Howard Correspondence

(https://en.wikipedia.org/wiki/Curry%E2%80%93Howard*correspondence)

The beginnings of the Curry–Howard correspondence lie in several observations:

In 1934, Curry observes that the types of the combinators could be seen as axiom-schemes for intuitionistic implicational logic.

In 1958, he observes that a certain kind of proof system, referred to as Hilbert-style deduction systems, coincides on some fragment with the typed fragment of a standard model of computation known as combinatory logic.

In 1969 Howard observes that another, more high-level proof system, referred to as natural deduction, can be directly interpreted in its intuitionistic version as a typed variant of the model of computation known as lambda calculus.

I-1. Combinators - SKI

Well that is a lot of words that I don’t know much about. Lets break it down.

Curry observes that the types of the combinators could be seen as axiom-schemes for intuitionistic implicational logic.

Combinators : Functions that move data around. They have no free variables and thus can only be aware of what is handed to them.
Types : In programming, a type tells you what kind of data you are working with (e.g., an integer, a string, or a function that turns an integer into a string)
Intuitionistic Logic : Logic that requires you to build an example to prove something is true. Most notably you can’t prove by contradiction (apply the Law of Excluded Middle)

Curry noticed if you look at the types of those basic programming building blocks, they look exactly like the axioms used by mathematicians to build logical proofs.

This still feels to free floating. Let’s see what ‘types of the combinators could be seen as axiom-schemes’ actually means

We begin with the Identity Combinator, the I Combinator. It takes x and returns x

\(A \rightarrow A\)

Next we look at the Constant Combinator, the K Combinator that takes a first input, then a second input. It ignores the second input and returns only the first input.

\(A \rightarrow (B \rightarrow A)\)

You can take a moment to stop and check that this indeed is the correct form of the function. Instead of taking both inputs as the same time, after taking the first input we return a function that takes a second input.

Now constructing the S combinator would require a bit of work. But I will state it’s form here.

\((A \rightarrow (B \rightarrow C)) \rightarrow ((A\rightarrow B) \rightarrow (A \rightarrow C))\)

To begin breaking down the S combinator , notice that for all its complexity you can begin reading it from the middle arrow. It takes a function and yields another function, where the latter distributes A to B and C.

If we read the arrows as implies we recover Hilbert’s Axiom System for Propositional Logic, or a fragment of it as we can only recreate the parts that can be written with arrows (pure implication).

This clarifies what the Implicational part of Intuitionistic Implicational logic means. It’s the part with arrows.

Nice.

I-2. SKK = I

Moving from 1934 to 1958.

Curry observes that a certain kind of proof system, referred to as Hilbert-style deduction systems, coincides on some fragment with the typed fragment of a standard model of computation known as combinatory logic.

Hilbert-style deduction system: A way of writing out mathematical proofs step-by-step. Notorious for being clunky to write.
Combinatory Logic: A mathematical model of how computers calculate things built using combinators.

Curry realized the structure of a Hilbert-style proof was identical to the structure of a typed program written in combinatory logic.

The most famous example of the Curry-Howard correspondence at this level is the Identity function (proving that A implies A).

Both the logician and the programmer are restricted to using only two tools. We have however, already defined these.

The K Combinator / Axiom 1:
\(A \rightarrow (B \rightarrow A)\)
The S Combinator / Axiom 2:
\((A \rightarrow (B \rightarrow C)) \rightarrow ((A \rightarrow B) \rightarrow (A \rightarrow C))\)

In a Hilbert system, the Logician can write down axioms and use Modus Ponens (If you have X → Y and you have X, you can deduce Y)

Write Axiom 2—Substitute C = A, B = (B → A)

\((A \rightarrow ((B \rightarrow A) \rightarrow A)) \rightarrow ((A \rightarrow (B \rightarrow A)) \rightarrow (A \rightarrow A))\)

Write Axiom 1—Substitute B = (B → A)

\(A \rightarrow ((B \rightarrow A) \rightarrow A)\)

Use Modus Ponens to Write Formula 3. Since the LHS of Axiom 2 is Axiom 1

\((A \rightarrow (B \rightarrow A)) \rightarrow (A \rightarrow A)\)

Write Axiom 1

\(A \rightarrow (B \rightarrow A)\)

Use Modus Ponens to Write Formula 4. Since the LHS of Formula 3 is Axiom 1

\(A \rightarrow A\)

Since Formula 4 is the Identity Function we are done.

The programmer is asked to write a program that takes an input and returns it unmodified (Type: A → A).

They write the program S K K.

The only rule the compiler uses is Function Application. If a function expects input X and returns Y, and you hand it X, the result is Y.

Let’s see it compile program S K K

The compiler sees S. It expects a specific function shape

\((A \rightarrow (B \rightarrow C)) \rightarrow ((A \rightarrow B) \rightarrow (A \rightarrow C))\)

The compiler sees K has type:

\(A \rightarrow (B \rightarrow A)\)

The compiler checks if K fits into the first slot of S. It does (It forces C to equal A). The compiler evaluates (S K)

\((A \rightarrow B) \rightarrow (A \rightarrow A)\)

The compiler sees K has type:

\(A \rightarrow (B \rightarrow A)\)

The compiler checks if K fits into the first slot of (S K). It does (It forces B to equal B → A). The compiler evaluates (S K K).

\(A \rightarrow A\)

The Program compiles successfully.

The key here is that correspondence between two actions. Writing down an Axiom is the same as declaring a Combinator. Modus Ponens is the same as Function Application.

Why bother deriving the identity function?

To show K and S or Axiom 1 and Axiom 2 can prove it rather than having it be some Axiom 0.

Schönfinkel and Curry proved that the S and K combinators, in their untyped form, are computationally universal—meaning they can calculate anything a modern computer can.

I = SKK can be thought as a simplest possible example of this.

When wikipedia says ‘coincides on some fragment with the typed fragment’ what does it mean specifically? That he literally found fragments of paper laying about?

In this case fragment means subset.

We are still missing tools in propositional logic like And / Or / Not. We are still on the Implicational Fragment we found ourselves in 1934.

On the computational side, we are looking at combinators that have strictly defined Type signatures.

In our derivation of I = SKK we did some substitution. How do know we didn’t accidentally violate our strictly defined typing?

In programming you are probably quite familiar with concrete types likes integer or string.

Our Combinators are not using concrete types. They are using Type Variables.

In formal logic there is a for all symbol sitting in front of these axioms

Because they are variables, you are legally allowed to plug anything into them—including other complex types (like another function)—under one condition.

You can substitute a type variable with anything you want, as long as you replace every instance of that variable exactly the same way within that scope.

We must apply Consistent Substitution at each step.

If we were to abandon these strict typing rules, we get examples like M = SII and Omega = M M that create evaluation that never conclude.

I-3. STLC

Moving from 1958 to 1969.

Howard observes that another, more high-level proof system, referred to as natural deduction, can be directly interpreted in its intuitionistic version as a typed variant of the model of computation known as lambda calculus.

Natural deduction: A later, more intuitive way of writing mathematical proofs that closely matches how humans actually think and argue logically.
Lambda calculus: A more powerful and expressive model of computation invented by Alonzo Church. (This is the foundation of modern functional programming languages, like Haskell, LISP, or even parts of JavaScript and Python).

Howard took Curry’s observations to the next level. Curry showed that variable-free combinators mapped to Hilbert proofs. Howard showed that if a mathematician writes a proof using the assumption-based style of natural deduction, it translates into a computer program written in Simply Typed Lambda Calculus—which uses named variables just like we do in modern code.

Does it feel like by now we’re seeing that same thing over and over again? Well that very much is the spirit of the Curry-Howard correspondence. You don’t “prove” it. It’s not a theorem or a conjecture.

It is a family of discoveries. Each instance of Curry-Howard is an isomorphism (a structural equivalence) between two separate games. You formally define the syntax and inference rules of a logical system (e.g., Natural Deduction). You formally define the syntax and typing rules of a computational system (e.g., the Simply Typed Lambda Calculus). You show that every valid move in Game A maps exactly to a valid move in Game B, and vice versa.

This the start of modern type theory, leading to a new class of formal systems designed to act both as a proof system and as a typed programming language based on functional programming.

Simply Typed Lambda Calculus

There are only three rules to the syntax:

Variables: x, y, z (Representing data).
Abstraction: λx:A. M (Creating a function. The λ means “this is a function”, x is the input parameter, A is the type of the input parameter and M is the body.).
Application: M N (Calling a function. You are plugging the input N into the function M).

Let’s write the Identity function (Type: A → A).

Combinators: SKK
Lambda Calculus: λx:A. x (Take x, return x).

The Example of the K Combinator

K Combinator : A → (B → A)

Step 1:

Logician: Let’s assume A is true. I’ll label this assumption x.
Programmer: I will declare an input variable x of type A.
Code: x : A

Step 2:

Logician: Now, let’s make a second, nested assumption. Let’s assume B is true. I’ll label this assumption y.
Programmer: Inside my scope, I’ll declare a second input variable y of type B.
Code: y : B

Step 3:

Logician: Looking at my assumptions, I currently have A (thanks to assumption x).
Programmer: I want to return the value x.
Code: return x

Step 4:

Logician: I am going to discharge my second assumption (y). Since assuming B allowed me to reach A, I have logically built the implication B → A.
Programmer: I am going to wrap up my inner scope into a function. It takes y:B and returns x:A.
Code: λy:B. x (This function has type B → A)

Step 5:

Logician: I am going to discharge my first assumption (x). Since assuming A allowed me to reach (B → A), I have logically built the final implication A → (B → A).
Programmer: I am going to wrap my outer scope into a function. It takes x:A and returns the inner function.
Code: λx:A. (λy:B. x)

II. Modern Type Theory

Once Howard published his paper, the floodgates opened. People realized that if they added more powerful features to their programming languages, they could prove more complex mathematical theorems.

In 1991, Henk Barendregt created a visualization to categorize this explosion of theories called the Lambda Cube. At the bottom-back-left corner (the origin), you have Howard’s Simply Typed Lambda Calculus. From there, you can move along three distinct axes to add computational/logical power:

Axis 1: Terms depending on Types (Polymorphism)

Programming Concept: Generics (e.g., generic identity function: id<T>(x: T): T).
Logic Concept: Universal Quantification (∀). “For all propositions X, X → X”.
The System: System F.

Axis 2: Types depending on Types (Type Operators)

Programming Concept: Type constructors (e.g., mapping a Type A to a Type List<A>).
Logic Concept: Higher-Order Propositional Logic.
The System: System Fω.

Axis 3: Types depending on Terms (Dependent Types)

Programming Concept: A type that changes based on a value. (e.g., An array of exactly length N. If N=3, it’s a different type than if N=4).
Logic Concept: Predicate Logic. “For all even numbers N, N+1 is odd.” You can’t express this without dependent types.
The System: Martin-Löf’s Intuitionistic Type Theory (MLTT).

If you travel to the far, top-front-right corner of the cube, you combine all three axes. This is Thierry Coquand’s Calculus of Constructions (CoC). This specific mathematical system is the underlying engine for Coq, and heavily influenced Lean (which uses a slightly modified version called the Calculus of Inductive Constructions).

II-0. The Order of Logic

In logic, the order simply dictates the domain of quantification. It tells you exactly what kinds of things you are allowed to bind with the quantifiers ∀ and ∃. Every time you move up an order, you are granting your quantifiers permission to range over a more abstract layer of mathematical objects.

Zero-Order Logic (otherwise called Propositional Logic): What you quantify over: nothing. This is logic where you are not allowed to quantify over anything at all.
- Example: You can state that “If it rains, the ground is wet,” but you cannot say “For every day it rains...”

To generate the hierarchy of higher-order logics, you apply a simple recursive rule: Order n allows you to quantify over the entities defined in Order n-1.

First-Order Logic (FOL): What you quantify over: things in a domain (e.g., numbers, people, nodes in a graph).
- Example: ∀x ∃y (x < y). (For every number x, there is a larger number y).
Second-Order Logic (SOL): What you quantify over: Sets, properties, or relations of individuals (entities from First-Order).
- Example: ∀P (P(x) ∨ ¬P(x)). (For every property P, an individual x either has that property or does not).
Third-Order Logic: What you quantify over: Properties of properties, or sets of sets.
- Example: Quantifying over a concept like transitivity. Transitivity is a property of a binary relation. Since a binary relation is a set of ordered pairs of individuals (Second Order Logic), a property of that relation belongs to Third-Order Logic.

The Simple Typed Lambda Calculus we examined is an example of Zero-Order Logic due to lacking quantifiers entirely even if it has variables. To be more precise it is an example of Zero-Order Intuitionistic Propositional Logic.

Intuitionistic Propositional? Why do people call it Intuitionistic rather than the far more clear constructive?

Path Dependence mostly. The term Intuitionistic comes from L.E.J. Brouwer. Constructive mathematics has not become a broad umbrella term for doing math with an algorithmic requirement. Logicians keep the term Intuitionistic Logic to distinguish Heyting’s (Brouwer’s student) specific formulation from other constructive systems.

Is there any Mathematical relevant Non-Propositional Logic that doesn’t secretly smuggle Propositional Logic in? Why not just call it Logic?

Equational Logic serves as an example. In Equational Logic, you do not have ∧, ∨, or → logical connectives. The only propositions you are allowed to make are identities with the equality symbol (=).

The only statements are identities like x · (y · z) = (x · y) · z.
The only rules of inference are reflexivity, symmetry, transitivity, and the ability to substitute equals for equals.

People have done a great deal of work in this rather limited space.

II-1. Bottom-Back-Left Corner

Simply Typed Lambda Calculus sits at the bottom-back-left corner of the Lambda Cube, it is simple because it lacks three distinct dimensions of expressive power.

No Polymorphism (Axis 1): You cannot quantify over Type Variables. In a simply typed language, if you want a function that reverses a list of Integers, you have to write a specific reverseInts function. If you want to reverse a list of Strings, you have to write a completely separate reverseStrings function. You cannot write a generic reverse<T> function because STLC doesn’t allow “For All” types.
No Type Operators (Axis 2): You cannot have a type that takes another type as a parameter to build a new type (like the concept of a List waiting for an Int to become a List<Int>).
No Dependent Types (Axis 3): Your types cannot depend on runtime values. You cannot express a type like ‘an Array of length N’ because STLC has no way to let the integer N influence the type system.

Propositional logic is the simplest form of logic. You have basic statements (A, B) and ways to connect them (And, Or, Implies). It lacks Quantifiers.

You can say: “If Socrates is a man, then Socrates is mortal” (A → B).
But you cannot say: “For all X, if X is a man, then X is mortal” (∀x. M(x) → D(x)).

To say ‘For all Types’, you need Second-Order Logic (Axis 1: Polymorphism). But to say ‘For all X’ where X is a specific entity like Socrates, you need First-Order Predicate Logic (Axis 3: Dependent Types).

Let’s look at a modern example of this in action. Using the syntax of Lean 4. Prove the logical theorem that A ∧ B ⇒ B ∧ A (If A and B are true, then B and A are true).

We declare a theorem.

“Given propositions A and B, and a hypothesis h that A AND B is true, prove B AND A”

Theorem swap_logic (A B : Prop) (h : A ∧ B) : B ∧ A :=

The proof: We extract the right side of h, then the left side of h.

⟨h.right, h.left⟩

We declare a function.

“Given data types A and B, and a variable pair containing an A and a B, return a B and an A”

def swap_data (A B : Type) (pair : A × B) : B × A :=

The function body: We extract the second item, then the first item.

(pair.2, pair.1)

So a formula is type. A proof is a term. Formula is true if it has a proof. Type has a term?

In type theory, this is called Inhabitation. If a type has at least one valid program (term) that can be written for it, the type is inhabited, and the corresponding logical theorem is proven.

Lean 4 is built on the Calculus of Inductive Constructions (CIC), which sits at the pinnacle of the Lambda Cube (combining all three axes, plus inductive types).

Because pure STLC lacks Axis 1 (Polymorphism), you actually cannot write a generic swap_data function in strict STLC. By writing (A B : Type), you are passing types as variables. This is the definition of polymorphism (∀A, B).

In STLC, you could not prove the general theorem. You would have to write specific proofs for specific propositions:

swap_Int_String (pair : Int × String) : String × Int
swap_Bool_Float (pair : Bool × Float) : Float × Bool
And infinitely many other examples.

II-2. Deal with the Devil

For decades, the Curry-Howard correspondence only worked for Intuitionistic Logic (the logic where you must construct a proof, and you aren’t allowed to say “This is true because the opposite is impossible”).

It wasn’t until 1990 that Timothy Griffin realized that the Law of Excluded Middle corresponds to programming feature called Continuations (specifically call/cc in Scheme/LISP, or try/catch exception handling in modern languages) that the Curry-Howard correspondence was extended to include Classical Logic.

The Deal with the Devil Analogy:

(famously coined by Philip Wadler)

Imagine the Devil demands you give him an instance of type P ∨ ¬P.

In type theory, ¬P is defined as a function that takes P and returns a contradiction (an impossible crash): P → ⊥.
You don’t have a P. So, you give the Devil the right-hand side: ¬P.
You hand the Devil a time-traveling function. You say: “I don’t have a P. But if you ever find a P and try to feed it into this function to cause a crash, this function will rewind the universe back to this conversation, and hand you the P you just found instead.”

Is this analogy helpful? Personally I prefer returning to my earlier observation of the Curry-Howard Correspondence as an isomorphism (a structural equivalence) between two separate games. call/cc then becomes the right rule to create to make the two systems Classical Logic and Programming equivalent.

What is the Impossible Crash (⊥)?

In Type Theory, we map logical concepts to data types.

True (⊤) maps to the Unit type. It is a type with exactly one basic, empty value (like void in C, () in Rust, or undefined in TypeScript). It is trivially easy to create.
False (⊥, called “Bottom”) maps to the Empty type. It is a data type that has no valid values. It is completely uninhabited. In languages like Rust, this is the ! (Never) type. In TypeScript, it is the never type.

Because the never type has no values, it is impossible to write a function that actually returns one. So, if you define ¬P as a function P → never, you are writing a function that accepts a P, but promises to return a value that doesn’t exist.

How does a computer handle returning a value that doesn’t exist? It can’t. If the function is ever actually executed, it must abruptly halt the program’s normal flow. It has to throw an uncatchable exception, loop infinitely, or violently crash the process (panic! in Rust, or process.exit()).

Therefore, in logic, ¬P means “If you assume P is true, you reach a contradiction.” In programming, P → never means “If you pass me a P, I will crash your program.”

Proving Double Negative Elimination

Showing the inhabitation of the following Type:

((P ⇒ ⊥) ⇒ ⊥) ⇒ P

Scheme

(define (proof-by-contradiction f)
  (call/cc
    (lambda (escape-hatch)
      (f escape-hatch))))

Here f represents the left side of the implication, and the return value of call/cc represents the final P.

Consider then how the program executes the logical proof:

We feed the premise (the logical assumption (P ⇒ ⊥) ⇒ ⊥) into our proof block.
call/cc captures the current execution context and passes it into f.
f has the type: (P ⇒ ⊥) ⇒ ⊥. “If you give me a function that turns a P into a ⊥, I promise to successfully execute and give you back a ⊥.”
⊥ is the Empty type. It has no values. It cannot be constructed.
The only way f can return ⊥ is through the escape-hatch (which has the type P ⇒ ⊥). Which means a valid P has to be presented to it as an argument.
However the moment f passes P to the escape-hatch, call/cc triggers. Returning P and discarding the execution context of f. Much like how one would discard all the work downstream of an assumed falsehood in the proof by contradiction.

How does f discover a valid P? Can’t it be kept waiting forever?

In a standard, Turing-complete programming language (like Scheme, Rust, or Python), f could simply enter an infinite while(true) loop.

In programming, an infinite loop acts as a bottom value—it satisfies any return type because the function simply never reaches the return statement. If f loops forever, the program hangs, and we never get our P.

In the world of Formal Logic and Type Theory, infinite loops are strictly forbidden.

These logical systems enforce a property called Strong Normalization. This is a mathematical guarantee that all programs must eventually halt in a finite number of steps. There is no Turing-completeness, and there are no infinite loops.

Because f is mathematically cornered by these rules:

It must halt.
It must return a ⊥.
It cannot invent a ⊥ from nothing.

It is guaranteed that f cannot wait forever. It is forced to eventually construct a P and trigger our continuation. By forbidding infinite loops and proving that the Empty type cannot be instantiated, the type system forces the universe to cough up a P.

Native intuitionistic proof assistants like Lean, Coq, and Agda cannot compile this program. Because their core calculi are strictly constructive, they treat classical Double Negation Elimination (or the Law of Excluded Middle) as an axiom rather than an executable function. (c.f. Parigot’s λμ-calculus—a structural extension designed to control flow jumps while preserving Strong Normalization. Think Multi Threading, Throw / Catch)

Strong Normalization in Practice

Proof assistants enforce SN primarily through two mechanisms:

Totality Requirements: Every function must cover all possible inputs. You cannot leave a pattern-match blank or fall through.
Strict Structural Recursion: You are not allowed to use while(true). If a function calls itself recursively, the compiler’s termination checker must prove that the argument passed into the recursive call is a strictly smaller sub-piece of the original input.

Does this solve the problem? Not at all.

Algorithmic Explosion

The Ackermann Function is the classical example of a finite but impossible to compute function.

A(0, n) = n + 1
A(m, 0) = A(m - 1, 1)
A(m, n) = A(m - 1, A(m, n - 1))

Because m and n are strictly decreasing across function calls, a proof assistant will happily accept this as a valid, strongly normalizing program.

Data Representation Explosion

Consider defining the natural numbers inductively using Peano Axioms: Zero and Succ(n).

If you write a strongly normalizing function to calculate 50! (50 factorial) using Peano arithmetic, the system will attempt to construct a nested chain of Succ data constructors that is 3.04 × 10⁶⁴ layers deep. (c.f. Binary representations in proof assistants)

Type-Checking Hangs

In a system with dependent types (Axis 3 of the Lambda Cube), types can contain executing code. For example, you might have a type Vector(A(4, 2)), which is a list whose length is determined by the Ackermann function.

If you try to pass that vector to a function expecting Vector(X), the compiler must check if X = A(4, 2). To verify this, the compiler’s type-checker must mentally evaluate (normalize) the terms.

Because the logic is Strongly Normalizing, the compiler is programmed with the assumption: “I can safely evaluate any term to its conclusion to see if they match, because I know it will never infinite loop.” (c.f. Implicit Computational Complexity)

II-3. Logical Systems and Their Differences

Differences between Classical and Constructive Logic

In Classical Set Theory (ZFC), the Axiom of Choice (AoC) is a somewhat controversial assumption. It says that if you have an infinite number of buckets, you can pick one item from each bucket, even if there is no mathematical rule defining how to pick them.

In Constructive Logic and Type Theory (specifically Martin-Löf Type Theory), the Axiom of Choice is trivially true—it is a theorem you can prove, not an axiom you have to assume.

Why? Because in type theory, if you claim that “for every bucket X, there exists an item Y”, the only way you are allowed to make that claim is if you have already written the computational function that calculates how to extract Y from X. Since you already wrote the extraction program to make the claim, the Choice is already computed.

Universe A (Classical Set Theory/Topos Theory): Here, propositions are just abstract truth values. The Axiom of Choice says you can pick elements from sets without defining how. Because you are pulling items out of abstract truth-sets, Diaconescu proved if you accept the Axiom of Choice in classical mathematics, you are automatically forced to accept the Law of Excluded Middle (P ∨ ¬P).
Universe B (Martin-Löf Type Theory / Constructive Math): Here, “Propositions are Types.” The phrasing of the Axiom of Choice changes. It becomes “Type-Theoretic Choice.” In this universe, to even state that a bucket has items, you must have already written the algorithm to generate them. Because you are dealing with literal algorithms, not abstract sets, Choice doesn’t require any magic. Therefore, in Type Theory, Choice does not imply Excluded Middle.

Dependent Pairs or Sigma Types

In classical math, when you say “There exists a number greater than 10” (∃x > 10), you are just stating an abstract truth. You aren’t obligated to tell me which number it is.

In constructive Type Theory, abstract truth is illegal. To prove ∃x, you must actually build a data structure. This structure is called a dependent pair (or Σ-type):

The Witness: The actual, literal data (e.g., the number 11).
The Proof: The logical verification that the data meets the condition (a proof that 11 > 10).

Because the envelope is transparent, the witness is openly exposed. Any other function in your program can look inside the envelope, see the number 11, and use it for further calculations.

As Type Theory evolved into Homotopy Type Theory (HoTT), mathematicians realized that having completely transparent envelopes all the time causes problems. Sometimes, you want to treat ∃x purely as a logical statement of truth, without caring which specific witness is inside. You might have multiple different witnesses (like 11, 12, and 13 all being greater than 10), but you want the proof to be treated as a single, uniform concept of True.

To do this, HoTT introduces Propositional Truncation.

You take the transparent envelope (the Σ-type) and you squash it. It becomes an opaque, locked box denoted mathematically as ‖Σx‖.

When you squash a type:

You still mathematically know a valid witness is inside.
The compiler forbids you from ever extracting the witness.

What happens if you try to apply the Axiom of Choice to these squashed, opaque boxes? Well you can’t. Unless you want to find yourself back in Classical Logic.

Summary: The Curry-Howard Dictionary

II-4. Categories and Homotopies

Curry-Howard-Lambek correspondence.

Lambek discovered both logic and computation are specific manifestations of abstract algebraic structures called Cartesian Closed Categories.

He proved a three-way isomorphism:

Intuitionistic Logic (The Logic System)
Simply Typed Lambda Calculus (The Computational System)
Cartesian Closed Categories (The Category Theory System)

In any Category, you have Objects and Morphisms (arrows between objects). Lambek proved that for Cartesian Closed Categories, if you label the Objects as Propositions and Morphisms as Proofs, you get Intuitionistic Logic. If you label the Objects as Types and Morphisms as Programs, you get the Lambda Calculus.

What makes a category Cartesian Closed? It must have three specific features. These map to our logical and computational tools:

A Terminal Object: An object that maps to the logical True (⊤), or the Unit type in programming.
Cartesian Products: A way to combine two objects (A × B). This maps to the logical AND (A ∧ B), and the computational Tuple/Pair.
Exponentials (Closure): An object representing morphisms from A to B (written as B^A). This maps to logical Implication (A → B), and computational Functions.

Why is A → B written as B to the power of A?

If type A has 2 possible values (a boolean), and type B has 3 possible values, the total number of unique functions you can write that take an A and return a B is exactly 3² = 9. This logic is the source of the notation.

Homotopy Type Theory (HoTT)

Vladimir Voevodsky, one of the primary architects of HoTT, wanted to use computers to verify complex proofs in modern abstract mathematics (like algebraic geometry). However, standard foundations based on Zermelo-Fraenkel set theory (ZFC) proved incredibly clunky for this task.

In ZFC, everything is a set. This leads to junk theorems that are technically true but mathematically meaningless. For example, in ZFC, it is valid to ask if the number 3 is an element of the number 4, or if 1 ∈ ℝ, and the answer depends entirely on the arbitrary way those numbers were encoded into sets.

Furthermore, mathematicians routinely treat isomorphic structures (e.g., two vector spaces of the same dimension) as identical and interchangeable. ZFC does not allow this natively; you have to manually translate proofs between isomorphic structures. Voevodsky wanted a foundation for mathematics—the Univalent Foundations—that mirrored how mathematicians actually think.

In the 2000s, logicians (such as Steve Awodey and Michael Warren) and topologists (like Voevodsky) independently noticed a similarity between Intensional Martin-Löf Type Theory and Homotopy Theory.

Instead of treating multiple different proofs of equality that naturally arise in Type Theory like overhead, these proofs of equality could be thought of as paths between points in a space, something Homotopy Theory was well suited to deal with.

In type theory, types are usually thought of as sets, and terms are elements of those sets. HoTT shifts this paradigm:

Types are viewed as topological spaces.
Terms are viewed as points within that space.
Identity Types (proofs that x = y) are viewed as paths between point x and point y.
Proofs of Identity between Identities (proofs that p = q, where p and q are already paths) are viewed as homotopies—continuous 2D surfaces that deform path p into path q.

This creates an infinite hierarchy of paths, surfaces, and higher-dimensional volumes, corresponding to the concept of an ∞-groupoid in category theory.

The Univalence Axiom formalizes the mathematical intuition that “isomorphic structures are equal.”

Formally, for any two types A and B in a universe U, the Univalence Axiom states that an equivalence between A and B (written as A ≃ B) is equivalent to an identity (equality) between A and B:

\((A =_{\mathcal{U}} B) \simeq (A \simeq B)\)

In standard type theory, you define a type by declaring its constructors (the basic elements that generate the type). For example, the natural numbers are defined by the element 0 and a successor function.

HoTT introduces Higher Inductive Types, which allow you to define a type by specifying not just its base points, but also its paths (equalities).

For example, in HoTT, you can define a Circle (S¹) by giving it:

A base point: base : S¹
A path from the base point to itself: loop : base = base

One might be confused at this “Circle” lacking a center point, radius or an infinite number of points. That is because this is a topological circle not a geometric one. It is also worth noting that loop provides a new, distinct proof that the base point equals itself by taking a journey around a circle, something HoTT cares to differentiate as a new piece of information.

HoTT stratifies all types into a hierarchy based on their topological complexity, known as h-levels:

h-level 0 (Contractible Spaces): This is a type that has exactly one point (a base point), and every other point in the space is connected to it by a path. Topologically, the space can be continuously shrunk down to a single point. Logically, it represents a singleton truth that is trivially unique.
h-level 1 (Propositions): Types where all elements are strictly equal. (Like a space with only one point, or a statement that is simply True or False).
h-level 2 (Sets): Types where there are distinct elements, but all paths between elements are trivial. This recovers standard Sets from within the theory.
(When earlier we took our transparent envelope (a Σ-type, which contains propositions meaning it acts like a Set at h-level 2) and squashed into an opaque box (‖Σx‖), we are forcing it down into h-level 1).
h-level 3 (Groupoids): Types where there are distinct elements and distinct paths between them, but no distinct surfaces between the paths.
Higher levels: Spaces with increasingly complex topological structures.

(Note: Many type theorists also use n-type terminology, where h-level 0 is a -2-type, h-level 1 is a -1-type, and h-level 2 (Sets) is a 0-type).

III. Lean 4

So, how does all this theory actually compile? Lean runs on a specific flavor of type theory called the Calculus of Inductive Constructions (CIC). At this point, we can restate some of the concepts we’ve explored into the familiar rules of the Lean compiler.

Dependent Types

We’ve already seen how generic types work (Axis 1 of the Lambda Cube, like List<T> where a type depends on another type). But Lean features Dependent Types, meaning a type can depend on a value.

In a standard programming language, you might declare an array of integers. In Lean, you declare a Vector Int n, where n is the exact length of the array, and it is baked directly into the type signature. If you try to write a function that appends an item to a Vector Int 3 and claim the return type is still Vector Int 3, the compiler will immediately reject it because 3+1=4. This is exactly how Lean enforces mathematical rigor at compile time.

Type Universes and Proof Irrelevance

Here is a puzzle I had to think through: if 5 is a Nat, and Nat is a Type, what is the type of Type?

If Type is simply a Type, the system collapses into Girard’s Paradox (Type Theory’s version of the famous Russell’s Paradox.) To fix this, Lean stacks an infinite hierarchy of universes: Type 0, Type 1, Type 2, and so on.

In Lean, the absolute bottom of the hierarchy is Prop (the universe of logical propositions). Type (or Type 0) is the universe of actual data.

Prop possesses a property called Proof Irrelevance. When you are working in Prop, the compiler literally does not care how you prove a proposition, only that you proved it. All valid proofs of a given proposition are treated as identically equal. This is very much what we did with Propositional Truncation earlier and maps to h-level 1 (Propositions).

The Two Types of Equality

Understanding Lean requires handling two distinct concepts of equality. Definitional Equality: Things that are true just by computing them (e.g., 2 + 2 = 4). The compiler does this automatically. Propositional Equality: Things that are logically true, but require human argument (e.g., a + b = b + a). The compiler cannot do this automatically; you have to write a proof.

(In HoTT terms, definitional equality means the compiler just evaluates the terms and they are identical. They are the exact same point in space. Propositional equality requires a mathematical proof. Meaning you have to explicitly build a path between point A and point B. )

Inductive Types and Coercions

In Lean, almost everything (from booleans, to natural numbers, to the proofs themselves) is defined inductively from the ground up by declaring base cases and constructors.

But this strictness creates a practical problem. What happens if you pass the Nat 5 into a function expecting a Real? In strict type theory, that is a type mismatch. To stop mathematicians from losing their minds writing explicit conversions, Lean utilizes a system of Typeclasses (specifically the Coe class). The compiler checks its registry, finds the coercion path from Nat to Real, and silently inserts the conversion function for you. It was designed to handle this mathematical abuse of notation naturally.

III - 1. The History of Lean

Lean was launched in 2013 at Microsoft Research, spearheaded by Leonardo de Moura. De Moura’s background is crucial to understanding the language: his expertise was in Automated Theorem Proving (ATP) and software verification. He was the chief architect of Z3, a massive, industry-standard SMT solver used to hunt down bugs in software and hardware.

The motivation here makes perfect sense. He wanted to build a system that bridged a massive gap. Automated solvers (like Z3) are blindingly fast but mathematically limited. Interactive theorem provers are mathematically profound but historically slow and clunky.

Before Lean 4, there was a strict dichotomy in computer science. You had to choose your world. Proof Assistants (Coq, Agda) were excellent for formalizing logic, but practically useless for writing standard software. If a proof required heavy computation, the system choked. Functional Programming Languages (Haskell, OCaml) were great for writing actual software, but lacking the dependent type theory required to prove complex mathematical theorems.

Lean 4 was designed to completely destroy this “Two Worlds” divide. It functions simultaneously as a world-class interactive theorem prover and a fast, general-purpose programming language.

To prove it could be done, Lean 4 is entirely bootstrapped. The compiler, the syntax parser, and the language server are all built using Lean 4 itself.

To solve the computational bottlenecks of older proof assistants, Lean compiles directly down to highly optimized C code. It manages memory using deterministic reference counting rather than a traditional garbage collector, allowing it to execute algorithms with speeds comparable to C or Rust.

Historically, pure mathematicians (working in areas like algebraic geometry or topology) largely ignored proof assistants. The cultural shift began with the Xena Project, founded by Kevin Buzzard, a number theorist at Imperial College London. Buzzard began training undergraduate math students to translate the standard math curriculum directly into Lean code.

This movement catalyzed the creation of mathlib, Lean’s massive, community-driven mathematical library. It transformed Lean from a Microsoft Research project into a global, collaborative effort to digitize the entirety of human mathematical knowledge.

III - 2. Formalizing Math

If you pick up a mathematics paper from arXiv, it looks incredibly formal. There are Greek letters, strict definitions, and rigorous proofs. But when we try to feed that paper into a system like Lean, we quickly realize human mathematics is actually highly informal.

Human math relies heavily on a shared, implicit understanding between the writer and the reader. It is full of cognitive shortcuts that a compiler will not process:

Abuse of Notation: Mathematicians use the exact same symbol to mean different things depending on the chapter, relying on the reader’s common sense to figure out the context.
Skipped Steps: Phrases like “By symmetry...”, or “Leaving the details to the reader” are everywhere in textbooks. To a human, a skipped step is a polite time-saver. To a Curry-Howard compiler, a skipped step is a missing function definition.
Implicit Coercions: A human sees the natural number 5 and the real number 5.0 as the exact same thing. Lean sees them as two entirely different data structures living in different universes, requiring an explicit conversion function to bridge them.

Transforming human math into Lean code is called Formalization. Because of the missing steps and contextual leaps, we cannot do this with a simple algorithmic translator or a basic script. You have to actively reconstruct the invisible logic. This is exactly why formalization is currently one of the hottest research areas at the intersection of AI and mathematics, driving projects like Google DeepMind’s AlphaProof.

The De Bruijn Factor is the ratio of the length of a formal proof to the length of a human proof. A single, elegant paragraph in a mathematics textbook might require dozens or even hundreds of lines of dense Lean code to explicitly prove every single detail.

In software engineering, if you want to build a web app, you import a framework. Mathematics builds on itself in the exact same way. If you want to prove a new theorem in topology, you need a library that defines what a topological space is. But that library requires a library on set theory. Which requires a library on basic logic.

If you want to formalize a theorem, but the underlying concepts haven’t been added to Lean’s ecosystem yet, you have to stop your work and build those foundational libraries from scratch yourself. To solve this, the Lean community is currently building mathlib—the master library of all formalized math. It is a massive, decentralized infrastructure project aiming to map the entirety of human mathematical knowledge into a single, unified codebase.

Piecing this together, it becomes obvious why progress has historically been slow. To successfully formalize modern math, you need a brain that sits at the center of a very rare Venn diagram. You need someone who is simultaneously:

A PhD-level mathematician capable of understanding cutting-edge research.
A software engineer capable of understanding functional programming and compiler architecture.
A type theorist capable of debugging Dependent Type Theory universe errors.

Finding all three of those skill sets in one person is exceedingly rare. It takes months to become proficient in Lean. From a purely practical standpoint, many mathematicians simply don’t see the career incentive to spend two years writing code to “re-prove” a theorem they already know is true.

However, the cultural shift is happening because human mathematics is hitting a physical limit.

Fields like algebraic geometry are becoming so incredibly complex that standard human peer review is starting to fail. The most famous example of this is Fields Medalist Peter Scholze’s Liquid Tensor Experiment. In 2020, Scholze published a theorem so vast and complicated that he admitted he couldn’t be 100% sure it was entirely correct, and he doubted any human peer reviewer could confidently check his work.

He issued a challenge to the Lean community to verify it. By digitizing the proof, they proved it was indeed correct. Lean is no longer just a toy for logicians; it is becoming a necessary safety net for the outer edges of human thought.

III - 3. The Really Really Rapid Future

The enthusiasm following breakthroughs like the Liquid Tensor Experiment created a unique problem: mathlib became a victim of its own success.

By early 2025, the repository had exceeded 20,000 contributions, creating a massive traffic jam. Hundreds of mathematical pull requests sat stagnant in the queue. The human bottleneck moved from academic journals to GitHub. To manage the sheer volume of incoming code, the Lean community launched the Mathlib Initiative in late 2025, building an editorial board of research mathematicians to triage submissions.

But this bottleneck sets the stage for the most profound shift yet. We are currently watching the transition from the machine as an Automator to the machine as a Discoverer.

In September 2025, an AI system named Gauss formalized the strong Prime Number Theorem in Lean in three weeks—a translation effort that would have historically consumed a human team for over a year. But translation was only the beginning. Now in 2026, reasoning models have begun generating net-new mathematical discoveries, famously disproving Paul Erdős’s 80-year-old planar unit distance conjecture, with earlier solutions to Erdős Problem 728, Problem 1196.

The prominence of AI models highlights the need for rigor. And here, Lean acts as the ultimate anti-hallucination scaffolding.

If an AI tries to confidently skip a logical step (a common hallucination), the Lean compiler treats it as a missing function definition and throws a fatal error. If the AI hallucinates a false equivalency, the type checker rejects it because the types don’t match.

The AI is forced to continually adjust its reasoning until the compiler accepts the code. And because we know that Proofs are Programs, a successfully compiled program is an guarantee of logical truth. The compiler physically prevents the AI from lying.

At the beginning of this essay, I mentioned that I am very much a student of Mathematics. Looking at the landscape today, there is a part of me that is rather glad that I am not a professional of Mathematics. These would be some scary times.

It is an awesome (in the sublime, terror inducing sense) thing to witness: the overturning of a system that has been exclusively human since its creation. Given the lack of other intelligent life we know of, there is a genuine cosmological scale to this revolution.

I think some of the popular comparisons equating the future of mathematicians to modern chess players—rendered obsolete by engines, left to play only for human sport—are overblown. But one thing seems undeniably clear: as the machines take on the role of discoverers, everyone (and this extends well beyond the beyond mathematics) is going to have to get used to being a student far more often than before.

Systems Thinking Collection

Discussion about this post

Ready for more?