In my recent deep dive into the AlphaGeometry (AG) paper, I discovered a fascinating world of automated theorem proving that dates back to 1975. People have been trying to solve math theorems automatically for a long time. Never knew this!
First-order Automated Theorem Provers (ATPs), such as Vampire and the E prover, are prime examples of these efforts. Interestingly, Microsoft Research even developed an entire programming language dedicated to theorem proving, known as Lean. Pretty crazy!
So AlphaGeometry arrived recently and made some waves in the community. It was able to solve 25 of the 30 problems in IMO-AG-30, a benchmark built from International Mathematical Olympiad geometry problems. For context, an average gold medallist solves about 25.9 of these problems. The first question that sprang to my mind was: how did they do it? Breaking from old traditions, the model and the entire code implementation have been made open source for all of us to see and explore. I think that is pretty neat, and they should do more of this so that the open-source community can take some cues from the best minds in the world. Okay, so let’s begin.
IMO-level theorem proving is hard: it needs solid logical reasoning and the ability to search through a very large (almost infinite) space to prove the damn thing. Logical reasoning alone is not enough, though; creativity is an important factor too. Computers are good at logical reasoning, but creativity, not so much. I think we have taken a step towards creativity by employing Large Language Models and their hallucinations. A hallucination can make the model take a path that has not been trodden before, and I think this can help open up new branches of thought and ideas that we can build upon. Instead of shitposting our hallucinated chats on X (Twitter), the question is: can we take advantage of this limitation? Okay, now back to AlphaGeometry.
One of the most beautiful things they did for AlphaGeometry is that they married two subsets of Artificial Intelligence that are rather far apart. One is symbolic AI, the other is generative AI. One is deterministic, the other is probabilistic. Truly amazing!
Let us talk about the groom, shall we: Symbolic Artificial Intelligence.
Symbolic AI
Symbolic AI is like the language of logic for machines. Instead of crunching numbers, it deals with symbols and concepts.
- Logical Rules: Symbolic AI uses logic-based programming. Picture a long list of rules hard-coded into the system by domain experts and combined with logical operators like AND, OR, and NOT. For instance, if a patient has a fever, cough, and difficulty breathing, the system might deduce “pneumonia”. Gives decision tree vibes, which I touched upon in an earlier post here.
- Interpretable and Adaptable: Symbolic AI is transparent. We can trace its reasoning back to the rules it used. Plus, we can tweak those rules as new info arrives.
Symbolic AI doesn’t crave massive data like other AI methods. Instead, it relies on knowledge representation and reasoning. When the rules are clear, Symbolic AI works well.
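To make this concrete, here is a minimal sketch of such a hard-coded rule in Python. The symptom names and the single rule are invented for illustration, not taken from any real system; an actual expert system would hold thousands of rules like this.

```python
# A toy symbolic rule in the spirit of the pneumonia example above.
# The symptoms and the rule are made up purely for illustration.

def diagnose(symptoms: set) -> str:
    # Rule: fever AND cough AND difficulty_breathing => pneumonia
    if {"fever", "cough", "difficulty_breathing"} <= symptoms:
        return "pneumonia"
    return "no diagnosis"

print(diagnose({"fever", "cough", "difficulty_breathing"}))  # pneumonia
print(diagnose({"fever", "cough"}))                          # no diagnosis
```

The nice part is the transparency: every output traces back to an explicit rule you can read and tweak.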
Instead of delving straight into the architecture of the system, I would like to take a detour and talk about the training data generation process; we will derive the symbolic AI along the way. This was a crucial part of the system, and real creativity was shown here. Kudos to the DeepMind team.
How did they generate Data?
As I mentioned earlier, Automated Theorem Provers are a thing: given a premise, they can deduce a conclusion along with the steps required to prove it. The first thing the team built was a custom language that builds the diagram for a geometry premise object by object. You can look at Java Geometry Expert (JGEX) to understand how this works. This helped them build robust premises free of self-contradictions. They built their own language to keep it simple and constrained with respect to the test set, which was IMO-AG-30.
They now had the ability to generate random premises to solve. So what did they do about it? They generated 1 billion of them. Yes, 1 billion. Let that sink in! The premises were sampled randomly, without using any human-designed problem set, to avoid test-set contamination.
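To give a feel for what “building a premise object by object” might look like, here is a very loose Python sketch. The construction names and the textual format are my own stand-ins, not the actual AlphaGeometry language:

```python
import random

# A loose sketch of random premise sampling: start from a base triangle and
# keep adding new points defined in terms of existing ones.
# The construction vocabulary below is invented for illustration only.
CONSTRUCTIONS = ["midpoint", "reflection", "on_line"]

def sample_premise(num_objects: int = 4, seed: int = 0) -> list:
    rng = random.Random(seed)
    points = ["A", "B", "C"]            # base triangle
    premise = ["A B C = triangle"]
    for i in range(num_objects):
        new_point = chr(ord("D") + i)   # D, E, F, ...
        construction = rng.choice(CONSTRUCTIONS)
        p, q = rng.sample(points, 2)    # define the new point from two old ones
        premise.append(f"{new_point} = {construction} {p} {q}")
        points.append(new_point)
    return premise

print("\n".join(sample_premise()))
```

The real pipeline also has to check that every new construction is actually realizable in a diagram, which is what keeps the premises free of self-contradictions.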
They ran all of this through a symbolic engine, DD + AR, to extract the solutions. Those are new terms, so let’s talk about them.
Deduction Database (DD)
As we discussed earlier, AI is not just ChatGPT but a lot more. There is a thing called symbolic AI, and the deduction database (DD) is a part of it. Here is how it works at a high level: based on some rules and facts stored in a database, the DD engine makes deductions and infers new facts. It is a super-smart notebook that contains a bunch of facts and rules.
Imagine you’re writing down facts about your friends in this notebook. For example, “Alice is Bob’s sister” and “Bob is Charlie’s brother”. These are your facts that can be written as:
“Sister(Alice, Bob)” and “Brother(Bob, Charlie)”.
Now, you also write down some rules in this notebook, using the same symbols as the facts above. For example:
“Sibling(A, B) and Sibling(B, C) ⇒ Sibling(A, C)”, where Sibling(X, Y) means that X is Y’s brother or sister.
This is an implication: if the premise on the left is TRUE, then the conclusion on the right of the ⇒ is also TRUE.
What’s cool about this notebook (the deductive database) is that it can use the rules to figure out new facts from the ones you’ve written down. So, in our example, the notebook can figure out that “Alice is Charlie’s sister”, even though you didn’t explicitly write that down.
This ability to figure out new facts from existing ones using rules is what makes a deductive database special. It’s like having a personal detective who can discover new information based on what you’ve already told them.
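Here is a tiny Python sketch of that notebook in action: two facts, one transitivity rule, and a loop that keeps deducing until nothing new turns up. The relation name and the rule are simplified stand-ins for the geometric predicates the real DD engine works with:

```python
# A toy deductive database: a set of facts plus one rule, applied repeatedly
# until no new facts appear. The sibling example mirrors the notebook analogy
# above; it is not the geometry engine from the paper.

facts = {("sibling", "Alice", "Bob"), ("sibling", "Bob", "Charlie")}

def apply_rules(known: set) -> set:
    new_facts = set(known)
    # Rule: sibling(A, B) and sibling(B, C) => sibling(A, C)
    for (_, a, b1) in known:
        for (_, b2, c) in known:
            if b1 == b2 and a != c:
                new_facts.add(("sibling", a, c))
    return new_facts

# Keep applying the rule until the set of facts stops growing.
while True:
    updated = apply_rules(facts)
    if updated == facts:
        break
    facts = updated

print(("sibling", "Alice", "Charlie") in facts)  # True, even though we never wrote it down
```

The full set of everything that can be deduced this way is the “deduction closure”, a term we will meet again below.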
They used a structured DD approach which they claim can find the deduction closure (the full set of statements deducible from the premises) in mere seconds on non-accelerator hardware. I urge the reader to explore this further.
They added one more component to this engine, named Algebraic Rules (AR), a novel method that makes the combined engine state-of-the-art for symbolic reasoning in geometry.
Algebraic Rules (AR)
They introduced this module to add more variation and robustness to the generated theorems and proofs. IMO participants regularly employ these tactics to construct proofs and reach a conclusion. The rules cover angle, ratio, and distance chasing. Previous methods like GeoLogic handled angle and ratio chasing but not distance chasing. DeepMind expanded this to incorporate all three, plus arithmetic reasoning with geometric constants such as pi.
I will explain one of these briefly and leave the other two as an exercise for the reader.
“Distance chasing” in geometry is a problem-solving strategy where you calculate distances between various points to find the information you need. Let’s look at an example.
Suppose this is a question:
Let ABC be an equilateral triangle and P a point on BC. If PB = 50 and PC = 30, compute PA.
This is equivalent to the diagram below.

Given that PB = 50 and PC = 30, we know that BC = PB + PC = 80. Since triangle ABC is equilateral, AB = BC = AC = 80.
This can be solved with the law of cosines, which states that for any triangle with sides of lengths a, b, and c, and an angle γ between sides a and b, the following holds:
c^2 = a^2 + b^2 - 2ab·cos(γ)
In triangle ABP we know AB = 80, PB = 50, and the angle at B is 60° (triangle ABC is equilateral), so the law of cosines gives PA = 70. We derived a distance we wanted from distances we already knew; this is distance chasing at a very high level.
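Here is the arithmetic in a couple of lines of Python, just to verify the chase:

```python
import math

# Law of cosines in triangle ABP.
# Angle ABP is 60 degrees (triangle ABC is equilateral), and cos(60°) = 0.5.
AB, PB = 80, 50
cos_B = 0.5

PA = math.sqrt(AB**2 + PB**2 - 2 * AB * PB * cos_B)
print(PA)  # 70.0
```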
They convert all the equations into matrices and use Gaussian elimination to solve the linear systems at near-instant speeds. (Very Important)
Examples of distance/ratio/angle chasing have been provided in the paper as well.

To handle geometric constants like pi, they added default coefficients to all the matrices.
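To make the matrices-plus-elimination idea concrete, here is a bare-bones sketch with two invented angle-chasing relations; the real engine assembles much larger systems straight from the diagram, but solves them the same way:

```python
import numpy as np

# Toy example of the AR idea: linear relations between unknown quantities
# (here, two angles x and y in degrees) collected into a matrix and solved.
# The two equations are invented for illustration:
#   x + y = 120
#   x - y = 40
A = np.array([[1.0,  1.0],
              [1.0, -1.0]])
b = np.array([120.0, 40.0])

x, y = np.linalg.solve(A, b)   # the elimination step
print(x, y)                    # 80.0 40.0
```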
We now have all the puzzle pieces together for the symbolic engine DD + AR.
Generating 100 million training examples
These are the steps that they followed:
- First, they sampled a large set of random theorem premises, 1 billion as we talked about.
- Then they ran the symbolic deduction engine DD + AR on each premise to obtain its deduction closure. This returns a directed acyclic graph of statements.
Directed Acyclic Graph (DAG): This is a way of representing the conclusions that the engine reaches. Each conclusion is a node in the graph. If you can get from one conclusion to another by following the rules, there’s an edge connecting those nodes in the graph. The graph is “directed” because the edges have a direction (from known facts to new conclusions), and it’s “acyclic” because there are no loops (you can’t start at one conclusion, follow the edges, and end up back at the same conclusion).
- Then they performed a traceback in the graph to find the minimal set of necessary premises and dependency deductions (a rough sketch of this traceback follows the list).
- All of this is packed together into the examples the system learns from: the premises (P), the conclusion (N), and the proof (G(N)).
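Here is a rough Python sketch of the traceback on a made-up toy graph; the statements and edges are invented, but the idea of walking backwards from the conclusion and keeping only what was actually used is the same:

```python
from collections import deque

# Map each deduced statement to the statements it was derived from.
# Premises map to an empty list. This toy graph is invented for illustration.
derived_from = {
    "P1": [], "P2": [], "P3": [],   # premises (P3 turns out to be unused)
    "D1": ["P1", "P2"],             # an intermediate deduction
    "N":  ["D1", "P1"],             # the conclusion
}

def traceback(conclusion: str) -> set:
    needed = set()
    queue = deque([conclusion])
    while queue:
        statement = queue.popleft()
        if statement in needed:
            continue
        needed.add(statement)
        queue.extend(derived_from[statement])
    return needed

print(traceback("N"))  # {'N', 'D1', 'P1', 'P2'}: P3 is dropped as unnecessary
```

Whatever survives the traceback becomes the (premises, conclusion, proof) triple from the last bullet.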

They used the above process in a highly parallel environment and generated 100 million quality examples for training.
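For a sense of what “highly parallel” might look like in practice, here is a sketch with the per-premise work stubbed out; the function body is a placeholder, not the real pipeline:

```python
from multiprocessing import Pool

def generate_example(seed: int) -> dict:
    # Placeholder for the real per-premise pipeline: sample a premise,
    # run DD + AR to a closure, trace back, and package the result.
    return {"premises": f"premise-{seed}", "conclusion": "...", "proof": "..."}

if __name__ == "__main__":
    with Pool() as pool:                          # one worker per CPU core
        examples = pool.map(generate_example, range(1000))
    print(len(examples))                          # 1000 toy examples
```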
This concludes the symbolic-engine part of the paper, which is one piece of the architecture puzzle. We touched upon the concept of symbolic AI and how AlphaGeometry uses its own symbolic engine. The next piece is generating training examples with auxiliary constructions and training a language model on the entirety of the data, which we will discuss in Part 2 of Alpha Geometry: King of the Hill?.
Until then Peace!