Will the first AI to get IMO gold tree-search the formal proof space?
➕
Plus
29
Ṁ2852
2030
55%
chance

This question resolves when an AI first gets IMO Gold. It resolves YES if this AI is explicitly coded to do the following:

  1. Receive as input (or construct using natural language processing), problems which take the form of mathematical propositions encoded in a formal language.

  2. Construct a tree of logical transformations of that initial proposition (e.g. in the form of tactic applications), and iteratively extend this tree with the goal of identifying a subtree which comprises a complete formal proof of the proposition.

Note that an AI which is not coded to do this explicitly, but instead identifies this strategy emergently does not count.

For example, an AI as described in the paper "HyperTree Proof Search for Neural Theorem Proving" would satisfy these conditions.

Close date updated to 2030-12-31 3:59 pm

Get
Ṁ1,000
and
S1.00
Sort by:

what if it's a model which operates in the same style as o1? Where it runs a single, long, chain of thought (and can backtrack / plan in-context) instead of constructing an explicit tree?

I take it "Note that an AI which is not coded to do this explicitly, but instead identifies this strategy emergently does not count" excludes this class of model, right?

This is not how the recent AlphaGeometry paper works, so updating towards no

@jonsimon It seems to me that an IMO bot that worked this way would definitely count as YES, per the criteria. Is there something I'm missing about AlphaGeometry that makes you think it doesn't count here?

To explicitly lay this out

Receive as input (or construct using natural language processing), problems which take take the form of mathematical propositions encoded in a formal language.

This seems pretty much like what is described in this figure:

Construct a tree of logical transformations of that initial proposition (e.g. in the form of tactic applications), and iteratively extend this tree with the goal of identifying a subtree which comprises a complete formal proof of the proposition.

It seems like this matches the "symbolic deduction through infinite branching points".

Does stuff like tree of thoughts count?

@acertain I believe tree-of-thoughts is informal, so no.

Seems likely, proofs require a lot of multi-step organization and planning. So far the approaches that have worked best for these have used RL-guided monte carlo tree searches (e.g. AlphaZero, AlphaTensor, etc).