This is a variant of the following market:
https://manifold.markets/dreev/will-an-llm-be-able-to-solve-confus
In this version, the problem has to be solved purely by the LLM itself.
Possible clarification from creator (AI generated):
The market will be resolved based on whether ChatGPT-4 (or an equivalent model) can solve confusing but elementary geometric reasoning problems purely by itself.
The creator will make the judgment call on whether solutions involving ChatGPT-4 count as being solved "purely by the LLM itself."
@dreev this will be increasingly challenging as more and more models are integrated into a single system, which is in part why I don't bet much on the "Will LLMs do [task] by [future year]?" markets, but yeah, I think it's reasonable to call GPT-o1 an LLM for the rest of 2024.
@Jacy Thank you, that makes a ton of sense. I shall avoid trying to single out LLMs in the future and hope that this one won't turn out too painful to adjudicate over the remaining 3 months of 2024. If anyone has any counterpoints about GPT-o1, chime in! (Not that it matters yet, with GPT-o1 still failing our flagship geometric reasoning problem, but it does seem to be getting closer... 😬)
@dreev Bet this one up, since if o1 counts as an LLM, I think this also resolves YES?
@dreev /shrug idk what the word "pure" means here. It's one model that queries itself a lot of times in a row.
@JohnCarpenter Yeah this is very nonobvious to me. There's presumably additional code for constructing the chain-of-reasoning prompts.
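To make the distinction concrete: the question is whether "one model queried in a loop" still counts as "purely the LLM" once non-model scaffold code is steering it. A toy sketch of that architecture, with an entirely hypothetical stand-in model (this is not OpenAI's actual o1 design, just an illustration of the loop-plus-scaffold pattern):

```python
# Hypothetical illustration only: a single "model" plus a thin scaffold
# that repeatedly queries it, accumulating a chain of reasoning. The
# scaffold (the for-loop and prompt construction) is ordinary code, not
# part of the model -- that's the piece @JohnCarpenter is pointing at.

def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call; emits canned reasoning steps."""
    step_no = prompt.count("Step") + 1
    if step_no > 3:
        return "DONE: answer=42"
    return f"Step {step_no}: refine the reasoning."

def chain_of_reasoning(question: str, max_steps: int = 10) -> str:
    """Scaffold code: feeds the model its own prior output each round."""
    transcript = question
    for _ in range(max_steps):
        reply = toy_model(transcript)
        transcript += "\n" + reply  # prompt construction happens here, outside the model
        if reply.startswith("DONE"):
            return reply.split("answer=")[1]
    return "no answer"

print(chain_of_reasoning("Is this solved purely by the LLM?"))  # prints: 42
```

Whether that outer loop disqualifies a solution as "purely the LLM itself" is exactly the judgment call the creator has to make.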