https://arcprize.org/competition
>=85% performance on Chollet's Abstraction and Reasoning Corpus (ARC-AGI), private evaluation set.
(If Chollet et al. change the requirements for the grand prize in 2025, this question will not change; the bar will remain >=85% performance.)
2024 version https://manifold.markets/JacobPfau/will-the-arcagi-grand-prize-be-clai
@MalachiteEagle Wow, I hope they made it clear at least in fine print that they might switch to a harder evaluation set; otherwise this feels really unfair to the people who have put a lot of work into solutions.
https://arxiv.org/abs/2411.07279
TTT significantly improves performance on ARC tasks, achieving up to 6× improvement in accuracy compared to base fine-tuned models; applying TTT to an 8B-parameter language model, we achieve 53% accuracy on the ARC’s public validation set, improving the state-of-the-art by nearly 25% for public and purely neural approaches. By ensembling our method with recent program generation approaches, we get SoTA public validation accuracy of 61.9%, matching the average human score.
Test-time training (TTT) enables parametric models to adapt during inference through dynamic parameter updates, an approach that remains relatively unexplored in the era of large language models. This technique is a form of transductive learning, where the model leverages the structure of the test data to improve its predictions.
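For concreteness, here is a minimal sketch of what such a TTT loop can look like in plain PyTorch, assuming a generic model, a per-task set of demonstration pairs, and a supervised loss; these names are placeholders for illustration, not the paper's actual implementation (which uses parameter-efficient per-task updates and augmented demonstrations).

```python
# Minimal test-time training sketch (assumed setup, not the paper's code):
# adapt a copy of the base model on one task's demonstration pairs,
# then use the adapted copy to predict on that task's test input.
import copy
import torch

def test_time_train(base_model, demo_pairs, loss_fn, steps=16, lr=1e-4):
    model = copy.deepcopy(base_model)       # keep the base weights untouched
    model.train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(steps):
        for x, y in demo_pairs:             # (input, target) demonstration pairs
            opt.zero_grad()
            loss = loss_fn(model(x), y)     # supervised loss on the demos
            loss.backward()
            opt.step()
    model.eval()
    return model

# Per task: adapted = test_time_train(base_model, task_demos, loss_fn)
#           prediction = adapted(task_test_input)
```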
I made a version of this market which allows for closed-source LLMs: https://manifold.markets/RyanGreenblatt/by-when-will-85-be-reached-on-the-p
This is your chance to win free mana betting against SG, which is a guaranteed winning strategy exploited by top traders such as jackson
@mckiev i might take you up on the offer, but what's your reasoning for 85% accuracy? we're at 30% right now