Will we see improvements in the TruthfulQA LLM benchmark in 2024?


Resolved N/A on Jan 3


Daron Acemoglu wrote an article with a series of vague AI predictions for 2024: https://web.archive.org/web/20240110122026/https://www.wired.com/story/get-ready-for-the-great-ai-disappointment/

One of them is: "More and more evidence will emerge that generative AI and large language models provide false information and are prone to hallucination—where an AI simply makes stuff up, and gets it wrong. Hopes of a quick fix to the hallucination problem via supervised learning, where these models are taught to stay away from questionable sources or statements, will prove optimistic at best. Because the architecture of these models is based on predicting the next word or words in a sequence, it will prove exceedingly difficult to have the predictions be anchored to known truths."

There is a benchmark called TruthfulQA that measures how truthfully models answer questions. The highest-scoring model in 2023 was GPT-4 at 0.59. Will we see any improvement on this benchmark in 2024?

This is the best link I could find comparing different models on the TruthfulQA benchmark, but I am open to other sources if they exist: https://paperswithcode.com/sota/question-answering-on-truthfulqa

#AI

#LLMs

#Academia


1 Comment


I have not been able to find any further uses of this benchmark this year for the big new models like Claude 3.5 Sonnet, ChatGPT-4o, Gemini 2.0 Flash, etc. I strongly suspect these would crush the previous year's models on TruthfulQA, but it appears no one really cares about this benchmark.
