Will AI generate realistic video of animal movement before 2025?
66 · Ṁ5288 · Jan 1 · 48% chance

Note: This is an effort to make relatively objective, transparent Manifold markets that predict AI capabilities. I won't trade in these markets because there will inevitably be some subjectivity, and I'll try to be responsive with clarifications in the comments (which I will add to the market description). Feedback welcome.

Specifications:

  1. The background of the video doesn't matter (e.g., there can be unrealistic animals or scenery in the background).

  2. The AI-generated video needs to be indistinguishable from a 5-second clip of a single nonhuman animal in action (not just standing, walking, sitting down, etc.). Examples could include a moth flapping its wings, a snake slithering, a cheetah making a sharp turn, or a whale jumping.

  3. This requires more than one example (i.e., not just a fluke), but it doesn't require robustness or high success rates. If a company releases a handful of examples and reliable evidence that they can make videos like this without human assistance (i.e., text-to-video), that's sufficient for YES even if the examples are cherry-picked; the idea here is that even if videos like these take 10 tries each, they could still be commercially viable, and they indicate that the model isn't just getting lucky—even if it still has a lot of hallucination problems.

  4. Indistinguishability approximately means that in a YouTube compilation of 20 clips presented as real animal footage, fewer than 10% of casual, attentive viewers would suspect the AI-generated clip wasn't a real animal. It should be a real animal species, but it doesn't need to pass expert review. (The human observer test isn't a strict or precise requirement, in part because the results would depend a lot on how much people are thinking about AI at the time of the test.)

  5. Most of the animal should be in the video and shouldn't be obscured (e.g., smoke, a blizzard, a dirty camera lens, excessive hair or fur). If the animal is moving quickly, the frame rate needs to be good enough to tell that the animal movement is realistic.

  6. The model needs to be generating novel content. It can't just regurgitate real footage, even if it does some adjustment or recombination. (Thanks to @pietrokc for raising this in the comments. There may be quite a bit of subjectivity here, particularly because there tends to not be much public information about the training data of SOTA models these days.)

The spirit of this market (which will be used to resolve ambiguities that aren't resolved by explicit criteria) is whether the AI seems to have a world model of how animals look and move. YES resolution doesn't require the detailed knowledge of a scientist or sculptor but the general, intuitive understanding that almost all human adults have.
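
As a rough illustration of the arithmetic behind specifications 3 and 4 (not part of the resolution criteria), here is a minimal Python sketch. It assumes each generation attempt succeeds independently with the same probability, which real models may not satisfy, and the function names are hypothetical. It shows how many tries a cherry-picked clip costs on average, how likely at least one success is in a fixed number of tries, and how the "fewer than 10% of viewers" threshold would be checked against a viewer count.

```python
def expected_attempts(success_rate: float) -> float:
    """Mean number of independent attempts needed to get one usable clip."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


def prob_at_least_one(success_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    if not 0 <= success_rate <= 1:
        raise ValueError("success_rate must be in [0, 1]")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1 - (1 - success_rate) ** attempts


def passes_viewer_test(suspicious: int, viewers: int, threshold: float = 0.10) -> bool:
    """True if strictly fewer than `threshold` of viewers suspected the clip."""
    if viewers <= 0 or not 0 <= suspicious <= viewers:
        raise ValueError("need 0 <= suspicious <= viewers and viewers > 0")
    return suspicious / viewers < threshold


if __name__ == "__main__":
    # "10 tries each" corresponds to a 10% per-attempt success rate.
    print(expected_attempts(0.10))
    # Chance of at least one good clip in 100 tries at a 1% success rate.
    print(round(prob_at_least_one(0.01, 100), 3))
    # 9 suspicious viewers out of 100 is under the 10% threshold; 10 is not.
    print(passes_viewer_test(9, 100), passes_viewer_test(10, 100))
```

The viewer test is only an approximate guide in the description above, so the strict comparison here is a simplification rather than a rule the market commits to.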


This is a really good attempt at an unambiguous market but I think the following case is not dealt with.

Imagine I take one real 30s video of a dog walking. Like, a real video of a real dog. Then I train an AI to output that exact video regardless of text prompt. That's very easy to do. So, would that count? What if I do the same but with 5 real videos, and the AI has a 20% chance of outputting each of the 5?

What I'm getting at is that it's very easy to train a model to output any fixed thing. So maybe your question really is, "will AI generate realistic video of animal movement to most reasonable prompts?" With all your (very well thought-out) conditions.

@pietrokc that concern makes sense. The challenges with "most reasonable prompts" are: (i) I'm trying to get at whether the model has understanding, even if that understanding is unreliable, and failing/hallucinating frequently doesn't seem to preclude understanding, (ii) pragmatically, a model that fails even 99 out of 100 times at this task could still be very impactful, especially if selecting the 1/100 is easy; e.g., Hollywood studios could incorporate it even if they have to do that selection, (iii) knowing whether the realism works for particular prompts requires public access, and I'm trying to capture model capabilities rather than, e.g., when companies decide it's advantageous to do a public release.

But I think I can exclude your case with something like, "The model needs to be generating novel content. It can't just regurgitate real footage, even if it does some adjustment or recombination." Of course that's subjective, but I think we'll have a good sense of the degree of the limitation—except for when we don't know if it's doing things very similar to proprietary training data; that's a pretty intractable issue, but probably we should have a higher bar for, say, a no-name start-up that shows 20 examples but we have no idea if those were just the only 20 animal movements for which they had robust training data. What do you think?

@Jacy That all makes sense and I agree with your proposal. It's a tough market to define but so are most (?) interesting markets! Thanks for thinking it through.

@CertaintyOfVictory Already at 3-4 seconds he takes two steps with the same foot.

@SophusCorry I do that all the time.

@CertaintyOfVictory As @SophusCorry mentions, the legs aren't really in sync with the torso—among other issues. Also, the cat is merely walking, and this is about a wider range of animal motion (e.g., leaping, turning, rolling, playing).

I'll try to avoid repeating myself in the comments to avoid clutter, but I'll say again that none of the Sora examples meet the bar of this market in my opinion—so a YES resolution would require Sora to become much better or a competitor to take the lead. Personally I would not put the probability of this at 93% (the current market price).

Would, e.g., this video pass your judgement? https://twitter.com/AngryTomtweets/status/1758265847334732288

@Lion no, none of the current Sora clips would. The golden retriever puppies aren't showing enough movement, aren't shown fully enough to tell if the motion is realistic, are obscured by snow, and have various warping/shimmering/etc. artifacts that I think would at least produce an uncanny valley effect. The dalmatian is only walking and passes through the blinds, but I actually think that is the best Sora example of animal motion so far.

bought Ṁ5 NO

@Jacy Thanks

bought Ṁ10 YES

@Jacy Not even the bird one?
