Meta Unveils MovieGen: AI That Makes Videos From Text Prompts

Remember when AI could only make still images from your wildest descriptions? Well, buckle up, because Meta just unveiled MovieGen, a collection of AI models that can generate full-on videos from text prompts! That's right, type in "a blue emu swimming through the ocean," and boom, you've got a mini-movie.

This is a huge leap for generative AI. Until now, the big players in AI video have mostly been commercial systems like Runway Gen3, LumaLabs, and OpenAI's Sora, and Meta reports that MovieGen outperforms them on video quality and realism in its evaluations. We're talking smoother motion, more accurate details, and videos that actually look like they belong in the real world. To give you an idea, MovieGen is particularly strong when it comes to:

  • Frame Consistency: That means no more weird morphing or objects randomly popping in and out. MovieGen is designed to understand how things should move and interact naturally.
  • Motion Naturalness: Ever seen an AI video where the motion looks stiff or jerky? MovieGen is tackling that head-on by learning to replicate how things actually move in the real world.

And get this—MovieGen isn't just a one-trick pony. It can also:

  • Personalize Videos: Have a picture of yourself? MovieGen can use it to create videos starring you, opening up a whole new world of possibilities for content creation.
  • Edit Videos with Text Instructions: Want to swap out the background of your video or add a new object? Just tell MovieGen what to do, and it will take care of the rest.

So, how does MovieGen work its magic? It builds on the same transformer foundation as large language models (LLMs) like LLaMa3, but instead of spitting out text, MovieGen's models have learned to translate your words into images, videos, and even audio!
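To make that a little more concrete, here's a minimal, purely illustrative sketch of the idea: a transformer backbone conditioned on a text embedding that emits a sequence of video latents, which a separate decoder would then turn into frames. The class name, shapes, and layer sizes are our own assumptions for illustration, not MovieGen's actual architecture.

```python
# Hypothetical sketch: a LLaMa-style transformer conditioned on a text
# embedding that predicts per-frame video latents. Shapes are illustrative.
import torch
import torch.nn as nn

class TextToVideoBackbone(nn.Module):
    def __init__(self, text_dim=512, latent_dim=256, n_frames=16, n_layers=4):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, latent_dim)
        layer = nn.TransformerEncoderLayer(d_model=latent_dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        # One learned query per output frame latent.
        self.frame_queries = nn.Parameter(torch.randn(n_frames, latent_dim))

    def forward(self, text_embedding):
        # text_embedding: (batch, text_dim) pooled prompt representation
        cond = self.text_proj(text_embedding).unsqueeze(1)                    # (B, 1, D)
        queries = self.frame_queries.unsqueeze(0).expand(text_embedding.size(0), -1, -1)
        seq = torch.cat([cond, queries], dim=1)                               # prepend condition token
        out = self.transformer(seq)[:, 1:, :]                                 # keep the frame positions
        return out  # (B, n_frames, latent_dim) latents for a separate video decoder

prompt_embedding = torch.randn(2, 512)  # stand-in for a real text encoder's output
latents = TextToVideoBackbone()(prompt_embedding)
print(latents.shape)  # torch.Size([2, 16, 256])
```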

One of the key ingredients in MovieGen's success is its massive training dataset. We're talking hundreds of millions of video-text pairs and billions of image-text pairs! Meta didn't just throw any old videos into the mix, though. They carefully curated the dataset, filtering out low-quality content and focusing on videos with natural motion and realistic scenes. To top it off, they even used AI to generate incredibly detailed captions for each video, giving MovieGen the best possible training material.
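As a rough sketch of what that kind of curation might look like in code: filter clips by simple quality heuristics, then pair the survivors with a model-written caption. The thresholds, score fields, and `caption_model` call below are hypothetical assumptions, not Meta's actual pipeline.

```python
# Hypothetical data-curation sketch: quality filtering plus AI captioning.
from dataclasses import dataclass

@dataclass
class Clip:
    path: str
    resolution: tuple      # (width, height)
    motion_score: float    # assumed precomputed; higher = more natural motion
    aesthetic_score: float # assumed visual-quality metric

def curate(clips, caption_model, min_res=720, min_motion=0.3, min_aesthetic=0.5):
    curated = []
    for clip in clips:
        # Drop low-resolution, near-static, or low-quality clips.
        if min(clip.resolution) < min_res:
            continue
        if clip.motion_score < min_motion or clip.aesthetic_score < min_aesthetic:
            continue
        # Pair the surviving clip with a detailed, model-generated caption.
        caption = caption_model(clip.path)
        curated.append({"video": clip.path, "caption": caption})
    return curated
```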

But it's not just about the data. Meta also made some clever tweaks to the model architecture and training process. For instance, they found that using a training approach called "flow matching" helped the model learn more efficiently and produce higher-quality videos.
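For the curious: flow matching trains the model to predict the velocity that carries a noise sample toward a real sample along a simple path. Here's a compact sketch of the basic linear-interpolation variant; MovieGen's exact formulation and conditioning details may differ, and `model` stands in for any network that predicts that velocity.

```python
# Sketch of a flow-matching training loss (linear-interpolation variant).
import torch

def flow_matching_loss(model, x1, cond):
    """x1: clean video latents (B, ...); cond: text conditioning."""
    b = x1.size(0)
    x0 = torch.randn_like(x1)                      # noise sample
    t = torch.rand(b, *([1] * (x1.dim() - 1)))     # per-sample time in [0, 1]
    xt = (1.0 - t) * x0 + t * x1                   # straight-line interpolation
    target_velocity = x1 - x0                      # constant velocity along that path
    pred = model(xt, t.flatten(), cond)            # model predicts the velocity
    return ((pred - target_velocity) ** 2).mean()
```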

What does this all mean for the future? Imagine a world where anyone can create Hollywood-level videos from their living rooms or edit videos with the ease of sending a text message. That's the potential MovieGen holds.

Of course, it's still early days. Meta acknowledges that MovieGen needs more work before it's ready for prime time, especially on safety and potential biases. But there's no doubt that this is a game-changer. Get ready for a future where the only limit to your video creations is your imagination. You can see the demos at https://ai.meta.com/research/movie-gen/, and we think you'll eventually be able to use MovieGen at Meta.ai.