Pushing hard into text-to-video AI tools, global technology major Google has unveiled its own text-to-video AI platform in the vein of Meta's Make-A-Video. The new platform, Imagen Video, is an AI system that creates short video clips from text prompts.
It is the second text-to-video AI to be announced in quick succession, arriving six months after DALL-E 2, a text-to-image generator from OpenAI, and merely a week after Meta announced Make-A-Video.
According to the tech major, Imagen Video can produce clips of up to 5.3 seconds at a resolution of 1,280×768 pixels and 24 frames per second. The model first takes a text description and generates a 16-frame video at 3 frames per second and 24×48-pixel resolution. The system then upscales this clip and “predicts” additional frames to produce the final 1,280×768 video at 24 frames per second.
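The figures above fit together arithmetically, which a few lines of Python can check. This is only a sketch built on the numbers quoted in this article: the helper name clip_duration is made up for illustration, and the final frame count is derived here on the assumption that upscaling keeps the clip length unchanged. It is not a figure quoted above.

```python
from fractions import Fraction


def clip_duration(frames: int, fps: int) -> Fraction:
    """Length in seconds of a clip with `frames` frames played at `fps`."""
    return Fraction(frames, fps)


# Figures quoted in the article for the first, low-resolution pass.
BASE_FRAMES = 16
BASE_FPS = 3

# Frame rate quoted for the final video.
FINAL_FPS = 24

# 16 frames at 3 fps last 16/3 seconds, i.e. about 5.3 seconds.
base_seconds = clip_duration(BASE_FRAMES, BASE_FPS)

# Assumption: the upscaled clip covers the same time span as the base clip,
# so the final frame count is duration x final frame rate.
final_frames = int(base_seconds * FINAL_FPS)

# How many output frames correspond to each base frame after frame prediction.
frames_per_base_frame = final_frames // BASE_FRAMES

print(f"Clip length: {float(base_seconds):.1f} s")
print(f"Frames in final video: {final_frames}")
print(f"Temporal upsampling factor: {frames_per_base_frame}x")
```

Under that assumption, the 16-frame draft and the 5.3-second limit describe the same clip: the system would need to fill in several new frames for every frame in the low-resolution draft to reach 24 frames per second.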
“We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding,” Google said.
Imagen Video was trained on an “internal dataset” of 14 million videos and 60 million still images, along with a further 400 million images from the open LAION-400M dataset.
The Imagen Video team plans to join forces with the researchers behind Phenaki, another text-to-video AI from Google that can turn detailed text prompts into videos of two minutes or more, though at lower quality.
The demos shared include videos generated from prompts such as “Coffee pouring into a cup,” “Wooden figurine surfing on a surfboard in space,” and “Balloon full of water exploding in extreme slow motion,” among others.