In the spirit of not cherry-picking any results, everything you see was the first generation we received for the prompt listed above it.
"A highly intelligent person reading 'Ars Technica' on their computer when the screen explodes"
"A cat in a car drinking a can of beer, beer commercial"
"Will Smith eating spaghetti"
"Robotic humanoid animals with vaudeville costumes roam the streets collecting protection money in tokens"
"A basketball player in a haunted passenger train car with a basketball court, and he is playing against a team of ghosts"
"A herd of one million cats running on a hillside, aerial view"
"Video game footage of a dynamic 1990s third-person 3D platform game starring an anthropomorphic shark boy"
"A muscular barbarian breaking a CRT television set with a weapon, cinematic, 8K, studio lighting"
Limitations of video synthesis models
Overall, the Minimax video-01 results seen above feel fairly similar to Gen-3's outputs, with some differences: there is no celebrity filter on Will Smith (who, sadly, did not actually eat the spaghetti in our tests), and the cat's hands and licking motion look more realistic. Some results were far worse, like the one million cats and the Ars Technica reader.