Yesterday, Microsoft unveiled WHAMM, a generative AI model for real-time gaming, in a demo starring the 28-year-old classic Quake II. The interactive demo responds to user inputs via controller or keyboard, though the frame rate hovers in the low to mid-teens. Before you grab your pitchforks, Microsoft emphasizes that the focus should be on analyzing the model's quirks and not judging it as a gaming experience.
WHAMM, which stands for World and Human Action MaskGIT Model, is an update to the original WHAM-1.6B model launched in February. It serves as a real-time playable extension with faster visual output. WHAM uses an autoregressive model where each token is predicted sequentially, much like LLMs. To make the experience real-time and seamless, Microsoft transitioned to a MaskGIT-style setup where all tokens for the image can be generated in parallel, which removes the token-by-token dependency and cuts the number of forward passes required per frame.
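To make the difference concrete, here is a minimal toy sketch of the two decoding styles. It is not Microsoft's implementation: `toy_model` is a hypothetical stand-in that returns random guesses, the token count, vocabulary size, and step count are made-up illustration values, and the cosine unmasking schedule is borrowed from the general MaskGIT recipe rather than from anything WHAMM is confirmed to use. The only point is to count forward passes.

```python
import math
import random

MASK = -1  # sentinel marking a position that has not been decoded yet


def toy_model(tokens, vocab_size, rng):
    """Hypothetical stand-in for one forward pass.

    Returns a (token, confidence) guess for every position. A real model
    would condition on the unmasked tokens; this one just guesses randomly.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def autoregressive_decode(num_tokens, vocab_size, rng):
    """Decode one token per forward pass, left to right."""
    tokens, passes = [], 0
    for _ in range(num_tokens):
        # Predict only the next position, given everything decoded so far.
        guess, _conf = toy_model(tokens + [MASK], vocab_size, rng)[-1]
        tokens.append(guess)
        passes += 1
    return tokens, passes


def maskgit_decode(num_tokens, vocab_size, steps, rng):
    """Predict all positions in parallel, committing the most confident first."""
    tokens = [MASK] * num_tokens
    passes = 0
    for step in range(1, steps + 1):
        preds = toy_model(tokens, vocab_size, rng)  # one pass covers every position
        passes += 1
        # Cosine schedule: how many positions stay masked after this step.
        keep_masked = int(num_tokens * math.cos(math.pi / 2 * step / steps))
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        n_commit = max(0, len(masked) - keep_masked)
        for i in masked[:n_commit]:
            tokens[i] = preds[i][0]
    return tokens, passes


if __name__ == "__main__":
    rng = random.Random(0)
    _, ar_passes = autoregressive_decode(64, 16, rng)
    _, mg_passes = maskgit_decode(64, 16, 8, rng)
    print(f"autoregressive passes: {ar_passes}, MaskGIT-style passes: {mg_passes}")
```

With these illustrative settings, the autoregressive loop needs 64 forward passes for 64 tokens, while the MaskGIT-style loop needs only 8, one per refinement step, regardless of how many tokens make up the image.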
WHAMM was trained on Quake II with just over a week of data, a dramatic reduction from the seven years of gameplay data required for WHAM-1.6B. Likewise, the resolution has been bumped up from a pixel-like 300 x 180 to a slightly less pixel-like 640 x 360. You can try out the demo yourself at Copilot Labs.
The model's ability to keep track of the existing environment, apart from the occasional graphical anomaly, while simultaneously adapting to user inputs, is impressive despite the atrociously bad input lag. You can move, jump, crouch, look around, and even shoot enemies, but ultimately, it's no more than a fancy showcase and can never substitute for the original experience.
As expected, the model isn't perfect. Enemy interactions are described as fuzzy, the context length is limited, the model tracks vital stats like health and damage incorrectly, and it is confined to a single level.
This announcement follows OpenAI's latest Ghibli trend, which has garnered a lot of negative attention. While I'm no artist, there's a certain human element to every piece of creative work that AI cannot truly recreate. Yet, at AI's current rate of development, fully AI-generated games and movies could become a reality within the next few years.
The sweet spot lies in AI enhancing, not replacing, creative works, like Nvidia's ACE technology, which can power lifelike NPCs. Parts of this technology are already integrated into the life simulation game inZOI. From a technological point of view, WHAMM still represents a step up from previous attempts, which were often chaotic, incoherent, and teeming with hallucinations.