Move over, Claude: Moonshot's new AI model lets you vibe-code from a single video upload



ZDNET's key takeaways

Moonshot debuted its open-source Kimi K2.5 model on Tuesday.
It can generate web interfaces based solely on images or video.
It also comes with an "agent swarm" beta feature.

Alibaba-backed Chinese AI startup Moonshot released Kimi K2.5 on Tuesday, describing it in a blog post as the world's "most powerful open-source model to date."

Built on top of the Kimi K2 LLM, which debuted last summer, Moonshot's latest model comes with coding capabilities that could make it a serious competitor to its proprietary counterparts. Kimi K2.5 scored comparably to frontier models from OpenAI, Google, and Anthropic on the SWE-Bench Verified and SWE-Bench Multilingual coding benchmarks, according to data published by Moonshot.

Its ability to create front-end web interfaces from visual inputs, however, is what could truly set it apart from the crowd.

Coding with vision

Kimi K2.5 was pretrained on 15 trillion text and visual tokens, making it "a native multimodal model," according to Moonshot, that can generate web interfaces from uploaded images or video, complete with interactive elements and scroll effects.

In a demo video of this "coding with vision" capability included in Moonshot's blog post, Kimi K2.5 generated a draft of a new website based on a recorded video of a preexisting website, shown from the perspective of a user's screen as they scroll. The model was able to recreate the general aesthetic, even if -- in classic AI style -- it made some slight visual blunders along the way, like depicting continents on a globe as amorphous blobs.

It's unclear how practical this kind of capability will be. (Why would a company need to create a slightly less visually appealing AI-generated copy of an already perfectly reasonable website?) Still, generating mock-ups of websites and apps exclusively from images or videos would mark a meaningful step forward for so-called "vibe coding" tools, which rely on intuitive methods that non-experts can use easily, rather than on traditional coding.

ChatGPT, Claude, and Gemini can generate raw code for new web assets based on screenshots or other images, but that still leaves the user needing to translate it into a finished and usable product. The novelty (and potential market value) of Moonshot's new model is that it cuts out that intermediary step. "By reasoning over images and video, K2.5 improves image/video-to-code generation and visual debugging, lowering the barrier for users to express intent visually," the company wrote in its blog post.


If it proves useful in the real world, especially among businesses, other developers will probably follow suit with similar capabilities for their own models.

Kimi K2.5's coding capabilities have been made available through an open-source platform called Kimi Code, which can be accessed through integrated development environments (IDEs) like Cursor, VS Code, and Zed. The new model is also available through Kimi.com, the Kimi App, and the Kimi API.

Agent swarm

Moonshot also unveiled a research preview called "agent swarm," which orchestrates up to one hundred "sub-agents" to improve performance on certain multistep tasks.

By running multiple subtasks in parallel, agent swarm can also shorten the overall runtime. "Running these subtasks concurrently significantly reduces end-to-end latency compared to sequential agent execution," Moonshot wrote in its blog post, adding that internal evaluations showed that end-to-end runtime -- the total process from input to the completion of the final output -- could be reduced by up to 80%.
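To see why concurrency cuts end-to-end latency, consider a rough Python sketch. This is not Moonshot's implementation: the function names are hypothetical, and each "sub-agent" is simulated with a short sleep standing in for a model call. The point is only that independent subtasks run one after another take roughly the sum of their durations, while the same subtasks run concurrently take roughly as long as the slowest one.

```python
import asyncio
import time


async def run_subtask(name: str, seconds: float) -> str:
    # Hypothetical sub-agent: the sleep stands in for a model or tool call.
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def run_sequential(tasks):
    # One subtask at a time; total time is roughly the sum of all durations.
    return [await run_subtask(name, seconds) for name, seconds in tasks]


async def run_concurrent(tasks):
    # All subtasks at once; total time is roughly the longest single duration.
    coros = (run_subtask(name, seconds) for name, seconds in tasks)
    return list(await asyncio.gather(*coros))


def timed(runner, tasks):
    # Run one strategy and return its results plus wall-clock seconds.
    start = time.perf_counter()
    results = asyncio.run(runner(tasks))
    return results, time.perf_counter() - start


# Five simulated subtasks of 0.05 seconds each (assumed values for illustration).
SUBTASKS = [(f"sub-agent-{i}", 0.05) for i in range(5)]

if __name__ == "__main__":
    _, sequential_time = timed(run_sequential, SUBTASKS)
    _, concurrent_time = timed(run_concurrent, SUBTASKS)
    print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

Real-world savings depend on how independent the subtasks are and how much coordination they need, so this toy comparison should not be read as reproducing Moonshot's reported figure.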


Users with an active "Allegretto" or "Vivace" Moonshot account (costing $31/month and $159/month, respectively) can try agent swarm on the Kimi website by clicking the model drop-down menu at the bottom right of the prompt box and selecting "K2.5 Agent Swarm (Beta)."
