Google has rolled out a new AI tool called “Whisk” that lets users drop in photos as prompts to create new images.
On Monday, Google announced its latest AI tool, which works a little differently from most image generators, which require a long text prompt.
Instead, Whisk, which is now available in the U.S., lets users generate images by using other pictures as prompts, then “remixes” those pictures together to create new works.
With Whisk, users can generate an AI image simply by dragging and dropping pictures into the tool. From there, the image generator does the rest.
Users can provide images to define the subject, scene, and style of their AI-generated image, and they can supply multiple images for each of these elements. They also have the option to fill in text prompts if they prefer.
For those without images on hand, clicking a dice icon has Google supply AI-generated images as prompts. At the end of the process, users may add extra detail about their desired image by entering text into a text box, although this step is optional.
Whisk generates images along with a corresponding text prompt for each one. Users can favorite or download an image if they are satisfied with the result. Alternatively, they can refine the image by entering additional text in the text box, or by clicking on the image to edit its text prompt.
Whisk is powered by Google’s Gemini AI and by Imagen, Google’s image-generation model. According to the company, Gemini works in the background, converting the images users upload into detailed text prompts for the image model.
“Behind the scenes, the Gemini model automatically writes a detailed caption of your images,” Google Labs Director of Product Management Thomas Iljic and Google DeepMind Product Manager Nicole Brichtova write in a news release.
“It then feeds those descriptions into Google’s latest image generation model, Imagen 3. This process captures your subject’s essence, not an exact replica. That way, you can easily remix your subjects, scenes, and styles in novel ways.”
In a blog post, Google emphasizes that Whisk is intended for “rapid visual exploration, not pixel-perfect edits.” The company also acknowledges that Whisk may sometimes “miss the mark,” which is why it includes the option to edit the underlying prompts.