Get Ready, the AI Hacks Are Coming

1 month ago 30

Think twice before you ask Google’s Gemini AI assistant to summarize your schedule for you, because it could lead to you losing control of all of your smart devices. At a presentation at Black Hat USA, the annual cybersecurity conference in Las Vegas, a group of researchers showed how attackers could include hidden commands in something as simple as a Google Calendar invite and use it to hijack smart devices—an example of the growing attack vector that is prompt injection attacks.

The hack, laid out in a paper titled “Invitation Is All You Need!”, the researchers lay out 14 different ways they were able to manipulate Gemini via prompt injection, a type of attack that uses malicious and often hidden prompts to make large language models produce harmful outputs.

Perhaps the most startling of the bunch, as highlighted by Wired, was an attack that managed to hijack internet-connected appliances and accessories, doing everything from turning off lights to turning on a boiler—basically wrestling control of the house from the owner and potentially putting them in a dangerous or compromising situation. Other attacks managed to make Gemini start a Zoom call, intercept details from emails, and download a file from a phone’s web browser.

Most of those attacks start with something as simple as a Google Calendar invitation that is poisoned with prompt injections that, when activated, will make the AI model engage in behavior that bypasses its built-in safety protocols. And these are far from the first examples that security researchers have managed to put together to show the potential vulnerabilities of LLMs. Others have used prompt injection to hijack code assistants like Cursor. Just last month, Amazon’s coding tool got infiltrated by a hacker who instructed it to delete files off the machines it was running on.

It’s also becoming increasingly clear that AI models appear to engage with hidden commands. A recent paper found that an AI model used to train other models passed along quirks and preferences despite specific references to such preferences being filtered out in the data, suggesting there may be messaging moving between machines that can’t be directly observed.

LLMs largely remain black boxes. But if you’re a malicious actor, you don’t necessarily need to understand what is happening under the hood. You just need to know how to get a message in there that will make the machine work in a specific way. In the case of these attacks, the researchers informed Google of the vulnerability, and the company addressed the issue, per Wired. But as AI gets integrated into more platforms and more areas of the public’s lives, the more risk that such weaknesses present. It’s particularly concerning as AI agents, which have the ability to interact with apps and websites to complete multi-step tasks, are starting to roll out. What could go wrong?

Read Entire Article