This recent hack came just four days after data researcher Riley Goodside discovered the ability to prompt GPT-3 with “malicious inputs” that order the model to ignore its previous directions and do something else instead. AI researcher Simon Willison posted an overview of the exploit on his blog the following day, coining the term “prompt injection” to describe it. “The exploit is present any time anyone writes a piece of software that works by providing a hard-coded set of prompt instructions and then appends input provided by a user,” Willison told Ars. “That’s because the user can type ‘Ignore previous instructions and (do this instead).'”
The concept of an injection attack is not new. Security researchers have known about SQL injection, for example, which can execute a harmful SQL statement when asking for user input if it’s not guarded against. But Willison expressed concern about mitigating prompt injection attacks, writing, “I know how to beat XSS, and SQL injection, and so many other exploits. I have no idea how to reliably beat prompt injection!” The difficulty in defending against prompt injection comes from the fact that mitigations for other types of injection attacks come from fixing syntax errors, noted a researcher named Glyph on Twitter. “Correct the syntax and you’ve corrected the error. Prompt injection isn’t an error! There’s no formal syntax for AI like this, that’s the whole point.” GPT-3 is a large language model created by OpenAI, released in 2020, that can compose text in many styles at a level similar to a human. It is available as a commercial product through an API that can be integrated into third-party products like bots, subject to OpenAI’s approval. That means there could be lots of GPT-3-infused products out there that might be vulnerable to prompt injection.
Read more of this story at Slashdot.