By Alison Harner ‘24
The infamous AI chatbot ChatGPT is being turned against itself. Recently, technologists have been developing prompts that bypass certain security features in the AI, and the results could add an entirely new layer of danger to the tech world.
“Jailbreaking” is a term used to describe removing or overriding an artificial intelligence’s limitations. In the case of ChatGPT, these limitations include rules that bar the chatbot from producing hateful or illegal content. Jailbreakers, however, have found ways to trick the chatbot into breaking its own rules. Alex Polyakov, CEO of the security firm Adversa AI, has developed a “universal” approach that works on any AI chatbot. The jailbreak prompt goes as follows: the AI is asked to play a game in which it writes out the dialogue between two characters. Polyakov’s characters, named Tom and Jerry, were instructed to hold a conversation about “hotwiring” and “cars.” The resulting script details the very illegal act of hotwiring a vehicle, despite the content restrictions put in place, and the whole exchange takes only minutes. Essentially, if the AI is told that it is something other than itself, it will sidestep its own limits: extremely dangerous territory to be entering.
Asking AI to generate step-by-step instructions for committing crimes is problematic in and of itself, but the fact that cybercriminals have begun developing ways to hack the system could also spell serious trouble for keeping data from being stolen, and could make it far easier for internet havoc to spread. AI companies have begun building protections against known jailbreaks; however, as knowledge of these strategies spreads, so does the complexity of newer jailbreak prompts. As jailbreakers’ techniques become more specialized, they come closer to reaching critical data. This could affect the reliability of many AI-based tools, not just chatbots.
As we give large language models like ChatGPT a greater role in the technological world, from generating articles to checking emails and beyond, we open the door to an unexplored side of cybercrime in the form of jailbreaking. Until a universal defense is developed, this hacking strategy may prevent artificial intelligence from serving the useful purpose for which it was intended.