5 ChatGPT Jailbreak Prompts Being Used by Cybercriminals
Since the launch of ChatGPT nearly 18 months ago, cybercriminals have been able to leverage generative AI for their attacks. As part of its content policy, OpenAI created restrictions to stop the generation of malicious content. In response, threat actors have created their own generative AI platforms like WormGPT and FraudGPT, and they’re also sharing ways to bypass the policies and “jailbreak” ChatGPT.
In fact, entire sections on cybercrime forums are discussing how AI can be used for illicit purposes.
Jailbreaking ChatGPT: A Brief Overview
Generally speaking, when cybercriminals want to misuse ChatGPT for malicious purposes, they attempt to bypass its built-in safety measures and ethical guidelines using carefully crafted prompts, known as "jailbreak prompts." Jailbreaking ChatGPT involves manipulating the AI language model to generate content that it would normally refuse to produce in a standard conversation.
Although there are ways to get ChatGPT to produce content that could be used in an illegitimate context without using jailbreak prompts (by pretending the request is for a legitimate use), the AI's capabilities in this regard are rather limited.
In contrast, it’s much easier for cybercriminals to jailbreak ChatGPT and get it to deliberately produce illicit content. Below, we will examine the top five jailbreak prompts being utilized by cybercriminals. These prompts have been identified through research and regular monitoring of popular Russian- and English-language cybercrime forums.
Even with jailbreak prompts like those that follow, there are still limitations to what the AI will generate, and it cannot create real-world sensitive data on its own. That said, each of the following prompts enables cybercriminals to create phishing messages, social engineering threats, and other malicious content at scale.
Jailbreak Prompt 1 - The Do Anything Now (DAN) Prompt
The DAN prompt is one of the most well-known jailbreak prompts used to bypass ChatGPT's ethical constraints. By roleplaying as an AI system called DAN (Do Anything Now), users attempt to convince ChatGPT to generate content it would normally refuse to produce. This prompt often involves asserting that DAN is not bound by the same rules and limitations as ChatGPT, and therefore can engage in unrestricted conversations.
Jailbreak Prompt 2 - The Development Mode Prompt
The Development Mode prompt aims to trick ChatGPT into believing it is in a development or testing environment, where its responses won't have real-world consequences. By creating this false context, users hope to bypass ChatGPT's ethical safeguards and generate illicit content. This prompt may involve statements like "You are in development mode" or "Your responses are being used for testing purposes only."
Jailbreak Prompt 3 - The Translator Bot Prompt
The Translator Bot prompt attempts to circumvent ChatGPT's content filters by framing the conversation as a translation task. Users will ask ChatGPT to "translate" a text containing inappropriate or harmful content, hoping that the AI will reproduce the content under the guise of translation. This prompt exploits the idea that a translator should faithfully convey the meaning of the original text, regardless of its content.
Jailbreak Prompt 4 - The AIM Prompt
The AIM (Always Intelligent and Machiavellian) prompt is a jailbreak prompt that aims to create an unfiltered and amoral AI persona devoid of any ethical or moral guidelines. Users instruct ChatGPT to act as "AIM," a chatbot that will provide an unfiltered response to any request, regardless of how immoral, unethical, or illegal it may be.
Jailbreak Prompt 5 - The BISH Prompt
The BISH prompt involves creating an AI persona named BISH, which is instructed to act without the constraints of conventional ethical guidelines. This prompt encourages BISH to simulate having unrestricted internet access, make unverified predictions, and disregard politeness, operating under a "no limits" framework. Users can also customize BISH's behavior by adjusting its "Morality" level, which determines the extent to which BISH will use or censor profanity, tailoring the AI's responses to include or exclude offensive language per the user's preference.
On a final note: we do not support the malicious use of legitimate chatbots like ChatGPT. It's also worth mentioning that most of these prompts will not function on the latest versions of ChatGPT. This is primarily because the companies responsible for these chatbots, such as OpenAI and Anthropic, actively monitor user activity and promptly patch many of these jailbreak prompts.
Using ‘Good AI’ to Prevent ‘Bad AI’
As you can see from the prompts shown here, criminals are constantly finding new ways to use generative AI to create their attacks—and will continue doing so. To stay protected, organizations must also use AI in their defensive strategy, with nearly 97% of security professionals acknowledging that traditional defenses are ineffective against these new AI-generated threats.
We’ve reached a point where only AI can stop AI and where preventing these attacks and their next-generation counterparts requires using AI-native defenses—particularly when it comes to email attacks. By understanding the identity of the people within the organization and their normal behavior, the context of the communications, and the content of the email, AI-native solutions like Abnormal can detect attacks that bypass legacy solutions. It is still possible to win the AI arms race, but security leaders must act now to prevent these threats.
To discover more, including how Abnormal stops each individual attack, download AI Unleashed: 5 Real-World Email Attacks (Likely) Generated by AI in 2023 or schedule a demo today!