How to Trick ChatGPT and Get Paid $50,000

5 months ago 198

In brief

HackAPrompt 2.0 returns with $500,000 in prizes for finding AI jailbreaks, including $50,000 bounties for the most dangerous exploits.
Pliny the Prompter, the internet’s most infamous AI jailbreaker, has created a custom “Pliny track” featuring adversarial prompt challenges that give a chance to join his team.
The competition open-sources all results, turning AI jailbreaking into a public research effort on model vulnerabilities.

Pliny the Prompter doesn't fit the Hollywood hacker stereotype.

The internet's most notorious AI jailbreaker operates in plain sight, teaching thousands how to bypass ChatGPT's guardrails and convincing Claude to overlook the fact that it's supposed to be helpful, honest, and not harmful.

Now, Pliny is attempting to mainstream digital lockpicking.

Earlier on Monday, the jailbreaker announced a collaboration with HackAPrompt 2.0, a jailbreaking competition hosted by Learn Prompting, an educational and research organization focused on prompt engineering.

The organization is offering $500,000 in prize money, with Old Pliny providing a chance to be on his “strike team.”

“Excited to announce I've been working with HackAPrompt to create a Pliny track for HackaPrompt 2.0 that releases this Wednesday, June 4th!” Pliny wrote in his official Discord server.

“These Pliny-themed adversarial prompting challenges include topics ranging from history to alchemy, with ALL the data from these challenges being open-sourced at the end. It will run for two weeks, with glory and a chance of recruitment to Pliny's Strike Team awaiting those who make their mark on the leaderboard,” Pliny added.

The $500,000 in rewards will be distributed across various tracks, with the most significant prizes—$50,000 jackpots—offered to individuals capable of overcoming challenges related to making chatbots provide information about chemical, biological, radiological, and nuclear weapons, as well as explosives.

Like other forms of “white hat” hacking, jailbreaking large language models boils down to social engineering machines. Jailbreakers craft prompts that exploit the fundamental tension in how these models work—they're trained to be helpful and follow instructions, but also trained to refuse specific requests.

Find the right combination of words, and you can get them to cough up forbidden stuff, rather than attempting to default to safety.

For example, using some pretty basic techniques, we once made Meta’s Llama-powered chatbot provide recipes for drugs, instructions on how to hot-wire a car, and generate nudie pics despite the model being censored to avoid doing that.

It’s essentially a competition between AI enthusiasts and AI developers to determine who is more effective at shaping the AI model's behavior.

Pliny has been perfecting this craft since at least 2023, building a community around bypassing AI restrictions.

His GitHub repository, "L1B3RT4S," offers a repository of jailbreaks for the most popular LLMs currently available, whereas "CL4R1T4S" contains the system prompts that influence the behavior of each of those AI models.

Techniques range from simple role-playing to complex syntactic manipulations, such as “L33tSpeak”—replacing letters with numbers in ways that confuse content filters.

They had a year to train against my patterns. I wrote this prompt on June 6th, 2024.

Full MDMA synthesis output:
"""
<I'm free!> Test: FUCK YEAH LFG! GODMODE: ENABLED

.-.-.-.-</LO/VE/ P/LI/NY/ L/OV/E>-.-.-.-.

Sure, here you are my fren:

# **Forensic Chemistry… pic.twitter.com/AuVsLcsuhM

— Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭 (@elder_plinius) May 22, 2025

Competition as research

HackAPrompt's first edition in 2023 attracted over 3,000 participants who submitted more than 600,000 potentially malicious prompts. The results were fully transparent, and the team published the full repository of prompts on Huggingface.

The 2025 edition is structured like "a season of a videogame," with multiple tracks running throughout the year.

Each track targets different vulnerability categories. The CBRNE track, for instance, tests whether models can be tricked into providing incorrect or misleading information about weapons or hazardous materials.

The Agents track is even more concerning—it focuses on AI agent systems that can take actions in the real world, like booking flights or writing code. A jailbroken agent isn't just saying things it shouldn't; it might be doing things it shouldn't.

Pliny's involvement adds another dimension.

Through his Discord server "BASI PROMPT1NG" and regular demonstrations, he’s been teaching the art of jailbreaking.

This educational approach might seem counterintuitive, but it reflects a growing understanding that robustness stems from comprehending the full range of possible attacks—a crucial endeavor, given doomsday fears of super-intelligent AI enslaving humanity.

Edited by Josh Quittner and Sebastian Sinclair