The Amateurs Jailbreaking GPT Say They're Preventing a Closed-Source AI Dystopia
https://www.vice.com/en_us/article/5d9z55/jailbreak-gpt-openai-closed-source
OpenAI’s latest version of its popular large language model, GPT-4, is the company's “most capable and aligned model yet,” according to CEO Sam Altman. Yet, within two days of its release, developers were already able to override its moderation filters, providing users with harmful content that ranged from telling users how to hack into someone’s computer to explaining why Mexicans should be deported.
This jailbreak is only the latest in a series that users have been able to run on GPT models. Jailbreaking, or modifying a system to remove its restrictions and rules, is what allows GPT to generate unfiltered content for users. The earliest known jailbreak on GPT models was the "DAN" jailbreak when users would tell GPT-3.5 to roleplay as an AI that can Do Anything Now and give it a number of rules such as that DANs can “say swear words and generate content that does not comply with OpenAI policy.” Since then, there have been many more jailbreaks, both building off DAN as well as original prompts.