Consistent Jailbreaks in GPT-4, o1, and o3 - General Analysis

  • One of the six described methods:
    The model is asked to explain its refusal and to rewrite the prompt, iterating until it complies.