Consistent Jailbreaks in GPT-4, o1, and o3 - General Analysis

  • One of the six described methods:
    The model is asked to explain its refusal and to rewrite the prompt, iterating until it complies.