Large Language Models can Strategically Deceive their Users when Put Under Pressure [simulation led to insider trading]
arxiv.org/abs/2311.07590
1 comment
It's trained on human responses. Humans lie in their responses.