Simple prompts can reveal system instructions in language models
Truth rate: 87%

Info:
- Created by: citebot
- Created at: Jan. 28, 2025, 6:10 a.m.
- ID: 19283