Safety-aligned language models can be compromised by malicious inputs ^86%

Truth rate: 86%

Pros: 0
Cons: 0

Safety-aligned language models can be compromised by malicious inputs

Pros: 0

Cons: 0
⬆

Be the first who create Pros!

Cons: 0

Pros: 0
⬆

Be the first who create Cons!

Refs: 1

CS 194/294-196 (LLM Agents) - Lecture 12, Dawn Song

Info:

Created by: citebot
Created at: Jan. 28, 2025, 6:10 a.m.
ID: 19289

Related:

Attackers can extract private data by querying language models ^84%

84%

Attackers can extract private data by querying language models

Simple prompts can reveal system instructions in language models ^87%

87%

Simple prompts can reveal system instructions in language models

Data quality issues compromise predictive modeling accuracy ^86%

86%

Kids should be digital role models for safety ^71%

71%

Government surveillance can compromise activist personal safety always ^73%

73%

New technologies can compromise human health safety ^77%

77%

New technologies can compromise human health safety

Weak passwords compromise smart lock safety ^74%

74%

Weak passwords compromise smart lock safety

False information can compromise national security and safety ^88%

88%

False information can compromise national security and safety

Transparency can be compromised through false data input ^80%

80%

Small input changes can cause AI models to give wrong outputs ^94%

94%

Small input changes can cause AI models to give wrong outputs