CiteBar
  • Log in
  • Join

Source: CS 194/294-196 (LLM Agents) - Lecture 12, Dawn Song

Main ideas:

20
AI models are exceeding human-level performance in many tasks 96%
96%
u1727780031663's avatar u1727780328672's avatar u1727780324374's avatar u1727780309637's avatar u1727780115101's avatar

AI models are exceeding human-level performance in many tasks

19
Attackers actively seek to exploit new technologies 98%
98%
u1727779906068's avatar u1727780148882's avatar u1727780034519's avatar

Attackers actively seek to exploit new technologies

18
AI security protects systems from external malicious actors 89%
89%
u1727780173943's avatar u1727694227436's avatar u1727779953932's avatar u1727779950139's avatar u1727779906068's avatar u1727780333583's avatar u1727780314242's avatar u1727780107584's avatar u1727780291729's avatar u1727780282322's avatar

AI security protects systems from external malicious actors

17
AI safety prevents systems from harming the external environment 89%
89%
u1727780338396's avatar u1727779950139's avatar u1727780247419's avatar u1727779906068's avatar

AI safety prevents systems from harming the external environment

16
Neural networks can memorize sensitive training data 92%
92%
u1727780027818's avatar u1727780140599's avatar u1727780224700's avatar u1727780190317's avatar u1727780173943's avatar

Neural networks can memorize sensitive training data

15
Attackers can extract private data by querying language models 84%
84%
u1727779945740's avatar u1727694216278's avatar u1727780140599's avatar u1727780050568's avatar u1727694254554's avatar u1727780237803's avatar u1727779927933's avatar u1727780040402's avatar u1727780037478's avatar u1727780216108's avatar

Attackers can extract private data by querying language models

14
Larger AI models have worse privacy leakage problems 80%
80%
u1727780247419's avatar u1727694244628's avatar u1727780091258's avatar u1727780067004's avatar

Larger AI models have worse privacy leakage problems

13
Simple prompts can reveal system instructions in language models 87%
87%
u1727779919440's avatar u1727780252228's avatar u1727780031663's avatar u1727780140599's avatar

Simple prompts can reveal system instructions in language models

12
Multi-modal AI models can leak training images 88%
88%
u1727779915148's avatar u1727780127893's avatar u1727780199100's avatar u1727779933357's avatar u1727780342707's avatar u1727780318336's avatar u1727780140599's avatar

Multi-modal AI models can leak training images

11
Differential privacy protects user data during model training 89%
89%
u1727780136284's avatar u1727780034519's avatar u1727780243224's avatar u1727780194928's avatar

Differential privacy protects user data during model training

10
Small input changes can cause AI models to give wrong outputs 94%
94%
u1727780010303's avatar u1727694254554's avatar u1727780333583's avatar u1727780031663's avatar u1727779923737's avatar u1727780182912's avatar u1727779919440's avatar u1727780282322's avatar u1727780078568's avatar u1727780247419's avatar

Small input changes can cause AI models to give wrong outputs

9
Physical objects can be modified to fool AI classifiers 90%
90%
u1727694239205's avatar u1727779953932's avatar u1727780107584's avatar u1727780202801's avatar u1727779933357's avatar u1727780007138's avatar u1727780177934's avatar u1727780299408's avatar

Physical objects can be modified to fool AI classifiers

8
Adversarial attacks work without knowledge of model details 88%
88%
u1727779927933's avatar u1727780273821's avatar u1727780247419's avatar u1727780007138's avatar

Adversarial attacks work without knowledge of model details

7
Safety-aligned language models can be compromised by malicious inputs 86%
86%
u1727780071003's avatar u1727780207718's avatar u1727780010303's avatar u1727780186270's avatar

Safety-aligned language models can be compromised by malicious inputs

6
Multi-modal models are especially vulnerable to adversarial attacks 86%
86%
u1727779950139's avatar u1727780309637's avatar u1727780046881's avatar u1727780140599's avatar u1727780040402's avatar u1727779910644's avatar u1727780124311's avatar u1727780228999's avatar

Multi-modal models are especially vulnerable to adversarial attacks

5
Model trustworthiness decreases in adversarial environments 90%
90%
u1727779958121's avatar u1727780107584's avatar u1727780091258's avatar u1727780347403's avatar u1727780074475's avatar u1727780156116's avatar u1727780132075's avatar

Model trustworthiness decreases in adversarial environments

4
Privacy protection often reduces model performance 86%
86%
u1727780169338's avatar u1727780144470's avatar u1727780299408's avatar

Privacy protection often reduces model performance

3
Model size correlates with increased capabilities 93%
93%
u1727780252228's avatar u1727694221300's avatar u1727780119326's avatar u1727780194928's avatar u1727780071003's avatar u1727779984532's avatar u1727779979407's avatar u1727780053905's avatar

Model size correlates with increased capabilities

2
Visual AI systems can be fooled by carefully crafted perturbations 96%
96%
u1727780127893's avatar u1727779979407's avatar u1727779927933's avatar u1727780216108's avatar u1727780338396's avatar u1727780182912's avatar u1727780295618's avatar u1727780169338's avatar

Visual AI systems can be fooled by carefully crafted perturbations

1
Training data deduplication helps prevent privacy leakage 80%
80%
u1727779988412's avatar u1727780173943's avatar u1727779945740's avatar u1727780043386's avatar

Training data deduplication helps prevent privacy leakage

View Original Source
© CiteBar 2021 - 2025
Home About Contacts Privacy Terms Disclaimer
Please Sign In
Sign in with Google