Source: CS 194/294-196 (LLM Agents) - Lecture 12, Dawn Song

AI models are exceeding human-level performance in many tasks

Attackers actively seek to exploit new technologies

AI security protects systems from external malicious actors

AI safety prevents systems from harming the external environment

Neural networks can memorize sensitive training data

Attackers can extract private data by querying language models

Larger AI models have worse privacy leakage problems

Simple prompts can reveal system instructions in language models

Multi-modal AI models can leak training images

Differential privacy protects user data during model training

Small input changes can cause AI models to give wrong outputs

Physical objects can be modified to fool AI classifiers

Adversarial attacks work without knowledge of model details

Safety-aligned language models can be compromised by malicious inputs

Multi-modal models are especially vulnerable to adversarial attacks

Model trustworthiness decreases in adversarial environments

Privacy protection often reduces model performance

Model size correlates with increased capabilities

Visual AI systems can be fooled by carefully crafted perturbations

Training data deduplication helps prevent privacy leakage