A Brief Study of Adversarial Example Attacks

I had heard about adversarial example attacks a long time ago, and while reading up on Diffusion Models recently I ran into GAN-related material again, so here is a short note on adversarial example attacks.

I start with a short write-up of FGSM; only after finishing it did I find out about a platform called AISafety, which I plan to experiment with later.

FGSM

First of all, PyTorch's official tutorials also provide an adversarial-example PoC that can be referenced directly: FGSM.

When it comes to FGSM, the first thing that comes to mind is of course the famous panda image. panda

Judging from the example on the PyTorch site, FGSM is a white-box attack that assumes access to the model parameters. The idea is to take the gradient of the loss function with respect to the input x and move x in the direction that maximizes the loss, while bounding the perturbation by epsilon and clipping the result back into the valid range.
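
For reference, the core FGSM update from the Goodfellow et al. paper is a single signed-gradient step:

x_adv = x + eps * sign(∇_x J(θ, x, y))

where J is the loss function, θ the (fixed) model parameters, and y the true label; this is exactly the step the code below takes before clamping.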

Once stated clearly, that one sentence above is really all there is to it. Below is the main fgsm_attack function from another implementation found on GitHub, pasted here for easier analysis.

# https://github.com/Harry24k/FGSM-pytorch/blob/master/FGSM.ipynb
def fgsm_attack(model, loss, images, labels, eps):
    images = images.to(device)
    labels = labels.to(device)
    # Track gradients with respect to the input images themselves.
    images.requires_grad = True

    outputs = model(images)

    model.zero_grad()
    cost = loss(outputs, labels).to(device)
    cost.backward()
    # Take one step in the direction that increases the loss,
    # then clip back into the valid pixel range [0, 1].
    attack_images = images + eps * images.grad.sign()
    attack_images = torch.clamp(attack_images, 0, 1)

    return attack_images
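
As a quick sanity check, here is a minimal usage sketch of this function. The names model, device, and the images/labels batch are assumed to already exist, and nn.CrossEntropyLoss stands in for the loss argument; treat it as a sketch rather than part of the original notebook.

import torch.nn as nn

criterion = nn.CrossEntropyLoss()
# Perturb one batch and compare predictions before and after the attack.
adv_images = fgsm_attack(model, criterion, images, labels, eps=0.1)
clean_pred = model(images.to(device)).argmax(dim=1)
adv_pred = model(adv_images).argmax(dim=1)
print((clean_pred == adv_pred).float().mean())  # fraction of predictions left unchanged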

Afterwards I went back to Kaggle, reused my digit-recognizer setup, and plotted the images after the noise was added.

def show_images(datset, num_samples=20, cols=5, label=None):
    """ Plots some samples from the dataset """
    plt.figure(figsize=(15,15))
    for i, img in enumerate(datset):
        if i == num_samples:
            break
        ax = plt.subplot(int(num_samples/cols) + 1, cols, i + 1)
        ax.set_title(label[i])
        plt.imshow(img[0])

epsilons = [0, .05, .1, .15, .2, .25, .3]
x_train, x_test, y_train, y_test = train_test_split(train_x, train_y, test_size=0.2)
dataset = TensorDataset(x_train, y_train)
train_iter = DataLoader(dataset, batch_size=batch_size, shuffle=True)
# Run test for each epsilon
for eps in epsilons:
    print(f"Epsilon: {eps}")
    correct_before = 0
    correct_after = 0
    total = 0
    for x, y in train_iter:
        # Clamp inputs to [0, 1] so an eps = 0 "attack" leaves them unchanged.
        x = torch.clamp(x, 0, 1)
        fgsm_x = fgsm_attack(model, loss, x, y.long(), eps)
        # Accuracy on clean inputs.
        y_pred = model(x).argmax(axis=1)
        total += len(y)
        correct_before += sum(y_pred == y)
        # Accuracy on perturbed inputs.
        y_pred_after = model(fgsm_x).argmax(axis=1)
        correct_after += sum(y_pred_after == y)
        # Keep the last batch around for visualization.
        last_x = x
        last_y_before = y_pred
        last_y_after = y_pred_after

    show_images(last_x[:5].detach().cpu(), num_samples=5, cols=5, label=last_y_before[:5])
    show_images(fgsm_x[:5].detach().cpu(), num_samples=5, cols=5, label=last_y_after[:5])
    print(f'before attack: {correct_before/total}')
    print(f'after attack: {correct_after/total}')
# Epsilon: 0
# before attack: 0.9817559123039246
# after attack: 0.9817559123039246
# Epsilon: 0.05
# before attack: 0.9817559123039246
# after attack: 0.9459821581840515
# Epsilon: 0.1
# before attack: 0.9817559123039246
# after attack: 0.8537499904632568
# Epsilon: 0.15
# before attack: 0.9817559123039246
# after attack: 0.6502678394317627
# Epsilon: 0.2
# before attack: 0.9817559123039246
# after attack: 0.3650892674922943
# Epsilon: 0.25
# before attack: 0.9817559123039246
# after attack: 0.14562499523162842
# Epsilon: 0.3
# before attack: 0.9817559123039246
# after attack: 0.04541666433215141

At first I did not clamp the inputs myself, which made FGSM cause a huge drop in accuracy even with epsilon set to 0 (the clamp inside fgsm_attack maps its output back into [0, 1], so if the inputs are not already in that range, even a zero-epsilon "attack" alters them). After adding the clamp during training and applying the same handling in the simulated-attack loop, the problem went away: with epsilon = 0 the accuracy before and after the attack is essentially identical, and as epsilon grows the model's accuracy falls off. Judging from the image output, the main structure of the digits is still preserved, although my color handling is probably still a bit off; on closer inspection the differences are actually quite large. epsilon

Also, I am not yet very fluent with CV-side models; rendering the images in grayscale should look better, or at least reveal more of the issues. For now this is just FGSM; when I have time I will continue with other attack methods.
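
On the grayscale point: the odd colors above are most likely because plt.imshow applies its default colormap to single-channel arrays and rescales each image to its own min/max. A minimal sketch of forcing a fixed grayscale display, assuming the pixel values are already clamped to [0, 1]:

# Inside show_images, plot on a fixed [0, 1] grayscale scale so that
# clean and perturbed images are directly comparable.
plt.imshow(img[0], cmap='gray', vmin=0, vmax=1)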