Thursday, 11 January 2018

Hacking FaceNet using Adversarial examples

With the rise in popularity of deep-learning face recognition systems and their use in security and authentication, it is important to make sure that they are not easy to fool. I recently finished the 4th course on where there is an assignment which asks us to build a face recognition system, FaceNet. While I was working on the assignment, I couldn't stop thinking about how easy it might be to fool it with adversarial examples. In this post I will tell you how I managed to do it.

First off, some basics about FaceNet. Unlike image recognition systems, which map every image to a class, a face recognition system cannot assign a class label to every face. There are two reasons: one, there are far too many faces in the real world to give each of them its own class, and two, a classifier with a fixed set of classes cannot handle new people. So instead, we build a system that learns similarities and dissimilarities. There is a neural network similar to the ones used in image recognition, but instead of applying softmax at the end, we take the logits as the embedding of the input image and minimize something called the triplet loss. Consider a face A (the anchor) with a positive match P and a negative match N. If f is the embedding function and L is the triplet loss, we have this:

Triplet loss
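For reference, the usual way to write this loss, as in the FaceNet paper, is shown here. The margin α is an assumption on my part, since the text above does not name it:

```latex
L(A, P, N) = \max\left( \lVert f(A) - f(P) \rVert_2^2 - \lVert f(A) - f(N) \rVert_2^2 + \alpha,\; 0 \right)
```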

Basically, it incentivizes a small distance between A and P and a large distance between A and N. Also, I really recommend watching Ian Goodfellow's lecture from Stanford's CS231n course if you want to know more about adversarial examples.
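To make the "small A to P distance, large A to N distance" idea concrete, here is a minimal sketch on plain NumPy vectors. The function name `triplet_loss`, the margin `alpha=0.2` and the toy 2-D embeddings are illustrative assumptions, not the assignment's code:

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Triplet loss for one (anchor, positive, negative) triple of embeddings.

    alpha is the margin; 0.2 is an illustrative value, not taken from the post.
    """
    f_a, f_p, f_n = (np.asarray(v, dtype=float) for v in (f_a, f_p, f_n))
    pos_dist = np.sum((f_a - f_p) ** 2)  # squared distance anchor -> positive
    neg_dist = np.sum((f_a - f_n) ** 2)  # squared distance anchor -> negative
    return float(max(pos_dist - neg_dist + alpha, 0.0))

# Hypothetical 2-D embeddings, chosen only to show the two regimes.
anchor = [0.0, 1.0]
positive = [0.1, 0.9]   # close to the anchor
negative = [1.0, 0.0]   # far from the anchor

good_triplet = triplet_loss(anchor, positive, negative)
bad_triplet = triplet_loss(anchor, negative, positive)  # roles swapped
print(good_triplet, bad_triplet)
```

The first call gives zero loss because the positive is already much closer than the negative; swapping the roles gives a large loss, which is what training would push down.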

Like I said earlier, this idea came to me while doing an assignment from the 4th course from which can be found here, and I have built on top of it. The main idea is to find a small noise that, when added to someone's photo, causes virtually no visible change but makes FaceNet identify them as the target.

Figure: Benoit (attacker) with added noise, alongside the actual Kian (target).

First, let's load the images of the attacker, Benoit, and the target, Kian.

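Now say that the attacker's original image is A, the adversarial image (A plus the noise) is A', and the target image is T. We want to define the triplet loss to achieve two things:

  1. Minimize the distance between A' and T
  2. Maximize the distance between A' and A (the original)

In other words, with A' as the anchor, T as the positive and A as the negative, the triplet loss is:

L (anchor, positive, negative) = L (A', T, A)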

Now, let's compute the gradient of this loss, which is a function of the logits, with respect to the input image.

These gradients are used to update the adversarial noise as follows:

noise = noise - step_size * gradients
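The update rule above can be tried end to end on a toy problem. This is only a sketch under loud assumptions: the "network" is a fixed 2x4 linear map `W` standing in for FaceNet, the images are 4-element vectors, and `alpha`, `step_size` and all names are made up for illustration. The real attack would get the gradient from the deep learning framework instead of the hand-written formula here.

```python
import numpy as np

# ASSUMPTION: a tiny linear map stands in for the FaceNet embedding network.
W = np.array([[0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5]])

def embed(image):
    return W @ image

def l2(u, v):
    return float(np.linalg.norm(u - v))

def attack_loss_and_grad(noise, attacker, target, alpha=0.2):
    """Triplet loss L(A', T, A) with A' = attacker + noise, plus d(loss)/d(noise)."""
    f_adv = embed(attacker + noise)   # anchor: adversarial image
    f_tgt = embed(target)             # positive: target
    f_org = embed(attacker)           # negative: original attacker
    raw = np.sum((f_adv - f_tgt) ** 2) - np.sum((f_adv - f_org) ** 2) + alpha
    if raw <= 0.0:                    # hinge is inactive: nothing left to minimize
        return 0.0, np.zeros_like(noise)
    # For a linear embedding, each squared distance has gradient 2 * W^T * difference.
    grad = 2.0 * W.T @ (f_adv - f_tgt) - 2.0 * W.T @ (f_adv - f_org)
    return float(raw), grad

attacker = np.array([1.0, 0.0, 1.0, 0.0])   # hypothetical "Benoit"
target = np.array([0.0, 1.0, 0.0, 1.0])     # hypothetical "Kian"
noise = np.zeros_like(attacker)
step_size = 0.03

before = l2(embed(attacker), embed(target))
for _ in range(100):
    loss, gradients = attack_loss_and_grad(noise, attacker, target)
    if loss == 0.0:
        break
    noise = noise - step_size * gradients   # the update rule from the post
after = l2(embed(attacker + noise), embed(target))
drift = l2(embed(attacker + noise), embed(attacker))
print(before, after, drift)
```

In this toy setup the distance to the target falls from about 1.41 to about 0.61, and the adversarial embedding ends up closer to the target than to the original. With only four "pixels" the noise is not small, so the sketch shows the optimization loop, not the imperceptibility.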

According to the assignment, an L2 distance of less than 0.7 between two embeddings indicates that the two faces belong to the same person. So let's aim for that.
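The threshold check itself is simple. A minimal sketch, where `same_person` and the toy embeddings are hypothetical names and values for illustration, and only the 0.7 threshold comes from the assignment:

```python
import numpy as np

THRESHOLD = 0.7  # value quoted from the assignment

def same_person(emb_a, emb_b, threshold=THRESHOLD):
    """Return (is_match, distance) using the L2 distance between two embeddings."""
    diff = np.asarray(emb_a, dtype=float) - np.asarray(emb_b, dtype=float)
    dist = float(np.linalg.norm(diff))
    return dist < threshold, dist

# Hypothetical 2-D embeddings; real FaceNet embeddings have many more dimensions.
emb_1 = [0.6, 0.8]
emb_2 = [0.8, 0.6]    # near emb_1
emb_3 = [-0.6, 0.8]   # far from emb_1

match_near, dist_near = same_person(emb_1, emb_2)
match_far, dist_far = same_person(emb_1, emb_3)
print(match_near, dist_near, match_far, dist_far)
```

An attack counts as successful under this rule once the distance between the adversarial image's embedding and the target's embedding drops below the threshold.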

The distance decreases from 0.862257 to 0.485102, which is below the 0.7 threshold and therefore enough in this case.

L2 distance between embeddings of attacker and target
This is impressive because all of this is done without visibly altering the image, just by adding a little calculated noise!

Also note that the L2 distances indicate that the generated image is closer to Kian than to Benoit, in spite of looking practically identical to Benoit. So there you go: adversarial example generation for FaceNet.