Tuesday, 23 January 2018

Higher level ops for building neural network layers with deeplearn.js

I have been meddling with google's deeplearn.js lately for fun. It is surprisingly good given how new the project is and it seems to have a sold roadmap. However it still lacks something like tf.layers and tf.contrib.layers which have many higher level functions that has made using tensorflow so easy. It looks like they will be added to Graphlayers in future but their priorities as of now is to fix the lower level APIs first - which totally makes sense.

So, I quickly built one for tf.layers.conv2d and tf.layers.flatten which I will share in this post. I have made them as close to function definitions in tensorflow as possible.

1.  conv2d - Functional interface for the 2D convolution layer.

  • inputs Tensor input.
  • filters Integer, the dimensionality of the output space (i.e. the number of filters in the convolution).
  • kernel_size Number to specify the height and width of the 2D convolution window.
  • graph Graph opbject.
  • strides Number to specify the strides of convolution.
  • padding One of "valid" or "same" (case-insensitive).
  • data_format "channels_last" or "channel_first"
  • activation Optional. Activation function which is applied on the final layer of the function. Function should accept Tensor and graph as parameters
  • kernel_initializer An initializer object for the convolution kernel.
  • bias_initializer  An initializer object for bias.
  • name string which represents name of the layer.

Tensor output.


Add this to your code:

2. flatten - Flattens an input tensor.

I wrote these snippets while building a tool using deeplearnjs where I do things like loading datasets, batching, saving checkpoints along with visualization. I will share more on that in my future posts.

Thursday, 11 January 2018

Hacking FaceNet using Adversarial examples

With the rise in popularity of face recognition systems with deep learning and it's application in security/ authentication, it is important to make sure that it is not that easy to fool them. I recently finished the 4th course on deeplearning.ai where there is an assignment which asks us to build a face recognition system - FaceNet. While I was working on the assignment, I couldn't stop thinking about how easy it is to fool it with adversarial examples. In this post I will tell you how I managed to do it.

First off, some basics about FaceNet. Unlike image recognition systems which map every image with a class, it is not possible to assign a class label to every face in face recognition. This is because one, there are way too many faces that a system should handle in the real world to assign class to each of them and two, if there are new people the system should handle, it can't do it. So, what we do is, we build a system that learns similarities and dissimilarities. Basically, there is a neural network similar to what we have in image recognition and instead of applying softmax in the end, we just take the logits as embedding for the given image input and then minimize something called the triplet loss.  Consider face A, we have a positive match P and negative match N. If f is the embedding function and L is the triplet loss, we have this:

Triplet loss

Basically, it is incentivizing small distance between A - P and large distance between A - N. Also, I really recommend watching Ian Goodfellow's lecture from Stanford's CS231n course if you want to know about adversarial examples.

Like I said earlier, this thought came to me while doing an assignment from 4th course from deeplearning.ai which can be found here and I have built on top of it.  The main idea here is to find small noise that when added to someone's photo although causing virtually no visual changes, can make faceNet identify them as the target.

Benoit (attacker)
Add noise
Kian Actual (Target)

First lets load the images of the attacker Benoit and the target Kian.

Now say that the attacker image is A` and the target image is T. We want to define triplet loss to achieve two things:

  1. Minimize distance between A` and T
  2. Maximize distance between A` and A` (original)
In other words the triplet loss L is:

L (A, P, N) = L (A`, T, A`)

Now, let's compute the gradient of the logits with respect to the input image 

These gradients are used to obtain the adversarial noise as follows :

noise = noise - step_size * gradients

According to the assignment, a l2 distance of the embeddings of less than 0.7 indicates that two faces have the same person. So lets do that.

The distance decreases from 0.862257 to 0.485102 which is considered enough in this case.

L2 distance between embeddings of attacker and target
This is impressive because, all this is done while not altering the image visibly just by adding a little calculated noise!

Also note that the l2 scores indicate that the generated image is more of Kian than Benoit in spite of looking practically identical to Benoit. So there you go, adversarial example generation for FaceNet.