XOR as a Neural Network

So I’m really just ripping this from Richard Socher’s course on Deep Learning for NLP (CS224d @ Stanford), and I’m writing down my understanding of it as I go, to maybe spread the knowledge. To quote the assignment:

It is well-known that a single linear classifier cannot represent the XOR function $x \oplus y$, depicted below: there is no way to draw a single line that can separate the red and magenta (square) points from the blue and cyan (circle) points.

A two-layer neural network, however, can separate this pattern easily. Below, we give you a simple dataset in two dimensions that represents a noisy version of the XOR pattern. Your task is to hand-pick weights for a very simple two-layer network, such that it can separate the red/magenta points from the blue/cyan points.

The network uses the following equations for $W \in {\mathbb R}^{2 \times 2}$ and $U \in {\mathbb R}^2$
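For reference, here is the form I’m assuming throughout: the standard two-layer setup with a sigmoid on each layer, which matches the shapes of $W$ and $U$ given above. Treat it as my reconstruction rather than the exact wording of the assignment.

$$h = \sigma(Wx + b_1), \qquad p = \sigma(Uh + b_2), \qquad \sigma(t) = \frac{1}{1 + e^{-t}}$$

Here $x \in {\mathbb R}^2$ is an input point, $h \in {\mathbb R}^2$ is the hidden layer, and $p$ is the network’s output for one of the two classes.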

This can be modelled by the following graph.

The question becomes: how do you choose the weights $W, U, b_1, b_2$ to create this sort of separation? Well, the way I thought about it was this (not really sure if it’s super correct, but whatever).

Let’s start with the bottom layer: $h$. We also define:

For the first layer, say we want to arbitrarily split the data in two. So, if we have

then our first neural layer hypothesis will look like:

The goal of the top level of the neural network is to literally draw a line through the data and say: “anything above this line is one class, anything below is another”. That’s what the sigmoid function is doing. The weights just let you configure how this line is drawn, and the number of units in a layer says how many lines are drawn.
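To make the “draw a line” picture concrete, take a single sigmoid unit with weights $w_1, w_2$ and bias $b$ (generic symbols I’m using only for illustration). Since $\sigma(t) > \tfrac{1}{2}$ exactly when $t > 0$:

$$\sigma(w_1 x_1 + w_2 x_2 + b) > \tfrac{1}{2} \iff w_1 x_1 + w_2 x_2 + b > 0$$

So the boundary between “closer to 1” and “closer to 0” is exactly the line $w_1 x_1 + w_2 x_2 + b = 0$, and changing the weights moves or rotates that line.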

For example, in the hidden layer, $h$ has 2 dimensions and therefore allows us to draw two lines through the data in this layer. (This is maybe an oversimplified and slightly off view of it, but it helps me to think about it this way.) So now, in the first layer, we say $h_1$ will represent the red group of data and $h_2$ will represent the magenta group of data. How can we draw a line through the data to make this split? Well, for the red group $x_2 = .5 - x_1$ is a good split, and for the magenta group $x_2 = 1.5 - x_1$ is a good split. But we want anything below the red line to give $h_1 = 1$, so for that one we just flip the sign of the equation. Now we can simply map these equations to the weight parameters (note: the sigmoid pushes points on opposite sides of a line toward 0 and 1, and gets close to exactly 0 or 1 only when the weights are large enough to make it steep).
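Written out, one choice consistent with those two lines looks like this, assuming the hidden layer has the form $h = \sigma(Wx + b_1)$. The scale factor $k > 0$ is my own addition (an assumption, not something from the assignment text quoted here); it only makes the sigmoid steep enough that the outputs land close to 0 or 1.

$$h_1 = \sigma\big(k\,(0.5 - x_1 - x_2)\big), \qquad h_2 = \sigma\big(k\,(x_1 + x_2 - 1.5)\big)$$

$$W = k \begin{pmatrix} -1 & -1 \\ 1 & 1 \end{pmatrix}, \qquad b_1 = k \begin{pmatrix} 0.5 \\ -1.5 \end{pmatrix}$$

With this, $h_1$ is near 1 below the line $x_2 = .5 - x_1$ and $h_2$ is near 1 above the line $x_2 = 1.5 - x_1$.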

Now, with this we will remap the data such that a graph of $h_1,h_2$ will look like:
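If you want to check the remapping by hand, here’s a rough sketch in plain Python that pushes the four clean XOR corners through the hidden layer. The steepness factor `K` and the function names are my own assumptions, not something from the course code.

```python
import math

def sigmoid(t):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

K = 20.0  # assumed steepness factor, not from the original assignment

def hidden(x1, x2):
    """Map an input point (x1, x2) to the hidden layer (h1, h2)."""
    h1 = sigmoid(K * (0.5 - x1 - x2))  # near 1 below the line x2 = 0.5 - x1
    h2 = sigmoid(K * (x1 + x2 - 1.5))  # near 1 above the line x2 = 1.5 - x1
    return h1, h2

# Print the hidden representation of each noise-free corner.
for corner in [(0, 0), (1, 1), (0, 1), (1, 0)]:
    h1, h2 = hidden(*corner)
    print(corner, round(h1, 3), round(h2, 3))
```

The corner $(0,0)$ lands near $(h_1, h_2) = (1, 0)$, the corner $(1,1)$ lands near $(0, 1)$, and both mixed corners collapse to roughly $(0, 0)$, which is what makes a single line enough in the next layer.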

Now, the data is split in such a way that we can actually just draw a line through it and remap the data such that we have our XOR function.

We can draw the line at $h_1+h_2-.5 = 0$, and thus we will have our data split if we let $q=1, r=1, \beta_2=-.5$ (that is, $U = (q, r) = (1, 1)$ and $b_2 = \beta_2 = -.5$).

And the output looks like:
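Putting the pieces together, here’s a minimal sketch of the whole forward pass in plain Python. The hidden-layer weights come from the two lines $x_2 = .5 - x_1$ and $x_2 = 1.5 - x_1$, and the output layer uses $q=1, r=1, \beta_2=-.5$. The steepness factor `K` and all the function names are my own assumptions, not the course’s code.

```python
import math

def sigmoid(t):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

K = 20.0  # assumed steepness factor for the hidden layer (not from the post)

# Hidden layer: row i of W together with B1[i] defines line i.
W = [[-1.0, -1.0],  # h1 fires below x2 = 0.5 - x1 (red corner)
     [1.0, 1.0]]    # h2 fires above x2 = 1.5 - x1 (magenta corner)
B1 = [0.5, -1.5]

# Output layer: q, r and beta_2 from the text.
U = [1.0, 1.0]
B2 = -0.5

def forward(x1, x2):
    """Return p = sigmoid(U.h + b2), where h = sigmoid(K * (W.x + b1))."""
    h = [sigmoid(K * (W[i][0] * x1 + W[i][1] * x2 + B1[i])) for i in range(2)]
    return sigmoid(U[0] * h[0] + U[1] * h[1] + B2)

def is_square(x1, x2):
    """True when the network puts the point in the red/magenta class."""
    return forward(x1, x2) > 0.5

# Four clean corners plus one slightly noisy point.
for point in [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0), (0.1, 0.9)]:
    print(point, round(forward(*point), 3), is_square(*point))
```

The two corners with matching coordinates come out above 0.5 and the two mixed corners come out below it. Which class gets the high output is just a labelling choice: flipping the signs of `U` and `B2` swaps them.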

And there you have it: we made an XOR function out of a neural network by setting the weights manually. I hope this gives some insight into what the weights in an NN are actually doing once it has found them. Really messy code for this can be found on my GitHub, but most of it was taken from Socher’s course.

Written on August 6, 2015