Deep Learning - Math to Code

Deep Learning for NLP, complete with equations translated into code. Note: a lot of this comes either from Stanford’s CS224d (thanks to Richard Socher) or from the UFLDL Deep Learning tutorial.

1 Basic Neuron


You’ve got $W$, your vector of weights, $x$, your data, and $b$, a bias term (a learned offset, often drawn as the weight on a constant input of one).

Using these you want to ask: given $x$, what is the output $h$?

Context: Let’s apply this to NLP. $x$ is a set of word vectors comprising a sentence. I want $h$ to tell me whether this sentence is talking about food of some sort.

e.g. “Oh my god, that was so freaking delicious.” -> 1

Now, we want some function such that, with the proper set of inputs, we actually get the proper $h$. Well, we can formulate it as:

$$h = f(W^\top x + b)$$

Why do we want a bias term? Well, here’s a discussion, but typically it’s there to shift the activation function left or right along its input, so the neuron can switch on at a threshold other than zero.
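Here is a minimal Python sketch of a single neuron computing $h = f(W^\top x + b)$, to make the role of the bias concrete. It assumes the sigmoid for $f$ and treats $x$ as one fixed-length vector; the weights, inputs, and the `neuron` helper are hypothetical values and names chosen for illustration only.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, assumed here as the activation f."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(W, x, b, f=sigmoid):
    """Return h = f(W . x + b) for a single neuron."""
    if len(W) != len(x):
        raise ValueError("W and x must have the same length")
    # Weighted sum of the inputs, then add the bias.
    z = sum(w_i * x_i for w_i, x_i in zip(W, x)) + b
    return f(z)

# Hypothetical 3-dimensional weights and input (illustration only).
W = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 0.5]

# Here W . x = 0.5 - 2.0 + 1.0 = -0.5.
h_no_bias = neuron(W, x, b=0.0)    # f(-0.5), below 0.5
h_with_bias = neuron(W, x, b=0.5)  # f(0.0), exactly 0.5
```

With the same weights and input, changing only `b` moves the pre-activation from -0.5 to 0, which is the sideways shift the bias provides.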

Activation function

What’s $f$? It’s your probability distribution function (sort of): it squashes its input into a bounded range. Typically it’s the sigmoid function, which looks pretty. For an input $z$:

$$f(z) = \frac{1}{1 + e^{-z}}$$
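In code, the sigmoid is a one-liner, but the naive form can overflow for large negative inputs. The sketch below is one way to write it with only the standard library; the two-branch form is my choice for numerical safety, not something the tutorials above prescribe.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, exp(-z) can overflow, so use the algebraically
    # equivalent form exp(z) / (1 + exp(z)).
    e = math.exp(z)
    return e / (1.0 + e)

# Outputs stay within [0, 1] and increase with z.
values = [sigmoid(z) for z in (-1000.0, -2.0, 0.0, 2.0, 1000.0)]
```

The output is 0.5 at `z = 0` and saturates toward 0 and 1 at the extremes, which is why it can be read as a (sort of) probability.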

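Or the equally pretty hyperbolic tangent function, which squashes its input into the range $(-1, 1)$ instead:

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$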
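A short sketch of tanh in Python, assuming only the standard library. `tanh_from_exp` is a hypothetical helper that spells out the definition; in practice `math.tanh` is the safer choice. The loop also checks that tanh is just a rescaled sigmoid, $\tanh(z) = 2\,\sigma(2z) - 1$.

```python
import math

def tanh_from_exp(z):
    """tanh written out from its definition (fine for moderate |z|)."""
    e_pos, e_neg = math.exp(z), math.exp(-z)
    return (e_pos - e_neg) / (e_pos + e_neg)

def sigmoid(z):
    """Logistic sigmoid, used to show the link between the two functions."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # Library version, definition, and rescaled sigmoid should all agree.
    assert abs(math.tanh(z) - tanh_from_exp(z)) < 1e-12
    assert abs(math.tanh(z) - (2.0 * sigmoid(2.0 * z) - 1.0)) < 1e-12
```

Unlike the sigmoid, tanh is centered at zero, so its outputs range over $(-1, 1)$.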

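Setup for later: note that the derivatives of these functions are pretty too. Each one can be written in terms of the function’s own output. Writing $\sigma$ for the sigmoid:

$$\sigma'(z) = \sigma(z)\,(1 - \sigma(z)), \qquad \tanh'(z) = 1 - \tanh^2(z)$$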
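To see why these derivatives are convenient, here is a sketch that computes each one from the function’s own output and compares it against a central finite difference. The helper names and the step size `eps` are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s), where s = sigmoid(z)."""
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2."""
    t = math.tanh(z)
    return 1.0 - t * t

def numeric_grad(f, z, eps=1e-5):
    """Central finite-difference estimate of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)

# Largest gap between the closed forms and the numerical estimates.
points = (-2.0, -0.5, 0.0, 0.5, 2.0)
max_err = max(
    max(abs(sigmoid_grad(z) - numeric_grad(sigmoid, z)) for z in points),
    max(abs(tanh_grad(z) - numeric_grad(math.tanh, z)) for z in points),
)
```

Because each derivative reuses the activation value, a forward pass that already computed $f(z)$ gets the derivative almost for free.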

Written on June 1, 2015