# Deep Learning - Math to Code

Deep Learning for NLP, complete with equations transformed to code. Note: a lot of this comes either from Stanford's CS224d (thanks to Richard Socher) or from the UFLDL Deep Learning tutorial.

# 1 Basic Neuron

## Setup

You’ve got $W$, your vector of weights, $x$, your input data, and $b$, a bias term (the input feeding the bias is usually fixed to one).

Using these you want to ask: with $x$ what is the output $h$?

**Context:** Let’s apply to NLP. $x$ is a set of word vectors comprising a sentence. I want to get $h$ which tells me if this sentence is talking about food of some sort.

*i.e. Oh my god, that was so freaking delicious. -> 1*

Now, we want some probability distribution such that with the proper set of inputs we actually get the proper $h$. Well, we can formulate the neuron as:

$$h_{W,b}(x) = f(W^T x + b)$$

*Why do we want a bias term???* Well, here’s a discussion, but typically it’s there to shift the activation function left or right, independently of the inputs — e.g. controlling where a sigmoid crosses 0.5.
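The setup above can be sketched in a few lines of NumPy (the names and toy values here are illustrative, not from the original):

```python
import numpy as np

def sigmoid(z):
    # squash any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(W, x, b):
    # h = f(W^T x + b): weighted sum of the inputs, shifted
    # by the bias, then passed through the activation function
    return sigmoid(np.dot(W, x) + b)

W = np.array([0.5, -0.3, 0.8])   # weights (toy values)
x = np.array([1.0, 2.0, 0.5])    # input, e.g. a tiny word vector
b = 0.1                          # bias term
h = neuron(W, x, b)              # output lands in (0, 1)
```

With a sigmoid activation, $h$ can be read as the probability that the sentence is about food.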

## Activation function

What’s $f$? It’s your probability distribution function (sort of). Typically it’s the sigmoid function, which looks pretty:

$$f(z) = \frac{1}{1 + e^{-z}}$$

Or the equally pretty hyperbolic tangent function:

$$f(z) = \tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$$

**Setup for later:** Note, the derivatives of these functions are pretty too: $f'(z) = f(z)\,(1 - f(z))$ for the sigmoid, and $f'(z) = 1 - \tanh^2(z)$ for tanh. Both can be computed cheaply from the forward value, which is exactly what backpropagation exploits.
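As a quick sketch (NumPy again; the derivative identities are the standard ones for sigmoid and tanh), here are both activations with their derivatives expressed through the forward value:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # f'(z) = f(z) * (1 - f(z)) — reuses the forward pass
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    # f'(z) = 1 - tanh(z)^2 — likewise reuses the forward value
    return 1.0 - np.tanh(z) ** 2

z = np.linspace(-2.0, 2.0, 5)
print(sigmoid_grad(z))  # largest at z = 0, where it equals 0.25
print(tanh_grad(z))     # largest at z = 0, where it equals 1.0
```

Both gradients peak at $z = 0$ and vanish for large $|z|$ — which is why saturated units learn slowly.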