Deep Learning - Math to Code
1 Basic Neuron
You’ve got $W$ your vector of weights, $x$ your data, and $b$ a bias term (strictly speaking, the bias *input* is usually fixed to one, and the bias weight itself is learned).
Using these you want to ask: given $x$, what is the output $h$?
Context: Let’s apply this to NLP. $x$ is a set of word vectors comprising a sentence, and I want an $h$ that tells me whether this sentence is talking about food of some sort.
e.g. Oh my god, that was so freaking delicious. -> 1
Now, we want some function such that, with the proper set of inputs, we actually get the proper $h$. Well, we can formulate it as:

$$h = f(Wx + b)$$
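Here’s a minimal NumPy sketch of that single neuron, $h = f(Wx + b)$ with a sigmoid for $f$ (the function names and toy numbers are my own, just for illustration):

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(W, x, b):
    # weighted sum of the inputs plus the bias, passed through the activation
    return sigmoid(np.dot(W, x) + b)

# toy example: a 3-dimensional input
W = np.array([0.5, -0.2, 0.1])
x = np.array([1.0, 2.0, 3.0])
b = 0.0
h = neuron(W, x, b)  # a single number in (0, 1)
```

In a real NLP setup $x$ would be built from the sentence’s word vectors (averaged or concatenated), but the forward computation is exactly this one line.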
Why do we want a bias term??? Well, here’s a discussion, but typically it’s to shift the activation function left or right along the input axis, so the neuron can fire (or stay quiet) even when the weighted inputs alone wouldn’t cross the threshold.
What’s $f$? It’s your activation function, which squashes the weighted sum into a fixed range, so you can read the output as a probability (sort of). Typically it’s the sigmoid function, which looks pretty:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Or the equally pretty hyperbolic tangent function:

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$
Setup for later: note, the derivatives of these functions are pretty too, and each one can be written in terms of the function’s own output:

$$\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr), \qquad \tanh'(z) = 1 - \tanh^{2}(z)$$
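As a quick sanity check on those derivatives, here’s a small NumPy sketch (my own, not from the text) that computes $\sigma'(z) = \sigma(z)(1-\sigma(z))$ and $\tanh'(z) = 1 - \tanh^2(z)$ and compares each against a finite-difference approximation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsigmoid(z):
    # the derivative reuses the forward value: sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def dtanh(z):
    # tanh'(z) = 1 - tanh(z)^2, again reusing the forward value
    return 1.0 - np.tanh(z) ** 2

# compare against central finite differences at an arbitrary point
z = 0.7
eps = 1e-6
fd_sigmoid = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
fd_tanh = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)
```

That “reuse the forward value” property is exactly why these derivatives are convenient later: during backpropagation you already have $\sigma(z)$ and $\tanh(z)$ from the forward pass.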