A common analogy for explaining gradient descent goes like the following: a person is stuck in the mountains during heavy fog, and must navigate their way down. The natural way they will approach this is to look at the slope of the visible ground around them and slowly work their way down the mountain by following the downward slope.
This is a pretty fun thought experiment carried through into real code. I’m a little sad that it stops where it does, though. I’d love to have seen the paths created for RMS Prop, the Adam optimiser, learning rate annealing1, etc. Also: It would be fun to see what would happen if this was expended to include elevations below sea level.
- Which I can’t find a good link for. This is where the gradually reduce the learning rate over the lifetime of training. ↩