High-Accuracy Low-Precision Training


This is an interesting follow-up to John Carmack’s note that neural nets can still work even if you have major bugs in your maths, so long as the sign is right. It turns out you can get away with lower-precision maths as well:

It turns out that yes, it is sometimes possible to get high-accuracy solutions from low-precision training—and here we’ll describe a new variant of stochastic gradient descent (SGD) called high-accuracy low precision (HALP) that can do it. HALP can do better than previous algorithms because it reduces the two sources of noise that limit the accuracy of low-precision SGD: gradient variance and round-off error.
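To get a feel for the round-off error mentioned in the quote, here is a minimal sketch using only Python's standard library. It is not HALP itself, just an illustration of the problem: the `to_half` helper and the update size `1e-4` are made up for this example, and the rounding uses the IEEE 754 half-precision format exposed by `struct`'s `"e"` code.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical numbers: a weight of 1.0 and a small step (learning rate * gradient).
update = 1e-4
steps = 100

# Low precision: the weight is rounded back to half precision after every step.
# Near 1.0 the gap between adjacent half-precision values is about 0.001,
# so each 0.0001 step is rounded away before it can accumulate.
w_low = to_half(1.0)
for _ in range(steps):
    w_low = to_half(w_low + update)

# Full precision: the same steps accumulate as expected.
w_full = 1.0
for _ in range(steps):
    w_full += update

print("half precision:", w_low)
print("full precision:", w_full)
```

The low-precision weight never moves off 1.0, while the full-precision weight ends up near 1.01. Small updates simply vanishing like this is the kind of round-off noise that limits how accurate naive low-precision SGD can get.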

Why is this a good thing? Half the precision means each value takes half the bits, so you can store twice as many values in the same amount of space. That means you can keep more of your dataset in GPU memory, which eases one of the major bottlenecks involved in training.
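The arithmetic is simple enough to check directly. A rough sketch, assuming a hypothetical 8 GiB memory budget and comparing 16-bit against 32-bit floats (the helper name `values_that_fit` is invented for this example, and real training also needs room for weights, activations, and so on):

```python
import struct

# Size in bytes of one value in each format, as reported by the struct module.
BYTES_HALF = struct.calcsize("<e")    # IEEE 754 half precision (16-bit)
BYTES_SINGLE = struct.calcsize("<f")  # IEEE 754 single precision (32-bit)


def values_that_fit(budget_bytes: int, bytes_per_value: int) -> int:
    """Number of values of the given size that fit in a memory budget."""
    return budget_bytes // bytes_per_value


# Hypothetical budget: 8 GiB of GPU memory set aside for data.
budget = 8 * 1024**3

n_single = values_that_fit(budget, BYTES_SINGLE)
n_half = values_that_fit(budget, BYTES_HALF)

print("32-bit values:", n_single)
print("16-bit values:", n_half)
print("ratio:", n_half / n_single)
```

Whatever the budget, halving the bytes per value doubles how many values fit.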