The Zen of Gradient Descent | Moody Rd - http://mrtz.org/blog...
A nice article explaining a principled way of deriving Nesterov's method. It also makes the interesting point that Nesterov's method is more brittle than standard gradient descent, and thus may be less useful in practice. - Michael Nielsen
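
For readers who want to compare the two methods directly, here is a minimal sketch. It is not the article's derivation. It assumes the common constant-momentum form of Nesterov's method for strongly convex functions, applied to a small diagonal quadratic. The function names, the eigenvalues, and the `noise` parameter (Gaussian noise added to each gradient coordinate) are all hypothetical choices made for illustration.

```python
import math
import random


def objective(x, eigs):
    """f(x) = 0.5 * sum_i eigs[i] * x[i]**2, a diagonal quadratic (illustrative)."""
    return 0.5 * sum(a * xi * xi for a, xi in zip(eigs, x))


def gradient(x, eigs, noise, rng):
    """Exact gradient plus Gaussian noise with standard deviation `noise`."""
    return [a * xi + rng.gauss(0.0, noise) for a, xi in zip(eigs, x)]


def gradient_descent(eigs, x0, steps, noise=0.0, seed=0):
    """Plain gradient descent with step size 1/L; returns the final objective."""
    rng = random.Random(seed)
    step = 1.0 / max(eigs)  # L is the largest eigenvalue
    x = list(x0)
    for _ in range(steps):
        g = gradient(x, eigs, noise, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return objective(x, eigs)


def nesterov(eigs, x0, steps, noise=0.0, seed=0):
    """Nesterov's method, constant-momentum variant (an assumed form)."""
    rng = random.Random(seed)
    L, mu = max(eigs), min(eigs)
    step = 1.0 / L
    root_kappa = math.sqrt(L / mu)  # square root of the condition number
    beta = (root_kappa - 1.0) / (root_kappa + 1.0)
    x_prev = list(x0)
    x = list(x0)
    for _ in range(steps):
        # Extrapolate along the last displacement, then take a gradient step there.
        y = [xi + beta * (xi - pi) for xi, pi in zip(x, x_prev)]
        g = gradient(y, eigs, noise, rng)
        x_prev, x = x, [yi - step * gi for yi, gi in zip(y, g)]
    return objective(x, eigs)


if __name__ == "__main__":
    eigs = [1.0, 10.0, 100.0]  # condition number 100
    x0 = [1.0, 1.0, 1.0]
    for noise in (0.0, 1.0):
        print(noise,
              gradient_descent(eigs, x0, 100, noise),
              nesterov(eigs, x0, 100, noise))
```

With exact gradients (`noise=0.0`), the accelerated iteration should reach a much smaller objective in the same number of steps. Setting `noise` above zero is one way to probe the brittleness claim yourself. What you observe will depend on the noise level, the seed, and the step count.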