Blog Archive

2017
  Backprop is not just the chain rule (Aug 18 2017)
  Estimating means in a finite universe (Jul 03 2017)
  How to test gradient implementations (Apr 21 2017)

2016
  Counterfactual reasoning and learning from logged data (Dec 19 2016)
  Heaps for incremental computation (Nov 21 2016)
  Reversing a sequence with sublinear space (Oct 01 2016)
  Evaluating ∇f(x) is as fast as f(x) (Sep 25 2016)
  Fast sigmoid sampling (Jul 04 2016)
  Sqrt-biased sampling (Jun 28 2016)
  The optimal proposal distribution is not p (May 28 2016)
  Dimensional analysis of gradient ascent (May 27 2016)
  Gradient-based hyperparameter optimization and the implicit function theorem (Mar 05 2016)
  Multidimensional array index (Jan 17 2016)

2015
  Gradient of a product (Jul 29 2015)
  Multiclass logistic regression and conditional random fields are the same thing (Apr 29 2015)
  Conditional random fields as deep learning models? (Feb 05 2015)
  Log-Real number class (Feb 01 2015)

2014
  Importance sampling (Dec 21 2014)
  Numerically stable p-norms (Nov 10 2014)
  KL-divergence as an objective function (Oct 06 2014)
  Complex-step derivative (Aug 07 2014)
  Gumbel-max trick and weighted reservoir sampling (Aug 01 2014)
  Gumbel-max trick (Jul 31 2014)
  Rant against grid search (Jul 22 2014)
  Expected value of a quadratic and the Delta method (Jul 21 2014)
  Visualizing high-dimensional functions with cross-sections (Feb 12 2014)
  Exp-normalize trick (Feb 11 2014)
  Gradient-vector product (Feb 10 2014)