Recursive Subspace and Quadratic Quasi-Newton Optimization

  1. 2nd-order derivative estimates are very sensitive to the assumption that the function is smooth and deterministic — any re-sampling of the data will invalidate the history of the known gradient rays, and numeric resolution issues can interfere with training.
  2. In regions of high nonlinearity, which is fairly common, the strategy may suggest a step resulting in a fitness decrease. This is why the L-BFGS strategy prescribes the Armijo & Wolfe line search conditions to guarantee convergence — in the event of a “bad” step, just try a smaller step. As long as the vector is in the direction of gradient descent vector, we are guaranteed to find an “ok” step even if it is very small.
  3. The estimation of the inverse hessian is a nontrivial operation. Various strategies balance memory, CPU, and accuracy, but the cost can still be significant. This also adds complexity and additional metaparameters.
  4. At a sufficiently small step size, not only are the extra compute resources going to waste, but we pretty much know that the direction we are choosing is not optimal because it is at an angle with the known steepest descent vector.
  1. L-BFGS with normalized network
  2. L-BFGS with non-normalized network
  3. Modified L-BFGS, with normalized network, using strong line search
  4. Recursive Subspace with non-normalized network, using L-BFGS as the inner optimizer.
  5. Quadratic Quasi-Newton with normalized network
  6. Recursive Subspace with non-normalized network, using QQN as the inner optimizer.

--

--

--

Big Data Engineer and Artificial Intelligence Researcher

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Comprehensive Recommendation system and KNN with project-Movie Recommendation system.

Multi-class Text Classification using BERT and TensorFlow

Using sentence embeddings to explore the similarities and differences in song lyrics

Data PreProcessing for Machine Learning Made Easy. Part 3

Install Tensorflow-GPU on Windows 10

Basic Mathematics in Reinforcement Learning: Geometric Series

Understanding the Confusion Matrix

Automated Text Classification Using Machine Learning

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Andrew Charneski

Andrew Charneski

Big Data Engineer and Artificial Intelligence Researcher

More from Medium

Cross-Validation Using K-Fold With Scikit-Learn

CNN Feature Extraction: Between ReLU, Tanh and Sigmoid.

Adaboost classifier for face detection using viola jones algorithm

Deploying Machine Learning Models on Live User Webcam Feeds using TensorFlow.js

TensorFlow js Logo