Quasi-Newton multivariable optimization

by Trent Guidry, 4 October 2009 06:35

In this article I develop the quasi-Newton base class, which will serve as the base class for algorithms that implement quasi-Newton numerical methods, such as the classes that implement the DFP and BFGS algorithms.

The class is named QuasiNewtonBase, and its constructor has two overloads. The first takes an objective function of the form double F(), an array of the optimization parameters, a double multiple that, scaled by the number of parameters, determines how many iterations pass before the inverse Hessian approximation is reset, and an integer that specifies the number of points to use when numerically computing the partial derivatives. The second constructor defaults the inverse Hessian reset multiple to 2.0 and the number of derivative points to 3.

In the constructor, the optimization parameters are stored in a field called _solvedForParameters. The objective function is stored in a field called _optimizationFunction, and an instance of the Derivatives class is created to compute the partial derivatives needed to calculate the gradient. An instance of the Matrix class is also created to hold the inverse Hessian approximation, which is then reset to the starting inverse Hessian estimate. An integer field called _inverseHessianResetCount is also calculated. If this number is greater than 0, then the inverse Hessian is reset to the starting estimate once that many iterations have occurred since the last reset.

A quasi-Newton algorithm needs to track a number of variables: the approximation of the inverse Hessian, the current and previous search positions, and the current and previous gradients. These are stored in class-level fields. The class also has a field, _regionEliminationFraction, exposed through a public property, that stores how much of the bracketed region is eliminated in the region elimination steps, and a field, _iterationsSinceReset, that counts the iterations since the last inverse Hessian approximation reset.

The classes used for line search bracketing and region elimination are stored as fields that are exposed as properties. By default, the bracketing algorithm is set to an instance of the CubicBracketing class and the region elimination algorithm to an instance of the CubicRegionElimination class.

This class has a virtual function called UpdateInverseHessian that must be overridden in classes deriving from QuasiNewtonBase in order to provide the update for the inverse Hessian approximation.

The protected function CalculateGradient is a helper used to calculate the gradient of the objective function at the current position. The partial derivatives are computed using an instance of the Derivatives class.
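The Derivatives class itself is not shown in this article, but the idea can be sketched with a three-point central difference, which matches the default of 3 derivative points. This is an illustrative Python stand-in, not the article's implementation; the function names are my own.

```python
def partial_derivative(f, x, i, h=1e-6, order=1):
    # Three-point central difference along coordinate i.
    xp = list(x); xm = list(x)
    xp[i] += h
    xm[i] -= h
    if order == 1:
        return (f(xp) - f(xm)) / (2.0 * h)
    # order == 2: central second difference.
    return (f(xp) - 2.0 * f(x) + f(xm)) / (h * h)

def gradient(f, x, h=1e-6):
    # One partial derivative per coordinate gives the gradient vector.
    return [partial_derivative(f, x, i, h) for i in range(len(x))]
```
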

The protected function CalculateSearchDirection computes the search direction by multiplying the current inverse Hessian approximation by the negative of the gradient.
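In the article's code this is a single Matrix expression; in plain Python it amounts to a matrix-vector product with the negated gradient (lists stand in for the Matrix class here):

```python
def search_direction(h_inv, grad):
    # d = -(H_inv * g): each component is the dot product of a row of the
    # inverse-Hessian approximation with the gradient, negated.
    n = len(grad)
    return [-sum(h_inv[i][j] * grad[j] for j in range(n)) for i in range(n)]
```

With the identity matrix as the inverse Hessian approximation, this reduces to steepest descent, which is why the reset strategy described below still yields a usable direction.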

The protected function MoveParametersToPosition moves the model parameters along the search direction by a length specified as the function argument.

The protected function LineSearchFunction uses MoveParametersToPosition to move the model parameters the specified length along the search direction. It then evaluates and returns the value of the objective function at the new position.
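Together these two functions restrict the multivariable objective to a one-dimensional function of the step length, which is what the bracketing and region elimination classes operate on. A Python sketch of the same idea (names are illustrative):

```python
def line_search_function(f, x, d, length):
    # phi(length) = f(x + length * d): the objective evaluated along the
    # line through x in the search direction d.
    moved = [xi + length * di for xi, di in zip(x, d)]
    return f(moved)
```
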

The protected functions ComputeQNProduct and ComputeInnerProduct are helpers used by classes that derive from QuasiNewtonBase to compute the new inverse Hessian approximation in their overridden UpdateInverseHessian functions. ComputeQNProduct forms the outer product x gT of two vectors, and ComputeInnerProduct forms their inner (dot) product.
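As an illustration of how a derived class might combine these helpers, here is a Python sketch of the standard DFP rank-two update, H + s sT / (sT y) - (H y)(H y)T / (yT H y), where s is the change in position and y the change in gradient. This is the textbook DFP formula, not code from the article; the DFP class itself is covered separately.

```python
def outer(x, g):
    # x g^T, the role played by ComputeQNProduct in the article.
    return [[xi * gj for gj in g] for xi in x]

def inner(a, b):
    # Dot product, the role played by ComputeInnerProduct.
    return sum(ai * bi for ai, bi in zip(a, b))

def dfp_update(h, s, y):
    # DFP update of the inverse-Hessian approximation h.
    n = len(s)
    hy = [sum(h[i][j] * y[j] for j in range(n)) for i in range(n)]
    sy = inner(s, y)
    yhy = inner(y, hy)
    t1 = outer(s, s)
    t2 = outer(hy, hy)
    return [[h[i][j] + t1[i][j] / sy - t2[i][j] / yhy
             for j in range(n)] for i in range(n)]
```

A quick sanity check on any such update is the secant condition: the updated matrix applied to y must return s.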

The ResetInverseHessian function resets or initializes the inverse Hessian approximation matrix. All off-diagonal elements are set to 0. For each diagonal element, the second partial derivative of the objective function with respect to the corresponding parameter is computed; if the second derivative is nonzero, the cell is set to the inverse of its absolute value, otherwise it is set to 1. The iterations-since-reset counter is also zeroed.
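Given the diagonal second derivatives (however they are estimated), the starting matrix can be sketched like this in Python; the function name is illustrative:

```python
def reset_inverse_hessian(second_derivs):
    # Diagonal: 1/|f''| where the second partial derivative is nonzero,
    # else 1. Off-diagonal: 0. This scales each coordinate by its own
    # curvature while assuming no cross-coupling.
    n = len(second_derivs)
    return [[((1.0 / abs(second_derivs[i])) if second_derivs[i] != 0.0 else 1.0)
             if i == j else 0.0
             for j in range(n)]
            for i in range(n)]
```
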

The function Iterate performs one iteration of the quasi-Newton algorithm. It first stores the current position of the model parameters and computes the objective function gradient there. It then checks whether the inverse Hessian needs to be reset and, if so, resets it. If at least one iteration has occurred since the last reset, the approximate inverse Hessian is updated by calling the virtual UpdateInverseHessian function, unless the current position is the same as the previous position, in which case the approximation is reset instead. Using the updated inverse Hessian approximation, a new search direction is calculated and a line search for the minimum objective function value along that direction is performed. The current position and gradient are stored in the last position and gradient fields, the model parameters are moved to the new position, and the function value at the new position is stored in a field exposed through the read-only FunctionValue property. Iterate returns false if the computed search direction contains NaN values.
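The overall flow can be demonstrated end to end on a toy problem. The sketch below is a self-contained Python analogue, not the article's C# classes: it uses a DFP update and a crude backtracking step in place of the bracketing and region elimination classes, and omits the reset schedule.

```python
def minimize_demo(iterations=30):
    # Minimize f(x) = (x0 - 1)^2 + 2*(x1 + 2)^2 with a toy quasi-Newton loop.
    f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2
    def grad(x, h=1e-6):
        g = []
        for i in range(len(x)):
            xp = list(x); xm = list(x)
            xp[i] += h; xm[i] -= h
            g.append((f(xp) - f(xm)) / (2.0 * h))
        return g
    n = 2
    h_inv = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = [0.0, 0.0]
    g = grad(x)
    for _ in range(iterations):
        # Search direction: d = -H_inv * g.
        d = [-sum(h_inv[i][j] * g[j] for j in range(n)) for i in range(n)]
        # Crude backtracking line search (stands in for bracketing +
        # region elimination): halve the step until f decreases.
        alpha = 1.0
        while f([x[i] + alpha * d[i] for i in range(n)]) > f(x) and alpha > 1e-12:
            alpha *= 0.5
        x_new = [x[i] + alpha * d[i] for i in range(n)]
        g_new = grad(x_new)
        s = [x_new[i] - x[i] for i in range(n)]
        y = [g_new[i] - g[i] for i in range(n)]
        sy = sum(si * yi for si, yi in zip(s, y))
        if abs(sy) > 1e-12:
            # DFP rank-two update of the inverse-Hessian approximation.
            hy = [sum(h_inv[i][j] * y[j] for j in range(n)) for i in range(n)]
            yhy = sum(yi * hyi for yi, hyi in zip(y, hy))
            for i in range(n):
                for j in range(n):
                    h_inv[i][j] += s[i] * s[j] / sy - hy[i] * hy[j] / yhy
        x, g = x_new, g_new
    return x
```

On this quadratic the loop should settle near the minimizer (1, -2); the base class in the article plays the same role, with the update step deferred to the DFP or BFGS subclass.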

The code for the quasi-Newton base class is given below.

using System;
using System.Diagnostics;
using System.Linq;

public class QuasiNewtonBase
{
    protected BracketingBase _bracketingMethod = new CubicBracketing();
    protected RegionEliminationBase _regionEliminationMethod = new CubicRegionElimination();
    protected Matrix _inverseHessian;
    protected Parameter[] _solvedForParameters;
    protected Func<double> _optimizationFunction;
    protected Derivatives _derivatives;
    protected double[] _currentGradient;
    protected double[] _lastGradient;
    protected double[] _searchDirection;
    protected double[] _currentPosition;
    protected double[] _lastPosition;
    protected double _regionEliminationFraction = 1e-5;
    protected int _iterationsSinceReset = 0;
    protected double _functionValue;
    protected int _inverseHessianResetCount = -1;

    public QuasiNewtonBase(Func<double> optimizationFunction, Parameter[] optimizationParameters,
                           double inverseHessianResetMultiple, int numberOfDerivativePoints)
    {
        _solvedForParameters = optimizationParameters;
        _optimizationFunction = optimizationFunction;
        _derivatives = new Derivatives(numberOfDerivativePoints);
        int numberOfParameters = _solvedForParameters.Length;
        _inverseHessian = new Matrix(numberOfParameters, numberOfParameters);
        ResetInverseHessian();
        _inverseHessianResetCount = (int)(inverseHessianResetMultiple * (double)_solvedForParameters.Length);
    }

    public QuasiNewtonBase(Func<double> optimizationFunction, Parameter[] optimizationParameters)
        : this(optimizationFunction, optimizationParameters, 2.0, 3)
    {
    }

    public bool Iterate()
    {
        int numberOfParameters = _solvedForParameters.Length;
        _currentPosition = (from d in _solvedForParameters select d.Value).ToArray();
        CalculateGradient();
        // Reset the inverse Hessian if enough iterations have elapsed.
        if (_inverseHessianResetCount > 0 && _iterationsSinceReset > _inverseHessianResetCount)
        {
            ResetInverseHessian();
        }
        if (_iterationsSinceReset > 0)
        {
            double dMoveDistance = 0.0;
            for (int i = 0; i < numberOfParameters; i++)
            {
                double dDistDiff = _currentPosition[i] - _lastPosition[i];
                dMoveDistance += dDistDiff * dDistDiff;
            }
            // If the position did not move, reset; otherwise apply the
            // derived class's quasi-Newton update.
            if (dMoveDistance == 0.0) { ResetInverseHessian(); }
            else { UpdateInverseHessian(); }
        }
        CalculateSearchDirection();
        bool succeeded = (from d in _searchDirection where double.IsNaN(d) select d).Count() == 0;
        if (succeeded)
        {
            // Bracket the minimum along the search direction, then narrow
            // the bracket with region elimination.
            SearchPointCollection colSearch = _bracketingMethod.Bracket(
                LineSearchFunction, 0.0, 1.0, _regionEliminationMethod.NumStartingPoints);
            RegionEliminationResults resRegionEliminate = _regionEliminationMethod.RegionEliminate(
                LineSearchFunction, colSearch, _regionEliminationFraction);
            if (resRegionEliminate.Succeeded)
            {
                _lastGradient = _currentGradient;
                _lastPosition = _currentPosition;
                MoveParametersToPosition(resRegionEliminate.XResult);
                _functionValue = resRegionEliminate.YResult;
            }
            _iterationsSinceReset++;
        }
        else { Debugger.Break(); }
        return succeeded;
    }

    protected virtual void UpdateInverseHessian()
    {
        throw new NotImplementedException();
    }

    protected double LineSearchFunction(double length)
    {
        MoveParametersToPosition(length);
        return _optimizationFunction();
    }

    protected void MoveParametersToPosition(double length)
    {
        int numberOfParameters = _solvedForParameters.Length;
        for (int i = 0; i < numberOfParameters; i++)
        {
            _solvedForParameters[i].Value = _currentPosition[i] + length * _searchDirection[i];
        }
    }

    protected void CalculateSearchDirection()
    {
        _searchDirection = -1.0 * _inverseHessian * _currentGradient;
    }

    public void ResetInverseHessian()
    {
        int numberOfParameters = _solvedForParameters.Length;
        for (int i = 0; i < numberOfParameters; i++)
        {
            for (int j = 0; j < numberOfParameters; j++)
            {
                if (i == j)
                {
                    double dSecondDeriv = _derivatives.ComputePartialDerivative(
                        _optimizationFunction, _solvedForParameters[i], 2);
                    if (dSecondDeriv != 0.0) { _inverseHessian[i, j] = Math.Abs(1.0 / dSecondDeriv); }
                    else { _inverseHessian[i, j] = 1.0; }
                }
                else { _inverseHessian[i, j] = 0.0; }
            }
        }
        _iterationsSinceReset = 0;
    }

    // Outer product x g^T.
    protected Matrix ComputeQNProduct(double[] x, double[] g)
    {
        System.Diagnostics.Debug.Assert(x.Length == g.Length);
        Matrix mRet = new Matrix(x.Length, x.Length);
        for (int i = 0; i < x.Length; i++)
        {
            for (int j = 0; j < x.Length; j++)
            {
                mRet[i, j] = x[i] * g[j];
            }
        }
        return mRet;
    }

    protected double ComputeInnerProduct(double[] left, double[] right)
    {
        System.Diagnostics.Debug.Assert(left.Length == right.Length);
        double result = 0.0;
        for (int i = 0; i < left.Length; i++)
        {
            result += left[i] * right[i];
        }
        return result;
    }

    protected void CalculateGradient()
    {
        int numberOfParameters = _solvedForParameters.Length;
        _currentGradient = new double[numberOfParameters];
        for (int i = 0; i < numberOfParameters; i++)
        {
            _currentGradient[i] = _derivatives.ComputePartialDerivative(
                _optimizationFunction, _solvedForParameters[i], 1);
        }
    }

    public BracketingBase BracketingMethod
    {
        get { return _bracketingMethod; }
        set { _bracketingMethod = value; }
    }

    public RegionEliminationBase RegionEliminationMethod
    {
        get { return _regionEliminationMethod; }
        set { _regionEliminationMethod = value; }
    }

    public double RegionEliminationFraction
    {
        get { return _regionEliminationFraction; }
        set { _regionEliminationFraction = value; }
    }

    public double FunctionValue
    {
        get { return _functionValue; }
    }
}