Optimization
In general if $x^*$ minimizes an unconstrained function $f$ then $x^*$ is a solution to the system of equations $\nabla f(x) = 0$. The converse is not always true. A solution of the set of equations $\nabla f(x) = 0$ is termed a stationary point or critical point and may be a minimum, a maximum, or a saddle point. We can determine which by considering the Hessian matrix $H_f(x)$ at $x^*$.
- If $H_f(x^*)$ is positive definite then $x^*$ is a minimum of $f$
- If $H_f(x^*)$ is negative definite then $x^*$ is a maximum of $f$
- If $H_f(x^*)$ is indefinite then $x^*$ is a saddle point of $f$
To check for definiteness we have the following options (a short sketch follows this list):
- Compute the Cholesky factorization. It will succeed only if the matrix is positive definite
- Calculate the inertia of the matrix
- Calculate the eigenvalues and check whether they are all positive or not
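As an illustration of the first and third options, the following sketch (assuming NumPy is available; the matrix is the constant Hessian of the made-up function $f(x, y) = x^2 - y^2$) attempts a Cholesky factorization and then inspects the eigenvalues:

import numpy as np

# Constant Hessian of the made-up function f(x, y) = x^2 - y^2.
# It is indefinite, so the stationary point at the origin is a saddle point.
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])

# Option 1: attempt a Cholesky factorization; it succeeds only for
# positive definite matrices.
try:
    np.linalg.cholesky(H)
    print("Cholesky succeeded: positive definite, so a minimum")
except np.linalg.LinAlgError:
    print("Cholesky failed: not positive definite")

# Option 3: inspect the eigenvalues directly (eigvalsh handles the symmetric case).
eigvals = np.linalg.eigvalsh(H)
if np.all(eigvals > 0):
    print("all eigenvalues positive: minimum")
elif np.all(eigvals < 0):
    print("all eigenvalues negative: maximum")
else:
    print("mixed signs: saddle point")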
 
Golden Search
In this method the function is initially evaluated at 3 points and a quadratic polynomial is fitted to the resulting values. The minimum of the parabola replaces the oldest of the initial points and the process repeats.
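A minimal sketch of the three-point parabolic fit described above; the function, starting points, and the name parabolic_fit_min are illustrative choices, not part of the original text:

def parabolic_fit_min(f, x0, x1, x2, tol=1e-8, max_iter=100):
    """Fit a parabola through three points, jump to its minimum, and
    replace the oldest of the three points (as described above)."""
    pts = [x0, x1, x2]          # kept in order, oldest first
    for _ in range(max_iter):
        a, b, c = pts
        fa, fb, fc = f(a), f(b), f(c)
        num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
        den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
        if den == 0.0:
            break               # degenerate fit; give up
        x_new = b - 0.5 * num / den
        if abs(x_new - pts[-1]) < tol:
            return x_new
        pts = [pts[1], pts[2], x_new]   # drop the oldest point
    return pts[-1]

# Made-up example: f(x) = (x - 2)^2 + 1 has its minimum at x = 2.
print(parabolic_fit_min(lambda x: (x - 2.0) ** 2 + 1.0, 0.0, 1.0, 3.0))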
Newton Methods
Another method for obtaining a local quadratic approximation to a function is to consider the truncated Taylor expansion of $f$:
$$f(x + h) \approx f(x) + f'(x)\,h + \tfrac{1}{2} f''(x)\,h^2$$
The minimum of this parabolic approximation occurs at $h = -f'(x)/f''(x)$ and so the iteration scheme is given by
$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}$$
A short sketch follows the feature list below.
Some important features include
- Normally has quadratic convergence
 
- If started far from the desired solution it may fail to
        converge or may converge to a maximum or a saddle point
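A minimal sketch of the scalar iteration above, with hand-coded first and second derivatives for a made-up example function:

def newton_min_1d(df, d2f, x0, tol=1e-10, max_iter=50):
    """Scalar Newton iteration x_{k+1} = x_k - f'(x_k) / f''(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Made-up example: f(x) = x^4 - 3x^2 + 2 with f'(x) = 4x^3 - 6x and
# f''(x) = 12x^2 - 6; started at x = 2 the iteration converges to the
# minimum at sqrt(3/2).
print(newton_min_1d(lambda x: 4 * x**3 - 6 * x,
                    lambda x: 12 * x**2 - 6, 2.0))

Started near the origin instead, the same iteration converges to the local maximum at $x = 0$, which illustrates the second caveat above.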
 
Direct Search Method
In this method an $n$-dimensional function $f$ is evaluated at $n + 1$ points. The move to a new point is made along the line joining the worst current point and the centroid of the remaining $n$ points. The new point replaces the worst point and we iterate (a usage sketch follows the feature list below).
Some features include
- Similar to a Golden Search since it involves comparison of function values at different points
- Does not have the convergence guarantee that a Golden Search has
- Several parameters are involved such as how far to go along the line and how to expand/contract the simplex depending on whether the search was successful or not
- Good for when $n$ is small but ineffective when $n$ is larger than 2 or 3
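This kind of simplex-based direct search is what SciPy's Nelder-Mead method implements; a minimal usage sketch on a made-up two-dimensional test function (assuming NumPy and SciPy are available):

import numpy as np
from scipy.optimize import minimize

# Made-up example: the Rosenbrock function, a standard two-dimensional
# test problem with its minimum at (1, 1).
f = lambda p: (1.0 - p[0]) ** 2 + 100.0 * (p[1] - p[0] ** 2) ** 2

# Nelder-Mead is SciPy's simplex-based direct search; it uses only
# function values, no derivatives.
result = minimize(f, x0=np.array([-1.0, 2.0]), method="Nelder-Mead")
print(result.x)   # close to [1, 1]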
 
Steepest Descent Method
Using knowledge of the gradient can improve a search for minima. In this method we note that at a given point $x$ where the gradient is nonzero, $-\nabla f(x)$ points in the direction of steepest descent for the function $f$ (i.e., the function decreases more rapidly in this direction than in any other direction). The iteration starts with an initial guess $x_0$ and then
$$x_{k+1} = x_k - \alpha_k \nabla f(x_k)$$
where $\alpha_k$ is a line search parameter and is obtained by minimizing
$$f(x_k - \alpha \nabla f(x_k))$$
with respect to $\alpha$.
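A minimal sketch of this iteration, using SciPy's minimize_scalar for the one-dimensional line search over $\alpha$; the quadratic test function and the name steepest_descent are illustrative:

import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, tol=1e-6, max_iter=500):
    """Steepest descent with a one-dimensional line search for alpha_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # alpha_k minimizes f(x_k - alpha * grad f(x_k)) over alpha.
        alpha = minimize_scalar(lambda a: f(x - a * g)).x
        x = x - alpha * g
    return x

# Made-up quadratic example: f(x) = 0.5 x^T A x - b^T x with minimizer A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(steepest_descent(lambda x: 0.5 * x @ A @ x - b @ x,
                       lambda x: A @ x - b, [0.0, 0.0]))   # close to [0.2, 0.4]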
Newton's Method
This method involves use of both the first and second order derivatives of a function. The iteration formula is
$$x_{k+1} = x_k - H_f(x_k)^{-1} \nabla f(x_k)$$
where $H_f(x_k)$ is the Hessian matrix. However, rather than inverting the matrix we obtain the Newton step by solving
$$H_f(x_k)\, s_k = -\nabla f(x_k)$$
for $s_k$ and then writing the iteration as
$$x_{k+1} = x_k + s_k$$
Some features include 
- Normally quadratic convergence
 
- Unreliable unless started close to the desired minimum
 
There are two variations (a sketch of the basic iteration follows this list):
- Damped Newton Method: When started far from the desired solution the use of a line search parameter $\alpha_k$ along the direction $s_k$ can make the method more robust. Once the iterations are near the solution simply set $\alpha_k = 1$ for subsequent iterations
- Trust region methods: This involves maintaining an estimate of the radius of a region in which the quadratic model is sufficiently accurate
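A minimal sketch of the basic (undamped) iteration, solving the linear system rather than forming the inverse; the quadratic example and function names are illustrative (assuming NumPy is available):

import numpy as np

def newton_min(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton iteration: solve H_f(x_k) s_k = -grad f(x_k) for s_k,
    then set x_{k+1} = x_k + s_k (no explicit inverse is formed)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = np.linalg.solve(hess(x), -grad(x))
        x = x + s
        if np.linalg.norm(s) < tol:
            break
    return x

# Made-up example: f(x, y) = 0.5 x^2 + 2.5 y^2 has its minimum at the origin;
# since f is quadratic, Newton's method reaches it in a single step.
grad = lambda p: np.array([p[0], 5.0 * p[1]])
hess = lambda p: np.array([[1.0, 0.0], [0.0, 5.0]])
print(newton_min(grad, hess, [5.0, 1.0]))   # [0, 0]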
 
Quasi-Newton Methods
These methods are similar to Newton's method in that they use both $\nabla f(x)$ and $H_f(x)$. However, $H_f(x)$ is usually approximated by a matrix $B_k$ that is updated at each iteration. The general formula for iteration is given by
$$x_{k+1} = x_k - \alpha_k B_k^{-1} \nabla f(x_k)$$
Some features of these types of methods include
- Convergence is slightly slower than Newton methods
 
- Much less work per iteration
 
- More robust
 
- Some methods do not require evaluation of the second
        derivatives at all
 
BFGS
The BFGS method is termed a secant updating method. The general aim of these methods is to preserve symmetry of the approximate Hessian as well as to maintain positive definiteness. The former reduces the workload per iteration and the latter ensures that a quasi-Newton step is always a descent direction.
We start with an initial guess $x_0$ and a symmetric positive definite approximate Hessian $B_0$ (which is usually taken as the identity matrix). Then we iterate over the following steps (a sketch appears after the feature list below):
$$B_k s_k = -\nabla f(x_k) \quad \text{(solve for } s_k\text{)} \tag{1}$$
$$x_{k+1} = x_k + s_k$$
$$y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$$
$$B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}$$
Some features include
- Usually a factorization of $B_k$ is updated rather than $B_k$ itself. This allows the linear system in Eq. (1) to be solved in $O(n^2)$ steps rather than $O(n^3)$
- No second derivatives are required
 
- Superlinear convergence
 
- A line search can be added to enhance the method
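A minimal sketch of the update steps above; no line search is included, which is adequate for the convex quadratic used here, and the function names and test problem are illustrative:

import numpy as np

def bfgs(grad, x0, tol=1e-8, max_iter=200):
    """Sketch of the steps above: B_0 = I, solve B_k s_k = -grad f(x_k),
    take the step, then apply the rank-two update to B_k."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))                 # initial approximate Hessian
    g = grad(x)
    for _ in range(max_iter):
        s = np.linalg.solve(B, -g)     # Eq. (1)
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        ys = y @ s
        if abs(ys) > 1e-12:            # skip a degenerate update
            Bs = B @ s
            B = B + np.outer(y, y) / ys - np.outer(Bs, Bs) / (s @ Bs)
        x, g = x_new, g_new
        if np.linalg.norm(g) < tol:
            break
    return x

# Made-up example: the convex quadratic from the steepest descent sketch,
# f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(bfgs(lambda x: A @ x - b, [0.0, 0.0]))   # close to [0.2, 0.4]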
 
Conjugate Gradient
This method, like the one above, does not require second derivatives and in addition does not even store an approximation to the Hessian. The sequence of operations starts with an initial guess $x_0$, initializes $g_0 = \nabla f(x_0)$ and $s_0 = -g_0$, and then iterates (a sketch appears after the feature list below):
$$\alpha_k = \arg\min_{\alpha} f(x_k + \alpha s_k)$$
$$x_{k+1} = x_k + \alpha_k s_k$$
$$g_{k+1} = \nabla f(x_{k+1})$$
$$\beta_{k+1} = \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k}$$
$$s_{k+1} = -g_{k+1} + \beta_{k+1} s_k$$
The update for $\beta_{k+1}$ is given by Fletcher & Reeves. An alternative is the Polak-Ribière formula
$$\beta_{k+1} = \frac{(g_{k+1} - g_k)^T g_{k+1}}{g_k^T g_k}$$
Some features include
- This method uses gradients like the steepest descent method but avoids repeated searches by modifying the gradient at each step to remove components in previous directions
- The algorithm is usually restarted after $n$ iterations, reinitializing to use the negative gradient at the current point
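A minimal sketch of the Fletcher-Reeves iteration above, with the periodic restart mentioned in the feature list; SciPy's minimize_scalar stands in for the line search and the test problem is again a made-up quadratic:

import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient(f, grad, x0, tol=1e-6, max_iter=200):
    """Nonlinear CG with the Fletcher-Reeves update for beta and a
    restart every n iterations."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    g = grad(x)
    s = -g
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = minimize_scalar(lambda a: f(x + a * s)).x   # line search
        x = x + alpha * s
        g_new = grad(x)
        if (k + 1) % n == 0:
            s = -g_new                                      # periodic restart
        else:
            beta = (g_new @ g_new) / (g @ g)                # Fletcher-Reeves
            # Polak-Ribiere alternative:
            # beta = ((g_new - g) @ g_new) / (g @ g)
            s = -g_new + beta * s
        g = g_new
    return x

# Made-up example: the same convex quadratic; CG with exact line searches
# reaches the minimizer of an n-dimensional quadratic in at most n steps.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(conjugate_gradient(lambda x: 0.5 * x @ A @ x - b @ x,
                         lambda x: A @ x - b, [0.0, 0.0]))  # close to [0.2, 0.4]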
 
 
 
 
  