The compact representation for quasi-Newton methods is a matrix decomposition, which is typically used in gradient-based optimization algorithms or for solving nonlinear systems. The decomposition uses a low-rank representation for the direct and/or inverse Hessian or for the Jacobian of a nonlinear system. Because of this, the compact representation is often used in large-scale and constrained optimization.
Definition
The compact representation of a quasi-Newton matrix for the inverse Hessian $H_k$ or direct Hessian $B_k$ of a nonlinear objective function $f(x)$ expresses a sequence of recursive rank-1 or rank-2 matrix updates as one rank-$k$ or rank-$2k$ update of an initial matrix [1]. Because it is derived from quasi-Newton updates, it uses the differences of iterates and gradients,

$$ s_i = x_{i+1} - x_i, \qquad y_i = \nabla f(x_{i+1}) - \nabla f(x_i), $$

in its definition. In particular, for $r = k$ or $r = 2k$, the rectangular $n \times r$ matrices $U_k$ and $J_k$ and the $r \times r$ square symmetric systems $M_k$ and $N_k$ depend on the $s_i$'s and $y_i$'s and define the quasi-Newton representations

$$ H_k = H_0 + U_k M_k^{-1} U_k^T, \qquad B_k = B_0 + J_k N_k^{-1} J_k^T. $$
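For instance, a single symmetric rank-1 (SR1) correction, treated in detail below, already has this shape with $r = 1$: the rectangular factor is the single column $y_0 - B_0 s_0$ and the small system is the scalar $s_0^T (y_0 - B_0 s_0)$ (a minimal illustration in the notation introduced above):

$$ B_1 = B_0 + (y_0 - B_0 s_0)\,\big( s_0^T (y_0 - B_0 s_0) \big)^{-1}\,(y_0 - B_0 s_0)^T . $$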
Applications
Because of this matrix decomposition the compact representation is implemented in state-of-the-art optimization software [2][3][4]. Combined with limited-memory techniques, it is a popular approach for constrained optimization with gradients [5]. Linear algebra operations such as matrix-vector products, linear solves and eigendecompositions can be carried out efficiently. The representation can be combined with line-search and trust-region techniques, and it has been developed for many quasi-Newton updates. For instance, the matrix-vector product of the direct quasi-Newton Hessian with an arbitrary vector $g$ is

$$ B_k g = B_0 g + J_k \left( N_k^{-1} \left( J_k^T g \right) \right), $$

which requires only multiplications with the tall rectangular factor $J_k$ and one small $r \times r$ solve, instead of a product with a dense $n \times n$ matrix.
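As an illustration, the following NumPy sketch evaluates such a product for a generic direct representation $B_k = \gamma I + J N^{-1} J^T$; the helper name `compact_matvec` and the symbols $\gamma$, $J$, $N$ are illustrative placeholders rather than the interface of any particular library:

```python
import numpy as np

def compact_matvec(v, gamma, J, N):
    """Compute B_k @ v for B_k = gamma*I + J N^{-1} J^T without forming B_k.

    J is a tall n-by-r matrix and N a small r-by-r symmetric matrix, so the
    cost is O(n*r) plus one small r-by-r solve instead of O(n^2)."""
    t = J.T @ v                     # r-dimensional vector
    u = np.linalg.solve(N, t)       # small dense solve
    return gamma * v + J @ u

# toy usage with random factors
rng = np.random.default_rng(0)
n, r = 1000, 6
J = rng.standard_normal((n, r))
N = np.eye(r) + 0.1 * (J.T @ J) / n     # some nonsingular symmetric r-by-r matrix
v = rng.standard_normal(n)
Bv = compact_matvec(v, gamma=1.0, J=J, N=N)
```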
Background
In the context of the GMRES method, Walker [6] showed that a product of Householder transformations (an identity plus a rank-1 matrix) can be expressed as a compact matrix formula. This led to the derivation of an explicit matrix expression for the product of $k$ identity-plus-rank-1 matrices [5]. Specifically, for $V_i = I - \frac{1}{y_i^T s_i} y_i s_i^T$, and when $y_i^T s_i \ne 0$ for $0 \le i \le k-1$, the product of these rank-1 updates to the identity is

(1) $$ V_0 V_1 \cdots V_{k-1} = I - Y_k R_k^{-1} S_k^T, $$

where $S_k = [\, s_0 \ \cdots \ s_{k-1} \,]$, $Y_k = [\, y_0 \ \cdots \ y_{k-1} \,]$, and $R_k$ is the $k \times k$ upper triangular matrix with entries $(R_k)_{ij} = s_{i-1}^T y_{j-1}$ for $i \le j$ (these matrices are defined again below). The BFGS update can be expressed in terms of products of the $V_i$'s, which have this compact matrix formula; therefore, the BFGS recursion can exploit such block matrix representations.
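A short NumPy check of identity (1), using randomly generated pairs $s_i, y_i$ (with $y_i^T s_i \ne 0$) and the definitions given above, illustrates the structure; it is a sanity check only, not part of any production code:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 4
S = rng.standard_normal((n, k))      # columns s_0, ..., s_{k-1}
Y = rng.standard_normal((n, k))      # columns y_0, ..., y_{k-1}

# left-hand side: explicit product V_0 V_1 ... V_{k-1} with V_i = I - y_i s_i^T / (y_i^T s_i)
prod = np.eye(n)
for i in range(k):
    s, y = S[:, i], Y[:, i]
    prod = prod @ (np.eye(n) - np.outer(y, s) / (y @ s))

# right-hand side: compact form I - Y R^{-1} S^T with R = upper triangular part of S^T Y
R = np.triu(S.T @ Y)
compact = np.eye(n) - Y @ np.linalg.solve(R, S.T)

print(np.allclose(prod, compact))    # True up to rounding error
```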
Recursive quasi-Newton updates
A parametric family of quasi-Newton updates includes many of the best known formulas [7]. For arbitrary vectors $v_k$ and $c_k$ such that $v_k^T y_k \ne 0$ and $c_k^T s_k \ne 0$, general recursive update formulas for the inverse and direct Hessian estimates are
(2) $$ H_{k+1} = H_k + \frac{(s_k - H_k y_k)\, v_k^T + v_k\, (s_k - H_k y_k)^T}{v_k^T y_k} - \frac{(s_k - H_k y_k)^T y_k}{(v_k^T y_k)^2}\, v_k v_k^T $$

(3) $$ B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, c_k^T + c_k\, (y_k - B_k s_k)^T}{c_k^T s_k} - \frac{(y_k - B_k s_k)^T s_k}{(c_k^T s_k)^2}\, c_k c_k^T $$
By making specific choices for the parameter vectors $v_k$ and $c_k$, well known methods are recovered:

| Choice in the inverse update (2) | Method | Choice in the direct update (3) | Method |
| $v_k = s_k$ | BFGS | $c_k = s_k$ | PSB (Powell Symmetric Broyden) |
| | DFP | | |
| $v_k = s_k - H_k y_k$ | SR1 | $c_k = y_k - B_k s_k$ | SR1 |
| | [8] | | MSS (Multipoint Symmetric Secant) |
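The direct recursion (3) is simple to state in code. The sketch below applies one update step for a given parameter vector $c_k$; the choices listed in the table (for example $c_k = s_k$ for PSB and $c_k = y_k - B_k s_k$ for SR1) can be passed in directly. The function name is illustrative:

```python
import numpy as np

def direct_update(B, s, y, c):
    """One step of the general recursive direct update (3) with parameter vector c.

    Requires c^T s != 0.  Examples from the table above:
      c = s          -> PSB update
      c = y - B @ s  -> SR1 update
    """
    r = y - B @ s                    # residual of the secant condition
    cs = c @ s
    return (B + (np.outer(r, c) + np.outer(c, r)) / cs
              - (r @ s) / cs**2 * np.outer(c, c))

# example: one SR1 step starting from the identity
rng = np.random.default_rng(2)
n = 5
B = np.eye(n)
s, y = rng.standard_normal(n), rng.standard_normal(n)
B_new = direct_update(B, s, y, c=y - B @ s)
print(np.allclose(B_new @ s, y))     # the secant condition B_{k+1} s_k = y_k holds
```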
Compact Representations
Collecting the updating vectors of the recursive formulas into matrices, define

$$ S_k = [\, s_0 \ \ s_1 \ \ \cdots \ \ s_{k-1} \,], \qquad Y_k = [\, y_0 \ \ y_1 \ \ \cdots \ \ y_{k-1} \,], $$

the upper triangular matrix

$$ (R_k)_{ij} = s_{i-1}^T y_{j-1} \ \ \text{for } i \le j, \qquad (R_k)_{ij} = 0 \ \ \text{otherwise}, $$

the strictly lower triangular matrix

$$ (L_k)_{ij} = s_{i-1}^T y_{j-1} \ \ \text{for } i > j, \qquad (L_k)_{ij} = 0 \ \ \text{otherwise}, $$

and the diagonal matrix

$$ D_k = \operatorname{diag}\big( s_0^T y_0,\ \ldots,\ s_{k-1}^T y_{k-1} \big). $$
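In code, all of these quantities can be assembled from the stored pairs $(s_i, y_i)$ with a few dense operations on the small $k \times k$ matrix $S_k^T Y_k$. The following NumPy sketch (with an illustrative helper name) is one way to do it:

```python
import numpy as np

def compact_factors(s_list, y_list):
    """Assemble S_k, Y_k, R_k, L_k, D_k as defined above from the pairs (s_i, y_i)."""
    S = np.column_stack(s_list)        # n x k
    Y = np.column_stack(y_list)        # n x k
    SY = S.T @ Y                       # k x k matrix of inner products s_i^T y_j
    R = np.triu(SY)                    # upper triangular (diagonal included)
    L = np.tril(SY, k=-1)              # strictly lower triangular
    D = np.diag(np.diag(SY))           # diagonal of the s_i^T y_i
    return S, Y, R, L, D
```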
With these definitions the compact representations of well known quasi-Newton updates are given for each formula:
BFGS

Along with the SR1 representation, the BFGS (Broyden-Fletcher-Goldfarb-Shanno) compact representation was the first compact formula known [5]. In particular, the inverse representation is given by

$$ H_k = H_0 + \begin{bmatrix} S_k & H_0 Y_k \end{bmatrix} \begin{bmatrix} R_k^{-T} \left( D_k + Y_k^T H_0 Y_k \right) R_k^{-1} & -R_k^{-T} \\ -R_k^{-1} & 0 \end{bmatrix} \begin{bmatrix} S_k^T \\ Y_k^T H_0 \end{bmatrix}. $$

The direct Hessian approximation can be found by applying the Sherman-Morrison-Woodbury identity to the inverse Hessian:

$$ B_k = B_0 - \begin{bmatrix} B_0 S_k & Y_k \end{bmatrix} \begin{bmatrix} S_k^T B_0 S_k & L_k \\ L_k^T & -D_k \end{bmatrix}^{-1} \begin{bmatrix} S_k^T B_0 \\ Y_k^T \end{bmatrix}. $$
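A dense NumPy sketch of the inverse representation reads as follows; it forms $H_k$ explicitly, which is useful only for illustration and testing, since practical implementations keep the factors and never build the $n \times n$ matrix:

```python
import numpy as np

def bfgs_inverse_compact(H0, S, Y):
    """Evaluate the compact inverse BFGS representation above (dense, for illustration)."""
    SY = S.T @ Y
    R = np.triu(SY)                            # upper triangular factor R_k
    D = np.diag(np.diag(SY))                   # diagonal factor D_k
    HY = H0 @ Y
    R_inv = np.linalg.inv(R)                   # small k x k inverse
    M11 = R_inv.T @ (D + Y.T @ HY) @ R_inv
    M = np.block([[M11, -R_inv.T],
                  [-R_inv, np.zeros_like(M11)]])
    U = np.hstack([S, HY])
    return H0 + U @ M @ U.T
```

The direct form can be coded analogously from the second formula, with a single solve against the small $2k \times 2k$ middle matrix.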
SR1

The SR1 (Symmetric Rank-1) compact representation was first proposed in [5]. Using the definitions of $S_k$, $Y_k$, $R_k$, $L_k$ and $D_k$ from above, the inverse Hessian formula is given by

$$ H_k = H_0 + \left( S_k - H_0 Y_k \right) \left( R_k + R_k^T - D_k - Y_k^T H_0 Y_k \right)^{-1} \left( S_k - H_0 Y_k \right)^T. $$

The direct Hessian is obtained by the Sherman-Morrison-Woodbury identity and has the form

$$ B_k = B_0 + \left( Y_k - B_0 S_k \right) \left( D_k + L_k + L_k^T - S_k^T B_0 S_k \right)^{-1} \left( Y_k - B_0 S_k \right)^T. $$
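The direct SR1 formula translates into a particularly short sketch (again dense and purely illustrative):

```python
import numpy as np

def sr1_direct_compact(B0, S, Y):
    """Evaluate the compact direct SR1 representation above (dense, for illustration)."""
    SY = S.T @ Y
    L = np.tril(SY, k=-1)                      # strictly lower triangular factor L_k
    D = np.diag(np.diag(SY))                   # diagonal factor D_k
    W = Y - B0 @ S
    middle = D + L + L.T - S.T @ B0 @ S        # small k x k symmetric system
    return B0 + W @ np.linalg.solve(middle, W.T)
```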
MSS
The multipoint symmetric secant (MSS) method aims to satisfy multiple secant equations simultaneously. The recursive update formula was originally developed by Burdakov [9], and the compact representation for the direct Hessian was derived in [10].
The inverse representation can be obtained by applying the Sherman-Morrison-Woodbury identity.
DFP

Since the DFP (Davidon-Fletcher-Powell) update is the dual of the BFGS formula (i.e., the BFGS update with the roles of $B_k$ and $H_k$, and of the $s_i$'s and $y_i$'s, interchanged), the compact representation for DFP can be obtained immediately from the one for BFGS [11].
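In terms of the illustrative `bfgs_inverse_compact` sketch above, this duality amounts to swapping the inputs (a hypothetical one-liner that assumes that helper):

```python
# Direct DFP approximation B_k via duality: exchange B_0 <-> H_0 and S_k <-> Y_k
# in the compact inverse BFGS formula (uses the illustrative sketch above).
B_dfp = bfgs_inverse_compact(B0, Y, S)
```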
PSB
The PSB (Powell-Symmetric-Broyden) compact representation was developed for the direct Hessian approximation [12].
Reduced BFGS
The reduced compact representation (RCR) of BFGS is designed for linear equality constrained optimization,

$$ \min_x \ f(x) \quad \text{subject to} \quad Ax = b, $$

where the constraint matrix $A \in \mathbb{R}^{m \times n}$ with $m \le n$ is underdetermined. In addition to the matrices $S_k$ and $Y_k$, the RCR also stores the projections of the updating vectors onto the nullspace of $A$.

For the compact representation of the BFGS matrix with an initialization that is a multiple of the identity, $B_0 = \gamma I$, the (1,1) block of the inverse KKT matrix

$$ \begin{bmatrix} B_k & A^T \\ A & 0 \end{bmatrix}^{-1} $$

has a compact representation [13].
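The nullspace projections stored by the RCR can be computed without forming the projector explicitly. The sketch below projects a block of vectors onto the nullspace of $A$ (assuming $A$ has full row rank); exactly which vectors are projected and how the projections enter the representation is specified in [13]:

```python
import numpy as np

def project_onto_nullspace(A, V):
    """Apply P = I - A^T (A A^T)^{-1} A to the columns of V without building P.

    A: m x n constraint matrix with full row rank, V: n x k block of vectors."""
    return V - A.T @ np.linalg.solve(A @ A.T, A @ V)
```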
Limited Memory
The most common use of the compact representations is in the limited-memory setting, where $m$ denotes the memory parameter, with small typical values (for example, around $m = 5$; see e.g., [13][5]). Then, instead of storing the history of all vectors, one limits this to the $m$ most recent pairs $\{s_i, y_i\}$ (and, depending on the update, additional stored quantities). Further, the initialization is typically chosen as an adaptive multiple of the identity,

$$ B_0^{(k)} = \gamma_k I, $$

with, for instance, $\gamma_k = \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}}$ and, correspondingly, $H_0^{(k)} = \gamma_k^{-1} I$. Limited-memory methods are frequently used for large-scale problems with many variables (i.e., $n$ can be large), in which case the limited-memory matrices $S_k$ and $Y_k$ (and possibly further stored factors) are tall and very skinny: $S_k, Y_k \in \mathbb{R}^{n \times m}$ with $m \ll n$.
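A minimal sketch of the bookkeeping in the limited-memory setting keeps only the $m$ most recent pairs and recomputes the scaling $\gamma_k$ from the newest pair; the class name and the curvature safeguard $s^T y > 0$ are illustrative choices, not a prescription from the references:

```python
import numpy as np
from collections import deque

class LimitedMemoryHistory:
    """Store only the m most recent (s_i, y_i) pairs, discarding the oldest."""

    def __init__(self, m=5):
        self.pairs = deque(maxlen=m)

    def update(self, s, y):
        if s @ y > 0:                          # common safeguard: keep well-defined curvature pairs
            self.pairs.append((s, y))

    def factors(self):
        if not self.pairs:
            raise ValueError("no curvature pairs stored yet")
        S = np.column_stack([s for s, _ in self.pairs])    # n x m, tall and skinny
        Y = np.column_stack([y for _, y in self.pairs])
        s, y = self.pairs[-1]
        gamma = (y @ y) / (s @ y)              # adaptive multiple of the identity for B_0
        return S, Y, gamma
```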
Implementations
Open source implementations include:
- ACM TOMS algorithm 1030 implements an L-SR1 solver [14][15]
- R's general-purpose optimizer routine optim uses the L-BFGS-B method.
- SciPy's optimization module's minimize function also includes an option to use L-BFGS-B (a minimal usage sketch appears after this list).
- IPOPT, when run with first-order (gradient) information only
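For example, a minimal call to SciPy's L-BFGS-B option on the two-dimensional Rosenbrock function (the test function here is only an example) looks like:

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def rosenbrock_grad(x):
    return np.array([
        -400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0]**2),
    ])

result = minimize(rosenbrock, x0=np.array([-1.2, 1.0]), jac=rosenbrock_grad,
                  method="L-BFGS-B", bounds=[(-2.0, 2.0), (-2.0, 2.0)])
print(result.x)   # close to the minimizer [1.0, 1.0]
```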
Non-open-source implementations include:
- Artelys Knitro nonlinear programming (NLP) solvers use compact quasi-Newton matrices [2]
- L-BFGS-B (ACM TOMS algorithm 778)[16]
Works cited
- ^ Nocedal, J.; Wright, S. J. (2006). Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer New York, NY. doi:10.1007/978-0-387-40065-5. ISBN 978-0-387-30303-1.
- ^ a b Byrd, R. H.; Nocedal, J.; Waltz, R. A. (2006). "KNITRO: An integrated package for nonlinear optimization". In Di Pillo, G.; Roma, M. (eds.). Large-Scale Nonlinear Optimization. Nonconvex Optimization and Its Applications. Vol. 83. Springer, Boston, MA. pp. 35–59. doi:10.1007/0-387-30065-1_4. ISBN 978-0-387-30063-4.
- ^ Zhu, C.; Byrd, R. H.; Lu, P.; Nocedal, J. (1997). "Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization". ACM Transactions on Mathematical Software. 23 (4): 550–560. doi:10.1145/279232.279236.
- ^ Wächter, A.; Biegler, L. T. (2006). "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming". Mathematical Programming. 106: 25-57. doi:10.1007/s10107-004-0559-y.
- ^ a b c d e Byrd, R. H.; Nocedal, J.; Schnabel, R. B. (1994). "Representations of Quasi-Newton Matrices and their use in Limited Memory Methods". Mathematical Programming. 63 (4): 129–156. doi:10.1007/BF01582063. S2CID 5581219.
- ^ Walker, H. F. (1988). "Implementation of the GMRES Method Using Householder Transformations". SIAM Journal on Scientific and Statistical Computing. 9 (1): 152–163. doi:10.1137/0909010.
- ^ Dennis, Jr., J. E.; Moré, J. J. (1977). "Quasi-Newton methods, motivation and theory". SIAM Review. 19 (1): 46–89. doi:10.1137/1019005. hdl:1813/6056.
- ^
- ^ Burdakov, O. P. (1983). "Methods of the secant type for systems of equations with symmetric Jacobian matrix". Numerical Functional Analysis and Optimization. 6 (2): 1–18. doi:10.1080/01630568308816160.
- ^ Burdakov, O. P.; Martínez, J. M.; Pilotta, E. A. (2002). "A limited-memory multipoint symmetric secant method for bound constrained optimization". Annals of Operations Research. 117: 51–70. doi:10.1023/A:1021561204463.
- ^ Erway, J. B.; Jain, V.; Marcia, R. F. (2013). Shifted limited-memory DFP systems. In 2013 Asilomar Conference on Signals, Systems and Computers. IEEE. pp. 1033–1037.
- ^ Kanzow, C.; Steck, D. (2023). "Regularization of limited memory quasi-Newton methods for large-scale nonconvex minimization". Mathematical Programming Computation. 15 (3): 417–444. doi:10.1007/s12532-023-00238-4.
- ^ a b Brust, J. J; Marcia, R.F.; Petra, C.G.; Saunders, M. A. (2022). "Large-scale optimization with linear equality constraints using reduced compact representation". SIAM Journal on Scientific Computing. 44 (1): A103–A127. arXiv:2101.11048. Bibcode:2022SJSC...44A.103B. doi:10.1137/21M1393819.
- ^ "Collected Algorithms of the ACM (CALGO)". calgo.acm.org.
- ^ "TOMS Alg. 1030". calgo.acm.org/1030.zip.
- ^ Zhu, C.; Byrd, Richard H.; Lu, Peihuang; Nocedal, Jorge (1997). "L-BFGS-B: Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization". ACM Transactions on Mathematical Software. 23 (4): 550–560. doi:10.1145/279232.279236. S2CID 207228122.