Wikipedia:Reference desk/Archives/Mathematics/2011 February 19
Mathematics desk | ||
---|---|---|
< February 18 | << Jan | February | Mar >> | February 20 > |
Welcome to the Wikipedia Mathematics Reference Desk Archives |
---|
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages. |
February 19
editis the BFGS formula right?
editI have a question about the BFGS method page. It is my understanding that when the objective function takes k arguments, then xk and Bk will converge to the optimum and hessian in k steps, so long as that hessian is constant. It is also my understand that a multivariate normal has constant Hessian. Knowing (?) this, I implemented the BFGS method and was surprised that neither happened. As for xk, this is certainly possible since the line search for the last step needs to find the actual optimum for the method to converge. But the search direction did not even point directly at the optimum... because Bk did not converge. Thanks in advance for any help. 68.239.122.222 (talk) 20:03, 19 February 2011 (UTC)
- When debugging a new implementation of an optimization algorithm, a few common errors arise. Here are some things you should check:
- Did your method correctly compute the gradient? Are you relying on any analytic solutions to estimate the gradient, and does that approximation depend on any implicit assumptions?
- Is your implementation updating the variables in the correct order? If not, you are using values from last step, not the current step.
- Are you using double precision math - particularly for your gradient? Many numerical algorithms are sensitive to error and single precision math is not good enough.
- Can you correctly find the minimum solution for a simple objective function, say a one-dimensional parabola or a simple parabolic bowl?
- I will just point out that if your objective function is pathological, even a solid conjugate-gradient or BFGS method may not successfully find the optimum solution. In those complex cases, numerical preconditioning or variable-substitution may work; or a statistical method can be used to average the results of several (non-converging) runs that use your direct solutions. Try on a simple objective function first; worry about pathological nonlinear optimization after you are sure you can trust your code. Nimur (talk) 20:16, 19 February 2011 (UTC)
- Thanks for the suggestions. I have checked all of these. (1) The multivariate normal is not pathological (and my code does find the max, but slower than I would expect); (2) I use a analytic formula from a JASA paper; (3) I verified the gradient using a numerical gradient package; (4) I am using IEEE doubles for every computation, the problems you mention also arise when doing and outer product of y and itself. —Preceding unsigned comment added by 151.200.185.212 (talk) 20:54, 19 February 2011 (UTC)
- So, your only problem is that the Hessian does not converge, even though xk does converge to the optimum?
- Question. If you override your estimated Hessian with an identity matrix, you effectively convert BFGS into steepest descent method. Does that cause xk to converge? As a debug approach, print out the residual of the Hessian (either the full matrix, or just the sum of elements). | Bk - 1 | should be close to zero - so let's see if your Hessian estimate is far from where it should be. Nimur (talk) 21:51, 19 February 2011 (UTC)
- This is an interesting idea, thanks for proposing it. However, I now think I know the reason now. The function I am maximizing the is the log likelihood, not the likelihood. So I guess, now I wonder if there is a way to convert the Hessian of log(f) to the Hessian of f. But, I'll bet with a pencil and paper I can get the answer to that myself. 151.200.127.248 (talk) 01:20, 20 February 2011 (UTC)
- Thanks for the suggestions. I have checked all of these. (1) The multivariate normal is not pathological (and my code does find the max, but slower than I would expect); (2) I use a analytic formula from a JASA paper; (3) I verified the gradient using a numerical gradient package; (4) I am using IEEE doubles for every computation, the problems you mention also arise when doing and outer product of y and itself. —Preceding unsigned comment added by 151.200.185.212 (talk) 20:54, 19 February 2011 (UTC)
Weierstrass substitution
editHas anyone tabulated the variety of different forms the answer can take when one finds the antiderivative of a rational function of sine and cosines by using the Weierstrass substitution, along with the sorts of inputs that result in each sort of output? It seems that in some simple cases the answer is a logarithm or an arctangent of a rational function of sines and cosines, and in some you get things like the logarithm of the square root of 2 times an arctangent, and in some you get things still messier. Textbooks don't seem to tell you that the more exotic-looking ones exist. Michael Hardy (talk) 23:34, 19 February 2011 (UTC)
- Perhaps it's worth noting that the Weierstrass substitution is reversible, with the inverse substitution being
- (up to conventions). So any rational function of t can be expressed as a rational function of trigonometric functions, and so to answer your question, it's enough to know the integrals of rational functions. Sławomir Biały (talk) 13:19, 20 February 2011 (UTC)
You seem to have missed the point of the question. Reducing integrals of rational functions of sines and cosines to integrals of rational functions of the variable with respect to which one is integrating is of course the whole point of the substitution. That doesn't depend on reversibility. And your "inverse substitution" is of course usually expressed as
Michael Hardy (talk) 20:47, 20 February 2011 (UTC)
An example:
Applying the substitutions, we get
We decompose this into partial fractions:
Do you see things like
in textbooks? Michael Hardy (talk) 20:56, 20 February 2011 (UTC)
- I think I also saw something like an iterated logarithm in an example I looked at. Michael Hardy (talk) 21:40, 20 February 2011 (UTC)
- I understand that the point of the substitution is to go from rational functions of sin x and cos x to rational functions of t. My point is that it is possible to go the other way, namely to go from any rational function of t to a rational function of sin x and cos x (that is, any rational function of t whatsoever). Now, I interpret your question as asking what are the possible kinds of outputs that can result from a Weierstrass substitution. My response is that there is no constraint: anything in the List of integrals of rational functions can occur, by the reversibility of the substitution. For instance, there you will see all of the inverse trigonometric functions. Sławomir Biały (talk) 21:00, 20 February 2011 (UTC)
OK, so you're saying there is no rational function that cannot arise via this substitution from some rational function of sines and cosines, and that that follows from the invertibility of the thing. I wonder how that takes into account the factor of 2 dt/(1 + t2). I will have to think about that one.
But there's another question: After you've got your antiderivative of a rational function, and you plug in tan(x/2) in place of t wherever it appears, what sort of variety of different sorts of functions will you see, and which original rational functions of sines and cosines will result in each one? Michael Hardy (talk) 01:16, 21 February 2011 (UTC)
- Alright, obvious point: The mapping ƒ(t) ↦ ƒ(t) · 2/(1 + t2) from rational functions to rational functions is also bijective. So there isn't any rational function of t that can't emerge from this substitution. But that still leaves us with work to do to answer the actual question. Michael Hardy (talk) 02:11, 21 February 2011 (UTC)
- From the List of integrals of rational functions,
- Any rational function can be integrated using the above equations and partial fractions in integration, by decomposing the rational function into a sum of functions of the form:
- Any rational function can be integrated using the above equations and partial fractions in integration, by decomposing the rational function into a sum of functions of the form:
- Try different a,b,c,d,e,f and n, and substitute tan back, and you would get your variety of functions. (Igny (talk) 05:22, 22 February 2011 (UTC))
- From the List of integrals of rational functions,
- Alright, obvious point: The mapping ƒ(t) ↦ ƒ(t) · 2/(1 + t2) from rational functions to rational functions is also bijective. So there isn't any rational function of t that can't emerge from this substitution. But that still leaves us with work to do to answer the actual question. Michael Hardy (talk) 02:11, 21 February 2011 (UTC)