Wikipedia:Reference desk/Archives/Mathematics/2010 June 30
Mathematics desk
< June 29 | << May | June | Jul >> | July 1 >
Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.
June 30
Admissible decision rule pdf
On the page admissible decision rule, how is pdf F(x|θ) derived?--Alphador (talk) 00:09, 30 June 2010 (UTC)
- Do you really mean PDF? I think you might mean LaTeX. To use LaTeX you need to tell Wikipedia, so you use <math> at the beginning and </math> at the end. Whatever comes inside must be LaTeX language. (See the TeX and LaTeX articles.) You write <math> F(x | \theta) \, </math> to get the formula rendered in the full LaTeX font. The \, at the end is what makes Wikipedia produce the full LaTeX font; without it you would get the plainer HTML rendering. Not all mathematical formulas need the \, to make them render in LaTeX font, e.g. <math>x_1^2</math> gives the LaTeX-rendered form straight away. Just experiment and see. Although what you see might depend on how your preferences have been set up on Wikipedia. •• Fly by Night (talk) 21:07, 30 June 2010 (UTC)
- I think the OP means probability density function rather than portable document format.→86.132.163.252 (talk) 21:33, 30 June 2010 (UTC)
Inverse scaling
I'm sure this is an incredibly simple problem. I have two identical squares. When the first square is scaled down 43% (any arbitrary amount), the second square copies the first square, so it also reduces 43%. By what percentage would I have to scale up the 2nd square to make it appear to keep its original scale (even though it's being reduced 43%)? In other words, by what amount do I need to scale up the 43% to get it back to 100%? It's easy to figure out for 10%, 25%, 50% -- (scale by 1000%, 400%, and 200% respectively). But my brain isn't working for arbitrary amounts. --70.130.58.45 (talk) 01:26, 30 June 2010 (UTC)
- From your 10% / 1000% answer, I assume that you are scaling down to 43% and not by 43%. So, you want to know x where 0.43 x = 1? Try x = 1 / 0.43 ≅ 2.3256 = 232.56%. -- 58.147.53.253 (talk) 01:37, 30 June 2010 (UTC)
- Exactly! So I multiply the current scale by the inverse of the original scale percentage? —Preceding unsigned comment added by 70.167.58.6 (talk) 17:52, 30 June 2010 (UTC)
- It appears to me that you are asking what percent you've reduced when you've reduced by 43% twice (so you can increase by the inverse percent and get back to the original size). If the original size is X and you reduce by 43%, the new size is .43*X. If you reduce again, the new size is .43*.43*X which is 0.18*X. So, divide by 0.18 and you get (0.18*X)/0.18 = X. If your scale amount is Y, you just divide by Y squared. -- kainaw™ 01:39, 30 June 2010 (UTC)
- They probably mean they stuck 43% in a box to make it that relative size. To scale up again you need 100*100/43, which Google says is 232.55814, or about 233%. Dmcq (talk) 18:00, 30 June 2010 (UTC)
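Just to illustrate the arithmetic in the replies above, here is a minimal Python sketch of the "multiply by the reciprocal of the scale factor" rule; the function name is only for illustration.

```python
# Minimal sketch: undo a scale-down to s% by scaling up by (100/s) * 100%.
def restore_percentage(scaled_to_percent):
    """Percentage to scale back up by, after scaling down to scaled_to_percent."""
    return 100.0 * 100.0 / scaled_to_percent

for s in (43, 10, 25, 50):
    print(s, "->", round(restore_percentage(s), 2))
# 43 -> 232.56, 10 -> 1000.0, 25 -> 400.0, 50 -> 200.0
```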
nth power...
I am trying to solve a = (b+c)^n for b. Can't figure out where to even start with this, any hints please?--Dacium (talk) 02:03, 30 June 2010 (UTC)
- Take the nth root of both sides. --Tango (talk) 02:07, 30 June 2010 (UTC)
- duh... it's one of those days where my brain isn't working :) a^(1/n)-c=b --Dacium (talk) 02:11, 30 June 2010 (UTC)
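For completeness, a quick numerical check of the rearrangement above, using made-up values for a, c and n:

```python
# Check that b = a**(1/n) - c inverts a = (b + c)**n (for positive a).
a, c, n = 32.0, 1.5, 5
b = a ** (1.0 / n) - c
print(b)              # ~0.5
print((b + c) ** n)   # recovers ~32.0
```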
Given 700 possible participants, how many need be surveyed for a useful random sample?
I need to survey a 700-strong workforce. However, if possible, I'd like to survey a randomized sub-set of that group. Is 700 potential respondents a large enough pool to allow for randomized sampling? How large would the random sample need to be? Or will I have to survey everyone to get meaningful data? 218.25.32.210 (talk) 02:50, 30 June 2010 (UTC)
- See Sample size#Estimating proportions. For a sufficiently large population, the population size ceases to be a significant factor and the sample sizes required for given margins of error are constant (several examples are given at the bottom of that section). However, I can't remember how large is generally considered sufficient - I suspect 700 is too small. That doesn't mean you can't use a random sample, you certainly can, it's just more difficult to calculate the sample size required for a desired margin of error. This site will calculate it for you, though. (I make no guarantees of its accuracy, but it had the most official looking URL of the dozens of calculators Google found me.) --Tango (talk) 03:03, 30 June 2010 (UTC)
- It depends on how sure you want to be and how definite the disparity is. I was going to mention sequential sampling as a way of cutting down the number if you're interviewing them, but there seems to be no article on that; it just links to Milton Friedman. Dmcq (talk) 10:15, 30 June 2010 (UTC)
- I've moved the sequential sampling redirect to point to sequential analysis, which seems a bit more useful and less surprising. Qwfp (talk) 11:42, 30 June 2010 (UTC)
- Anyway I better point you at the Statistical survey page in case you've missed that out. Dmcq (talk) 10:28, 30 June 2010 (UTC)
It also depends on how much information you're trying to get. If it's a binary thing, i.e. each person answers "yes" or "no", then that stuff about proportions should cover it. But not if you've got 10 multiple-choice questions. Michael Hardy (talk) 23:05, 30 June 2010 (UTC)
- I agree with all of the above. Some of the main things you want to consider are:
- What do you want to measure? Is it a binary value or does it have more possible values?
- If it's a binary value/proportion, do you have an idea of how prevalent it is? (You need a smaller sample size to measure something with a prevalence close to 50% than something close to 0%).
- Do you want to look at subpopulations as well as the population as a whole? What kind of analysis do you want to do?
- Answering some of these should help get a better idea of what kind of sample size you need. Confusing Manifestation(Say hi!) 23:53, 30 June 2010 (UTC)
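As a concrete illustration of the points above, here is a sketch of one standard way to size a sample for estimating a proportion (Cochran's formula with a finite-population correction). It assumes a single yes/no question, 95% confidence and the worst case p = 0.5; it is not necessarily what the calculator linked above uses.

```python
import math

def sample_size(population, margin_of_error=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion, with finite-population correction."""
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2     # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))   # correct for finite N

print(sample_size(700))        # about 249 of the 700 for a +/-5% margin of error
print(sample_size(700, 0.03))  # a tighter margin needs a larger sample (~423)
```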
Weighted coins and prior
If you have 1000 coins (if it will make for a rounder answer you can read 1024, I don't know how it will work), all but one of which are fair, but one of which is weighted so heavily for heads that it comes up heads 100% of the time, and from this pile you pick one coin at random, flip it ten times, and get ten heads, then given the prior probability (1 in 1000 or 1024 for the weighted coin) and your experimental results (only one in every 1024 experiments comes up ten heads in a row for a fair coin), what are the chances you have the weighted coin? 50/50? This is not homework. 84.153.209.237 (talk) 20:20, 30 June 2010 (UTC)
- Bayes' theorem says:
- <math>P(W \mid 10H) = \frac{P(10H \mid W)\,P(W)}{P(10H \mid W)\,P(W) + P(10H \mid F)\,P(F)}</math>
- where W is "picked the weighted coin", F is "picked a fair coin" and 10H is "ten heads in a row". Now, <math>P(10H \mid W) = 1</math>, <math>P(W) = \tfrac{1}{1000}</math>, <math>P(10H \mid F) = \tfrac{1}{1024}</math> and <math>P(F) = \tfrac{999}{1000}</math>.
- If we plug those numbers in, we get 0.5062 (to 4dp). If we went with 1024 coins, it would be 0.5002 (4dp). --Tango (talk) 20:40, 30 June 2010 (UTC)
- It would be 50/50 for 1025 coins (1024 fair ones and the weighted one). You'd be 1024 times more likely to pick a fair coin, but that's balanced by the fact that the chance of getting all heads with a fair coin is only 1/1024. Rckrone (talk) 20:44, 30 June 2010 (UTC)
- In other words, the odds of picking the biased coin need to be 1:1024, rather than the probability. In general, Bayesian inference is easiest to do with odds. The initial odds are 1:1024; getting heads is twice as likely with a biased coin as with a fair coin, so the likelihood ratio for each toss is 2:1, and for 10 tosses it's 1024:1; you multiply the two together, and you get 1024:1024, which is 1:1 odds or 50% probability. If you then get 10 more heads the odds go all the way to 1024:1, which is roughly 99.9% probability. -- Meni Rosenfeld (talk) 04:53, 1 July 2010 (UTC)
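A short sketch that reproduces the numbers in this thread using exact fractions; the function and its name are purely illustrative.

```python
from fractions import Fraction

def posterior_weighted(n_fair, n_heads):
    """P(picked the always-heads coin | n_heads heads in a row), given
    n_fair fair coins plus the single weighted coin in the pile."""
    prior_w = Fraction(1, n_fair + 1)
    prior_f = Fraction(n_fair, n_fair + 1)
    like_w = Fraction(1)                    # weighted coin always lands heads
    like_f = Fraction(1, 2) ** n_heads      # fair coin: (1/2)**n_heads
    return like_w * prior_w / (like_w * prior_w + like_f * prior_f)

print(float(posterior_weighted(999, 10)))    # ~0.5062 (1000 coins)
print(float(posterior_weighted(1023, 10)))   # ~0.5002 (1024 coins)
print(posterior_weighted(1024, 10))          # exactly 1/2 (1025 coins)
print(float(posterior_weighted(1024, 20)))   # ~0.999 after ten more heads, as in the odds argument above
```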
Homogeneous Continuous-time Markov chains - analytical expressions for the finite time transition probabilities?
I am new to Markov chains, and I have run into a real-world problem which I have realized is equivalent to a homogeneous continuous-time Markov chain with a finite number n of states, where I have knowledge of the n×n constant transition rate matrix Q.
The problem is that I know how to write down the transition probability matrix in differential time, but I would like the ability to compute it (preferably analytically) "integrated" over a finite time, where the time intervals between discrete samplings of the state are non-constant.
An equivalent problem to the one I have would be to consider an ant looking for food on a blank sheet of paper, with no sensory input giving any direction to where the food is. For simplicity we could say that the ant's motion has three states, "Turn left" (L), "Continue straight ahead" (C) and "Turn right" (R), and that it is doing a random walk by making random transitions between those states. We assume that we know the characteristic times for the ant to switch from one state to another, given by the "transition times" <math>\tau_{CL}</math> (the characteristic time to switch from the C to the L state), <math>\tau_{CR}</math>, <math>\tau_{LC}</math>, <math>\tau_{LR}</math>, <math>\tau_{RC}</math> and <math>\tau_{RL}</math>. If we know those (I do, roughly at least), and we know the probabilities for the ant to be in those states at time t, we can calculate the new probabilities at time t + dt, where dt is an infinitesimal time period, as the matrix product
<math>p(t + dt) = p(t)\,M(dt),</math>
where <math>p(t)</math> is the row vector of state probabilities. The matrix
<math>M(dt) = I + Q\,dt</math>
is infinitesimally close to being an identity matrix (its off-diagonal entries are of the form <math>dt/\tau</math>, and its diagonal entries keep each row summing to one).
Since the transition times are constant (homogeneous Markov chain) I can also write an expression for the probability k infinitesimal time steps ahead in time by raising this matrix to the k-th power:
<math>p(t + k\,dt) = p(t)\,M(dt)^{k}.</math>
Now I would like to find the transition probability matrix P for a finite time step τ = k·dt. The result is:
<math>P(\tau) = \lim_{k \to \infty}\left(I + \frac{\tau}{k}\,Q\right)^{k}.</math>
So I need to go to a limit where I multiply a matrix which is infinitesimally close to the identity matrix an infinite number of times with itself to yield a non-trivial finite result. I cannot figure out how to do that, and I could not find any articles here or on Wikibooks or Wikiversity which could help me solve that problem. Now, I am sure this is a recurring problem for continuous-time Markov chains, and I bet there is a smart way to do it, but how? Any hints?
Thanks in advance --Slaunger (talk) 23:21, 30 June 2010 (UTC)
- I transferred this question from the Science Reference Desk. Dolphin (t) 00:34, 1 July 2010 (UTC)
- It is called the exponential function of the matrix. Bo Jacoby (talk) 00:36, 1 July 2010 (UTC).
- The matrix exponential article is likely of more use. JackSchmidt (talk) 01:03, 1 July 2010 (UTC)
- Specifically, you first write the transition rate matrix Q:
- <math>Q = \begin{pmatrix} -(q_{LC}+q_{LR}) & q_{LC} & q_{LR} \\ q_{CL} & -(q_{CL}+q_{CR}) & q_{CR} \\ q_{RL} & q_{RC} & -(q_{RL}+q_{RC}) \end{pmatrix}</math>
- where <math>q_{LC} = 1/\tau_{LC}</math> is the transition rate from L to C, etc. Then the transition probability matrix over time t is
- <math>P(t) = e^{tQ}.</math>
- Remember that conventionally, the i, j element of a transition matrix represents the transition from i to j, from which it follows that the matrix multiplies from the right: <math>p(t+s) = p(t)\,P(s)</math>, where <math>p(t)</math> is the probabilities row vector at time t. -- Meni Rosenfeld (talk) 05:28, 1 July 2010 (UTC)
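To make this concrete, here is a sketch using scipy.linalg.expm (the SciPy routine mentioned further down in the thread). The 3×3 rate matrix is made up purely for illustration, with each row summing to zero as required.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative rate matrix Q for states (L, C, R); Q[i, j] is the rate from
# state i to state j, and each diagonal entry makes its row sum to zero.
Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.5, -2.5,  1.0],
              [ 1.0,  2.0, -3.0]])

def transition_matrix(Q, t):
    """Finite-time transition probabilities P(t) = exp(t*Q)."""
    return expm(t * Q)

p0 = np.array([1.0, 0.0, 0.0])                 # start in state L (row vector)
for t in (0.1, 1.0, 10.0):
    print(t, p0 @ transition_matrix(Q, t))     # p(t) = p(0) P(t), multiplying from the right
```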
- Thank you Dolphin for moving my question to the appropriate Reference Desk (which I was not aware existed, but what a treasure!), thank you Bo Jacoby for identifying the obvious, that this has to do with the exponential function, thank you JackSchmidt for pointing out the thorough and relevant article about the matrix exponential (which will keep me occupied for some hours grasping it), and finally thank you Meni for spelling out the transition rate matrix that goes into the matrix differential equation, and the notation. The combined help from you all is greatly appreciated, and I am no longer helplessly stuck with my real-world problem! Now I just need to figure out if my matrix obeys some simplifying criteria which allow me to do the matrix exponential in some smart way, but I think I can figure that out myself. --Slaunger (talk) 07:14, 1 July 2010 (UTC)
- His name is JackSchmidt, not "JackSmith". PST 10:08, 1 July 2010 (UTC)
- Thank you for pointing this careless misspelling out. It is now corrected. --Slaunger (talk) 11:46, 1 July 2010 (UTC)
- A supplementary question. In my problem I need the capability to evaluate <math>e^{tQ}</math> for many different, randomly distributed values of t without too much computational effort. I have no problem spending time on precomputations, as Q is a constant in my case. Now, this would have been straightforward if Q had been diagonalizable, but with the examples I have tried out I find a zero eigenvalue. (BTW, can anyone explain to me why it is not diagonalizable, it has to do with degeneracy, right?) Had it been diagonalizable I could have precomputed the eigenvalues and eigenvectors. As a rusty physicist I must admit that I am a little bit blown away by the sections in matrix exponential which deal with decomposing the matrix into either commuting diagonalizable and nilpotent matrices (Jordan-Chevalley decomposition) or the Jordan form. I am wondering if there is a smart precomputational path through those tricks to evaluate it more easily. Alternatively, I have found in my SciPy library the scipy.linalg.expm function, which does a brute-force evaluation using the Padé approximant, and this works fine. My final production language is more low-level, though, and only has access to basic mathematical functions. But what I could do is precompute <math>e^{tQ}</math> for a preselected set of sampled times externally, make a hard-coded look-up table in the low-level language, and then interpolate at run time to get a fair approximation to the exact result. I have noticed that for my examples the elements of <math>e^{tQ}</math> vary smoothly with t, so that should be doable. I would rather avoid the look-up, though, if I can. Any advice? --Slaunger (talk) 10:52, 2 July 2010 (UTC)
- The simplest way is to post your matrix Q here. We will find its Jordan decomposition and translate it into something you can easily evaluate.
- Another way is to approximate the answer (to arbitrary accuracy) in much the same way you could when evaluating the real-valued exponential function. For small <math>t</math> you have <math>e^{tQ} \approx I + tQ</math>, and for any n you have <math>e^{tQ} = \left(e^{tQ/n}\right)^{n}</math>. So given t, you can take <math>n = 2^{k}</math> and use <math>e^{tQ} \approx \left(I + \tfrac{t}{2^{k}}Q\right)^{2^{k}}</math> (which requires only k multiplications, by squaring). For higher accuracy you can use a better base case, etc.
- You can also fix a small step <math>h</math> and cache the values <math>\left(e^{hQ}\right)^{2^{k}}</math> for various k. Then, given t, you let <math>e^{tQ} \approx \left(e^{hQ}\right)^{n}</math> where n is chosen so that <math>t - nh</math> is small, and the exponentiation is done by multiplying the cached values corresponding to the binary expansion of n.
- Note: A transition rate matrix will always have an eigenvalue of 0, since all its rows sum to 0. This by itself has no bearing on whether it is diagonalizable. -- Meni Rosenfeld (talk) 12:11, 2 July 2010 (UTC)
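Here is a sketch of the repeated-squaring approximation described above, checked against SciPy's expm; the rate matrix is the same illustrative example as before and the choice k = 20 is arbitrary.

```python
import numpy as np
from scipy.linalg import expm

def expm_by_squaring(Q, t, k=20):
    """Approximate exp(t*Q) as (I + (t/2**k) * Q) raised to the 2**k power."""
    M = np.eye(Q.shape[0]) + (t / 2.0 ** k) * Q   # first-order base case for a tiny step
    for _ in range(k):
        M = M @ M                                 # each squaring doubles the time step
    return M

Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.5, -2.5,  1.0],
              [ 1.0,  2.0, -3.0]])
t = 2.0
print(np.max(np.abs(expm_by_squaring(Q, t) - expm(t * Q))))
# small residual; the truncation error shrinks as k grows, until round-off dominates
```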
- Thank you for your generous offer to do a Jordan decomposition. The thing is that I will have several rate matrices in my real problem, and they will need some fine-tuning along the road, so I think it is best that I learn how to do it myself:-) The first thing is that I should go back to the diagonalization path, as it was just my confused, untrained linear-algebra brain which made me stop when I noticed an eigenvalue of zero. One feeling I have about your other solutions (which I do not have the wits to substantiate with any good arguments) concerns the numerical stability/susceptibility to round-off errors of the partitioning into smaller time steps. That is of course something I could investigate by comparing with the more elaborate but numerically expensive methods. Anyway, I now have loads of stuff to work on, which will keep me occupied for some time. The cool thing is that I am learning a lot of new stuff along the road, and that is fun! --Slaunger (talk) 12:43, 2 July 2010 (UTC)
- Knowing the general structure of a rate matrix, can something sensible be said about the conditions it needs to fulfill to be diagonalizable? --Slaunger (talk) 12:53, 2 July 2010 (UTC)
- Numerical robustness is indeed a potential issue. Because of this, if you want very high precision it is better to add terms to the Taylor expansion than to make the base time step too small.
- Having as many distinct eigenvalues as the dimension of the matrix is a sufficient condition for being diagonalizable, as is being symmetric. Other than that I don't know any simple rule (of course, there are non-simple rules like "all eigenvalues have equal algebraic and geometric multiplicities"). A rate matrix is fairly general, so I don't think it has special rules.
- For a "random" matrix the eigenvalues will be distinct, so your matrix will be diagonalizable unless it specifically isn't. -- Meni Rosenfeld (talk) 15:22, 2 July 2010 (UTC)
- Why discuss diagonalization? Why not just precompute and cache the powers I, Q, QQ, QQQ, . . . and compute <math>e^{tQ} = \sum_k \frac{t^k}{k!} Q^k</math>? Bo Jacoby (talk) 16:01, 2 July 2010 (UTC).
- That works very poorly when t is large. And because diagonalization is the right way to do it. -- Meni Rosenfeld (talk) 16:14, 2 July 2010 (UTC)
- How many states will you be transitioning between? If it is 3, then yes, diagonalize or use the Cayley-Hamilton theorem and the Taylor series to get a reasonable closed form. If it is 30 or larger, then you are running into some very serious numerical problems. Finding eigenvalues of "large" transition matrices is very hard, and is often very poorly conditioned. Finding eigenvalues is a pre-requisite (or consequence) of doing the diagonalization. Transition matrices are typical examples of "this eigenvalue algorithm fails drastically" because it cannot adequately handle clusters. However, the problems don't really appear in the 3x3 case, so if it really is only three states, then you should be fine. JackSchmidt (talk) 18:13, 2 July 2010 (UTC)
- Hi JackSchmidt, it is a very relevant question. Currently I anticipate (and that is also what I see other people do in the literature for the kind of problem I am trying to address) that I will be working with three states. I may need to go to four or five states, but that will be the absolute maximum. The rates between states will be the same to within 1-2 orders of magnitude, and all states will be either connected directly or via one intermediate, more probable state, where the rates to and from that one are both sizeable. And yes, after having looked more into it, the matrices seem so "normal" that they are always diagonalizable. And now I also realize how crucial it is that one eigenvalue is zero. It has to be, as this eigenvalue gives rise to the steady state which is reached for times much larger than any of the typical timescales in the rates. It seems to me, for the examples that I have tried (I perceive myself as an experimental mathematician, as I am not good enough to be a real one), that the other eigenvalues are always negative, which also makes sense, as all other terms should be damped away to reach the steady state. So I will definitely go for diagonalization this time, but I am now aware of other powerful tools should I ever encounter larger and more ill-conditioned continuous-time Markov chain problems. --Slaunger (talk) 19:04, 2 July 2010 (UTC)
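A quick numerical check of the structure described above, for an illustrative rate matrix: one eigenvalue is zero, its left eigenvector (normalized to sum to one) is the steady-state distribution, and the remaining eigenvalues have negative real part.

```python
import numpy as np

Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.5, -2.5,  1.0],
              [ 1.0,  2.0, -3.0]])

# Left eigenvectors of Q are ordinary eigenvectors of Q transposed.
w, V = np.linalg.eig(Q.T)
print(np.round(w, 6))            # one eigenvalue is 0; the others have negative real part

i = np.argmin(np.abs(w))         # index of the (numerically) zero eigenvalue
pi = np.real(V[:, i])
pi = pi / pi.sum()               # normalize to a probability distribution
print(pi, pi @ Q)                # stationary distribution; pi @ Q is ~0
```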
- The eigenvalues of a square matrix are the roots of the characteristic polynomial of the matrix. As the Durand-Kerner method for finding all roots of a polynomial was not widely known, ad hoc eigenvalue algorithms were developed. High precision arithmetic is needed for computing roots, otherwise the problem is dubbed ill-conditioned. Bo Jacoby (talk) 20:26, 2 July 2010 (UTC).
- That's silly. Finding the roots of a polynomial is the easy part. The hard part is finding the polynomial. Advanced eigenvalue algorithms were developed to deal with the numerical stability issues in large matrices. Also, solving the characteristic polynomial only gives you the eigenvalues, not the eigenvectors. -- Meni Rosenfeld (talk) 19:07, 3 July 2010 (UTC)
- That is strange. How can it be hard to compute the characteristic polynomial <math>\det(A - \lambda I)</math>, the coefficients being rational functions of the matrix elements? (But high-precision arithmetic is required in order not to lose data). The roots, however, are irrational functions of the coefficients, and so their evaluation requires iteration. The eigenvectors are rational functions of matrix elements and eigenvalues. Bo Jacoby (talk) 22:25, 3 July 2010 (UTC).
- Being irrational and requiring iterations is the least of our problems. Our problem is that when you do many arithmetical operations, numerical error is compounded, especially where there is subtraction of numbers similar in magnitude. If you're using a naive algorithm you need extremely high precision to work around this.
- I've made the following experiment. I generated a random symmetric 100x100 matrix with 100 digits of precision (this is a relatively small matrix; a large matrix would be 10000x10000). Finding its eigenvalues with Mathematica's default algorithm worked. Finding the characteristic polynomial by manually asking for <math>\det(A - \lambda I)</math> didn't work; it resulted in underflow and overflow errors. Finding the characteristic polynomial with a special-purpose built-in function worked. Solving the polynomial worked (apparently Mathematica uses the Jenkins-Traub algorithm). Reducing the polynomial to machine precision and then solving it gave results that are just a bit off.
- Then I reduced the matrix to machine precision. Using the default eigenvalue algorithm, or finding and solving the polynomial, gave results that are way off.
- This confirms that you cannot find the characteristic polynomial with a naive algorithm; and if you use advanced algorithms for all parts involved, precision in the finding step is more important than in the solving step. Also, the default eigenvalue method was slightly faster than the polynomial method. -- Meni Rosenfeld (talk) 07:07, 4 July 2010 (UTC)
- Matrix elements are typically measurements having not 100 digits of precision but perhaps 5. Our problem is a 3x3 matrix, so the generalization to 100x100 or 10000x10000 is not immediately relevant. The computation of the determinant of an n×n matrix, each matrix element having k digits, requires nk-digit arithmetic, which is not extremely high precision, but Mathematica apparently does not work like that. The n nontrivial coefficients of the (monic) characteristic polynomial require a total of n²k digits, the same number of digits required to represent the matrix. Bo Jacoby (talk) 23:07, 4 July 2010 (UTC).
- Of course I'm not talking about 3x3 matrices. For 3x3 matrices there's no problem at all and you can do the calculation however you want.
- I was replying to your comment "special-purpose eigenvalue algorithms were developed because people thought solving polynomial was difficult" and I said "no, solving polynomials is easy, especially if they're low-order, and eigenvalue algorithms were developed to deal with the problems of working with large matrices".
- There is a difference between machine arithmetic and software arithmetic. Machine arithmetic, which is up to 16 digits, uses the arithmetic instructions of the processor and is thus very fast. Software arithmetic can work to any precision but is much slower, even if you only use 16 digits, and more so for more digits.
- Mathematica does whatever you tell it to. If you tell it to use machine arithmetic it does machine arithmetic. If you give it high-precision input it does high-precision arithmetic. If you give it exact input and ask for a numerical result, it automatically finds what precision it needs in order to obtain the required output precision.
- I don't know how you got the nk figure, but let's go with that. You might not consider nk digits "extreme", but doing operations with that precision is impossibly slow. If there is an alternative algorithm that is robust enough to work with machine precision then it is the way to go. -- Meni Rosenfeld (talk) 08:27, 5 July 2010 (UTC)
- The product of n k-digit numbers has no more than nk digits. Machine arithmetic precision soon becomes insufficient. Bo Jacoby (talk) 16:50, 5 July 2010 (UTC).