Talk:Principle of indifference

-Ann O'nyme

Hrm, yeah, not too novice-friendly. A nice example would go a long way. I'll do that eventually, if someone doesn't beat me to it.
Cyan 18:25, 11 Aug 2003 (UTC)
I've got a book by Martin Gardner on paradoxes (the name eludes me at this time) that presents a user-friendly discussion on the Principle. If I remember, I'll paraphrase an example or two from it.
CHz 20:12, 2 Jan 2004 (UTC)
The book is "aha! Gotcha" by Martin Gardner. Here's the full citation:
  • Gardner, Martin (1982). aha! Gotcha. New York: W. H. Freeman and Company. ISBN 0-7167-1361-6.
The following example occurs on pages 107-108:
"Let's see how contradictions arise if the principle is carelessly applied to our questions about Titan and atomic war. What is the probability there is some form of life on Titan? We apply the principle of indifference and answer 1/2. What is the probability of no simple plant life on Titan? Again, we answer 1/2. Of no one-celled animal life? Again, 1/2. What is the probability there is neither simple plant life nor simple animal life on Titan? By the laws of probability we must multiply 1/2 by 1/2 and answer 1/4. This means that the probability of some form of life on Titan has now risen to 1 - 1/4 = 3/4, contradicting our former estimate of 1/2.
"What is the probability of an atomic war before the year 2000? By the principle of indifference we reply 1/2. What is the probability of no atom bomb dropped on the United States? Answer: 1/2. Of no atom bomb on Russia? Answer: 1/2. Of no atom bomb on France? Answer: 1/2. If we apply this reasoning to ten different countries, the probability of no atom bomb falling on any of them is the tenth power of 1/2, or 1/1024. Subtracting this from 1 gives us the probability that an atom bomb will fall on one of the ten countries--a probability of 1023/1024.
"In both of the above examples the principle of indifference is aided by an additional assumption in yielding such absurd results. We have tacitly assumed the independence of events that clearly are not independent. In light of the theory of evolution, the probability of intelligent life on Titan is dependent on the existence there of lower forms of life. Given the world situation as it is, the probability of an atom bomb falling on, say, the United States is not independent of the probability of such a bomb falling on Russia."
(forgive any typographical errors: I'm tired)
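A quick arithmetic check of the two calculations in Gardner's passage (a minimal sketch in Python; only the figures quoted above are used):

```python
from fractions import Fraction

half = Fraction(1, 2)  # each yes/no question assigned 1/2 by the principle of indifference

# Titan: "no simple plant life" and "no one-celled animal life", each 1/2,
# (incorrectly) treated as independent.
p_no_life = half * half          # 1/4
print(1 - p_no_life)             # 3/4, contradicting the earlier answer of 1/2

# Atomic war: "no atom bomb" on each of ten countries, each 1/2,
# again (incorrectly) treated as independent.
p_no_bomb_anywhere = half ** 10  # 1/1024
print(1 - p_no_bomb_anywhere)    # 1023/1024
```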
This is a great example that explains both how the Principle is applied and how it can be incorrectly used. I don't think I can rephrase in any better way, so does anyone have an objection to me slapping this into the article?
I was also thinking about adding a simplified definition, maybe something like "In layman's terms, the Principle of indifference states that, if we have a list of several independent events and have no reason to believe that any are more or less likely to occur than others, then we should assume that each has an equal chance of occurring." Any thoughts? CHz 05:31, 5 Jan 2004 (UTC)

Quoting such a large chunk of text may be a copyright violation, so check up on fair use rules before you paste it into the article. Also, feel free to go ahead and make any changes you think are appropriate - typically, people use the talk page for proposing really big changes, but relatively small improvements can be discussed after the fact, if anyone feels it's necessary. Cheers, Cyan 15:45, 5 Jan 2004 (UTC)

You do have a point there; I forgot all about fair use. You can tell I'm still gettin' used to all this. "Just because you quote something and give the source doesn't mean your reference is legal." I'm working on some legal examples for the page. I should hopefully have them up within the next couple of days, providing I don't go on unexpected vacation again like I did on Monday-Thursday. CHz 03:43, 10 Jan 2004 (UTC)
Ta-da! I'm finally done. I added a somewhat simpler definition, extended the sections on coins and dice, and added a new section on misuse. Feel free to dissect my work as you see fit. I do have a tendency to write things that make perfect sense to me but no sense to anyone else. CHz 03:48, 18 Jan 2004 (UTC)

Not bad, not bad. However, I believe that one of the explanations is flawed. I am referring to the section under the heading "Ranges". You wrote, "These conflicting results arise because the middle of the ranges is not the "best" guess." I believe this is incorrect. From the point of view of decision theory, there is no best guess until a loss function is specified; once it is, the best guess is the one that minimizes the expected loss. If the loss function is symmetric, then the median (also the mean) of the bounded uniform distribution is the optimal guess.

The unknown box example does demonstrate why the principle of indifference cannot be applied to a continuous variable. The reason why it fails is that the example implicitly assumes a uniform epistemic probability distribution both for the length of a side and for the volume; these two assumptions are contradictory. Choosing the middle value as the best guess for both variables and arriving at a contradiction is simply a consequence of these two implicit, contradictory assumptions.

In general, for continuous variables, the principle of indifference does not indicate which variable (e.g., in this case, length, volume, or even surface area) is to have a uniform epistemic probability distribution. This is the reason why the principle of indifference can't be applied to continuous variables.

What do you think? -- Cyan 05:43, 18 Jan 2004 (UTC)
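A minimal numerical sketch of the clash described above, assuming (as in the article's example) that the cube's side is known only to lie between 3 and 5 cm, so its surface area lies between 54 and 150 cm² and its volume between 27 and 125 cm³:

```python
def p_below(x, lo, hi):
    """P(X < x) when X is assumed uniform on [lo, hi]."""
    return (x - lo) / (hi - lo)

# Probability that the side is shorter than 4 cm, computed three ways,
# each time putting the uniform epistemic distribution on a different variable.
p_from_length = p_below(4, 3, 5)        # 0.5
p_from_area   = p_below(96, 54, 150)    # 42/96 = 0.4375
p_from_volume = p_below(64, 27, 125)    # 37/98 ≈ 0.378

print(p_from_length, p_from_area, p_from_volume)
# The three answers disagree, even though "side < 4 cm", "area < 96 cm²" and
# "volume < 64 cm³" describe exactly the same event.
```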

Yeah, I struggled with that section a bit while writing. I read about the unknown cube paradox in a book, but it didn't have an explanation about how to resolve it, so I had to come up with my own. (It also didn't help that I couldn't find any information on applying the principle of indifference to continuous variables.) I tried to get around the loss-function issue (in my mind, anyway) by assuming that "better" means "more likely." The statement "The principle of indifference is normally misused by being applied to dependent events. It is also used as part of a faulty argument involving ranges of values." (which has a grammar error in the first sentence... whoops) could be changed to "The principle of indifference is normally misused in one of two ways: it is misapplied to dependent events and continuous variables."
And, of course, your explanation is good. It stands up to examination more than mine does. =P CHz 04:40, 19 Jan 2004 (UTC)

Shall I work my explanation into the text of the article, or do you want to do it? -- Cyan 04:47, 19 Jan 2004 (UTC)

Well, you seem to have somewhat more experience with the application of the principle of indifference to continuous variables, so it's probably best that you do it. I don't mind you tampering with my baby... CHz 03:59, 20 Jan 2004 (UTC)

Okay, all done. -- Cyan 04:48, 20 Jan 2004 (UTC)

"Not bad, not bad." I just rephrased the introduction to the "Ranges" section; no new content. (I like the addition of surface area, by the way.) CHz 04:12, 22 Jan 2004 (UTC)

Hello. This is a nice article. I think the part about continuous variables can be improved. The principle of indifference doesn't imply that all functions of a quantity must be uniformly distributed if one of them is. If I assume surface area is uniformly distributed, then the p.i. does not apply to volume, since I am not assuming that ranges of the volume of equal size have equal probability. Likewise if I assume volume is uniformly distributed, the p.i. doesn't apply to surface area. The point is that if I make an assumption about one variable, I'm no longer ignorant about the other; the p.i. has nothing to say about the other, so I can't construct a contradiction. I agree that there is some subtlety to working with continuous variables, but it doesn't follow that the p.i. is not applicable. I invite your comments. FWIW, Wile E. Heresiarch 17:04, 16 Mar 2004 (UTC)

You are correct, of course. I think that historically, the essential complaint is that nothing in the P.I. tells one which parameterization of the problem is the one in which P.I. applies. If one uses it in a real data analysis problem, one immediately invites the criticism, "Why did you choose to make that parameter's prior uniform? Why not its square, or its square root, or some other arbitrary function?" A principle that applies to continuous parameters really ought to provide the answer to this criticism, but P.I. doesn't. (The principle of transformation groups claims to, but I haven't tried to write that article because (i) I have misgivings over some of the applications, and (ii) my math skills aren't quite there yet. Good username by the way. "Heresiarch." Heh.) -- Cyan 17:37, 16 Mar 2004 (UTC)

Hello. I have a comment on the "life on Europa" example. The inconsistent result is due to the erroneous assumption of independence, as pointed out by Gardner himself -- he's quoted above as saying "We have tacitly assumed the independence of events that clearly are not independent." So although this example is a cautionary tale -- you have to be careful about dependence -- it's not clear to me that this example tells something about the principle of indifference. Is there some way to rephrase it so that the problem is clearly with the p.i. and not with an unjustified assumption about independence? Happy editing, Wile E. Heresiarch 14:16, 29 Mar 2004 (UTC)

Hmm. Well, to me the example seems perfectly clear, but that's likely a side effect of the fact that I wrote that portion. How about this revision to the last paragraph?
"This probability of 3/4 contradicts the previous probability of 1/2. These contradictory results occur because (assuming evolution is valid) the principle of indifference has been applied to dependent events. The probability of multi-celled life is not independent of the probability of single-celled life; multi-celled life would develop from single-celled life."
How's that? CHz 04:37, 30 Mar 2004 (UTC)
I agree the example is perfectly clear, but what is clear is that a mistaken assumption of independence leads to a mistaken conclusion. I don't see that the example, either in the original version or the rephrased version above, says anything about the principle of indifference. Whether the probabilities were assigned by the p.i. is immaterial; the answer will be wrong whatever kind of assignment is made. Can we find an example that speaks directly about the p.i. ? Wile E. Heresiarch 04:49, 15 Apr 2004 (UTC)
On a related note, the p.i. can certainly be applied to dependent events, so that can't be the source of the problem. For example: you roll two dice and then you get to spin a roulette wheel with a number of segments equal to the sum of the dice. The second outcome depends on the first, yet a conventional analysis (as correct as can be without making a detailed physical analysis) would be to apply the p.i. twice. Wile E. Heresiarch 04:49, 15 Apr 2004 (UTC)
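A small sketch of that dice-and-wheel calculation (the choice of "segment 1" as the target is an illustrative assumption; every wheel has at least two segments, so segment 1 always exists):

```python
from fractions import Fraction
from itertools import product

# First application of the p.i.: the 36 ordered outcomes of two dice are equally likely.
totals = [a + b for a, b in product(range(1, 7), repeat=2)]
p_sum = {s: Fraction(totals.count(s), 36) for s in set(totals)}

# Second application: given a sum s, the wheel's s segments are equally likely.
p_segment_1 = sum(p_sum[s] * Fraction(1, s) for s in p_sum)

print(p_segment_1, float(p_segment_1))  # ≈ 0.168; the two assignments combine consistently
```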

Sorry about not replying for a month; I somehow managed to remove this article from my watchlist. And then I went on vacation. =P

I read your comment yesterday and did some thinking about it: working it out, realizing I need a probability textbook, and so forth. Suddenly, while I was working on something else, I got a flash of inspiration, so I ran over to the computer and typed up the next two paragraphs in a minute or two. I then went back to what I was doing. It made sense when I wrote it, and hopefully it'll make sense to you.

I think I figured out why the principle of indifference works on the dice-and-wheel example but not the Europa one. The problem lies not in applying it to dependent events in general but in applying it to dependencies that are left unaccounted for. In the dice-and-wheel example, the p.i. can be used with no problems on the roll (although a single die might be better, because the probability distribution of the total of two rolled dice isn't uniform) and can therefore also be applied to a wheel whose number of segments equals the number rolled by the dice. In this case, knowing the probabilities of one event with certainty allows the p.i. to be applied to another event that depends on it.
In the Europa example, however, the principle of indifference can't simply be applied to the existence of single-celled or multi-celled life because each is dependent on a large list of other factors that need to be accounted for, including temperature, moisture, the existence of certain elements, etc. The probability of life cannot be boiled down to a simple "either there is or there isn't" dichotomy, which is why the p.i. cannot be applied without further information.

Does that sound right? CHz 17:57, 7 May 2004 (UTC)


Following up on this discussion from last spring -- I've struck out the section titled "Dependent Events". The essential difficulty is that Gardner's puzzle hinges on an inappropriate assumption of independence, and doesn't say anything about the principle of indifference. It may be possible to reformulate the puzzle to directly address the p.i., but, absent any evidence that such a mistake or misapplication has been made more than once, we're veering off into original research. The implication in the previous revision was that application of p.i. to dependent events is a common mistake (the article said "normally misused") but I've yet to see that in a decade of reading, and there's no evidence presented that this misuse occurs "in the wild". If anyone has such evidence, we can discuss how to restore the section, otherwise, I just don't see the point of it. For what it's worth, Wile E. Heresiarch 17:56, 15 Aug 2004 (UTC)

After reading the article, I don't understand why the principle of indifference would apply to discrete variables but not to continuous variables. First, the example with a cube is confusing, as it doesn't justify "simply picking the mid-values" from different probability distributions and then expecting them to be related to each other somehow. Second, I don't see how the choice of parameterization isn't as much of a problem for discrete variables. If you simply discretize the cube (it's made of atoms anyway; measure the lengths in atoms instead of centimeters and you get integers), you get exactly the same problem. 82.103.214.43 18:37, 11 June 2006 (UTC)

I agree. I think the whole cube example should be removed, because it confuses the issue regarding continuous variables. Consider the same example but with the restriction that the side length must be an integer: assuming equal probabilities for the lengths 3, 4 and 5, the expectation value is 4 for the side length, 100 for the surface area, and 72 for the volume. These values are just as mutually contradictory as in the continuous case, without assuming uniform distributions for the area and volume ranges.
The issue with continuous variables is not that equiprobability isn't uniquely definable, but rather that the probabilities of most elementary events are zero, so one must use probability density functions instead. When a discrete uniform distribution is transformed by applying some function to the underlying variable, it is intuitively obvious that the resulting discrete distribution is also uniform only when the transformation is linear. The same holds for continuous uniform probability distributions, but it is not as obvious since we generally don't have good intuition for the continuous case. For example, I am not sure if non-uniform equiprobable continuous distributions can exist at all, but it's intuitively obvious that non-uniform equiprobable discrete distributions exist. --Shastra 15:27, 11 July 2006 (UTC)
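A short check of the integer-sided version of the cube described in the comment above (a sketch; the admissible side lengths 3, 4 and 5 and their equal probabilities are taken from that comment):

```python
from fractions import Fraction

sides = [3, 4, 5]
p = Fraction(1, 3)  # equal probability for each admissible side length

e_side   = sum(p * s        for s in sides)  # 4
e_area   = sum(p * 6 * s**2 for s in sides)  # 100
e_volume = sum(p * s**3     for s in sides)  # 72

print(e_side, e_area, e_volume)
# The "best guesses" 4 cm, 100 cm² and 72 cm³ are mutually inconsistent:
# a cube of side 4 has surface area 96 and volume 64.
```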

not quite meaningless for frequentists

Although often stated in Bayesian terms, the principle of indifference isn't "meaningless" from a frequentist perspective. It translates to the principle that one ought to choose the uniform distribution as a density estimator, unless given some basis (such as experimental data) from which to choose a different density estimator. Of course, it fares no better as an estimator than as a means of assigning degrees of belief. --Delirium 06:51, 4 September 2006 (UTC)
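A minimal sketch of that reading (the unit support [0, 1] and the histogram fallback are assumptions of the sketch, not anything from the article):

```python
import numpy as np

lo, hi = 0.0, 1.0  # known support of the quantity (assumed for this sketch)

def density_estimate(data=None, bins=10):
    """Return (bin_edges, densities): uniform over the support when no data are
    available, otherwise a plain histogram density estimate."""
    if data is None or len(data) == 0:
        return np.array([lo, hi]), np.array([1.0 / (hi - lo)])  # the "indifferent" default
    densities, edges = np.histogram(data, bins=bins, range=(lo, hi), density=True)
    return edges, densities

_, flat = density_estimate()                                 # no evidence: flat density
_, est  = density_estimate(np.random.beta(2, 5, size=1000))  # evidence overrides the default
print(flat, est.round(2))
```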

The new principle of indifference

Given the set of all admissible hypotheses H, let h denote any one element of a partition of H and let d denote the projected design. Then we are allowed to assign the same probability to any h if prior information is considered irrelevant and d is impartial with respect to H.

In other words, the principle prescribes a uniform distribution over a partition of hypotheses where there is no reason to believe one more likely to be true than any other, in the sense of both irrelevance of prior information and impartiality of the method of inquiry. Notice that this is a general principle that concerns any inquiry (historical, judicial, and so on).
In statistics, there is a way to see whether d is impartial with respect to the parameter θ. We need to examine the likelihood functions that can be generated from the design with respect to θ itself. In particular, d is impartial if every likelihood, viewed as a function of θ, is symmetric (providing equal support to equal deviations from its maximum) and does not change shape or spread whatever the data may be.
This criterion allows us to make the new principle of indifference operative. Namely, we can look at likelihood curves (for different sets of data) and see what they say about the impartiality of the projected design.
Box and Tiao (1973) introduced an essentially analogous criterion. According to these authors, the prior distribution for a parameter, say θ, is assumed to be locally uniform if different sets of data merely translate the likelihood curve along the θ-axis, leaving it otherwise unchanged (that is, the data only serve to change the location of the likelihood). On the other hand, if the likelihood in θ is not data translated, Box and Tiao suggest expressing the parameter in a new metric f(θ) so that the corresponding likelihood is data translated.
In any case, to ensure the impartiality of d with respect to θ, it is sufficient that, for all possible likelihood functions obtainable from d, the likelihood maxima remain constant with respect to θ itself. In other words, the prior is uniform if the maximum of each possible likelihood curve is constant, that is, sits at the same level as every other.
On the other hand, the uniform prior is simply a particular case. In fact, as we showed at a symposium held at Virginia Tech in Blacksburg, Virginia, on 1-5 June 2006, the prior for a parameter θ can be assumed proportional to the corresponding maximum value of the likelihood, over all possible likelihood functions obtainable from the projected design.
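A small sketch of the "data translated" idea invoked above, assuming the textbook case of a normal model with known variance: different data sets should merely shift the likelihood curve for the mean along the θ-axis, leaving its shape, spread and maximum height unchanged.

```python
import numpy as np

sigma = 1.0                        # known standard deviation (assumed)
theta = np.linspace(-5, 10, 1501)  # parameter grid, spacing 0.01

def likelihood(data):
    """Likelihood of theta for i.i.d. N(theta, sigma^2) observations, on the grid."""
    data = np.asarray(data)
    return np.prod(
        np.exp(-((data[:, None] - theta[None, :]) ** 2) / (2 * sigma**2))
        / (sigma * np.sqrt(2 * np.pi)),
        axis=0,
    )

L1 = likelihood([0.1, -0.4, 0.3])  # one data set
L2 = likelihood([5.1, 4.6, 5.3])   # the same data shifted by exactly 5

# Same maximum height, and the whole curve is just translated by 5 (500 grid steps):
print(np.isclose(L1.max(), L2.max()))
print(np.allclose(L1, np.roll(L2, -500), rtol=1e-6, atol=1e-12))
```

For a binomial proportion, by contrast, the likelihood changes shape and spread as the data change, which is why Box and Tiao re-express such a parameter in a new metric before treating the prior as locally uniform.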

MAJOR INCONSISTENCY- whoever wrote this page, please make sure it is self-consistent!!!

The introduction states that the principle of indifference is meaningless under the "frequency interpretation of probability". Then coin tossing and statistical physics are given as examples. Well -- those are exactly the examples where the "frequency interpretation" is the only appropriate interpretation!

To be clear: I hold a PhD in physics, and the "frequency interpretation" is the only notion of probability this science uses. We do not set out to evaluate probabilities unless they could be verified by a reproducible frequency check. I wanted to understand what the "Principle of indifference" is, and this article was completely useless. Please revise it!


_________________________________________
MY ANSWER
I support the logical view of probability and the analytical solution of induction. In my view, any application of probability theory must be based exclusively on the axioms of the discipline. Namely, I agree with Lindley: "according to this view, all manipulations in inference are solely and entirely within the calculus of probability. The mathematics is that of probability".
The theory of statistics, as a mathematical discipline, can and should be developed from axioms in exactly the same way as Geometry and Algebra. This means that after we have stated the axioms, all further exposition must be based exclusively on them. Under such axioms, statistical inference correctly describes the inductive situation.
The analytical solution to statistical induction requires a notion of probability taken as a logical relationship between hypothesis and evidence (real evidence, not hypothetical evidence). Carnap particularly emphasized this point: "the omission of any reference to evidence is often harmless".
The very concept of conditional probability implies a reference to evidence. In particular, sampling without replacement from a finite population confirms, if confirmation were ever necessary, that probability depends on what we know from the results of previous trials. For instance, a card is selected at random from a standard deck of 52 cards. The probability of holding the seven of diamonds is 1/52. Well, this probability remains unchanged even if we take one card out of the pack without knowing which card it is. Though only 51 cards remain, our information has not changed, hence neither has the probability. This is a result of standard procedure and not one associated with a special notion of probability.
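A quick Monte Carlo check of the card example (a sketch; the encoding of the deck and the number of trials are arbitrary choices):

```python
import random

N = 200_000
hits = 0
for _ in range(N):
    deck = list(range(52))        # card 0 stands for the seven of diamonds
    random.shuffle(deck)
    deck.pop()                    # remove one card without looking at it
    if random.choice(deck) == 0:  # draw from the remaining 51 cards
        hits += 1

print(hits / N, 1 / 52)           # both ≈ 0.0192: the unseen removal changes nothing
```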
According to Carnap, the problem of determining the degree of confirmation of a hypothesis h based on the evidence e is not of an empirical nature, but it is a logical problem which can be answered analytically.
As one can read in Carnap, since the classical period many writers have said of certain probability statements that they are 'based on frequencies' or 'derived from frequencies'. Nevertheless, these statements (practically always, if made before Venn's time) speak of logical probability. They are logical probability statements referring to evidence involving frequencies.
The situation for our principle is similar to the one that arises when we assign probability in casting dice. The assignment of an equal probability to each face of a die is based on the assumption that the casting is fair. In the same way, we can assign the same prior probability to every hypothesis if the design (or the method) used in the inquiry is 'impartial', in the sense of ensuring the same support to all hypotheses.
The poor performance of the classical principle of indifference is due simply to a failure to recognize this side of the matter: the impartiality of the design or method of inquiry.
In statistics, there are two kinds of alternatives to the new principle of indifference: a calculus with a subjective assignment of probabilities (as suggested, for instance, by de Finetti), and a calculus independent of the probability axioms (the likelihood approach, Neyman-Pearson theories, and the like).
As Carnap wondered (1962, p. 518), why did statisticians spend so much effort in developing methods of inference independent of the probability axioms? It seems clear that the main reason was purely negative; it was the dissatisfaction with the principle of indifference (or insufficient reason). If we should find a degree of confirmation which does not lead to the unacceptable consequences of the principle of indifference, then the main reason for developing independent methods of estimation and testing would vanish.
Under the new principle of indifference, the main reason for using statistical methods independent of the probability axioms has vanished.
CONCLUSION
Your question: "I wanted to understand what the "Principle of indifference" is"
Answer: This principle allows us to be consistent with the probability axioms. The inconsistency lies elsewhere, for instance in rejecting that principle.

Rodolfo de Cristofaro decrist@ds.unifi.it

I like this quote

"Evidently we require not mere absence of knowledge of reasons favoring one alternative over another, but knowledge of the absence of such reasons." W C Kneale, Probability and Induction (1949)

I like this quote. Do you? Boris Tsirelson (talk) 18:16, 2 May 2009 (UTC)


Misclassified

It appears this article is misclassified (Philosophy... ). "The Principle of Indifference" is key to the study of Bayesian Probability. Given that there has been no substantial discussion or revision of this article in several years (2006 or so), I suggest the article be restarted in the appropriate classification. Dennis Lindley has written definitively on the subject and I'll be looking in his direction to restart it elsewhere (Bayesian, Probability, ...).

I was looking to refresh my memory when I found this article and have now bitten off more than I intended.

Jmiche (talk) 08:55, 5 November 2009 (UTC)

I think it's fair to consider it a philosophy of probability (or science) matter, but also to consider it a probability or statistics matter. Given that, according to the article, its earliest recognizable formulation was by logicians, and the restatement by an economist, it isn't misclassified as much as it is used by multiple areas of study. JJL (talk) 15:09, 5 November 2009 (UTC)

Principle of transformation groups and the box example

The "solution" to the box example using the principle of transformation groups is horrendous. I'm removing it until I or someone gets a chance to re-write it. Here's the text:

To apply this to the above box example, we have three problems, with no reason to think one problem is "our problem" more than any other - we are indifferent between each. If we have no reason to favour one over the other, then our prior probabilities must be related by the rule for changing variables in continuous distributions. Let L be the length, and V be the volume. Then we must have

  f(V) dV = f(L) dL, with V = L³, i.e. 3L² f(L³) ∝ f(L),

which has a general solution

  f(L) = K/L,

where K is an arbitrary constant, determined by the range of L, in this case equal to

  K = 1/(ln 5 - ln 3) = 1/ln(5/3).

To put this "to the test", we ask for the probability that the length is less than 4. This has probability of

  P(L < 4) = ∫_3^4 (K/L) dL = ln(4/3)/ln(5/3) ≈ 0.56.

For the volume, this should be equal to the probability that the volume is less than 4³ = 64. The pdf of the volume is

  f(V) = (1/3) V^(-2/3) f(V^(1/3)) = K/(3V), for 27 ≤ V ≤ 125.

And then the probability of volume less than 64 is

  P(V < 64) = ∫_27^64 K/(3V) dV = ln(64/27)/(3 ln(5/3)) = ln(4/3)/ln(5/3) ≈ 0.56.

Thus we have achieved invariance with respect to volume and length. You can also show the same invariance with respect to surface area being less than 6(4²) = 96. However, note that this probability assignment is not necessarily a "correct" one. For the exact distribution of lengths, volume, or surface area will depend on how the "experiment" is conducted. This probability assignment is very similar to the maximum entropy one, in that the frequency distribution corresponding to the above probability distribution is the most likely to be seen. So, if one was to go to N people individually and simply say "make me a box somewhere between 3 and 5 cm, or a volume between 27 and 125 cm, or a surface area between 54 and 150 cm", then unless there is a systematic influence on how they make the boxes (e.g. they form a group, and choose one particular method of making boxes), about 56% of the boxes will be less than 4 cm - and it will get very close to this amount very quickly. So, for large N, any deviation from this basically indicates the makers of the boxes were "systematic" in how the boxes were made.

Problems: (1) It's super confusing. (2) It doesn't at all make clear that this is based on a symmetry transformation (which, for the box?). (3) This is wrong/misleading: "So, for large N, any deviation from this basically indicates the makers of the boxes were "systematic" in how the boxes were made." (4) Should use different letter for pdfs f(L) and f(V), say g(V). (5) Needs to actually call them pdfs. (6) Many others... Jess (talk) 16:52, 30 December 2013 (UTC)
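For what it's worth, here is a short numerical check of the 1/L assignment in the quoted text (a sketch; the induced densities for volume and surface area follow from the usual change-of-variables rule):

```python
import numpy as np

K = 1 / np.log(5 / 3)  # normalizing constant of the 1/L density on [3, 5]

# P(side < 4 cm) using the density K/L on lengths [3, 5]
p_length = K * np.log(4 / 3)

# P(volume < 64 cm³) using the induced density K/(3V) on volumes [27, 125]
p_volume = (K / 3) * np.log(64 / 27)

# P(area < 96 cm²) using the induced density K/(2A) on areas [54, 150]
p_area = (K / 2) * np.log(96 / 54)

print(p_length, p_volume, p_area)  # all ≈ 0.563: the same answer in every parameterization
```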

"A symmetric dice has n faces..."

Under 'Dice' the part: "A symmetric dice has n faces..." has me wondering: should that read "A symmetric die has n faces..."

Fix? 71.139.161.9 (talk) 06:01, 15 September 2014 (UTC)

History section has problems

This section omits Gerolamo Cardano.

Before anyone else, Cardano expressed the Principle of Indifference in his Book on Games of Chance. He called it a principle of equality. His book was not published for 100 years.

He should probably be added, but I think it may take a bit of rewriting of the section. Tadamsmar (talk) 18:56, 30 July 2019 (UTC)

Meaningless in frequentist statistics?

Lede, second paragraph: "The principle of indifference is meaningless under the frequency interpretation of probability,..."

I would have thought that evidence that two distributions are different would be admissible as "relevant evidence" in the sense of the principle. Vaughan Pratt (talk) 22:51, 24 October 2020 (UTC)