Talk:Bayesian average

Latest comment: 1 year ago by FeralOink in topic Messy

In its present state the section titled "calculation" seems to consist of things that might make sense ONLY if the prior distribution and the conditional distribution of the observations given the parameter are both normal. Michael Hardy (talk) 03:46, 18 July 2008 (UTC)

Messy

A Bayesian average is a method of calculating the mean of a data set, where there is a known prior probability of the value being estimated.

What does that mean? I have a Ph.D. in statistics and I'm good at deciphering opaque writing, and here I need to guess. Where it says "prior probability of the value being estimated", does it mean "prior probability distribution of the value being estimated"? "Values being estimated" are not things to which one assigns probabilities! The "mean of a data set"??? Really? Why should one need a prior distribution if the thing whose mean one wants is a data set? And what does calculating the "mean of a data set" have to do with "estimating" something?? My guess is that someone wants to estimate a population mean (not the "mean of a data set") and that estimate is to be based on a data set.

This is a badly written article in its present form. Michael Hardy (talk) 05:59, 30 June 2009 (UTC)

Hello Michael Hardy. I still agree with you, even though the article has been revised in the 13 years since you wrote your comment. Here is a two-paragraph example of a web browser safety service that claims to use "Bayesian averages" to determine website reputation: Biased Average by MyWot (which doesn't have a great reputation itself; see the archived talk page if curious). I will have a look at the article, although I don't know if I can help much. -- FeralOink (talk) 13:15, 2 February 2023 (UTC)

What is a "height" of an occupation? Michael Hardy (talk) 06:02, 30 June 2009 (UTC)


The example ends without saying what is done with the data! Michael Hardy (talk) 06:04, 30 June 2009 (UTC)

Perhaps this needs to be related to pseudocount, and broadened. Bayesian estimates made using conjugate priors can quite often resemble, in form, the addition of fictitious data.
As for usage of the term, I believe that IMDB says it applies a "Bayesian mean" to its user ratings, essentially meaning the formula on this page.
IMO, if it is going to call the method Bayesian, the article needs to be much more explicit about how the adjustment can arise in a properly Bayesian setting, and it needs to make clear that, even if this calculation is sometimes called "the Bayesian mean" (citation needed), it is actually the Bayesian estimate of the (population) mean only if particular modelling choices have been made. Jheald (talk) 09:44, 30 June 2009 (UTC)
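To illustrate the point about conjugate priors resembling fictitious data, here is a minimal Python sketch of the simplest conjugate case: a Beta prior on a success probability with Bernoulli (yes/no) observations. The function name `beta_posterior_mean` and the example counts are hypothetical; this is a standard textbook case, not the method any particular site is known to use.

```python
def beta_posterior_mean(successes: int, failures: int, alpha: float, beta: float) -> float:
    """Posterior mean of a success probability under a Beta(alpha, beta) prior.

    With Bernoulli data the posterior is Beta(alpha + successes, beta + failures),
    so its mean equals the raw proportion computed after adding `alpha` fictitious
    successes and `beta` fictitious failures (pseudocounts).
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("Beta prior parameters must be positive")
    return (alpha + successes) / (alpha + beta + successes + failures)


# Three positive votes and no negative ones: the raw proportion is 1.0,
# but the pseudocounts pull the estimate toward the prior mean alpha / (alpha + beta).
print(beta_posterior_mean(3, 0, alpha=1, beta=1))  # 4/5
print(beta_posterior_mean(3, 0, alpha=2, beta=2))  # 5/7
```

Larger `alpha + beta` means more fictitious data and therefore stronger pull toward the prior mean.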

The context here is related to that of a Shrinkage estimator, and it would probably also be possible to present it as a type of Empirical Bayes estimate. However, the basis of the estimator need not be Bayesian in any formal sense, as such estimators can be derived from an MVUE approach ... and thus no distributional assumptions are needed. A simple linear model involving group means could be set up and the theory worked out, which would yield an optimal estimate for a group mean, weighting the observed mean for the chosen group together with the overall mean, assuming the relevant variances are known. But the main questions are ... should this be called a Bayesian average (or who calls it a Bayesian average), and is it important enough for a separate article? Perhaps something could be added to Shrinkage estimator. Melcombe (talk) 10:26, 30 June 2009 (UTC)
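The weighting described above (the chosen group's observed mean combined with the overall mean, with the variances assumed known) can be sketched as follows. This is a minimal Python illustration under a simple random-effects model that also treats the overall mean as known; the function names and numbers are hypothetical. It shows that the weighting is equivalent to adding C = σ²/τ² fictitious observations at the overall mean.

```python
def shrink_group_mean(group_mean: float, n: int, overall_mean: float,
                      within_var: float, between_var: float) -> float:
    """Weighted combination of a group's observed mean and the overall mean.

    Assumes a random-effects model with known within-group variance
    (sigma^2) and known between-group variance (tau^2). The observed group
    mean has sampling variance sigma^2 / n, so it receives weight
    tau^2 / (tau^2 + sigma^2 / n); the overall mean gets the remainder.
    """
    if n < 1 or within_var <= 0 or between_var <= 0:
        raise ValueError("need n >= 1 and positive variances")
    sampling_var = within_var / n
    weight = between_var / (between_var + sampling_var)
    return weight * group_mean + (1 - weight) * overall_mean


def as_pseudocount(within_var: float, between_var: float) -> float:
    """Number C of fictitious observations at the overall mean giving the same weight."""
    return within_var / between_var


# Hypothetical numbers: one observation of 201 in a group, overall mean 176,
# sigma^2 = 100 and tau^2 = 25, so C = 4 and the estimate is (201 + 4 * 176) / 5.
print(shrink_group_mean(201, 1, 176, within_var=100, between_var=25))
```

Dividing numerator and denominator of the weight by τ² gives n / (n + σ²/τ²), which is where the pseudocount reading comes from.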

There's nothing particularly Bayesian about this article. I'd delete it; the article is confusing and badly motivated. Bill Jefferys (talk) 20:12, 10 July 2009 (UTC)

In a bid to prevent this article from being deleted, I have entirely rewritten the introduction based on my own understanding of the subject. The language is entirely layman's (and uncited), but I thought I should start by making the article at least understandable; then we can improve it from there. I really don't know what to do with the sections, though. They're in pretty bad shape. --Mizst (talk) 14:03, 22 July 2009 (UTC)

To address the notability issue: the Bayesian average is used mostly on review sites, the most popular of which (that I've seen) is probably IMDB, as mentioned earlier. I have a few more examples: www.thebroth.com, www.mangaupdates.com, and www.boardgamegeek.com. These sites pad out the reviews with arbitrary scores until a certain number of reviews is reached, to prevent a lopsided average when there are only a few initial reviews, and they call this method the Bayesian average. If the person computing it is seen as injecting his prior experience or belief about scores, which lies outside the data at hand (the actual reviews), into the summary statistic (the average), then it could be considered Bayesian. --Mizst (talk) 17:08, 22 July 2009 (UTC)
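The padding described above admits two readings, sketched here in Python with hypothetical function names and numbers: a fixed number of imaginary ratings at a prior score added to every item (the form used in the height example discussed further down this page), or padding only until an item reaches a minimum number of ratings. Which one any of the named sites actually uses is not established here.

```python
from typing import Sequence


def average_with_prior(ratings: Sequence[float], prior_mean: float, prior_weight: float) -> float:
    """Average after always adding `prior_weight` imaginary ratings equal to `prior_mean`."""
    total_weight = prior_weight + len(ratings)
    if total_weight <= 0:
        raise ValueError("need at least one rating or a positive prior weight")
    return (prior_weight * prior_mean + sum(ratings)) / total_weight


def padded_until(ratings: Sequence[float], prior_mean: float, min_count: int) -> float:
    """Average after adding imaginary ratings only until `min_count` ratings exist."""
    padding = max(0, min_count - len(ratings))
    total = padding + len(ratings)
    if total == 0:
        raise ValueError("need at least one rating or min_count >= 1")
    return (padding * prior_mean + sum(ratings)) / total


# Hypothetical site: prior mean 6.5 on a 1-10 scale.
one_rave = [10]
six_raves = [10] * 6
print(average_with_prior(one_rave, 6.5, prior_weight=4))   # (4 * 6.5 + 10) / 5 = 7.2
print(padded_until(one_rave, 6.5, min_count=5))            # also pads 4 ratings: 7.2
print(average_with_prior(six_raves, 6.5, prior_weight=4))  # (26 + 60) / 10 = 8.6
print(padded_until(six_raves, 6.5, min_count=5))           # no padding: 10.0
```

The two readings agree for items with few ratings but diverge once the minimum is reached: the fixed-weight version keeps shrinking slightly, while the threshold version reverts to the plain mean.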

What a mess....

This article still begins as follows:

A Bayesian average is a method of calculating the mean of a data set[...]

That is nonsense. Obviously this is an attempt to ESTIMATE a mean of a POPULATION by using a DATA SET. It is NOT an attempt to calculate the mean of the DATA SET. Michael Hardy (talk) 15:41, 22 July 2009 (UTC)

Thanks for pointing that out. I added the later paragraphs before modifying the existing opening, so that got past me. I also tend to keep whatever is already there, a Wikipedian habit that is hard to shake. By the way, you are encouraged to fix any errors you spot yourself. You're probably more qualified than I am, since you said you have a Ph.D. in statistics. --Mizst (talk) 16:12, 22 July 2009 (UTC)

Factual Accuracy

Let's clean up this article step by step toward a quality article. Michael, would you kindly start by listing the statements in the article whose factual accuracy is disputed (as it was you who put the {accuracy} tag there)? This will enable us or others to start cleaning them up. --Mizst (talk) 17:14, 22 July 2009 (UTC)

I've removed the accuracy tag. I've had to spend some effort guessing what this was trying to say. There's a question of what is Bayesian about this. Bayesianism is about probability as degree of belief in propositions that are uncertain. The formula would coincide with the posterior expected value if both the prior and the data were normally distributed, so in those circumstances it could be considered Bayesian. But the article doesn't say that. There is also the question of whether this sort of shrinkage estimator should be considered desirable independently of that sort of consideration, with the question of probability distributions addressed only afterwards. But I'm not sure how one would argue for such a thing. Michael Hardy (talk) 19:05, 22 July 2009 (UTC)
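The remark about normal priors and data can be made concrete in the standard normal-normal case. The modelling choices here (a normal prior with mean m and variance τ² for the population mean, and independent normal observations with known variance σ²) are assumptions for illustration, not something the article states:

```latex
\mu \sim \mathcal{N}(m, \tau^2), \qquad
x_1, \dots, x_n \mid \mu \;\overset{\text{iid}}{\sim}\; \mathcal{N}(\mu, \sigma^2)

\operatorname{E}[\mu \mid x_1, \dots, x_n]
  = \frac{m/\tau^2 + n\bar{x}/\sigma^2}{1/\tau^2 + n/\sigma^2}
  = \frac{C\,m + n\,\bar{x}}{C + n},
  \qquad C = \frac{\sigma^2}{\tau^2}
```

Under these assumptions, the pseudo-count formula is the posterior mean exactly when C equals the ratio of the data variance to the prior variance; other choices of C (such as the average number of observations per group) are not derived from this model.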
Hmm... I think I may have confused Bayes' theorem with the Bayesian interpretation of probability when I rewrote the sentence in the article. The way "Bayesian average" is used actually reflects specifically the subjectivist view within the Bayesian interpretation. In a way, the person computing the statistic believes that the arithmetic mean does not represent the population, so he adds other information into that mean to get closer to what he believes the population is; the population need not follow a normal distribution. Since it is subjective, whether it is desirable depends on how much you agree with him. --Mizst (talk) 19:56, 22 July 2009 (UTC)
The real reasons for the overly simplistic model are probably just simplicity, essentially zero-cost calculation, and avoiding conspicuously bogus numbers (as the simple mean can produce). It reminds me of the use of the naive Bayes classifier in Bayesian spam filtering: I think the reasons for the choice are essentially the same, and the people doing these things are similar (programmers who want a quick 80% solution, not statisticians). For people who really care about accuracy there are plenty of more serious approaches; see, e.g., the Netflix Prize.
That said, what this article really needs is reliable sources. I hadn't heard of this article's topic before either, and am not sure if it's really notable. -- Coffee2theorems (talk) 01:13, 26 July 2009 (UTC)

Example has incorrect calculations

The table with the data for basketball players, students, and the actor has incorrect Bayesian averages.

The calculations result in this:

Basketball Players

= ((average number of data points per set * average height) + (number of data points in the set * average height of the set)) / (average number of data points per set + number of data points in the set)

= ((8.666666667 * 190.333333333333) + (15 * 191)) / (8.666666667 + 15)

=190.7558685

Students

= ((average number of data points per set * average height) + (number of data points in the set * average height of the set)) / (average number of data points per set + number of data points in the set)

= ((8.666666667 * 190.333333333333) + (10 * 179)) / (8.666666667 + 10)

=184.2619048

Actors

= ((average number of data points per set * average height) + (number of data points in the set * average height of the set)) / (average number of data points per set + number of data points in the set)

= ((8.666666667 * 190.333333333333) + (1 * 201)) / (8.666666667 + 1)

=191.4367816

Scarborough Res (talk) 06:11, 15 December 2011 (UTC)

The text above the table makes it clear that the average height of the population is 176 cm; with that value, the figures in the table are approximately correct. --Brilliand (talk) 06:20, 17 January 2012 (UTC)

(15*191 + 10*179 + 1*201)/(15+10+1) ≈ 186.77, not 190.33 (which is the unweighted mean of the three set averages, (191 + 179 + 201)/3). --Polzme (talk) 11:48, 9 October 2014 (UTC)
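The disagreement in this thread largely comes down to which value is used as the prior mean m. The following Python sketch (function and variable names are hypothetical) recomputes the three Bayesian averages with the same pseudo-count as the 2011 calculation, C equal to the average number of data points per set, under three choices of m: the unweighted mean of the set averages, the count-weighted mean of all 26 heights, and the 176 cm population mean mentioned in the 2012 reply.

```python
def bayesian_average(n_i: int, mean_i: float, c: float, m: float) -> float:
    """Shrink one set's mean toward m, as if c extra points with value m were added."""
    return (c * m + n_i * mean_i) / (c + n_i)


# (number of data points, average height in cm) for each set, as in the comments above.
sets = {"Basketball players": (15, 191), "Students": (10, 179), "Actors": (1, 201)}
counts = [n for n, _ in sets.values()]
c = sum(counts) / len(counts)  # average number of data points per set, 26/3

# Unweighted mean of the three set averages, as used in the 2011 calculation.
m_unweighted = sum(mean for _, mean in sets.values()) / len(sets)
# Count-weighted (pooled) mean of all 26 heights, as in the 2014 comment.
m_pooled = sum(n * mean for n, mean in sets.values()) / sum(counts)

for label, m in [("unweighted", m_unweighted), ("pooled", m_pooled), ("stated 176 cm", 176.0)]:
    results = {name: round(bayesian_average(n, mean, c, m), 2) for name, (n, mean) in sets.items()}
    print(f"m = {m:.2f} ({label}): {results}")
```

With the unweighted mean, this reproduces the 2011 figures (about 190.76, 184.26 and 191.44); the pooled mean of all heights is about 186.77.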

Citations?

As far as I can tell, this technique is being used in place of collaborative filtering, which typically requires building profiles of user ratings before a recommendation can be made. Given the lack of user profiles, it looks similar to techniques used in reputation systems. I was able to find a paper on computing expected ratings (in a similar way) for the multinomial Dirichlet model here. Benjaminbishop (talk) 20:28, 8 December 2009 (UTC)
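For reference, here is a minimal Python sketch of an expected rating under a Dirichlet prior over star levels. It is the generic Dirichlet-multinomial posterior mean, not necessarily the exact method of the paper mentioned above, and the function name and numbers are hypothetical.

```python
from typing import Sequence


def dirichlet_expected_rating(counts: Sequence[int], prior: Sequence[float]) -> float:
    """Expected star rating under a Dirichlet-multinomial model.

    counts[k] is how many (k + 1)-star ratings were observed and prior[k] is the
    Dirichlet pseudo-count for that star level. The posterior mean probability
    of level k is (prior[k] + counts[k]) / (sum(prior) + sum(counts)), and the
    expected rating is the probability-weighted sum of the star values.
    """
    if len(counts) != len(prior):
        raise ValueError("counts and prior must have one entry per star level")
    if any(a <= 0 for a in prior):
        raise ValueError("Dirichlet pseudo-counts must be positive")
    total = sum(counts) + sum(prior)
    probs = [(a + n) / total for a, n in zip(prior, counts)]
    return sum((k + 1) * p for k, p in enumerate(probs))


# Two 5-star ratings with a uniform prior of one pseudo-count per level:
# the raw mean is 5, but the expected rating is (1 + 2 + 3 + 4 + 5 * 3) / 7.
print(dirichlet_expected_rating([0, 0, 0, 0, 2], [1, 1, 1, 1, 1]))
```

Algebraically, this is the same pseudo-count form as a Bayesian average, with C = sum(prior) and a prior mean equal to the prior-weighted average star value.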

Missing information

Even in its limited form, this article is incomplete. It would be useful to have all terms of the equation clearly defined. --Japarthur (talk) 08:19, 28 April 2017 (UTC)

Potential?

I've thought for a while that this article would have potential if it said more. I've occasionally thought of trying to see if I could do something with it. I see that someone has proposed a merger. The subject of the Additive smoothing article seems quite similar. Michael Hardy (talk) 02:08, 5 September 2018 (UTC)