Talk:Binomial regression


Untitled

Before you discuss here, please be sure that you have read the Wikipedia merging page, specifically the three paragraphs under the first title, Merging. The basic problem that I see is that many people are interested in binomial regression, but finding all these models, or even more information about binomial regression, is difficult once one has found one article. As an example, google 'probit' or 'logit' and you get to a page that doesn't help much. I think that all the information would be better edited and tighter if it were in one article, more like an encyclopedia. Pdbailey 14:45, 27 April 2006 (UTC)

I agree with some of the proposals, but not all of them. I basically have two objections:
  1. Why merge into binomial regression? This doesn't strike me as the most obvious place. Why not merge into generalized linear model instead?
  2. A number of articles are general enough to stand on their own. These include logit, probit, and logistic regression. Check "what links here" to see that "logit" and "probit" are referenced outside the context of generalized linear models. Also, while logistic regression could be seen as a special case of binomial regression, everything here is a special case of GLMs, but nobody is proposing merging everything into the GLM article. Certain topics have a life of their own and are large enough to warrant stand-alone articles.
--MarkSweep (call me collect) 18:17, 27 April 2006 (UTC)Reply
MarkSweep, thanks for pointing out the usefulness of the "what links here" pages. There appear to be a number of uses for these functions. Based on looking at what links to logit and probit, they should stay. But I would argue that they both need to be disambiguation pages, because people appear to link to them with abandon. In general, these two appear to have three uses:
  • To model something that has a maximum value that it approaches first quickly, then linearly, and finally asymptotically.
  • To model a binomial or multinomial process.
  • To model the same process, but to use the model and data to allow for the estimation of parameters in the model.
I would suggest the present "logit" and "probit" be moved to the same titles with "function" added on the end. But otherwise, it seems that there is a lot of confusion about the relationship between "logit" (and to a lesser extent, the "probit") and the use of these functions as link functions for regression or for models that do not have fit parameters. It is interesting to note that Naive_Bayes_classifier, Rasch_model, and Mode_choice are similar articles on very disparate topics. At any rate, I'll leave logit and probit linked here for discussion. Pdbailey 19:12, 27 April 2006 (UTC)

I'd suggest that the main article should be called "Discrete regression", which is sufficiently general to encompass all the cases suggested for merger (note that logit doesn't give rise to a binomial distribution but to a logistic). The particular cases could either be treated in full in subsections or discussed briefly with a link to separate articles depending on how thorough the treatment is. JQ 07:10, 28 April 2006 (UTC)Reply

That strikes me as a bit too narrow. If we want to generalize and merge, it seems to me that the right context would be the article on generalized linear models, which don't have to be discrete. It's in the context of GLMs that link functions can be discussed properly (where the logit is indeed the canonical link function for binomial regression models and its relationship to the logistic CDF is an afterthought). I agree with your general point on subsections: articles like Ordered Probit are very short and don't make a whole lot of sense as stand-alone entries. Even if that article was much more detailed, it's probably better to have it as a subsection of a larger section (or article) on probit models, to establish context and make it easier to compare and contrast. If a section becomes too long, it will be easy to spin it off into a separate article, but let's start off with a single coherent article instead of the current fragmentation. --MarkSweep (call me collect) 08:52, 29 April 2006 (UTC)Reply
Given the current size of the articles, it would probably be reasonable to merge most of these into the article on the generalized linear model, with a section on ordinary linear regression and link to linear regression, and then sections on discrete regression and count models. But in the long run, it would be good to have full articles down to the level of Ordered Probit.
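
For readers following the link-function point above, here is a short worked sketch of why the logit is called the canonical link for binomial data. It assumes the standard binomial model with $n$ trials and success probability $p$; the symbols $n$, $y$, $p$, $\theta$ and $\eta$ are notation chosen for this note, not taken from any of the articles under discussion.

```latex
\begin{align*}
P(Y = y) &= \binom{n}{y}\, p^{y} (1 - p)^{n - y} \\
         &= \binom{n}{y} \exp\!\left( y \log\frac{p}{1 - p} + n \log(1 - p) \right), \\
\theta   &= \log\frac{p}{1 - p} = \operatorname{logit}(p).
\end{align*}
```

Written this way, the coefficient of $y$ in the exponent is the natural parameter $\theta$, and setting $\theta$ equal to the linear predictor $\eta = x^{\top}\beta$ gives the logit link. The probit link instead sets $\Phi^{-1}(p) = \eta$, which fits the same binomial likelihood with a different, non-canonical link.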

Yes, so the general idea was to merge the other pages in with this one, based on the basic structure I gave this one in just a few hours. I think the GLM article will be too laden if we include all the possible GLMs in it. That said, if others want it in there for now, I'm happy to move all this in there and then move it back out if it gets to be too much. Pdbailey 05:13, 2 May 2006 (UTC)

I would be against merging logistic regression, either here or with GLM. The ol' Google test yields: 5.34 million hits for phrase "logistic regression", 75,700 hits for "binomial regression", and 170 thousand hits for "generalized linear model". So, people out in the world seem to care about logistic regression as its own entity, and will look for it as such in Wikipedia. hike395 13:24, 10 May 2006 (UTC)Reply
So it looks like your logic is that Google finds more pages for the search "logistic regression" than for the categories that it falls into, so there should be a Wikipedia article about it under that name. I would argue first that there are more Google hits for "making gold" than there are for "alchemy", but that doesn't mean that making gold deserves its own article separate from alchemy, and second that Wikipedia is not a dictionary, and why would you want it to be? There is a lot more to an encyclopedia in that it can bring topics together and give lots of context. So, redirecting to an overarching article makes a lot of sense. Pdbailey 20:00, 14 May 2006 (UTC)
Um, "making gold" -> 91,600 hits; "alchemy" has 41M hits. Alchemy clearly wins, and it is the name of the article.
WP:WINAD doesn't talk about titles of articles, it talks about content --- articles shouldn't be simply definitions, but full articles.
The relevant Wikipedia policy is Wikipedia:Naming conventions (common names), which directs articles to be titled with the most common name. Hence, my use of the Google test.
I think that my logic still stands. It doesn't exclude having an article about generalized linear model, but I think that removing/merging a logistic regression will disappoint a lot of people. -- hike395 03:52, 17 May 2006 (UTC)Reply


Looking at the various components proposed for merger, I think there is already too much for a single article. What we need is for the more general articles (say on GLM and discrete choice) to have sections with links to more specific articles, giving rise to a natural hierarchy. JQ 05:31, 15 May 2006 (UTC)

Agree: there's nothing that prevents us from having both a GLM article and a logistic regression article. I just oppose removing the logistic regression article.


As a biologist who has used Probit and Logit analysis to determine dose response curves, amongst other things, in the past but who does not understand all the ins and outs of the maths, I think it would be far more useful if these articles stood alone rather than being rolled up into a larger one on the binomial theorem. Either way, I would suggest that someone checks out Probit analysis (1971) D.J.Finney, Cambridge University Press before making any changes. Maccheek 16:21, 5 June 2006 (UTC)Reply

Maccheek, I guess exactly your situation is the one that worries me the most, that is, people who think of logit and probit as very different. It's a bit like thinking of (a) linear regression after taking the log of the response and (b) linear regression as completely different things. True, you need some math to link the two, but they are so highly related that having a separate page for each is not straightforward. I think I'm getting many negatives because I have not developed this page. I'll work on that now. Pdbailey 03:28, 6 June 2006 (UTC)
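
To make the "highly related" claim above concrete, here is a minimal Python sketch (standard library only) comparing the two inverse link functions: the standard normal CDF used by probit and the logistic CDF used by logit. The rescaling constant 1.702 is a commonly quoted approximation factor and is an assumption of this note, as are the function names; it is an illustration, not anything taken from the articles.

```python
import math


def logistic_cdf(x: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x: float) -> float:
    """Inverse probit: the standard normal CDF, written via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def max_abs_gap(scale: float, lo: float = -8.0, hi: float = 8.0,
                steps: int = 3201) -> float:
    """Largest |Phi(x) - logistic(scale * x)| over an evenly spaced grid."""
    gap = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        gap = max(gap, abs(normal_cdf(x) - logistic_cdf(scale * x)))
    return gap


if __name__ == "__main__":
    # Without rescaling the curves differ visibly; with it they nearly coincide.
    print("unscaled gap:", max_abs_gap(1.0))
    print("rescaled gap:", max_abs_gap(1.702))
```

After rescaling, the two curves differ by less than about 0.01 in probability everywhere on the grid, which is why fitted logit and probit models usually give very similar predicted probabilities even though their coefficients sit on different scales.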
pdBailey. My concern would be that some poor biology student looking for information on probit analysis for use in determining dose response curves would get completely lost in the maths. Probably what we need is something written into the article on dose response relationship on how to use probits. I support getting the math sensible but also making it accessible to those whose maths is not so hot but who may have to use it and want a basic understanding. Then again, maybe I am just the oddity as a half-numerate biologist. Most biologists, at least the agricultural ones I come across, would give up at the first equation; at least I can understand most of it if I try. (Many agricultural scientists employed in the agrochemical industry don't even seem to be able to calculate a mean properly!) On that I'll probably bow out of this thread. Maccheek 20:18, 6 June 2006 (UTC)
  • The best way to structure a complex topic like this IMHO is to have a main article that gives a structured overview (= 1 introductory paragraph, not only links) of the different methods and models, plus separate articles on the individual sub-topics.
    1. Don't forget you are losing readers coming from Google as well as all of the Interwiki links if you simply merge all articles to one lemma.
    2. Merging too much into one big article is definitely no more digestible for students or beginners.
    3. This is not only a structure problem, it is a content problem as well. Most of the articles get too specific from the very first sentence and are ill-structured. Start with a short introduction that everybody can understand.
  • So please go step by step. For the moment it should be best if you merge Ordered Logit and Multinomial logit into logit as well as Ordered Probit and maybe Probit model into probit. Large scale solutions rarely work in Wikipedia. PanchoS 15:17, 13 July 2006 (UTC)Reply
I have clearly not done this and nobody to date agrees. I will remove the proposed merge link. Pdbailey 13:26, 17 July 2006 (UTC)

Started cleanup

I started at the top. I agree with some of the past discussion that some of these articles should be consolidated, although disagree strongly that they should all go into generalized linear model. Baccyak4H (Yak!) 17:33, 14 June 2007 (UTC)Reply

Baccyak4H, I was more pointing out that what already exists in the GLM article seems to be better than or about as good as what is on this page, not suggesting a merge. PDBailey (talk) 02:58, 5 January 2009 (UTC)Reply
What if we moved the article to a talk page (we could use one of mine) until it has some content and then undeleted it when that is appropriate? Nothing links here except lists of articles, the article has no editors, and the thing is an eyesore. I had thought this would be totally uncontroversial since I created and wrote most of it, but I probably should have started it on my user page in the first place rather than in the main space. PDBailey (talk) 14:05, 5 January 2009 (UTC)
If I wanted to find out about it then I would come here and read it. Do not confuse an article with few edits with an article which is useless. I have no idea what BR is (and no interest in the topic as a topic), but, assuming the page has correct information, I would find it valuable as a freestanding page if I wanted to know about it. Clean it with pleasure, but neither merge nor delete it. Fiddle Faddle (talk) 14:11, 5 January 2009 (UTC)Reply
Timtrent / Fiddle Faddle, my point was that if you wanted to find out about binomial regression (1) you would probably not know to call it that (this is the reason for the above proposed merges) and (2) you would be happier reading what is on the GLM page on this topic since it has more information. Perhaps a redirect would make sense. PDBailey (talk) 14:35, 5 January 2009 (UTC)Reply
Trust me! If I ever want to find out about BR I will know what it is called! When I want to find out about something I tend to know what it is I want to find out about. What I do not want to do is to wade through a load of other stuff first.
Imagine wanting to find out about Chlamydia and instead having to read about King Lear's other daughters Syphilis and Gonorrhea as well to get to the information I want! Fiddle Faddle (talk) 15:27, 5 January 2009 (UTC)Reply

(outdent) Actually, that is exactly what redirects are for. In fact, we could redirect right to the appropriate section of the GLM article, so as not to have to read through the other nonbinomial topics. Now, as to where to stash the in progress page, a better idea IMO would be to put it as a subpage of the GLM article's talkpage. But userspace is always a safe bet...

But if there is going to be significant editing on this article soon, it is not clear to me that subpage/deletion is even the best option; why not just improve the current article? (Also, I know I am punting here but in terms of improving the project any of the proposed courses of action would work.) Baccyak4H (Yak!) 18:17, 5 January 2009 (UTC)Reply

It is one of the uses of a redirect, yes. And if Mediawiki software were actually any good I might agree with the idea of a redirect to a section. But, when the section name changes the redirect is broken. While a redirect is usable until that point it is useless thereafter and just goes to the article itself. Redirects, while they work in this manner, should really only be used as gross redirect, whatever the custom and practice suggests. Far better to improve individual articles. And improvement can be done "on the run" very easily. Not by me in this article, though. It's all Greek to me. Fiddle Faddle (talk) 20:23, 5 January 2009 (UTC)Reply
Now there's a thought. Why not have a single article on WP and call it "Logos". Have many many many subsections and have even more redirects. We could even drop the stupid Mediawiki software then and go back to good ol' HTML... 87.175.60.49 (talk) 00:11, 6 January 2009 (UTC)Reply

Timtrent / Fiddle Faddle and 87.175.60.49, the thing is that I created this article because I felt that the Wikipedia reader might get the impression that logit and probit are different in some non-minor ways when they are, in fact, highly related. The idea was to merge those two articles (along with several others) into this one (see above). If either of you could demonstrate that you had some idea what you were talking about when commenting on this article, or could sensibly respond to my claim that those looking for binomial regression will not know it by name (see [1] if you are interested in not knowing the right name for what you are talking about), or if you could determine the difference between this article and what is in the proposed link page, I would be more interested in your comments. Until then, if you really want to enjoy Chlamydia separate from when you enjoy your Gonorrhea, this article is not your desideratum.

Despite over a year of no editing interest, I will wait at least one week before making this a redirect to the section. In answer to Baccyak4H, I suspect initial disinterest in this article, or even lack of knowledge of why it might be a good idea to have it, drives the lack of interest; but in the end there are as many reasons not to edit an article as there are editors. The one thing we can be sure of is that the answer to the question "should I make substantive improvements to Binomial regression now?" has been no for every editor for 18 months. PDBailey (talk) 02:47, 6 January 2009 (UTC)

Wikipedia is based upon consensus. You do not have a consensus to make it a redirect, nor will you have one unless it has been built prior to your choosing to do so. What you have so far is "no consensus" which means "Keep it as it is", though that does not prevent improvements. Also, please do not be patronising. You say "If (either of) you could demonstrate that you had some idea what you were talking about when commenting on this article...", and I find that not to be particularly friendly or useful. You have no idea where my skill levels lie except insofar as I state what they are. Those statements need not be true.
It would be well worth your while reading about ownership of articles. You may have created it, but the community controls its destiny from the point you pressed save page. Fiddle Faddle (talk) 11:21, 6 January 2009 (UTC)Reply
Timtrent / Fiddle Faddle, it is true that I do not know your skills; I more left it open for you to make some argument that this page should not be moved. So far every argument has been completely general to all pages, which amounts to a blanket block on merges / deletion and is simply insane and untenable. Wikipedia operates on consensus, but there is no policy (that I know of, feel free to prove me wrong) saying that no action is the best action when not everyone agrees. In general, in a case such as an AfD, the quality of the arguments is weighed. Baccyak4H appears to be indifferent (except to prefer for someone to improve the article) and Baccyak4H is well informed on this topic. It is true that you and the anonymous editor may be well informed, but your comments to date do not yet demonstrate that. PDBailey (talk) 14:48, 6 January 2009 (UTC)
What I have demonstrated clearly is that you have, currently, no consensus for a merge or redirect. My argument is simple. I do not believe this page should be turned into a redirect or merged, thus I oppose the proposed action. And that is all that I have to do.
In the same manner that you are one voice in this community, so am I. So is any other editor. Since you have no consensus currently, by which I mean an explicit lack of consensus, and since you do not own the article, your role, should you wish to merge/redirect is to build consensus in the normal way. If you build that consensus I am bound to respect it even, perhaps especially, if I do not agree with it.
When there is no consensus for an action Wikipedia custom and practice says that the proposed action does not take place. Look at the AfD process and other similar processes as your guiding examples.
I have no idea how much weight the IP-only editor should be given, but it is certainly an opinion against your proposed action. S/he makes an amusing point and one that has some relevance. After all, WP is not short on database horsepower here, and the disk space is already used, so that is not a consideration.
Instead of barrack room lawyering, why not enhance the article? Lack of other editors playing in the pond you created is no indication of anything at all. Fiddle Faddle (talk) 19:39, 6 January 2009 (UTC)
Timtrent / Fiddle Faddle, I eagerly await an enumeration of your reasons why this article should not be made into a redirect. Please highlight how your concerns are specific to this article and its content. PDBailey (talk) 02:02, 7 January 2009 (UTC)
It should not be made a redirect because it is worthy of a separate article. That is it. Period. If you think it is a poor article, then improve it. If you want a consensus to merge and redirect it, then build one. Please cease bothering me with trifles. Devote your considerable energies to improving the articles that you touch rather than spending hours engaging in pointless debate. How much better could this article be, now, if you had done this already? Fiddle Faddle (talk) 07:54, 7 January 2009 (UTC)

What's needed now

OK let's try to improve this article. I have made a start. I think we need two further things:

  • At least one more reference, preferably more dedicated to binomial data ... is there not a book by DR Cox on binomial data?
  • A decent example (with a reference) where binomial regression is applied, at least saying what the binomial observations represent and what the explanatory variables were.

Melcombe (talk) 13:42, 7 January 2009 (UTC)Reply

Thank you for the enhancements. Fiddle Faddle (talk) 14:46, 7 January 2009 (UTC)Reply
Analysis of Binary Data, Second Edition (Monographs on Statistics and Applied Probability) by D.R. Cox and E. J. Snell. It is in the same series (the green books). Not sure about adding something as a reference when the authors have not, you know, referred to it. PDBailey (talk) 01:24, 8 January 2009 (UTC)Reply
This seems quite common in the stats articles on Wikipedia, when they bother to include refs at all. If you want to be careful on the point, it could be included in a section headed "Further Reading", as is also done here. Melcombe (talk) 15:42, 8 January 2009 (UTC)

With the page as it is, just a few additions would make it have more information on it than the probit model page (as it currently is). But there could be some value to having those articles. It occurs to me, could we have those pages say this is the main article (i.e. fitting is essentially identical), but contain specific information on the regression formats / history / dogmatic uses in various fields. PDBailey (talk) 03:54, 11 January 2009 (UTC)Reply

Binomial vs. binary regression

Every example on this page discusses settings where the response variable yi only takes on the values 0 and 1 (that is, are Bernoulli distributed). It seems to me that those are better covered at binary regression, not here. I think we need to be clearer on this page about what (if any) benefits come from the generalization from binary to binomial regression. How do most texts make that distinction? (Pinging Nbarth as well.) Wikiacc () 13:37, 3 November 2019 (UTC)Reply

Thanks Wikiacc; this is an important and subtle distinction!
The formal term is grouped binary data (see grouped data), and a great explanation is given in the introduction to Collett, Modeling Binary Data (p. 1).
A key example to bear in mind is "sex ratio for a given couple". Just as sex ratio (at birth) for births in a population deviates from 1:1 (it's instead p:1), the distribution of births in a couple/family with (say) 2 children is not $B(2, p)$ (binomial distribution with parameter p): there are more male/male and female/female families. This is called overdispersion (relative to the binomial model), a form of model misspecification, and interestingly can still be modeled within the Generalized Linear Model (since you only need the mean and variance to estimate the parameters)!
In some applications, notably in machine learning (specifically supervised learning), people only look at individual points (ungrouped data), so it's really a binary regression, as you say.
"Binomial regression" is said because binary regression is a special case ($n = 1$), and it fits into the Generalized Linear Model framework, but I agree it's clearer to treat these as binary regressions, and explicitly discuss binomial only when it's actually grouped data.
I'll have a shot at rewriting the intros, but feel free to reorganize these pages as you see fit!
—Nils von Barth (nbarth) (talk) 04:39, 11 November 2019 (UTC)Reply
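
As a small illustration of the grouped versus ungrouped distinction discussed above, the following Python sketch (standard library only) evaluates a logistic-regression log-likelihood two ways: once on grouped binomial counts and once on the same data expanded into individual 0/1 outcomes. The data values, function names and coefficient names (b0, b1) are hypothetical and chosen only for this note. It shows only the grouped/ungrouped equivalence under the plain binomial model; it does not model overdispersion.

```python
import math


def inv_logit(eta: float) -> float:
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical grouped data: (covariate x, number of trials n, successes y).
GROUPED = [(0.0, 10, 2), (1.0, 12, 5), (2.0, 8, 6)]

# The same data ungrouped: one (x, 0/1 outcome) row per individual trial.
UNGROUPED = [(x, 1 if i < y else 0) for x, n, y in GROUPED for i in range(n)]


def binary_loglik(b0: float, b1: float) -> float:
    """Bernoulli (binary regression) log-likelihood on the ungrouped rows."""
    total = 0.0
    for x, y in UNGROUPED:
        p = inv_logit(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def binomial_loglik(b0: float, b1: float) -> float:
    """Binomial regression log-likelihood on the grouped rows."""
    total = 0.0
    for x, n, y in GROUPED:
        p = inv_logit(b0 + b1 * x)
        total += (math.log(math.comb(n, y))
                  + y * math.log(p) + (n - y) * math.log(1.0 - p))
    return total


def loglik_offset(b0: float, b1: float) -> float:
    """Difference between the two log-likelihoods at given coefficients."""
    return binomial_loglik(b0, b1) - binary_loglik(b0, b1)


if __name__ == "__main__":
    # The offset does not depend on the coefficients tried.
    for b0, b1 in [(0.0, 0.0), (-1.2, 0.9), (0.5, -0.3)]:
        print(b0, b1, loglik_offset(b0, b1))
```

The difference between the two log-likelihoods is the sum of the log binomial coefficients, which does not involve the coefficients, so both formulations are maximized by the same estimates. In that sense binary regression is the $n = 1$ case, and grouping only matters once something beyond the binomial variance, such as overdispersion, is being modeled.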