This is the talk page for discussing improvements to the MNIST database article. This is not a forum for general discussion of the article's subject.
A fact from this article appeared on Wikipedia's Main Page in the "Did you know?" column on September 19, 2013. The text of the entry was: Did you know ... that the best error rate a computer program has gotten on the MNIST database of handwritten digits is 0.23 percent?
This article is rated B-class on Wikipedia's content assessment scale.
Questions
- The link to the "Official Homepage" asks for username/password?! — Preceding unsigned comment added by 129.27.234.52 (talk) 11:00, 23 April 2024 (UTC)
- MNIST : the link to github images of the MNIST database has changed, but I cannot edit the page to update the link.
The new link is : https://github.com/mbornet-hl/MNIST/tree/master/IMAGES/GROUPS Can a Wikipedia administrator update the link for "groups of images of MNIST handwritten digits on GitHub" ? Thanks. — Preceding unsigned comment added by Wikispring001 (talk • contribs) 21:19, 8 October 2019 (UTC)
- Half of the training set and half of the test set were taken from NIST's training dataset, while the other halves were taken from NIST's testing dataset..... is that 4 halves? Surely quarters or rephrase
- There have been a number of scientific papers attempting to achieve the lowest error rate.... what no computers? just papers?
- SVM? This isn't a common abbreviation .... explain please?
Hope this is helpful Victuallers (talk) 22:05, 31 August 2013 (UTC)
- @Victuallers: The first sentence was rephrased. (Good catch...) SVM is wikilinked both places it appears in the article, but I changed it to be spelled out. Thanks! APerson (talk!) 11:52, 1 September 2013 (UTC)
- This is not for DYK.... I'm just curious. What are digits? I guess this is 0-9 but does it include 12.1 or 12,1 as the French would write it. Can it tell 9,123 which is a real number less than ten in France from 9,123 which is an integer over 9 thousand in England? Is the French 7 supported... and programmers who write a zero with a slash? You may not know much about this but a picture showing the variation in digits that this database holds would be intriguing... Victuallers (talk) 13:45, 1 September 2013 (UTC)
- It's just the digits 0 through 9. I'm trying to get a picture of the database from one of the gazillions of scientific papers about it; I'll probably add a picture soon. APerson (talk!) 22:46, 1 September 2013 (UTC)
Citation about error rate of 0.27%
Tomash, I just undid your removal of a citation since I felt that the achievement of a lowest error rate was important information. APerson (talk!) 18:13, 9 June 2014 (UTC)
- @APerson:, there is another citation of a paper by the same authors two sentences ahead which states 0.23 error rate. So I removed the citation as it seemed to me that 0.27 was not the lowest error rate (0.27 > 0.23). Tomash (talk) 18:46, 9 June 2014 (UTC)
- @Tomash: Reviewing the papers a lot more closely, it looks like the most recent paper (that is to say, the 2012 one) does not acknowledge the 2011 paper. While writing the article initially, I relied on the 2012 one for information and didn't look at the 2011 one until later, at which point I thought "Cool!" and stuck it in the lead. I'll move the 2011 paper down into the "Performance" section, since it did break the previous record in the literature at the time it was published. APerson (talk!) 01:09, 10 June 2014 (UTC)
0.21% Test error rate citation
I think the paper at http://yann.lecun.com/exdb/publis/pdf/wan-icml-13.pdf should be cited here, because they attained a test error rate of 0.21% in 2012 or 2013. The citation in the table just takes me to a set of 5 neural network weights on Google drive with a timestamp of November 2016. Searching for "Parallel Computing Center (Khmelnitskiy, Ukraine) represents an ensemble of 5 convolutional neural networks which performs on MNIST at 0.21 percent error rate." as per the citation gives me nothing.
Aizenberg et al./0.0 error rate
From 2018-07-13 to 2018-07-23 and then (after reversion by Joel B. Lewis on 2018-07-27) again on 2018-08-07, one user added the Aizenberg et al. result[1] of 0.0% error rate to the page.
As Joel B. Lewis mentioned in his reversion, this is "[...] a month's worth of efforts to promote brand-new papers here with no evidence of significance or lasting impact". My removal is predicated on two arguments, one against the result and one against the edit.
- 0-example error rate is arguably impossible to reach on MNIST without overfitting on the test set (i.e. a serious methodological error in the paper); see for example the already-cited examples in the article[2]. The only other paper that I'm aware of claiming something similar was withdrawn after finding exactly these issues. Therefore, arguably, the cited paper is wrong (but that's probably not for wikipedia to decide), but definitely Wikipedia:EXCEPTIONAL.
- The paper was edited into the article on the same day the conference on which it was published concluded. It has been added by an author with no contribution other than promoting this paper. It has been readded even after deletion. It therefore looks like WP:SPAM.
I have therefore deleted these additions. If I'm mistaken in this, please let me know.
Recurrent answer (talk) 10:18, 26 August 2018 (UTC)
References
- ^ I. Aizenberg and A. Gonzalez “Image Recognition using MLMVN and Frequency Domain Features”, Proceedings of the 2018 IEEE International Joint Conference on Neural Networks (IJCNN 2018), Rio De Janeiro, July, 2018, pp. 1550-1557.
- ^ MNIST classifier, GitHub. "Classify MNIST digits using Convolutional Neural Networks". Retrieved 3 August 2018.
"Decision Stream" Editing Campaign
This article has been targeted by an (apparent) campaign to insert "Decision Stream" into various Wikipedia pages about Machine Learning. "Decision Stream" refers to a recently published paper that currently has zero academic citations. [1] The number of articles that have been specifically edited to include "Decision Stream" within the last couple of months suggests conflict-of-interest editing by someone who wants to advertise this paper.
Known articles targeted:
- Artificial intelligence
- Statistical classification
- Deep learning
- Random forest
- Decision tree learning
- Decision tree
- Pruning (decision trees)
- Predictive analytics
- Chi-square automatic interaction detection (new!)
- MNIST database (new!)
ForgotMyPW (talk) 17:13, 2 September 2018 (UTC) (aka BustYourMyth)
References
- ^ Ignatov, D.Yu.; Ignatov, A.D. (2017). "Decision Stream: Cultivating Deep Decision Trees". IEEE ICTAI: 905–912. arXiv:1704.07657. doi:10.1109/ICTAI.2017.00140.
Updating lowest error rate on introduction to reflect the "Classifiers" section.
The introduction of the article mentions "There have been a number of scientific papers on attempts to achieve the lowest error rate; one paper, using a hierarchical system of convolutional neural networks, manages to get an error rate on the MNIST database of 0.23%.".
Either that should be modified to reflect the new lowest 0.21% error rate mentioned in the "Classifiers" section or removed entirely to prevent the need for editing of the introduction to reflect the changes in the "Classifiers" section. In this specific instance there is no point in mentioning an error rate that is bigger than the lowest achieved.
Bit rot necessitates some updating.
There were no easily recognizable links to any version of this database.
Just now I added a link, though not a prominent enough one, to NIST's newest offering.
At least one of the links in a citation is no longer freely available. Trying to hit that URL yields a login prompt. I added a remark to that effect within the citation.
The NIST web site does not make finding the MNIST dataset easy. In 10 minutes of searching their web site I was unable to find it. It might even be gone.
Yes, I do know about GitHub, but NIST is the primary source (and GitHub material can also move).
2601:1C1:C100:9380:0:0:0:E052 (talk) 22:17, 11 April 2022 (UTC) A Nony Mouse
- I reverted the flagging of needing login since I had no problem. Perhaps some momentary glitch. Added back EMNIST info, but modified it some. StrayBolt (talk) 01:12, 12 April 2022 (UTC)