Talk:Cross-industry standard process for data mining
This article is rated Stub-class on Wikipedia's content assessment scale.
Link to IBM ASUM-DM
The link to IBM ASUM-DM is broken. Can someone fix it?
Link to Shearer's paper on the Journal of Data Warehousing
The link is broken; it points to IBM's SPSS page. Can someone fix it?
--Lucas Gallindo (talk) 12:54, 28 July 2011 (UTC)
fixed broken links, removed other dead links, and updated some text Karl (talk) 01:19, 6 November 2012 (UTC)
Added sentence to compare CRISP-DM to SEMMA. The SEMMA page also recently got an ORPHAN wiki-tag, and this page seemed to be one of the most natural pages to link to SEMMA. Karl (talk) 16:18, 15 November 2012 (UTC)
Sources
Is there any reason to believe that the site kdnuggets.com is a reliable source? Deltahedron (talk) 21:38, 17 December 2012 (UTC)
- Good question. I think it might be one of those "in-between" sources whose reliability is debatable: some people in the Wikipedia community would probably consider it reliable, and others wouldn't. My personal view is that it is a reliable source. However, it is certainly not a typical published source. The 2nd clause of Wikipedia:Identifying reliable sources indicates that if "the creator of the work" is "regarded as authoritative in relation to the subject" then it is a reliable source. In my view, this criterion is met by KDnuggets.
- KDnuggets and the polls there are authored by Gregory Piatetsky-Shapiro, and in my personal POV he fits the description of an author who is "regarded as authoritative in relation to the subject." However, Wikipedia:Identifying reliable sources indicates that for a source to be reliable this authoritative standing would have to be "demonstrable to other people". So I did a little digging today to see what evidence is available for Piatetsky-Shapiro being regarded in this manner. I saw that Piatetsky-Shapiro has over 40 publications listed at http://www.kdnuggets.com/gpspubs/index.html. Other sources put that count over 60. Probably even more important in establishing him as an authoritative expert in the field is that he founded the ACM's SIGKDD and was awarded the organization's 2000 SIGKDD Service Award. I also saw many indications of KDnuggets' popularity, frequent citation, and high regard. While popularity is not in itself a proper indication of KDnuggets' reliability or notability, I think that, taken together with the other information about KDnuggets and Piatetsky-Shapiro, its popularity and frequent citation do speak positively about its reliability and notability. KDnuggets reports having over 22,000 subscribers and over 50,000 unique monthly web visitors, and it has been featured on Forbes and CNN. I also looked at some of the specific citations on this CRISP-DM Wikipedia page; this is what I found:
- The Marbán et al. source cited 7 different KDnuggets polls, and 3 Piatetsky-Shapiro papers or books
- The Kurgan et al. source cited 8 Piatetsky-Shapiro papers or books (http://dl.acm.org/citation.cfm?id=1166027)
- I did not have easy access to check any of the other post-2002 sources cited on this page.
- I also found that on the data mining Wikipedia page, 3 KDnuggets polls are cited, and there are 2 other Piatetsky-Shapiro citations. Karl (talk) 04:26, 18 December 2012 (UTC)
FYI - Spanish Source
FYI - Spanish Source, in case this is of interest to anyone: http://www.oldemarrodriguez.com/yahoo_site_admin/assets/docs/Documento_CRISP-DM.2385037.pdf. It reviews several data mining process models. I'm not sure exactly how to cite this source, or whether Wikipedia's convention is to cite sources in non-English languages or not. Karl (talk) 05:56, 18 December 2012 (UTC)
Inclusion of some CRISP-DM 2.0 material
The section titled CRISP-DM 2.0 was recently deleted. I agree that it does not warrant its own section, but I think that some of the deleted material should be retained in the history section. I think it is useful for readers to have information in this article pointing out that the original consortium is no longer working together, that the original www.crisp-dm.org website is gone, and that the initiative to create a revised/updated CRISP-DM 2.0 is no longer active (no activity for years, website gone, etc.).
I've spent some time looking, and I can't find any sources to cite about CRISP-DM 2.0 being inactive. But I suppose that this is a challenge that other Wikipedia pages have faced also. Can other people please suggest how to document this? When something goes away or stops, there's not always a published source that says it's gone. But (in my opinion) it can still be something worth noting on a Wikipedia page.
For now, I will add some very brief information to the CRISP-DM page. But I welcome others to edit it to improve it and make it comply better with Wikipedia standards for this kind of thing. Thanks. Karl (talk) 15:34, 18 December 2012 (UTC)
- "I can't find any sources to cite about CRISP-DM 2.0 being inactive". Then we simply don't say that.
- Wikipedia:Verifiability policy requires that "Even if you're sure something is true, it must be verifiable before you can add it." Deltahedron (talk) 17:50, 18 December 2012 (UTC)
- I agree in principle. So I have removed my interpretation that the "efforts have stalled".
- But I think that it is just a statement of fact that 1) the websites are now gone, and 2) the consortium leaders have not communicated to the CRISP-DM 2.0 SIG members (I was one of them, and I know other SIG members). It seems to me to be worthwhile to point these things out to readers of this page, because it gives readers a sense of the current status of CRISP-DM, and whether or not it was a static thing that happened around 2000, or if there is a vibrant group working to update it. However, it seems impossible to expect that the absence of a webpage is going to be backed up by a peer-reviewed journal citation. So, in my opinion, stating that a website is no longer there is not a controversial claim that needs a citation. Anyone who looks will see that it is not there. And, if in the future someone puts the website back up, then edits can be made to this wikipedia entry to reflect that the website is active.
- And since Deltahedron has had concerns about my COI in the past, let me state openly that:
- I participated in one meeting in 2006 that was held to discuss possible things to address if CRISP-DM 2.0 revisions were to move forward
- In 2006 I was on a CRISP-DM 2.0 SIG email list.
- However, I do not feel that these two things (over 6 years ago) mean that I have a COI with CRISP-DM. It is not central to my work or my life, and is not something I spend much time thinking about. Karl (talk) 18:50, 18 December 2012 (UTC)
- I can only reiterate Wikipedia policy on the matter: this is not optional. If you wish for further comments, there is always Wikipedia:Reliable sources/Noticeboard. The requirement is not for academic peer-reviewed sources but for "reliable, third-party, published sources with a reputation for fact-checking and accuracy". Using one's own personal experience is simply not acceptable. Stating that a website is "no longer there" asserts (1) that it was once there and (2) it is not there now. Part (2) may be capable of direct verification by the reader, but (1) certainly is not. If no one else in the world has published a comment on the status of this group, then it is clearly not worthwhile to do so here. Deltahedron (talk) 19:13, 18 December 2012 (UTC)
- Good points. I will revise it. However, it is my personal POV that such strictness reduces the overall quality of the article. I also feel that such strictness is not applied consistently across articles or even within this article. For example, each of the sentences in the first 2 paragraphs of the History section contains claims that are not backed up with citations. I think it is fine for the authors of those sentences to have written them the way they did, without citations for each point. Over time, if other Wikipedia authors did not agree with the material, it would get modified. I personally think that this more relaxed standard should be applied now too. Karl (talk) 19:35, 18 December 2012 (UTC)
- OK, Deltahedron, I've modified it. In the end, I think it looks fine. Thanks for your coaching, even if I wasn't always happy to hear it. OK, that's all the time and energy I have available to devote to this page now. I'll step back and let others make modifications to enhance it further. Happy holidays. Karl (talk) 20:26, 18 December 2012 (UTC)
Standard Methodology for Analytical Models (SMAM)
References to this Wikipedia article continue to appear in this article. This concept appears to be original research and has created a circular reference back to this article. I have not been able to find any other published material that relates to this subject - Glenryman (talk) 05:19, 6 July 2015 (UTC)
Source link for "CRISP-DM 1.0 Step-by-step data mining guide"? (current one is wrong)
As of right now the sources include the following link under the name "CRISP-DM 1.0 Step-by-step data mining guides":
ftp://ftp.software.ibm.com/software/analytics/spss/documentation/modeler/14.2/en/CRISP_DM.pdf
This leads to a document titled "IBM SPSS Modeler CRISP-DM Guide", which is an entirely different document from the one stated in the name. (See https://inseaddataanalytics.github.io/INSEADAnalytics/CRISP_DM.pdf for a non-FTP link to a copy of that document.)
I've found other online sources that have some version of the CRISP-DM Guide 1.0, but they are not so "official" / "reliable":
the above in turn has links to pdfs hosted by university of Kassel (unfortunately with some pixelated graphics):
http://www.kde.cs.uni-kassel.de/lehre/ws2016-17/kdd/files/CRISPWP-0800.pdf
https://www.kde.cs.uni-kassel.de/wp-content/uploads/lehre/ws2015-16/kdd/files/CRISPWP-0800.pdf
I've also found another pdf with better graphics here:
https://www.the-modeling-agency.com/crisp-dm.pdf
I think I'll change the source to reference one of the University of Kassel links and also the link with the better graphics. However, it would be great to have better / more reliable sources.