
About original research

Wikipedia is great partly because of its rules, made by many sharp people over time. One of these rules is no original research. Wikipedia is not a place to showcase new papers under the guise of citations, since doing so suggests the new paper is a reliable source. The fact that a paper cites other sources does not mean that it can itself be viewed as a secondary source for our purposes. What is wanted is citations one step removed from the source, such as established journals, newspapers, textbooks -- impartial analysts, looking objectively at primary sources. In this case, the citation added is a primary source -- a PDF file of a research paper. I don't see why it is in this article other than to promote this particular paper using search engine optimization.--Tomwsulcer (talk) 17:51, 3 February 2014 (UTC)

Proper analysis is still needed

Headline-1: Big data: are we making a big mistake? March 28, 2014

[The science of 'statistical learning' and 'computer learning' can keep research on track.] — Charles Edwin Shipp (talk) 01:48, 30 March 2014 (UTC)

Canadian Open Data Experience

Another editor is insisting that this article include a mention of "the Canadian Open Data Experience (CODE) Inspiration Day event held at the University of Waterloo Stratford Campus located in Stratford, Ontario" at which "renowned Data Scientist Hilary Mason spoke about Big Data." I don't see how this adds to a reader's understanding of this topic and remain convinced that it should be removed. Can other editors please comment or contribute to this discussion? Thanks! ElKevbo (talk) 11:57, 7 April 2014 (UTC)

I agree with ElKevbo.--Tomwsulcer (talk) 22:30, 7 April 2014 (UTC)
I do not agree with ElKevbo because the mention is similar in tone to the previous line about the IBM-sponsored championships. Perhaps the editor could consider removing the Hilary Mason mention - and simply identify the event. Statdata (talk) 03:24, 11 April 2014 (UTC)
You're right - the IBM event needs to be removed, too. ElKevbo (talk) 03:47, 11 April 2014 (UTC)
Agree with ElKevbo again. Perhaps an external link could be included to this stuff but in the body of the text, Wikipedia's rules want secondary sources, impartial one-step-distanced from source material.--Tomwsulcer (talk) 11:06, 11 April 2014 (UTC)

Running the Duplication Detector report reveals the following:

Comparing documents for duplicated text:

    http://en.wikipedia.org/wiki/Big_data
    http://www.stanfordlawreview.org/online/privacy-and-big-data/privacy-and-big-data

Downloaded document from http://en.wikipedia.org/wiki/Big_data (239986 characters, 7914 words)


Downloaded document from http://www.stanfordlawreview.org/online/privacy-and-big-data/privacy-and-big-data (71117 characters (UTF8), 5217 words)
Total match candidates found: 1202 (before eliminating redundant matches)

Please run the report itself to see which sentences are an exact match (about 6) & which are close paraphrases (about a dozen or more).
Peaceray (talk) 02:51, 8 June 2014 (UTC)

Lablanche & Company

I propose adding the CSS product designed by Lablanche & Company for data compression and data encryption in one step. Is it possible?

SL — Preceding unsigned comment added by 90.50.49.149 (talk) 16:30, 10 January 2015 (UTC)

No, as there is no indication whatsoever that the company is notable in any way.--McSly (talk) 23:18, 10 January 2015 (UTC)

The CSS product is of interest for data compression and encryption and can be sold to big companies and institutions. This product can generate tens of millions of dollars, so I think you should agree to include it in the Wikipedia big data page (just one sentence). "The start-up Lablanche & Company commercializes a prototype named CSS for big data visualization and data compression/encryption using the recent compressed sensing theory". SL — Preceding unsigned comment added by 90.50.49.149 (talk) 04:02, 12 January 2015 (UTC)

First, please use your account (Fgtyg78) instead of the IP to make communication easier. As mentioned on your talk page, the company you want to add here is completely unknown so it cannot be included (see WP:NOTABILITY). To be clear, there won't be any exception to that rule. It also looks like you have a conflict of interest which is not helping your case (see WP:COI). So I'm going to remove the link again. If you re-add it, I will file a report to have this page protected and the url added to the black list. --McSly (talk) 04:32, 12 January 2015 (UTC)

This company is not unknown because its website is referenced on Google and on the Compressed Sensing Wikipedia page. Fgtyg78

What do you have to answer to this? Fgtyg78

Please respond to appearance of conflict of interest. Peaceray (talk) 06:41, 12 January 2015 (UTC)

There is no conflict of interest because CSS is a unique software prototype for big data problems with an innovative mechanism of compression/encryption. This prototype has no equivalent in big companies and institutions. So it must appear in the encyclopedia to inform people about the use of compressed sensing in big data.

Fgtyg78 — Preceding unsigned comment added by 90.50.49.149 (talk) 11:01, 12 January 2015 (UTC)

The repeated prominent posting of a non-prominent company in the lede of an article will always appear to be spam and a conflict of interest. If you had paid attention, you would have noticed that no other company is mentioned in the lede. Please stop posting this change to the lede. Discuss the issue fully here first to arrive by consensus at the appropriate way to include the information, or not. Peaceray (talk) 16:57, 12 January 2015 (UTC)

Semi-protected edit request on 29 January 2015

Would it be possible to add link to e-Science page (https://en.wikipedia.org/wiki/E-Science) in the section discussing research applications of Big Data?

The e-Science page seems to provide more details of the examples mentioned on the Big Data one, and this linking could eventually motivate avoiding some duplication of the material.

Mheikkurinen (talk) 20:02, 29 January 2015 (UTC)

  Not done: it's not clear what changes you want to be made. Please mention the specific changes in a "change X to Y" format. It's already a wikilink. — Technical 13 (etc) 20:23, 29 January 2015 (UTC)

Tone of article

Hi. A few reverts back the tone of the text changed entirely and now sounds like a poorly written magazine article. Can someone take the article back to a stable point please? Rui ''Gabriel'' Correia (talk) 08:10, 29 January 2015 (UTC)

I have since restored parts of an earlier version to strengthen the lede. Rui ''Gabriel'' Correia (talk) 12:21, 29 January 2015 (UTC)

I disagree with these edits, but am not going to engage in edit warring on a subject for which I have no passion. The most I could do would be to slap about 4 or 5 requests for clarification tags — which the editor would most likely remove, as he has twice done with the "tone" tags, without appreciating that these are there to help.

  • "Data has always been Big." — this is opinion (original research) and vague and meaningless.
  • "The one aspect that differs now (if compared with the past) would be the sheer scale and accessibility of Data, which is the direct result of the super efficient speeds in which data can now be computed." — I think the editor is confusing concepts of size and importance
  • "Big Data is therefore an all-encompassing term for any collection of large data sets that were once difficult to process." — so, nowadays it is easy? Just like that? Why?

Rui ''Gabriel'' Correia (talk) 14:34, 29 January 2015 (UTC)

For what it's worth, I agree with Gabriel on all these points. And in particular, I think it's important that the opening paragraph be a clear summary of the topic--if someone sees just the first couple of sentences (e.g. in a search results page) they should have some idea of what the topic is about. (I also chimed in on the other editor's talk page, User talk:Jugdev#Manual_of_Style.)
I also don't want to get into an edit war here! I've asked for admins to help. (Dispute Resolution Noticeboard) In the meantime, I'll put the "tone" tag back up on this page with a link to the discussion here. AIUI, removing a tag like that without participating in the discussion is a clear violation of WP mores, so let's assume that won't happen again... -- Narsil (talk) 00:41, 1 February 2015 (UTC)
I've been told that there hasn't been enough discussion to merit admin intervention yet. So... As I agree with User:Rui Gabriel Correia's take, I'm going to restore his edits. User:Jugdev, please respond to the issues he and I raised here rather than just reverting! Thanks, -- Narsil (talk) 20:54, 1 February 2015 (UTC)
Thank you for your edits. I have noted my thoughts on your commentary:
  • "Data has always been Big." — this is opinion (original research) and vague and meaningless.
The concept of big data is vague. Original research is varied and does not convey the multidisciplinary perspective of the quote in question. The only quote that encapsulates the concept in a clear manner is the one which has been included.
  • "The one aspect that differs now (if compared with the past) would be the sheer scale and accessibility of Data, which is the direct result of the super efficient speeds in which data can now be computed." — I think the editor is confusing concepts of size and importance
This is an open-ended criticism, without any depth - could you please elaborate on what you think is being confused? If I understand correctly, size is the reason why the concept (Big Data) has been given its rather unusual name. Its importance is not referenced in the opening paragraph - more prominence has been given to the term itself (i.e. the subject of the article) and how it has entered into the public sphere...
  • "Big Data is therefore an all-encompassing term for any collection of large data sets that were once difficult to process." — so, nowadays it is easy? Just like that? Why?
This is the sentence that Narsil keeps reverting back to... In response to the question, big data will always be difficult to process but it is easier now, as tools previously used by statisticians and analysts were limited. The study of statistics and data will evolve and procedures will become more elaborate as data sets grow larger.
Here are a few quotes from the wiki guidelines on tone:
"Wikipedia articles, [...] should be written in a formal tone. Standards for formal tone vary depending upon the subject matter, but should follow the style used by reliable sources, while remaining clear and understandable"
I believe that my version of this article is in line with the standards noted above. The sources are reliable and the content is clear and easily digested.
"Normally, the opening paragraph summarizes the most important points of the article. It should clearly explain the subject so that the reader is prepared for the greater level of detail that follows."
I believe that the opening sentence in this article is clearer than the version written prior to my involvement. It summaries the most important aspects of this new topic, which is presently being debated within the academy (i.e. all academic institutions) as I write. — Preceding unsigned comment added by Jugdev (talkcontribs) 09:46, 2 February 2015 (UTC)

Jugdev. I offered to help, yet you rejected my help, claiming you understood what you were doing. Apparently not. I completely understand what you are trying to do by presenting the most recent developments first, but it does not work that way. Let's look at the first sentence of Elephant, for example: "Elephants are large mammals of the family Elephantidae and the order Proboscidea." Now, if I want to add that the market for ivory is driving the African elephant to extinction, where do I add this? Before the original text - i.e., present the most recent information as you are doing with big data, or after the existing sentence? Let's take a look:

  • 1. The illegal trade in ivory is driving the African elephant to extinction throughout most of Africa. Elephants are large mammals of the family Elephantidae and the order Proboscidea.
  • 2. Elephants are large mammals of the family Elephantidae and the order Proboscidea. The illegal trade in ivory is driving the African elephant to extinction throughout most of Africa.

Which one (1. or 2.) is the more logical sequence?

That is ONE aspect of it. The other is the tone. The tone is going wrong precisely because you are swinging around the logical order of the bits of information. Which is why you need to add "Data has always been Big", otherwise the next sentence hangs, because segments like "aspect that differs now", "compared with the past", trigger in the reader a sense that something is missing. So you patched in the bit about "Data has always been Big" to cover it up (and plagiarised - read below).

You also claim to be familiar with the styleguide and have now had ample opportunity to analyse your edits to see if they comply. It amazes me then that you keep on claiming that your edit is in line with the styleguide and yet you have not yet picked up that there is a problem with "sheer scale" and "super efficient speed". This is partly because you just plagiarised the source, then changed or moved one or two words around: "Data has been “big” all along. What has changed now is not just scale and cross-channel inputs, but the sheer speed and accessibility of data".

Greetings. Rui ''Gabriel'' Correia (talk) 11:00, 2 February 2015 (UTC)

The way forward

I don't think it is very productive to have a whole team of editors monitoring one edit by one editor. We have done all that can be considered par for the course, we have pointed out what is deficient about the version the editor would like to use, all to no avail. If said editor cannot grasp a simple thing, such as not starting an article/ lede on a minor sentence, then he needs a tutor. I don't know if appointing a tutor is foreseen in the mechanisms to deal with stubborn editors. If not, progressive blocking seems to be the only solution. Regrettably. Rui ''Gabriel'' Correia (talk) 23:57, 5 February 2015 (UTC)

RfC: Is the opening paragraph a good summary of the topic?

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


Is the opening paragraph a good summary of the Big data topic? Narsil (talk) 19:16, 2 February 2015 (UTC)

Background: There has been disagreement among three editors about the topic paragraph for this article. (You can see the discussion in #Tone of article.) User:Jugdev, the author of the current version (here's one diff where he restored it), feels that with this version "the opening sentence in this article is clearer than the version written prior to my involvement. It summaries the most important aspects of this new topic, which is presently being debated within the academy". User:Rui Gabriel Correia and I feel that this version is unencyclopedic in tone, contains subjective statements, and does not meet WP style (in particular, the opening sentence is not a summary of the topic). We would prefer the summary as it stood before Jugdev's edits (version 644139720).
We've reached an impasse--our discussion is basically, one says X, the other says "I disagree, Y", the first person says "no not Y, X". ;-) The edits have basically consisted of us reverting each other repeatedly, which is not good! Since there are already three editors involved I didn't know if WP:3O was appropriate. I'm hoping input from more editors will settle this one way or the other. Narsil (talk) 19:27, 2 February 2015 (UTC)


Original. The version which starts "Data has always been Big." is certainly unencyclopedic in tone and such conversational style should be avoided. I much prefer the one sentence version. The current formatting (with bold Big, Data and Big Data in the first, second and third sentences) is clearly not per manual of style. I think the previous version which starts "Big data is an all-encompassing" is better, although it could possibly be improved by breaking down "difficult to process them using traditional data processing applications", though with my limited familiarity with the topic I wouldn't be sure how. SPACKlick (talk) 10:17, 3 February 2015 (UTC)

Thank you for your contribution. Firstly, the above overview by Narsil is a convenient version of events. Please refer to the talk page (pasted below for your convenience) for a more thorough overview and also my criticism of the changes made. In response to your comments SPACKlick, I loosely agree with your concern regarding the formatting, and in defense feel that the formatting may help the article aesthetically by allowing users to identify keywords. I however disagree with the first point, as the sentences in question happen to be a quotation from a well regarded publication about Big Data. It is my understanding that the quote evokes a particular frame of mind/ thought, which in turn allows the reader to begin grappling with the complex topic. I do not believe that we have enough evidence to completely revert the article. All we have seen is a critique of two sentences that happen to be from a publication that specialises in the subject in question.

Please see a summary of the talk page below: [quoted from User talk:Jugdev#Manual of style ]

Collapsed some material that was copied here from his user talk by User:Jugdev. Click to view. EdJohnston (talk) 04:09, 6 February 2015 (UTC)

Manual of Style

Hi Jugdev. Please advise if you require assistance with the style of writing for articles in the Wikipedia. As a starting point, please familiarise yourself with this and this section. Shout if you need help. Enjoy the project. Rui ''Gabriel'' Correia (talk) 12:40, 29 January 2015 (UTC)

Hi Rui, thank you for your feedback on big data. In response, the introductory paragraph abides by wiki standards: sentence one is a fact; sentence two is a fact; sentence three is as it was before my edits. The first two sentences allow a layer of context to the third sentence, which is why I feel that the paragraph should remain as it is. I look forward to working with you. — Preceding unsigned comment added by Jugdev (talkcontribs) 04:57, 29 January 2015‎
I agree with User:Rui Gabriel Correia--the tone is not appropriate for Wikipedia, especially for the opening paragraph, which should be a terse summary, per MOS:LEAD--many people see only the lead paragraph, e.g. in search results, so it should be straightforward and stand alone. (If a person searches for "big data" and sees "Big Data is an all-encompassing term for any collection of data sets so large or complex that it becomes difficult to process using traditional data processing applications", they may have learned everything they need to know. If they see "Data has always been Big. The one aspect that differs now (if compared with the past) would be the sheer scale and accessibility of Data, which is the direct result of the super efficient speeds in which data can now be computed", they don't...) -- Narsil (talk) 19:13, 30 January 2015 (UTC)
Thanks, User:Narsil. Somehow I failed to convey that to the editor, who still feels that what he has done is good and in line with the style for an encyclopaedia. I trust that he will have enough sense to not change it again. Rui ''Gabriel'' Correia (talk) 21:54, 30 January 2015 (UTC)
Thank you for your input. "The lead should be able to stand alone as a concise overview. It should define the topic, establish context, explain why the topic is notable, and summarize the most important points[...]" (MOS:LEAD) — Preceding unsigned comment added by ‎ Jugdev (talkcontribs) 06:13, 31 January 2015

I don't think any of us wants to get into an edit war here! ;-) I've asked for administrators to chime in: Wikipedia:Dispute_resolution_noticeboard#User_talk:Jugdev.23Manual_of_Style -- Narsil (talk) 20:09, 31 January 2015 (UTC)

I've been told that there hasn't been enough discussion to merit admin intervention yet. So... I've restored Gabriel's last version, and added my comments to the Big Data talk page (in Talk:Big data#Tone of article). Jugdev, if you want to discuss this further, please do so there instead of here--and please don't just restore your version without discussing it!
Gabriel, if you're tired of the matter, that's cool--I can request a third opinion via WP:3O. But I'd be very glad of your further help here... ;-) Thanks! -- Narsil (talk) 21:07, 1 February 2015 (UTC)
 

You currently appear to be engaged in an edit war according to the reverts you have made on Big data. Users are expected to collaborate with others, to avoid editing disruptively, and to try to reach a consensus rather than repeatedly undoing other users' edits once it is known that there is a disagreement.

Please be particularly aware that Wikipedia's policy on edit warring states:

  1. Edit warring is disruptive regardless of how many reverts you have made.
  2. Do not edit war even if you believe you are right.

In particular, editors should be aware of the three-revert rule, which says that an editor must not perform more than three reverts on a single page within a 24-hour period. While edit warring on Wikipedia is not acceptable in any amount and can lead to a block, breaking the three-revert rule is very likely to lead to a block. If you find yourself in an editing dispute, use the article's talk page to discuss controversial changes; work towards a version that represents consensus among editors. You can post a request for help at an appropriate noticeboard or seek dispute resolution. In some cases it may be appropriate to request temporary page protection. Kuru (talk) 21:21, 1 February 2015 (UTC)

Thanks, Narsil, for your input. The user has since reverted your most recent change, despite:

  • 1a. being asked to leave it as is and discuss,
  • 1b. just like before with the removal of tags on tone, despite requests to not remove until the matter had been settled.
  • 2. Receiving a warning that his

I've since had one more go at explaining to the user the issues with: 1. Intro 2. Editorialising, peacock and weasel words. Regards, Rui ''Gabriel'' Correia (talk) 11:14, 2 February 2015 (UTC)

-JG (talk) 12:47, 3 February 2015 (UTC)

User:Jugdev--I hope it's okay, I removed the copied-and-pasted version of the "Talk:Big data#Tone_of_article" section (people can scroll up to read it, and/or can follow the link I just pasted). Pasting the entire discussion was confusing the formatting of this talk page, since it included a heading and lots of paragraphs. If there are particular parts of that discussion you think are most relevant, feel free to copy those! (Though then it's usually good to mark them clearly as quotes so we can see who said what.) Narsil (talk) 20:11, 3 February 2015 (UTC)
Evidently you feel it's necessary to quote the whole discussion instead of providing a link. If you really really think so... I've at least set it off in a blockquote so people can see what's new and what's quoted. Narsil (talk) 19:13, 4 February 2015 (UTC)
To offer an opinion/response: I think it is quite appropriate for editors to offer a "critique of two sentences", since the very issue we are discussing is whether those two sentences are a good lead for the article! It may well be true that those two sentences come from a publication on the topic. For all I know, it's a very good publication indeed (I have to take your word for it since you don't give us a link). But even if it is, that publication probably has a whole lot of sentences that may be true and helpful but are not a good lead for the WP page.
Per WP:LEADSENTENCE, "If possible, the page title should be the subject of the first sentence." (If you look at other WP pages, the vast majority begin with something like "A Foo is a..." or "Bar were a...") I see no reason why we should defer defining "big data" until the third sentence. When people land on this page, the first thing we should tell them is what "big data" means. ...so, if it needs saying, I'm voting for Original version. -- Narsil (talk) 20:21, 3 February 2015 (UTC)
And by way of examples--I just hit "random article" 5 times, and got articles with these lead sentences:
(Sandcastle Waterpark) "Sandcastle is a water park located in the Pittsburgh suburb of West Homestead." — (Hypersthene) "Hypersthene is a common rock-forming inosilicate mineral belonging to the group of orthorhombic pyroxenes." — (Psychostick discography) "The following is the complete discography of official releases by Psychostick." — (Rainbow Gladiator) "Rainbow Gladiator is an album by the American jazz violinist Billy Bang recorded in 1981 and released on the Italian Soul Note label." — (Khanlar Safaraliyev) "Khanlar Safaraliyev was an Azerbaijani oil field worker, labor organizer, and Moslem social democrat."
So four out of five random examples begin with a sentence that uses the article subject as the sentence subject. It's not an ironclad rule--one of the five was an exception--but there needs to be a good reason for it (in this case, the exception is a page that contains a list, so it doesn't need a definition). Narsil (talk) 20:35, 3 February 2015 (UTC)

I've added a "tone" tag to the page to direct visitors to this discussion. User:Jugdev, please do not remove the tag. The tag is there to indicate that editors disagree about whether the tone is appropriate, and this disagreement clearly exists. Don't remove the tag just because you think the tone is good--we know that! honestly!--the tag is so other editors will come here and give their opinions. If they agree with you about your edits, then they'll say we should remove the tag, and this should get wrapped up sooner. But if you remove the tag yourself, this could be considered disruptive (per WP:DISRUPT) or even edit-warring. Narsil (talk) 03:15, 4 February 2015 (UTC)

I disagree with the tone tag - just to repeat myself : the paragraph in question is a published quotation from a highly regarded title from the field of big data. -JG (talk) 08:57, 4 February 2015 (UTC)

So what part don't you get? Wikipedia tone is wikipedia tone. What your "highly regarded title from the field of big data" does is its own business. If it is such a "highly regarded title", I guess it has a style guide. And guess what - that style guide is for their publications; we have ours, just as all other big and serious publications each have their own. Rui ''Gabriel'' Correia (talk) 14:46, 4 February 2015 (UTC)
I believe my question is clear. Please let me know if you need me to rephrase in a more digestible manner. the title has been referenced... which publication do you write on behalf? -JG (talk) 15:00, 4 February 2015 (UTC)
The issue is, the opening paragraph is not in keeping with wikipedia tone. It may be from an extremely reputable source, but that doesn't mean it's written in WP style. The Iliad and Finnegans Wake are both highly respected books, but that doesn't mean they're written in the right style for Wikipedia articles.
Since you aren't offering any response on the issue we're discussing--whether the opening paragraph is in Wikipedia style--I'm going to revert it again. Please do not re-revert it until after you've offered a response on that issue here. Right now every WP editor who's commented has agreed that your edits are not appropriate for Wikipedia--saying the quote comes from a very good book doesn't change that.
(To answer your question to Gabriel, "which publication do you write on behalf?"--Gabriel writes on behalf of Wikipedia. So do I. So do you, while you're writing here. So if you're writing here, write in WP style!) -- Narsil (talk) 19:13, 4 February 2015 (UTC)
apologies for the delay. In response, I have quoted the wikipedia style guide above, which suggests that the opening sentence is within the requirements. Slightly confused why this has been reverted again... -JG (talk) 09:19, 5 February 2015 (UTC)
You have not quoted the style guide in any way that is relevant to the conversation. The part you keep quoting says "The lead should be able to stand alone as a concise overview. It should define the topic, establish context, explain why the topic is notable, and summarize the most important points". But you have not responded to the issues we're discussing. For example: (WP:LEAD) "If possible, the page title should be the subject of the first sentence"; (WP:TONE) "English language should be used in a businesslike manner". ...User:Rui Gabriel Correia, User:Bluerasberry, User:SPACKlick--would one of you be willing to restore the original version? I frankly feel that User:Jugdev is engaged in disruptive editing but if I'm the only one reverting his changes we don't have a very clear case... Narsil (talk) 19:02, 5 February 2015 (UTC)
  • Comment I find this RfC to be malformed. In my opinion, a better way to do it is to propose a specific change to the lead of the article. After that, ask whether that change should be enacted. Obviously the proposed change is controversial. It should not be live without consensus as there is opposition. It would be best to find if some parts are problematic and other parts acceptable, or to otherwise find the nature of the dispute. In any case - the usual way to manage this would be to revert to the previous version, then only update it if there is consensus on the talk page. That is not special advice for this case, but in my opinion, the usual workflow of Wikipedia. Blue Rasberry (talk) 19:32, 5 February 2015 (UTC)
    • Thanks much! I'm not sure how to play that, though. I created this RfC because User:Jugdev had already changed this page to the current one ("Data has always been big"). This change is certainly controversial--I'm not finding any other editors who agree with it--but if people try to revert it, he just reverts it back. Admins told me there had not been enough discussion to justify admin intervention, so that's why I created this RfC (to try to get more editors involved). I would love to go with your approach--switch to the old lead ("Big Data is an all-encompassing term for any...") and keep it that way until there's consensus for a change. But as we have seen, JG will simply revert to his version, and say "my version is in keeping with WP style". So what's our next play? Narsil (talk) 19:47, 5 February 2015 (UTC)
Narsil This is not a complicated issue yet. The article had been stable for years with small changes till about a week ago. At this time, a user made changes. Multiple people immediately called for more discussion.
Per WP:BRD, the user who made the change was WP:BOLD, then someone WP:REVERTed the change, and now it is time to discuss it here. If the result of the discussion is that the change should stand, then it does. If there is no consensus to keep the change, then it is not kept. The fact that this RfC came shortly after the change is not relevant to the BRD process which is typical on Wikipedia. Blue Rasberry (talk) 19:54, 5 February 2015 (UTC)
  • Obvious oppose to opening with "Data has always been Big." Narsil and Rui have covered the high points. This RfC is very unclear, amongst a host of other problems (e.g. why is there a cut & pasted chunk of Jugdev's talk page here, warning template and all?). Jugdev aka JG, you say, "the paragraph in question is a published quotation from a highly regarded title from the field of big data". If you are talking about the lead paragraph -- or any paragraph without a direct quotation -- then it must be removed per WP:COPYVIO. I suggest an immediate close of this RfC per WP:SNOW to avoid wasting the time of anyone drawn into this. Manul ~ talk 00:41, 6 February 2015 (UTC)
Just to clarify, I opened the RfC to get the page changed back from "Data has always been Big" to the dryer version. ;-) At the time, there'd only been two editors involved besides Jugdev, and one of those two had said he was giving up--the RfC seemed like the best way to get other editors involved. Apologies if it was the wrong approach! (As for why we quoted the entire discussion from the other talk page--that's because JG pasted it here and insists on keeping it, and I only wanted to fight one battle at a time...) Narsil (talk) 02:23, 6 February 2015 (UTC)
My comment sounds harsher than intended. I appreciate that you were looking to obtain outside input in order to avoid warring with Jugdev. When an editor's preferred change is almost nonsensical ("Data has always been Big"), and everyone else opposes the change, and the editor doesn't back down, it's like a Randy from Boise situation. What to do? Perhaps the original research noticeboard would be the closest fit -- after all, if "Data has always been Big" isn't nonsense then it is WP:OR. But I would say that if an editor doesn't agree to stop inserting stuff that's universally recognized as weird, it becomes more of a conduct issue. Jugdev, would you please agree to drop this? "Data has always been Big" will never gain consensus, so there's no need for this RfC. Manul ~ talk 03:46, 6 February 2015 (UTC)
Although I still disagree, I will not revert again. I will find a better quote as the present one does not work - I look forward to working with you all soon. -JG (talk) 09:42, 6 February 2015 (UTC)

Jugdev, writing for the Wikipedia is not about collating quotes. It is about making sense of information found in reliable sources, conveying the information found in the sources in your own words in an encyclopaedic style and citing the sources consulted. If you are going to use quotations, this must be done sparingly, where applicable and justifiable, but not as the opening of a lede or article. Regards, Rui ''Gabriel'' Correia (talk) 10:58, 6 February 2015 (UTC)

Rui, I've been told the nothing encapsulates the essence of a debated topic the way a published quotation does. I will find one that's fit for our purpose. -JG (talk) 11:14, 6 February 2015 (UTC)

Thanks, fixed. And you used a wrong word. As for quotation, I am certain you will do as you please - as always. And cheers, this is the very last time you hear from me. Rui ''Gabriel'' Correia (talk) 11:26, 6 February 2015 (UTC)
thanks - I look forward to working with you. — Preceding unsigned comment added by Jugdev (talkcontribs) 03:30, 6 February 2015
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Lede Sentence 3

The trend to larger data sets equates to additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, prevent diseases, combat crime and so on." This is badly written. I tried to correct it, but I don't understand what it's trying to say well enough to repair the English. SPACKlick (talk) 17:05, 10 February 2015 (UTC)

As I read that sentence, you could rephrase it as "When a large amount of related data is in a single data set, it is possible to derive information from it that could not be derived from an equivalent amount of data in smaller data sets. This allows correlations to be found..." But that sentence isn't supported by the citation (in The Economist: Data, data everywhere), which just talks about the total amount of data, and not whether that data is in one data set or many. OTOH, as I read it, the original sentence has the same problem (that it's drawing a conclusion not supported by the source). So I'd just cut out the whole bit about numbers of data sets (one big vs many small), and change it to The trend to larger data sets allows new correlations to be found to "spot business trends, prevent diseases, combat crime and so on." Narsil (talk) 19:23, 10 February 2015 (UTC)
Quite right; it used far too many words to say too little of value, and perhaps to suggest something that isn't even true. I further adjusted your words to "Analysis of these larger data sets can find new correlations, to 'spot . . .'" partly because of my personal dislike of passive voice. Feel free, as usual, to point out where I may have gone wrong. Jim.henderson (talk) 13:46, 12 February 2015 (UTC)

CERN

Previously the article said that they filtered out 99.999% of the data. On reading the following thesis, it looks like they filter more than that. I've updated things accordingly and thrown in what was surely a clumsy citation. Feel free to clean it up, and then delete this talk entry.

https://cds.cern.ch/record/1504817/files/CERN-THESIS-2013-004.pdf

L1 filtering: 40 MHz to ~60-65 kHz (so ~.015% of data retained). L2 filtering: 65 kHz to 6 kHz (so 10% of data retained). L3 filtering: 5-6 kHz to 500-600 Hz (so 10% of data retained). So 99.99995% of data was filtered. — Preceding unsigned comment added by 98.200.115.85 (talk) 13:34, 24 March 2015 (UTC)
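
The per-stage figures above can be cross-checked with a few lines of arithmetic. This is only a minimal sketch, not taken from the thesis: it assumes the upper end of each quoted range, and it reads the L1 output as roughly 65 kHz (matching the L2 input). The names `stages` and `retained_fraction` are made up for illustration.

```python
# Trigger rates in Hz, taken from the comment above.
# Assumptions: upper end of each quoted range; L1 output read as ~65 kHz.
stages = [
    ("L1", 40e6, 65e3),
    ("L2", 65e3, 6e3),
    ("L3", 6e3, 600.0),
]


def retained_fraction(stages):
    """Multiply the per-stage output/input ratios to get the overall fraction kept."""
    frac = 1.0
    for _name, rate_in, rate_out in stages:
        frac *= rate_out / rate_in
    return frac


overall = retained_fraction(stages)

for name, rate_in, rate_out in stages:
    print(f"{name}: {rate_out / rate_in:.4%} retained")
print(f"overall retained: {overall:.6%}")
print(f"overall filtered: {1 - overall:.6%}")
```

On these inputs the overall retained fraction is simply 600 Hz / 40 MHz, about 0.0015%, which corresponds to roughly 99.9985% filtered. That does not match the 99.99995% quoted above, so the per-stage rates and percentages may be worth rechecking against the thesis before they are used in the article.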

What happened to the 3V's?

The current article no longer mentions volume, velocity, and variety as ways of characterizing Big Data. Why? I thought the combination was a good way to describe important aspects of Big Data. 108.212.231.175 (talk) 15:45, 22 March 2014 (UTC) Mark Kerstetter

Someone must have put it back in, along with a couple of other Vs. But I have a question about the paragraph in the "Characteristics" section that refers to Variety:

Variety - The next aspect of Big Data is its variety. This means that the category to which Big Data belongs to is also a very essential fact that needs to be known by the data analysts. This helps the people, who are closely analyzing the data and are associated with it, to effectively use the data to their advantage and thus upholding the importance of the Big Data.

I think this is supposed to be a definition, but all it really says is that variety is important, not what it means. And it says that in several uninteresting ways ("very essential", "needs to be known", "helps...effectively use", "to their advantage", "upholding the importance"). Sounds like a college essay where the student is using a bunch of verbiage to disguise the fact that he doesn't know the answer.
I can imagine several meanings for variety: variety of data type (string/ int/ real...), variety of data structures (objects in the object oriented programming sense, maybe), variety in how much of a given data structure is included for a given entity (like bibliographic information might have date of birth for Einstein, but not for Socrates), variety in the ways data structures are encoded (XML vs. plain text vs. tables), etc. So is the Variety in Big Data one of these, or all, or something else? Mcswell (talk) 20:58, 15 February 2015 (UTC)
Replying to myself: there are definitions of the Vs (and the C) here [[1]] which look to me to be much better than the existing definitions. But I'm not an expert on this stuff, so I hesitate to paraphrase it, and I'm unclear on the wikipedia requirements for quoting something verbatim. Mcswell (talk) 21:09, 15 February 2015 (UTC)
The 3V's defined by Gartner are volume, velocity and variety. Variability is a specific issue and it doesn't characterise the size of Big Data. Neither does veracity; sure, veracity is an issue if you gather your data from social networks, but it's not an issue if you use a huge (IoT-like) sensor network. It's too specific and doesn't relate to size. Complexity is derived from variety. The 3V's are supposed to be a measure of Big Data in general, not of specific problems one could face while handling Big Data. — Preceding unsigned comment added by 109.60.82.216 (talk) 14:43, 16 May 2015 (UTC)

Random unrelated comment

The link to Reference 4 is dead. — Preceding unsigned comment added by 97.119.162.121 (talk) 23:55, 23 December 2015 (UTC)

Notability of cartoon

 
Cartoon critical of big data application, by T. Gregorius

Funny though it is, what is the ultimate source of the cartoon? Is it just a WP editor? WP:NOTBLOG... 121.103.176.27 (talk) 08:05, 13 August 2014 (UTC)

Hi. As you can see on the file page this is a cartoon by T. Gregorius. I'm not aware that he is a Wikipedia editor (and I rather doubt that). IMHO, the cartoon is a good and very compact visualisation of the criticism aimed at Big Data. Well, of part of the criticism, of course. I did give this some thought before putting it in the article, whose criticism section is rather abstract. It's really hard to give a meaningful and comprehensible illustration of the Big Data paradigm. I think it's a perfect fit. We could add "Cartoon critical of big data application, by T. Gregorius" if you think that this would make it clearer that this is not "Wikipedia's" commentary. BTW, Wikipedia contains a lot of schematic illustrations (and these often have to leave something out, some complex aspects) - are they notable? (And are their authors notable? Wikipedians?) It comes down to editorial decisions. Are those illustrations appropriate, unduly biased, educational, help explain the subject etc. I think this cartoon is a perfect fit in the criticism section, it really makes the text more comprehensible. --Atlasowa (talk) 10:31, 13 August 2014 (UTC)
Thanks for the clarification, although I still have some doubts on Thierry Gregorius's notability, and of this work in particular, if it is just self-published. Anyway, I'm fine with your amended caption. Thanks. 121.103.176.27 (talk) 12:11, 13 August 2014 (UTC)

And then a year later the illustration was removed by User:McGeddon with an unfounded allegation in the edit summary ("cut a Wikipedia editor's joke cartoon") and without any discussion here. An editor that only contributes deletions to this article. Great. :-( --Atlasowa (talk) 06:55, 3 March 2016 (UTC)

My mistake - it was actually a Flickr user's joke cartoon? If we wouldn't write "Flickr user Thierry Gregorius humorously speculated as to whether one day passport control might include checks of a visitor's Amazon and Twitter history" in the article, we shouldn't put it in an image either. If there was actually some sourced content in the article about passport control, it would seem more appropriate to illustrate that with a WP:FORMAL photo of a passport or check-in desk than with a jokey cartoon.
(And yes, the majority of my few edits to this article have been to revert vandalism.) --McGeddon (talk) 09:47, 3 March 2016 (UTC)

Further reading & spam assessment

Please, let's go through Further reading section source by source here & not ax the entire section. I am an IT professional & I do not agree with HelpUsStopSpam that all the sources are spam. Please discuss each source first before removing it. Peaceray (talk) 19:55, 16 September 2016 (UTC)

So, which ones do you consider to be fundamental work on big data? "New Horizons for a Data-Driven Economy" has 0 citations. Clearly non-notable work yet. "A BRIEF REVIEW ON LEADING BIG DATA MODELS" has just 27 citations since 2014, another spam from a third-tier journal. "Technical Report CLOUDS-TR-2013-1" is a tech report, not even peer reviewed; so no fundamental work either. "Encrypted search & cluster formation in Big Data", as the name implies, is on a very specific subtopic only. "Big data for good" - 3 citations. The G&E whitepaper: not peer reviewed, very few citations. "Product Lifecycle Management: Vol 2. The Devil is in the Details." it's the appendix that is cited, and this book has 0 citations. and so on.
Why don't you name which of the sources you consider notable, and why? We can always add them back. But currently, the list consists of entries collected because of spam, not because someone decided to survey literature and identify the most fundamental work. HelpUsStopSpam (talk) 19:57, 17 September 2016 (UTC)
For the only somewhat well cited source, you can here see how they added this with very much advertising text: [2] at a time when it did not have many citations yet. Surprise: the IP is at the same university as the authors... so yes, this qualifies as WP:CITESPAM, and I have thus also removed this now. If you want some more notable broad source, consider ACM XRDS magazine, http://dl.acm.org/citation.cfm?id=2331042 HelpUsStopSpam (talk) 20:21, 17 September 2016 (UTC)
  • "A BRIEF REVIEW ON LEADING BIG DATA MODELS" has 27 citations since 2014, & that's a lot for the Computer Science field, especially for only 2 years, so reinstating that.
  • Big Data computing and clouds: Trends and future directions has 126 citations, I'll take that as notable. Reinstating that.
  • Comment for removal of Product Lifecycle Management was that it was uncited. However, scholar.google.com lists 705 citations, so reinstating that & updating citation information
  • The comment for removing "Big Data and the History of Information Storage" was "Spam link". While the link was to a page on Winshuttle's site, an SAP vendor, I don't accept that the page linked to was trying to sell something or otherwise qualified as spam as defined under WP:SPAM. Nevertheless, the information on the site was taken from a Forbes magazine article, "A Very Short History Of Big Data", so I will use the original source instead. Cited by 29 according to scholar.google.com, & one should not expect to see any more for that since we are talking Computer Science history.
Peaceray (talk) 00:21, 18 September 2016 (UTC)
P.S. I think it would be more constructive at this point to find replacements if you have better sources. Wholesale removal of Further reading citations would be disruptive at this point. While I am generally supportive of being bold in removing spam links, I believe further deletion of citations/external links from this section would no longer address that goal. Please get consensus here on this talk page before removing any citations from this section. Peaceray (talk) 00:46, 18 September 2016 (UTC)
@Peaceray: 27 citations is meagre for a broad topic such as big data, not "a fair amount". And J-stage is not the best publisher: [3]. Big data is not something very special. If you take e.g. this book ISBN 0544002695 (I have no idea if it's good) from 2013 titled "Big data: A revolution that will transform how we live, work, and think", it has 1886 citations on Google scholar. Silverman "Qualitative research", which probably is already too specific, has 2957 citations. Danah Boyd's "Critical questions for big data" has 916 citations... Jure Leskovec's "Mining of massive datasets" has 1043 citations. And you call 27 citations good?!? You must be kidding. "Product Lifecycle Management", only the main book has citations. Not the appendix of vol 2, which is a separate book (appeared a year later). No, none of these are notable, and not on the main subject: Product Lifecycle Management != Big Data. They are not "further reading", but "other crap that happens to mention big data somewhere". Let me spell out the requirements of Wikipedia:Further reading to you: Topical (not just related), Reliable (which would probably be 1000+ citations here, and rule out techreports and low-quality journals), Balanced (not just specific details) and Limited (no, we do not need to include everything; but we should include the most on-topic relevant literature only, i.e. the highly relevant textbooks). The section *never* was a good selection. It was spam right from the beginning, nobody chose these works because they were on-topic and good... thus, boldly removing them is the proper way, rather than keeping this useless random selection. HelpUsStopSpam (talk) 19:58, 18 September 2016 (UTC)
  • A couple things about the "A BRIEF REVIEW ON LEADING BIG DATA MODELS":
    • You are confusing the validity of the journal source, Data Science Journal, with the site that is hosting the paper, Japanese Science and Technology Information Aggregator, Electronic. Regarding that, I place much more significance to the Scimago Journal & Country Rank for Data Science Journal than the individual's blog entry that you posted in proof that J-Stage is not the best publisher. Indeed, the blog post was not about J-Stage, but a journal aggregated by J-Stage, the Journal of Physical Therapy Science. So this is totally irrelevant to our discussion here. However, if I can find a more direct link to the article I will use it, but right now http://datascience.codata.org/ seems broken. Done
    • Big data is still in many ways an emergent field & while there is much that is written about aspects of this field, there are aspects that are not as well covered. My search for scholarly research for "big data" & "data modeling" together yields sources with a similar number of citations. Is data modeling crucial to big data? I think so, but with an MS concentrating on data modeling & systems analysis & over 2½ decades in IT, what do I know? Perhaps I am biased ...
  • As someone who has 14,000+ edits on en.wikipedia, I have read Wikipedia:Further reading several times, although I can always use a refresher. As stated on the page "This page is an essay, containing the advice or opinions of one or more Wikipedia contributors. Essays are not Wikipedia policies or guidelines. Some essays represent widespread norms; others only represent minority viewpoints." That said, there are no numerical criteria in that essay for how many citations make a source reliable. Indicating that 1,000 citations makes something notable is essentially an opinion about an opinion essay. As a former part-time university reference librarian (9 years), I know that in some more obscure areas, as few as a dozen citations can be considered a lot.
  • What really is at debate here is our approach. I categorize myself as an inclusionist whereas I believe you to be an exclusionist. There is a place in Wikipedia & other Wikimedia projects for both, & there is a need for balance & cooperation.
  • As I stated before, if you can find comparable but better sources to replace those currently in Further reading, please go ahead.
Peaceray (talk) 23:40, 18 September 2016 (UTC)
I am an inclusionist as far as notability is given, and the sources are on-topic. Also, from an inclusionist's perspective, nothing is actually lost; the further reading links are not part of the actual textual content... For many of these sources, they are neither. The J-Stage journal is really low quality and low impact. The very link you gave, Scimago Journal & Country Rank for Data Science Journal, usually has it in the worst quarter. The h index of 11 is that of a single low-reputation author. That tech report is not peer reviewed at all. By keeping off-topic links such as "Product Lifecycle Management" in the further reading, you encourage spam, instead of textual contributions. I can live with the Forbes History of Big Data. Forbes is reputable enough, and this one is also on-topic. I have offered you some high reputation and likely on topic sources, why don't you consider them for "further reading" instead of that spam? HelpUsStopSpam (talk) 18:44, 26 September 2016 (UTC)
"A Brief Review on Leading Big Data Models" was spammed on April 30, 2015 [4] by the first author. It is pretty much the definition of WP:CITESPAM. He tried before, [5] in January, twice [6] but back then someone noticed. It is spam, it must be removed. HelpUsStopSpam (talk) 18:51, 26 September 2016 (UTC)
Similarly, have a look at the edits that led to the "Product Lifecycle Management" spam: Special:Contributions/LGB2015. This is spam. HelpUsStopSpam (talk) 19:09, 26 September 2016 (UTC)
@HelpUsStopSpam: First, thank you for your recent edits to the Further reading section. I welcome replacing the sources there with those of higher quality & number of citations by others. I believe that the article is much improved due to the addition of quality sources rather than merely excising the section. Second, with regard to the H-index, you express an opinion but do not cite any sources. I want to know where to find the criteria for judging the quality of an H-index, as per the discussion at Wikipedia talk:Notability (academics)#Further on h-index... Since the scale & relevancy of an H-index can vary widely from one field of study to another, I am searching for an authority on this. Peaceray (talk) 21:49, 26 September 2016 (UTC)
@Peaceray: a highly respected journal for big data is this: Scimago Journal & Country Rank for VLDB Journal (VLDB = very large data bases). It has h-index 63. So within the domain of big data, 63 is big. And aforementioned Jure Leskovec has an h-index of 67 on Google Scholar. danah boyd, who I also mentioned, had 46. Jeffrey Ullman has 106. John Langford, who wrote one of the articles in the XRDS magazine, has 49. Yes, h-indexes need to be taken relative to other authors in the same field. Computer science has very high h-indexes. HelpUsStopSpam (talk) 09:44, 27 September 2016 (UTC)
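
For readers following the h-index comparison above: the h-index is the largest number h such that h publications each have at least h citations. A minimal sketch of that calculation follows; the citation counts are hypothetical and chosen only for illustration, and `h_index` is an illustrative helper rather than any tool used by Scimago or Google Scholar.

```python
def h_index(citations):
    """Return the largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` items all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for seven papers (not real data).
example = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example))
```

Because the value is capped by the number of publications and depends on typical citation rates, it only makes sense to compare it between journals or authors in the same field, which is the point made in the comment above.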

Capitalization

It appears to me that all of the three variants "Big Data", "Big data" and "big data" are used throughout the text. This should be cleaned up.

Grstein (talk) 07:08, 16 April 2015 (UTC)

MOS:CAPS seems to be telling me, no proper names, hence no caps here. Except of course at head of sentence, in name of book, and other usual capitals. Jim.henderson (talk) 03:46, 18 April 2015 (UTC)
Yes, agree. It seems more and more excessive capital letters are creeping back in. I think this might be from marketing sources that do not use English but marketing-speak, where capital letters are used to promote terms by making them Sound More Grand. In particular, a term for which an acronym is coined does not get capital letters unless it too is a proper noun. W Nowicki (talk) 21:59, 28 September 2016 (UTC)

Definition of Big Biomedical and Health Data

Suggested for inclusion at the bottom of the Big Data definition: the six characteristics of Big Biomedical and Health Data:

In biomedical and health sciences, the 3Vs qualitative euphemism describing "Big Data" has been extended to a constructive definition including characteristics like large size, diverse sources, multiple scales, incongruences, incompleteness, and complexity [1]. This description illustrates explicitly the types of methods, techniques, tools and services that are necessary to tackle intricate big data and predictive analytics challenges.
Wikipedia is NOT for advertising your own research. This definition lacks wide enough acceptance to be noteworthy for Wikipedia yet. Maybe in two years, when it has received a lot of citations? HelpUsStopSpam (talk) 17:26, 25 November 2016 (UTC)

References

Critical data studies

I found the Critical data studies article while doing new-page patrolling, and it seems to me that it is a rather narrow viewpoint, and should either be expanded with content from the Critique section of this article, or should even be merged into that section. I don't know much about the scholarship in this field, however, so I'm not confident to do this myself. --Slashme (talk) 08:59, 21 December 2016 (UTC)

Danah Boyd (photography) really needed on the article

Does the Big data article really need a photo of one of thousands of researchers, such as ... Danah Boyd? April 2017. — Preceding unsigned comment added by 223.197.149.174 (talk) 12:22, 13 April 2017 (UTC)

More history on the origins of the "Big Data" phrase ~1993, certainly in public use by 1994

This may or may not be usable, but it is the real history, put together from old slides after Steve Lohr wrote in NY Times and meetup groups wanted talks. See Big Data - Yesterday Today and Tomorrow = Slides or Big Data - Yesterday, Today and Tomorrow -video at Stanford. Slide 16 shows the use in "Hardware, Wetware, Software", the general-purpose talk I used 1994-1996 (and maybe a bit during 1993), which was captured on video by University Video Communications in 1996. It was the opening keynote for TRI-Ada conference, November 1995.

For a few years, most of the "Big Data" use was in my talks. By 1996, it was part of external marketing. Slide 15 shows "Big Data" as part of SGI booth at SC'96, supercomputing conference in Pittsburgh. Slides 23-25 show sample slides from 1997, "Big Advantages from Big Tools for Big Data". JohnMashey (talk) 06:20, 19 May 2017 (UTC)

Big Data vs Data Lake

There seems to be a lot of the same information used within Big Data and Data lake. Are these topics the same? And are any of these different from a data warehouse? -jim 07:45, 5 July 2017 (UTC) — Preceding unsigned comment added by Jwilleke (talkcontribs)

Everything "big data" these days is nothing but "big hot air" anymore because of business buzzword bingo... "Data lake" = "Data warehouse without structure". And most companies that buzzword "big data" don't know what to do with the data - except storing it. Originally, "big data" also meant being able to analyze all this in a meaningful way, not just store it... HelpUsStopSpam (talk) 16:10, 5 July 2017 (UTC)

Heading organization

The "information technology" section probably should be reorganized. Retail and real estate are not subdivisions of information technology.

There probably should be a "finance" section. See alternative data and surveillance capitalism, two not very good articles which relate to that. Crunching on lots of data for finance purposes is a very real activity. Is it covered elsewhere on Wikipedia? Technical analysis covers crunching on pure financial data, but lately there's a trend towards looking at miscellaneous data outside the financial markets for financial purposes. John Nagle (talk) 20:00, 11 August 2017 (UTC)

Added a brief finance section, mostly wikilinks to other articles. John Nagle (talk) 20:05, 11 August 2017 (UTC)

Government

The texts on U.S., India, and UK were copied from https://www.ijedr.org/papers/IJEDR1504022.pdf. Copyright issue?--K3vinvmp (talk) 19:49, 12 September 2016 (UTC)

The material in the article appears to pre-date that publication; at least the US part. Kuru (talk) 22:09, 12 September 2016 (UTC)
Some journals like these do not care about copyright violations, and surprisingly many authors (in particular from India and China) copy from Wikipedia and get away with it. When seeing such an overlap, always check the Wikipedia version at that time - it most likely already contained the text before that paper was published... HelpUsStopSpam (talk) 20:03, 17 September 2016 (UTC)

[1]

  Not done: it's not clear what changes you want to be made. Please mention the specific changes in a "change X to Y" format. SparklingPessimist Scream at me! 03:12, 9 September 2017 (UTC)

References

  1. ^ M. Maciejewski, To do more, better, faster and more cheaply: using big data in public administration, International Review of Administrative Sciences, Vol 83, Issue 1_suppl, pp. 120-135, doi: 10.1177/0020852316640058 http://journals.sagepub.com/doi/pdf/10.1177/0020852316640058

Requested move 29 January 2019

The following is a closed discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review after discussing it on the closer's talk page. No further edits should be made to this section.

The result of the move request was: not moved (closed by non-admin page mover) SITH (talk) 14:28, 5 February 2019 (UTC)



Big data → Big Data – It is a name, not just a normal sentence. I can't move the page myself, so this is more of a move request rather than a discussion. Bageense (talk) 13:36, 29 January 2019 (UTC)


The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page or in a move review. No further edits should be made to this section.

History section

Eventually this article will need a history section. I am not sure what that will look like. Here is one treatment of the subject -

Blue Rasberry (talk) 15:30, 5 August 2019 (UTC)

Reassessing article

I don't know if this article was ever good enough to deserve a ranking of B, but it seems to have deteriorated into a lot of jargon that is not well cited. If there are no objections in the next little while, I'm downgrading it to class C. Ethanpet113 (talk) 22:48, 28 August 2020 (UTC)

@Ethanpet113: seems reasonable, I did it for you. I also cleaned this talk page. Blue Rasberry (talk) 22:50, 28 August 2020 (UTC)

Suggestion: Separating the topics Big Data and Big Data Analytics

This topic describes Big Data as a field of science. Big Data is an object, the field is Big Data Analysis or Big Data Analytics. Big Data Analytics links to this page but maybe if they were two separate pages it would be easier to improve both pages.

softwaretestwriter (talk) 00:27, 20 January 2021 (UTC) NmuoMmiri

There are old discussions at Talk:Big data analytics (permalink) that may be of relevance to this suggestion. Primefac (talk) 12:45, 9 February 2021 (UTC)

Proposed merge of Veracity (data) into Big data#Characteristics

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
The result is no consensus per WP:Consensus. Discussion closed. --Whiteguru (talk) 08:15, 10 December 2021 (UTC)

No need for a separate article - add to existing section PamD 08:34, 2 November 2021 (UTC)

Veracity is also valid outside the big data context. See the recent reference of ISWC 2021 that was my motivation for creating the article. I didn't even know that the term also exists in the context of big data. I also think it is too early to say whether merging makes sense - it IMHO depends on the growth of the article. --WolfgangFahl (talk) 11:08, 2 November 2021 (UTC)

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.