Talk:Data
This is the talk page for discussing improvements to the Data article. This is not a forum for general discussion of the article's subject.
This level-5 vital article is rated C-class on Wikipedia's content assessment scale.
This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.
Incorrect and Circular Definitions!
WikiProject Computing
This article falls under WikiProject Computing, yet there is a separate article for Data (Computing). Why do we have that inconsistency?
Data Definition: Circular
Data is defined as: "is a set of values of qualitative or quantitative variables". Up to here, it is OK, but constrained to the computing world. Should we constrain the definition to computing only? (This does seem a good decision, though.)
"restated, pieces of data are individual pieces of information". The definition of Data includes, naively, the term "pieces of data".
Indeed, I just came here to post a similar comment. So I agree that the definition is naive and circular. martyn.jones@cambriano.es
Data Information Distinction as Definition?
And finally it infers: "pieces of data are pieces of information". Data = Information. Am I the only one reading it that way?
Information Article
Information is defined as: "Information (shortened as info or info.) is that which informs," Information is that which informs...: a tautological expression.
"i.e. an answer to a question, as well as that from which knowledge and data can be derived (as data represents values attributed to parameters,".
Information defined as Data!
Note also that Information is "that from which knowledge and data can be derived". A circular definition pointing back to Data.
"and knowledge signifies understanding of real things or abstract concepts" We ALL know that data is not information, and that information is not knowledge, but including this as the keystone of the definition?? I simply cannot agree. It is like saying: "Data is a piece of information, but beware, it is not information." and "Information is that which informs, but beware, it is not knowledge."
Please, guys, does anybody support or agree with this?? — Preceding unsigned comment added by Hypfco (talk • contribs) 20:19, 31 May 2015 (UTC)
Data Definition vs Information
As the above author(s) noted, the current definition given for data: (a) says that data is information; and (b) implies that the word is used in computing only (particularly in the 2nd paragraph).
Please consider the following:
- The retina captures data from photons of light that enter the eye. The optic nerve and brain organise that data into visual information. Thus, information is generated by organising data into a format that is capable of eliciting meaning – information comes from data, not the other way around. As the 2nd sentence in the body of the article specifies: "Data is collected and analyzed to create information suitable for making decisions..."
- Using "variables" in the definition suggests that static values are not data. If that interpretation is not the intended meaning, then I propose that using "variables" to identify fixed values is misleading: values that can vary are variable; but values that are fixed are not variable.
- The "restated..." phrase at the end of the first sentence is both confusing and misleading, and so the brief definition would be more effective if the sentence ended before that "restated..." phrase.
- Including "values" in the definition implies that attributes generate data only upon being captured. If this is the case, then the detailed definition should clarify it.
I propose the following definition:
- Data is any set of quantitative and/or qualitative values. An information system either captures data by measuring the attributes of a phenomenon, or receives it from another information system – be it mechanical (e.g. a computer), or biological (e.g. a human).
Rwilkin (talk) 02:24, 12 August 2015 (UTC) Rwilkin (talk) 03:36, 12 August 2015 (UTC) Rwilkin (talk) 05:48, 12 August 2015 (UTC)
Weasel Words
It looks like there are some disagreements on the usage of data as a mass noun. It is excellent that we have many citations on its usage, but instead of stating statistics from the sources, we are having minor edit wars on words such as "many"/"most", "often"/"usually", etc. Perhaps we should instead cite the statistics from the source and leave speculation to the reader. Gsonnenf (talk) 11:36, 2 April 2009 (UTC)
I'm going to try clearing out the weasel words from the Usage in English section and see what is allowed to stick. Citation needed tag added because it's so specific, not because it's disputed. Acronymsical (talk) 16:59, 28 February 2011 (UTC)
Old discussion(s)
I find it difficult to understand "A datum is a statement accepted at face value."
What do you think about a definition and explanations like this:
Data is ~evidence (or some other term) at the input of an information system. Data is the subject of data processing by an information system. Data may or may not contain useful information.
I think it is good when a definition uses other Wikipedia terms, not just plain English. Kenny sh 08:30, 10 May 2004 (UTC)
- Also I think the current definition is wrong. A datum is a datum regardless of whether or not it is accepted. --(talk to)BozMo 10:36, 23 May 2004 (UTC)
- Hello. There are a couple of serious problems with the above definition. The main problem is that it says data has to do with "information systems", "data processing", and "information". Either it's assumed these terms have to do with computers, in which case this definition is much too narrow, or not, in which case it's needlessly vague. A secondary problem is that the definition can't be understood without looking up some other terms. The existing definition, which uses only ordinary English words, is terse, comprehensible, and yet quite general. The proposed new definition does not have these merits. Regards, Wile E. Heresiarch 14:06, 10 May 2004 (UTC)
A separate page for datum is needed. In geology/cartography/geography and surveying a datum is a reference surface. For instance, sea-level is often used as a datum below which depths (or above which heights) are measured.
Hello COMPATT, to address your comments about the distinction between data and information -- I agree that programs are a form of data, but I think it's important to keep in mind that the word "data" has a history of usage that goes back much farther than computer science. The distinction between data and information, which is made in the article, is that information is derived from an interpretation of data. Some data don't have any obvious interpretation, and so we might noodle over ancient inscriptions for a long time, but some other data have such an immediate interpretation, especially in a given cultural context, that the interpretation is held to be the same as the data -- for example if I look at a photograph, I might immediately see "a dog" instead of "a pattern of silver particles which suggests a dog". I think the interpretation aspect, and its dependence on context, might be emphasized in the article. Well, I've rambled on long enough! Have a great day, Wile E. Heresiarch 14:33, 18 Mar 2004 (UTC)
Hello, as a comment on the edit that I just made. I put a new, short intro paragraph at the beginning, to hopefully get straight to the point. (The article was noodling around in etymology a little too much before getting to the punch line. Hopefully that's corrected now.) As the term "data" is rather general, I've attempted to give a general definition, and then immediately describe one of the most-used types of data (measurements & observations). I'm hoping that there is a right level of generality now. Happy editing, Wile E. Heresiarch 15:44, 19 Mar 2004 (UTC)
Usage in English
There's another meaning of the singular datum. In the US Navy, the term is applied to the last known position of a submarine whose precise location is no longer known. I don't think I ever heard it used in the plural; there just aren't that many submarines and there's a great deal of seawater under which to spread them. Dick Kimball (talk) 18:20, 2 April 2008 (UTC)
Hi,
- I inserted the most general and shortest functional definition of data (see function definition)
- about
Referring to the sentence "this is all the data from the experiment", the assertion that "this usage is inconsistent with the rules of Latin grammar and traditional English" seems odd. If the word data is being treated as a mass noun, then surely the sentence is consistent with "traditional English".
Meaning of data and information
I changed it. In my opinion: - too much information noise (uncertainty of the author (?)) in this paragraph.
- As it is, the phone number is not actionable - you know it is a phone number, but it is of no use. This information becomes knowledge when you can act on this information, either to solve a problem (for example, to call Helen, whose phone number it is), or to gain insight into an issue (e.g. by noting that other phone numbers have the same exchange). People or computers can find patterns in and between data to perceive relationships between information, creating or enhancing knowledge. Since knowledge is prerequisite to wisdom, we always want more data and information. But, as modern societies verge on information overload, we especially need better ways to find patterns.
This is not about data; it is an unnecessary digression, so I removed it.
See also: http://en.wikipedia.org/wiki/Talk:Knowledge about DIKW.
I cannot find (on the Web) any articles that confirm the suggested interpretation of the DIKW model.
--Adam M. Gadomski 18:01, 4 November 2005 (UTC)
- Adam, please read again Wikipedia:No original research. You are linking extensively to your own research. Wikipedia is not the place to publish your original research. Also see Wikipedia:Guide_to_writing_better_articles. You seem to write in a heavy duty academic prose style, which isn't really used here. Some of what you write might have been OK but I can't tell it apart. Sbwoodside 22:30, 4 November 2005 (UTC)
Simon, your reply is a meta-response. Is it a style of "Space-invaders"? You copy the original research without proper references - is that correct???
You (and only you) inserted DIKW in Wikipedia in a few articles.
Why do you do it?
- I see that your self-promotion on the Web is perfect, my congratulations, but I would like to see your sc.publications too - maybe this information could clear my doubts why "you are linking extensively" to and "update" this subject.
--Adam M. Gadomski 16:41, 24 November 2005 (UTC)
Information#Information is not data does not seem available anymore. --Inkiwna 15:42, 2 March 2007 (UTC)
Data WAS the plural of datum
The first line of this article needs to change. Data WAS the plural of datum, but no one uses it this way. In fact, in surveying, datum and data are two completely different words. Datum is a coordinate system for locating a point on the earth, while surveyors use data to mean what everyone else does. The plural of survey datum is datums, since data has a completely different meaning.
English does not follow the rules of a dead language that it happened to borrow a word from. See the back-formation article for numerous examples. You'll note that no one ever complains that "asset" is incorrect usage.
- Well, Datum has its own article, but I guess you're right that for this article probably the first line could be rewritten because in this context I think most people just talk about data and rarely use "datum" (not enough to justify the first sentence position). The first sentence / intro should summarize the article :-) Sbwoodside 19:22, 22 September 2006 (UTC)
- By the way, lots of people have been talking about changing the intro, why not be bold? WP:BOLD Sbwoodside 19:23, 22 September 2006 (UTC)
- The weakest statement that I am willing to make is this: at least one person still studiously treats datum as the singular, and data as plural. I execrate the mass noun treatment of data. But then, perhaps I'm one man crying in the wilderness ... Hair Commodore 20:02, 13 May 2007 (UTC)
- I can't think of the last time I saw "datum" used, even by people who routinely treat "data" as plural ("data are"). I'd say that in popular usage, people tend to treat data as a mass noun as they would information (and use them interchangeably). And, for good or ill, this popular usage seems to be crowding out the traditional academic/professional treatment. At work yesterday I reviewed a draft policy document regarding the handling of sensitive data; one paragraph used "data are" and the next used "data is." I pointed this out and the second paragraph was changed to "data are" as the correct construction. Go ask a sample of demographers, social scientists, physicists, doctors, market researchers, and other people who work with data professionally and a significant majority will say that the "data are" construction is correct (and the others are wrong ;-) XKL 16:08, 26 May 2007 (UTC)
- Educated folk have no problem using "datum" in the singular and "data" in the plural in English sentences. This whole discussion is an attempt to justify Newspeak, and is little more than a sorry excuse for mental laziness. The English Wikipedia wasn't to be written in Ebonics; that Wiki is yet to be created. —QuicksilverT @ 22:58, 5 December 2007 (UTC)
- I have rewritten the intro for the article in an attempt to capture the meaning and usage of the word without introducing the controversy in the first sentence. Quicksilver, you are arguing ad hominem with your "Educated folk" remark. This Wikipedia article is (should be) attempting to reflect reality, and diversity of opinion within it, not your own view. Personally, as an "educated folk" myself, I am strongly of the opinion that English is defined as far as possible by the people who speak it, and that examination of usage indicates a strong preference for regarding data as a mass noun (e.g. the formation "database"). However I am content to have that debate elsewhere. Joffan (talk) 00:40, 4 January 2008 (UTC)
The statement "but these are English sentences, so Latin grammar rules do not apply" seems to be an unencyclopaedic opinion tagged on to an otherwise neutral sentence stating the status of the word as plural in Latin. The rules applied in English sentences are clearly rules of English grammar, not Latin, but English happens to have the same rule as Latin in this instance, i.e., that a plural noun requires a verb in the plural. The debate is not whether Latin rules should apply to English, but whether the word data is plural or singular in English, based on etymology and usage. I propose to delete the clause "but these are English sentences..." if there is no further discussion. GKantaris (talk) 15:45, 2 January 2008 (UTC) - OK, as there is no discussion, I've deleted the clause. GKantaris (talk) 16:21, 14 January 2008 (UTC)
The problem is not one of right vs wrong but of precision. In general English usage, 'data' is used interchangeably with 'information' so it feels more natural to use it as a mass noun. For more technical use, 'data' must be pluralised to distinguish it from 'datum' and 'information'. For example, 15 (a datum) is part of 15-08-65 (data), which is my birthday (information).
Many words, such as 'average', 'intelligent' or 'fruit' have precise technical meanings that differ from the way they are used in everyday speech and there is nothing wrong with this. —Preceding unsigned comment added by 194.150.177.249 (talk) 14:33, 24 November 2008 (UTC)
- While I don't curse the traditional use of a plural verb form with data, I more often than not find it awkward, and I think most modern speakers treat data as a mass noun (like water, air, information), regardless of their education level. Below is the usage note at the entry for data in the American Heritage Dictionary of the English Language (3rd edition, 1992). I think it presents the issue well:
Data originated as the plural of Latin datum, "something given," and many maintain that it must still be treated as a plural form. The New York Times, for example, adheres to the traditional rule in this headline: "Data Are Elusive on the Homeless." But while data comes from a Latin plural form, the practice of treating data as plural in English often does not correspond to its meaning, given an understanding of what counts as data in modern research. We know, for example, what "data on the homeless" would consist of — surveys, case histories, statistical analyses, and so forth — but it would be a vain exercise to try to sort all of these out into sets of individual facts, each of them a "datum" on the homeless. (Does a case history count as a single datum, or as a collection of them? Is a correlation between rates of homelessness and unemployment itself a datum, or is it an abstraction over a number of data?) Since scientists and researchers think of data as a singular mass entity like information, it is entirely natural that they should have come to talk about it as such and that others should defer to their practice. Sixty percent of the Usage Panel accepts the use of data with a singular verb and pronoun in the sentence Once the data is in, we can begin to analyze it. A still larger number, 77 percent, accepts the sentence We have very little data on the efficacy of such programs, where the singularity of data is implicit in the use of the quantifier very little (contrast the oddness of We have very little facts on the efficacy of such programs).
- The Boston Globe review of the AHD gives insight into the philosophy of that dictionary's editors. Eric talk 14:08, 2 October 2009 (UTC)
- Here is some of what Merriam-Webster's Dictionary of English Usage, 1994, USA, pp 317-318 says on the subject:
To summarize, data has never been a plural of a count noun in English. It is used in two constructions — plural, with plural apparatus, and singular, as a mass noun, with singular apparatus. Both constructions are fully standard at any level of formality. The plural construction is more common.
Inaccurate pronunciations
Pronounced "Day-Ta" (US) and "Dar-Tar" (AU & UK*)
Living in the UK, I've only ever heard it pronounced as the former, "Day-ta"; only from Americans have I heard the latter, "Dar-Tar".
- Living in the Southern and Mid-Atlantic U.S., I've only heard it pronounced "Day-ta". JD Lambert(T|C) 01:54, 15 July 2007 (UTC)
I've lived in many states in the US, from the west coast to the east coast to the midwest. I've never heard anyone say dar-tar. I've heard day-ta and daa-ta (like Dagwood). Never dar-tar. Entbark 03:48, 23 July 2007 (UTC)
- Entbark, you may not have been to Massachusetts, or may not have heard someone from the Boston area, as they seem to be fond of injecting gratuitous "r"s into their speech. For example, listen to Norm Abram on The New Yankee Workshop. —QuicksilverT @ 23:41, 5 December 2007 (UTC)
Data synonym for information
Someone changed the page to say data is not a synonym for information. They should look it up in the dictionary: http://www.dict.org/bin/Dict?Form=Dict1&Query=data&Strategy=*&Database=* Daniel.Cardenas 15:34, 25 April 2007 (UTC)
- How can you post a reference which contradicts your own statement??? From your link:
Data on its own has no meaning, only when interpreted by some kind of data processing system does it take on meaning and become information.
...
1234567.89 is data.
"Your bank balance has jumped 8087% to $1234567.89" is information.
Bob Novak 06:42, 26 April 2007 (UTC)
- I would also like to point you to some introductory material on information theory, like the one at MIT open course ware - Information and Entropy, where concepts like information, data and code are explained. Bob Novak 07:57, 26 April 2007 (UTC)
That is classroom material applicable to computer science people and the like, but not 100% applicable to the rest of the world. Thanks for the link. Daniel.Cardenas 15:04, 27 April 2007 (UTC)
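- For what it is worth, the dict.org example quoted above can be sketched in a few lines of Python. This is only an illustration: the function name `interpret_balance` and the previous balance of 15080.00 are hypothetical, chosen so that the bare number gains a context; neither comes from the quoted source.

```python
def interpret_balance(raw_value: float, previous_balance: float) -> str:
    """Turn a bare number (data) into a statement (information) by adding context."""
    # The context is twofold: what the number measures (a bank balance)
    # and what it is compared against (the previous balance).
    change_pct = (raw_value - previous_balance) / previous_balance * 100
    return f"Your bank balance has changed {change_pct:.0f}% to ${raw_value:.2f}"


raw = 1234567.89      # data: a value with no stated meaning
previous = 15080.00   # hypothetical earlier balance, assumed for illustration
message = interpret_balance(raw, previous)
print(message)
```

The number itself is unchanged by the function; only the added context turns it into a statement a reader can act on, which is the distinction the quoted dictionary entry draws.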
Data: verb or noun?
This statement, 'The word data is the plural of Latin datum, neuter past participle of dare, "to give", hence "something given",' is a little confusing. If datum and data are both nouns, they cannot also be past participles since participles are verb forms. That statement makes it sound like the noun datum is a participle of dare. Nouns cannot be participles. The same word can be used as both a noun and a verb (e.g., "I scream" and "I heard a scream"), but a noun is NOT a participle EVER.
Oh, and I found where that phrase was taken from: http://www.johntcullen.com/sharpwriter/content/data_is.htm. Hardly a trustworthy source. He doesn't list any references, much less know the difference between a verb and noun.
Entbark 19:49, 12 July 2007 (UTC)
So, if no one is opposed to me changing it, I will modify the etymology section in a few days. Entbark 03:53, 23 July 2007 (UTC)
- The English usage section is still confused. Rather than try and win a debate, this needs to take a NPOV stance and observe there are two viewpoints:
- 1. That this is a Latin neuter noun and therefore the rules for a Latin plural apply.
- 2. That this is an uncounted noun and legitimately used in the singular.
- Clearly, we need a convention for this article. Common usage is the uncounted or mass noun. This seems to be backed up by the OED [1] which has this note on usage. Traditionally and in technical use data is treated as a plural, as in Latin it is the plural of datum. In modern non-scientific use, however, it is often treated as a singular, and sentences such as data was collected over a number of years are now acceptable. The etymology seems a little suspect though as we are told it is actually derived from a verb, yet the arguments used are that it takes the form of being a Latin singular neuter noun. Also, we know that datums is a legitimate plural usage of geological datum and people accept this, odd that the use of datums is not derided there through etymological argument. Spenny 13:59, 11 September 2007 (UTC)
- It is a declined form of the past participle of the Latin verb dare, "to give". The Latin "data" would translate as an adjective, "given", or as a noun, "given things"; it is equivalent. Because it is a participle, it grammatically functions as a noun or an adjective, and so follows the same pluralization rules as nouns and adjectives: singular -um, plural -a. --Nucleusboy (talk) 03:00, 28 November 2007 (UTC)
Data as plural
I prefer "these data" because it makes everyone pause, and reflect on how wrong their notions of grammar are. —Preceding unsigned comment added by 71.193.226.225 (talk) 07:58, 2 April 2008 (UTC)
Of course, 'data' is the plural of 'datum', just as 'bacteria' is the plural of 'bacterium', 'media' of 'medium', 'phenomena' of 'phenomenon', 'criteria' of 'criterion', etc. etc. After all, English has a huge legacy from Latin (and Greek) to cherish, which shouldn't be chucked out for the sake of dumbing down. The term 'mass noun' is a licence to forced collectivization. The mere fact that 'data' (and 'media') are treated often as singular today is a sign of the degenerative grammatical dementia rampant in these supposedly advanced modern times. --Artefactme (talk) 10:16, 22 April 2015 (UTC)
mass, plural and determiners
Grammatical rules dictate that a mass or uncountable noun, when appended to a determiner, must choose a determiner of the same type.
So if data is treated as a mass noun, one would ask "How much data was collected?" On the other hand, if data is treated strictly as a countable noun, one would ask "How many data were collected?"
Does anyone else find this awkward? —Preceding unsigned comment added by 63.201.67.93 (talk) 06:11, 16 July 2008 (UTC)
- I agree completely! "How much data was collected" does sound awkward and a little childish. Dave (djkernen)|Talk to me|Please help! 20:24, 6 December 2011 (UTC)
PAISA PAISA PAISA —Preceding unsigned comment added by 220.226.199.10 (talk) 09:53, 19 July 2008 (UTC)
Citation to add
The current page says:
Data is the lowest level of abstraction, information is the next level, and finally, knowledge is the highest level among all three.[citation needed] For example, the height of Mt. Everest is generally considered as "data", a book on Mt. Everest geological characteristics may be considered as "information", and a report containing practical information on the best way to reach Mt. Everest's peak may be considered as "knowledge".
for the needed citation I propose
Most frequently the data - information - knowledge - [wisdom] hierarchy is attributed to Ackoff
Ackoff, Russell L (1989). “From Data to Wisdom” Journal of Applied Systems Analysis, v. 16 pp. 3-9
but it has been presented by earlier authors:
Kochen, Manfred (1974) Principles of Information Retrieval John Wiley & Sons Inc. (Ch 3)
I think that a careful reading of Kochen or Ackoff would lead one to argue that knowledge resides within the human mind (as soon as it is written down it becomes information) Thus I would change the example by deleting
" and a report containing practical information on the best way to reach Mt. Everest's peak may be considered as "knowledge"."
and substituting
"and the practical understanding of an experienced climber of the best way to reach Mt. Everest's peak may be considered as "knowledge"."
CarlD (talk) 13:47, 14 August 2008 (UTC)
- Hmm. Under the usual rules of information theory, information seems to be an even lower level of abstraction than data; for example "e34t q3y y5i39.53yq3 53y q53q" would contain information, but not data, since it doesn't mean anything, whereas "4353675.7436" is data because it indicates a specific number. On the other hand, strings are data, whereas they only contain information; so in that sense information is more abstract than data. Ben Standeven (talk) 17:50, 24 August 2008 (UTC)
Data refers to a collection of organised information
I think this statement is incorrect. I have always understood data to be raw and unprocessed, whereas information is derived from the data. Yet this statement seems to be saying that information is the raw form and data is the processed form. I have checked on Google and it seems that this article is the only place which refers to data as the processed form; see
http://www.diffen.com/difference/Data_vs_Information
or
http://www.cs.jcu.edu.au/Subjects/cp1500/1998/foils/introToCP1500.html
or
http://www.cs.siena.edu/~ebreimer/courses/csis-114-s08/lectures/Data%20vs.%20Information%20(4).ppt
To see what I mean: in each case they describe data as raw facts without reference, and information as facts with reference which can be used. Harvyk (talk) 04:23, 2 October 2008 (UTC)
- I would like to take exception to the definitions selected. They are expressed as personal opinion (2 out of 3 have been taken down). It would be much better to use valid technical documents such as the Bell System Technical Journal: https://monoskop.org/images/a/a6/Hartley_Ralph_VL_1928_Transmission_of_Information.pdf This reference from 1927 by R. V. L. Hartley states very clearly that information is a sequence of symbols which can be interpreted. This is the definition that Claude Shannon references in his 1948 paper published in the same journal: https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf This paper is considered the beginning of Information Theory and the Information Technology industry. So Shannon, the Father of Information Technology, used the definition that information is a sequence of symbols which can be interpreted.
- On the other hand, the word “data” has been associated with the scientific method for hundreds of years. In the scientific method, the data are the recorded measurements determined by experimentation. These data are usually recorded in a log book and are accompanied by extensive explanations of the experiment from which the measured values were taken. Usually an experimental log book will have several values arranged in a table with the table positions clearly associated with measured features of the specific experiment. So, we can establish a scientifically precise categorical meaning of a recorded datum value by where it is recorded in the table. So, the datum’s categorical meaning is determined by its location in a data structure and the value recorded in that location completes the datum’s meaning.
- So, information is a sequence of symbols that can be interpreted (Hartley & Shannon), and data is preprocessed information whose categorical meaning is determined by its location in a data structure (the scientific method). Allyn Shell (talk) 04:22, 10 September 2023 (UTC)
I agree, as this is the academic interpretation I have always heard. I was surprised to find it reversed in the article. —Preceding unsigned comment added by 68.46.139.114 (talk) 18:37, 9 October 2008 (UTC)
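- A minimal Python sketch of the log-book point made above, where a recorded value gets its categorical meaning from its position in a data structure. The field names and readings are hypothetical and serve only as an illustration.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    """One row of a hypothetical experimental log book."""
    trial: int
    temperature_c: float
    pressure_kpa: float


# The table: each position (row and column) has a fixed, documented meaning.
log_book = [
    Reading(trial=1, temperature_c=21.5, pressure_kpa=101.3),
    Reading(trial=2, temperature_c=22.0, pressure_kpa=101.1),
]

# A bare sequence of symbols: interpretable, but its category is unknown.
bare_symbols = "22.0"


def describe(row: Reading, field: str) -> str:
    """Report a datum together with the meaning its table position gives it."""
    return f"trial {row.trial}: {field} = {getattr(row, field)}"


print(describe(log_book[1], "temperature_c"))
```

The string `"22.0"` and the value stored at `log_book[1].temperature_c` carry the same symbols; only the second has a category attached, because the column it sits in says what was measured.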
A matter of personal importance
It is plainly evident that the question of whether "data" is a plural or is a mass noun is relevant only to the argument itself. This discussion exists to perpetuate the sense of correctness felt by the arguers on either side, nothing more. Struhs (talk) 18:56, 29 September 2009 (UTC)
Though there exists some sense of correctness felt by arguers, as exists in all debates, etymology and word usage IS the domain of encyclopedic knowledge. The debate is important to linguists, authors of style guides, and academic sources. Many etymologists may even find the evolution of the word striking and exciting. 71.222.241.78 (talk) —Preceding undated comment added 07:44, 9 July 2010 (UTC).
Consistency in this Article
I know that the whole data is/data are debate is a hot button on this page, but Wiki policy does require one to at least be consistent. Since the article opens with the assertion that data is the plural of datum, I believe we should use it that way consistently in this article, except of course where we are presenting examples of its usage as a singular noun. So I went through the few places where its use was inconsistent and, um, regularized it. Dave (djkernen)|Talk to me|Please help! 20:28, 6 December 2011 (UTC)
- The problem with this is that the article also goes on to say that the most common usage is that of the singular, and Wikipedia's policies all go towards common usage rather than correct usage. Plus, the statement that data is the plural of datum, is, as you've noted, controversial, and thus its inclusion without qualifiers means it violates Wikipedia's Neutral Point of View policy. It would not be a good idea to make a choice based on a violation.
dictionary
data is a type of chart that you do in data for example projects — Preceding unsigned comment added by 98.77.248.196 (talk) 21:28, 28 August 2012 (UTC)
Times Quote
The article says: 'Some major newspapers such as The New York Times use it either in the singular or plural. In the New York Times the phrases "the survey data are still being analyzed" and "the first year for which data is available" have appeared within one day.' However the author of this sentence is parsing the second example incorrectly. The verb 'is' refers to 'the first year,' definitely singular, and not to the word 'data.'
- Nope, the "is" refers to the word "data"— imagine an example with another plural word: "The first year that people are interested", "The first year for which children were vaccinated", etc. Good one, though. KDS4444Talk 00:17, 4 June 2015 (UTC)
Data(computing) in an Operational definition states Data are the quantities, characters, or symbols on which operations are performed by a computer.... Characters/symbols include what is commonly referred to as texts. Examples of texts where operations are performed by a computer include: computer programs (say written in COBOL), word processing, and a Google search of millions of web pages. As I understand the Theoretical definition of data given here, texts in general are neither qualitative nor quantitative, thus texts in general are not data (some texts, "male/female" for example, may be qualitative data)
If texts in general are not data then is the following sentence correct? Data and texts are the quantities, characters, or symbols on which operations are performed by a computer.... Rather than simply adding and texts, is there a better correction? It would be awkward to have an article titled Data(computing) that in its first sentence expands the article beyond just data.
It seems unfortunate to have excluded texts (and language!) from being data. If that was not intended then possibly changes here .... 50.136.247.190 (talk) 08:35, 14 July 2013 (UTC)
Data in different contexts
I am a newbie, please advise if there is a better way to go about what I am trying to do. Which is to suggest that there are fundamental problems with the Data entry with the hope that it may be improved.
(1) There is an entry for "Data" and an entry for "Data (computing)", but the talk page for "Data" refers to "WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology". This sounds more like a talk page for "Data (computing)" than for "Data". I would have thought that the entry for "Data" would encompass contexts other than computing. Examples follow.
(2) Consider for example data in the physical sciences, the life sciences, the social sciences, in statistics and in "official" statistics.
(2a) Data in the physical sciences tends to measurements, e.g., of the positions of stars that were the motivation for Gauss' development of the normal distribution. Note Wikipedia entry Accuracy and precision.
(2b) Data in life and social sciences often consists of counts, e.g., numbers of persons in a population or numbers of births during a time period. Note Wikipedia entry Population biology.
(2c) Official statistics refers to data produced by governments, from which various kinds of statistics are derived, e.g., economic statistics, demographic and social statistics, and environmental statistics. The United Nations Statistics Division website http://unstats.un.org contains extensive information on official statistics.
(2d) Data in Statistics incorporates all these contexts. Note the definition of a statistic as "a function of the data" ([[Estimator]]).
(3) As the word is used in all these contexts, "Data" does indeed refer to "a collection of organised information". This shows that usage in these areas is not consistent with the usage given by http://www.diffen.com/difference/Data_vs_Information. Either one accepts that the same word is used with different meanings in different contexts, or one makes a choice for one meaning or the other. I suggest that in this instance extensive and at least roughly consistent usage of the word in Life sciences, Social science, Official statistics and Statistics ought to override the usage proposed in http://www.diffen.com/difference/Data_vs_Information.
(4) As the word is used in these contexts, the characterization can be sharpened beyond "organized information", which is so broad as to encompass nearly anything. "Data" is used more specifically to refer to systematic information about entities in some well-defined aggregate. "Systematic" signifies that the same information is provided for every entity in the aggregate, undefined values (age at first marriage for never married persons) and missing values excepted. "Well-defined" signifies conditions that define membership in the aggregate ("Emperor penguins in Antarctica at midnight 31 December 2013/1 January 2014"). Data in this sense is more specific than Information.
(5) From this perspective, at least, the sentence with which the Data entry begins, "Data are values of qualitative or quantitative variables that belong to a set" is deeply confused. Data provides values of variables, and the variables it provides values for constitute a set, but there is no reference to the set of entities the variables refer to.
(6) Data in this sense may or may not be "raw". The "raw" data captured from Population census (this redirects to Census, which is far more general) census questionnaires is processed by "editing" to produce "clean" data. The processes are described in detail in the United Nations Principles and Recommendations for Population and Housing Censuses and Handbook on Population and Housing Census Editing.
(7) Data in this sense is information, but of a very specific kind. Information is far more general than data in this sense.
(8) "Data" probably encompasses too much to manage with a single meaning for all contexts. The challenge is to identify a manageable number of meanings and characterize them well. The characterization sketched above may not be able to accommodate literary texts regarded as data, for example, and this may be a well established and defensible usage. It is probably necessary to say a good deal more about Data structure, though not only in the context of computing. The content of the [[Data]] and [[Data (computing)]] entries does not to me justify the distinction.
(9) This discussion is pertinent to improving the Data quality assessment entry, currently in a primitive state. Considering data quality assessment issues might be useful for clarifying what "data" is.
Data - Plural misused as singular - Claimed Mass Noun
The article claims that the word data is now considered a mass noun. However, this is only true in some contexts (see http://grammarist.com/usage/data/ ). Additionally, data cannot truly be considered a mass noun the way a word such as audience is. Someone might easily say "The whole audience burst out laughing" and no one would think twice, but if we said "The whole data is stored on this flash drive" most people would object and say "No, all the data is stored there." 190.81.202.250 (talk) 18:03, 23 February 2015 (UTC)
- The simple response to that argument: what is a single datum (not counting anything listed here)? Oiyarbepsy (talk) 23:00, 16 April 2015 (UTC)
- Data is a mass noun the same way that "sand" is a mass noun. To compare it to your examples, "all the sand" makes sense, while "the whole sand" doesn't. Red Slash 03:38, 23 April 2015 (UTC)
- I think the facts are perhaps more complicated than this. "Data" technically is a plural of "datum", but it is no longer consistently used that way in modern English. Occasionally we use it in plural form, and in others we treat it as a mass noun, and the rules of which form of which verb to use are rather inconsistent (and therefore not exactly "rules", ay?). Maybe we need to distinguish a rule about what is right (vs. wrong) from one about what is preferred (vs. not) from one that indicates what is acceptable (vs. unacceptable). What is acceptable may not be preferred or right; what is preferred must be acceptable but not necessarily right; what is not preferred may still be acceptable; what is right is usually also acceptable but might not be preferred (?).KDS4444Talk 01:14, 4 June 2015 (UTC)
- I made a Venn diagram. I make a lot of diagrams. KDS4444Talk 02:22, 4 August 2015 (UTC)
I just want to say that this Venn diagram is beautiful. Red Slash 17:23, 2 November 2016 (UTC)
- I want to second my appreciation for that diagram! Alex Jackl (talk) 16:30, 20 September 2017 (UTC)
- I'm coming to the party even later than the rest of you, and I also wholeheartedly agree with the sentiments expressed by User:Red Slash and Alex Jackl! CruiserBob (talk) 16:29, 11 August 2020 (UTC)
Data Page Opening Paragraph
It seems to me that the opening statement is not as clear as it could be. I almost think the more technical paragraphs below the opening statement might be better as an opening. I didn't want to just change it because there is good stuff there and I didn't want to overwrite anyone's good work. Alex Jackl (talk) 16:30, 20 September 2017 (UTC)
Data vs. Information
This topic has been broached again and again so I am hesitant to just make changes to the opening paragraphs of the article without a discussion here. Although "surprising" data may be more informative, that is hardly the key distinction between data and information. As has been said many times here on this talk page and referenced, information is data with context. This is referenced in the "Meaning" section of the article. I suggest taking the overly academic second sentence of the opening paragraph out and replacing it with one for a more general audience. If I get no objection here I will do that in a week or two. Alex Jackl (talk) 17:02, 5 December 2018 (UTC)
- Seeing no objection I made the change and added a basic citation. Alex Jackl (talk) 22:29, 11 December 2018 (UTC)
I think on 1:29, 13 July 2022, Pooryorick~enwiki made an edit (not Discospinster at 17:22, 17 August 2022, as I previously thought) that replaced the very first sentence:
Data are individual facts, statistics, or items of information, used for evidence-based logical reasoning.
with something that eventually became
In a conceptual model, data is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted. A datum is an individual value in a set of data.
I think this was a big mistake. The sense of "data are individual facts", previously the very first words of the article, is completely lost and not recovered elsewhere in the article. Moreover the replacement refers to what is quite clearly a different concept, namely raw information in the sense of information theory, or perhaps "data structures" in the sense of computer science. I strongly support reverting this. Jimmymath (talk) 17:55, 18 August 2022 (UTC) edited Jimmymath (talk) 18:43, 26 August 2022 (UTC)
Moreover, maybe I misunderstand things, but it seems that a few editors are making several edits at random places at high frequency without any sort of consultation on the talk page first. An example is this first paragraph over the last month or two. Can this be limited? Jimmymath (talk) 18:43, 26 August 2022 (UTC)
Phillipabatz.Wikia
That part about data, information, knowledge, and wisdom looks like it *could* have come straight from my Wikia. AWESOME! Do give credit where it is due though, please. TheLastWordSword (talk) 11:25, 4 March 2019 (UTC)
- Hi there! Could you provide us a link to your Wikia page here? I just looked at the reference it links to and there is a decent conversation in that source (the Joint Chief's document on intelligence). If someone did use your Wikia as a source then we should reference it but there are MANY sources on this out there. But more sources is better than fewer! Alex Jackl (talk) 21:10, 7 March 2019 (UTC)
Singular/Plural/Mass (again!)
In reading the article, I found it a bit jarring when I hit the sentence, "For example, the height of Mount Everest is generally considered data." That sentence is problematic whether we're considering data to be a mass noun or as the plural of datum.
If the word data is being used as the plural of datum, the sentence should read, "For example, the height of Mount Everest is generally considered a datum," or perhaps "For example, the height of Mount Everest is a datum." On the other hand, if the word data is being used as a mass noun, it should probably be, "For example, the height of Mount Everest is generally considered to be a piece of data," given that discrete units/portions/items of things that make up mass nouns are typically (almost always? always?) distinguished from the mass noun - 'a grain of sand,' 'a piece of glass,' 'two gallons of water,' 'an iota of courage,' etc.
I'm going to change it to one or the other ('a piece of data' or 'a datum') but I want to see if there's any kind of consensus on the issue. If you have an opinion, please share it. I'm leaning toward 'a piece of data' given that one can reasonably argue that 'a piece of data' can reasonably be considered to be a singular form of the plural data, whereas 'a datum' seems much less likely to appease those who think that data is (and always must be) a mass noun. So if I don't get any comments by the time I get back to it, 'a piece of data' is what I'm going to go with. CruiserBob (talk) 17:24, 11 August 2020 (UTC)
Data as a function of Observation - or NOT
I removed the reference for collecting through observation because, although the OECD defines it as such, many data are generated through inference or deduction and therefore are not directly generated from observation. It was not "incorrect" just potentially misleading and too narrow a definition for the general page for "Data" in Wikipedia. Alex Jackl (talk) 17:06, 7 October 2021 (UTC)
Computer
What type of information was drawn from the data? 2409:4063:4D87:747E:409C:334C:B82E:C1ED (talk) 11:26, 7 July 2022 (UTC)
- In the IT industry, Information does not come from data. Information was clearly defined by Claude Shannon in the 1940s when the IT industry began. From Christmas of 2009 to about April Fools' Day of 2013, Wikipedia's Information page started with the statement "Information, in its most restricted technical sense is an ordered sequence of symbols that can be interpreted as a message." This is how Shannon used the term. (For most of that time another comment in the introduction included a statement about information as a concept or as "the message." The latter became the definition during the great philosophy rewrite of Wikipedia in 2013 and has led to great confusion. It almost seems that they were trying to make the IT industry cease to exist.)
- Data on the other hand is processed information. It is stored in data structures. The categorical meaning of the data is known by its location in the data structure. (Think tax form.) Sometimes the data structures are stored in databases. At NASA we receive large amounts of information from our space telescope satellites. We call that facility the data capture facility. It starts by copying the raw information for later use if necessary. Then the information is scanned and identified as the time stamp, the latitude and longitude of the picture, and the picture taken by the telescope. This data is then stored in the data storage silos.
- So the words "information" and "data" are unambiguous to people in the IT industry. But cell phone sales people seem to consider "information" too dry and impersonal, so they substitute the word "data" to make it warmer and more personal, but not so in the IT industry.
- Information is a sequence of symbols that can be interpreted. (Noise cannot be interpreted.)
- Data is preprocessed information whose categorical meaning is known by its location in a data structure.
- These definitions are not confused by IT people (unless they are employed by a communications organization, talking with the public). 50.206.176.154 (talk) 18:33, 9 December 2022 (UTC)
Plural
Data is now treated as a singular mass noun, even in books, slightly more than as a plural: ngrams Red Slash 15:38, 14 July 2022 (UTC)
- Alright, it's been months; if anyone objects to this, please put it on the talk page. Almost the entire article already treated "data" correctly, as a singular noun; it's now been brought into alignment in the lead, as well. Red Slash 22:03, 9 December 2022 (UTC)
"Data-driven (disambiguation)" listed at Redirects for discussion
The redirect Data-driven (disambiguation) has been listed at redirects for discussion to determine whether its use and function meets the redirect guidelines. Readers of this page are welcome to comment on this redirect at Wikipedia:Redirects for discussion/Log/2024 February 8 § Data-driven (disambiguation) until a consensus is reached. Duckmather (talk) 22:50, 8 February 2024 (UTC)
Data-driven
I converted what used to be an unsourced list of "data-driven" activities, and put these all into the See Also section instead. The adjective "data-driven" doesn't really come up in this article, and its existence as an unsourced section is somewhat disruptive to the current flow of the article. Having the same exact links in the See Also section, I feel, does this job perfectly well. Utopes (talk / cont) 20:11, 11 February 2024 (UTC)
- @Duckmather:, after the reversion, are you intending to provide verification for the inclusion of the data-driven tangent? Utopes (talk / cont) 08:12, 14 February 2024 (UTC)
- @Utopes: Yes. First, I am planning to have the RfD sorted out though (as the fate of the redirects Data-driven and Data-driven (disambiguation) are intimately tied with the fate of this section). I'll add a cite or two soon, but I might not get around to the entire section for a while. Duckmather (talk) 14:32, 14 February 2024 (UTC)
- @Duckmather: See, I don't fully agree with the two fates being tied as you describe. At the RfD where you described your reversion, you said: "A lot of things are called "data-driven", so it seems like we should have some content about this concept, even if only a little." With what you said here, I agree! There are a lot of things that are driven by data, and "data-driven" is a likely search term too, so it'd be good to have something for readers. If just having content about the concept of data-driven is what you're looking for, there's a wikt:data-driven that is totally sufficient to this. Or, if you're specifically talking about material about "data-driven" on Wikipedia, mentions of "data-driven" can be inserted with sources (preferably sooner than later to justify it asap). But none of this is any justification for indiscriminately listing verbs that are data driven and putting them into an uncited list, and pointing a PTM disambiguation redirect to it.
- If you would like to add prose about "data-driven" that reads like a paragraph with citations, you have my encouragement. But a section that does nothing but list "data-driven" things is extremely WP:UNDUE focus within the context of the article, and has no reason to be more important than the rest of the "see also" links, (which is where all of the blue links should be added to as their only mentions, to avoid going off-topic). If anything, the concepts for "data bank", "data integrity", and "data structure", which only appear once in this article's see also section, are far more important links to highlight and guide readers to than "data-driven control systems", "data-driven marketing", and the other data-driven-verb red links.
- From my point of view, there's no home for "data-driven (disambiguation)", as there shouldn't be any WP:PTM disambiguation that takes place on the page; too distracting. But "data-driven" and "data driven" are totally fine, and could either point at wiktionary, or even at Data without mention, I don't see either being a problem. I would just discourage the idea that "data-driven and data-driven (disambiguation) are intimately tied" with one section, because they really don't need to be, and there are alternatives that still meet your initial goal of "content about this term". Utopes (talk / cont) 23:20, 14 February 2024 (UTC)
- Oh wait, I just realized that in addition to this, the section already doubles up its purpose with Data#In other fields, which is the location to talk about anything related to data, in other fields. With everything being said, I'm going to remove the section header and treat the links equally in the See Also again. If you would like to add more about data-driven, please make sure it is cited and verified beforehand. Best, Utopes (talk / cont) 23:34, 14 February 2024 (UTC)
"🆥" listed at Redirects for discussion
The redirect 🆥 has been listed at redirects for discussion to determine whether its use and function meets the redirect guidelines. Readers of this page are welcome to comment on this redirect at Wikipedia:Redirects for discussion/Log/2024 August 24 § 🆥 until a consensus is reached. Duckmather (talk) 21:04, 24 August 2024 (UTC)