Wikipedia talk:Plagiarism/Archive 10
This is an archive of past discussions on Wikipedia:Plagiarism. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Avoiding plagiarism
The section "Avoiding plagiarism" has problems, there is no example of how copying text from copyleft and PD sources is handled on Wikipedia. Either the title needs changing or more examples need to be added.
I suggest that the title is changed to "Avoiding plagiarism from text under copyright" and a top note added to link to the section called "Copying material from free sources". -- PBS (talk) 16:37, 11 October 2014 (UTC)
The "usually" part
- As you saw, I already noted my problems with the section here and here. Right now, the lead and that section currently relay that, "In addition to an inline citation, in-text attribution is usually required when quoting or closely paraphrasing source material." I don't agree with the "usually" part; this is because, per Wikipedia:Citing sources#In-text attribution, presenting basic facts with in-text attribution can be misleading. All in-text attribution does in a "basic facts" case is make a fact look like an opinion. Sure, we can add that the statement is the majority view or is a widespread view, like the WP:In-text attribution guideline states and like Moonriddengirl acknowledged with this edit, which seemed to be a response to my complaint. But it's not always a "view" matter; sometimes it's simply a "fact" matter, and I don't see the problem with close paraphrasing in those cases, without naming the author who commented on that fact. And if it's best not to go with close paraphrasing in such cases, then why can't simply putting the quote in quotation marks be enough? If a reader or editor wants to know who made that statement, they can check the reference (unless it's an offline reference or "pay to read" reference that requires more work to access). And, sure, we can simply reword everything in our own words, but sometimes the source has stated the matter better than can be paraphrased. And sometimes, like WP:Close paraphrasing notes, close paraphrasing is unavoidable.
- It often annoys me to see in-text attribution unnecessarily used for non-WP:Notable authors or commentators; just seeing the quotation marks for the quote without the author being mentioned in the text tells me that the Wikipedia editor is aware of plagiarism and is not trying to take credit for the text. The text often flows better to me without seeing the WP:Notable or non-WP:Notable author, commentator or publication mentioned, especially if it keeps those instances from being WP:Red linked; for example, when a Wikipedia editor is quoting a dictionary in the lead of an article (Wikipedia editors usually do that without in-text attribution, by the way), I don't like seeing that dictionary mentioned unless necessary, especially since in-text attribution in that case makes it look like that dictionary is the only dictionary that holds that view. I often feel similar to Template:According to whom: "Do not use this tag for material that is already supported by an inline citation. If you want to know who holds that view, all you have to do is look at the source named at the end of the sentence or paragraph. It is not necessary to inquire 'According to whom?' in that circumstance." Flyer22 (talk) 03:52, 12 October 2014 (UTC)
- Just wanted to note that using your source's words just because they stated it better is not necessarily transformative. :) If the source is non-free, there needs to be a valid reason to quote, such as those described at WP:NFC. Where close paraphrasing is unavoidable and quotation may not be the answer is when there are limited ways to express the fact. In that case, creativity is low and plagiarism is less of an issue. I think it would be ridiculous to state in an article, "John Smith was, in the words of Fred Jones, born in Hartford, Connecticut." :) But if we're closely paraphrasing because the author has said it particularly well - not just because everybody would say it about the same way - then I think in-text attribution is generally appropriate. In that case, we are closely following the author's words because the language is creative (and artistic and apt) and not because it isn't. Citation doesn't, in the absence of quotation marks, suggest in any way that we are appropriating expression without in-text attribution, and plagiarism is not just about acknowledging who holds views - it's about acknowledging creative expression of views.
- WP:MOSQUOTE sets out the minimal attribution requirement - in-text for full sentence or more. Sometimes in-text may be appropriate for smaller chunks, and sometimes perhaps not. I think it is a question of local style whether it's better to say "Foo is defined as 'the definition of foo'.[citation]" or "Foo is defined, in the words of Fee, as 'the definition of foo'.[citation]" (That said, copying from dictionaries can be risky, at least if our own article is a stub. Some people think the brevity of the content makes it okay, but in fact the heart of dictionary writing is far more concise than encyclopedia writing.) But I always worry when I see articles constructed with liberally quoted phrases and no in-text attribution at all that what we are seeing is an editor who is using somebody else's words in a non-transformative manner because they cannot or will not create free content themselves. :/ --Moonriddengirl (talk) 13:36, 12 October 2014 (UTC)
- Thanks for commenting, Moonriddengirl. It's because of your and my comments above that I still feel that the aforementioned use of "usually" is not appropriate in the guideline. I think "often" would be better. If we are to keep "usually," then we should at least amend that with something about basic facts and perhaps with one or two other things we stated above. Again, there is the matter of briefly quoting from a publication, such as a dictionary, while putting that quote in quotation marks without mentioning that the quote is from that publication. And I'm speaking of "basic fact" matters, rather than "opinion" matters, though dictionaries can sometimes differ in their statements. What do you think of the quoting from dictionaries, without naming the dictionaries, at the Slut article? I did not add those quotes or quotation marks, but I see that type of quoting often on Wikipedia, including in WP:Good and WP:Featured articles, and especially in articles about words or in Etymology sections. And what these dictionaries are stating is usually in every other dictionary, or almost every other dictionary, which is why I see WP:In-text attribution in those cases as usually or often inappropriate. Also, from my several years of experience on Wikipedia, most editors (and I mean a significant number of experienced Wikipedians as well) seemingly don't even check the WP:Plagiarism guideline or don't know about it. They are more likely to know about and check the WP:In-text attribution guideline. Flyer22 (talk) 15:23, 12 October 2014 (UTC)
- Dictionary definitions are fully copyrightable, so they should always differ in their statements unless there is license or they are using an out-of-copyright base. :) I myself would identify the source in that use, somewhat like the definition in this guideline is clearly identified. I wouldn't call it plagiarism, though. --Moonriddengirl (talk) 19:47, 12 October 2014 (UTC)
- If you look at this, this, this and this dictionary source for the term slut, they have the same or close to the same definitions. So that's what I mean by not crediting a sole dictionary as though it's the only one with that definition, and is also what I meant by "though dictionaries can sometimes differ in their statements." Flyer22 (talk) 00:02, 13 October 2014 (UTC)
- Right, but that argues to the point that we can, in fact, write our own definition. :) There will be close paraphrasing, but this is one of those cases where content is closely paraphrased because there are limited ways to say the same thing. It isn't plagiarism in that case. (Nevertheless, each definition differs because of the copyright factor; there is some creativity in each, I would imagine.) In a case like that, I'd say it myself and cite several definitions or quote and attribute in-text. But that's a question of style. Again, I wouldn't call it plagiarism. --Moonriddengirl (talk) 10:50, 13 October 2014 (UTC)
- I understand your point. Still, I don't see the "there is some creativity in each" aspect for some of those; again, some of those sources have the same definition. I suppose the creativity aspect in those cases can be argued if they reposition those same definitions, such as place them in different orders (which I do see in some cases, whether for the term slut or different terms). Flyer22 (talk) 20:50, 13 October 2014 (UTC)
- They all look like they have different definitions to me. :) They do use some of the same language in those definitions, but that's entirely to be expected - that's where the definitions are not creative. They're simply using well-known and accepted synonyms to define the words. What's creative is their organization of said language. To see if I can explain this better, I wouldn't regard it as plagiarism to say "'Slut' has been defined in part as a slovenly woman" (I would cite several, to show the wording is common) without quotation marks. However, I would regard it as plagiarism to say "'Slut' has been defined as a saucy girl", since those words are distinctive to one source in the definition set we have, as is the more straightforward "an insulting word for a woman whose sexual behavior is considered immoral". (Again, with the quotation marks and a citation, I wouldn't regard it as plagiarism at all, though I also don't regard it as best practices.) --Moonriddengirl (talk) 10:52, 15 October 2014 (UTC)
- Per my "20:50, 13 October 2014 (UTC)" post and your "10:52, 15 October 2014 (UTC)" post, we are in agreement that "[w]hat's creative is their organization of said language." Because, otherwise, phrasings such as "a slovenly woman" appear in more than one of those dictionaries. Flyer22 (talk) 10:59, 15 October 2014 (UTC)
I think that the current examples in the section complicate the issue because they mix up the desirability of in-text attribution to meet non-NPOV issues and the desirability of in-text attribution to satisfy plagiarism requirements. Let us suppose that instead of:
- "Political transitions brought about by the collapse of authoritarian rule, democratization, or political reforms also make states particularly prone to violence."
The sentence read
- "Political transitions brought about by the collapse of authoritarian rule may be large."
Would all the examples given remain the same? -- PBS (talk) 13:47, 20 October 2014 (UTC)
Is there a difference between free and non-free when judging whether a text has been plagiarized
Just noting that PBS moved some text and added the heading above, so my comment below became separated from its context. It was made in response to:
The section "Avoiding plagiarism" has problems, there is no example of how copying text from copyleft and PD sources is handled on Wikipedia. Either the title needs changing or more examples need to be added.
I suggest that the title is changed to "Avoiding plagiarism from text under copyright" and a top note added to link to the section called "Copying material from free sources". -- PBS (talk) 16:37, 11 October 2014 (UTC)
SlimVirgin (talk) 17:21, 21 October 2014 (UTC)
- PBS, I would oppose doing that, and I think we ought to remove any implication that there is a difference between free and non-free when judging whether a text has been plagiarized. If we insert a source's words into an article and don't acknowledge the source in the text, it makes no difference whether the source material is still under copyright when judging whether it has been plagiarized. That is, when we decide to write "Smith (2014) argued that" to give Smith credit, the same considerations apply to Smith (1814).
- The only difference between the two is that, with the latter, there is no copyright violation if we reproduce the text, but that's a separate issue. SlimVirgin (talk) 06:07, 13 October 2014 (UTC)
- SV, if it is a matter of attribution of a POV, then yes, one would write "Smith argued that", as of course one would attribute things like poetry; but for the majority of text that is copied from PD and copyleft sources, attribution is not usually done through in-text attribution. How it is done is laid out in detail in Wikipedia:Plagiarism#Copying material from free sources, and while it may include in-text attribution, attribution can be done in other ways as well. But surely you must already know this from reading the guideline and the long conversations on these talk pages, in which you have been involved. -- PBS (talk) 06:43, 13 October 2014 (UTC)
- I've never agreed with the guideline on those points, though, PBS, and I don't know where the consensus came from for that view to be added. The situation is this: If we add text to an article, it must be sourced, per V and NOR. Those sources must be acknowledged as inline citations (in footnote or parenthetical style), per V. When we quote or closely paraphrase, we need in-text attribution too, per V and CITE. Whether the source material is free or non-free makes no difference.
- I think you may be confusing publishers with sources. For example, if we were to insert a Citizendium article into WP, we would note at the end that it had come from Citizendium. But it would still have to be sourced, and where appropriate would still need in-text attribution (regardless of Citizendium's rules; it's now on WP, so WP rules apply).
- Perhaps there are editors who are creating articles by mixing up text they have written and text that other authors have written, if the latter articles are free, and then not sourcing them? If that's happening, it's not a good thing – particularly when it's an older text such as the EB 1911, which would almost never be appropriate for WP – in part because it confuses author-as-publisher/author-as-source, and copyright/plagiarism. SlimVirgin (talk) 06:58, 13 October 2014 (UTC)
- You may disagree with it, but there are WikiProjects dedicated to importing text from PD sources (see for example Wikipedia:WikiProject Dictionary of National Biography), so the consensus is clearly against you (as you must know from your previous involvement in conversations on this talk page). The fact that an experienced editor like you is confused shows that the issue I raised at the start of this section needs addressing, as less experienced editors are even more likely to be confused by the current wording. -- PBS (talk) 07:22, 13 October 2014 (UTC)
- As to your last-mentioned concern ("Perhaps there are editors who are creating..."): if this guideline is followed, then the appropriate attribution will be in place for PD sources (inline citations), along with in-text attribution where appropriate. There are individuals such as DavidBrooks and Charles Matthews who spend a lot of time fixing articles that do not meet the requirements of this guideline. -- PBS (talk) 07:36, 13 October 2014 (UTC)
- I see that WP:V is now taking a tougher line on inline citation. While in principle I agree (in most cases), end-citation has been considered sufficient in the past, and there is, of course, a large legacy of articles done that way. (I note also that deWP, more stringent in a number of ways about content, is less concerned with inline citation in general.) I would regard it as good practice to do attribution at least per paragraph for imported PD text. There are actually bigger issues, such as subediting and fact-checking the text. Charles Matthews (talk) 07:48, 13 October 2014 (UTC)
- "When we quote or closely paraphrase, we need in-text attribution too, per V and CITE. Whether the source material is free or non-free makes no difference." I don't agree with that, that is, with always needing in-text attribution. But I and others have already been clear as to why we disagree with that, in the latest discussions at this talk page and at the WP:Village pump. There is clearly WP:Consensus that in-text attribution is not always required for quoting or closely paraphrasing. It's why the WP:Plagiarism guideline uses the "usually" wording (which, as noted in the Break section below, is wording that I object to), instead of "always" wording (granted, it did use to state "almost always" until days ago). It's also why the WP:In-text attribution guideline doesn't support an "in-text attribution in every quoting or close paraphrasing case" angle. And if the WP:Verifiability policy and WP:Cite guideline give a different impression on all of this, that needs to be fixed. We don't need any policies or guidelines contradicting each other on this matter. But then again, WP:In-text attribution is a part of the WP:Cite guideline. And I obviously don't mean "fixed" with an "in-text attribution in every quoting or close paraphrasing case" angle. Any alteration to the guidelines to state such is something that I will object to and revert, and then start a wide-scale WP:RfC on it if needed, even though we pretty much already had a wide-scale WP:RfC on this matter. Like I stated, I don't see the dictionary examples I gave below as needing in-text attribution. Flyer22 (talk) 07:38, 13 October 2014 (UTC)
- Flyer, could we keep this section for the discussion of plagiarism v copyright? It's something that I believe has been confused on WP for years (equally, PBS believes that I am wrong about it), so it would be good to keep this section separate from the general question of when in-text is needed. SlimVirgin (talk) 07:46, 13 October 2014 (UTC)
Our article, Amber, contains an example of the problem. (I found this by following "what links here" from the EB 1911 template, then picking an article from the A list.)
- Wikipedia (no source, i.e. no inline citation and no in-text attribution): "Then nodules of blue earth have to be removed and an opaque crust must be cleaned off, which can be done in revolving barrels containing sand and water."
- Encyclopaedia Britannica, 1911: "The nodules from the 'blue earth' have to be freed from matrix and divested of their opaque crust, which can be done in revolving barrels containing sand and water."
That's plagiarism, and because it's unsourced we don't know who is saying that it's correct.
The only sign that this has happened is the template in the references section: "This article incorporates text from a publication now in the public domain: Chisholm, Hugh, ed. (1911). "Amber (resin)". Encyclopædia Britannica (11th ed.). Cambridge University Press." But it doesn't tell us which sentences have been copied. Maybe it's just that one sentence, or maybe there are others.
If free text is going to be incorporated, it needs sources just as any other text does. SlimVirgin (talk) 07:43, 13 October 2014 (UTC)
- Yes it does, and that is what is stated in the guidance: Wikipedia:Plagiarism#Copying material from free sources. In the case of Amber, it was I who made the mistake when I edited the article on 30 April 2011 to include full attribution and inline citations. I missed that paragraph, but it is easily fixed. Using Earwig's Copyvio Detector, it is possible to check the whole article, and based on that I have added the necessary inline citations. The place to look at this problem in detail is the hidden categories for the various attribution templates. For example, for EB1911, "Category:Wikipedia articles incorporating a citation from the 1911 Encyclopædia Britannica" contains just over 12,000 articles with a {{EB1911}} template indicating that they incorporate EB1911 text; probably the majority of those still do not have an EB1911 article name (currently 8,320 EB1911 citations do not), which means picking an article at random has a better than fifty-fifty chance of NOT meeting the plagiarism guideline. Most of these are a decade or more old and are gradually being brought up to the requirements laid out in Wikipedia:Plagiarism#Copying material from free sources if the text is copied, and in WP:V if the article is summarised as if under copyright. The category Category:Wikipedia articles incorporating a citation from the 1911 Encyclopaedia Britannica with Wikisource reference tends to be much closer to what is required. Take, for example, the first entry under A: Andreas Aagesen (as it is now) and as it was when started in 2005. An article closer to the type you describe, which incorporates multiple sources including text from a PD source, is Alexander I of Russia, and it meets the plagiarism requirements for Wikipedia articles incorporating EB1911 text. Patriarch Acacius of Constantinople is an example of copying and closely paraphrasing three merged PD sources in one article that meets the plagiarism guideline for text copied from PD sources.
-- PBS (talk) 11:31, 13 October 2014 (UTC)
- PBS, I'm sorry for picking an example you had edited. I hadn't intended to do that; I just picked one at random from the As. Also, thanks for the other examples, which I'll look over later. I realize there are legacy issues, but it doesn't seem appropriate to add the copyright/plagiarism confusion to this guideline just because of those legacy issues. SlimVirgin (talk) 16:06, 13 October 2014 (UTC)
- Since I have been pinged: there are many levels of problem with the EB1911 import of 10 years ago or so. The first is that most of them were tagged with a naked {{1911}} which, while accurately reflecting the state of "copy or close paraphrase", doesn't tell me which article it came from. Usually it's obvious, but not usually enough. PBS has used automation to at least categorize those cases. Second, there are about as many articles (a) that have been essentially unaltered except for headings and language modernization; (b) that have had drive-by facts added to them; (c) or that have been significantly updated with carefully written and sourced information added (and some removed). A smaller number (d) are fully distinct articles but with some unattributed claims that can be backed up from EB1911, or sometimes verbatim text. A careful treatment of those will (a) complete the citation with a full {{EB1911}} in the tail and a one-source tag; (b) in addition, add a citation-needed tag to the added claims; (c) identify the original text inline with an inline-style ref; or (d) make judicious use of {{Cite EB1911}}.
- I've been ending up with a lot of (c), adding inline citations to phrases, sentences, paragraphs (usually) or sections. The problem is that it takes time to match the two sources. Another thing that takes time is copying EB1911 articles' small-print authority footnotes as transitive citations. I've even gone to the length of modifying AWB and writing some apps of my own to help. Meanwhile there are those thousands of articles languishing without even the more shallow edit of expanding the naked {{1911}} template with a fully-populated {{EB1911}} (PBS has handled a heroic number of those). I'd continue with depth if I planned to live for ever, but maybe breadth is more important. David Brooks (talk) 16:46, 13 October 2014 (UTC) (struck stray word David Brooks (talk) 17:58, 13 October 2014 (UTC))
That is OK, SV. I have over the years messed around with scores of this type of page, and often they come up alphabetically, so there is a reasonable chance that if you pick at random one which still has some errors, it will in part be down to me. (Those 8K without even an article title are not ones I've looked at.) I am more than happy to fix any articles like Amber where you find such problems (although I hope it is a case of "teach a person to fish...").
As Charles Matthews has pointed out, over the years the requirements for meeting WP:V have (in my opinion, rightly) risen, and the problems that DavidBrooks has highlighted often have to do with legacy articles. When those articles were originally sourced with a general reference, that was considered adequate at the time.
How editors fix the plagiarism in those articles that use text from PD sources such as EB1911 is not usually by using in-text attribution; rather, as with the fix to Amber, it is done by placing an attribution prominently in the References section and using short citations in the body of the text to link the text to that prominent attribution. This is how this guideline advises that articles containing PD text be handled to avoid plagiarism. This is why I think that the current section "Avoiding plagiarism" needs to be altered, so that it is clear to inexperienced editors that different advice on avoiding plagiarism from PD and copyleft sources is given in this guideline (lower down the page). The reason I think this is important is that if an experienced editor, Eric Corbett, clearly did not understand this back in 2011, it is likely that inexperienced editors today are going to be confused by the current layout. -- PBS (talk) 19:06, 13 October 2014 (UTC)
- Unfortunately PBS and I have used different approaches. As he says, he adds a complete page-specific attribution in the References section and short citations in the body. I use full in-text attribution ({{EB1911|inline=1...}}), which gets embodied as a footnote containing the full attribution. I adopted that habit maybe out of what I understood as the best convention around last November. His makes the text of the footnote shorter ("Chisholm 1911") compared with my often verbose ones, but mine avoids a double-redirect. I think either approach is valid; both are open and unambiguous; the inconsistency is unfortunate. David Brooks (talk) 19:41, 13 October 2014 (UTC)
- I should add in my defense that the short footnote to me implies a mere citation. It takes the double-redirect (to the footnote, and from there to the full attribution) to see that this is a direct copy. That's why I'm staying with the inline full footnote with prescript until there is a clearer consensus. David Brooks (talk) 02:30, 15 October 2014 (UTC)
For whatever it's worth at this point in the conversation, I agree with the guideline's current formulation on this issue at Wikipedia:Plagiarism#Copying material from free sources. In-text attribution and quotation marks should not generally be required when copying or closely paraphrasing a free source; using attribution templates should be sufficient. Under the current guideline, a great benefit to Wikipedia editors and readers alike is that the use of free sources does not have the same limitations as the use of nonfree sources; if a free source expresses an idea in the best way possible, then there is no need for an editor to spend time (which might be significant) attempting to rewrite it or to disrupt the article's flow with in-text attribution, and the reader will receive the benefit of the better original wording. Furthermore, the current guideline provides an incentive to use free sources, which in turn decreases the likelihood of copyright violations throughout the encyclopedia from the use of nonfree sources. And as a practical matter, if we were to change the guideline now to abolish the distinction between free and nonfree sources, then many current articles, including some FAs, that rely on its current conception would need to be drastically changed to come into compliance. Given the benefits of the current guideline, such time and energy would be better spent improving the encyclopedia elsewhere. –Prototime (talk · contribs) 14:58, 15 October 2014 (UTC)
- No, here's the problem. From Wikipedia:Plagiarism#Copying material from free sources: "For sections or whole articles, add an attribution template; if the text taken does not form the entire article, specifically mention the section requiring attribution...In a way unambiguously indicating exactly what has been copied verbatim, provide an inline citation and/or add your own note in the reference section of the article." Many PD-imported articles - about half, I'd say - either have the PD source at their core, or have it in a clearly segregated chunk, but significant parts of the article are later inserted from other sources, and all too often those insertions lack citation into the bargain. By using a single attribution template in the References section, information about the true nature of the source is lost. Worse, it may imply the entire article is public domain. My habit is to use the generic attribution if the article is substantially identical to the PD original (and I've also taken to adding a comment, a kinda invisible {{one source}}), but to use in-line in the other class. Very occasionally I feel a customized note is right. The problem with the first is that subsequent editors may make later incremental changes without changing the template, and the problem with the second is that it takes so much time. While I agree that "such time and energy would be better spent improving the encyclopedia elsewhere" I'm opposed to half-measures in this fix. When an article is removed from the 1911 verification tables, I expect it to be right. David Brooks (talk) 20:15, 15 October 2014 (UTC)
- I'm not clear on what we're disagreeing about; are you saying that although you presently use attribution templates, you feel the plagiarism guideline should be reformed so that in-text attribution is always required when copying or closely paraphrasing free sources? Either way, I can understand your concern with using a general attribution template in the references section to recognize a free source; doing so does not respect source-text integrity and, as you say, is susceptible to becoming outdated with subsequent changes to the article. I'd be open to reforming the plagiarism guideline to say that using a general attribution template in the references section is unacceptable. However, I don't understand the concern with recognizing a free source using an attribution template that is placed inside of an inline citation, which does preserve source-text integrity. If I understand you correctly, your concern is that this is time-consuming. But I don't see how that's any more time-consuming than adding inline citations in the normal course of writing an article. In accordance with the current guideline, my own habit is to place a {{citation-attribution}} template between the <ref> tags of an inline citation to recognize that a source is in the public domain, and the time it takes to copy and paste that template is negligible; certainly less time than using in-text attribution in addition to an inline citation every time the source is cited. –Prototime (talk · contribs) 20:49, 15 October 2014 (UTC)
- I don't think we disagree here. I'm happy with general attribution when the text is exactly or substantially the same as the source, despite the possible problems with future editing. In practice, we're talking about obscure topics that will likely not be further edited, now that so much time has passed since their creation. And typing the templates isn't what's time-consuming (I use PasteMore in AWB). I was referring to the time to do the "DNA match" on the original vs WP text. Ideally you want to be able to call out additional or, sometimes, missing text. Reading an article twice side-by-side can be both edifying and stultifying. I tried to write a tool to do it but my text match theory isn't up to it. David Brooks (talk) 22:58, 15 October 2014 (UTC)
- @Prototime. (edit clash) David Brooks has mentioned that he does things slightly differently from me (AFAICT), closer to the method that you use. As such there are some disadvantages to that method over the one I use, but they are relatively small, and there are some advantages; it depends on editorial judgement and the use in specific articles. I reject your assertion that source-text integrity is lost following this guideline, because it is exactly the same method as is used with long and short citations, and no one has raised that as an objection to using short inline citations and long citations in the References section (as described in WP:CITE). An advantage of placing the attribution in the References section is that it is still visually clear to a reader that some of the text is copied, and it is linked to the long citations (see for example Patriarch Acacius of Constantinople). The disadvantage of placing the attribution into the inline citation is that each time it is used it adds at least 102 bytes to the size of the citation (roughly 1K for every ten citations), and it may look a mess; for example, I do not think that pages like First English Civil War and John Lilburne would benefit from having each inline citation expanded by using {{citation-attribution}}. However, whether your method is followed or the one recommended in this guideline, both are far superior to the mess to be found in the 8,000 articles listed in this hidden category, so the debate is angels on pinheads compared to that. -- PBS (talk) 23:23, 15 October 2014 (UTC)
- @David Brooks perhaps Duplication Detector or Earwig's Copyvio Detector (set to URL comparison) could help with that problem. -- PBS (talk) 23:23, 15 October 2014 (UTC)
- @PBS: Yes on all counts, and especially the mote/beam issue. I fixed a typo in John Lilburne for you :-), and two outstanding issues are evident in that very article. 1. We both tend to use the full attribution template, which is a complete sentence beginning with the word "This", and follow it with an introduction of transitive sources that also begins with "This". My English teacher's grave rumbles every time I save one of those. 2. I still don't see a consensus on when to use the Attribution subhead (I've stopped). David Brooks (talk) 18:48, 16 October 2014 (UTC)
- @DavidBrooks: Ah, I understand what you meant now, thanks. –Prototime (talk · contribs) 19:29, 16 October 2014 (UTC)
- @PBS: After looking at Patriarch Acacius of Constantinople, I believe you're correct; source-text integrity is not a concern with general attribution so long as inline citations are also provided. My concern was with using general attribution without using inline citations, which I think we agree is just as unacceptable regardless of whether the source is free or nonfree. The method I use is actually more of a blend of your method and placing {{citation-attribution}} inside every inline citation individually; I generally use named <ref> tags and the {{rp}} template for page numbers, so that even though {{citation-attribution}} is used, it isn't repeated with every cite to the source (see, for example, Voting Rights Act of 1965). But yes, so long as inline citations are used and recognition that the source is free is somehow provided, copying or closely paraphrasing from such a source should not be considered plagiarism or a source-text integrity problem. –Prototime (talk · contribs) 19:29, 16 October 2014 (UTC)
Top note
I have added a top note to the section with this edit which added:
- For avoidance of plagiarism of text copied from compatibly licensed copyleft, and public domain, publications see the section below: Copying material from free sources
-- PBS (talk) 22:22, 19 October 2014 (UTC)
- I have added an "also". The advice in that section does not apply only to non-free, fully reserved content, but can also be used for public domain or copyleft text. --Moonriddengirl (talk) 23:22, 19 October 2014 (UTC)
- The problem with "also" is that there are statements in the section that are clearly not applicable to copyleft (including internal copies) and PD sources, and I think that "also" implies that they all do. The section is basically describing instances of potential plagiarism from copyright sources. For the sake of making descriptions simple, let's number the tick boxes 1-7. The premise of all of the wording in this section is that the text has been copied from the cited source, but that may not be so. If the text is from a compatible copyleft source, including internal copies, then just about all of the examples are misleading. As it happens, all the examples are from 2001 with a known author, so it is unlikely that they are examples of copyright-free text (although they might be). But let us suppose that instead of 2001 the date was 1801 when considering the usage of PD sources. The treatment of PD sources under the "free sources" section contradicts numbers 2 and 3, but there is also a problem with examples 4-6 because they are misleading (for various reasons, but particularly in implying that in-text attribution and quotations are desirable). The examples are further complicated because they mix up avoidance of plagiarism and NPOV: whatever the position on plagiarism, in-text attribution is necessary where a statement expresses a specific POV (but I think this last point is an issue for the previous section). -- PBS (talk) 13:36, 20 October 2014 (UTC)
- I disagree with this premise: "there are statements in the section that are clearly not applicable to Copyleft (including internal copies) and PD sources." If you said "not applicable to all", I would agree, but I cannot agree that public domain sources should not receive in-text attribution and quotation simply because they are public domain (or copyleft). Let's say we wanted to discuss, oh, I don't know - Mark Twain's review of James Fenimore Cooper's Leatherstocking Tales. Say "Now I feel sure, deep down in my heart, that Cooper wrote about the poorest English that exists in our language, and that the English of Deerslayer is the very worst that even Cooper ever wrote." What would make it undesirable for us to write (as per 4):
- Mark Twain wrote, "Now I feel sure, deep down in my heart, that Cooper wrote about the poorest English that exists in our language, and that the English of Deerslayer is the very worst that even Cooper ever wrote." (per example 4)
- Mark Twain voiced his certainty that Cooper was no wordsmith, writing "about the poorest English that exists in our language" with Deerslayer singled out as "the very worst that even Cooper ever wrote." (per example 5)
- Mark Twain asserted that Cooper's use of English language in his writing was poor, with Deerslayer the nadir of his writing. (per example 6)
- Suppose that the content is taken from a free licensed journal, like PLOS Biology. Where the source is a reliable one and is being used in an article just as any other source, in-text attribution and quotation are hardly undesirable.
- General attribution is acceptable in all cases, but not mandatory in all. It is entirely possible for a person to actually just include a single sentence or two from such a source, and in such cases (where the source is a reliable one and is being used in an article just as any other source) I see no reason for separating it out from copyrighted sources in handling. I believe that the header you placed in the beginning is misleading in implying that the section has no bearing on public domain and copyleft sources. Sometimes it does, and in those cases, plagiarism can indeed be avoided by the recommended treatment.
- In the alternative, a nuanced description of the difference may serve, but I don't think we should be telling people the section does not apply. Sometimes it does. --Moonriddengirl (talk) 14:10, 20 October 2014 (UTC)
- Please forgive me if I'm missing a broader point, but in the examples you provided, wouldn't in-text attribution be required by WP:ATTRIBUTEPOV regardless of how the plagiarism guideline differentiates between free and nonfree sources? And if so, would pointing this out in the guideline help address your concerns? Also, I'm not entirely sure I understand your statement "It is entirely possible for a person to actually just include a single sentence or two from such a source, and in such cases (where the source is a reliable one and is being used in an article just as any other source) I see no reason for separating it out from copyrighted sources in handling." Does that mean that you think that, as a general rule, the plagiarism guideline should place the same attribution requirements on the use of free sources as it does on the use of nonfree sources? –Prototime (talk · contribs) 16:48, 20 October 2014 (UTC)
- Yes, but no. It's quite possible to come up with examples that are not POV-based, but - for instance - striking language to describe fact. We have many articles that are taken wholecloth from public domain or compatibly licensed sources, but there are some that simply use them in the same way they use fully reserved sources - to support a sentence or two. In these cases, they can and likely should be handled in the same way as nonfree sources - if they are reliable sources and are being used to support content. If I thought that was the general rule, however, I wouldn't say "see also", I'd say, "Let's do away with the general attribution" option. :) --Moonriddengirl (talk) 17:58, 20 October 2014 (UTC)
- @Moonriddengirl Aren't the points you are raising also covered by the text in the section "Copying material from free sources" (For example the paragraph that starts "If the external work is in the public domain, but contains an original idea, or is a primary source ...")? -- PBS (talk) 18:33, 20 October 2014 (UTC)
- PBS, it's included under "public domain", but not at all under "Sources under copyleft", which is the case with articles in PLOS Biology for instance. The language originally included suggests that the recommendations under "Avoiding plagiarism" do not apply to PD or compatibly licensed sources. As it has stood, even through modifications since its introduction in 2010, what is under "Avoiding plagiarism" has been true for all sources, with additional information on other appropriate ways to avoid plagiarism from copyleft and public domain sources below. I think it's fine to point out that those are not the only way to avoid plagiarism, but we don't want to ever imply that they are not ways to avoid plagiarism in free and public domain sources. --Moonriddengirl (talk) 19:13, 20 October 2014 (UTC)
- @Moonriddengirl: Thanks for the clarification. To clarify further, do you mean to say that whether a free source must be attributed in the same way as a nonfree source should depend on the quantity of material duplicated from the free source? That if an article is taken "wholecloth" from a free source, general attribution is fine, but if only a "sentence or two" is taken from a free source, the same attribution requirements should apply as if it were a nonfree source? If so, I'm not sure why the quantity of material from a free source should dictate what type of attribution is necessary for it to not be considered plagiarism. Could you elaborate on that reasoning some more? I'm also not convinced that this would be a workable standard; what if it's less than wholecloth but more than a sentence or two--say a paragraph, or several paragraphs, or numerous sentences scattered throughout several paragraphs--how would the lines be drawn? –Prototime (talk · contribs) 19:18, 20 October 2014 (UTC)
User:Prototime, it is possible and permitted by this guideline to attribute generally (where appropriate) or intext according to how material is used. PBS notes above some language in the public domain section, for instance, that has for many years now more explicitly spelled this out. If a public domain or compatibly licensed source is being replicated wholecloth, it's being mined for content, not (generally) cited as a source. So, one might not say, "According to the DNB" and drop the entire article from the DNB. In that case, the general attribution template works fine. If one is writing a nuanced article on a subject and citing the DNB, which (for example) includes a fact that contradicts another source, you might very well say, "According to the DNB...." and give that fact. This technique is appropriate and true regardless of the origin of the text. I am not proposing a change to the guideline. I am simply opposed to adding confusion over what the guideline has said for years. Plagiarism can be avoided by these techniques (as listed in the top section) or, if the source is compatibly licensed or public domain, by the alternate techniques given below. --Moonriddengirl (talk) 19:27, 20 October 2014 (UTC)
- So, when copying or closely paraphrasing material from a free source, regardless of the quantity that is used, the editor has the choice of using either the attribution techniques listed in the "Avoiding plagiarism" section or the "Copying material from free sources section"? That's been my understanding, and I just wanted to make sure we're on the same page on what the guideline says. Thanks. :) –Prototime (talk · contribs) 19:51, 20 October 2014 (UTC)
- Basically, although there's always nuance which I'll go into in a moment, that has always been my understanding of the guideline, Prototime, and it makes sense to me. :) If plagiarism "is presenting someone else's work as your own, including their language and ideas, without providing adequate credit," then any method of providing adequate credit can avoid plagiarism, so long as it is appropriate to the use of the content in question. Appropriate to the use is key here, though. So, for instance, if the content is from a source that isn't reliable, you might mine it for content but would not generally cite it. You can copy content from another Wikipedia article or Wikia, but most of the time won't use the quotation and citation method for that. The only exception would be if Wikipedia (or Wikia) were reliable in that context; for instance, if we are discussing the Wikipedia article that was the source of the content itself. --Moonriddengirl (talk) 10:06, 21 October 2014 (UTC)
Break
I'm concerned that this page is turning into one about copyright. Plagiarism and copyright are separate issues. If we insert word-for-word text from another author without signalling that we've done it (again using the Amber from above):
The Vienna amber factories, which use pale amber to manufacture pipes and other smoking tools, turn it on a lathe and polish it with whitening and water or with rotten stone and oil. The final luster is given by friction with flannel.[1]
... it's plagiarism whether copied from the EB 1911 or 2014. Adding the EB 1911 as a source means that we have to paraphrase that source, or if copying word-for-word acknowledge the source in-text. (And use other sources for other points of view.)
When you copy these free texts, you are using the EB 1911, not as a source, but as an author. But then you cite it in a footnote as if it were merely a source. It's a category mistake. SlimVirgin (talk) 17:16, 21 October 2014 (UTC)
- I don't think this discussion is confusing plagiarism with copyright. Under the current plagiarism guideline, the copyright status of a source has an impact on what we consider plagiarism; unlike nonfree sources, free sources can be attributed per Wikipedia:Plagiarism#Copying material from free sources to avoid plagiarism, which is a laxer standard than is required for nonfree sources (in that in-text attribution generally is not required), but it's a standard that still requires more attribution than is required by copyright law for free sources. For the reasons I've stated above, I don't favor changing the plagiarism guideline to require in-text attribution for material that is copied or closely paraphrased from free sources. –Prototime (talk · contribs) 18:08, 21 October 2014 (UTC)
SV, in the article Amber there are two parts to the citation to the source: a short part in the footnote, and a long part in the references section. The two parts together make up the full citation, and the full citation is clearly marked with an attribution statement:
- Notes
- ^ Rudler 1911, p. 793.
- References
- This article incorporates text from a publication now in the public domain: Rudler, Frederick William (1911). "Amber (resin)". In Chisholm, Hugh (ed.). Encyclopædia Britannica. Vol. 1 (11th ed.). Cambridge University Press. pp. 792–794.
Adding a statement to the citation that "This article incorporates text from a publication now in the public domain:" means that we are acknowledging the copy of the text in the article. That means that there is no attempt by Wikipedia editors to hide where some of the text has come from; therefore it is not plagiarism. In the same way, if text is copied from one Wikipedia article to another, the copyleft requirements are met by adding attribution to the history of the edits to the page, and when other text is copied under copyleft licensing that too is done under the licensing requirements, and whether or not the licensing requirements request it, attribution is also added to the References section at the bottom of the page.
As a group of users (editors), I think that we deserve a pat on the back for the influence Wikipedia has had on this whole area of free information distribution, for example see the British National Archives licensing policy Open Government Licence v2.0. That licence says "acknowledge the source of the Information by including any attribution statement specified by the Information Provider(s) and, where possible, provide a link to this licence" and that "These terms are compatible with the Creative Commons Attribution License 4.0 and the Open Data Commons Attribution License, both of which license copyright and database rights". -- PBS (talk) 00:01, 23 October 2014 (UTC)
Location of "Copying material from free sources" section
Earlier I was looking for the template to attribute material that was copied as permitted under a Creative Commons license. I eventually found it ({{CC-notice}}), but I never would have thought to look for the description on the Plagiarism page. Is there any way we can make this section easier to find?
I thought that perhaps we could factor this section out to a separate article. Then we could refer to it as appropriate here and on various other pages, such as Wikipedia:Attribution and perhaps Wikipedia:Citing sources. The references could be a short summary paragraph + a link, or a hatnote, or a See also, or whatever. And also add it to any categories that might be relevant. – Margin1522 (talk) 03:36, 22 January 2015 (UTC)
- Great idea. --Hroðulf (or Hrothulf) (Talk) 11:17, 22 January 2015 (UTC)
- It cannot go on Wikipedia:Attribution, which is an "essay"/failed attempt to merge NOR and verification. It ought not to go in Wikipedia:Citing sources, as that is about how to cite sources and does not include things like the attribution referred to in this guideline. If you are going to mention {{CC-notice}}, what about all the other types of licence we have which are referred to on this page? Perhaps a section in Wikipedia:Citing sources summarising this guideline, with a mention that there are suitable templates for avoidance of some forms of plagiarism. -- PBS (talk) 11:35, 22 January 2015 (UTC)
Avoiding plagiarism - facts
I find the examples in the section Wikipedia:Plagiarism#Avoiding plagiarism exactly as I expected them to be, but the problem is that those examples are too easy. It's only natural to quote and to mention the author's name when we copy someone's opinion or declaration (even more so when they are on socio-political topics), or to quote hard-to-believe claims (like Evenimentul Zilei stating: "On 28 September 1993, in Pașcani, Iași County, a miracle took place: a hen gave birth to two chickens"). But the examples don't help me understand how to do it when I copy (complying with non-free content policy) dry information like facts or events. Examples:
- The Austrian-based group holds one manufacturing facility in Romania, at Cluj-Napoca (Sanex), after it shut down production units in Lugoj and Bucharest in 2008 and June 2009 respectively. (source).
- Birmingham-based Wagon employs over 4,000 workers across Europe. source
Do I have to mention the author name and publication name in the text? I am going with the "no change in text" version.
Do I have to do it like this?
- Xander Popescu of Wall-Street.ro writes: "The Austrian-based group holds one manufacturing facility in Romania, at Cluj-Napoca (Sanex), after it shut down production units in Lugoj and Bucharest in 2008 and June 2009 respectively."[1]
and
- Rămurel Connor of BBC writes: "Birmingham-based Wagon employs over 4,000 workers across Europe".[2]
To me, this looks completely unnatural. Copying such facts (complying with the policies) should not require quotes and author/newspaper name. The reference should be quite enough in such cases.
References
- ^ Lasselsberger Romania faces criminal charges, 8 December 2009, Xander Popescu, Wall-Street.ro, retrieved at 5 February 2010
- ^ Wagon Automotive cutting 292 jobs, 19 December 2008, Rămurel Connor, BBC
Thanks. — Ark25 (talk) 02:39, 1 July 2015 (UTC)
- Great question. No, you don't copy verbatim when you use bare facts. Every bare fact that is more than a couple of words can be re-written in your own words, so a verbatim quote doesn't pass our fair use test (and it is unnatural, as you said.) Instead you could write the following, including the citations in footnotes.
- Lasselsberger, based in Austria, closed two out of its three Romanian factories (Lugoj in 2008 and Bucharest in June 2009), leaving only Cluj-Napoca (Sanex).
- Wagon is based in Birmingham, England. Its European operations have more than 4,000 employees.
- The exception is public domain or other compatibly licensed text, where you can rewrite as little or as much as you like with a brief attribution; details are given later in the page.
- --Hroðulf (or Hrothulf) (Talk) 13:15, 1 July 2015 (UTC)
- To me, what is unnatural is to mention the author and the publication name (in-text attribution) in such a context. When quoting opinions, declarations, scientific studies, hard-to-believe statements or hard-to-verify facts (e.g.: In Geographica, Strabo writes: "The original name of the Dacians was Δάοι "Daoi""), it is natural to use quotes and in-text attribution, not because the law requires it but simply because that's how it makes sense to write such things. So my first question is: why is this guide asking for quotes and in-text attribution even when there is an inline citation? Is the law asking for it?
- Now, when I rephrase such factual statements describing events or dry facts (Wagon employs over 4,000 workers), I find it a bit dishonest to rephrase. When I translate a text, I am copying it, even if I put it in other words. The same applies here. If I cut because I only need part of the text, that's OK; rephrasing is natural there, since many times (not always, I admit) you have to rephrase such cropped text to give it good wording. And if I add my own interpretation, rephrasing is even more necessary. But if I want to use all of the text and I rephrase it, then I am being a bit dishonest, because what I am actually doing is copying the entire information given there and just using other words, as in a translation. A translation is basically "saying the same things using other words", and it is copying; the same should apply to rephrasing an entire passage. To me, rephrasing in such cases is "trying to make it look like I am not copying while in fact I am copying".
- The last green examples, which use paraphrases, can't be applied to newspaper reports (imo). It would be quite hilarious to write: Rămurel Jackson-Mămăligă of BBC suggests/writes that "Birmingham-based Wagon employs over 4,000 workers across Europe".
- And I think the last example is not even correct, because I don't think it's OK to generalize someone's opinion beyond what he clearly referred to (he refers to a specific kind of political change, and I would be putting in my own opinion, generalizing it to any political change). I would rather copy an opinion than "enrich" Wikipedia with my own interpretation (which is in fact my opinion) of what someone else says. So I think that example should be rewritten.
- Sorry to make my message this long, but I want to mention that in Romania, the Romanian Council of Press (RCP), which represents the journalists and media outlets in the country, made a request to the media not to copy more than 500 characters (spaces not included) and to make sure to specify the source when they do. There are many Romanian copyright laws, but I'm not sure whether any of them states how many words or characters you can copy without breaking copyright. And even though this is not a law but just a regulation, the RCP request is used as a rule everywhere in Romania. Nobody has ever tried to challenge the 500 word limit, and nobody has sued anyone for copying under 500 characters. So there is no need to rephrase such texts, because you can copy them, unless you want to be overzealous in applying a (very unclear) law. Is there any fair rule or widely accepted convention like this in English-speaking countries?
- Anyway, I think the examples there should include one about adding everyday factual information, since such edits make up the bulk of Wikipedia edits, far more than opinions or declarations; that would make the guide really useful. I didn't even need this guide to know that I have to quote opinions. An editor needs the guide to know how to deal with the more common cases where it is not clear how to proceed. — Ark25 (talk) 00:28, 2 July 2015 (UTC)
- User:Ark25, plagiarism is not a legal matter. Plagiarism and copyright are very different issues. There is no rule in the United States - the country that governs Wikipedia - about how many words you can copy from a source. That determination is made by assessing multiple factors. The rule on Wikipedia, as set out at WP:NFC, is that you must put all content you copy from your sources in quotation marks, and you can only use brief excerpts for good reasons, in order to comply with the US law on transformation. Even if you find it dishonest, you need to put the bulk of the content you place on Wikipedia in your own words. This is because under U.S. law, the information is not protected, but the way it is expressed is. --Moonriddengirl (talk) 02:01, 2 July 2015 (UTC)
- Great answer, thanks! Does the US law apply for information presented by the Romanian newspapers (those based in Romania, not Romanian-language newspapers based in US) too? If the information is not protected, then can I use in my edits (the information contained in) an entire newspaper article or book by rephrasing or by translating it? — Ark25 (talk) 07:43, 2 July 2015 (UTC)
- User:Ark25, yes, it does. Wikipedia:Non-U.S. copyrights talks a little bit about the complexity here. The document you link, I'm afraid, doesn't have the weight of law. It is a statement of belief by the Consiliul de Onoare that 500 words is a reasonable limit (provided that does not constitute more than half), and the Consiliul de Onoare is not the copyright owner of all press material in Romania, so their opinion is not binding. We would need actual legislation verifying that content from Romanian news sources may be freely used up to that limit. Until we have that, Romanian news stories should be treated just like English news sources and paraphrased or briefly quoted, I'm afraid. --Moonriddengirl (talk) 13:36, 2 July 2015 (UTC)
- Thanks again. So if I am not quoting and not using in-text attribution, then I cannot copy more than two words; have I understood that correctly? (Hroðulf suggests that more than a couple of words should be rewritten.)
- When quoting and using in-text attribution, how many words or characters can I copy? Is there such a limit?
- What about my other question, about using the information from an entire newspaper article or book by rephrasing or translating it?
- Is there any program that can help me to detect parts of (Romanian) Wikipedia articles that are identical with parts of online newspaper articles?
- Thanks. — Ark25 (talk) 18:28, 2 July 2015 (UTC)
Complete article plagiarism of public domain
- I understand the swing to being allowed to copy/paste text from public domain to articles.
- 1) Are we moving in a direction where entire articles can be copied verbatim, with the exception of added sections and subsections, as long as there is attribution? USS Valley City (1859) is an example, copied from this source and attributed in the reference section: "This article incorporates text from the public domain Dictionary of American Naval Fighting Ships." The entry can be found here (it is a dead link), and I have found many others.
- 2) If this is acceptable, does there need to be a direct link to the source, or does a blanket reference suffice?
- I ask these questions so I will know to start avoiding all these types of articles altogether. Otr500 (talk) 09:17, 7 September 2015 (UTC)
- The answer to #1 is that we have always been there; the projects to pour PD reference works into WP were started more than 10 years ago. I'm particularly familiar with Encyclopaedia Britannica Eleventh Edition (aka EB1911) as a source. Certainly there are hundreds, perhaps thousands, of articles that are verbatim copies of EB1911, with no additional material (other than standard metadata). I'm sure that's true of other PD works, like the British Dictionary of National Biography. After all, sometimes there's nothing else that anyone can say about that obscure 17th Century French poet.
- On #2, public domain inclusion projects have their own conventions. For example, the {{EB1911}} template is used to indicate that the article is wholly or substantially a copy, and its parameters allow you to point to a Wikisource article, or to an external URL (e.g. on archive.org). If the WP article includes other text interleaved with the EB1911 text, we can use the same template with the inline parameter. The initial mass copy did not include explicit references (this was before WP began demanding references to primary sources), and one day that will be fixed; see Category:Wikipedia articles incorporating a citation from the 1911 Encyclopaedia Britannica with no article parameter. I believe all the above is true of other projects that have templates subclassed from {{Cite encyclopedia}}. David Brooks (talk) 00:08, 8 September 2015 (UTC)
- As well as the copyright-expired sources that David has mentioned, there is a whole category of templates for public domain sources (see Category:Attribution templates); these include templates for US government sources that are in the public domain. A recent development has been the move by Commonwealth countries to publish documents under the same or similar licences to those used by Wikipedia. See for example the British Open Government Licence text: "These terms are compatible with the Creative Commons Attribution License 4.0 and the Open Data Commons Attribution License, both of which license copyright and database rights". This change of heart by the British Government has been directly influenced by Wikipedia ("Open Government Partnership: UK national action plan 2015 launch"):
- We still have an Encyclopaedia Britannica approach to government. Too much policy making is still done by well-intentioned people in Whitehall sitting in a room, thinking very hard about how to solve a problem. It’s expensive, cumbersome, dates quickly and the citizen is a bystander.
- We need to move to a Wikipedia world. That means more collaboration on policy design, recognising that knowledge and evidence is widely dispersed throughout society not locked away in Whitehall.
- —Minister for Cabinet Office Matt Hancock
- So the British Government, for one, has licensed its text in such a way as to make it possible for Wikipedia to import it. If I were working for a commercial company I would not be paying "consultants" to sneak my text into a Wikipedia article; I would put my web pages under the "CC BY-SA 3.0 License", and then there would be a good chance that some Wikipedia editor would cut and paste the text into the appropriate article at no cost to my company :-O -- PBS (talk) 14:20, 8 September 2015 (UTC)
- Thanks. While I understand that making an entire article from imported public domain information creates no legal issues, I do not like it. When an article incorporates both imported public domain content and referenced content that is not properly tagged, it is hard to tell when original research is included. Otr500 (talk) 05:16, 10 September 2015 (UTC)
- I don't think either PBS or I suggested an article should not be properly tagged. But the fact is that many articles aren't, and you can help Wikipedia by finding them and either providing a citation or adding a {{citation needed}}. Again returning to EB1911: if the entire article is a copy, then a general reference suffices. If it intermixes EB1911 with other text, then there should be footnote references for both in appropriate places. There are gray areas: often I find an article that is EB1911 with a single added sentence, and in that case I use a general reference along with a specific one for the interloper. And sometimes some EB1911 text is already footnoted with a different citation; either the other source copied EB1911, or they were written by the same author with a lax approach to copyright at the time, or it's just the case that there is really only one way of stating a particular fact. David Brooks (talk) 06:27, 10 September 2015 (UTC)
- Both David and I do a lot of work on text from old sources that has not been changed by advances in science or scholarly research. For example, it is unlikely that the DNB biography of what is today a relatively obscure Puritan divine like Richard Capel (1586–1656) will be much changed by more recent histories.
- As they say on cooking programmes, "Here is one I prepared earlier". In this case I added DNB text to an existing article to flesh it out. David and I differ slightly in how we cite articles. He tends to put the attribution in-line while I put it in the References section (both are acceptable). I do not use general references any more, but always use in-line citations (usually short ones, but I go with the flow and use long ones if others have used them consistently). @David, sometimes when an article has no citations or just the {{1911}} general reference (without article name or volume number), people who do not realise that the text comes from EB1911 may find other sources to support the same facts even if their wording is different (actually the wording is often the same when the cited fact is a well-known quote by the subject of the biography, and all quotations have to have citations to support them (WP:V)). In such cases I usually leave the in-line citation in place (particularly if the source is significantly newer than 1911), but add the EB1911 to it to indicate that one or more preceding sentences are copied from that source. -- PBS (talk) 09:42, 10 September 2015 (UTC)
- To be clear, "tends to put the attribution in-line" still means that the citation appears in the reference section, because the inline citation is within <ref>...</ref>. Also, I find cases where an article contains both a literal copy and a supportable fact; in that case I use short citations for the latter. Here's one I prepared earlier (and, @PBS:, I think the cooking programs are themselves referring to the original usage in Blue Peter ). Otherwise, I take your point about people finding other sources, and thank them for it. David Brooks (talk) 02:34, 11 September 2015 (UTC)
When a large section is copied from a PD article, can we add fact tags?
I'd like to see a statement in the appropriate policy or guideline saying that we can. Years ago I tried to add fact tags to some dubious material and they were removed with the statement that it was from a PD source (not the EB, perhaps the Jewish Encyclopedia). Now that's ridiculous - if something's a century out of date - or just plain dubious - we should be able to add fact tags. Doug Weller (talk) 16:05, 10 September 2015 (UTC)
- @Doug Weller, you probably know this, but for the benefit of those reading this section who do not know: the {{fact}} template is now a redirect to {{citation needed}}. If a PD or copyleft source does not meet the requirements of WP:V, then of course all the facts can be tagged with a request for citations over and above a simple text attribution. However, if the sentence is taken from a PD source that meets the requirements of WP:V and the paragraph or sentence carries an inline citation to support the fact you are concerned about, then I think {{citation needed}} is inappropriate. But there are other tags that are more suitable, for example for inappropriate text:
- {{dubious}}
- {{clarify}}
- {{POV-statement}}
- {{by whom}}
- There is a list of these type of alternative templates in the documentation of {{citation needed}}.
- If you think that the PD source is out of date, then you might consider tagging the source itself with {{Better source}}, but that template is usually used for websites that do not meet the requirements of WP:SOURCE (for example because they are self-published) but have been found by experienced editors to contain accurate information; for example, see {{Rayment}}. I personally would remove the {{Better source}} template from citations to {{cite EB1911}} and {{cite Jewish Encyclopedia}}. There are also some other specialised options, such as {{Update-EB}}, that may be more useful for science and geography subjects (it populates Category:1911 Britannica articles needing updates). There does not seem to be an equivalent for {{cite Jewish Encyclopedia}}, but there is no reason why such a template and category cannot be created.
- -- PBS (talk) 17:22, 11 September 2015 (UTC)
- Thanks. I was being lazy writing fact tags, sorry. I don't think I've ever even used one. I would like to see some guidance we can point to so that no one can simply reverse a tag by saying it's already cited, probably using some of your useful suggestions. Doug Weller (talk) 17:27, 11 September 2015 (UTC)
- This actually clarified a lot for me. Thanks, Otr500 (talk) 20:52, 11 September 2015 (UTC)
- @PBS:, as you often say, some material copied from century-old texts is still accurate (a sufficiently complete biography of a dead poet so long as it doesn't contain contemporaneous critical commentary, for example). Some other material is probably out of date and we want to encourage editors to modernize it by adding a tag. When you say "there is no reason why such a template and category can not be created", do you mean a generic template? I would prefer that to a set of domain-specific ones like {{Update-EB}}. David Brooks (talk) 05:15, 12 September 2015 (UTC)
- I was suggesting specific ones. There is no reason why there should not be a generalised one, provided it can be worded in such a way that it is targeted (by using parameters?). I notice that the editor who created {{Update-EB}} was yourself. I also notice that there is quite a backlog of articles in Category:1911 Britannica articles needing updates, and it is quite a large template to place at the top of an article if it is going to be there for months or years. It really depends on whether it is primarily a warning template for readers or a request for maintenance between editors. I suggest that if the latter, the talk page is the more appropriate location for such a template. However, this talk page is for plagiarism issues and we are getting off topic now. -- PBS (talk) 15:17, 12 September 2015 (UTC) [Taking this to talk pages. David Brooks (talk) 17:29, 15 September 2015 (UTC)]
- I do not like plagiarism at all for several reasons. When information from free content is mixed with supposed referenced material it just creates another level of potential problems. I realized I hinted that I would stay away from these types but I guess I am hard-headed. To that end you can take your pick of one, two, or all three, between copy/paste of free content, plagiarism, and whatever combination you would like to mix that might make it WP:Copyvio, or I could have just missed something (I would hope). This would be USS Charles R. Ware (DD-865) and the talk page if someone would care to take a look. Otr500 (talk) 20:20, 14 September 2015 (UTC)
Anyone want to help check a case?
Please see this one. Possibly just a website which copied from WP, but some feeling by User:Obenritter that it might not be.--Andrew Lancaster (talk) 08:33, 18 September 2015 (UTC)
Misappropriated Wars of Cyrus the Great
Please see the Wars of Cyrus the Great, where earlier in the day were found large blocks of text taken verbatim from an 1881 online text—situation at first resolved by converting two sections to long quotes, though this makes the section content based on historiography 130 years old. On further review, I found a paragraph taken all but verbatim from a recent 2012 scholarly text. Given the remaining large blocks of text with few or no sources, it is likely that the rest of the article will be similarly unmasked. In short, the article should be pulled as a plagiarised piece. Copyvio tags were used to mark content, but this seems a misuse of these tags. How does one show that Wikipedia is taking this seriously? Le Prof Leprof 7272 (talk) 00:33, 18 October 2015 (UTC)
- Looking beyond this one article, it looks as if all of the Cyrus military history material is similarly suspect, to one degree or another. Leprof 7272 (talk) 00:34, 18 October 2015 (UTC)
- Postscript, the reasons the citations look in good shape in the Wars article—a deception, as one would see if looking to the edit history, to the article's state before today—is that the existing citations, while being checked, were also completed (i.e., not left without date, author, publisher, title, URL, etc.). They appear "clean," not because the article is clean, but because they were made clean in determining the article was "dirty." Leprof 7272 (talk) 00:38, 18 October 2015 (UTC)
Leprof 7272, thank you for your note. I don't know off the top of my head if there's a template for that. Either way, what matters is that we fix the problem. Copyvios need to be removed; it's as simple as that. Plagiarism needs to be fixed through editing. Now, the article was created by History of Persia, and I'm curious to see what they have to say. Doug Weller, consider this your usual courtesy ping. Drmies (talk) 00:53, 18 October 2015 (UTC)
- I am only peripherally interested in the specific issue, more deeply interested in the general. Strike-throughs are not meant in disrespect, only to move the general discussion; I am separating and duplicating content so as to separate the specific from the general discussion. Specific discussion of the issues with the Wars article is to continue here. Le Prof Leprof 7272 (talk) 01:04, 18 October 2015 (UTC)
Please advise
Is there no specific template message to tag for plagiarism? Does one need to start a separate Wikipedia plagiarism website to have these matters taken seriously? Le Prof Leprof 7272 (talk) 01:04, 18 October 2015 (UTC)
- Leprof 7272, thank you for your note. I don't know off the top of my head if there's a template for that. Either way, what matters is that we fix the problem. Copyvios need to be removed; it's as simple as that. Plagiarism needs to be fixed through editing. Drmies (talk) 00:53, 18 October 2015 (UTC)
- Glad to have you onboard Drmies, Doug. I am only peripherally interested in the specific issue, more deeply interested in the general. Le Prof Leprof 7272 (talk) 01:04, 18 October 2015 (UTC)
- Plagiarism is fixed via removal or attribution. I have generally acknowledged the copying of the identified source at Wars of Cyrus. Ideally, individual sentences and passages would be flagged. If we know the source, we use templates like {{pd-old}} (for the article) or {{Citation-attribution}} (for sentences/passages). --Moonriddengirl (talk) 01:32, 18 October 2015 (UTC)
Continuation of specifics after engagement on templates
- Thanks for engaging, Moonriddengirl. While I was suspicious early on, the 1881 misappropriations (verbatim use without quotation marks, but with a single appearance of the citation) could be a training issue. But with the appearance of wholesale copying of sentences from the introduction of the 2012 book, there appears to be a clear disregard for conventions of intellectual appropriation, regardless of whether the material is in or out of copyright. I think the whole article should be "pulled." Who has time to do plagiarism checks, paragraph by paragraph? Three points are a trend. How long does one wait before calling a spade a spade? Le Prof Leprof 7272 (talk) 01:38, 18 October 2015 (UTC)
- A 2012 source is a concern. Can you tell me which section? I haven't had a chance to fully scan your notes yet, so apologies if you've already identified this. I'll keep looking in case you have. --Moonriddengirl (talk) 01:39, 18 October 2015 (UTC)
- See this diff, and forgive me for the in article temporary editorializing: [1]. Le Prof. Leprof 7272 (talk) 01:57, 18 October 2015 (UTC)
- Thank you. That's very likely going to wind up another case of unattributed CWW, but I'm still working to figure that out. Before the 2012 book, it was published in this blog in 2010. I'm looking at the history of Cyrus the Great, where the content is also published, to see if I can figure out when it first entered the project. --Moonriddengirl (talk) 02:16, 18 October 2015 (UTC)
- I haven't done a plagiarism check; Leprof is suggesting I think that much straight-up copying has indeed taken place. There are footnotes, but if content is copied without quotation marks and such signals, we're still dealing with plagiarism--at least in the sense in which I treat plagiarism in class. I've warned History of Persia before about plagiarizing (their copying of sections from other Wikipedia articles) without proper attribution, and their talk page is full of similar messages. There's a pretty clear warning from Abecedare for instance in early September, then again from Doug Weller on 11 September, one from Favonian three days later, and Spartaz on 6 October. Moonriddengirl, I don't know how we could have been more clear to the editor and I wonder if it's not time to block. I don't believe in "attention-getting blocks", but perhaps a block is in order to prevent further problems--until they acknowledge and pledge to start repairing. Never mind the edit warring they've engaged in... Drmies (talk) 01:40, 18 October 2015 (UTC)
- Clearly there's background here that I don't know, but some of those warnings should probably not really count towards current sanctions. His 6 October warning was for a 22 August edit, predating all the warnings you mention above. It does look like he's continued WP:CWW, which I will block for when needed, and if he's copied a 2012 source in the last few days, we've definitely got ongoing issues. Plagiarism is a different matter, though. Has he been told how to attribute PD text? I mean, all of this could be amounting to a pattern, but I'm just arriving here. :) --Moonriddengirl (talk) 01:49, 18 October 2015 (UTC)
- (Just to note, I'm scrolling backwards on the talk page now. User:Leprof 7272, which 2012 book was copied? --Moonriddengirl (talk) 01:51, 18 October 2015 (UTC))
Okay, so it looks like his first notice of copyright policy was 12 July. That bot edit does imply copying from PD sources is okay, but makes plain that copying from other sources is not. Still looking. --Moonriddengirl (talk) 01:53, 18 October 2015 (UTC)
- After his 24 July block for copyvios, he did more copy-pasting in the now deleted history of User:History of Persia/Pasargad on 12 August. Haven't checked for compatible licensing of source yet. Still looking to see what else is going on. --Moonriddengirl (talk) 01:56, 18 October 2015 (UTC)
- I have no desire to get someone in trouble. My only desire is to (1) stop further copying from happening, and (2) especially, figure out what to do about the article. I think it should come down. Who has time to remediate this? Three large blocks were plagiarized, by academic standards. Further large blocks appear with zero sourcing. Revert to yesterday's version to make the issues clear, if necessary; that is, compare this: [2]. Le Prof Leprof 7272 (talk) 02:03, 18 October 2015 (UTC)
- (edit conflict) Use of material from this site in Gur-e Dokhtar (now deleted) is a potential good faith misunderstanding. I'm suddenly feeling like I've seen this guy's work before. Maybe at SCV. Anyway, I haven't really looked at the subsequently created Gur-e-Dokhtar, which may (or may not) be completely and adequately rewritten. Apranik created on 22 August is a completely unusable paraphrase of [3]. This postdates his block. I'll do a quick scan of recent edits for more undetected copyvios. --Moonriddengirl (talk) 02:11, 18 October 2015 (UTC)
- See this diff, for the 2012 plagiarism, and forgive me for the current in-article temporary editorializing: [4]. Again, you are probing very broadly, perhaps to impart sanctions. My concern is for an article that should not be up because it is intellectually dishonest. As I have said, (1) three instances of earlier content were found, quickly, on superficial examination to have been plagiarized; (2) large blocks of remaining article content are similarly unsourced or sparsely sourced, and bear the same hallmarks (poor sourcing, and changes of author voice from formal and antiquarian to modern, from high quality to low). How can we leave this one article up? Le Prof. Leprof 7272 (talk) 01:57, 18 October 2015 (UTC)
- I've identified that the content published in the 2012 book has been published on Wikipedia at least since 2011. I haven't yet found the origin date, but I'm working on it. (The tool is still scanning backwards.) But while I realize your concern is with this particular article, I'm afraid that I have a more pressing concern to avoid another WP:CCI if what we have here is a persistent copyright infringer who is not responding to warnings. :/ --Moonriddengirl (talk) 02:19, 18 October 2015 (UTC)
- The 2012 plagiarised source was indeed introduced by the editor which started the article in July 2015. While it would not be impossible, the 2012 book in question is unlikely (in my opinion) to have been plagiarised from WP (hence, curiosity here about your 2011 find, which may be explained by web prepublication of this intro content by these authors before the hardback came into print). Look forward to what you find. Le Prof Leprof 7272 (talk) 02:28, 18 October 2015 (UTC)
- Leprof, in general copyvios are fixed the hard way. The hard way is deletion, plain and simple--either of the entire article, if it started as one, or removal of the offending material and removal of the edits from the history. In this case, let's say it's only plagiarism--we could, in principle, rewrite the article and fix it in the way that you'd ask your student to fix it. But yes, who has time for that? I'm not sure, and maybe Moonriddengirl can comment when she's done with the "larger" job, but deletion is an option in that case as well. We don't like doing that, but neither do we like having plagiarized content. Let's wait and see what MRG discovers. Drmies (talk) 02:39, 18 October 2015 (UTC)
- (edit conflict) Okay, so this particular content has been evolving on Wikipedia for some time. Looking at the passage that was reconfigured into a block quote, [5], I find that the material "the text of the cylinder denounces Nabonidus as impious and portrays the victorious Cyrus as pleasing to Marduk" has been evolving on Wikipedia since at least as far back as 2008, with this edit. This snippet of Lee's and Thomas's work looks pretty familiar. :/ "After taking Babylon, Cyrus had proclaimed himself..." This could be a backwards copying from the article to that scholarly work. It happens. But it takes time to verify this. --Moonriddengirl (talk) 02:41, 18 October 2015 (UTC)
I'm going to stop diving down that particular rabbit hole at the moment. The important takeaways here are that the content was a copy-paste from another Wikipedia article, Cyrus the Great, where it still exists. Was it properly attributed when he copied it? If not, that's another issue under CWW. If the content was copied from those scholars, it wasn't this user who did it, but User:Prioryman in 2008, and I don't believe that. I think this is a case of some scholars not realizing that Wikipedia's content is copyrighted. But I would definitely need time to prove it. :) In any event, the content did not originate with this user, but in another article. Let me check for proper CWW attribution.... --Moonriddengirl (talk) 02:52, 18 October 2015 (UTC)
- All right, so the content is a CWW issue. It was present in the first version of the article, as of 12 July 2015. I haven't checked, but I'd bet there's a ton of content copied from other articles there. This corresponds to the first notice of copyright issues I see, so it's not a sign of ongoing issues in itself. I still need to check recent edits, but I'm kind of past bedtime here. :) I'll do a quick scan, but this takes a lot of time! (User:Leprof 7272, given the CWW issues, this article either must be attributed to all Wikipedia articles from which it is copied or removed. That's not really a question. It can be blanked with {{copyvio}} in the meantime, unless it proves to be speediable. If it's all copied from the one article and the PD sources, it's probably not speediable.) --Moonriddengirl (talk) 02:58, 18 October 2015 (UTC)
- WP:CWW is ongoing, as recently as yesterday. User indeffed pending some indication that he understands policy and will comply. --Moonriddengirl (talk) 03:10, 18 October 2015 (UTC)
- Almost forgot - I came back to do the attribution of at least the article I know was copied at Wars of Cyrus the Great. User:Leprof 7272, I have undone the block quote to the 2012 book, since we know the text predates it. :) Your discovery there proved critical to floating the ongoing issues with this user and is very much appreciated! --Moonriddengirl (talk) 03:28, 18 October 2015 (UTC)
- I want to come back to the original complaint: "large blocks of text taken verbatim from an 1881 online text", presumably meaning an online facsimile or scan of an 1881 printed work. That text is prima facie public domain, and we have a principle going back over a decade that wholesale literal copying is valid, so long as the full attribution is provided, the fact that the text is copied, not merely an authority, is stated, and ideally the boundaries of the copied text are specific. There is no need for quotation marks or other blockquote techniques; a footnote (for part of an article) or endnote (for the entire article) will do. See for example the text of the {{EB1911}} and {{DNB}} templates, and the discussions higher in this page from July and September. Under these principles, the term "misappropriation" is, um, inappropriate. If you call that plagiarism, then thousands, maybe tens of thousands, of Wikipedia articles must be deleted. Rephrasing the original source is, to me, more dishonest because it can be seen as an attempt to disguise the source, although admittedly projects like EB1911 or DNB have not really addressed the "light rewrite" problem (sometimes the now-archaic phrasing does cry out for modernizing, and at what point are these templates literally inaccurate). @PBS: may want to weigh in; we have subtly different approaches. And no disagreement that source written in 2012 is a problem. David Brooks (talk) 13:49, 18 October 2015 (UTC)
- Let me just note that I don't advocate deletion for plagiarism, and I don't think I have. I'm not in favor of "light rewrites" anyway: I'm in favor of heavy rewrites and proper attribution (with quotation marks) for select direct quotations. The problem with the old EB1911 things is that it is never clear where the EB text starts and where it ends, and that there may be hundreds of edits obfuscating that boundary. BTW, I used the term "plagiarism" as I use it in teaching, which I believe to be usual practice outside of Wikipedia anyway. Drmies (talk) 15:16, 18 October 2015 (UTC)
- I agree with you, User:Drmies. As you say, the old EB1911 articles get rewritten to the point where it's almost impossible to know what is actually the PD text and what is something added/changed by an editor. Heavy rewrites with modern sources where possible/appropriate are the way to handle such pages. By the way, History of Persia is now indefinitely blocked and his response to the block is anything but encouraging. Doug Weller (talk) 16:39, 18 October 2015 (UTC)
- "the old EB1911 articles get rewritten to the point where it's almost impossible to know what is actually the PD text and what is something added/changed by an editor" is actually not true in most of the cases I'm encountering as I step through them one by one. The majority remain exact copies, sometimes with an additional random factoid. But some have been incrementally changed, and when I encounter them I change from endnote to footnote. I agree that this is a challenge in the abstract, but the bigger challenge is properly attributing the thousands of imports in the first place. David Brooks (talk) 22:33, 19 October 2015 (UTC)
- ETA: if you are concerned about the vagueness of EB1911 attributions, then the issue is organized and you can help at Wikipedia:WikiProject Missing encyclopedic articles/1911 verification. It's been going for about 9 years, with PBS and me as the main contributors, and we're 11% done. More energy would help. David Brooks (talk) 12:42, 20 October 2015 (UTC)
- Personally, I think it was a very bad mistake that we ever accepted this content. I attribute it to two factors: first, in the first few years we were desperately trying to increase our content on traditional subjects and would accept essentially anything; second, in those days we were, in most traditional areas, naive enough to believe that material that was good enough in 1906 was just as reasonable 100 years later and could be used as a substitute. We made two errors there. First, and most obviously, we neglected the extreme ethnocentrism of the old EB and similar sources, partly because we had not yet acquired enough cultural awareness to see just how ethnocentric it was, and partly because we may still have had a certain amount of similar feeling ourselves, however loath we may have been to admit it. Second, and in the end much more important, many of the editors here at the time had not realized how much our understanding of even the most traditional topics has changed. With respect to history, the 1906 EB (or the 1917 edition, which was based almost entirely on the earlier one) was published before most of the presently known Middle Eastern archaeological monuments had been discovered; before the extent of Minoan/Mycenaean Greek culture had been established; before there was any scientific investigation of the traditional claims of East Asian cultures; before the most important and securely dated Egyptian monuments had been discovered; before the modern analysis of the sources of Biblical literature; before the entire field of dendrochronology and radioactive dating; or, in such fields as literature, before the re-evaluation of early 18th-century literature and the age of Johnson, and before the modern techniques for analyzing early printed publications. (I limit myself here to those areas where I have some knowledge; other people will be able to add many more such examples.)
:On the whole, I think the best course is to remove all material copied from such sources from the text and put it into small print in the footnotes, for those interested in following the development of historical tradition; this material has no place in the main body of a modern encyclopedia. (And I mean no disrespect to the older sources by that: I actually own personal print copies of the 1913 EB, the similar years' editions of the Catholic Encyclopedia, and other 19th-century sources. I read them from time to time to gain historical perspective.) DGG ( talk ) 08:48, 14 June 2016 (UTC)
- I think that you are incorrect. If we take another publication of about the same age as the EB1911, the DNB, we can draw some direct comparisons, because in the 21st century the DNB has been taken and updated as the ODNB. There are about 60,000 articles in the Dictionary and many of them are almost word for word the same. When the articles were updated that way, the original author was credited and the reviewer described as such. By 1900 much work had been done in cataloguing archives of primary sources, so that in the following 100 years, while some biographies have undergone large changes due to the discovery of new archive material, many of them are still based on the same primary sources. This means that the bare bones of a person's life were often known by 1900. Take for example the 11 April 2010 Anthony Hungerford (Roundhead) article. It was an article based on primary sources. The problem with this approach is that one cannot be sure that the primary sources refer to this specific man, because he was one of four contemporary men (Wikipedia dab page). Using the text from the DNB article (version of 12 April 2010) simultaneously improved the style of the text and eliminated the OR of basing an article on primary sources. In point of fact the OR used before was not inaccurate, but it takes a professional historian and not a Wikipedia editor to confirm this. All in all, on the use of these old dictionaries of biography and encyclopaedias, see the conclusion of the ODNB article "Introduction; History of the DNB; Plans for a new DNB":
- Behind his recommendations lay Matthew's judicious combination of scholarship and pragmatism. 'From my point of view as Editor', he wrote, 'it is important both that it be done well and that it be done.' (10) He found reassurance in Stephen's own view that 'great as is the difference between a good and a bad work of the kind, even a very defective performance is immensely superior to none at all'. (11)
Article where plagiarism concerns have not been fully addressed
I came across an article that was originally copied wholesale (with just slight paraphrasing in spots) from a public domain (US Government) text. Originally there was no attribution and no citations. A couple of hours later the original author deleted from the article what appear to be page numbers from the original text, added links, changed the spelling/naming of some items, and added the following:
- This article includes information collected from the Dictionary of American Naval Fighting Ships.
In 2006 a user noticed that the text in the article was identical to an article found elsewhere on the web, which is itself a copy of the original article. The user posted a comment on the talk page. Since then there have been some revisions and additions, but most of the original text is still there. The structure of the article is still based entirely on the original. Even fanciful expressions copied from the article still remain, e.g.:
- "The early news from the Pacific was bleak"
- "disaster struck"
- "five days later when American fighting men in Hawaii were rudely awakened to find their country at war."
At some point the "includes information" attribution has been changed to:
This article incorporates text from the public domain Dictionary of American Naval Fighting Ships. The entry can be found here.
- This article incorporates public domain material from websites or documents of the Naval History and Heritage Command.
The problem is that the article doesn't just "incorporate" "text/material from the public domain," it's a wholesale borrowing of that public domain material.
For further explanation, please see my comment:
I notice that the same user contributed a yeoman's amount of material starting in 2002, including several articles taken from the same source - where a DANFS has also been added.
In fact, here's one where in 2008 someone attributed every paragraph to the source material. The end result is that there are 55 footnotes to the same work.
I'm not sure that either approach is a satisfactory solution.
I can make some changes to the article to, for example, give it a more encyclopedic tone. However, I am not knowledgeable about Navy ships, and whatever I wrote or rewrote would have to be based on the article as it exists (and the original source material, of course). Redoing the entire article is a daunting task, as is the prospect of checking the other articles taken from this same source. Ileanadu (talk) 15:26, 26 September 2016 (UTC)
"Attribution as described in this section is an addition to those requirements."
PBS, regarding this, what editor are you referring to? I ask because I don't see how this edit/edit summary I made makes it so that I am the editor you were referring to. Flyer22 Reborn (talk) 09:12, 3 February 2017 (UTC)
- See the last edit here. -- PBS (talk) 20:55, 3 February 2017 (UTC)
- Ah, okay. Flyer22 Reborn (talk) 21:07, 3 February 2017 (UTC)
This article makes an invalid point
I think here we miss the point that the nature of an encyclopedia is that it DOES plagiarize within the definition provided for in this article. Further, when you give references, the references in and of themselves constitute so-called "attributions", unless one is below average IQ. To the contrary, with certain things, we are even EXPECTED to give only direct quotes of the authoritative source. (such is particularly significant with American Supreme Court Law, because in order to be completely accurate, we need to use the EXACT words of the court.) Food for thought, but none of this makes any real practical sense other than providing certain people something to gripe about. I recommend doing away with this policy entirely to keep the conflicts at a minimum. 108.201.29.108 (talk) 06:17, 10 October 2017 (UTC)
Wording of paraphrase examples
I'd like to propose a minor change in the wording of our last two examples of OK paraphrasing:
- Wikipedia text: Michael E. Brown suggests that political change, such as the move from an authoritarian government to a democratic one, can provoke violence against the state.
- Wikipedia text: Political change increases the likelihood of violence against the state.
I'd like to suggest changing "against" in those two sentences to "within". Brown's original text is from a chapter on "internal conflict". The immediate context is here (Google Books). It discusses ethnic conflict, repression of minorities, elite power struggles, and other kinds of internal strife. But with one exception (a brief mention of attacks by criminals against the government of Colombia) I couldn't find any mention of violence against the state.
Another idea, which I considered suggesting, is "by" the state. In the original text, Brown's sentence is itself a paraphrase, citing an article by Mansfield and Snyder (here JSTOR and here free earlier version). This article argues that "democratizing states – those that have recently undergone regime change in a democratic direction – are much more war-prone than states that have undergone no regime change." It's specifically about war with other states, not internal conflict. As a summary of Mansfield and Snyder, "by" seems better, but I don't think it works as well as a paraphrase of Brown.
In a sense it doesn't matter, since any of these words will serve to make the point about plagiarism. But other things being equal, I think it's better for a paraphrase to be accurate, so I'd like to propose "within". – Margin1522 (talk) 09:11, 12 March 2018 (UTC)
Is there a template?
Is there a template which says "This article section may contain plagiarised text"? All this guide says is to go in and fix it, and I don't have the time. Adpete (talk) 04:01, 4 January 2018 (UTC)
- see WP:CPI.--Moxy (talk) 04:40, 4 January 2018 (UTC)
- That's not really what I'm looking for. In this case (and I've seen it a few times) the text is more or less cut and pasted from the subject's "About Us" page, so the copyright holder wouldn't object (in fact, it may well have been the copyright holder who did the edit). Adpete (talk) 05:00, 4 January 2018 (UTC)
- {{copypaste}}. Nikkimaria (talk) 14:25, 4 January 2018 (UTC)
- Perfect - Thank you! Adpete (talk) 22:00, 4 January 2018 (UTC)
- Adpete, plagiarism (copying without attribution, whether or not the material is copyrighted) is completely separate from copyright violation (unlicensed copying of copyrighted material, with or without attribution). We cannot use copyrighted material based on our interpretation of the copyright owner's intent, ever, for many reasons. First, it's illegal. Second, we assert that all material on Wikipedia is available under CC-BY-SA, which among other things allows other people to modify it and use it for any purpose, so the copyright owner's intent would need to stretch that far, which is too much to infer. -Arch dude (talk) 01:25, 23 June 2018 (UTC)