User:SuperHamster/CiteUnseen

Cite Unseen
Cite Unseen in action on Citizens United v. FEC sources
Description: User script that adds iconic indicators to Wikipedia citations
Changelog: User:SuperHamster/CiteUnseen/Changelog
Author(s): SuperHamster, Sky Harbor
First released: May 17, 2019
Updated: July 31, 2023
Source: User:SuperHamster/CiteUnseen.js
GitHub repository: CiteUnseen

Cite Unseen is a user script that adds categorical icons to Wikipedia citations, giving readers and editors a quick initial evaluation of citations at a glance. The icons indicate the nature and reliability of sources and help identify sources that may be problematic or should be used with caution (the key word is "may"; see the usage guide below).

Cite Unseen's categorization dataset currently holds over 3,400 domains in 20 categories. These categories include:

  • Perennial sources list statuses (generally reliable; marginally reliable; generally unreliable; deprecated; blacklisted)
  • Advocacy groups; books; blogs; user-generated news; editable sites; state media; news; opinion pieces; press releases; satire; social media sites; sponsored articles; tabloids; and TV and radio programs
  • Predatory journals listed on the predatory source list

Cite Unseen was first built at CredCon in November 2018 and was jointly developed by Kevin Payravi (SuperHamster) and Josh Lim (Sky Harbor), with support from the Credibility Coalition and the Knowledge Graph Working Group. The project saw further development at Wikimedia Hackathon 2019.

Installation

The script is located at User:SuperHamster/CiteUnseen.js and can be enabled for logged-in users. Once logged in, you can add the script to your Wikipedia browsing experience by editing your common.js file and adding the following line:

{{subst:iusc|User:SuperHamster/CiteUnseen.js}}
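
For reference, here is a sketch of what that line is expected to become once the page is saved. It assumes the iusc template on English Wikipedia substitutes into a standard importScript call with a backlink comment; the exact output may differ.

```javascript
// Assumed expansion of {{subst:iusc|...}}; importScript is provided by MediaWiki
// on Wikipedia pages and is not available outside that environment.
importScript('User:SuperHamster/CiteUnseen.js'); // Backlink: [[User:SuperHamster/CiteUnseen.js]]
```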

Cite Unseen will automatically run whenever you open a Wikipedia page.

Before using, please read the usage guidelines below. It's particularly important to keep in mind that while Cite Unseen is here to guide you, it does not evaluate context and should not justify editing decisions.

Configuration

You can configure Cite Unseen by copying the following code into your CiteUnseen-Rules.js page and adjusting it as needed.

Configuration code to copy
cite_unseen_categories = {
  "advocacy": true,
  "blogs": true,
  "books": true,
  "community": true,
  "editable": true,
  "government": true,
  "news": true,
  "opinions": true,
  "predatory": true,
  "press": true,
  "rspDeprecated": true,
  "rspBlacklisted": true,
  "rspGenerallyUnreliable": true,
  "rspMarginallyReliable": true,
  "rspGenerallyReliable": false,
  "rspMulti": true,
  "satire": true,
  "social": true,
  "sponsored": true,
  "tabloids": true
}

cite_unseen_domain_ignore = {
  "advocacy": [],
  "blogs": [],
  "books": [],
  "community": [],
  "editable": [],
  "government": [],
  "news": [],
  "opinions": [],
  "predatory": [],
  "press": [],
  "rspDeprecated": [],
  "rspBlacklisted": [],
  "rspGenerallyUnreliable": [],
  "rspMarginallyReliable": [],
  "rspGenerallyReliable": [],
  "rspMulti": [],
  "satire": [],
  "social": [],
  "sponsored": [],
  "tabloids": []
}

cite_unseen_additional_domains = {
  "advocacy": [],
  "blogs": [],
  "books": [],
  "community": [],
  "editable": [],
  "government": [],
  "news": [],
  "opinions": [],
  "predatory": [],
  "press": [],
  "rspDeprecated": [],
  "rspBlacklisted": [],
  "rspGenerallyUnreliable": [],
  "rspMarginallyReliable": [],
  "rspGenerallyReliable": [],
  "rspMulti": [],
  "satire": [],
  "social": [],
  "sponsored": [],
  "tabloids": []
}


cite_unseen_additional_strings = {
  "advocacy": [],
  "blogs": [],
  "books": [],
  "community": [],
  "editable": [],
  "government": [],
  "news": [],
  "opinions": [],
  "predatory": [],
  "press": [],
  "rspDeprecated": [],
  "rspBlacklisted": [],
  "rspGenerallyUnreliable": [],
  "rspMarginallyReliable": [],
  "rspGenerallyReliable": [],
  "rspMulti": [],
  "satire": [],
  "social": [],
  "sponsored": [],
  "tabloids": []
}

The cite_unseen_categories rules specify which icons should be displayed. By default, all icon types are shown except the one for sources considered generally reliable per the perennial sources list, which is hidden to reduce clutter.
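
As an illustration only, the sketch below shows one way a toggle map like this could be interpreted. The function name enabledCategories, the shortened defaults object, and the idea that user overrides are merged over defaults are all assumptions made for the example, not the script's actual code.

```javascript
// Hypothetical, shortened defaults: everything on except generally reliable sources.
const defaultCategories = {
  advocacy: true,
  news: true,
  rspGenerallyReliable: false,
  rspGenerallyUnreliable: true,
};

// Merge user overrides over the defaults and list the categories
// whose icons would be displayed.
function enabledCategories(defaults, overrides = {}) {
  const merged = { ...defaults, ...overrides };
  return Object.keys(merged).filter((name) => merged[name] === true);
}

// Example: show the "generally reliable" icon and hide the "news" icon.
const shown = enabledCategories(defaultCategories, {
  rspGenerallyReliable: true,
  news: false,
});
console.log(shown);
```

Keeping the overrides separate from the defaults means a rules page only has to list the categories it changes.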

The cite_unseen_domain_ignore rules let you remove a domain from a category. Domains should be formatted as "example.com". As an example: while Cite Unseen categorizes ResearchGate links as generally unreliable per WP:RSP, some users may wish to disable this, as ResearchGate links are often used as valid open access links for reliable journals. To do this, you can use:

cite_unseen_domain_ignore = {
  "rspGenerallyUnreliable": ["researchgate.net"]
}
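
To make the effect of an ignore rule concrete, here is a minimal sketch of how such rules could be applied to a category dataset. The builtInDomains object is a tiny stand-in for the real dataset, and removeIgnoredDomains is a hypothetical helper, not a function from the script.

```javascript
// Stand-in slice of the built-in dataset (the real list is far larger).
const builtInDomains = {
  rspGenerallyUnreliable: ["amazon.com", "researchgate.net"],
};

// The user's ignore rules, as in the example above.
const cite_unseen_domain_ignore = {
  rspGenerallyUnreliable: ["researchgate.net"],
};

// Return a copy of the dataset with each category's ignored domains removed.
function removeIgnoredDomains(dataset, ignoreRules) {
  const result = {};
  for (const [category, domains] of Object.entries(dataset)) {
    const ignored = new Set(ignoreRules[category] || []);
    result[category] = domains.filter((domain) => !ignored.has(domain));
  }
  return result;
}

const effectiveDomains = removeIgnoredDomains(builtInDomains, cite_unseen_domain_ignore);
console.log(effectiveDomains.rspGenerallyUnreliable);
```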

The cite_unseen_additional_domains rules let you add domains by category. Domains should be formatted as "example.com". For example, if you wanted to categorize Wikipedia as a social media site for some reason, you could do the following:

cite_unseen_additional_domains = {
  "social": ["wikipedia.org"]
}

The cite_unseen_additional_strings rules let you add plain strings by category. For example, if you wanted to assume that all URLs that contain "/w/" are a wiki, you could do the following:

cite_unseen_additional_strings = {
  "editable": ["/w/"]
}
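
The difference between domain rules and string rules can be sketched as follows. This is a hedged illustration: hostMatches and matchAdditionalRules are hypothetical helpers, and treating subdomains as matches for a domain rule is an assumption about how matching might work, not a description of the script itself.

```javascript
// The two example rule sets from above.
const cite_unseen_additional_domains = { social: ["wikipedia.org"] };
const cite_unseen_additional_strings = { editable: ["/w/"] };

// True when the URL's host is the domain itself or one of its subdomains.
function hostMatches(url, domain) {
  const host = new URL(url).hostname.toLowerCase();
  return host === domain || host.endsWith("." + domain);
}

// Collect every category whose extra domains or plain strings match the URL.
function matchAdditionalRules(url, domainRules, stringRules) {
  const matches = new Set();
  for (const [category, domains] of Object.entries(domainRules)) {
    if (domains.some((domain) => hostMatches(url, domain))) matches.add(category);
  }
  for (const [category, strings] of Object.entries(stringRules)) {
    if (strings.some((text) => url.includes(text))) matches.add(category);
  }
  return [...matches];
}

console.log(
  matchAdditionalRules(
    "https://en.wikipedia.org/w/index.php?title=Example",
    cite_unseen_additional_domains,
    cite_unseen_additional_strings
  )
);
```

A domain rule looks only at the host, while a string rule matches anywhere in the URL, so short strings such as "/w/" can match more pages than intended.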

Usage

Cite Unseen after running on Yemeni Civil War (2014–present):

Once installed, Cite Unseen will automatically analyze and annotate references you come across. When it finds a match in its categorization dataset, it will add a categorical icon (refer to the chart below). You can hover over an icon to get more details about the categorization.

Important points to keep in mind while using Cite Unseen:

  • Context matters. Sources that are considered generally unreliable can still have valid use. For example, while we typically avoid citing social media, social media posts may still be used for uncontroversial self-descriptions. And while we typically try to avoid self-published blogs and other user-generated content, they may still be acceptable when authored by established subject-matter experts (see WP:SPS for more).
  • Evaluate. The point of Cite Unseen is to highlight the nature of sources, and to prompt you to think about potential concerns with a source. Just because a source has a concerning mark does not automatically mean it is being used inappropriately. You should never justify removing or adding a source solely because of information that Cite Unseen provides; you need to do your own homework as well.
  • It does not cover everything. There is an endless trove of resources out there, and we can't categorize all of them. You'll find many citations that Cite Unseen won't mark up; this means only that the source either (a) does not fit in an existing category or, more commonly, (b) simply hasn't been categorized yet.
  • It is not always right. Cite Unseen looks at citation types and does string-matching against URLs. While generally successful, it's possible for Cite Unseen to misidentify a source.
    • Sometimes reliable sources are hosted on an unreliable site. For example, editors citing a book may link to its listing on Amazon.com, which is classified as generally unreliable. This will cause the citation to be marked as generally unreliable even if the book itself is fine. Situations like these are something to keep in mind while investigating the usage of a source.

Classifications

Cite Unseen classifies sources into the following categories.

Each entry gives the category, its description, and the code used for it in the configuration rules.

  • Advocacy: An organization that is engaged in advocacy (anything from political to civil rights to lobbying). Note that an advocacy group can very well be a reliable source; this indicator serves to note when a source's primary purpose is to advocate for certain positions or policies. The websites in this category predominantly come from articles in Category:Advocacy groups. Code: advocacy
  • Books: Books and other similar printed matter. Not an indicator of reliability by itself. Code: books
  • Blog post: Note that a blog post may be considered reliable as a source on the author themselves, or when produced by an established subject-matter expert whose work in the relevant field has previously been published by reliable, independent publications. See WP:ABOUTSELF and WP:SPS for more information. Code: blogs
  • User-generated news: News sites that accept articles from the community, such as Examiner.com or Global Voices. Code: community
  • Editable: Sites that are editable by the public, such as wikis (Wikipedia, Fandom) or some databases (IMDb, Discogs). Code: editable
  • State media and other government sources. This categorization takes into account the direct editorial control the government has over the source. Some public broadcasters and other outlets in which the state does not exercise tight editorial control (such as PBS in the United States) will not have this icon. Code: government
  • News: News published in reputable news sources that are generally considered reliable on Wikipedia. Code: news
  • Opinion piece: Opinion pieces and op-eds. Code: opinions
  • Predatory journals: Predatory journals and publishers; these sites charge publication fees to authors without checking articles for quality and legitimacy. This list is derived from Template:Predatory open access source list. Code: predatory
  • Press releases. Code: press
  • Satire: Sites that publish satirical articles, such as The Onion. Code: satire
  • Social media: Usually a post from a user on a social media platform. Note that a social media post may be considered reliable as a source on the author themselves, or when produced by an established subject-matter expert whose work in the relevant field has previously been published by reliable, independent publications. See WP:ABOUTSELF and WP:SPS for more information. Code: social
  • Sponsored: Articles that have been paid for or otherwise sponsored. As an example, see the sponsored section of the Seattle Times. Depending on the publication, sponsored content may be produced by a third party. Code: sponsored
  • Tabloids: Sites that publish celebrity gossip and tabloid journalism (in the sense of largely sensationalist journalism; publications that publish in tabloid format but are otherwise generally reliable and non-sensationalist are not categorized as tabloids). Code: tabloids
  • TV / radio programs: TV and radio programs, which may or may not qualify as news and/or reliable depending on the individual program. Code: tvPrograms
  • [RSP] Generally reliable in its areas of expertise: Per RSP, editors show consensus that the source is reliable in most cases on subject matters in its areas of expertise. The source has a reputation for fact-checking, accuracy, and error-correction, often in the form of a strong editorial team. Code: rspGenerallyReliable
  • [RSP] Marginally reliable: Per RSP, the source is marginally reliable (i.e. neither generally reliable nor unreliable), and may be usable depending on context. Editors may not have been able to agree on whether the source is appropriate, or may have agreed that it is only reliable in certain circumstances. It may be necessary to evaluate each use of the source on a case-by-case basis while accounting for specific factors unique to the source in question. See Wikipedia's perennial sources list for more details. Code: rspMarginallyReliable
  • [RSP] Generally unreliable: Per RSP, there is community consensus that the source is questionable in most cases. The source may lack an editorial team, have a poor reputation for fact-checking, fail to correct errors, be self-published, or present user-generated content. Outside exceptional circumstances, the source should normally not be used, and it should never be used for information about a living person. Even in cases where the source may be valid, it is usually better to find a more reliable source instead. If no such source exists, that may suggest that the information is inaccurate. The source may still be used for uncontroversial self-descriptions, and self-published or user-generated content authored by established subject-matter experts is also acceptable. Code: rspGenerallyUnreliable
  • [RSP] Deprecated: Per RSP, there is community consensus to deprecate the source. The source is considered generally unreliable, and use of the source is generally prohibited. Despite this, the source may be used for uncontroversial self-descriptions. Code: rspDeprecated
  • [RSP] Blacklisted: Per RSP, due to persistent abuse, usually in the form of external link spamming, the source is on the spam blacklist or the Wikimedia global spam blacklist. Code: rspBlacklisted
  • [RSP] Varied consensus: Per RSP, the community's consensus on the reliability of this site depends on one or more factors (for example, Forbes articles by staff are considered generally reliable, while articles by contributors are considered generally unreliable). See Wikipedia's perennial sources list for more details. Code: rspMulti

Contributing

We're always looking to expand and tune our categorizations. Please place any questions or ideas on the talk page.

If you're interested in touching the code itself:

Next steps

Some of the next big goals for the project:

Technical implementation

Cite Unseen performs string matching on URLs and checks for different types of citation templates in order to identify the kind of work and any potential ideological leanings.

Cite Unseen is implemented in JavaScript. When Cite Unseen is run, it does the following:

  • Iterates through every citation in a given Wikipedia article and pulls URLs.
  • Checks each URL against a pre-defined list of domains and strings that are categorized by nature (biased, press, news, opinion piece, etc.).
  • Injects icons next to citations accordingly.
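
The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the script's actual code: citations are modeled as plain objects rather than page elements, the rules object is a tiny stand-in dataset, and classifyUrl and annotateCitations are hypothetical names. In the browser, the citations would instead be read from the page, and the final step would insert icon elements next to them.

```javascript
// Hypothetical rule set: category -> domains and plain strings to look for.
const rules = {
  satire: { domains: ["theonion.com"], strings: [] },
  editable: { domains: ["fandom.com"], strings: ["/w/"] },
};

// Step 2: return the categories whose domains or strings match a URL.
function classifyUrl(url, ruleSet) {
  let host;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return []; // Not a parseable absolute URL.
  }
  return Object.keys(ruleSet).filter((category) => {
    const { domains, strings } = ruleSet[category];
    return (
      domains.some((d) => host === d || host.endsWith("." + d)) ||
      strings.some((s) => url.includes(s))
    );
  });
}

// Steps 1 and 3: walk the citations, classify their URLs, and record
// which category labels would be attached to each citation.
function annotateCitations(citations, ruleSet) {
  return citations.map((citation) => {
    const categories = new Set();
    for (const url of citation.urls) {
      classifyUrl(url, ruleSet).forEach((c) => categories.add(c));
    }
    return { id: citation.id, categories: [...categories] };
  });
}

const citations = [
  { id: "cite_note-1", urls: ["https://www.theonion.com/some-article"] },
  { id: "cite_note-2", urls: ["https://example.org/report.pdf"] },
];
const annotated = annotateCitations(citations, rules);
console.log(annotated);
```

Because matching is purely textual, a citation inherits the category of whatever site hosts the link, which is why a reliable work linked through an unreliable host can be flagged.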

See also
