Talk:Scunthorpe problem/Archive 1


I've Noticed

This effect takes place a lot in Free Realms. If you type gibberish, a misspelling, or even a word like nuts or mount (in an innocent context), it will censor it. — Preceding unsigned comment added by 69.248.158.97 (talk) 02:46, 23 June 2011 (UTC)

Perhaps link to Medireview, since it is the same idea? - Jax

First paragraph could use some serious work toward conciseness. - IstvanWolf 04:06, 20 May 2006 (UTC)

Bullets 1 and 3 are not Scunthorpe problems, they're simple word filtering. Bullet 2 should make clear that it was the substring that was blocked, not just the "word" Allah. The external link cited says that "Kallahar" and "Callahan" were blocked, so #2, once edited, will be a substring/Scunthorpe problem. The prose section is good but I think it would make sense to mention how rarely the short list of (American) obscenities are substrings of inoffensive (American) words. The problem disproportionately affects proper nouns and compounded or elided words (such as domain names or login IDs). —Preceding unsigned comment added by 63.251.87.214 (talkcontribs) 19:15, August 3, 2006

There are some fair points here. In its strictest sense, the phrase Scunthorpe Problem refers to substrings of letters within a word. However, I added bullets 1 and 3 because they are also examples of computers failing to show the sort of common sense interpretation of language that humans take for granted. While this may not please the purists, it could be argued that these fall within the general definition of the term Scunthorpe Problem since they are all examples of computer obscenity filters doing silly things. I'll also have a look at the first paragraph. --Ianmacm 06:27, 4 August 2006 (UTC)

UK bias?

From reading this article, you'd almost think this sort of thing only happens in the U. K. But I think the real question is, does anyone call it this outside of the U. K.? — The Storm Surfer 08:29, 4 May 2007 (UTC)

This is an interesting point. Although most pieces of computer jargon were coined in the USA, the term Scunthorpe Problem comes from the UK and seems to have stuck. There is a piece about the Scunthorpe Problem on CNET at [1] which mentions the now famous problems that the people of Scunthorpe had in accessing the internet back in 1996. To the best of my knowledge, the term Scunthorpe Problem is still the best-known way of describing overenthusiastic computer obscenity filters.--Ianmacm 20:49, 4 May 2007 (UTC)

Text cleanup

 
Irina Slutskaya can be banned by the Scunthorpe Problem

The article has been given a slight rewrite and an image added showing the comical results when an attempt was made to do a Google search on Irina Slutskaya using a public library computer. The specific claim about the Salon.com message boards and Cialis was removed, because while it was made in good faith and may well be true, it lacks a citation.--Ianmacm 15:14, 15 May 2007 (UTC)

To expand slightly, the program that banned Irina Slutskaya was RM SafetyNet Plus [2], a popular piece of software used to filter internet content on public computers. I first came across bizarre behaviour by this program when it blocked a search on Procter and Gamble. After scratching my head for a while, I realised that it was blocking the word gamble, and it also blocked searches on the word casino. This is word filtering rather than a strict example of the Scunthorpe Problem, but it shows how computers can be tripped up in this area. RM SafetyNet Plus allows searches on the word dick, but blocks searches on the word cock.

The origin of the phrase The Scunthorpe Problem may be traced back to the article Google's chastity belt too tight on CNET news in April 2004 [3]. This contains a paragraph entitled The Scunthorpe problem, and may have helped to popularise the phrase, if not actually inventing it. It is fair to say that typing the phrase Scunthorpe problem into a search engine usually brings up either the Wikipedia or the CNET article, and there is not a great deal else to read.

Nowadays most computers have learned not to block words like Sussex or Scunthorpe, but there is still room for mistaken decisions, as this article shows.--Ianmacm 18:28, 17 May 2007 (UTC)

LiveJournal

I removed this from the article pending further research:

* In 2007, the filtering of the racial slur spic from the LiveJournal search engine prohibited users from searching for spicy food, the Spice Girls, or hospice.

This appears to be based on a blog entry at [4]. Intrigued by this, and looking for more information, I created an account at LiveJournal and did a search on the Spice Girls and some of the other "banned" terms. Although they did not return results, the search engine did not warn that the terms were banned on the grounds of taste and decency. Some clear confirmation is needed here, or there is the possibility of an urban legend slipping into the article. Further comments on this are welcome. --♦IanMacM♦ (talk to me) 11:26, 9 December 2007 (UTC)

We don't have a source - people have noticed that some terms have become unsearchable and have been experimenting with it. There does seem to be a hitlist of terms based on sex, ethnic insults and some Nazi-related stuff. However there are no real sources for this stuff - it's not published outside of LJ and isn't officially admitted to as far as I know. This is too recent to make WP I think, and needs to be picked up by someone else first. Secretlondon (talk) 16:47, 9 December 2007 (UTC)
I don't think it's required that the term be blocked out of reasons of taste or decency - it still counts as an example of filtering based on substring, which ends up catching longer innocent terms. But yes, I agree it's probably best to leave out for now - I think entries should really have some 3rd party source, otherwise it ends up as original research. Mdwh (talk) 17:32, 9 December 2007 (UTC)

More updated "hit list" is at [5] (by same author as original blog entry; slightly cleaned-up language; more accurate list.) Confirmation of code ignoring some interest searches and returning an error message is at [6] LiveJournal staff admits the situation exists, but admission is buried in discussion threads at the official announcement communities. Waiting on more detailed official comment or explanation, which has been promised. Elfwreck (talk) 18:42, 9 December 2007 (UTC)

Thanks for the feedback. This is an interesting situation, and if true it would be well worth mentioning in the article. As the other contributors have pointed out, the situation with the evidence at the moment is anecdotal and could be seen as original research. No mainstream media outlet has picked up on this yet, and the staff at LiveJournal have not formally confirmed that the list of banned terms exists. For these reasons it is right to remain cautious at the moment, although the information could be included at a later date. --♦IanMacM♦ (talk to me) 19:38, 9 December 2007 (UTC)

Datapoints: LiveJournal added the interest-search-blocking code mid 2007: the changes are visible in the <a href="http://community.livejournal.com/changelog/5260013.html">changelog here</a>. I have extensively tested elfwreck's findings (I have no previous connection to elfwreck) and blogged about them <a href="http://viv.id.au/blog/?p=1205">here</a>. —Preceding unsigned comment added by Waawa (talkcontribs) 02:22, 10 December 2007 (UTC)

We still can't use blogs as a source. This has a comment by an LJ engineer. She says it was implemented at the end of October and only changed significantly once. However this is still very recent, not officially admitted to, with no reliable sources. Secretlondon (talk) 05:12, 10 December 2007 (UTC)

Update: it appears that LJ has now said as much as they're going to say at this stage. A Staff member representing LJ has updated here admitting to the search interest blocking, and explicitly refusing to elaborate. Further here, in reference to the false-positive substring blocking issue: "Unfortunately, we can't clarify the problem which caused the blocked search-terms. Over time we hope to eliminate terms and substring matches, but it won't be something we *can* comment on." —Preceding unsigned comment added by Waawa (talkcontribs) 02:27, 20 December 2007 (UTC)

I'm a LiveJournal user who's taken an interest in this, and at the moment I believe that the situation is as follows. Most of the blocked terms are still blocked. Matters are slightly confused by the fact that a different interest-"censoring" problem has come to light (and, I think, been fixed) in the meantime, but the blocks we're discussing here are still in place. As far as I know, there has been no further staff comment at all, even on the lines of "we can't say anything" or "we're under legal constraints" or whatever. Most pertinently to us here on WP, also as far as I know there has been no coverage at all of the issue in anything we could use as a reliable source. I did wonder whether The Register might pick up on it, even if only to snigger, but no, not a peep. (And yes, "snigger" is blocked as well.) Loganberry (Talk) 16:00, 28 March 2008 (UTC)

Cumming

I removed this from the article:

  • Genealogists researching the surname Cumming have found that their correspondence has been blocked.

This was done partly because it lacked a citation, and partly because it repeated the point about magna cum laude being blocked. --♦IanMacM♦ (talk to me) 10:19, 21 January 2008 (UTC)

A quick update: RM SafetyNet Plus [7] managed to block both magna cum laude and Cumming, in addition to Irina Slutskaya as mentioned in the article. This program is an excellent example of the problems caused by enforcing rigid rules about perceived "obscene" strings of letters. --♦IanMacM♦ (talk to me) 09:02, 23 January 2008 (UTC)

Original research problems

I'm removing speculation and original research from this article. If there is a reliable secondary source for the Slutskaya example, please put it back. --Jenny 14:16, 13 July 2008 (UTC)


Since there's some opposition to the removal of original research from this article, I'm calling a topic RFC. --Jenny 14:33, 13 July 2008 (UTC)

Let's have a bit of common sense here. Both of these examples were added by other users a while back, and were verified by testing against the program RMSafetyNetPlus [8] (see the screenshots in the article). --♦IanMacM♦ (talk to me) 14:35, 13 July 2008 (UTC)

The problem as I see it is that some Wikipedians have apparently typed names into google in public libraries, obtained a filter message, and added these to this article as examples of the phenomenon. This isn't verifiable for a start, because we only have these fellows' word for it that they did this, but the conclusion that the filter message is due to one reason and not another is also original research. It may be true, but our standard is verifiability, not truth. We have elsewhere in the article ample verifiable examples of the phenomenon, so these items of original research are unnecessary. --Jenny 14:37, 13 July 2008 (UTC)

One of the key principles of Wikipedia is to assume good faith. Assumptions of bad faith can be made only on the basis of good evidence. The reverts suggested are based on an assumption of bad faith which is not supported by the evidence. --♦IanMacM♦ (talk to me) 14:44, 13 July 2008 (UTC)
It's not about bad faith, it's about verifiability. We don't include material just because we think it's true. In any case I'd like to see what others have to say on the matter. --Jenny 14:49, 13 July 2008 (UTC)
I agree with Jenny. I don't doubt the truthfulness of the text, but I don't think it should be included unless reported by a reliable secondary source. Not every word or name that is blocked by some filter somewhere is notable. If we attempted to include every one, this article would quickly devolve into a sprawling list. Neitherday (talk) 04:11, 14 July 2008 (UTC)
I am not quite sure that a sledge hammer like WP:OR is needed when a smaller tool would do the job. It seems to me that editors have the right to pare down the list. On the other hand confirmability is a good criterion. But even then you might have too many examples so you will still have to pare down the list.
Perhaps we can avoid this issue. What do you think of creating a separate list article having a confirmed and an unconfirmed section? I think the list will be notable enough, and it might solve your problem while providing some amusement to those who stumble upon it. Then you can set a limit to the number of confirmed cases in the main article (say 6). If someone decides to add a new one to the main article you can move it to the list. If someone copies an example from the list to the main article you or they can delete another example so that the list on the main page doesn't get stale. In some sense this is a little like cleaning up the room by throwing everything into the closet. One still has to organize the closet (and throw stuff out) but the amount of organization you need there is less and the room is cleaner.
As a side note I personally liked the idea of having the picture of the Russian ice skater that you deleted. It is good to have a picture to break up the monotony. It doesn't have to be that picture. (If someone adds another equally interesting picture you can always delete the first, in my opinion.) It could be any one picture. In my opinion, it is under the editor's discretion to limit the number of examples and pictures as long as they are fair about it. TStein (talk) 02:46, 16 July 2008 (UTC)
What do you think of creating a separate list article having a confirmed and an unconfirmed section Not a great deal. Unconfirmed stuff should be removed, per our policies and guidelines. 86.44.20.40 (talk) 03:10, 17 July 2008 (UTC)
Oops. You are right of course. I wasn't thinking properly when I wrote the above paragraphs. TStein (talk) 05:23, 17 July 2008 (UTC)

It clearly is OR. But then nearly all of our user-taken images are. I wouldn't like this sort of usage enshrined in policy but it's clearly an excellent illustration for this article. I suggest removing the example from the article text proper and amending the caption text to a simple description of what it is: a search for the ice skater blocked in a pornography filter. There can be no objection to that, I feel. But maybe two images of this sort is overkill. 86.44.20.40 (talk) 03:10, 17 July 2008 (UTC)

This seems a reasonable compromise. Neitherday (talk) 04:16, 17 July 2008 (UTC)

I have a question about the list section of the article that may pertain. In the example section

Sorts of original research

  • I suspect that the WP:OR rule was invented to stop unproven theorizings by Wikipedia editors, particularly by those without adequate scientific or technical knowledge and skill in the field of what is under query. Direct un-theorized observation is another matter :: for example, a Wikipedia editor lives in a particular village, or often drives through that village, and thus well knows that there is a church there, and says so in Wikipedia. Anthony Appleyard (talk) 10:54, 2 November 2009 (UTC)
Yes - the result would be absurd if every single fact in an article required a secondary source.Terrymr (talk) 04:33, 8 January 2010 (UTC)
It seems we have another instance of this absurdity. I recently added a directly observed fact regarding the Orange ISP censoring Scunthorpe out of Email subjects matter. The user Ianmacm seems to be getting into an edit war in order to preserve this absurdity. Vexorg (talk) 16:53, 11 January 2011 (UTC)
Absurdity? This was added three times without a shred of evidence to support it. Furthermore, the wording that was added does not even specify the country where this is supposed to have occurred. I have added cn, no citation and it is off in seven days.--♦IanMacM♦ (talk to me) 16:59, 11 January 2011 (UTC)
Yes it is an absurdity. What evidence would you require for an article on the Sun? An article in The Times? With respect, your approach to Wikipedia is not doing the encyclopaedia any favours. And please don't make threats. The info is a directly observable fact. Vexorg (talk) 17:08, 11 January 2011 (UTC)
No threats here, just plain WP:V. In any case, there are already enough examples in the article, which tends to read like a bulleted list in places. I can't see why this has led to such an issue.--♦IanMacM♦ (talk to me) 17:11, 11 January 2011 (UTC)
As a directly observable fact it's easily verifiable. If you are really that bothered you can easily disprove my edit by emailing someone with a UK Orange ISP account with Scunthorpe in the subject. You are the one who made it an issue by creating an edit war on the basis of some ill applied WP:V and WP:OR. Adding 'citation needed' in this instance is irrelevant. The edit is verifiable! Vexorg (talk) 03:45, 13 January 2011 (UTC)
As above, WP:V is important. Examples have been removed before because they were uncited. You are insisting on having this without a cite, making this the only uncited example. No UK media source has picked up on this, which is a surprise and a worry, as people in Scunthorpe would surely have spotted this quickly. It also looks like WP:GAME to remove the cn tagging on one of your own edits.--♦IanMacM♦ (talk to me) 08:24, 13 January 2011 (UTC)
"No UK media source has picked up on this" - and with that comment you're behaving in a manner which severely devalues Wikipedia. i.e making it a mirror of corporate mainstream media. You should seriously consider the ethos of Wikipedia. Do you seriously put the agenda of mainstream media above directly observable facts? Vexorg (talk) 05:59, 21 January 2011 (UTC)
This is a weird comment, as Wikipedia policy defines notability in terms of coverage from reliable secondary sources. Anyway, the article should not be a bulleted list per WP:TRIVIA. There are more than enough examples already.--♦IanMacM♦ (talk to me) 07:50, 21 January 2011 (UTC)

Orange example

No GA or FA would allow this without a cite. Even if true, the article should avoid bulleted lists. There are probably plenty of examples out there that we have missed, but we do not need to have all of them per WP:LISTCRUFT. The Orange example lacks notability with no sourcing.--♦IanMacM♦ (talk to me) 10:03, 13 January 2011 (UTC)
If there are other examples then let's add them. Wikipedia should not be a slave to corporate media that has the power to make something notable simply because it has a bigger bank balance. Directly observable facts are as such. It doesn't need an article in The Times to make it notable. If you want a publication that is based on opinion whose value is based upon a bank balance then Wikipedia is not for you. Vexorg (talk) 06:05, 21 January 2011 (UTC)

Regular expressions? Really?

The Scunthorpe problem derives from straight-up substring replacement, not shoddy regular expressions. Not only do you not need regexes to do that kind of work, I doubt that the sort of people who create these filters even know what regexes are. Someone with even the slightest experience with regexes would at least use /\bshit\b/i (or similar) instead of /shit/i. I mean, this is the *opposite* of pattern-matching. Perhaps the article should reflect that. -- Phyzome is Tim McCormack 22:57, 11 August 2008 (UTC)

Regex rules are sometimes used for this type of automated content removal, but I cannot comment on how individual programs work. If anyone else thinks that this is inaccurate, it could be changed. --♦IanMacM♦ (talk to me) 06:57, 12 August 2008 (UTC)

Yes really. I made this mistake myself 12 years ago when writing a profanity filter, in Perl, for a well-known broadcasting corporation based in Britain. Using regexps. Badly. WayneGMyers (talk) 01:27, 29 May 2012 (UTC)
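
To make the difference discussed in this thread concrete, here is a minimal Python sketch comparing the two patterns quoted above, /shit/i and /\bshit\b/i. The helper name `is_blocked` and the sample strings are assumptions chosen for illustration; this is not the code of any filter mentioned on this page.

```python
import re

# The two patterns quoted in the thread, written as Python regexes.
NAIVE = re.compile(r"shit", re.IGNORECASE)        # matches anywhere, even inside a longer word
BOUNDED = re.compile(r"\bshit\b", re.IGNORECASE)  # matches only when the word stands alone


def is_blocked(pattern, text):
    """Hypothetical helper: True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


for sample in ["shitake mushrooms", "Well, shit."]:
    print(sample, "| naive:", is_blocked(NAIVE, sample), "| bounded:", is_blocked(BOUNDED, sample))
```

The word-boundary version only removes the substring case. A whole-word match on an innocent name or phrase would still be caught, so it is at best a partial fix.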

Tyson Gay ??

Tyson Gay fell victim to a similar problem when a conservative US website's content filter removed 'gay' from an article about the sprinter and changed it to 'homosexual'. This isn't exactly pertinent to this article but if there isn't a more fitting subject I think it warrants attention. It certainly received plenty of press coverage if only for the comic value.

Thanks for pointing this out. Tyson Gay (for those who do not know) is an American sprint star. There is some coverage of this incident from June 2008, for example at here and here. As for article relevance, it is correct to say that this is not strictly an example of the Scunthorpe Problem, since it is an auto-replace of a "controversial" word. It is, however, another example of a nonsense decision by a computer. --♦IanMacM♦ (talk to me) 15:13, 8 December 2008 (UTC)

Conficker

Why is the name of the computer worm Conficker sometimes censored? --88.78.3.190 (talk) 20:07, 19 February 2009 (UTC)

Don't know, but at a guess it could be to prevent people from clicking on phoney cures for the computer worm. Could you provide an example of where this happened?--♦IanMacM♦ (talk to me) 08:25, 20 February 2009 (UTC)
It contains a German swearword (intentionally so). AnonMoos (talk) 11:23, 20 February 2009 (UTC)

Image

 
Example of Scunthorpe problem occurring on Wikipedia

This was removed due to possible original research issues. Is the claim that the username was blocked because of the substring cunt?--♦IanMacM♦ (talk to me) 22:52, 22 March 2009 (UTC)

Possibly, although that was the reason for it being blocked:[9][10]. —Snigbrook 23:15, 22 March 2009 (UTC)
Thanks, the lists mentioned above use regex rules, one of which is .*[ck\(]unt.* This would indeed produce the classic Scunthorpe problem for anyone trying to sign up with Scunthorpe in the username. It is harder to say whether this should be in the article, which has tried to keep to examples that received coverage in the media, in order to comply with Wikipedia guidelines.--♦IanMacM♦ (talk to me) 08:43, 23 March 2009 (UTC)
After this was added as an infobox image, here is the result of attempting to sign up today as Scunthorpe123. I can't help feeling that the filters should have learned about this word by now.--♦IanMacM♦ (talk to me) 15:55, 30 November 2011 (UTC)
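
As a rough illustration of why the rule quoted above catches Scunthorpe, the following Python sketch applies the pattern .*[ck\(]unt.* to a few usernames. How the real blacklist applies its rules (full match or search, case sensitivity) is not stated here, so the full-match behaviour and the function name `username_rejected` are assumptions.

```python
import re

# The rule quoted in the thread. The leading and trailing ".*" mean the
# forbidden letters may sit anywhere inside the username.
RULE = re.compile(r".*[ck\(]unt.*")


def username_rejected(name):
    """Hypothetical check: True if the whole username matches the rule."""
    return RULE.fullmatch(name) is not None


for name in ["Scunthorpe123", "Hunter2", "Bluntly"]:
    print(name, username_rejected(name))
```

Under these assumptions Scunthorpe123 is rejected, while Hunter2 and Bluntly pass because the letter before "unt" is not one of c, k or an opening bracket.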

Inclusion of some examples

  • One of the policies of the article is not to include examples likely to produce a [citation needed] template. At the moment there are several examples that could do this, eg Cummings and Wessex. There are enough examples in the article to give a general flavour of the problem, so some could be removed without a great loss.--♦IanMacM♦ (talk to me) 11:48, 2 November 2009 (UTC)
  • Some of the entries (eg Cummings and Wessex) were removed because they lacked reliable sourcing and added little to the examples already given. It is also important for the article to make a distinction between the classic Scunthorpe Problem (blocking of a substring within a word) and whole word blocking, which although similar is not quite the same thing. The article still has a tendency towards WP:LISTCRUFT because the examples section outweighs the rest of the article by a wide margin, but all of the current examples are sourced to show that they actually happened.--♦IanMacM♦ (talk to me) 11:27, 4 November 2009 (UTC)
  • Some of the items added eg, the "Within Wikipedia, a block against the word "admin" in usernames of users who are not administrators, has been known to query usernames containing the placename and game-name Badminton." are bordering on WP:NAVEL and WP:TRIVIA. Please ensure that examples given are notable and reliably sourced. There are already enough examples to give a flavour of the problem.--♦IanMacM♦ (talk to me) 09:07, 5 November 2009 (UTC)
  • Re spelling "shitake" / "shiitake", and "Niigata" etc, Japanese "ii" is a conventional transcription of long "i", not two short vowels. Anthony Appleyard (talk) 06:41, 8 November 2009 (UTC)

3 different problems

The examples given in this article seem to come from three different problems, only one of which is mentioned in the lede text.

  1. Problem one is that which is described in the lede: blocking (usually of emails or search results) because the text contains a string of letters shared by an obscene word. I think this is usually a straightforward coding failure. Examples: Scunthorpe; Lightwater.
  2. Problem two is the problem of homographs. It is a failure to (properly) judge context. Examples: Dick Whittington; magna cum laude.
  3. Problem three is the replacement of a string of letters on the incorrect assumption that it is an obscene word. Examples: Tyson Gay -> Tyson Homosexual; classic -> clbuttic (FYI, clbuttic redirects here, and I believe I have seen this type of replacement called "the clbuttic mistake"); medieval -> medireview.

I suggest that either the lede should be changed to incorporate the other problem types, or the examples of different problems should be removed, or moved to other articles. If the latter, I'd suggest putting problem 2 at homograph and problem 3 at clbuttic (merge medireview into it). Thanks, cmadler (talk) 17:08, 4 March 2010 (UTC)

The purists have pointed out before that the classic Scunthorpe Problem involves a substring within a word, as Scunthorpe does. The phrase has come to be used to describe any silly decision by a computer obscenity filter, and maybe the article needs some tweaking to clarify this.--♦IanMacM♦ (talk to me) 18:40, 4 March 2010 (UTC)
I guess my point is that the article currently says, "The Scunthorpe problem occurs when a spam filter or search engine blocks e-mails or search results because their text contains a string of letters that are shared by an obscene word." If this article is going to go with the broader use of "Scunthorpe problem", perhaps the explanation provided in the lede should be updated (with citations, of course). cmadler (talk) 19:11, 4 March 2010 (UTC)
The phrase Scunthorpe Problem was either invented or popularized by the CNET article Google's chastity belt too tight in April 2004. Since then, it has been used to describe any dubious decision about words by censorship software, eg this article from The Economist refers to the beaver story as an example of the problem. Looking at the WP:LEAD, it is not unduly misleading about the cause of the problem. Whether a string of letters is an entire word or within a word, the computer is not making the kind of judgments that a human would make, leading to foolish and often comical results.--♦IanMacM♦ (talk to me) 21:12, 5 March 2010 (UTC)
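
The three problem types listed at the top of this section can be told apart with a short Python sketch. The function names and the word lists are hypothetical, and the replacement rules ("ass" to "butt", "eval" to "review") are assumed from the clbuttic and medireview examples rather than taken from any documented filter.

```python
import re


def blocks_substring(text, banned):
    """Problem 1: block if a banned string appears anywhere, even inside a word."""
    lowered = text.lower()
    return any(word in lowered for word in banned)


def blocks_whole_word(text, banned):
    """Problem 2: block on a whole-word match, with no regard to context."""
    return any(
        re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
        for word in banned
    )


def replaces_substring(text, replacements):
    """Problem 3: rewrite banned strings wherever they occur, even inside a word."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


print(blocks_substring("Scunthorpe", ["cunt"]))            # substring inside a place name
print(blocks_whole_word("Scunthorpe", ["cunt"]))           # a whole-word check lets it through
print(blocks_whole_word("Dick Whittington", ["dick"]))     # homograph still caught
print(replaces_substring("classic", {"ass": "butt"}))      # clbuttic
print(replaces_substring("medieval", {"eval": "review"}))  # medireview
```

In this sketch the whole-word check avoids problem 1 but not problem 2, which matches the point made in the thread that the computer is not judging context the way a human would.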

Update to articles which refer to me.

As I'm updating the article with factual references that refer to me, can my edits be left to stand please? Mr Cockburn (C0ckburn). thanks —Preceding unsigned comment added by Siliconglen (talkcontribs) 20:43, 5 May 2010 (UTC)

Clbuttic mistake

Strictly speaking, this is not the "clbuttic" Scunthorpe problem, because it relates to word replacement rather than blocking. However, it is an example of the silly things that obscenity filters can do.--♦IanMacM♦ (talk to me) 07:27, 26 August 2010 (UTC)
The new addition is a mishmash of WP:TRIVIA. The article has quite enough examples already. It is worth citing the Daily Telegraph article, but there is no need to go down the trainspotting route of citing every example from a Google search. Some of the sources given are not reliable and do not establish notability. Also, as mentioned above, this is not really a classic example of the problem anyway.--♦IanMacM♦ (talk to me) 08:03, 26 August 2010 (UTC)

Gay (again)

There are enough examples in the article already, but this has been in the news: [11] involving Fort Gay, West Virginia. This seems to be the result of human error, although a computer may have flagged the word gay as "inappropriate".--♦IanMacM♦ (talk to me) 16:25, 8 September 2010 (UTC)

whakatane

That's how it DOES sound, not just the way the analyser deemed it. LowKey (talk) 07:01, 27 July 2011 (UTC)

The citation in the article is a dead link, but this news story gives a mirror version. Whakatane is apparently pronounced with an F despite the spelling.--♦IanMacM♦ (talk to me) 08:28, 27 July 2011 (UTC)

More examples

This story about Virgin Media is in several of the newspapers today. Some people have suggested that it may be a prank rather than actual examples of the problem. Anyway, there are enough examples in the article without adding these.--♦IanMacM♦ (talk to me) 07:57, 20 December 2011 (UTC)

Instagram

The instagram #scunthorpe tag contains no images while browsing the #scunny tag reveals plenty of images using it. — Preceding unsigned comment added by 2.120.132.176 (talk) 22:31, 6 June 2013 (UTC)

Pokemon GTS

On the Global Trade Station for Pokémon, it will not allow you to put up for trade: Nosepass, Probopass, and Froslass. Until earlier this year, the list also included Cofagrigus. I have sources and everything, but wasn't entirely sure if this was the best place for it. I'll add them unless anyone contests it. 74.128.43.180 (talk) 00:12, 30 July 2013 (UTC)

The article is somewhat overloaded with examples. Unless this is sourced to a newspaper or other mainstream source, it may not need to be added. Lists on Wikipedia do not need to be exhaustive, and the article has issues with WP:PROSE.--♦IanMacM♦ (talk to me) 05:57, 30 July 2013 (UTC)

Facebook bans users over the word "faggot"

 
LOL ur so gay (according to Facebook)

This story has picked up a good deal of coverage. The problem is that the sourcing does not make clear whether an automated process led to the block, or human error. As a result, it is not ideally suitable to be added to the article as an example of the Scunthorpe problem.--♦IanMacM♦ (talk to me) 08:07, 2 November 2013 (UTC)

Beaver?

The article doesn't explain the meaning of the word 'beaver' that led to it being blocked from censored web searches. Can someone give this info in the article? Cogiati (talk) 14:41, 27 December 2013 (UTC)

This source explains that beaver is "slang for women's genitals".--♦IanMacM♦ (talk to me) 15:18, 27 December 2013 (UTC)

Essex

As I recall, the first time I heard about this kind of problem was a case where a school in Es*** unintentionally blocked its students from accessing the school's own website. — Preceding unsigned comment added by 70.17.200.178 (talk) 14:06, 9 January 2014 (UTC)

This page causes the problem itself...

I just tried to post a link to this article on another website (IMDB), and the autofilter there blocked it. Incidentally, it also blocked a poster trying to type "Montenegro", which was why I wanted to link here. Seems like that's another example that should be added to the page. See here for reference. Lurlock (talk) 04:15, 5 February 2014 (UTC)

where to put them?

The items about Penistone, Lightwater, and Clitheroe, and about email to MPs about the Sexual Offences Bill, were incorrectly included in § Blocked for word with two meanings, along with such actual polysemous examples as Dick Whittington and magna cum laude.

I looked at the reference listed for Penistone ([12]), but there's nothing there about an actual blockage, it simply mentions it. Apparently this is a popular example which has become an urban legend, not in being false (it probably is true) but in being widely cited with no supporting evidence. And without case information, there's no place in the article to put it. Look at that part of the TOC:

2.1 Refused web domain names and email addresses
2.2 Blocked web searches
2.3 Blocked emails
	2.3.1 Blocked for word with two meanings
2.4 News articles damaged
2.5 Blocked pages

The article is essentially organized around the effect in different cases, with just this one level-3 subsection about a particular cause.

Anyway, I've flagged Penistone and Lightwater for "citation needed", and moved them and the other two items up into "blocked emails", out of the "two meanings" subsection.

I've also put single blank lines between all the bullet items in the wikicode. That doesn't affect the display, but it makes the wikicode easier to edit, which might help prevent further misplacements. --Thnidu (talk) 07:37, 31 March 2014 (UTC)

Just to let you know, I've removed the blank lines between the bullet items, for accessibility reasons. See WP:LISTGAP for details. —Granger (talk · contribs) 02:47, 12 October 2014 (UTC)
@Mr. Granger: Thanks. I certainly don't want to mess up accessibility, and as you rightly guessed, I was unaware of that effect. (Grumble. It still makes it harder to edit. Oh well.) Thnidu (talk) 01:32, 29 October 2014 (UTC)

Hello fellow Wikipedians,

I have just modified 6 external links on Scunthorpe problem. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information.

Cheers.—InternetArchiveBot (Report bug) 22:43, 17 September 2017 (UTC)

Blocking the "butterfly effect"

I do not have any mainstream article to source it, but in my own experience multiple universities in France are blocking access to websites with 'butterfly' in the URL. The cause is a filter on English sexual words (in this case 'butt'), and IT departments don't seem to care about it and advise using "French" websites instead. Coeur (talk) 14:17, 29 August 2018 (UTC)

It wouldn't be surprising, although it is unsourced. Irina Slutskaya can do the same thing, due to the inclusion of slut.[13] The piece of software involved, RM SafetyNet, has been programmed to say Think of the children when this Russian ice skater is mentioned.--♦IanMacM♦ (talk to me) 18:26, 29 August 2018 (UTC)

Image issue

Images are worth a thousand words, but... the fact that you removed lots of images from the article (e.g. you removed the image of a cup of caffè mocha on top of a hot cocoa drink from the article) is horrendous. Put them back; also, what's wrong with an image of Tyson Gay and the flag of Wales? --203.81.71.30 (talk) 06:45, 2 September 2018 (UTC)

Images should be used sparingly, per MOS:PERTINENCE, which says "Images must be significant and relevant in the topic's context, not primarily decorative." For example, Tyson Gay has his own article where a photo of him is needed, but it isn't needed here as it doesn't add anything significant to an understanding of the Scunthorpe problem to know what he looks like.--♦IanMacM♦ (talk to me) 06:51, 2 September 2018 (UTC)

Scunthorpe Telegraph 2019-03-07 edition

This may be worth adding to the page at some point, although the only currently available references I can find for the filtering (or not, as it turned out) of offensive text are The Sun and Daily Mail, both of which are considered "Potentially unreliable sources".

https://www.holdthefrontpage.co.uk/2019/news/regional-publisher-launches-probe-after-word-c-appears-in-death-notice/
https://www.dailymail.co.uk/news/article-6781935/Local-paper-blunders-printing-two-rude-words-obituaries-page.html
https://www.thesun.co.uk/news/8585580/scunthorpe-telegraph-swear-words-obituaries/
ErsatzCulture (talk) 17:24, 8 March 2019 (UTC)

Looked at this, but if true, it isn't really an example of the Scunthorpe problem. It is either a printing mistake or a prank by a bored copywriter, not a flaw in censorship software. The Sun source says: "A spokesperson said: 'We were testing the system to see if it would allow those words in and unfortunately it has not been removed.'" --♦IanMacM♦ (talk to me) 17:52, 8 March 2019 (UTC)

Solution section hunting

I'm currently looking for any source to back up the simple solution of requiring spaces on either side of the offending word before replacing it, to distinguish expletives from parts of non-expletive words (e.g. distinguishing the word itself from every other instance of the letters a-s-s appearing in that order). That way, we can put in a bit about how to solve it. However, it's going a bit slowly. Please help. Uaiazr Jxhiosh (talk) 01:24, 2 June 2020 (UTC)
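
For anyone following along, here is a minimal sketch of the whole-word idea described above. It is not taken from any source or from any real filter: the blocklist, the replacement string and the function name `censor_whole_words` are all hypothetical. It uses the regular-expression word boundary `\b`, which is slightly broader than "a space on either side" because it also matches next to punctuation and at the start or end of the text.

```python
import re

# Hypothetical blocklist and replacement, chosen only for illustration.
BLOCKED = ["ass"]
REPLACEMENT = "***"


def censor_whole_words(text, blocked=BLOCKED, replacement=REPLACEMENT):
    """Replace blocked terms only when they appear as whole words."""
    # \b matches at a word boundary, so a blocked term that is embedded
    # in a longer word (e.g. "classic") is left alone.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(word) for word in blocked) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(replacement, text)


print(censor_whole_words("a classic bass assessment"))  # text is unchanged
print(censor_whole_words("what an ass"))                # last word is replaced
```

This only addresses the substring cases. It would not help with words that have two meanings, where the blocked term really is the whole word.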

Meatballs.

I notice that the ‘Words with Two Meanings’ section mentions faggots.

I ALSO notice that Facebook has recently made the same mistake.

Is it worth including that?

Cuddy2977 (talk) 10:45, 26 February 2021 (UTC)

See also Faggot_(food)#Double_meaning, where exactly the same thing happened in November 2013.[14] It looks as though a filter has banned all uses of the word. It's a bit like the Plymouth Hoe story in January 2021.[15] The controversy over the double meaning of Faggot is mentioned at Faggot_(food), and it may be more on topic there.--♦IanMacM♦ (talk to me) 17:22, 26 February 2021 (UTC)

The Problem is also in Steam

Steam censors potentially offensive usernames, even when they aren't actually offensive, and does the same to links. At least it did the last time I saw this happen, which was last year. 月夜丸ゼロ (talk) 01:46, 11 May 2021 (UTC)

Schmuck

Does Schmuck actually stand for penis?

Merriam-Webster says yes.

So I guess it does. But the Wikipedia page for Schmuck (pejorative) says it mostly means one who is stupid or foolish, or an obnoxious, contemptible or detestable person. The word came into the English language from Yiddish (Yiddish: שמאָק‎, shmok), where it has similar pejorative meanings, but where its literal meaning is a vulgar term for a penis.

It also says later: "The Yiddish word shmok derives from Old Polish smok "grass snake, dragon".[1][2][3]

In the German language, the word Schmuck means "jewelry, adornment".[4] It is a nominalization of the German verb schmücken "to decorate" and is unrelated to the word discussed in this article.[1]"

So what's the consensus?--LostCitrationHunter (talk) 15:12, 5 November 2021 (UTC)

@LostCitrationHunter: Has anybody claimed that the word schmuck "actually stands for penis" in English? Neither Merriam-Webster nor etymonline makes that claim. Both sources state that it means "penis" in (East) Yiddish; etymonline gives some further information about the history of the Yiddish word, but that's not particularly relevant in an article about the Scunthorpe problem. The article says that a man named Schmuck was unable to register on a website because of the meaning of the word in Yiddish. I'm not sure what the problem is here, to be honest. --bonadea contributions talk 17:32, 6 November 2021 (UTC)


--not sure how to edit this, but Clitheroe suffers this problem too — Preceding unsigned comment added by 85.211.180.141 (talk) 10:58, 26 January 2022 (UTC)

References

  1. ^ a b "Schmuck". Online Etymology Dictionary. Retrieved 17 Jan 2011.
  2. ^ "Schmuck". American Heritage Dictionary. Retrieved 5 Dec 2018.
  3. ^ Gold, David L. (1982). "More on Yiddish shmok". Comments on Etymology. 11 (15): 33–37.
  4. ^ "Schmuck". Leo – Online German-English Dictionary. pp. 360–362. Retrieved 13 Mar 2010.

This problem happened multiple times while reading this.

Because it said "Scunthorpe", and my bad word blocker changed it to "Sexpletivehorpe", because "Cunt" changes to "Expletive". Orrinpants (talk) 20:38, 19 April 2022 (UTC) (The time is way off because I forgot to sign it initially.)
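
The substitution described above can be reproduced with a naive substring replacement. This is only a sketch of how such a blocker might behave, not the code of any actual product; the function name `naive_censor` and its defaults are assumptions made for illustration.

```python
import re


def naive_censor(text, blocked="cunt", replacement="expletive"):
    """Replace every occurrence of `blocked`, even inside a longer word."""
    # No word-boundary check is made, which is what causes the problem.
    return re.sub(re.escape(blocked), replacement, text, flags=re.IGNORECASE)


print(naive_censor("Scunthorpe"))  # prints "Sexpletivehorpe"
```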

Inconsistency of categorization

Some of the categories used for the examples on this page are not consistent, mainly "words with multiple meanings," which is the only category based on the type of profanity conflation, and thus has very heavy overlap with other categories. I noticed that a couple of the entries in that section didn't belong there at all, the ones on Penistone and Clitheroe, but I wasn't sure if they should be moved to the "blocked emails" section, especially since the Clitheroe one doesn't mention how the residents were inconvenienced. I guess this probably occurred due to examples and new categories being added piecemeal by various contributors, but I feel like if I try to fix it then I'll just make it worse, so I would appreciate it if a more experienced article editor could do something. 2600:6C64:7F7F:FF78:FC88:2C95:A467:E858 (talk) 01:21, 30 July 2022 (UTC)

Bird charity locked out of Twitter after woodcock tweets

Is here. It looks as though "wood" and "cock" have been misinterpreted by a filter, but the news story stops short of saying this. ♦IanMacM♦ (talk to me) 17:14, 31 January 2023 (UTC)

Little snitched threatened with paypal ban after invoice contains ALEP

As reported on [16], not sure if noteworthy enough for the article but I'll leave it here. Nkuttler (talk) 08:36, 28 March 2023 (UTC)

Minecraft 2022 Word Filtering

This problem is currently ongoing in Minecraft due to a recent word filtering update that has banned the use of "night" and other words beginning with "nig" in certain contexts. This is the most trustworthy source currently disseminating this information, and I expect that a reliable article that can be used as a reference will eventually cover the topic. youtube.com/watch?v=p56oN3aAg3I 2600:1012:B1BC:967A:9002:23EB:3B6:AFE5 (talk) 01:58, 30 June 2022 (UTC)

As a general rule, YouTube videos aren't suitable sources because they are self-published. The video does show that typing in words like night and nigh in chats triggers a removal. This is pretty dumb, but ideally there should be secondary reliable sources mentioning this.--♦IanMacM♦ (talk to me) 06:15, 30 June 2022 (UTC)
A couple of older versions of Minecraft: Bedrock Edition/Windows 10 Edition, from about a year or so ago, would censor "nig" while typing it into the creative inventory search (seen if trying to find a night vision potion). Maybe you can find a source for that? 64.185.19.49 (talk) 14:09, 28 March 2023 (UTC)