This page is for discussion of the factors that "define" a Spam article.

This page is NOT for discussion of keywords - see User:FearBot/Wordlists for that.

Primary Factors

The main factors I use for identifying articles as not spam are:

  • Contains stub tags
  • Contains disambig tags
  • Multiple templates (multiple instances of {{ in text)
  • At least one image
  • At least 5 wikilinks
  • At least one External Link
  • At least one section heading
  • Over 100 bytes long (approx 100 characters)
  • Presence of translations (or at least the tags indicating that translations exist)
  • Comments
  • Containing HTML
  • Using Infoboxes
  • Using citations and references
  • Containing categories and headings
  • Being a redirect page (This will cause the page to be ignored. It is identified by being a page with one or two lines, and the first line containing a redirect tag)

No single factor will cause FearBot to mark an article as spam; it needs several factors pointing the same way. Some factors carry more weight than others: for example, an article with no links is more likely to be spam than one with no images.
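
As an illustration only (this is not FearBot's actual code; the real function is at User:FearBot/EvalFunc), the weighting idea above could be sketched as follows. The function names, the weights, the threshold, and the regular expressions are all assumptions made for this example, and only a few of the listed factors are checked.

```python
import re

# Hypothetical weights: how strongly a MISSING factor suggests spam.
# Illustrative values only; no single weight reaches the threshold below.
MISSING_WEIGHTS = {
    "wikilinks": 3,      # fewer than 5 wikilinks
    "external_link": 1,  # no external link
    "heading": 1,        # no section heading
    "image": 1,          # no image
    "length": 2,         # 100 bytes or shorter
}

def is_redirect(text):
    """One or two lines, with a redirect tag on the first line."""
    lines = text.strip().splitlines()
    return 1 <= len(lines) <= 2 and "#redirect" in lines[0].lower()

def missing_factors(text):
    """Return the names of the checked factors that the page lacks."""
    # Count [[...]] links, skipping image, file and category links.
    wikilinks = re.findall(r"\[\[(?!(?:image|file|category):)", text, re.I)
    present = {
        "wikilinks": len(wikilinks) >= 5,
        "external_link": re.search(r"https?://", text) is not None,
        "heading": re.search(r"^==.*==\s*$", text, re.M) is not None,
        "image": re.search(r"\[\[(?:image|file):", text, re.I) is not None,
        "length": len(text.encode("utf-8")) > 100,
    }
    return [name for name, ok in present.items() if not ok]

def spam_score(text):
    """Sum the weights of the missing factors; redirects are ignored."""
    if is_redirect(text):
        return 0
    return sum(MISSING_WEIGHTS[name] for name in missing_factors(text))

def looks_like_spam(text, threshold=4):
    """True only when several missing factors add up to the threshold."""
    return spam_score(text) >= threshold
```

With these example weights, a page that lacks only an image scores 1 and is left alone, while a short page with no links, headings or images scores well above the threshold.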

Major Spam Factors

  • Large pages (not useful for rating existing articles, but it is rare for a new article to be very large, e.g. the size of Bill Gates)
  • HUGE numbers of External Links
  • Many exclamation points
  • Default formatting examples (e.g. '''Bold text''' and == Headline text == )
  • Containing signatures (detected by the text <--[[User: appearing in the page. In future I may expand this to plain links to user pages)
  • The page title being in ALL CAPS.
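
A rough sketch of how these checks might look in code is given below. It is not FearBot's implementation: the function name, the numeric thresholds and the returned labels are hypothetical, and the signature marker is copied verbatim from the list above.

```python
import re

def major_spam_factors(title, text,
                       max_bytes=30000,        # assumed cut-off for a "large" new page
                       max_links=20,           # assumed cut-off for "HUGE numbers" of links
                       max_exclamations=5,     # assumed cut-off for "many" exclamation points
                       signature_marker="<--[[User:"):
    """Return a list of labels for the major spam factors found on a page."""
    found = []
    if len(text.encode("utf-8")) > max_bytes:
        found.append("large_page")
    if len(re.findall(r"https?://", text)) > max_links:
        found.append("many_external_links")
    if text.count("!") > max_exclamations:
        found.append("many_exclamations")
    # Leftover editor toolbar examples that were never replaced.
    if "'''Bold text'''" in text or "== Headline text ==" in text:
        found.append("default_formatting")
    if signature_marker in text:
        found.append("signature")
    # str.isupper() is True when every cased character is upper case.
    if title.isupper():
        found.append("all_caps_title")
    return found
```

For example, calling `major_spam_factors("Bill Gates", "Bill Gates is a person.")` returns an empty list, since none of the checks fire.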

More Coming

I am updating this page constantly, so be aware it is a WIP.

Suggestions

If you have any suggestions, please add them here.

Comments

If you have any comments on existing items, please add them here. To comment on a suggestion, add your comment in the section above, indented below the relevant suggestion.

Evaluation Function

The full evaluation function can be found at User:FearBot/EvalFunc