Extra rules

These rules are intended for individual editors to add manually to "find and replace" as they require.

Grammar and punctuation

Space after a full stop for terminal punctuation
  • find="(\s)([A-Za-z\]\)]{2,})(?<!\b[Aa]lt|www)\.(?!([Cc]om|a[cdefgilmnoqrstuwxz]|aaa|aero|Arie|arpa|asia|aspx?|b[abdefghijmnorstvwyz]|[Bb]elieve|biz|blog|c[acdfghiklmnorsuvxyz]|cat|coop|co|config|d[ejkmoz]|doc|e[ceghrstu]|edu|exe|f[ijkmor]|g[abdefghilmnpqrstuwy]|gif|gov|h[kmnrtu]|html?|i[delmnoqrst]|info|int|j[emop]|jpeg|jpg|jobs|k[eghimnprwyz]|l[abcikrstuvy]|linux|m[acedghklmnopqrstuvwxyz]|mail|mil|mobi|museum|n[acefgilopruz]|name|net|new|nnn|ogg|ogv|org|om|p[aefghklmnrstwy]|pairs|pdf|php|plot|png|post|pro|qa|r[eouw]|read|s[abcdeghijklmnortvyz]|scot|start|stop|svg|t[cdfghjklmnoprtvwz]|tel|tfm|travel|trojan|txt|u[agkmsyz]|v[aceginu]|virus|w[fs]|wav|y[etu]|Phil)\b)([A-Za-z][a-z]{1,}\b(?=\s|\,))"
  • replace="$1$2. $4"
Puts a space after a period. There are many exceptions, e.g. websites (thesite.com), computer code (code.Function), software names (some.Software), India.Arie, etc. To do: documentation.
'(A/a)n to a' in A and an usage
  • find="\b([A-Za-z]{2,}(?<!\bar|\bagus|Anteil|Briefe|Coire|\b[ds]ich|Erinnerung|\b[Hh]ier|Kritik|\bich|\bnicht|\bsind|\bund|\bun)\,?\;?|[a-z]{2,}\.\s?)\san\s((?!<(ch|nd)\san\s)(?!Bord|d\s|das\b|de\b|de[mnrs]\b|de[nr]en|deiner|[Dd]ich\b|die(\b|se)|Diotima|[Dd]omhain|Deutsch|drei\b|ein\b|f\b|[Jj]ede[rn]|jenem|[Kk]indern|l\b|leFH|m\b|meine|[Mm]i(ch|r)\b|n\b|r\b|s\b|sein(e?\b|e[mnr])|si(ch\b|e(\b|ben))|tOire|Taoisigh|Tighe|x\b|viele|ytt|x(ray|mas|HCI)|zu\b)[bcdfgjklmnpqrstvwxyzBCDGJKPQTVWYZ][A-Za-z1-9]{0,})"
  • replace="$1 a $2"
E.g. "An day" -> "A day". Picks up some false positives, so the process requires a human check every time; e.g. 'Simon an Garfunkel' becomes 'Simon a Garfunkel' rather than being ignored. Some documentation
Space after comma or semi-colon
  • find="\b([A-Za-z\)\]]{2,})(\,\,?|\;)(?<!left;|&|center;|Chaos;|collapse;|inline,|<|[mn]dash;|nbsp;|none;|serif;|Steins;)(?<=\s[A-Za-z\)\]]{2,}\,\,?|\;)(?>([A-Za-z\(\[]{1,}))"
  • replace="$1$2 $3"

Hyphenation

Remove unneeded hyphen after "re-" prefix, as in "re-distribute" -> "redistribute"
  • find="\b([Rr])e\-(?!dress|creat|releas|form)([a-df-z][a-z]+)\b"
  • replace="$1e$2"
It skips words where a hyphen is used to distinguish between homographs (e.g. re-creation vs. recreation). Also, some dictionaries prefer a hyphen in "re-release". You can easily add other exceptions as needed. It also skips cases where the root word begins with "e" (as in re-emphasise). In British English usage, the hyphen is favored in more cases than in American usage, especially when the prefix ends in a vowel and the root word starts with a vowel (as in "re-invigorate", though even oxforddictionaries.com does not hyphenate this).
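To check the exceptions described above outside AWB, the rule can be ported to Python's `re` module. This is a minimal sketch, not part of the rule set: the function name `remove_re_hyphen` is hypothetical, and the only change from the rule is the replacement syntax (AWB's .NET-style `$1e$2` becomes `\1e\2` in Python).

```python
import re

# Pattern copied unchanged from the rule above.
RE_HYPHEN = re.compile(r"\b([Rr])e\-(?!dress|creat|releas|form)([a-df-z][a-z]+)\b")

def remove_re_hyphen(text: str) -> str:
    # Hypothetical helper: .NET "$1e$2" is written r"\1e\2" in Python.
    return RE_HYPHEN.sub(r"\1e\2", text)

samples = [
    "They re-distribute the files.",   # fixed by the rule
    "A re-creation of the battle.",    # skipped: homograph exception
    "The album had a re-release.",     # skipped: listed exception
    "We re-emphasise the point.",      # skipped: root begins with "e"
]
for sample in samples:
    print(remove_re_hyphen(sample))
```

Adding a further exception means extending the `(?!dress|creat|releas|form)` alternation in the same way as in the AWB rule.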

Typo

village
  • find="\b([Vv])llage\b"
  • replace="$1illage"
Volume: around 20 occurrences.
It
  • find="\b([Ii])t(has|was|would)\b"
  • replace="$1t $2"
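The two typo rules above are simple enough to try on sample text before adding them to AWB. The following is a rough sketch under stated assumptions: `TYPO_RULES`, `to_python_replacement` and `apply_rules` are hypothetical names, and the converter only handles plain numbered references such as `$1`, not other .NET substitution forms.

```python
import re

# (find, replace) pairs copied from the two rules above, in AWB's .NET syntax.
TYPO_RULES = [
    (r"\b([Vv])llage\b", "$1illage"),
    (r"\b([Ii])t(has|was|would)\b", "$1t $2"),
]

def to_python_replacement(dotnet_replacement: str) -> str:
    # Hypothetical helper: rewrite each "$n" as Python's unambiguous "\g<n>" form.
    return re.sub(r"\$(\d+)", r"\\g<\1>", dotnet_replacement)

def apply_rules(text: str, rules=TYPO_RULES) -> str:
    # Apply each rule in order and return the corrected text.
    for find, replace in rules:
        text = re.sub(find, to_python_replacement(replace), text)
    return text

print(apply_rules("Itwas a small vllage."))
```

Because the first letter is captured and reused, the original capitalisation is preserved in both rules.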

Endings

  • The nature of creating pretty edit summaries within the existing AWB rule ability means that endings are slow. Use when processing power and time are not at a premium.
-ably
  • find="\b(\S[a-z-]*)(ba|abb|[eiu]b|abl|b)l+yl?\b(?<=(?:escrrib|[Pp]rob|urb|mplac|ecc|[dlmpry]ic|ctic|voc|ead|[om]id|[au]nd|[mp]end|laud|[ain]ce|gree|rsee|[anr]ge|nshake|stake|alle|erme|use|[Ll]ove|ize|[ae]ff|tig|ach|augh|tig|ish|vai|reci|soci|medi|ifi|reli|ami|deni|xpi|vari|sati|[Pp]iti|nvi|peak|istak|ink|[Ss]ark|sal|mbl|ail|pell|roll|viol|onsol|cul|lam|tim|amm|[ht]om|orm|sum|san|[miv]en|[eu]gn|[afglm]in|amn|[dis]on|cap|[au]lp|[ao]pp|[ep]ar|ecr|sider|onder|[ef]fer|ngener|umer|lner|pher|lner|nsuper|[lt]ter]|iser|over|uver|wver|[ms]ir|[dlmnvx]or|nerr|[es]tr|[cds]ur|[nv]our|[ad]is|gnis|dvis|pens|nvers|\bpass|[cf]us|\b[u]s|reat|alat|edoubt|ract|elect|dict|luct|[bdfmpruv]it|[aeu]nt|not|ccept|ort|\b[s]t|test|ett|[fmprt]ut|rgu|alu|equ|ssu|iev|[eg]iv|driv|[lmr]ov|erv|low|joy|[ns]iz)(\2)l+yl?)\b(?![^\s\.]*\.\w)(?<!\.[^\s\.]*)"
  • replace="$1ably"
  • Volume: 40-80 occurrences.
  • False positives: none found with the database scanner, September 2012.
  • This rule checks 310 words with 7+ variant misspellings each. It fixes a potential 2170 typos (310 words * 7 variants), but none of them is very numerous.

Abbreviations

October
  • find="\b(?<!\S)[Oo]ct\.(\s)"
  • replace="October$1"
Government
  • find="\b(?<!\S)(G|g)ovt\.(\s)"
  • replace="$1overnment$2"
B.Sc. and M.Sc.
  • find="\b([BM]\.Sc)(\s)(?<!\S[BM]\.Sc\s)"
  • replace="$1.$2"
Ph.D.
  • find="\b(?<!\S)Ph\.D(\s)"
  • replace="Ph.D.$1"
Mr., Mrs., Ms. and Dr.
  • find="\b(Dr|Mr|Mrs|Ms)\.(?<!\S(?:Dr|Mr|Mrs|Ms)\.)(?<!Fokker Dr.)([A-Z])"
  • replace="$1. $2"
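Several of the abbreviation rules above begin with `(?<!\S)`, which only lets the rule fire when the abbreviation starts the text or follows whitespace. The sketch below illustrates that guard in Python for the October, Government and Ph.D. rules; `ABBREVIATION_RULES` and `expand_abbreviations` are hypothetical names, and the replacements are hand-translated from `$1` to `\1`. The Mr./Mrs./Ms./Dr. rule is left out because its lookbehind is variable-length, which Python's standard `re` module does not accept.

```python
import re

# Find patterns copied from the rules above; replacements hand-translated
# from .NET "$1" to Python r"\1".
ABBREVIATION_RULES = [
    (r"\b(?<!\S)[Oo]ct\.(\s)", r"October\1"),
    (r"\b(?<!\S)(G|g)ovt\.(\s)", r"\1overnment\2"),
    (r"\b(?<!\S)Ph\.D(\s)", r"Ph.D.\1"),
]

def expand_abbreviations(text: str) -> str:
    # Hypothetical helper: apply each abbreviation rule in order.
    for find, replace in ABBREVIATION_RULES:
        text = re.sub(find, replace, text)
    return text

print(expand_abbreviations("In Oct. 1990 the govt. hired a Ph.D from MIT."))
print(expand_abbreviations("Released (Oct. 1990)"))  # left alone: "(" precedes "Oct."
```

The trailing `(\s)` capture is carried into the replacement, so whatever whitespace followed the abbreviation (space, tab or newline) is kept as it was.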

Capitalisation

California
  • find="\bcaliforni(an?s?)\b"
  • replace="Californi$1"