Extra rules
These rules are intended for individual editors to add manually to "find and replace" as they require.
Grammar and punctuation
- Space after a full stop for terminal punctuation
- find="(\s)([A-Za-z\]\)]{2,})(?<!\b[Aa]lt|www)\.(?!([Cc]om|a[cdefgilmnoqrstuwxz]|aaa|aero|Arie|arpa|asia|aspx?|b[abdefghijmnorstvwyz]|[Bb]elieve|biz|blog|c[acdfghiklmnorsuvxyz]|cat|coop|co|config|d[ejkmoz]|doc|e[ceghrstu]|edu|exe|f[ijkmor]|g[abdefghilmnpqrstuwy]|gif|gov|h[kmnrtu]|html?|i[delmnoqrst]|info|int|j[emop]|jpeg|jpg|jobs|k[eghimnprwyz]|l[abcikrstuvy]|linux|m[acedghklmnopqrstuvwxyz]|mail|mil|mobi|musuem|n[acefgilopruz]|name|net|new|nnn|ogg|ogv|org|om|p[aefghklmnrstwy]|pairs|pdf|php|plot|png|post|pro|qa|r[eouw]|read|s[abcdeghijklmnortvyz]|scot|start|stop|svg|t[cdfghjklmnoprtvwz]|tel|tfm|travel|trojan|txt|u[agkmsyz]|v[aceginu]|virus|w[fs]|wav|y[etu]|y[etu]|Phil)\b)([A-Za-z][a-z]{1,}\b(?=\s|\,))"
- replace="$1$2. $4"
- Puts a space after a period. There are many exceptions, e.g. websites (thesite.com), computer code (code.Function), software names (some.Software), India.Arie, etc. To do: documentation
- '(A/a)n to a' in A and an usage
- find="\b([A-Za-z]{2,}(?<!\bar|\bagus|Anteil|Briefe|Coire|\b[ds]ich|Erinnerung|\b[Hh]ier|Kritik|\bich|\bnicht|\bsind|\bund|\bun)\,?\;?|[a-z]{2,}\.\s?)\san\s((?!<(ch|nd)\san\s)(?!Bord|d\s|das\b|de\b|de[mnrs]\b|de[nr]en|deiner|[Dd]ich\b|die(\b|se)|Diotima|[Dd]omhain|Deutsch|drei\b|ein\b|f\b|[Jj]ede[rn]|jenem|[Kk]indern|l\b|leFH|m\b|meine|[Mm]i(ch|r)\b|n\b|r\b|s\b|sein(e?\b|e[mnr])|si(ch\b|e(\b|ben))|tOire|Taoisigh|Tighe|x\b|viele|ytt|x(ray|mas|HCI)|zu\b)[bcdfgjklmnpqrstvwxyzBCDGJKPQTVWYZ][A-Za-z1-9]{0,})"
- replace="$1 a $2"
- For example, "An day" -> "A day". This picks up some false positives, so every change needs a human check; e.g. 'Simon an Garfunkel' => 'Simon a Garfunkel' rather than being ignored. Some documentation
- Space after comma or semi-colon
- find="\b([A-Za-z\)\]]{2,})(\,\,?|\;)(?<!left;|&|center;|Chaos;|collapse;|inline,|<|[mn]dash;|nbsp;|none;|serif;|Steins;)(?<=\s[A-Za-z\)\]]{2,}\,\,?|\;)(?>([A-Za-z\(\[]{1,}))"
- replace="$1$2 $3"
Hyphenation
- Remove unneeded hyphen after "re-" prefix, as in "re-distribute" -> "redistribute"
- find="\b([Rr])e\-(?!dress|creat|releas|form)([a-df-z][a-z]+)\b"
- replace="$1e$2"
- It skips words where a hyphen is used to distinguish between homographs (e.g. re-creation vs. recreation). Also, some dictionaries prefer a hyphen in "re-release". You can easily add other exceptions as needed. It also skips cases where the root word begins with "e" (as in re-emphasise). In British English usage, the hyphen is favored in more cases than in American usage, especially when the prefix ends in a vowel and the root word starts with a vowel (as in "re-invigorate", though even oxforddictionaries.com does not hyphenate this).
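The behaviour described above can be tried outside AWB. The following is a minimal sketch that uses Python's `re` module as a stand-in for AWB's .NET regex engine, which is an assumption for illustration only; the function name `remove_re_hyphen` and the sample words are hypothetical. The find pattern is copied unchanged from the rule, and AWB's `$1e$2` replacement is written as `\1e\2` in Python.

```python
import re

# Find pattern copied unchanged from the "re-" rule above.
RE_PREFIX = re.compile(r"\b([Rr])e\-(?!dress|creat|releas|form)([a-df-z][a-z]+)\b")


def remove_re_hyphen(text: str) -> str:
    """Apply the rule; AWB's "$1e$2" is spelled r"\\1e\\2" in Python."""
    return RE_PREFIX.sub(r"\1e\2", text)


# Hypothetical sample words: two that should change, three that should be skipped.
samples = ["re-distribute", "Re-build", "re-creation", "re-release", "re-emphasise"]
for word in samples:
    print(word, "->", remove_re_hyphen(word))
```

In this sketch the first two samples lose their hyphen, while "re-creation" and "re-release" are protected by the negative lookahead and "re-emphasise" is skipped because the character class `[a-df-z]` excludes a root beginning with "e".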
Typo
- village
- find="\b([Vv])llage\b"
- replace="$1illage"
- Volume around 20 occurrences
- It
- find="\b([Ii])t(has|was|would)\b"
- replace="$1t $2"
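As a rough way to check these two typo rules before adding them to AWB, the sketch below runs them through Python's `re` module. This is an assumption for illustration: AWB itself uses the .NET regex engine, and the helper names `to_python_template` and `apply_rules` are hypothetical. The only translation needed here is turning .NET-style `$1` group references into Python's `\g<1>` form.

```python
import re

# (find, replace) pairs copied from the two typo rules above.
TYPO_RULES = [
    (r"\b([Vv])llage\b", "$1illage"),
    (r"\b([Ii])t(has|was|would)\b", "$1t $2"),
]


def to_python_template(awb_replace: str) -> str:
    """Convert .NET-style "$1" group references to Python's "\\g<1>" form."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", awb_replace)


def apply_rules(text: str, rules=TYPO_RULES) -> str:
    """Apply each rule in order, as a find-and-replace list would."""
    for find, replace in rules:
        text = re.sub(find, to_python_template(replace), text)
    return text


print(apply_rules("Itwas a small vllage, and ithas grown."))
```

The captured first letter is reused in the replacement, so the original capitalisation ("It" versus "it", "Village" versus "village") is preserved.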
Endings
- The way pretty edit summaries have to be created within AWB's existing rule abilities means that the endings rules are slow. Use them when processing power and time are not at a premium.
- -ably
- find="\b(\S[a-z-]*)(ba|abb|[eiu]b|abl|b)l+yl?\b(?<=(?:escrrib|[Pp]rob|urb|mplac|ecc|[dlmpry]ic|ctic|voc|ead|[om]id|[au]nd|[mp]end|laud|[ain]ce|gree|rsee|[anr]ge|nshake|stake|alle|erme|use|[Ll]ove|ize|[ae]ff|tig|ach|augh|tig|ish|vai|reci|soci|medi|ifi|reli|ami|deni|xpi|vari|sati|[Pp]iti|nvi|peak|istak|ink|[Ss]ark|sal|mbl|ail|pell|roll|viol|onsol|cul|lam|tim|amm|[ht]om|orm|sum|san|[miv]en|[eu]gn|[afglm]in|amn|[dis]on|cap|[au]lp|[ao]pp|[ep]ar|ecr|sider|onder|[ef]fer|ngener|umer|lner|pher|lner|nsuper|[lt]ter]|iser|over|uver|wver|[ms]ir|[dlmnvx]or|nerr|[es]tr|[cds]ur|[nv]our|[ad]is|gnis|dvis|pens|nvers|\bpass|[cf]us|\b[u]s|reat|alat|edoubt|ract|elect|dict|luct|[bdfmpruv]it|[aeu]nt|not|ccept|ort|\b[s]t|test|ett|[fmprt]ut|rgu|alu|equ|ssu|iev|[eg]iv|driv|[lmr]ov|erv|low|joy|[ns]iz)(\2)l+yl?)\b(?![^\s\.]*\.\w)(?<!\.[^\s\.]*)"
- replace="$1ably"
- Volume 40-80 occurrences.
- False positives: None with database scanner September 2012.
- This rule checks 310 words with 7+ variant misspellings each, so it can fix a potential 2170 typos (310 words * 7 variants), though none of them is very numerous.
Abbreviations
- October
- find="\b(?<!\S)[Oo]ct\.(\s)"
- replace="October$1"
- Government
- find="\b(?<!\S)(G|g)ovt\.(\s)"
- replace="$1overnment$2"
- B.Sc. and M.Sc.
- find="\b([BM]\.Sc)(\s)(?<!\S[BM]\.S\s)"
- replace="$1.$2"
- Ph.D.
- find="\b(?<!\S)Ph\.D(\s)"
- replace="Ph.D.$1"
- Mr., Mrs., Ms. and Dr.
- find="\b(Dr|Mr|Mrs|Ms)\.(?<!\S(?:Dr|Mr|Mrs|Ms)\.)(?<!Fokker Dr.)([A-Z])"
- replace="$1. $2"
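Several of the abbreviation rules above start with `(?<!\S)`, which only lets the abbreviation match at the start of the text or directly after whitespace. The sketch below illustrates that for the October, Government and Ph.D. rules, using Python's `re` module as an assumed stand-in for AWB's .NET engine; the function name `expand_abbreviations` and the sample sentences are hypothetical. The Mr./Mrs./Ms./Dr. rule is left out because its lookbehind is variable-width, which Python's `re` does not accept.

```python
import re

# Find patterns copied from the rules above; replacements use Python's
# \g<N> syntax in place of AWB's $N.
ABBREVIATION_RULES = [
    (r"\b(?<!\S)[Oo]ct\.(\s)", r"October\g<1>"),
    (r"\b(?<!\S)(G|g)ovt\.(\s)", r"\g<1>overnment\g<2>"),
    (r"\b(?<!\S)Ph\.D(\s)", r"Ph.D.\g<1>"),
]


def expand_abbreviations(text: str) -> str:
    """Apply the three rules in order."""
    for find, replace in ABBREVIATION_RULES:
        text = re.sub(find, replace, text)
    return text


print(expand_abbreviations("The Govt. met on Oct. 5 with a Ph.D student."))
# Preceded by "(" rather than whitespace, so the lookbehind blocks the match.
print(expand_abbreviations("(Oct. 5)"))
```

The second call leaves its input unchanged, which shows that text in brackets, wiki markup or URLs directly before the abbreviation keeps the rule from firing.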
Capitalisation
- California
- find="\bcaliforni(an?s?)\b"
- replace="Californi$1"