User:Sun Creator/Avoid domains and URLs

Generic code


Append the following to any regex rule to stop it from matching inside correctly formed domains and URLs.

(?![^\s\.]*\.\w)(?<!\.[^\s\.]{0,999})
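As a rough illustration of how the guard behaves, here is a minimal TypeScript sketch. It assumes a JavaScript engine with lookbehind support (for example a recent Node.js); the names `URL_GUARD`, `guardedRegex` and `findOutsideUrls` are made up for this example and are not part of any tool.

```typescript
// The guard from this page, kept as a raw string so the backslashes survive.
const URL_GUARD = String.raw`(?![^\s\.]*\.\w)(?<!\.[^\s\.]{0,999})`;

// Hypothetical helper: append the guard to an existing rule.
function guardedRegex(basePattern: string): RegExp {
  return new RegExp(basePattern + URL_GUARD, "g");
}

// Hypothetical helper: return the start index of every guarded match.
function findOutsideUrls(basePattern: string, input: string): number[] {
  const re = guardedRegex(basePattern);
  const found: number[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(input)) !== null) {
    found.push(m.index);
    // Step past empty matches so the loop cannot stall.
    if (m[0].length === 0) re.lastIndex++;
  }
  return found;
}

const sampleText =
  "The state site www.state.gov links to http://example.org/state and to this state.";

// Only the first and the last 'state' are expected to survive the guard:
// the lookahead rejects 'state' in 'www.state.gov' (a dot plus a word character follows),
// the lookbehind rejects 'state' in 'example.org/state' (a dot precedes it with no whitespace between).
const hits = findOutsideUrls(String.raw`\bstate\b`, sampleText);
console.log(hits);
```

The full stop at the end of the sentence does not block the last match, because the lookahead requires a word character directly after the dot.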

Tests

On '\b', which matches at the start and end of every word, to get a very high number of matches
  • \b(?![^\s\.]*\.\w)(?<!\.[^\s\.]*) 491ms
  • \b(?<!\.[^\s\.]*)(?![^\s\.]*\.\w) 493ms //Somewhat unexpected given the results for 'a' words below.
  • \b(?<!\.[^\s\.]*) 408ms
  • \b(?![^\s\.]*\.\w) 385ms
  • \b 319ms
Words beginning with 'a'
  • \ba(?![^\s\.]*\.\w)(?<!\.[^\s\.]*) 24ms
  • \ba(?<!\.[^\s\.]*)(?![^\s\.]*\.\w) 22ms
  • \ba(?![^\s\.]*\.\w) 20ms
  • \ba(?<!\.[^\s\.]*) 18ms
  • \ba 17ms
Realistic test: 'state' occurs six times in the article used for the test.
  • \bstate\b(?![^\s\.]*\.\w)(?<!\.[^\s\.]*) 2ms
  • \bstate\b(?<!\.[^\s\.]*)(?![^\s\.]*\.\w) 2ms
  • \bstate\b(?<!\.[^\s\.]*) 2ms
  • \bstate\b(?![^\s\.]*\.\w) 2ms
  • \bstate\b 2ms
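A comparison along these lines could be repeated with a small timing harness. The TypeScript sketch below is only an assumption about how one might do that: `timeMatches`, `benchText` and the run count are invented for the example, and the timings it prints depend on the engine and machine, so they will not reproduce the figures listed above.

```typescript
// The two halves of the guard, in the unbounded form used in the tests.
const LOOKAHEAD = String.raw`(?![^\s\.]*\.\w)`;
const LOOKBEHIND = String.raw`(?<!\.[^\s\.]*)`;

// Hypothetical helper: run a pattern over the text `runs` times and
// report the match count of one pass plus the total elapsed milliseconds.
function timeMatches(pattern: string, input: string, runs: number): { count: number; ms: number } {
  const re = new RegExp(pattern, "g");
  let count = 0;
  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    count = 0;
    re.lastIndex = 0;
    let m: RegExpExecArray | null;
    while ((m = re.exec(input)) !== null) {
      count++;
      // Step past empty matches such as a bare \b.
      if (m[0].length === 0) re.lastIndex++;
    }
  }
  return { count, ms: Date.now() - start };
}

// Stand-in text; a real test would load a full article instead.
const benchText = "Visit www.state.gov or the state.";
const base = String.raw`\bstate\b`;

const results = [
  base + LOOKAHEAD + LOOKBEHIND,
  base + LOOKBEHIND + LOOKAHEAD,
  base + LOOKBEHIND,
  base + LOOKAHEAD,
  base,
].map((pattern) => ({ pattern, ...timeMatches(pattern, benchText, 1000) }));

for (const r of results) {
  console.log(`${r.pattern} ${r.count} matches, ${r.ms}ms`);
}
```

Printing the match count next to the time is a cheap check that every variant is still matching what it is supposed to match.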