Talk:Sentence boundary disambiguation

This should probably be merged with sentence extraction. Which term is more appropriate? I'm not sure. SE is used more on the business side, while SBD seems to be the technical and more appropriate term. Josh Froelich 20:51, 17 December 2006 (UTC)

Encountered a 404 on the SATZ link. Preceding unsigned comment added by 76.184.141.23 (talk) 05:50, 20 March 2012 (UTC)

Maybe there is a problem with the RegExp

The PCRE regexp seems to have an extra closing bracket in front of the final \s. This works better: ((?<=[a-z0-9])[.?!])|(?<=[a-z0-9][.?!]\")(\s|\r\n)(?=\"?[A-Z]) , but it still wrongly treats page numbers in in-text citations, e.g. (p.180), as an end of sentence. Hkandy (talk) 12:36, 17 May 2013 (UTC)
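For anyone who wants to reproduce the false positive, here is a minimal sketch. It assumes Python's `re` module rather than PCRE itself (the pattern only uses fixed-width look-behinds, which both engines accept), and the helper name `find_boundaries` and the sample sentences are made up for illustration.

```python
import re

# The pattern quoted in the comment above, unchanged.
BOUNDARY = re.compile(r'((?<=[a-z0-9])[.?!])|(?<=[a-z0-9][.?!]\")(\s|\r\n)(?=\"?[A-Z])')

def find_boundaries(text):
    """Return the start offset of every match of the pattern in text."""
    return [m.start() for m in BOUNDARY.finditer(text)]

citation = "See Smith (p.180) for details."
# Print the matched character and a little context around each match.
for pos in find_boundaries(citation):
    print(pos, repr(citation[pos]), repr(citation[max(0, pos - 3):pos + 4]))
```

The period inside "(p.180)" is reported because the first alternative only requires a lowercase letter or digit in front of the punctuation mark; it never looks at what follows.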

Vanilla Strategy poorly phrased

The vanilla strategy can make sense, but it is poorly phrased:

  • (a) If it's a period, it ends a sentence.
  • (b) If the preceding token is in the hand-compiled list of abbreviations, then it doesn't end a sentence.
  • (c) If the next token is capitalized, then it ends a sentence.

If (b) applies, the conclusion of (a) cannot hold. IMO it would be better phrased as a single rule: if the conditions in (a), (b) and (c) are all met, then the period is the end of a sentence. So it could be rewritten by dropping "it ends a sentence." from the first item. I'm not sure, which is why I did not edit the article myself. Preceding unsigned comment added by 2A01:E35:2EF3:D930:4969:4D0F:C351:86F7 (talk) 08:57, 21 October 2015 (UTC)
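To make the ambiguity concrete, here is a rough sketch of two possible readings of the three rules. Neither is claimed to be what the article intends: the first treats the list as rules applied in order, each overriding the previous verdict; the second is the single combined rule suggested in the comment above, read as "period, preceding token not an abbreviation, next token capitalized". The abbreviation set and both function names are hypothetical.

```python
# Hypothetical stand-in for the "hand-compiled list of abbreviations".
ABBREVIATIONS = {"Dr", "Mr", "Mrs", "etc", "p"}

def ends_sentence_override(token, prev_token, next_token):
    """Reading 1: apply (a), (b), (c) in order; a later rule overrides an earlier one."""
    if token != ".":
        return False
    result = True                      # (a) a period ends a sentence
    if prev_token in ABBREVIATIONS:
        result = False                 # (b) ...unless it follows an abbreviation
    if next_token[:1].isupper():
        result = True                  # (c) ...but a capitalized next token ends it anyway
    return result

def ends_sentence_combined(token, prev_token, next_token):
    """Reading 2: every condition must be met at once."""
    return (
        token == "."
        and prev_token not in ABBREVIATIONS
        and next_token[:1].isupper()
    )

# The two readings disagree on "Dr. Smith" and on "home. and".
for prev_token, next_token in [("Dr", "Smith"), ("etc", "and"), ("home", "Then"), ("home", "and")]:
    print(prev_token, next_token,
          ends_sentence_override(".", prev_token, next_token),
          ends_sentence_combined(".", prev_token, next_token))
```

Under the first reading "Dr. Smith" is split, because (c) overrides (b); under the second it is not, but a sentence that is followed by a lowercase token is no longer split either.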