Template talk:Strlen quick
This is the discussion/talk-page for: Template:Strlen_quick.
Created
The fast string-length counter, Template:Strlen_quick, was created by long-term user Wikid77 on 30 January 2011 to provide a very fast string-length template, optimized for the string lengths that actually occur in Wikipedia data. It also conserves wiki-markup resources in the NewPP MediaWiki preprocessor: its expansion depth is only 5 levels, rather than the 9 to 14 levels used by other string-length templates. -Wikid77 10:09, 30 January 2011 (UTC)
Optimizing for actual string lengths
30-Jan-2011: Template:Strlen_quick was created as a faster alternative to {{str_len}}, optimized for the real string data used in articles. From the actual strings passed in existing Wikipedia articles, it is possible to determine the most likely string lengths, such as 17 or 18 characters for titles, and then check those lengths first. For example, if the top 1,000 articles all used a 9-letter infobox code, checking for length 9 first would avoid checking any other length. For the 353,000 articles using {{Italic_title}}, title lengths range from 2 to 99 characters, the most common lengths are 16 to 19, and 88% of all titles are shorter than 30 characters. The distribution of title lengths has been as follows:
- 84% > 10, 12% < 10, 51% in 10-19, 25% in 20-29, 7% in 30-39, 1.7% in 40-49, 0.6% >50.
For lengths 0 to 9, frequency rises steeply with length: almost no titles have 1 or 2 characters, a few have 3, some have 4, more have 5, 6, 7 and 8, and length 9 is 19 times more common than length 3. To match a title length quickly, check the most common lengths first, which here means checking 9 down to 1 in reverse order.
Among lengths 10 to 19, the most common are 17 and 18, with frequency falling off farther from those values; 10 is the least frequent length in this range. Above 20, frequency decreases from 21 to 29, mirroring the rise from 1 to 9, so checking 21 first is 3 times more likely to match than checking 29. Titles of 30 to 39 characters are quite rare: length 31 is about as rare as length 5, and length 39 is 3 times rarer still, occurring in only 43 of every 10,000 titles. By optimizing for the actual lengths of titles, those lengths can perhaps be matched twice as quickly. A pure binary search spends as many comparisons on rare lengths as on common ones, which gives an unfair advantage to the rare lengths, so the search should instead be prioritized in favor of the more common lengths.
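The principle of checking the most common lengths first can be illustrated with a short Python sketch. It is not part of the template: `check_order` and `expected_checks` are hypothetical helpers, and the sample lengths are made up for illustration, not taken from the Wikipedia title data cited above. For a linear scan, sorting the candidate lengths by descending frequency minimizes the average number of checks needed to find a match.

```python
from collections import Counter


def check_order(samples):
    """Candidate lengths, most frequent first (ties broken by shorter length)."""
    counts = Counter(len(s) for s in samples)
    return sorted(counts, key=lambda n: (-counts[n], n))


def expected_checks(order, samples):
    """Average number of length tests a linear scan in `order` makes per sample."""
    position = {n: i + 1 for i, n in enumerate(order)}
    return sum(position[len(s)] for s in samples) / len(samples)


# Hypothetical sample: made-up lengths, weighted toward 17/18 as described above.
sample = ["a" * n for n in (17, 17, 17, 18, 18, 9, 25, 3)]

by_frequency = check_order(sample)         # [17, 18, 3, 9, 25]
ascending = sorted(set(map(len, sample)))  # [3, 9, 17, 18, 25]
print(expected_checks(by_frequency, sample))  # 2.375
print(expected_checks(ascending, sample))     # 3.125
```

On this toy sample the frequency-ordered scan averages 2.375 checks per string, against 3.125 when lengths are simply tried in ascending order.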
The markup logic below uses these prioritized steps (the logic shown stops at 60; the actual template markup also handles lengths over 70):
Logic to match lengths 1 to 60, ordered by how common each length is in real data. The test x{{{1}}} = x{{padleft:{{{1}}}|N}} is true only when the text has at least N characters, because padleft never shortens a string; each #switch case {{padleft:|N|x{{{1}}}}} = N-1 matches only when the text has exactly N-1 characters.
{{ #ifeq: x{{{1}}}|x{{padleft:{{{1}}}|20}} <!--length >= 20?-->
| {{#ifeq: x{{{1}}}|x{{padleft:{{{1}}}|30}} <!--length >= 30?-->
  | {{#ifeq: x{{{1}}}|x{{padleft:{{{1}}}|40}} <!--length >= 40?-->
    | {{#switch: x{{{1}}} | {{padleft:|41|x{{{1}}}}} = 40 | {{padleft:|42|x{{{1}}}}} = 41 | {{padleft:|43|x{{{1}}}}} = 42 | {{padleft:|44|x{{{1}}}}} = 43 | {{padleft:|45|x{{{1}}}}} = 44 | {{padleft:|46|x{{{1}}}}} = 45 | {{padleft:|47|x{{{1}}}}} = 46 | {{padleft:|48|x{{{1}}}}} = 47 | {{padleft:|49|x{{{1}}}}} = 48 | {{padleft:|50|x{{{1}}}}} = 49 | {{padleft:|51|x{{{1}}}}} = 50 | {{padleft:|52|x{{{1}}}}} = 51 | {{padleft:|53|x{{{1}}}}} = 52 | {{padleft:|54|x{{{1}}}}} = 53 | {{padleft:|55|x{{{1}}}}} = 54 | {{padleft:|56|x{{{1}}}}} = 55 | {{padleft:|57|x{{{1}}}}} = 56 | {{padleft:|58|x{{{1}}}}} = 57 | {{padleft:|59|x{{{1}}}}} = 58 | {{padleft:|60|x{{{1}}}}} = 59 | #default= 60 <!--when >= 60 and none of the above--> }}<!--endsw 40's++ -->
    | {{#switch: x{{{1}}} | {{padleft:|31|x{{{1}}}}} = 30 | {{padleft:|32|x{{{1}}}}} = 31 | {{padleft:|33|x{{{1}}}}} = 32 | {{padleft:|34|x{{{1}}}}} = 33 | {{padleft:|35|x{{{1}}}}} = 34 | {{padleft:|36|x{{{1}}}}} = 35 | {{padleft:|37|x{{{1}}}}} = 36 | {{padleft:|38|x{{{1}}}}} = 37 | {{padleft:|39|x{{{1}}}}} = 38 | #default= 39 }}<!--endsw 30's-->
    }}<!--endifeq 40-->
  | {{#switch: x{{{1}}} | {{padleft:|21|x{{{1}}}}} = 20 | {{padleft:|22|x{{{1}}}}} = 21 | {{padleft:|23|x{{{1}}}}} = 22 | {{padleft:|24|x{{{1}}}}} = 23 | {{padleft:|25|x{{{1}}}}} = 24 | {{padleft:|26|x{{{1}}}}} = 25 | {{padleft:|27|x{{{1}}}}} = 26 | {{padleft:|28|x{{{1}}}}} = 27 | {{padleft:|29|x{{{1}}}}} = 28 | #default= 29 }}<!--endsw 20's-->
  }}<!--endifeq 30-->
| {{#ifeq: x{{{1}}}|x{{padleft:{{{1}}}|10}} <!--length >= 10?-->
  | {{#switch: x{{{1}}} | {{padleft:|18|x{{{1}}}}} = 17 | {{padleft:|19|x{{{1}}}}} = 18 | {{padleft:|17|x{{{1}}}}} = 16 | {{padleft:|20|x{{{1}}}}} = 19 | {{padleft:|16|x{{{1}}}}} = 15 | {{padleft:|15|x{{{1}}}}} = 14 | {{padleft:|14|x{{{1}}}}} = 13 | {{padleft:|13|x{{{1}}}}} = 12 | {{padleft:|12|x{{{1}}}}} = 11 | #default= 10 <!--when >= 10 and none of above--> }}<!--endsw 10's++ -->
  | {{#switch: x{{{1}}} | {{padleft:|10|x{{{1}}}}} = 9 | {{padleft:|9|x{{{1}}}}} = 8 | {{padleft:|8|x{{{1}}}}} = 7 | {{padleft:|7|x{{{1}}}}} = 6 | {{padleft:|6|x{{{1}}}}} = 5 | {{padleft:|5|x{{{1}}}}} = 4 | {{padleft:|4|x{{{1}}}}} = 3 | {{padleft:|3|x{{{1}}}}} = 2 | #default= 1 }}<!--endsw 1's-->
  }}<!--endifeq 10-->
}}<!--endifeq 20-->
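To make the padleft tests easier to follow, here is a Python model of the markup logic above. It is a sketch, assuming that {{padleft:s|n|pad}} left-pads s to n characters by repeating pad (truncated as needed) and returns s unchanged when it already has n or more characters. The names `padleft`, `at_least`, `has_length` and `strlen_quick` are hypothetical, and like the logic shown, the model stops at 60.

```python
def padleft(s, n, pad="0"):
    """Model of {{padleft:s|n|pad}}: left-pad s to n chars with repeated pad."""
    if len(s) >= n or pad == "":  # never truncates; empty pad is a guard
        return s
    need = n - len(s)
    fill = (pad * (need // len(pad) + 1))[:need]
    return fill + s


def at_least(s, n):
    """Models x{{{1}}} == x{{padleft:{{{1}}}|n}}: true when len(s) >= n."""
    return "x" + s == "x" + padleft(s, n)


def has_length(s, n):
    """Models the case {{padleft:|n+1|x{{{1}}}}} = n: true when len(s) == n."""
    return "x" + s == padleft("", n + 1, "x" + s)


def strlen_quick(s):
    """Follows the nested #ifeq/#switch order of the markup; caps at 60."""
    if at_least(s, 20):
        if at_least(s, 30):
            if at_least(s, 40):
                cases, default = range(40, 60), 60
            else:
                cases, default = range(30, 39), 39
        else:
            cases, default = range(20, 29), 29
    elif at_least(s, 10):
        cases, default = (17, 18, 16, 19, 15, 14, 13, 12, 11), 10
    else:
        cases, default = range(9, 1, -1), 1
    for n in cases:  # #switch tries its cases in order
        if has_length(s, n):
            return n
    return default  # the #default branch


print(strlen_quick("aaa"))     # 3
print(strlen_quick("a" * 17))  # 17
print(strlen_quick(""))        # 1
```

The last call shows that, in this model, an empty string falls through every case to #default= 1, the behavior reported in the "Zero length string returns length=1" section.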
Tests of the above code show that it processes actual title lengths about twice as fast as the binary-search markup logic used in template {{str_len}}. -Wikid77 10:09, 30 January 2011, revised 01:21, 22 February 2011 (UTC)
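The trade-off can be illustrated, though not timed, by counting comparisons per length. This sketch rests on assumptions: it treats {{str_len}} as a plain binary search over lengths 1 to 60 (its markup is not shown on this page), counts each range test and each #switch case tried as one comparison, and ignores the cost of expanding padleft. `binary_search_checks` and `tiered_checks` are hypothetical names; the case lists copy the order of the markup above.

```python
# Case order of each #switch in the markup above (the #default follows each list).
ONES = [9, 8, 7, 6, 5, 4, 3, 2]               # default 1
TENS = [17, 18, 16, 19, 15, 14, 13, 12, 11]   # default 10
TWENTIES = list(range(20, 29))                # default 29
THIRTIES = list(range(30, 39))                # default 39
FORTIES_UP = list(range(40, 60))              # default 60


def binary_search_checks(length, lo=1, hi=60):
    """Number of 'length >= mid' tests a plain binary search over [lo, hi] makes."""
    checks = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2
        checks += 1
        if length >= mid:
            lo = mid
        else:
            hi = mid - 1
    return checks


def tiered_checks(length):
    """Range tests plus #switch cases tried by the Strlen_quick logic (1..60)."""
    if length >= 40:
        range_tests, cases = 3, FORTIES_UP  # >=20, >=30, >=40 all true
    elif length >= 30:
        range_tests, cases = 3, THIRTIES    # >=40 false
    elif length >= 20:
        range_tests, cases = 2, TWENTIES    # >=30 false
    elif length >= 10:
        range_tests, cases = 2, TENS        # >=20 false, >=10 true
    else:
        range_tests, cases = 2, ONES        # >=20 false, >=10 false
    if length in cases:
        return range_tests + cases.index(length) + 1
    return range_tests + len(cases)  # every case failed; #default answers


for n in (17, 18, 10, 60):
    print(n, tiered_checks(n), binary_search_checks(n))
```

In this model the common lengths 17 and 18 need 3 and 4 comparisons against 6 for the binary search, while rare lengths such as 10 and 60 need 11 and 23, which is the deliberate trade-off described in the section above.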
Zero length string returns length=1
Testing:
{{Strlen quick|aaa}} → 3
{{Strlen quick|aa}} → 2
{{Strlen quick|a}} → 1
{{Strlen quick|}} → 0
{{Strlen quick|1=}} → 0
{{Strlen quick}} → 0
- I think the last three are in error. -DePiep (talk) 07:54, 15 June 2012 (UTC)
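In the markup logic on this page, an empty string fails every case of the final #switch (the tier for lengths below 10), so it falls through to #default= 1. One possible fix is an extra case following the same pattern as the others, {{padleft:|1|x{{{1}}}}} = 0, which matches only the bare prefix "x". The Python sketch below models that idea; `padleft`, `ones_tier` and the `handle_empty` flag are hypothetical, and the live template may have been fixed differently.

```python
def padleft(s, n, pad="0"):
    """Model of {{padleft:s|n|pad}}: left-pad s to n chars with repeated pad."""
    if len(s) >= n or pad == "":
        return s
    need = n - len(s)
    return (pad * (need // len(pad) + 1))[:need] + s


def ones_tier(s, handle_empty=False):
    """Model of the final #switch, reached when len(s) < 10."""
    x = "x" + s
    # Hypothetical extra case {{padleft:|1|x{{{1}}}}} = 0: equals x only when s == "".
    if handle_empty and x == padleft("", 1, x):
        return 0
    for n in range(9, 1, -1):  # cases {{padleft:|10|x{{{1}}}}} = 9 down to |3| = 2
        if x == padleft("", n + 1, x):
            return n
    return 1  # #default= 1, which an empty string also reaches


print(ones_tier(""))                      # 1 (the reported problem)
print(ones_tier("", handle_empty=True))   # 0
print(ones_tier("a", handle_empty=True))  # 1
```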