Wikipedia talk:Lua string functions

Created essay

The essay "WP:Lua string functions" was created by long-term user Wikid77 to describe the capabilities, performance, and limitations when using Lua script to handle text strings in Wikipedia pages.

Lua text problems of wikitables and nowiki tags

I am worried about "bugs" in Lua chopping any text strings which contain wikitables, either with "{|" tokens or with "<table>" tags. I have been able to use Lua to scan entire articles (with expansion of all infoboxes, span-tags, {convert}'s, category links, and navboxes), but unless every wikitable is commented out by "<!--...-->" or noinclude'd, Lua's view of the article contents stops at the first wikitable. That seems like a bug, since Lua should allow all article-page data into a text string. Also, Lua cannot see inside a nowiki tag (nor inside a "<pre>" tag): the tag is replaced by a marker that always has a length of 43 characters, so Lua never sees the contents between "<nowiki>...</nowiki>", only the text before and after the nowiki tags. I wonder if we need a new tag for literal text, such as "<literal>...</literal>". These are complex issues, and we need to discuss them in another long page.

Example of <table>: Seems to work now. Compare the effect of "<table>" with the Lua-based Template:Str_find in searching the whole string length:

  • {str_find|123456789012|78} → 7
  • {str_find|123456789012|90} → 9
  • {str_find|1234<table><tr><td>5678</td></tr></table>9012|78} → 22
  • {str_find|1234<table><tr><td>5678</td></tr></table>9012|90} → 42
  • {str_find|12345<span>67890</span>12|78} → 13

Using the "<table>" tag formerly stopped the string at that point. Inserting a wikitable "{|" in column 1 would produce a similar effect (since "{|" generates a "<table>" tag). However, the results in this example now seem correct.
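The positions listed above can be checked by hand with a short sketch. This is plain Python rather than the Lua module behind Template:Str_find, and the helper name `str_find` and its -1 result for a missing target are assumptions made only for this illustration. It treats the tags as ordinary characters, which is what the "correct" results imply.

```python
def str_find(source: str, target: str) -> int:
    """Return the 1-based position of target in source, or -1 if absent.

    Hypothetical stand-in for the template: tags are counted as plain text.
    """
    pos = source.find(target)
    return pos + 1 if pos >= 0 else -1


plain = "123456789012"
with_table = "1234<table><tr><td>5678</td></tr></table>9012"
with_span = "12345<span>67890</span>12"

# Same five searches as in the list above, in the same order.
results = [
    str_find(plain, "78"),
    str_find(plain, "90"),
    str_find(with_table, "78"),
    str_find(with_table, "90"),
    str_find(with_span, "78"),
]
print(results)  # [7, 9, 22, 42, 13]
```

If the string were cut off at the "<table>" tag, the third and fourth searches would fail instead of returning 22 and 42.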

Example of <nowiki>: Compare the effect of "<nowiki>" with the Lua-based Template:Str_len in getting the whole string length:

  • Nw1: {str_len|123456789012} → 12
  • Nw2: {str_len|1234<nowiki>5678</nowiki>9012} → 42
  • Nw3: {str_len|1234<nowiki>567890</nowiki>12} → 40

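Using the "<nowiki>" tag hides the text but counts as a fixed number of extra characters. When this comment was written, that was +43 characters, so Nw2 had length 4+43+4=51 and Nw3 had length 4+43+2=49. The results currently shown in the list, 42 and 40, imply a 34-character marker instead (4+34+4=42 and 4+34+2=40), so the marker length appears to have changed since then.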
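The arithmetic can be reproduced with a rough model in which each nowiki span is swapped for an opaque placeholder of fixed length before the length is taken. This is a Python sketch, not the parser's actual code: `make_marker` and `visible_length` are hypothetical names, and the placeholder text is invented, with only its length mattering here. The marker length is a parameter, so both the 43-character figure and the 42 and 40 shown in the list can be checked.

```python
import re

# Non-greedy match so that each nowiki span is replaced separately.
NOWIKI = re.compile(r"<nowiki>.*?</nowiki>", re.DOTALL)


def make_marker(length: int) -> str:
    """Build a made-up opaque placeholder of the given length (assumed >= 2)."""
    return "\x7f" + "M" * (length - 2) + "\x7f"


def visible_length(wikitext: str, marker_length: int) -> int:
    """Length of the text once every nowiki span is replaced by a marker."""
    stripped = NOWIKI.sub(lambda m: make_marker(marker_length), wikitext)
    return len(stripped)


nw1 = "123456789012"
nw2 = "1234<nowiki>5678</nowiki>9012"
nw3 = "1234<nowiki>567890</nowiki>12"

print(visible_length(nw1, 43))  # 12, no nowiki span to replace
print(visible_length(nw2, 43), visible_length(nw3, 43))  # 51 49
print(visible_length(nw2, 34), visible_length(nw3, 34))  # 42 40
```

In this model the hidden text has no effect on the result: only the text outside the nowiki tags and the marker length are counted.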

It has been suggested that the mw.text.unstrip function in MediaWiki's "Lua reference manual" (which is not yet live) may help with the nowiki tags. Also see: T47085. -Wikid77 (talk) 15:36, 16 March 2013 (UTC)

  • Passing tables into Lua is limited by parser restrictions: When a wikitable is passed into a Lua-based template, or into a Lua-invoke function, the vertical-bar pipe in "{|" is likely to end the text string at that point. In such cases, it would be better to pass just a pagename into Lua, so that Lua can read the text from inside the page, rather than passing wikitables as parameter contents. The nowiki-tag and pre-tag elements are also limited by parser processing: Lua sees only the 43-character wp:strip markers placed in the text after parsing, and has no access to the nowiki-delimited text. -Wikid77 (talk) 23:13, 16 March 2013 (UTC)
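The pipe problem described above can be illustrated with a deliberately simplified model of parameter splitting. This is a Python sketch under stated assumptions, not MediaWiki's parser: the hypothetical `split_params` splits a template body on any pipe that is not nested inside "{{...}}" or "[[...]]", and the real rules are more involved.

```python
def split_params(body: str) -> list:
    """Split a template body on pipes outside {{...}} and [[...]].

    Simplified model for illustration only; the real parser is more complex.
    """
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1              # entering a nested template or link
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1              # leaving a nested template or link
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))  # top-level pipe ends a parameter
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


call = 'str_len|abc{| class="wikitable"\n|-\n| cell\n|}xyz'
params = split_params(call)
print(params[1])  # abc{
```

Because "{|" is not a balanced "{{" pair, its pipe counts as a parameter separator in this model, so the first parameter ends at "abc{" and the rest of the table is scattered across further parameters. Passing only a pagename avoids this, since the table text then never appears in a parameter.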