Module talk:Namespace detect

Latest comment: 8 years ago by Rinellie in topic Issue when moving to a local Wiki

Allow parameter aliases

edit

Hi Mr. Stradivarius!

Could you change this to allow the configs cfg.main, cfg.talk, cfg.other, cfg.subjectns, cfg.demospace and cfg.page to get either an array of strings (and also just a string, for backwards compatibility) which would be aliases for the parameter name? This would help porting this module to other wikis, since it is common to have both the English parameter name and the translated parameter name being accepted (as in {{{Localized parameter|{{{English parameter|}}}}}}), or even two different translations for the same parameter. Helder.wiki 19:23, 6 March 2014 (UTC)Reply

Possible performance issues with Module:Namespace_detect

edit
Moved here from my user talk page. — Mr. Stradivarius ♪ talk ♪ 09:37, 20 March 2014 (UTC)Reply

Hi. Sorry for the bother. I just wanted to bring this matter to your attention.

Module:Namespace_detect appears to be at the root of some performance issues. For example, I believe it is because of Module:Namespace_detect that it takes about 30 seconds to preview an edit on List_of_Unicode_characters

List_of_Unicode_characters is a large page. However, I believe most of the time is spent by several of the Unicode charts calling Module:Namespace_detect indirectly through Template:Lang.

For example:

	for nsid, ns in pairs(mw.site.subjectNamespaces) do
		if nsid ~= 0 then -- Exclude main namespace.
			local nsname = mw.ustring.lower(ns.name)
			local canonicalName = mw.ustring.lower(ns.canonicalName)
			mappings[nsname] = {nsname}
			if canonicalName ~= nsname then
				table.insert(mappings[nsname], canonicalName)
			end
			for _, alias in ipairs(ns.aliases) do
				table.insert(mappings[nsname], mw.ustring.lower(alias))
			end
		end
	end
  • Note that this code loops 15 times (as there are 15 subject namespaces)
  • Each loop will call mw.ustring.lower 2 times (for simplicity's sake, we can ignore the mw.ustring.lower(alias))
  • So, for Template:Unicode_chart_Kannada, there will be 2,580 calls to mw.ustring.lower (86 * 15 * 2)
  • There are several more Unicode charts on the page that call lang: for example Template:Unicode chart Myanmar
  • All told, there are approximately 100,000 calls to mw.ustring.lower from one edit to this page
  • Although mw.ustring.lower and Language:lc are relatively simple procs, there are overhead costs with going back and forth between Lua / PHP. Even at 3,300 calls per second, it will take the aforementioned 30 seconds to preview an edit

I also have reason to believe that variations of this situation are repeated elsewhere on other pages. For context, I am the developer of XOWA (an offline wiki app), and I do monthly parses of English Wikipedia's 4.4 million mainspace articles. Module:Namespace_detect is flagged as one of the most time-consuming #invokes. I never understood why, until tonight.

As a recommendation, you could move getParamMappings to a new Module:Namespace_detect/Data page and use "return mw.loadData('Module:Namespace_detect/Data')". This change would be straightforward and improve performance, though you would have to change the Module to preserve the portable cfg table.

Let me know if you need more info.

Thanks.

gnosygnu 08:28, 20 March 2014 (UTC)Reply

This should be at some template/module talk page, with just a link here. Perhaps Mr.S would like to move all this? I have not looked at the issue, but the requirement to detect the namespace multiple times should be removed (obviously!). After ten seconds research, I'm guessing that {{Lang}} detects the namespace each time it is invoked in order to decide whether to add a category. That's way over-the-top on a page like List of Unicode characters that is apparently using {{Lang}} many times. A clever template coder should add a parameter which can be used to omit all that overhead. Johnuniq (talk) 09:27, 20 March 2014 (UTC)Reply
(edit conflict) Hi Gnosygnu, and thank you for the detailed analysis! I wrote Module:Namespace detect back when I wasn't that experienced with Lua, and I was actually thinking that I should go back and have a look at it now that I have a few more modules under my belt. While it (maybe, hopefully) should be faster than the old wikitext version, we can obviously make things a lot better. I'll have a look at it, implement your suggestions, and see if there are any other changes that might need making. Best — Mr. Stradivarius ♪ talk ♪ 09:34, 20 March 2014 (UTC)Reply
Sorry about the wrong placement, and thanks for moving it.gnosygnu 03:08, 21 March 2014 (UTC)Reply

Please replace the contents of the page with this. This will cause getParamMappings to only run once per page, rather than once per #invoke, per the above discussion. Jackmcbarn (talk) 16:20, 20 March 2014 (UTC)Reply

Hold on a second - I have some other changes I'd like to make to the code before we put this up live. At the moment we are expanding all the arguments we are passed, for example. It would be better to only expand the ones necessary for the namespace the page is called from. And I'd like to implement Helder's suggestion above as well. — Mr. Stradivarius ♪ talk ♪ 02:32, 21 March 2014 (UTC)Reply
Thanks for making the proposed changes. I made two changes now:
  • I re-added p.getParamMappings since it is public and other Modules call it. For example, Module:Category_handler has this: local mappings = nsDetect.getParamMappings()
  • I wasn't sure if the function p.table(frame) was supposed to change. I reverted it back to the current version. Feel free to revert back your version if the change was deliberate. It looks like this was deliberate. My apologies.
Otherwise, it tested fine in a limited test on my machine.
FWIW: my initial estimate of 100,000 calls looks like it is closer to 85,000 calls. This means there were roughly 2,833 calls to Module:Namespace_detect. (85,000 / (15 * 2))
The page still does roughly 27,000 calls b/c of the matchesBlacklist function. This correlates with the above figure: 27,000 calls / 9 blacklisted terms => 3,000 calls to Module:Category_handler. I don't know if this can be fixed easily, as matchesBlacklist can't be made a static variable and no_cat = false is the default
At any rate, I'm hoping this should drop List_of_Unicode_characters to somewhere between 6 and 8 seconds to render (vs approximately 30 now)gnosygnu 03:08, 21 March 2014 (UTC)Reply
You could probably use mw:Extension:TemplateSandbox to test it. (Also, I'm sure TemplateSandbox's documentation could use some love if you're willing to help.) Anomie 12:26, 22 March 2014 (UTC)Reply
Thanks @Anomie: for the link. I didn't know about this Extension before.
I tried it now, and got mixed results. I'm still getting the same profile render time (25 seconds), but the Lua Profile table is clearly different.
List_of_Unicode_characters
Lua Profile:
    Scribunto_LuaSandboxCallback::getAllExpandedArguments           4540 ms       49.3%
    Scribunto_LuaSandboxCallback::lc                                1660 ms       18.0%
    Scribunto_LuaSandboxCallback::match                             1140 ms       12.4%
    recursiveClone <mw.lua:109>                                      620 ms        6.7%
    Scribunto_LuaSandboxCallback::gsub                               360 ms        3.9%
    type                                                             100 ms        1.1%
    (for generator)                                                  100 ms        1.1%
    <mw.language.lua:87>                                              80 ms        0.9%
    getParamMappings <Module:Namespace_detect:69>                     80 ms        0.9%
    Scribunto_LuaSandboxCallback::loadPackage                         60 ms        0.7%
    [others]                                                         460 ms        5.0%
Sandbox
Lua Profile:
    Scribunto_LuaSandboxCallback::getAllExpandedArguments           3420 ms       57.0%
    Scribunto_LuaSandboxCallback::match                              760 ms       12.7%
    recursiveClone <mw.lua:109>                                      540 ms        9.0%
    Scribunto_LuaSandboxCallback::gsub                               320 ms        5.3%
    dataWrapper <mw.lua:698>                                         140 ms        2.3%
    (for generator)                                                  140 ms        2.3%
    Scribunto_LuaSandboxCallback::getExpandedArgument                140 ms        2.3%
    type                                                             120 ms        2.0%
    Scribunto_LuaSandboxCallback::lc                                  60 ms        1.0%
    <mw.title.lua:50>                                                 60 ms        1.0%
    [others]                                                         300 ms        5.0%
Clearly the sandbox is picking up the new changes (lc falls from 18% to 1%). I'm pretty sure I'm using the correct prefix: User:Gnosygnu/sandbox. If I use an incorrect prefix, such as User:Gnosygnu/sandbox_invalid, I get the same Lua Profile as the current page.
I'll play with this some more later, but I just wanted to let you know. After further investigation, I'm beginning to think that the multiple lc Scribunto calls have a more dramatic effect in LuaStandalone than LuaSandbox -- presumably because LuaStandalone serializes all messages back and forth. As such, Special:TemplateSandbox may very well be correct. The new change will reduce the number of lc calls, but won't have any real meaningful effect (maybe 1 second faster). Any performance issues with the current page might be due to regular Template expansion. gnosygnu 16:47, 22 March 2014 (UTC)Reply

@Gnosygnu and Jackmcbarn: I've finished making my changes to Module:Namespace detect/sandbox:

  • I've moved the configuration to a separate page, Module:Namespace detect/config. This is to try make the distinction between code and configuration clearer.
  • I have also implemented Helder's suggestion - now the parameter config values can be specified either as an array of strings or just a string.
  • All config values have now been made optional.
  • The p.main function now uses Module:Arguments, which means that arguments are now only fetched when they are needed. (Before they were all expanded before being passed to p._main.)
  • I've also simplified the p._main code to avoid an unnecessary for loop. As part of this, I have replaced the tail call from p.main to p._main with a retval. This is to make more explicit the fact that on finding no matches at all, the module should return nil for other Lua modules and the blank string for #invoke. If there are any objections to this, we can always revert it back to an implicit return value.
  • Finally, I have revamped the p.table code so that the namespaces are now displayed in order, and so that |talk=yes actually works.

Let me know what you think of the changes, and if everything looks good we can update the main module. — Mr. Stradivarius ♪ talk ♪ 13:31, 22 March 2014 (UTC)Reply

@Mr. Stradivarius: Thanks for the changes. I think they're fine. I've done a modified test on my machine, as well as the TemplateSandbox (See my comment above). "lc" is no longer a significant portion of execution time. I'm still seeing the same number of calls to "match", but I think the calling templates need to be changed. FWIW, my own tests (using XOWA) show no measurable performance difference with the latest changes. This may be because I'm using LuaStandalone vs LuaSandbox. (LuaSandbox is not possible in a Windows / Java environment). There may be other apples to oranges issues as well, though I still believe that "lc" is a significant performance cost gnosygnu 15:44, 22 March 2014 (UTC)Reply
Yes, I expected that my latest changes wouldn't increase performance for most pages. For pages using {{namespace detect}} directly, the switch to Module:Arguments may provide a significant boost, depending on what wikitext would have otherwise been expanded. But most transclusions of Module:Namespace detect come through Module:Category handler, and those uses aren't affected by that change. Your original suggestion is definitely the big performance-saver. For further performance savings we might want to consider removing the page and demospace parameters, as that would enable caching of the namespace data with mw.loadData. If we did the same thing with Module:Category handler, we would also be able to cache the blacklist checks. The downside to that would be that the module test cases would no longer work, but I think that the performance benefits might outweigh that disadvantage. Another performance saving could be made by changing Module:Category handler to only expand arguments when necessary. That could be done by using Module:Arguments, and by using metatables to pass arguments to Module:Namespace detect by proxy. — Mr. Stradivarius ♪ talk ♪ 16:33, 22 March 2014 (UTC)Reply
Okay. Thanks for the explanation. I'm beginning to think that even the "lc" calls won't make a noticeable difference because of LuaSandbox / LuaStandalone differences. (see my comment above). If so, then the real performance problems may be non-Module related, though I'm at a loss to suggest what. (Templates?) gnosygnu 16:47, 22 March 2014 (UTC)Reply
@Mr. Stradivarius: I don't like local variables to avoid table lookups. I think that makes the code more confusing, and I don't think it really helps much (if at all) with performance. Other than that, looks good. Jackmcbarn (talk) 00:51, 23 March 2014 (UTC)Reply
If this was a less performance-critical module I would probably agree with you, and I admit that I'm guilty of overusing this technique. However, if we're talking about tens of thousands of calls to a loop every time a page is parsed, I think it would make enough of a difference to be worth doing. I'm basing this on Lua Performance Tips by Roberto Ierusalimschy, which says that using local variables is 30% faster than using table lookups. Of course, 30% faster than hardly anything is still hardly anything, so it might make sense to change some of those local variables back to table lookups to make the code easier to read. In particular I would guess that using them in the /data page is not important, since that is cached with mw.loadData now. But it would make sense to use local variables for the functions that are getting called multiple times per #invoke. — Mr. Stradivarius ♪ talk ♪ 11:06, 23 March 2014 (UTC)Reply
@Gnosygnu: The real performance problems are definitely not module-related. When I previewed List of Unicode characters just now with the old version of Module:Namespace detect, it took 40 seconds to parse, but the Lua time usage was only reported at about 4.3 seconds. So the other 36 seconds must be from something other than Lua. Templates are a very likely candidate - the current post-expand include size is 1944444/2048000 bytes, and I count 207 different templates and modules used - but I have also seen performance issues like this on pages containing lots of images. This would be a good question to ask at WP:VPT, I think. — Mr. Stradivarius ♪ talk ♪ 11:06, 23 March 2014 (UTC)Reply
@Mr. Stradivarius: Yeah, sorry about that. I should've checked the Profile at the start. On my machine, the lc calls to Scribunto has a much larger impact, but I'm beginning to think that this is because
  • I only partially reconstructed List of Unicode characters and most of the templates were not brought over
  • I'm using LuaStandalone because I'm on Windows. LuaStandalone serializes all messages back and forth from Lua to PHP. (In contrast, Wikipedia is using LuaSandbox which hooks PHP directly to Lua)
  • There really are 85,000+ calls to lc, and I saw a significant difference by skipping this section.
In the future, I'll check the Parser Output more closely. I'll also try to set up a full enwiki environment on a machine here so I can get a better comparison.
Thanks for making the changes . I'm hoping they should still help in some way. gnosygnu 21:07, 23 March 2014 (UTC)Reply
Nothing to apologise about - your concerns were perfectly valid, and I want this module to run well on all wikis, not just the English Wikipedia. I've updated the main module now with the sandbox version, so we will now see if our efforts have paid off. :) Let me know if you spot anything strange happening. Best — Mr. Stradivarius ♪ talk ♪ 11:58, 24 March 2014 (UTC)Reply
Yeah, it looks like barely a second of difference, if even. Oh well. (*sigh*)
Thanks again for the changes. gnosygnu 02:43, 25 March 2014 (UTC)Reply

Module:Namespace detect/data

edit

Moved from my talk page. — Mr. Stradivarius ♪ talk ♪ 14:30, 25 March 2014 (UTC)Reply

Hello. Is there any good reason to include local cfg = require('Module:Namespace detect/config') on enwiki. The config file is "empty". Christian75 (talk) 09:13, 25 March 2014 (UTC)Reply

The module needs to check the /config page to see if there are any configuration differences that it needs to process. This allows us to use the same basic code on enwiki as on other wikis using different languages. If those differences weren't in the /config file, they would have to be put in the main module code, which would mean we would have to produce different modules for all the different languages we wanted to localise to. It's a labour-saving device - if we didn't do the localisation work here, it would fall to editors on the different wikis to directly alter the module code. I would guess that this way of doing things is much easier for editors working in localisation to understand. If you're worried about performance, the /data page is loaded with mw.loadData, which means the config table is cached per page, so the overhead in loading it is minimal. — Mr. Stradivarius ♪ talk ♪ 14:38, 25 March 2014 (UTC)Reply
or we could just change:
local cfg = require('Module:Namespace detect/config')

with

-- local cfg = require('Module:Namespace detect/config')  -- enable this line and disable the next line if you want to localize the output
  local cfg = {}

Its 9,900,000 transclusions of nothing at enwiki. Christian75 (talk) 10:35, 26 March 2014 (UTC)Reply

We could, but then that would mean that users would have to edit the module code to configure the module, rather than just the configuration file. I'd rather only do this if there is a real problem with loading the config file on enwiki, and I don't think the module having a high transclusion count is really enough of a reason. — Mr. Stradivarius ♪ talk ♪ 11:01, 26 March 2014 (UTC)Reply
If it were a problem, it would be possible to have a "no config" argument passed to the module, and if that were set, it would skip loading the config. However, I agree with Mr. Stradivarius—the overhead is probably very small, and there is no evidence that omitting the config would achieve anything. Even if requiring a module took a significant time (it doesn't), the caching of the 9.9 million pages means that the require overhead would still be small. What I find interesting is that the main module uses "loadData" on the data module, and the latter uses "require" on the config module. I wonder what would happen if something prohibited by loadData were in config. I'll have to run a test one day. Johnuniq (talk) 11:27, 26 March 2014 (UTC)Reply
mw.loadData scans the table it is passed to make sure it hasn't received any invalid data. (Here's the validation code.) So the invalid data would get through to the /data page fine, but when the table is loaded from /data to the main module it would cause an error. But yes, try it and see for yourself. :) — Mr. Stradivarius ♪ talk ♪ 12:39, 26 March 2014 (UTC)Reply
Also, checking a "no config" argument may have a bigger overhead than loading the /config module, especially seeing as an argument would be checked once per #invoke and not once per page. I'd be interested to see the numbers for that if anyone feels like setting up a test. — Mr. Stradivarius ♪ talk ♪ 12:49, 26 March 2014 (UTC)Reply
I don't see harm in loading a "blank" config. I think it would just overcomplicate it to add any such checks. Jackmcbarn (talk) 21:25, 27 March 2014 (UTC)Reply

Script error in WikiProject banner on Talk:Air Miles

edit

The WikiProject banner at the top of Talk:Air Miles contains the following:

  • This Script error does not require a rating on the project's quality scale.
  • This Script error is supported by the airline project.

The error message in both cases is: Lua error in Module:Namespace_detect at line 32: bad argument #1 to 'ipairs' (table expected, got nil). @Mr. Stradivarius: ping. Jackmcbarn (talk) 21:06, 13 April 2014 (UTC)Reply

It's gone from that page now, but it's since appeared on Talk:Illicit major and Talk:Río Negro (newspaper). Jackmcbarn (talk) 03:06, 14 April 2014 (UTC)Reply
Possibly the error is cached after the last update to the module? Some pages may have produced script errors in the few seconds between me updating the /config module and me updating the /data module. Given the number of transclusions this module has, it's quite likely that some might have stuck around since then. If there are any errors that still appear after purging the page, let me know. — Mr. Stradivarius ♪ talk ♪ 04:23, 14 April 2014 (UTC)Reply

Protected edit request on 14 May 2014

edit

77.30.122.165 (talk) 08:20, 14 May 2014 (UTC)Reply

  Not done: it's not clear what changes you want made. Please mention the specific changes in a "change X to Y" format. — Mr. Stradivarius ♪ talk ♪ 08:36, 14 May 2014 (UTC)Reply

Issue when moving to a local Wiki

edit

Not asking for a change, as it could potentially break stuff here on Wikipedia (no idea). Just for information.

Took a while to figure out why it didn't work on my local Wiki. But think I solved it after changing the following lines in Module:Namespace_detect/data. Starting at line 64

	for nsid, ns in pairs(mw.site.subjectNamespaces) do
		if nsid ~= 0 then -- Exclude main namespace.
			local nsname = ns.name
			local lcNsname = mw.ustring.lower(ns.name)
			local canonicalName = mw.ustring.lower(ns.canonicalName)

			-- For custom namespaces (custom Wiki) the casing is important
			-- and needs to be preserved
			if canonicalName == lcNsname then
				nsname = lcNsname
			end
			mappings[nsname] = {nsname}

			if canonicalName ~= nsname then
				table.insert(mappings[nsname], canonicalName)
			end
			for _, alias in ipairs(ns.aliases) do
				table.insert(mappings[nsname], mw.ustring.lower(alias))
			end
		end
	end

Rinellie (talk) 18:59, 30 March 2016 (UTC)Reply