Module talk:DecodeEncode
Module:DecodeEncode is permanently protected from editing because it is a heavily used or highly visible module. Substantial changes should first be proposed and discussed here on this page. If the proposal is uncontroversial or has been discussed and is supported by consensus, editors may use {{edit template-protected}} to notify an administrator or template editor to make the requested edit.
Bug report: bad decoding of U+03B5 ε (epsilon)
About U+03B5 ε GREEK SMALL LETTER EPSILON (ε ε)
- Issue: after resolving HTML entity ε by mw.text.decode(), the plain character is not found by mw.ustring.gsub(). No issue with alternative HTML entity ε. ε good, ε bad.
- Report limitations: The original report and bug reproduction are at enwiki Module talk:DecodeEncode, where en:module:DecodeEncode and en:module:String are used live. At Phabricator, pseudocode may be used and some "results" may be hardcoded. In-text the escape &amp; is used, not in-function. Lua patterns are not used ("no %").
- To reproduce:
- 1. Create research string: Xε1Xε2X (shows live and unedited as: Xε1Xε2X)
- 2. Render the string by decode() (as inner function)
- 3. Then, on the rendered result, use gsub() to replace plain character ε → E (as outer function):
mw.ustring.gsub( s=( mw.text.decode( s=Xε1Xε2X, decodeNamedEntities=true ) ), pattern=ε, repl=E ) [is pseudo-code, see note. 21:10, 7 February 2023 (UTC)]
- 4. Result3 (s&r pattern uses ε from Xε1X): XE1XE2X
- 5. Result4 (s&r pattern uses ε from Xε2X): XE1XE2X
- Expected: XE1XE2X (only one character ε exists)
- Note 21:10, 7 February 2023 (UTC): This step 3 is in pseudo-code. To reproduce, use Lua modules module:String and Module:DecodeEncode:
{{#invoke:String|replace|source={{#invoke:DecodeEncode|decode|s=Xε1Xε2X}}|pattern=ε|replace=E|plain=true}}
- → XE1XE2X
- -DePiep (talk) 21:10, 7 February 2023 (UTC)
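When two characters look identical on screen but a plain search matches only one of them, listing the code points of the decoded string shows directly what decode() returned. The following is a minimal sketch in plain Lua, assuming the input is valid UTF-8. The helper name codepoints is hypothetical and not part of Module:String or Module:DecodeEncode; inside Scribunto, mw.ustring.codepoint() should report the same numbers.

```lua
-- Hypothetical diagnostic helper (not part of any module discussed here).
-- Returns the code points of a UTF-8 string as "U+XXXX U+XXXX ...".
-- Assumes valid UTF-8; malformed input is not handled.
local function codepoints(s)
  local out = {}
  local i = 1
  while i <= #s do
    local b = s:byte(i)
    local cp, n                      -- n = number of continuation bytes
    if b < 0x80 then
      cp, n = b, 0                   -- ASCII
    elseif b >= 0xF0 then
      cp, n = b - 0xF0, 3            -- 4-byte sequence
    elseif b >= 0xE0 then
      cp, n = b - 0xE0, 2            -- 3-byte sequence
    else
      cp, n = b - 0xC0, 1            -- 2-byte sequence
    end
    for j = 1, n do
      -- each continuation byte contributes its low 6 bits
      cp = cp * 64 + (s:byte(i + j) - 0x80)
    end
    out[#out + 1] = string.format("U+%04X", cp)
    i = i + n + 1
  end
  return table.concat(out, " ")
end

-- "\206\181" is the UTF-8 encoding of U+03B5 GREEK SMALL LETTER EPSILON.
print(codepoints("X" .. "\206\181" .. "1"))  -- U+0058 U+03B5 U+0031
```

Running the helper on the output of decode() for each entity spelling, and on the pattern passed to gsub(), would show whether the two sides really contain the same character or whether one of them is still an undecoded entity.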
Workaround A, ad hoc
Workaround A, ad hoc: add an innermost function that first replaces ε → ε in the research string:
- A1: {{#invoke:String|replace|source={{#invoke:DecodeEncode|decode|s={{#invoke:String|replace|source=Xε1Xε2X|pattern=ε|replace=ε|plain=true}}}}|pattern=ε|replace=E|plain=true}} → XE1XE2X
Workaround B, in module (THIN SPACE example)
Workaround B: early in :en:module:DecodeEncode, replace ε → ε
About THIN SPACE: it looks like character U+2009 THIN SPACE (   ) has a similar issue.   good,   bad.
Currently in code:
function p._decode( s, subset_only )
    local ret = nil;
    s = mw.ustring.gsub( s, ' ', ' ' ) -- Workaround for bug:   gets properly decoded in decode, but   doesn't.
    ret = mw.text.decode( s, not subset_only )
    return ret
end
In en:module:DecodeEncode/sandbox, I have coded a similar handling of EPSILON:
function p._decode( s, subset_only )
    local ret = nil;
    -- U+2009 THIN SPACE: workaround for bug: HTML entity   is decoded incorrect. Entity   gets decoded properly
    s = mw.ustring.gsub( s, ' ', ' ' )
    -- U+03B5 ε GREEK SMALL LETTER EPSILON: workaround for bug (phab:T328840): HTML entity ε is decoded incorrect for gsub(). Entity ε gets decoded properly
    s = mw.ustring.gsub( s, 'ε', 'ε' )
    ret = mw.text.decode( s, not subset_only )
    return ret
end
- /sandbox tests:
- B. {{#invoke:String|replace|source={{#invoke:DecodeEncode/sandbox|decode|s=Xε1Xε2X}}|pattern=ε|replace=E|plain=true}}
- B1. ResultB1 (s&r pattern uses ε from Xε1X): XE1XE2X
- B2. ResultB2 (s&r pattern uses ε from Xε2X): XE1XE2X
I propose to edit the module along these lines.
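If more entities turn out to need the same treatment, the one-line-per-entity approach of Workaround B could be generalised into a table of aliases. The following is a rough sketch in plain Lua, not the code in the module or its sandbox. The helper name normalise_entities and the ALIASES table are hypothetical, and the direction of the mapping shown (epsi → epsilon) is an assumption for illustration that should be checked against what decode() actually accepts.

```lua
-- Hypothetical helper: rewrite named entities to a preferred spelling
-- before the string is handed to a decoder.
-- `aliases` maps an entity name (without '&' and ';') to the preferred name.
local function normalise_entities(s, aliases)
  -- Plain string.gsub is enough: '&', ';' and entity names are ASCII,
  -- and ASCII bytes never occur inside a multi-byte UTF-8 sequence.
  local result = s:gsub("&(%w+);", function(name)
    local preferred = aliases[name]
    if preferred then
      return "&" .. preferred .. ";"
    end
    -- No return value: gsub keeps the original match unchanged.
  end)
  return result
end

-- Assumed direction, for illustration only.
local ALIASES = { epsi = "epsilon" }

print(normalise_entities("X&epsilon;1X&epsi;2X", ALIASES))
-- X&epsilon;1X&epsilon;2X
```

A table keeps the list of affected entities in one place, so adding or removing an alias does not require another gsub() call. Numeric references such as &#949; contain '#' and are left untouched by the pattern.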
Workaround C (mw, Lua)
Changes in mw, Lua: I have no idea.
- I propose to consider module editing along § Workaround B. -DePiep (talk) 12:26, 4 February 2023 (UTC)
testcases EPSILON
- Original failure, now solved (no longer showing):
- (hardcoded explanation here): in cell marked , the result showed as "XE1Xε2X". That is: wikitext input "ε" was not recognised & replaced. -DePiep (talk) 07:49, 19 February 2023 (UTC)
EPSILON ε ⟨ε⟩ error & fix proposal (16 Feb 2023)
1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|
id | entity code | plain | mod:..decode(&entity;) | replace(decode(..)) with E, pattern=hardcoded ⟨ε⟩ from plain; (s=&entity;) (s=checkstring) | mod:..decode/sandbox |
checkstring | Xε1Xε2X | >Xε1Xε2X< | >Xε1Xε2X< | | |
EPSI | ε | >ε< | >ε< | E XE1XE2X | E XE1XE2X |
EPSILON | ε | >ε< | >ε< | E XE1XE2X | E XE1XE2X |
- See § Workaround B, in module (THIN SPACE example) for code change;
- The fix is similar to the one already in place for U+2009 THIN SPACE ( ,  ), though the underlying bug may be different for THIN SPACE.
- Phabricator T328840 did not gain traction. A real fix would be at mw level, not in this module.
Template-protected edit request on 16 February 2023
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
- Please copy all code from module:DecodeEncode/sandbox into module:DecodeEncode (diff)
- Issue: bad decoding of HTML entity ε
- re U+03B5 ε GREEK SMALL LETTER EPSILON (ε, ε)
- Change: fix by replacing with entity ε before applying decode(). See § Workaround B for code diff & backgrounds; minor comment change
- Discussion: (1) reported at T328840, no responses (mw-level); (2) bug report here not challenged
- Testcases: See § testcases EPSILON.
- DePiep (talk) 06:49, 16 February 2023 (UTC)
NBSP behaviour
Leaving this note here.
About NBSP, U+00A0 NO-BREAK SPACE ( ,  ). With input I am experiencing problems reminiscent of § epsilon (T328840, now resolved).
When nested like (replace|s=(decode|s=AB YZ)|replace=AB_YZ), it returns breaking code (breaking when used in or with HTML/CSS code like span, sup, class).
No time to build the reproduction/test, so have to leave it for now. Not reported on phab. DePiep (talk) 07:27, 20 February 2023 (UTC)
Template-protected edit request on 21 March 2023
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
Please replace all code in Module:DecodeEncode with the code from module:DecodeEncode/sandbox (compare).
Change: apply require('strict'), and declare the function explicitly as local. DePiep (talk) 14:34, 21 March 2023 (UTC)
|answered=pause: needs some extra eyes first. Will invite. -DePiep (talk) 14:36, 21 March 2023 (UTC)
- Invitation is out. -DePiep (talk) 14:49, 21 March 2023 (UTC)
- Upd: Gonnym has made large improvements, so the sandboxdiff is large. I do not see strict-related changes. DePiep (talk) 21:31, 21 March 2023 (UTC)
- The changes are good and no globals remain. The two mw.ustring could be string. Johnuniq (talk) 06:40, 22 March 2023 (UTC)
- thx. As said, please someone with trust perform ER because me editing/commenting in between does not help. DePiep (talk) 08:18, 22 March 2023 (UTC)
- Set |answered=no after two positive critiques. Also, I met no error while developing with this sandbox. -DePiep (talk) 09:00, 22 March 2023 (UTC)
- Done — Martin (MSGJ · talk) 18:35, 22 March 2023 (UTC)
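For readers unfamiliar with what the strict change in this request guards against, the following plain-Lua sketch imitates the idea on an ordinary table rather than on the real global environment. It is only a stand-in under stated assumptions: make_strict is a hypothetical helper, and the actual require('strict') in Scribunto may differ in its error messages and details.

```lua
-- Hypothetical stand-in for strict mode, applied to a plain table.
-- Any read or write of a key that was not declared beforehand raises an error.
local function make_strict(env)
  return setmetatable(env, {
    __newindex = function(_, name)
      error("tried to write undeclared global '" .. tostring(name) .. "'", 2)
    end,
    __index = function(_, name)
      error("tried to read undeclared global '" .. tostring(name) .. "'", 2)
    end,
  })
end

local env = make_strict({})

-- An accidental global (for example a forgotten `local`) is rejected.
local ok_write = pcall(function() env.ret = "decoded" end)

-- Names declared up front, here with rawset, remain usable.
rawset(env, "p", {})
local ok_read = pcall(function() return env.p end)

print(ok_write, ok_read)  -- false   true
```

This is why declaring helper functions and variables with local matters once strict mode is on: anything left as an implicit global would raise an error instead of silently working.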