Module talk:Emoji


I feel like we can make this better

Hi everyone:

I have worked on Lua modules before, and after stumbling upon this one, I feel we can make it better and make it easier for Wikipedians to get emoji codes on user and talk pages. I am calling on users who have worked on this module before, specifically @RexxS, @Qzekrom and @Izno, but really anyone who wants to help is welcome. If you want to help, please reply and we can begin working on a plan!

Additionally, I started another module, Module:WEmoji, and may be able to use this module for WEmoji.

More technical details: We could make it require less typing, e.g. by adding it to a template (see Template:Wikipedia ads for an example). For example, {{Emoji Smiley}}. Or, to use mappings, {{Emoji 1f603}}.
I will wait 2 months, and if I do not receive a reply, I will begin working on this by myself.
Urban Versis 32 (talk) 13:32, 22 May 2022 (UTC)

Where do these names come from?

If you look at U+1F507 🔇 SPEAKER WITH CANCELLATION STROKE (linked here:🔇), you will see it is called Muted Speaker. That name comes from Unicode's CLDR short name. However,

{{#invoke:emoji|emoname|1f507}} → mute

So where do these names come from? Please document. Dpleibovitz (talk) 04:20, 9 March 2024 (UTC)

Ok, I did some hunting around. The latest charts can be found at
And it shows *muted speaker | mute | quiet | silent | speaker
So there can be many aliases for a name. Not sure that the module works with all of these. Will it always return the first alias (and never the full name)? Dpleibovitz (talk) 04:49, 9 March 2024 (UTC)
I don't know where that (rather limited) list came from. The editor who added it is no longer with us so we can't ask.
The annotations chart you link doesn't seem to me to be the definitive list of emoji. Perhaps a better list is https://unicode.org/Public/emoji/latest/emoji-test.txt (version 15.1 at this writing). This appears to list all current emoji with their proper names.
I have hacked a Lua module in my sandbox that reads a local copy of the emoji-test.txt file to create a replacement for Module:Emoji/data. Some items in Module:Emoji/data are not in the sandbox list under the same name (wink, grin, and 8ball, from the examples in the module doc, are some).
Module:Emoji/data is only used in one mainspace article (Irony punctuation § Emoji and Emoticons) so replacing the data table with the new one from my sandbox requires only that article to be fixed (because the abbreviated names rolling_eyes, stuck_out_tongue, and upside_down should be face_with_rolling_eyes, face_with_tongue, and upside-down_face).
So, what to do? Nothing? Replace the data in Module:Emoji/data with data derived from emoji-test.txt? With data derived from some other source?
Trappist the monk (talk) 15:32, 10 March 2024 (UTC)
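A minimal sketch of the kind of extraction described above, in plain Lua. It assumes the `code points ; status # emoji Eversion name` line layout of emoji-test.txt, keeps only fully-qualified entries, and joins words with underscores to mimic the existing data table. The function name `parse_emoji_test` and the sample lines are illustrative assumptions, not the sandbox code.

```lua
-- Sketch only: build name -> code and code -> name tables from text in the
-- emoji-test.txt layout. Names here are illustrative, not the sandbox code.
local function parse_emoji_test(text)
    local name_to_code, code_to_name = {}, {}
    for line in text:gmatch("[^\r\n]+") do
        -- Data lines look like:
        --   1F507 ; fully-qualified # <emoji> E1.0 muted speaker
        -- Comment lines start with "#" and do not match the pattern.
        local codes, status, name =
            line:match("^(%x[%x ]-)%s*;%s*([%a%-]+)%s*#%s*%S+%s+E[%d%.]+%s+(.+)$")
        if codes and status == "fully-qualified" then
            -- Parentheses drop gsub's second return value (the match count).
            local code = (codes:lower():gsub("%s+", "-"))
            local key = (name:lower():gsub("%s+", "_"))
            name_to_code[key] = code
            code_to_name[code] = key
        end
    end
    return name_to_code, code_to_name
end

-- Sample lines written in the file's layout (check the real file for the
-- exact contents; these are here only to exercise the parser).
local sample = table.concat({
    "# group: Symbols",
    "1F507 ; fully-qualified # 🔇 E1.0 muted speaker",
    "23EC ; fully-qualified # ⏬ E0.6 fast down button",
    "263A FE0F ; fully-qualified # ☺️ E0.6 smiling face",
    "263A ; unqualified # ☺ E0.6 smiling face",
}, "\n")

local by_name, by_code = parse_emoji_test(sample)
print(by_name["muted_speaker"])  -- 1f507
print(by_code["23ec"])           -- fast_down_button
```

Inside Scribunto the text would have to come from a wiki page or data module rather than a local file, so this is closer to an offline generator than to something Module:Emoji would run on each invocation.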
I think a computer readable form of the specs can be found at https://github.com/unicode-org/cldr/blob/latest/common/annotations/en.xml
I don't understand the format completely. Each annotation is nearly duplicated, with a type="tts" copy that identifies the short name, and it does have all the keywords. However, each locale has its own separate XML file, e.g., https://github.com/unicode-org/cldr/blob/latest/common/annotations/en_CA.xml The explanation of the XML can be found at https://unicode.org/reports/tr35/tr35-general.html#Annotations
A "proper" Module:Emoji might need to parse all the English-language locales to build up a single data module, if we figure out what we want this module for. (I have my personal suggestions.) Dpleibovitz (talk) 18:12, 11 March 2024 (UTC)
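To make the format described above concrete, here is a rough sketch in plain Lua that separates the type="tts" short name from the pipe-separated keywords. It uses Lua patterns rather than a real XML parser, the function name `parse_annotations` and its return shape are assumptions, and the sample is written in the described layout, so its keyword list may not match the real en.xml.

```lua
-- Sketch only: extract short names (type="tts") and keywords from CLDR
-- annotation XML. A production tool should use a proper XML parser.
local function parse_annotations(xml)
    local entries = {}
    for attrs, body in xml:gmatch("<annotation%s+([^>]-)>([^<]*)</annotation>") do
        local cp = attrs:match('cp="([^"]+)"')
        if cp then
            local entry = entries[cp] or { keywords = {} }
            entries[cp] = entry
            if attrs:find('type="tts"', 1, true) then
                -- The tts annotation carries the short name.
                entry.short_name = body
            else
                -- The other annotation carries "a | b | c" keywords.
                for keyword in body:gmatch("[^|]+") do
                    local trimmed = keyword:match("^%s*(.-)%s*$")
                    entry.keywords[#entry.keywords + 1] = trimmed
                end
            end
        end
    end
    return entries
end

-- Illustrative sample, not copied from the real file.
local sample = [[
<annotations>
  <annotation cp="🔇">mute | quiet | silent | speaker</annotation>
  <annotation cp="🔇" type="tts">muted speaker</annotation>
</annotations>
]]

local entries = parse_annotations(sample)
print(entries["🔇"].short_name)                    -- muted speaker
print(table.concat(entries["🔇"].keywords, ", "))  -- mute, quiet, silent, speaker
```

Running the same function over en.xml and then over a locale file such as en_CA.xml would give one table per locale, which could then be merged into a single data module.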

So we seem to have two sources of data, but yours (which is newer and more extensive) has neither CLDR keywords nor locale information, while https://www.unicode.org/cldr/charts/44/annotations/americas.html does. For example, "trade mark" is the general short name (with "trademark" a keyword), but in the en-CA locale, these two are reversed. As you say, this module is currently used in one article, but I have suggestions as to use in thousands more.

  • Before beginning, some quick background on me. I have my own merged fork of Wikipedia, Wiktionary, Wikiquote, etc. that starts out with zero content. As I add the pages I desire, I add all the categories, templates, and modules that are needed to support them. Typically none of this content is modified much. However, Wikipedia articles are merged with their disambiguations, as well as the Wiktionary articles of both initial cases. Many Wikipedia redirects are replaced by their Wiktionary articles. All Wiktionary templates, modules and categories have a "Wikt/" prefix that is stripped out via DISPLAYNAME to prevent clashes. I wish there were MediaWiki support for a USING keyword! If you want to discuss my work, let's do that on your or my talk page, or offline via email.
  • In Wikipedia, I have modified {{r from emoji}} (on my fork) to take all the CLDR keywords as extra parameters. The first parameter is the short name, which is currently manually specified but could be automated with this module. I have also added a |locale=en-CA parameter. In my fork, {{r from emoji}} is called twice for ™️. Unfortunately, these currently use historical Unicode values in title case. At some point, Unicode changed all names to be completely in lower case, so all invocations of {{r from emoji}} should be modified. In my fork, I have categories for every keyword and short name, e.g., category:aKeyword (emoji keyword) and category:aShortName (emoji short name), so {{r from emoji}} will both link its displayed output to these categories and categorize the current page into them, sorted by short name. I have made the same modifications to wikt:Template emojibox (and added the first short name parameter), so I can use this for non-redirect entries such as trademark, trade mark and (called twice for the general and en-CA locale). Wikipedia could use this as well, perhaps simply disabling the output but retaining the categorizing. Perhaps nobody wants the categorizing at all, but it makes finding emojis (sorted with their short names as articles or redirects) much easier than a single long list for each alphabetic character; perhaps that's what the emoji/Unicode articles are for, but they don't exist on Wiktionary. Module:Emoji could automate everything, but neither the current implementation nor your improvements would help my purposes, so I'm still manually entering things (for the few emojis I'm interested in). Perhaps what I want, data-module-wise, is
    shortname[codepoint] = {
       {"short name", "keyword1", "keyword2", "etc."},   -- generic en
       locale = {
          ["en-CA"] = {"short name", "keyword1", "keyword2", "etc."},
          ["en-NZ"] = {"short name", "keyword1", "keyword2", "etc."},
       },
    }
    

Currently, I don't know of a computer-readable source for both the locale and keyword data. But I would get the module to produce the outputs of {{r from emoji}} and wikt:Template emojibox, sometimes twice or more if needed. I could place my modifications to {{r from emoji}} in my user page if you like. They're fairly minor and don't yet use this module.
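As an illustration of how such a table might be queried, here is a small sketch in plain Lua. It shows one possible valid-Lua shape for the proposed data (locale overrides nested under a `locale` key) and a hypothetical `short_name` function that prefers a locale entry and otherwise falls back to generic English. The entries come from examples quoted in this discussion, with keyword lists abbreviated.

```lua
-- Sketch only: one possible shape for the proposed data module.
-- Keyword lists are abbreviated; the en-CA swap follows the
-- "trade mark" / "trademark" example given in this discussion.
local data = {
    ["2122"] = {
        { "trade mark", "trademark" },          -- generic en: short name, then keywords
        locale = {
            ["en-CA"] = { "trademark", "trade mark" },
        },
    },
    ["1f609"] = {
        { "winking face", "face", "wink" },     -- no locale overrides
    },
}

-- Return the short name for a code point, preferring a locale override
-- and falling back to the generic English entry.
local function short_name(code, locale)
    local entry = data[code:lower()]
    if not entry then
        return nil
    end
    local names = (locale and entry.locale and entry.locale[locale]) or entry[1]
    return names[1]
end

print(short_name("2122"))            -- trade mark
print(short_name("2122", "en-CA"))   -- trademark
print(short_name("1F609", "en-CA"))  -- winking face
```

Keeping the generic entry at index 1 means existing callers that ignore locales still work, while a template such as {{r from emoji}} could pass a locale when it needs the second call.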

  • PS. I think we should use spaces as the standard suggests, and not underscores.
  • PPS. I also added {{unichar|emoji=short name}} for my purposes. With this module, it could simply be |emoji=yes instead. Ordinarily {{unichar}} displays the unfriendly long/permanent name for the code point, but that is not functionally friendly for emojis.

Dpleibovitz (talk) 00:05, 11 March 2024 (UTC)

I am not going to pretend understanding of, or interest in, your own private wikipedia fork. Any work I do with Module:Emoji must have primary benefit to en.wiki. If that work benefits your private wiki, great.
I agree that underscores are inappropriate but that is how the existing data table is written so my sandbox mimics that. I modified Module:Emoji to allow emoji-names-with-spaces so that users don't have to write the underscore word separators (also case insensitive):
{{#invoke:Emoji | emocode | Arrow Double Down}} → 23ec
Trappist the monk (talk) 13:50, 11 March 2024 (UTC)
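For readers following along, the space-tolerant, case-insensitive lookup described above could be approximated as follows in plain Lua. This is an assumed implementation, not necessarily what Module:Emoji does; `normalize_name` and the one-entry `emocodes` table are hypothetical names used only for the example.

```lua
-- Sketch only: normalise a user-supplied emoji name to the lower-case,
-- underscore-separated keys used by the existing data table.
local function normalize_name(name)
    name = name:match("^%s*(.-)%s*$")    -- trim surrounding whitespace
    name = name:lower()                  -- case-insensitive lookup
    name = name:gsub("%s+", "_")         -- keeps only gsub's first result
    return name
end

-- One-entry stand-in for the real data table.
local emocodes = { arrow_double_down = "23ec" }

print(emocodes[normalize_name(" Arrow Double Down ")])  -- 23ec
```

In Scribunto, `mw.text.trim` and the `mw.ustring` functions would be the more usual tools, particularly if names ever contain non-ASCII letters.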
When I suggest things on Wikipedia, they are only for the benefit of Wikipedia and not for my purposes. Nevertheless, some suggestions do come from my fork where I have worked out some kinks. From Wikipedia's perspective, my fork is simply a test bed or sandbox. I no longer use Module:Emoji at all as I think it is ill-founded, and I believe that https://unicode.org/Public/emoji/latest/emoji-test.txt is a test bed and not to be used the way you are doing so. Let me explain why I think so, but note I'm far from a Unicode or CLDR expert. In Unicode, the code point (long) names are fixed for life even if they have spelling errors. Aliases do exist and could change for fixes. CLDR is about localization. Within a locale, the short names (some are not even emojis) should be unique. The keywords are much like our categories, and many emojis belong to the same keyword. I will give examples. The keywords were developed with a particular purpose in mind by Google and others: predictive text. Type enough keywords, and the system will suggest the various emojis. I believe that the emoji-test.txt file simply verifies that the processing of such names and keywords is correct, but it does not specify the short names, nor the keywords for code points (nor their locales), whereas I think https://www.unicode.org/cldr/charts/44/annotations/americas.html does them all (but is not computer readable). Here are some examples.
{{#invoke:Emoji | emocode | wink}} → 1f609
But the short name for the 😉 emoji should be winking face, not wink. The entry from americas.html shows *winking face | face | wink The 😉 page uses {{R from emoji|Winking Face}}, which is the correct short name but in a historical casing. They should all be fixed, but perhaps that's what they wanted for DEFAULTSORT. The keyword wink also belongs to 😜 (winking face with tongue). So in this case, the module returns a keyword (arbitrarily the 2nd) or consumes a keyword which is ambiguous. The keyword 8ball doesn't even exist, but perhaps predictive text separates numbers from letters. However, the entry for 🎱 (1F3B1) is *pool 8 ball | 8 | ball | billiard | eight | game (there is an en-CA locale entry as well). For 🚡 (1f6a1), the module correctly returns the short name. The entry is *aerial tramway | aerial | cable | car | gondola | tramway with an additional en-AU locale. In summary, Module:Emoji sometimes consumes or returns an arbitrary keyword, and sometimes a short name. Not a reliable mechanism that can be used.
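The ambiguity described above can be shown with a small sketch in plain Lua. It assumes short names are unique but keywords may be shared, builds a keyword index, and refuses to resolve a keyword that maps to more than one code point. The function names and the abbreviated keyword lists are assumptions for illustration; the entries are taken from the examples quoted above.

```lua
-- Sketch only: first item of each list is the short name, the rest are
-- keywords. The list for 1f61c is abbreviated.
local annotations = {
    ["1f609"] = { "winking face", "face", "wink" },
    ["1f61c"] = { "winking face with tongue", "wink" },
    ["1f6a1"] = { "aerial tramway", "aerial", "cable", "car", "gondola", "tramway" },
}

-- Build short name -> code and keyword -> sorted list of codes.
local function build_index(source)
    local short_names, keywords = {}, {}
    for code, names in pairs(source) do
        short_names[names[1]] = code
        for i = 2, #names do
            local list = keywords[names[i]] or {}
            list[#list + 1] = code
            keywords[names[i]] = list
        end
    end
    for _, list in pairs(keywords) do
        table.sort(list)  -- make the order deterministic
    end
    return short_names, keywords
end

-- Resolve a name: short names win; a keyword resolves only when unique.
-- On failure, returns nil plus the list of candidate codes.
local function resolve(name, short_names, keywords)
    if short_names[name] then
        return short_names[name]
    end
    local candidates = keywords[name] or {}
    if #candidates == 1 then
        return candidates[1]
    end
    return nil, candidates
end

local short_names, keywords = build_index(annotations)
print(resolve("winking face", short_names, keywords))  -- 1f609
print(resolve("gondola", short_names, keywords))       -- 1f6a1
local code, candidates = resolve("wink", short_names, keywords)
print(code, table.concat(candidates, " "))             -- nil	1f609 1f61c
```

Returning the candidate list instead of an arbitrary first match would let a caller report the ambiguity rather than silently pick one emoji.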
In any case, if this does get resolved, could you add to the documentation something such as
details
Values are derived from (your, mine or other unicode chart)
See also
{{r from emoji}} - which uses the short name
----
I will update {{r from emoji}} documentation suggesting my location for getting short names. There may be others. If you find a computer readable version, please let me know. Dpleibovitz (talk) 17:23, 11 March 2024 (UTC)
Yeah, https://unicode.org/Public/emoji/latest/emoji-test.txt is not ideal, and is likely intended for some sort of testing, but that aside, it seems to be the most complete list available. The companion list https://unicode.org/Public/emoji/latest/emoji-sequences.txt is similar but doesn't separate ranges of emoji into their individual elements (23E9..23EC; fast-forward button..fast down button; ⏩..⏬ but no description of 23EA and 23EB).
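The range issue can be illustrated with a short sketch in plain Lua that expands a `first..last` code point field into its individual members. The function name `expand_range` is hypothetical, and the sketch only recovers the code points: names for the intermediate members (23EA and 23EB here) would still have to come from another source such as emoji-test.txt.

```lua
-- Sketch only: expand a "23E9..23EC" style field into individual
-- lower-case hex code points. Single code points pass through; anything
-- else (e.g. a multi-code-point sequence) yields an empty list.
local function expand_range(field)
    local first, last = field:match("^(%x+)%.%.(%x+)$")
    if not first then
        first = field:match("^(%x+)$")
        last = first
    end
    if not first then
        return {}
    end
    local codes = {}
    for cp = tonumber(first, 16), tonumber(last, 16) do
        codes[#codes + 1] = string.format("%x", cp)
    end
    return codes
end

print(table.concat(expand_range("23E9..23EC"), " "))  -- 23e9 23ea 23eb 23ec
print(table.concat(expand_range("1F507"), " "))       -- 1f507
```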
The current Module:Emoji/data and the short-names lists are clearly incomplete: 1220-ish and 1960-ish v. 4000+ (from ~/emoji-test.txt). And, clearly, the names in the current Module:Emoji/data should not be trusted.
It is possible to extract short names, hex values, and emoji from https://www.unicode.org/cldr/charts/44/annotations/americas.html. It is even possible to extract separate locale data. I question the usefulness of the pipe-separated keywords. My first hack at creating a better list for Module:Emoji/data was to extract the generic en locale short names from ~/americas.html; I wish now that I hadn't discarded that original code.
Trappist the monk (talk) 23:06, 11 March 2024 (UTC)
Module:Sandbox/Trappist the monk/Emoji short name data make has separate tables for generic English, world, Australian, Canadian, United Kingdom, and Indian locales.
Trappist the monk (talk) 16:46, 13 March 2024 (UTC)