Talk:Base32

Latest comment: 11 months ago by ReadOnlyAccount in topic Encoding table format

Triaconta-wha?

The part about the 32-sided polygon sounds like pure weapons-grade balonium to me. Anyone for killing it? --tcsetattr (talk / contribs) 21:08, 24 February 2008 (UTC)

Tria-conta-kai means "three-tens-and" and is used in polygon naming.
A tria-conta-kai-digon is a polygon with 32 sides.
By that logic, tria-conta-kai-decimal is wrong and would mean a base30 number.
It should be tria-conta-kai-di-decimal, and base 64 should be named hexa-conta-kai-tetra-decimal.
08:58, 16 October 2012 (UTC) — Preceding unsigned comment added by 62.77.56.17 (talk • contribs)
Not quite "base30"; "triacontakaideka" means "thirty and ten", so "triacontakaidecimal" would mean something like "base thirty-ten", which doesn't make sense. This name does not seem to be mentioned in any of the references; was it made up? If it's WP:OR it should be removed. The right name would probably be something like "triacontakaidial" (Greek) or "duotrigesimal" (Latin, as in "sexagesimal"). —Cousteau (talk) 13:21, 26 April 2020 (UTC)
"The part about the 32-sided polygon" is fine – except for the name. The section itself makes sense and is sourced, but the name "triacontakaidecimal" does not make sense and is indeed {{not in source}} and likely original research. It was first added here.
I came here to say the same thing as User:62.77.56.17, but actually User:Cousteau is correct: The problem is not just the possibly accidental omission of the di infix, but also that the -decimal construction does not make sense past the -teens. Hexa(6)-decimal(10) makes sense for 0-15 (16 numbers), but the contrivance of tria-conta-kai-di-decimal is technically incorrect and more trouble than it's worth.
I am more forgiving of the "base32hex" name, even though its -hex suffix is also linguistically dubious. In its defence, base32hex is independently established (i.e. not just by Wikipedia-recursion) and well-sourced, and -hex also has this Etymology 1 sense going for it, which matches hacker jargon. Finally, the "base32hex" system extends the hexadecimal numeral system, so perhaps the suffix is justified on that basis too.
Upon further consideration, triacontakaidial (or triakontakaidial) and duotrigesimal also look correct to me, but perhaps finding quality sources for them would be good. They look sufficiently obvious/correct for me to refrain from deleting them as WP:OR. (If Wikipedia started deleting perfectly reasonable and regular words—as opposed to claims—as unsourced WP:OR, that would create a lot more problems...) Both vigesimal (20) and quadragesimal (40) are in Merriam-Webster, but curiously trigesimal (30) is not. The OED paywalls its trigesimal entry. Going from trigesimal to duotrigesimal is obvious. Duotrigesimal also appears in the List of numeral systems.
Another issue here is the question of whether these systematic names, however logical, should be used specifically for this "base32hex" system, for base-32 in general, or both. Apparently this article was split on 2007-11-27 and then merged again on 2015-02-21 (cf. #Proposal for merging). The lede did mention duotrigesimal before the split (since 2006-10-26), but the "merge" was effectively a deletion, and the term was not added back to the lede. When the "triacontakaidecimal" name first showed up "at birth" in the aforementioned edit, the text added also included someone's name in a byline (since removed, by the same editor). Does that name identify the IP-editor who added this, the inventor of this particular system, or both? Did that person come up with the technically very much incorrect "triacontakaidecimal" moniker?
ReadOnlyAccount (talk) 02:20, 24 October 2023 (UTC)
If the etymologically meaningful (and easy to understand) "duotrigesimal" is already used elsewhere in Wikipedia, I say to switch to that one without asking. Even if it turned out to also be wrong, made up, and unreferenced, so is "triacontakaidecimal"; and "duotrigesimal" at least makes sense, so at the very least it's an improvement. —Cousteau (talk) 09:37, 24 October 2023 (UTC)
I have since found that Wiktionary has a trigesimal entry, which references the OED. Wiktionary also has a duotrigesimal entry, which I think is a logical systematic production. Wiktionary does not have entries for triacontakaidecimal, triacontakaididecimal or triacontakaidial. However, I have not found any evidence the term duotrigesimal is any more closely associated with the system RFC 4648 calls base32hex (or Base 32 Encoding with Extended Hex Alphabet) than with any of the other base-32 systems covered in the article. This is where I return to the point that I think this article requires careful attention to tease apart the generic and germane information on base-32 numeral systems in general, and the information on all these respectively different base-32 notation systems in particular. Those are somewhat different things, and there's not currently enough attention paid to the former. I did weigh in on the question whether these should be in different articles, see #Proposal for merging, but I don't really have a very strong opinion on that matter. All of which brings me to the point that I believe duotrigesimal probably belongs in the lede of the article on base-32 numeral systems in general – as it once was, see history and see above. I don't think base32hex is any more duotrigesimal than the other duotrigesimal notation systems. However, perhaps base32hex, aka the RFC 4648 Base 32 Encoding with Extended Hex Alphabet, deserves at least second billing after the RFC 4648 Base 32 Encoding [sic]. Base32hex is the more logical of the two, but only the other system solves the problem of confusingly similar digits without relying on non-ubiquitous fonts. That said, I don't necessarily think it's the job of a numeral system to correct or compensate for the mistakes of font designers. —ReadOnlyAccount (talk) 11:17, 24 October 2023 (UTC)
Fair enough. I see that the term "duotrigesimal" (or maybe "triacontakaidial") makes sense as a general name for the base-32 numeric system in general, not of a specific representation (and, unlike the case with hexadecimal, which is virtually always represented as 0–9 A–F, there isn't an agreed de facto standard to represent base-32 but several competing ones). In that regard, it is true that the correct term for that section in particular is indeed "base32hex" (a preexisting and semi-official term coined in RFC 4648 as you pointed out), along with the more descriptive name «"extended hex" base 32» (which explains why they're using the particle "hex" in the name – it just means that it was conceived as an extension of hexadecimal). So I'll replace the two occurrences of "triacontamadeupword" with "base32hex".
As for whether there should be two separate articles for base-32 as a numeric system and as a data encoding format, I agree that it makes sense to differentiate the concepts; but then again I don't think the former has many practical uses other than serving as the base for the latter, so it probably makes no sense to create a separate article for it. There's already the Radix article which should cover for the general case of base-n systems. This article should focus on Base32 as an encoding system. I'll modify the introduction to clarify this fact.
PS: as a curiosity, Wiktionary contains separate entries for both triacontakaitetragon and triacontatetragon (two synonyms for 34-gon), but not for 32-gon nor apparently any other 30+n-gon. I don't know what makes a 34-sided polygon so special. However, it does contain the German word Zweiunddreißigeck (meaning 32-gon).
Cousteau (talk) 01:02, 26 October 2023 (UTC)
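To illustrate the "extension of hexadecimal" point made above, here is a minimal Python sketch. It assumes the base32hex digit set is 0–9 followed by A–V, as described in this thread; the names HEX_DIGITS, BASE32HEX_DIGITS, shares_hex_prefix and digit_value are hypothetical helpers for illustration, not part of any library.

```python
import string

# Assumed digit sets: hexadecimal (0-9, A-F) and "extended hex" (0-9, A-V).
HEX_DIGITS = string.digits + "ABCDEF"
BASE32HEX_DIGITS = string.digits + string.ascii_uppercase[:22]  # A..V


def shares_hex_prefix() -> bool:
    """Return True if the first 16 base32hex digits are exactly the hex digits."""
    return BASE32HEX_DIGITS[:16] == HEX_DIGITS


def digit_value(symbol: str) -> int:
    """Return the numeric value of a single base32hex digit (hypothetical helper)."""
    return BASE32HEX_DIGITS.index(symbol.upper())


if __name__ == "__main__":
    print(shares_hex_prefix())   # True
    print(digit_value("F"))      # 15, same value as in hexadecimal
    print(digit_value("V"))      # 31, the largest base-32 digit
```

Python's built-in int() happens to use the same digit ordering for base 32, so int("V", 32) should agree with digit_value("V").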
Thanks for getting rid of "triacontakaidecimal". I'm a little scared to think this mistake stuck around for 16 years, and it might yet echo through wiki-mirrors and maybe elsewhere, so it's possible we haven't heard the last of the triacontamadeupword just yet, but for now I'm happy it's gone. There's still a bit of a riddle as to whether that one guy came up with it, but we're better off without it. I do want to argue for inclusion of the systematic and OED-adjacent duotrigesimal term though.
We agree there are at least two different concepts in play, the numeral system and its encoding scheme(s). One might even differentiate three concepts, the numeral system, its notation, and software implementations. Either way, and with respect, I think you're drawing the wrong conclusion in deciding to only have the article reflect Base32 as an encoding system.
Differently put: There are two things, thing foo and thing bar. (One might further tease apart thing bar into bar and baz, but as the movie line goes, that's not important right now.) Earlier on, there was a dispute as to whether to have two articles for foo and bar or just one foobar article. The article has split and merged at least once. To now deduce from awareness of this history that the merged article should finally only focus on bar and leave foo to cursory mentions elsewhere, I think that's the wrong conclusion. I think there definitely is foo and bar content to be covered, whether in one article or two. There might not be much point refighting the split/merge battle, but I think all the content can and should be covered. I do see that if we compare octal, decimal, hexadecimal and tetrasexagesimal (that's base64 to you and me), then we can see that the former articles give us much more foo content, while base64 is almost exclusively bar. Note how the hexadecimal article begins: "In mathematics and computing, the hexadecimal (...) numeral system is...", while base64 starts off with: "In computer programming, Base64 is...". And I'm not even arguing with that at this point. I think it does play a role what a thing is most notable for. However, in the case of duotrigesimal, or base32, I think this is an in-between, even though base32 is arguably even less well known in its bar role. Base64 is extremely well known under its bar identity, and I pity the foo' who even knows it's a foo too. Foo's don't even know its name as that (even though, again, sexagesimal is in OED and the tetra- prefix is acceptably systematic). Again, base64 is well-known as a bar. 
Base32 is not that well known as a bar, and neither is it that well known as a foo, but it's also not so obscure and so context-free that both foo and bar couldn't be covered in its article, and I think there's a good argument for covering both, and even covering the relative obscurity and all attendant uncertainties.
For instance, I think the article could benefit from a Nomenclature section, especially given how between (the proper noun-capitalised) Base32, base32, base-32, duotrigesimal (and also base32hex, etc.) several of these have been used (in the article and more importantly in the wild) in both a foo and bar sense, and it isn't always clear which was meant. I suppose even the-name-that-shan't-be-named is now in the wild, but absent extremely non-circular WP:RS, I'm much happier to leave that out in perpetuity. I'd be much more lenient towards inclusion of hard-to-secondary-source bar stuff like base32h, which looks like a case of parallel invention of more or less the same thing as Crockford's Base32, by someone who probably didn't know about Crockford's Base32.
In light of the fact there is real-world uncertainty out there (some of the conventions not being conclusively settled), but also in light of the fact that the maths and language conventions are sound and systematic, I think there is room for a little leniency without opening the floodgates.
I've also realised even the much-vaunted RFC 4648 is still only a "Proposed Standard", and moreover, the description of its base32 scheme as "the RFC 4648 symbol set" is quite wrong. RFC 4648 describes not one system but five: Two base-64 systems, two base-32 systems, and the hexadecimal we all know and love, for good measure I suppose. I'm more and more leaning towards giving the second duotrigesimal system in RFC 4648 second billing, ahead of all the other alternatives, while duly providing, of course, sufficient explanation as to the forwhy in the actual article. —ROA, Dec 2: I've just done that now.
And finally, I know what makes the 34-sided polygon so special, and I've sussed out why Wiktionary has these entries for the triacontatetragon. The 34-gon is special because it's constructible. And there used to be a triacontatetragon article, but even though it was well-sourced and illustrated and sufficiently special, the Deletionista somehow had it in for it and repeatedly nominated it for deletion until they got their way, and then they dumped somebody else's child to die in a ditch, alienating another few good editors in the process, no doubt. I would wholeheartedly support reinstatement. Who cares if it's not for everybody? You don't like it, you don't understand it, you don't get the appeal? You don't have to look at it! Just click on something else. Don't try to ban meat for everybody on grounds that you can't even when it comes to chewing it. There are people who like lording it over others a lot better than they like making the world a better place – and perhaps they feel incapable of doing the latter, so they won't allow others to do it. And that's why DC hates the BRI.
ReadOnlyAccount (talk) 14:40, 26 October 2023 (UTC)
PS: Mandatory reading for deletionists. Not that I'm certain they're equipped to handle it, but one might as well try.

Proposal for merging

In reality, the only interesting thing one can say about this base is its use in Base32. In my opinion, the article Base 32 should be merged into this article in accordance with Wikipedia:Notability (numbers). QQ (talk) 11:19, 23 May 2008 (UTC)

(Oppose) I have added base-32 numerals of Ngiti to meet the notability criteria. - TAKASUGI Shinji (talk) 08:23, 24 June 2008 (UTC)
(Compromise?) Given the similarity of the names, and the differences in concept, perhaps a disambiguation reference would be in order. Mdin617 (talk) 21:03, 16 July 2008 (UTC)

Response to Proposal for merging

This does not seem a good idea. The two subjects, Base 32 and Base32 encoding, are very different. Merging the two would only result in replacing two clear and concise articles with a single long and confusing article. —Preceding unsigned comment added by 194.221.133.226 (talk) 11:26, 26 June 2008 (UTC)

User:Double sharp executed the merger of Base 32 into Base32 several years ago. I think that was a bad idea and done without, if not against, consensus. Despite the confusingly similar names, having separate articles for the numeral system and related computer encodings probably would be preferable. The present (merged) article is not great in terms of describing the numeral system. It's mostly about computer encodings and may leave people who aren't programmers or very IT-minded more befuddled than anything else. It might even confuse people who know a lot about numeral systems and maths but who are not programmers. ReadOnlyAccount (talk) 02:57, 24 October 2023 (UTC)
PS: Second thought, given the importance of, but lack of consensus on, the representation of base-32 digits, maybe a shared article showing all these implementations is okay – IF it is edited for a general audience, with due weight and attention to the numeral system and a good general introduction to it. — Preceding unsigned comment added by ReadOnlyAccount (talk • contribs) 04:26, 24 October 2023 (UTC)

Crockford's Base32

This is incredibly un-noteworthy original research. Why is this even in here? Vote remove. — Preceding unsigned comment added by 81.129.57.239 (talk) 22:59, 16 July 2013 (UTC)

Crockford's Base32 has a few sources cited in this article, so it appears to meet the Wikipedia:Notability definition, and does not appear to be the kind of "original research" described by Wikipedia:No original research. --DavidCary (talk) 05:01, 26 June 2014 (UTC)
This does indeed not look like original research, but I think some sources describing where it is used could be useful. Currently there is only a link to Crockford's website describing the system. (There used to be a few more links describing implementations for several platforms, but they were removed; anyway I don't think these references proved notability.) —Cousteau (talk) 15:49, 20 July 2017 (UTC)

"Padding" belongs in the last column of the table

Shouldn't the "padding =" entry be placed at the bottom of the last column in the table? Jidanni (talk) 03:41, 8 April 2020 (UTC)

Presumably User:Jidanni was referring to the table in the RFC 4648 Base32 alphabet section?
(I have no opinion on whether to move the "padding =" entry to the rightmost column.) ReadOnlyAccount (talk) 04:05, 24 October 2023 (UTC)
Second thought, I think Jidanni has a point, because the current arrangement suggests that padding follows 7 before 8, which it clearly shouldn't. ReadOnlyAccount (talk) 04:43, 27 October 2023 (UTC)
I have just done this – I've moved the padding entry to the right. –ReadOnlyAccount (talk) 01:43, 2 December 2023 (UTC)
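For anyone wanting to regenerate the table, the layout discussed above can be sketched in Python roughly as follows. This assumes a column-major table of four columns with eight values each (so that value 7 sits at the bottom of the first column) plus one extra row for the padding entry; the function name table_rows is hypothetical.

```python
# RFC 4648 Base32 alphabet: values 0-25 map to A-Z, values 26-31 map to 2-7.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def table_rows(columns: int = 4) -> list:
    """Build a column-major value/symbol table with padding in the last column."""
    per_column = len(ALPHABET) // columns  # 8 rows when columns == 4
    rows = []
    for r in range(per_column):
        rows.append(
            [(c * per_column + r, ALPHABET[c * per_column + r]) for c in range(columns)]
        )
    # Extra row: empty cells everywhere except the last column,
    # so that padding visibly follows value 31 rather than value 7.
    rows.append([None] * (columns - 1) + [("padding", "=")])
    return rows


if __name__ == "__main__":
    for row in table_rows():
        print("\t".join("" if cell is None else f"{cell[0]} {cell[1]}" for cell in row))
```

Placing the padding cell under the first column instead would reproduce the arrangement that was objected to, where padding appears between 7 and 8.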

Another variant?

I was trying to identify the variant this coding (see element "base32 ...") uses, but it appears to be none of the listed ones. 50.68.41.27 (talk) 23:28, 7 January 2022 (UTC)

Reflist

Ever since this edit the article has mixed a bulleted RFC in with a footnoted reflist. I'm not sure that's ideal. ReadOnlyAccount (talk) 06:49, 24 October 2023 (UTC)

Base-31

If this edit is correct, then it would seem that the entire section possibly doesn't even belong here, since this article does not seem to be about any and all thirty-something bases, but only about base-32. ReadOnlyAccount (talk) 13:50, 24 October 2023 (UTC)

PS: In other words, this article is about the duotrigesimal numeral system and its various notations, not about the untrigesimal numeral system. (The latter is mentioned on https://en.wiktionary.org/wiki/Appendix:Number_bases.) — Preceding unsigned comment added by ReadOnlyAccount (talk • contribs) 13:53, 24 October 2023 (UTC)

Dodgy example

The example that was added via these two edits seems not very good. First of all, it says it's "using the previously described 32-character set", but what's previously described in the lede is "using (...) the twenty-two upper-case letters A–V and the digits 0-9", which is base32hex, and that is not what this example actually uses, since it contains the letters W, Y and Z, which are not in base32hex. (NB: It seems this edit is to blame for the mismatch.) Secondly, this example string appears to be RFC 4648 Base32 encoded, but decoding the ciphertext string as that yields hex/binary data, not just a simple (and more easily understandable) text string. It appears to be a somewhat random bunch of bytes, so what does that tell the reader? Really not much. (I mean, if you can make sense of 08 0b 80 91 02 cc a4 21 c8 32 f9 4b 0c f7 a0 94 06 5d c9 95 f2 96 2b 6c ce 2c b3 5b 2f 00 88 91 cf 84 c5 df, be my guest.) Worse, the preceding sentence reads: "The rest of this article discusses the use of Base32 for representing byte strings, not unsigned integer numbers, similar to the way Base64 works." Depending on what is meant by "byte strings"—and the language is fuzzy—a reasonable reader could expect text strings (made from bytes!), which at least this example doesn't seem to encode. Besides, any byte could be interpreted as a signed or unsigned number, so the sentence may be more of a diary entry of a light bulb going on above its author's head, without being similarly illuminating to anyone else. Also, providing only the encoded string without the corresponding decoded form makes the example a lot less useful, especially in an encyclopædic context. A decent example would use text only, and provide both the plaintext and ciphertext. Or, if you really needed to demonstrate the encoding of binary data, you would include better context as to what exactly that binary data represents. 
The context provided here is that the example string is an "IPFS CIDv1 in Base32 upper-case encoding". That's great. What's a CIDv1? Oh, look: Even the linked article doesn't answer that question. And even disregarding the fact that this is a bit of an unsolved riddle, why have an example in the lede at all? Especially since despite the argument from authority provided by the RFCs, it's not yet clear which duotrigesimal notation will become universally accepted (if any). In the case of the hexadecimal numeral system, it also used to be the case that people had different ideas as to notation early on, but there the dust has long since settled, and despite some differences on ancillary details like 0x2F vs 2Fh, fundamentally just about everybody agrees how hexadecimal is written. The same cannot be said for duotrigesimal. Far from everybody agrees how to write base-32, and it's not even clear whether the RFC's base32 or base32hex will ultimately win out over the other. That makes putting an example in the lede even worse, because now you're picking favourites. ReadOnlyAccount (talk) 17:57, 24 October 2023 (UTC)

Edit note: The dodgy example was in the lede; now it's not, hence I have struck out related phrasing above. Some of the other concerns still apply, but are in the process of being addressed. If having examples is deemed helpful, perhaps we could add a comparison table with examples for all the various alternatives. I'm not 100% convinced that would be super-helpful and encyclopædic, but it might beat having examples strewn and scattered in each of the respective subsections – which in turn might beat playing favourites and only providing one example when standardisation isn't settled. —ReadOnlyAccount (talk) 10:42, 26 October 2023 (UTC)
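As a possible starting point for such a text-only example, here is a hedged Python sketch that encodes one plaintext in both RFC 4648 alphabets and shows why a string containing W, Y or Z cannot be base32hex. It uses the standard library's base64.b32encode and derives the base32hex form by translating alphabets; encode_both and could_be_base32hex are hypothetical helper names, and the "foobar" strings are the test vectors listed in RFC 4648.

```python
import base64

# The two RFC 4648 base-32 alphabets, in value order 0..31.
STD_ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
HEX_ALPHABET = b"0123456789ABCDEFGHIJKLMNOPQRSTUV"
_STD_TO_HEX = bytes.maketrans(STD_ALPHABET, HEX_ALPHABET)


def encode_both(text: str) -> tuple:
    """Return (base32, base32hex) encodings of a UTF-8 text string."""
    std = base64.b32encode(text.encode("utf-8"))
    # '=' padding is not in the translation table, so it passes through unchanged.
    return std.decode("ascii"), std.translate(_STD_TO_HEX).decode("ascii")


def could_be_base32hex(encoded: str) -> bool:
    """True if every non-padding character is a base32hex digit (0-9, A-V)."""
    return set(encoded.rstrip("=").upper()) <= set(HEX_ALPHABET.decode("ascii"))


if __name__ == "__main__":
    std, hexed = encode_both("foobar")
    print(std)                        # MZXW6YTBOI======
    print(hexed)                      # CPNMUOJ1E8======
    print(could_be_base32hex(std))    # False: contains Z, X, W and Y
    print(could_be_base32hex(hexed))  # True
```

An example of this kind gives the reader both the plaintext and the encoded form, and the same plaintext could be reused across all the alternatives in a comparison table.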

Dodgy sentence in the lede, redux

First, see above for my earlier critique of the sentence in question (in the lede).
Secondly, note that it can also be parsed one of two different ways (parentheses added):

  1. (The rest of this article discusses the use of Base32 for representing byte strings, not unsigned integer numbers), similar to the way Base64 works.
  2. The rest of this article discusses the use of Base32 for representing byte strings, not (unsigned integer numbers, similar to the way Base64 works).

ReadOnlyAccount (talk) 18:34, 24 October 2023 (UTC)

Disambiguated. As for the fuzzy "byte strings", would "raw binary data" work better? —Cousteau (talk) 01:48, 26 October 2023 (UTC)
Thanks, and yes, I think the latter would work better. —ReadOnlyAccount (talk) 10:22, 26 October 2023 (UTC)
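For what it's worth, the distinction the sentence is trying to draw (a base-32 numeral for an integer versus a Base32 encoding of raw binary data) can be sketched in Python as follows. This is only an illustration: the digit set 0–9, A–V for the numeral is an assumption (no notation is settled, as noted elsewhere on this page), and to_base32_numeral and encode_bytes are hypothetical helper names.

```python
import base64

# Assumed digit set for writing integers in base 32 (base32hex-style ordering).
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def to_base32_numeral(n: int) -> str:
    """Write a non-negative integer as a positional base-32 numeral."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, 32)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def encode_bytes(data: bytes) -> str:
    """Encode raw binary data with the RFC 4648 Base32 alphabet, including padding."""
    return base64.b32encode(data).decode("ascii")


if __name__ == "__main__":
    print(to_base32_numeral(255))  # 7V        (255 = 7*32 + 31)
    print(encode_bytes(b"\xff"))   # 74======  (bits regrouped in fives, then padded)
```

The same value 255 thus yields two different strings, which is presumably the reason the lede needs to say which of the two the article is about.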

A base is a base is a base

At the peril of offending fans of Gertrude Stein, the phrasing "Base32 is the base-32 numeral system" strikes me as particularly inelegant. The problem is, it was introduced in the context of a not well-settled low-heat edit direction dispute over whether this page is for the numeral system, computer implementations, or both (see the rest of this Talk page). Anyway, this edit is to blame for the near-tautology. ReadOnlyAccount (talk) 18:17, 25 October 2023 (UTC)

Encoding table format

It's probably not ideal that of the six major encoding tables currently in the article, two are horizontally arranged while the other four are vertically arranged, and there also are some subtle and unnecessary differences between the respective tables' layouts, etc. Is there a table format everybody can agree on?
ReadOnlyAccount (talk) 04:25, 27 October 2023 (UTC)

I have now changed the first two tables to the same format. With agreement, perhaps more could follow? —ReadOnlyAccount (talk) 02:08, 2 December 2023 (UTC)