The case against deprecation
Background information
- In most contexts the SI prefixes kilo-, mega- and giga- mean 1 thousand, 1 million and 1 (short scale) billion, respectively, as in one kilogram = one thousand grams, one megajoule = one million joules and one gigawatt = one billion watts. In symbols: 1 kg = 1,000 g; 1 MJ = 1,000,000 J; 1 GW = 1,000,000,000 W.
- In computer science the units kilobyte, megabyte and gigabyte (symbols kB, MB and GB) were originally used in this standard decimal sense to mean 1,000 and 1,000,000 and 1,000,000,000 bytes, respectively. In symbols: 1 kB = 1000 B; 1 MB = 1000² B; 1 GB = 1000³ B.
- However, in modern use (and depending on the context), the same three symbols sometimes have a binary meaning. The binary definitions of these three symbols are 1 KB = 1024 B; 1 MB = 1024² B;[1] 1 GB = 1024³ B. In this context it is customary to use an upper case "K" instead of the SI prefix "k", for kilo.
- The computer itself does not count bytes using binary prefixes; the practice of reporting memory, file and hard disk drive (HDD) sizes in this manner dates from the 1980s. As such, the use of binary prefixes is only a convention. This convention could have been altered at any time to agree with SI prefixes, as in Apple's 2009 "Snow Leopard" release and in Ubuntu; however, it has persisted in much of the computer industry.[2]
- For many applications (primarily the storage capacity of hard disk drives and data rates for telecommunications), the decimal convention is retained, whereby one kilobit is exactly one thousand bits and one megabyte is exactly one million bytes.[3]
- There are many WP articles in which the same symbol (e.g. MB) is used with two different meanings, often hopping between them in the same paragraph or section, sometimes even in the same sentence. This dual use creates confusion and a corresponding need to disambiguate.
- These ambiguous usages are common beyond Wikipedia and have led to litigation.
- Problems get successively worse with higher-value prefixes: tera- (1000⁴ vs 1024⁴), peta- (1000⁵ vs 1024⁵), etc. The highest value SI prefix for which a binary counterpart has been defined is yotta-, meaning 1000⁸. The corresponding binary prefix yobi- means 1024⁸ (≈1.21×10²⁴), which differs by 21 % from the conventional decimal interpretation of yotta-.
- In December 1998, in an attempt to resolve the ambiguity, the International Electrotechnical Commission (IEC) introduced a new set of prefixes kibi-, mebi- and gibi- for the binary meanings, with symbols Ki-, Mi- and Gi- so that 1 KiB (one kibibyte) = 1024 B, 1 MiB (one mebibyte) = 1024² B and 1 GiB (one gibibyte) = 1024³ B. In the IEC standard, the prefixes kilo-, mega- etc. are reserved for their original decimal meanings.
- In March 2005, the IEC prefixes were adopted by the Institute of Electrical and Electronics Engineers (IEEE) after a two-year trial period.
- The use of IEC prefixes has been approved by national and international standards bodies, including, in addition to IEC and IEEE, the International Bureau of Weights and Measures (the standards body responsible for the SI system of units), the European Committee for Electrotechnical Standardization (CENELEC) and the US National Institute of Standards and Technology.
- The binary prefixes defined by the IEC are now incorporated in the International System of Quantities (ISQ).
- The alternative (binary use of SI-like prefixes) is deprecated by the same standards bodies.
- Use of IEC prefixes in popular literature is rare, making them unfamiliar to many readers. Their use in scientific publications increased from fewer than 15 per year on first introduction to about 200 per year in the early 2010s, and about 600 per year in the mid-2020s: 1999-2001 (ca. 40 hits); 2002-2004 (60 hits); 2005-2007 (190 hits); 2008-2010 (380 hits); 2011-2013 (710 hits); 2014-2016 (1050 hits); 2017-2019 (1330 hits); 2020-2022 (1510 hits); 2023-2025 (1060 hits to date).
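
The widening gap between the decimal and binary readings listed above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the names `PREFIXES` and `binary_excess_percent` are hypothetical helpers invented for this example and are not taken from any standard.

```python
# Illustrative sketch: compare decimal (SI) and binary (IEC) prefix values.
# PREFIXES and binary_excess_percent are hypothetical names for this example.

PREFIXES = [
    ("kilo", "kibi"),
    ("mega", "mebi"),
    ("giga", "gibi"),
    ("tera", "tebi"),
    ("peta", "pebi"),
    ("exa", "exbi"),
    ("zetta", "zebi"),
    ("yotta", "yobi"),
]


def binary_excess_percent(power):
    """Return by how many percent 1024**power exceeds 1000**power."""
    return (1024 ** power / 1000 ** power - 1) * 100


# Print one row per prefix pair, from kilo/kibi (power 1) to yotta/yobi (power 8).
for power, (si_name, iec_name) in enumerate(PREFIXES, start=1):
    print(
        f"{si_name:>5} = 1000^{power}, {iec_name:>4} = 1024^{power}, "
        f"difference = {binary_excess_percent(power):.1f} %"
    )
```

Under these definitions the difference grows from about 2.4 % for kilo-/kibi- to about 20.9 % for yotta-/yobi-, in line with the rounded figure of 21 % given above.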
Why Wikipedia should not deprecate the use of IEC prefixes
- IEC prefixes are unambiguous, succinct, simple to use and simple to understand.
- The use of IEC prefixes is endorsed by national and international standards bodies.
- The use of one symbol (e.g. GB) to mean two different things in the same article creates confusion and ambiguity. Despite this ambiguity, there are many WP articles in which kilobyte, megabyte and/or gigabyte are used in this way. In this situation, the IEC prefixes provide an ideal disambiguation tool because they are unambiguous and succinct.
- Deprecation (of IEC prefixes) increases the difficulty threshold for disambiguation, reducing the rate at which articles can be disambiguated by expert editors.
- In turn this reduces the total number of articles that can be further improved by less expert editors with footnotes etc (assuming that there is consensus to do so).
- Deprecation is interpreted by some editors as a justification for changing unambiguous units into ambiguous ones.
- Removing IEC prefixes from articles, even when disambiguated with footnotes, destroys a part of the information that was there before, because it requires an expert to work out which footnote corresponds to which use in the article.
- In the long term, the use of IEC prefixes would ultimately avoid the need to use the same symbol (e.g., MB) with two different meanings. This may sound like a pipe dream, but it could be implemented as a user preference, so that readers could choose between familiar (ambiguous) units and (unfamiliar) unambiguous ones.
- The main argument for not using IEC prefixes is the unfamiliarity of, for example, the mebibyte (MiB) compared with the megabyte (MB). The unfamiliarity is not disputed, but is not relevant to disambiguation. The point is that disambiguation is rare and therefore all disambiguation methods are unfamiliar.
- Alternative disambiguation methods are either cumbersome (i.e., exact numbers of bytes), difficult and time-consuming to implement in a manner that is clear to the reader (i.e., footnotes)[4] or unlikely to be understood (i.e. exponentiation).
- In conclusion, disambiguation is not easy, so it would be unwise to discard the simplest disambiguation tool at our disposal just because it is unfamiliar to some readers. The best disambiguation method has yet to be established, so it is premature to deprecate this one.
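
As a rough illustration of how compactly the two conventions can be told apart, the sketch below formats one byte count both ways. The helper names `format_iec` and `format_si` are assumptions made for this example only; they do not describe any existing Wikipedia template or tool.

```python
# Illustrative sketch: format one byte count in binary (IEC) and decimal (SI) units.
# format_iec and format_si are hypothetical helper names for this example.

IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def _format(n_bytes, base, units):
    """Divide by `base` until the value is below it, then attach the unit symbol."""
    value = float(n_bytes)
    index = 0
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


def format_iec(n_bytes):
    """Format using powers of 1024 and IEC symbols (KiB, MiB, ...)."""
    return _format(n_bytes, 1024, IEC_UNITS)


def format_si(n_bytes):
    """Format using powers of 1000 and SI symbols (kB, MB, ...)."""
    return _format(n_bytes, 1000, SI_UNITS)


n = 536_870_912  # a memory size often written simply as "512 MB"
print(format_iec(n))  # 512.00 MiB
print(format_si(n))   # 536.87 MB
```

With the IEC symbol the reader can see at once which convention is meant, whereas a bare "512 MB" could refer to either value.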
See also
- Binary prefix
- Timeline of binary prefixes
- User pages
- A case for the use of IEC prefixes on Wikipedia by Omegatron.
- A discussion page by Quilbert.
Footnotes
- ^ MB even has a third meaning, equal to 1000 KiB or 1,024,000 B
- ^ Snow Leopard changes how file and drive sizes are calculated
- ^ According to the LBA Count for IDE Hard Disk Drives Standard from the website of the International Disk Drive Equipment and Materials Association (IDEMA), there are 1,000,194,048 bytes (1,953,504 logical blocks x 512 bytes/logical block) per nominal gigabyte of hard drive storage.
- ^ This problem is illustrated by Address space layout randomization, which includes the confusing disambiguation footnote "Transistorized memory, such as RAM and cache sizes (other than solid state disk devices such as USB drives, CompactFlash cards, and so on) as well as CD-based storage size are specified using binary meanings for K (1024¹), M (1024²), G (1024³), ..."