This is a list of some binary codes that are (or have been) used to represent text as a sequence of binary digits "0" and "1". Fixed-width binary codes use a set number of bits to represent each character in the text, while in variable-width binary codes, the number of bits may vary from character to character.
Five-bit binary codes
Several different five-bit codes were used for early punched tape systems.
Five bits per character allows only 32 different values, so many five-bit codes assigned two characters to each value, one from a set referred to as LTRS (letters) and one from a set referred to as FIGS (figures), and reserved two values to switch between these sets. This effectively allowed the use of 60 characters, since each set of 32 values gives up 2 of them to the shift codes.
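The shifting mechanism can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two tiny tables are hypothetical and are not the real ITA1 or ITA2 assignments, and the shift values and the starting mode are assumed for the example.

```python
# Toy five-bit shift code. The tables below are hypothetical and are NOT
# the real ITA1/ITA2 assignments; only the shifting mechanism matters.
LTRS_SHIFT = 0b11111  # assumed value that switches to the letters set
FIGS_SHIFT = 0b11011  # assumed value that switches to the figures set

LETTERS = {0b00001: "A", 0b00010: "B", 0b00011: "C"}
FIGURES = {0b00001: "1", 0b00010: "2", 0b00011: "3"}


def decode(codes):
    """Decode five-bit values, tracking which character set is active."""
    table = LETTERS  # assumption: the receiver starts in letters mode
    out = []
    for code in codes:
        if code == LTRS_SHIFT:
            table = LETTERS
        elif code == FIGS_SHIFT:
            table = FIGURES
        else:
            out.append(table[code])
    return "".join(out)


# Each set has 32 values minus the 2 shift values, so 2 * 30 = 60.
usable_characters = 2 * (2**5 - 2)
```

With these assumed tables, `decode([1, 2, FIGS_SHIFT, 1, 3, LTRS_SHIFT, 3])` yields `"AB13C"`: the same five-bit value 1 is read as "A" in letters mode and as "1" in figures mode.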
Standard five-bit codes are:
- International Telegraph Alphabet No. 1 (ITA1) – Also commonly referred to as Baudot code[1]
- International Telegraph Alphabet No. 2 (ITA2) – Also commonly referred to as Murray code[1][2]
- American Teletypewriter code (USTTY) – A variant of ITA2 used in the USA[2]
- DIN 66006 – Developed for the presentation of ALGOL/ALCOR programs on paper tape and punch cards
The following early computer systems each used its own five-bit code:
- J. Lyons and Co. LEO (Lyons Electronic Office)
- English Electric DEUCE
- University of Illinois at Urbana-Champaign ILLIAC
- ZEBRA
- EMI 1100
- Ferranti Mercury, Pegasus, and Orion systems[3]
The steganographic code commonly known as Bacon's cipher uses groups of five binary-valued elements to represent letters of the alphabet.
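As a rough illustration of the five-element grouping, the sketch below maps each letter to five symbols written as "a" and "b". It assumes the later 26-letter variant in which A to Z are simply numbered 0 to 25; Bacon's original alphabet merged I with J and U with V, which this sketch does not model. The function name is hypothetical.

```python
import string


def bacon_encode(text):
    """Map each letter A-Z to a group of five 'a'/'b' symbols.

    Assumes the 26-letter variant (A=0 ... Z=25). Characters that are
    not letters are skipped.
    """
    groups = []
    for ch in text.upper():
        if ch in string.ascii_uppercase:
            index = ord(ch) - ord("A")
            bits = format(index, "05b")  # five binary digits
            groups.append(bits.replace("0", "a").replace("1", "b"))
    return " ".join(groups)
```

Under that assumption, `bacon_encode("Bad")` gives `"aaaab aaaaa aaabb"`. In the steganographic use, the two symbol values would be carried by some two-valued property of a cover text, such as two typefaces.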
Six-bit binary codes
Six bits per character allows 64 distinct characters to be represented.
Examples of six-bit binary codes are:
- International Telegraph Alphabet No. 4 (ITA4)[4]
- Six-bit BCD (Binary Coded Decimal), used by early mainframe computers.
- Six-bit ASCII – A six-bit subset of the original seven-bit ASCII
- Braille – Braille characters are represented using six dot positions, arranged in a rectangle. Each position may contain a raised dot or not, so Braille can be considered to be a six-bit binary code.
See also: Six-bit character codes
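The view of Braille as a six-bit code can be made concrete with a short sketch. It assumes the conventional dot numbering 1 to 6 and stores dot n in bit n-1, which is the layout used by the Unicode Braille Patterns block starting at U+2800. The function name is hypothetical.

```python
def braille_cell(dots):
    """Return (six-bit value, Unicode character) for a set of raised dots.

    Dots are numbered 1-6; dot n is stored in bit n-1, matching the
    Unicode Braille Patterns block that starts at U+2800.
    """
    value = 0
    for dot in dots:
        if not 1 <= dot <= 6:
            raise ValueError("six-dot Braille uses dots 1 to 6")
        value |= 1 << (dot - 1)
    return value, chr(0x2800 + value)
```

For example, `braille_cell({1, 2})` returns the value 3 together with the character U+2803, and a cell with all six dots raised has the value 63, the largest six-bit number.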
Seven-bit binary codes
Examples of seven-bit binary codes are:
- International Telegraph Alphabet No. 3 (ITA3) – derived from the Moore ARQ code, and also known as the RCA
- ASCII – The ubiquitous ASCII code was originally defined as a seven-bit character set. The ASCII article provides a detailed set of equivalent standards and variants. In addition, there are various extensions of ASCII to eight bits (see Eight-bit binary codes)
- CCIR 476 – Extends ITA2 from 5 to 7 bits, using the extra 2 bits as check digits[4]
- International Telegraph Alphabet No. 4 (ITA4)[4]
Eight-bit binary codes
- Extended ASCII – A number of standards extend ASCII to eight bits by adding a further 128 characters.
- EBCDIC – Used in early IBM computers and current IBM i and System z systems.
10-bit binary codes
- AUTOSPEC – Also known as Bauer code. AUTOSPEC repeats a five-bit character twice, but if the character has odd parity, the repetition is inverted.[4]
- Decabit – A datagram of electronic pulses that is commonly transmitted over power lines. Decabit is mainly used in Germany and other European countries.
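The AUTOSPEC rule described above can be sketched as follows. Two details are assumptions made for the example rather than facts from the list: the first copy is placed in the high five bits of the ten-bit word, and "odd parity" is taken to mean an odd number of 1 bits. The function names are hypothetical.

```python
def autospec_encode(char5):
    """Encode a five-bit value as ten bits: the character, then a copy
    that is inverted when the character has an odd number of 1 bits."""
    if not 0 <= char5 < 32:
        raise ValueError("expected a five-bit value")
    odd_parity = bin(char5).count("1") % 2 == 1
    repeat = char5 ^ 0b11111 if odd_parity else char5
    return (char5 << 5) | repeat  # assumed order: first copy in high bits


def autospec_check(word10):
    """Return True when the second half is consistent with the first."""
    return 0 <= word10 < 1024 and autospec_encode(word10 >> 5) == word10
```

Under these assumptions, `0b00011` (even parity) encodes to `0b0001100011`, while `0b00001` (odd parity) encodes to `0b0000111110`. Flipping any single bit of a valid word makes `autospec_check` return `False`, which is the sense in which the repetition acts as a check.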
16-bit binary codes
- UCS-2 – An obsolete encoding capable of representing the Basic Multilingual Plane of Unicode
32-bit binary codes
- UTF-32/UCS-4 – A four-bytes-per-character representation of Unicode.
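The fixed widths of the 16-bit and 32-bit codes can be seen with Python's built-in codecs. This is a sketch with one caveat: Python has no separate UCS-2 codec, so the example uses UTF-16 on text that stays within the Basic Multilingual Plane, where the two produce the same bytes. The sample string is arbitrary.

```python
text = "A\u00e9\u20ac"  # "A", "e" with acute accent, euro sign: all in the BMP

utf32 = text.encode("utf-32-be")  # big-endian, no byte-order mark
# Python has no separate UCS-2 codec; for BMP-only text the UTF-16
# encoding is byte-for-byte what UCS-2 would produce.
ucs2 = text.encode("utf-16-be")

# Split the byte strings into fixed-size code units, shown as hex.
utf32_units = [utf32[i:i + 4].hex() for i in range(0, len(utf32), 4)]
ucs2_units = [ucs2[i:i + 2].hex() for i in range(0, len(ucs2), 2)]
```

Here `utf32_units` is `['00000041', '000000e9', '000020ac']` and `ucs2_units` is `['0041', '00e9', '20ac']`: every character takes exactly four bytes in the first case and exactly two in the second.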
Variable-length binary codes
- UTF-8 – Encodes characters in a way that is mostly compatible with ASCII but can also encode the full repertoire of Unicode characters with sequences of up to four 8-bit bytes.
- UTF-16 – Extends UCS-2 to cover the whole of Unicode with sequences of one or two 16-bit elements
- GB 18030 – A full-Unicode variable-length code designed for compatibility with older Chinese multibyte encodings
- Huffman coding – A technique for expressing more common characters using shorter bit strings than are used for less common characters
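The idea behind Huffman coding in the list above can be sketched with a priority queue. This is a minimal illustration, not a reference implementation: the function name is hypothetical, the exact bit strings depend on how ties are broken, and the single-symbol case is handled by an arbitrary choice of "0".

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix code in which frequent characters get short strings."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:  # degenerate input with a single distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie-breaker, {char: code built so far}).
    heap = [(n, i, {ch: ""}) for i, (ch, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]
```

For the input `"aaaabbc"`, the most frequent character "a" receives a one-bit string while "b" and "c" receive two bits each, so the seven characters need 10 bits in total. No string in the result is a prefix of another, which is what allows the bits to be decoded without separators.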
Data compression systems such as Lempel–Ziv–Welch can compress arbitrary binary data. They are therefore not binary codes themselves but may be applied to binary codes to reduce storage needs.
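To illustrate compression being applied on top of a text encoding, the sketch below compresses the same text after encoding it as UTF-8 and as UTF-32. Note the substitution: Python's standard library does not provide Lempel–Ziv–Welch, so the example uses `zlib`, which implements DEFLATE (LZ77 plus Huffman coding). The sample text is arbitrary and deliberately repetitive.

```python
import zlib

text = "binary code " * 50  # repetitive sample text, 600 characters

utf8 = text.encode("utf-8")       # 1 byte per character for ASCII text
utf32 = text.encode("utf-32-be")  # always 4 bytes per character

# zlib implements DEFLATE (LZ77 plus Huffman coding), not LZW.
packed8 = zlib.compress(utf8)
packed32 = zlib.compress(utf32)

# Decompression restores the encoded bytes exactly; decoding them with
# the original character encoding then recovers the text.
restored = zlib.decompress(packed8).decode("utf-8")
```

The compressor works on bytes and knows nothing about characters, so the text still has to be decoded with the same character encoding after decompression.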
Other
- Morse code is a variable-length telegraphy code that traditionally uses a series of long and short pulses to encode characters. It relies on gaps between the pulses to separate letters and words, because the letter codes do not have the "prefix property". Morse code is therefore not necessarily a binary system; in a sense it may be viewed as a ternary system, with 10 for a "dit" or "dot", 1110 for a "dah" or "dash", and 00 for a single unit of separation. Morse code can also be represented as a binary stream by allowing each bit to represent one unit of time. A "dit" is then a single 1 bit and a "dah" is three consecutive 1 bits, while the spaces between symbols, letters, and words are one, three, and seven consecutive 0 bits respectively. For example, "NO U" in Morse code is "— . — — — . . —", which could be represented in binary as "1110100011101110111000000010101110". If, however, Morse code is represented as a ternary system, "NO U" would be represented as "1110|10|00|1110|1110|1110|00|00|00|10|10|1110".
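The "NO U" example can be reproduced with a short sketch. It uses the three tokens named above (10, 1110, and 00) and writes Morse marks with "." and "-" in the lookup table. The function name is hypothetical, and only the letters A to Z are covered.

```python
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..",
}


def morse_tokens(message):
    """Return timing tokens: '10' for a dit, '1110' for a dah, '00' for gaps."""
    tokens = []
    for w, word in enumerate(message.upper().split()):
        if w:
            tokens += ["00"] * 3  # word gap: 1 trailing zero + 6 = 7 units
        for l, letter in enumerate(word):
            if l:
                tokens.append("00")  # letter gap: 1 trailing zero + 2 = 3 units
            for mark in MORSE[letter]:
                tokens.append("10" if mark == "." else "1110")
    return tokens
```

Joining the tokens with a separator, `"|".join(morse_tokens("NO U"))`, gives `1110|10|00|1110|1110|1110|00|00|00|10|10|1110`, and joining them without one gives the binary stream `1110100011101110111000000010101110`. Each dit or dah token carries its own trailing 0, which is why one extra 00 is enough to make a three-unit letter gap.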
References
- [1] Alan G. Hobbs (1999-03-05). "Five-unit codes". NADCOMM Museum. Archived from the original on 1999-11-04.
- [2] Gil Smith (2001). "Teletypewriter Communication Codes" (PDF).
- [3] "Paper Tape Readers & Punches". The Ferranti Orion Web Site. Archived from the original on 2011-07-21.
- [4] "Telecipher Devices". John Savard's Home Page.