A character in a specified language can be coded as a string of bits (binary digits).
Fixed length codes
5-bit Code
- In the early days, English was the primary language used for communications in most parts of the world.
- The English alphabet consists of 26 letters.
- 5 bits (which can represent 32 symbols) are enough to represent either
(a) upper case characters, or
(b) lower case characters,
but not both at once.
- To represent both cases, a “Shift” character was introduced in some systems.
Alternatively, one can use a code with more bits per character.
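The shift idea can be sketched in a few lines of Python. This is only an illustration: the code assignments (letters at 0..25, `SPACE` at 26, `SHIFT` at 27) and the function names `encode_5bit` and `decode_5bit` are assumptions made for this example, not the layout of any historical 5-bit code.

```python
import string

LETTERS = string.ascii_lowercase      # 26 letters -> codes 0..25
SPACE, SHIFT = 26, 27                 # hypothetical code assignments
assert max(SPACE, SHIFT) < 2 ** 5     # every code fits in 5 bits


def encode_5bit(text):
    """Return a list of 5-bit codes; SHIFT toggles between lower and upper case."""
    codes, upper = [], False
    for ch in text:
        if ch == " ":
            codes.append(SPACE)
            continue
        if ch.lower() not in LETTERS:
            raise ValueError(f"cannot encode {ch!r}")
        if ch.isupper() != upper:     # case changed: emit a SHIFT first
            codes.append(SHIFT)
            upper = not upper
        codes.append(LETTERS.index(ch.lower()))
    return codes


def decode_5bit(codes):
    """Invert encode_5bit by tracking the current shift state."""
    out, upper = [], False
    for code in codes:
        if code == SHIFT:
            upper = not upper
        elif code == SPACE:
            out.append(" ")
        else:
            ch = LETTERS[code]
            out.append(ch.upper() if upper else ch)
    return "".join(out)


codes = encode_5bit("Hi Bob")
print(" ".join(format(c, "05b") for c in codes))
print(decode_5bit(codes))
```

The cost of staying at 5 bits is visible in the output: every change of case spends an extra 5-bit code on the shift.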
6-bit Code
- Some early computers (e.g. ICL1902) used a 6-bit code.
- It can represent 64 characters.
- A subset of the codes is reserved for control functions.
- The remaining codes are used for printable characters (letters, numerals, punctuation).
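The capacities quoted so far (32 symbols for 5 bits, 64 for 6 bits) follow from the fact that n bits give 2^n distinct codes. A minimal Python sketch of the reverse question, how many bits a fixed-length code needs for a given number of symbols, is shown here; the helper name `bits_needed` is an assumption for this example.

```python
def bits_needed(symbol_count):
    """Smallest n such that 2**n >= symbol_count (fixed-length code)."""
    if symbol_count < 1:
        raise ValueError("need at least one symbol")
    return (symbol_count - 1).bit_length()


for count in (26, 52, 64, 128, 256):
    n = bits_needed(count)
    print(f"{count:3d} symbols -> {n} bits (capacity {2 ** n})")
```

For instance, 26 letters fit in 5 bits, while 52 letters (both cases) already need 6.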
7-bit Code
- ASCII (American Standard Code for Information Interchange) can represent 128 characters.
8-bit Code
- EBCDIC (Extended Binary Coded Decimal Interchange Code) can represent 256 characters.
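To see that the same character gets different bit patterns in different codes, the following Python sketch prints the ASCII and EBCDIC codes of a few characters. It assumes Python's built-in `cp037` codec as a representative EBCDIC code page (several EBCDIC variants exist); the helper name `char_codes` is made up for this example.

```python
def char_codes(ch):
    """Return (ASCII code, EBCDIC code) for a single character.

    EBCDIC here means code page 037, one of several EBCDIC variants.
    """
    ascii_code = ch.encode("ascii")[0]
    ebcdic_code = ch.encode("cp037")[0]
    return ascii_code, ebcdic_code


for ch in "Aa0":
    a, e = char_codes(ch)
    # ASCII fits in 7 bits, EBCDIC uses all 8
    print(ch, format(a, "07b"), format(e, "08b"))
```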
Character Sets
- With the widespread use of computer technology, character sets for the various languages were developed.
- Some languages (notably Chinese) have thousands of characters and therefore require longer bit strings per character.
Variable Length Codes
- Fixed-length coding gave way to variable-length coding.
- The most frequently used characters in a language are represented as one byte, and the less frequently used characters as two or more bytes.
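One concrete instance of such a code is GBK, a legacy encoding for Chinese text, used here purely as an illustration. In the Python sketch below, Latin letters take one byte and Chinese characters take two; the helper name `byte_lengths` is an assumption for this example.

```python
def byte_lengths(text, encoding="gbk"):
    """Return the number of bytes each character occupies in the given encoding."""
    return [len(ch.encode(encoding)) for ch in text]


sample = "A\u4e2dB\u6587"           # 'A', a Chinese character, 'B', a Chinese character
print(byte_lengths(sample))
print(len(sample.encode("gbk")), "bytes for", len(sample), "characters")
```

A decoder for such a code has to inspect each leading byte to know whether one more byte belongs to the same character.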
Unicode
- Unicode aims to provide a single standard set of character codes covering the world's languages.
- Formal and informal institutions help develop, propose and approve new Unicode character sets.
- UTF (Unicode Transformation Format) encodes Unicode characters as sequences of fixed-size code units (e.g. 8-bit units in UTF-8, 16-bit units in UTF-16).
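The distinction between a Unicode character code and its UTF encodings can be sketched in Python as follows. The helper name `describe` and the sample characters are choices made for this example; the byte-order-specific codec `utf-16-be` is used so that no byte order mark is counted.

```python
def describe(ch):
    """Return (code point label, UTF-8 byte count, UTF-16 byte count)."""
    return (
        f"U+{ord(ch):04X}",
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),   # big-endian, no byte order mark
    )


# Latin letter, accented letter, Chinese character, emoji
for ch in ("A", "\u00e9", "\u4e2d", "\U0001F600"):
    print(ch, *describe(ch))
```

Both formats are variable length: UTF-8 uses one to four 8-bit units per character, and UTF-16 uses one or two 16-bit units.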
Standards and Recommendations
Standards may be
- De Jure (set by law)
- De Facto (set by common usage).
Standards must be followed in order to claim compliance.
Recommendations should be followed but are not mandatory, so implementations can vary.