Unicode / ASCII

Unicode Character Table / Miscellaneous Symbols and Pictographs

Emoji

Useful chars

UTF-8, UTF-16, or UTF-32?

Stick to UTF-8 and these three character sets - coding policy for software development

That’s one of the great features of UTF-8. You can move forwards and backwards through a UTF-8 string without having to start from the beginning. - UTF-8 is a brilliant design
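The backwards-stepping property comes from the byte layout: every continuation byte has the bit pattern 10xxxxxx, so from any position you can back up to the start of the current character. A minimal sketch, assuming the input is valid UTF-8 (the helper names `char_start` and `iter_backwards` are made up for this example):

```python
def is_continuation(byte: int) -> bool:
    # Continuation bytes always look like 10xxxxxx.
    return (byte & 0xC0) == 0x80


def char_start(data: bytes, index: int) -> int:
    """Return the index of the first byte of the character covering data[index].

    Assumes data is valid UTF-8; no validation is done here.
    """
    while index > 0 and is_continuation(data[index]):
        index -= 1
    return index


def iter_backwards(data: bytes):
    """Yield the characters of valid UTF-8 data from last to first."""
    end = len(data)
    while end > 0:
        start = char_start(data, end - 1)
        yield data[start:end].decode("utf-8")
        end = start


if __name__ == "__main__":
    data = "a\u00e9\u20ac\U0001F600".encode("utf-8")  # 1, 2, 3 and 4 byte characters
    print(list(iter_backwards(data)))
```

Each backwards step looks at no more than four bytes, so nothing before the current position has to be scanned.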

UTF-8 has an advantage in the case where ASCII characters represent the majority of characters in a block of text, because UTF-8 encodes these into 8 bits (like ASCII). It is also advantageous in that a UTF-8 file containing only ASCII characters has the same encoding as an ASCII file.
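A quick way to check the ASCII compatibility claim, as a small sketch with made-up sample strings:

```python
text = "Hello, world!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# Pure ASCII text produces byte-for-byte identical output in both encodings.
print(ascii_bytes == utf8_bytes)

# Every byte stays below 0x80, i.e. one 8-bit unit per character.
print(all(b < 0x80 for b in utf8_bytes))

# One non-ASCII character costs extra bytes: "café" has 4 characters.
cafe = "caf\u00e9"
print(len(cafe), len(cafe.encode("utf-8")))
```

The last line shows the character count and the byte count diverging as soon as a non-ASCII character appears.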

UTF-16 is better where ASCII is not predominant, since it uses 2 bytes for most characters. UTF-8 needs 3 bytes for code points from U+0800 to U+FFFF (most CJK text, for example), where UTF-16 stays at 2 bytes. Both need 4 bytes for code points above U+FFFF.

UTF-32 encodes every code point in exactly 4 bytes. This makes it pretty bloated. Apart from its fixed width, I can’t think of any advantage to using it.
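The size trade-offs can be measured directly. This sketch uses the little-endian codec variants so that no byte order mark is added to the counts; the helper name `encoded_sizes` and the sample strings are only illustrative:

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text: str) -> dict[str, int]:
    """Return the number of bytes text occupies in each encoding."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


if __name__ == "__main__":
    for sample in ("hello", "你好", "😀"):
        print(repr(sample), encoded_sizes(sample))
```

On these samples, ASCII text is smallest in UTF-8, the two CJK characters are smaller in UTF-16 (4 bytes against 6), and the emoji takes 4 bytes in all three encodings.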

ASCII

Four Column ASCII / HN

Remember that ASCII is a 7-bit encoding. Split each code as follows:

  • The first two bits denote the group of the character (2^2 so 4 possible values)
  • The remaining five bits describe a character (2^5 so 32 possible values)
00    01    10    11    last 5 bits
NUL   Spc   @     `     00000
SOH   !     A     a     00001
STX   "     B     b     00010
…
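The split used in the table above can be reproduced with a shift and a mask. A minimal sketch (the function names `split_ascii` and `table_row` are invented for this example):

```python
def split_ascii(ch: str) -> tuple[str, str]:
    """Split a 7-bit ASCII character into (2-bit group, last 5 bits)."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
    group = code >> 5    # top two bits select the column
    low5 = code & 0x1F   # bottom five bits select the row
    return format(group, "02b"), format(low5, "05b")


def table_row(low5: int) -> list[str]:
    """Return the four characters that share the same last 5 bits."""
    return [chr((group << 5) | low5) for group in range(4)]


if __name__ == "__main__":
    print(split_ascii("A"))   # ('10', '00001')
    print(split_ascii("a"))   # ('11', '00001')
    print(table_row(0b00010))  # STX, '"', 'B', 'b'
```

Because upper and lower case letters sit in adjacent columns of the same row, they differ only in one group bit: `ord("A") ^ 0x20 == ord("a")`.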

see also

Written on August 13, 2020, Last update on November 30, 2025
ascii utf8 binary text encoding online software