What are the differences between UTF-8, UTF-16, and UTF-32?
I understand that they will all store Unicode, and that each uses a different number of bytes to represent a character. Is there an advantage to choosing one over the other?
Answer 1
UTF-8 has an advantage in the case where ASCII characters represent the majority of characters in a block of text, because UTF-8 encodes these into 8 bits (like ASCII). It is also advantageous in that a UTF-8 file containing only ASCII characters has the same encoding as an ASCII file.
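The ASCII compatibility described above can be checked directly. This is a minimal Python sketch; the sample string and the variable names are made up for illustration.

```python
# ASCII-only sample text (chosen for illustration only)
text = "Hello, world!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For ASCII-only input, both encodings produce exactly the same bytes,
# one byte per character.
same_bytes = ascii_bytes == utf8_bytes
print(same_bytes, len(utf8_bytes))
```

So any tool that reads ASCII can read such a file unchanged, as long as no non-ASCII character appears in it.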
UTF-16 is better where ASCII is not predominant, since it uses 2 bytes for most characters. UTF-8 needs 3 bytes for characters from U+0800 to U+FFFF (which includes most CJK text), where UTF-16 still needs only 2. Both need 4 bytes for characters above U+FFFF.
UTF-32 will cover all possible characters in 4 bytes. This makes it pretty bloated. I can't think of any advantage to using it.
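To make the size trade-off concrete, here is a small Python sketch that compares encoded lengths for an ASCII string and a Chinese string. The helper name `encoded_sizes` and the sample strings are assumptions for illustration. The `-le` codec variants are used so that Python does not prepend a byte order mark, which would otherwise inflate the counts.

```python
def encoded_sizes(text):
    """Return the byte length of text in each encoding, without a byte order mark."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

english = "hello"      # 5 ASCII characters
chinese = "你好世界"    # 4 CJK characters, all below U+FFFF

print(encoded_sizes(english))  # {'utf-8': 5, 'utf-16': 10, 'utf-32': 20}
print(encoded_sizes(chinese))  # {'utf-8': 12, 'utf-16': 8, 'utf-32': 16}
```

UTF-8 wins for the ASCII string, UTF-16 wins for the Chinese string, and UTF-32 is the largest in both cases.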
Answer 2
In short:
In long: see Wikipedia: UTF-8, UTF-16, and UTF-32.
Answer 3
UTF-8 is variable 1 to 4 bytes.
UTF-16 is variable 2 or 4 bytes.
UTF-32 is fixed 4 bytes.
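The widths listed above can be observed per character. This is a rough Python sketch; the helper name `widths` and the four sample characters are chosen for illustration, one from each UTF-8 length class.

```python
def widths(ch):
    """Return (UTF-8, UTF-16, UTF-32) byte counts for one character, no byte order mark."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )

samples = {
    "U+0041 LATIN CAPITAL LETTER A": "\u0041",
    "U+00E9 LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "U+20AC EURO SIGN": "\u20ac",
    "U+1F600 GRINNING FACE": "\U0001F600",
}

for name, ch in samples.items():
    print(name, widths(ch))
```

The emoji lies above U+FFFF, so UTF-16 needs a surrogate pair (4 bytes) for it, while the other three samples fit in a single 2-byte unit.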
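Note: the original UTF-8 design allowed sequences of 1 to 6 bytes, but the current standard (RFC 3629) limits UTF-8 to 4 bytes, because Unicode code points end at U+10FFFF. Reference: https://lists.gnu.org/archive/html/help-flex/2005-01/msg00030.html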
Source: https://www.cnblogs.com/chucklu/p/14639465.html