Friday, March 29, 2024

Internationalization

Despite its name, the World Wide Web has had some difficulty reaching beyond
Western languages and alphabets. Character representation in HTML was, for the
most part, confined to the ISO 8859-1 (Latin-1) character set. This character
set contains the letters needed for English, French, Spanish, German, and the
Scandinavian languages, but no Greek, Hebrew, Arabic, or Cyrillic characters,
among others, and few scientific and mathematical symbols. Latin-1 also makes
no provision for marking reading direction.

Part of the problem with Latin-1 is that it simply doesn’t have room to handle
all the alphabets and languages of the world. It is an 8-bit, single-byte coded graphic
character set and, as such, can represent only up to 256 characters.
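
The 256-character limit is easy to see in practice. The following is a minimal sketch in Python, whose built-in "latin-1" codec corresponds to ISO 8859-1; the helper name fits_in_latin1 and the sample words are illustrative assumptions, not part of the original text.

```python
# A single byte has 8 bits, so a single-byte character set
# has at most 2**8 distinct values.
LATIN1_SIZE = 2 ** 8


def fits_in_latin1(text: str) -> bool:
    """Return True if every character in text can be encoded as Latin-1.

    Hypothetical helper for illustration only.
    """
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Sample words chosen for illustration.
samples = {
    "French": "café",
    "German": "Straße",
    "Greek": "αβγ",
    "Cyrillic": "Привет",
    "Hebrew": "שלום",
}

results = {name: fits_in_latin1(word) for name, word in samples.items()}

print("Latin-1 capacity:", LATIN1_SIZE)
for name, ok in results.items():
    print(f"{name}: {'fits' if ok else 'does not fit'} in Latin-1")
```

The Western European samples encode cleanly, while the Greek, Cyrillic, and Hebrew samples raise an encoding error because those letters have no slot among the 256 values.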


Enter Unicode. Unicode is a character-encoding standard that uses a 16-bit code,
raising the number of characters that can be encoded to more than 65,000.
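
To make the arithmetic concrete, here is a rough sketch in Python. It uses the "utf-16-be" codec as a stand-in for the 16-bit scheme described above, which is an assumption that holds only for characters that fit in a single 16-bit unit; the helper name to_16_bit_units is hypothetical.

```python
# 16 bits give 2**16 distinct values, versus 2**8 for a single byte.
CODE_UNITS_16_BIT = 2 ** 16


def to_16_bit_units(text: str) -> list[int]:
    """Split text into big-endian 16-bit units (illustrative helper)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# A Latin letter, a Greek letter, a Cyrillic letter, and a Hebrew letter.
units = to_16_bit_units("AαЯא")

print("16-bit capacity:", CODE_UNITS_16_BIT)
print([hex(u) for u in units])
```

Each of the four letters occupies one 16-bit unit, including the three that Latin-1 cannot represent at all.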


HTML 4.0 uses the Universal Character Set (UCS) as its character set. UCS is a
character-by-character equivalent to Unicode 2.0.
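
One practical consequence is that a numeric character reference in an HTML document, such as &#945;, names a UCS code position rather than a Latin-1 value. The sketch below uses Python's standard html module to go back and forth; the helper name to_numeric_references is a hypothetical one chosen for illustration.

```python
import html


def to_numeric_references(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character
    reference built from its code position (illustrative helper)."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


original = "α and Я"
encoded = to_numeric_references(original)   # ASCII-only markup
decoded = html.unescape(encoded)            # back to the original characters

print(encoded)
print(decoded == original)
```

The encoded form contains only ASCII characters, so it survives transport through systems that understand nothing beyond Latin-1, yet it still identifies the Greek and Cyrillic letters unambiguously.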


From Special Edition Using HTML 4, Appendix A: What’s New in HTML 4.0

© Copyright Macmillan Computer Publishing. All rights reserved.
