Internationalization

Despite its name, the World Wide Web has had some difficulty reaching beyond
Western languages and alphabets. Character representation in HTML was largely
confined to the ISO 8859-1 (Latin-1) character set. This character set contains
letters for English, French, Spanish, German, and the Scandinavian languages, but
no Greek, Hebrew, Arabic, or Cyrillic characters, among others, and few scientific
and mathematical symbols. Latin-1 also makes no provision for marking reading
direction.

Part of the problem with Latin-1 is that it simply doesn’t have room for all the
alphabets and languages of the world. It is an 8-bit, single-byte coded graphic
character set and, as such, can represent at most 256 characters.
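To make the 256-character limit concrete, here is a minimal Python sketch (Python is chosen only for illustration, and the helper name `fits_latin1` is hypothetical). It relies solely on the standard `latin-1` codec: each Latin-1 character occupies exactly one byte, and text containing Greek or Cyrillic letters cannot be encoded at all.

```python
# Latin-1 (ISO 8859-1) is a single-byte code: 8 bits per character,
# so it can hold at most 2**8 = 256 distinct characters.
LATIN1_CAPACITY = 2 ** 8


def fits_latin1(text: str) -> bool:
    """Hypothetical helper: True if every character of text exists in Latin-1."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(LATIN1_CAPACITY)                # 256
    print(len("Café".encode("latin-1")))  # 4: one byte per character
    print(fits_latin1("Smörgåsbord"))     # True: Scandinavian letters are included
    print(fits_latin1("Ωμέγα"))           # False: no Greek letters
    print(fits_latin1("Привет"))          # False: no Cyrillic letters
```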

Enter Unicode. Unicode is a character-encoding standard that uses 16-bit codes,
raising the number of characters that can be encoded from 256 to 65,536.
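The jump from 8 to 16 bits is what makes room for the missing alphabets. The illustrative Python sketch below (the sample characters are our own picks, written as escapes to avoid look-alike glyphs) prints the code point of one letter from each script Latin-1 lacks and shows that each fits in a single 16-bit unit. Note that Python's `str` follows current Unicode, whose code points can exceed 16 bits; the characters chosen here all fall within the 16-bit range the text describes.

```python
# A 16-bit code offers 2**16 = 65,536 possible values, versus 256 for Latin-1.
SIXTEEN_BIT_CAPACITY = 2 ** 16

# One sample letter from each script that Latin-1 cannot represent.
SAMPLES = {
    "Greek": "\u03a9",     # capital omega
    "Hebrew": "\u05d0",    # alef
    "Arabic": "\u0628",    # beh
    "Cyrillic": "\u0416",  # capital zhe
}

if __name__ == "__main__":
    print(SIXTEEN_BIT_CAPACITY)  # 65536
    for script, ch in SAMPLES.items():
        code_point = ord(ch)
        # UTF-16 big-endian stores each of these characters in exactly two bytes.
        encoded = ch.encode("utf-16-be")
        print(f"{script:8} U+{code_point:04X}  {encoded.hex()}  "
              f"fits in 16 bits: {code_point < SIXTEEN_BIT_CAPACITY}")
```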

HTML 4.0 uses the Universal Character Set (UCS) as its character set. UCS is a
character-by-character equivalent to Unicode 2.0.
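One practical consequence of HTML 4.0 adopting UCS is that a numeric character reference such as `&#937;` names a UCS/Unicode code point (here U+03A9, Greek capital omega), regardless of the encoding used to transmit the page. The minimal Python sketch below assumes that convention; `to_char_refs` is a hypothetical helper, while `html.unescape` comes from Python's standard library and is used only to check the round trip.

```python
import html


def to_char_refs(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with a decimal
    numeric character reference (&#NNN;) that names its code point."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


if __name__ == "__main__":
    source = "100 \u03a9 resistor"  # "100 Ω resistor"
    markup = to_char_refs(source)
    print(markup)                   # 100 &#937; resistor
    # Decoding the references recovers the original characters.
    assert html.unescape(markup) == source
```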

from Special Edition Using HTML 4, Appendix A: What’s New in HTML 4.0
