By Sue Charlesworth



Despite its name, the World Wide Web has had some difficulty reaching past Western languages and alphabets. Character representation in HTML was largely confined to the ISO 8859-1 (Latin-1) character set. This character set contains letters for English, French, Spanish, German, and the Scandinavian languages, but no Greek, Hebrew, Arabic, or Cyrillic characters, among others, and few scientific and mathematical symbols. Nor does Latin-1 provide any way to mark reading direction.

Part of the problem with Latin-1 is that it simply doesn't have room for all the alphabets and languages of the world. It is an 8-bit, single-byte coded graphic character set and, as such, can represent at most 256 characters.
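The 256-character ceiling is easy to see with a short Python sketch. Python is an assumption here, chosen only because its standard codecs include Latin-1. The helper name `fits_latin1` is hypothetical. It tries to encode text as Latin-1. Western European accented letters succeed, and Greek letters fail.

```python
# Latin-1 (ISO 8859-1) is a single-byte set: one byte per character,
# so at most 2**8 = 256 distinct code values.
LATIN1_SIZE = 2 ** 8

def fits_latin1(text: str) -> bool:
    """Return True if every character in text is representable in Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# Every one of the 256 byte values decodes to exactly one Latin-1 character.
all_latin1 = bytes(range(256)).decode("latin-1")
print(len(all_latin1))             # 256

print(fits_latin1("Café Zürich"))  # True: French and German letters are included
print(fits_latin1("αβγ"))          # False: Greek letters are not
```

Decoding all 256 byte values shows that the whole repertoire is exhausted at one byte per character. There is no spare code value for an additional alphabet.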

Enter Unicode. Unicode is a character-encoding standard that uses a 16-bit set, raising the number of characters it can encode to more than 65,000 (2^16 = 65,536).
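The following sketch shows why 16 bits is enough for the alphabets Latin-1 leaves out. It assumes Python's UTF-16 codec as a modern stand-in for the 16-bit representation described above. Later Unicode versions added characters beyond U+FFFF, and UTF-16 needs two 16-bit units for those, so this sketch deliberately uses only characters below that point. The helper `utf16_info` is hypothetical and exists only for illustration.

```python
# A single 16-bit code unit can take 2**16 distinct values.
SIXTEEN_BIT_SIZE = 2 ** 16  # 65,536

def utf16_info(ch: str) -> tuple[str, int]:
    """Return ch's code point as 'U+XXXX' and its UTF-16 (big-endian) byte length."""
    return f"U+{ord(ch):04X}", len(ch.encode("utf-16-be"))

# Latin, Greek, Cyrillic, and Hebrew letters all sit below U+10000,
# so each one fits in a single 16-bit unit (2 bytes).
for ch in ["A", "é", "α", "Ж", "א"]:
    label, size = utf16_info(ch)
    print(ch, label, size)
```

Every character in the list is stored in 2 bytes. This includes the Greek, Cyrillic, and Hebrew letters that Latin-1 cannot hold at all.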

HTML 4.0 uses the Universal Character Set (UCS), defined in ISO/IEC 10646, as its character set. UCS is a character-by-character equivalent to Unicode 2.0.
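UCS is HTML's document character set, so an HTML numeric character reference such as `&#945;` names a UCS code point. This lets a page include Greek or Cyrillic text even when the file itself is saved in a narrower encoding. The sketch below is a minimal illustration using Python's standard `html` module and the `xmlcharrefreplace` error handler. The helper name `to_char_refs` is hypothetical.

```python
import html

def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

greek = "αβγ"
markup = to_char_refs(greek)
print(markup)                          # &#945;&#946;&#947;
print(html.unescape(markup) == greek)  # True: references map back to the same characters
print(html.unescape("&#x3B1;"))        # α, written in hexadecimal form
```

The references are pure ASCII, yet they identify characters far outside Latin-1's 256 positions, because the numbers are interpreted as UCS code points.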

From Special Edition Using HTML 4, Appendix A: What's New in HTML 4.0

© Copyright Macmillan Computer Publishing. All rights reserved.
