Charset

By Joe Burns

Use these to jump around or read it all...

[The Code]
[Other Charsets]
[Why iso-8859-1 Rather Than us-ascii?]

In all my years of writing tutorials, I can't remember ever receiving an email and spinning right around to post a tutorial based on it. There's a first time for everything, I guess.

I was working on a new site and asked some people to beta test it. I didn't know many of the people. They were all subscribers to my design newsletter. One gentleman happened to be an American now living in Japan and working as an interpreter. He also does Web Design.

Well, when he entered the site using Microsoft Internet Explorer, the page looked horrible. His browser was set to display Japanese characters by default, but my page was in English (more correctly, Western European text). Many characters could not be displayed, so he saw little boxes in their place. Some text showed up, some didn't. It was a mess.

Some who know about this little problem might suggest that anyone whose browser is not set to English by default can simply reload the page using the browser's "encoding" setting, and the page will display in English (ASCII/Western European) text. I guess that's true, but it's not being overly nice to the viewer. Every page has to be reloaded and the encoding picked by hand.
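To see why the page turned into a mess, here is a small sketch in Python (our choice for illustration; browsers do this internally). It saves a made-up line of Western European text as iso-8859-1 bytes, then decodes those same bytes the way a browser defaulting to Japanese (Shift_JIS) might, and finally decodes them with the correct charset, which is what reloading with the right encoding amounts to.

```python
# Sketch: the same bytes, read with the wrong charset and then with the right one.
text = "Café crème, naïve résumé"  # hypothetical sample text with accented letters

# The page is saved as Western European (iso-8859-1) bytes.
page_bytes = text.encode("iso-8859-1")

# A browser defaulting to Japanese might read those bytes as Shift_JIS.
# errors="replace" substitutes U+FFFD for bytes that make no sense in Shift_JIS,
# a rough stand-in for the "little boxes" the reader saw.
wrong = page_bytes.decode("shift_jis", errors="replace")

# Decoding with the charset the page was actually written in restores the text.
right = page_bytes.decode("iso-8859-1")

print("Wrong charset:", wrong)
print("Right charset:", right)
```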

What's interesting about this problem is that it doesn't have to happen. You, the designer, can add one simple META command that tells most browsers, whatever language they are set to, that the page about to display should be shown using an English (Western European) set of characters: a character set, or "charset."


The Code

I know you look at other people's code, right? C'mon, admit it. Have you ever seen one of these:

<META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

That line of text will save people whose browsers are not set to display English a lot of reloading. Here's what it is saying:

  • http-equiv can be read as meaning "name." Strictly speaking, it names the HTTP header this tag stands in for, here Content-Type. It is most often accompanied by:
  • content, which is set equal to a value. The name/value pair here works much like it does in a form element.
  • charset, which sits inside the content value, states which set of characters should be used to display the text within the document.
The command goes between the <HEAD> tags.
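Because the content value has the same shape as an HTTP Content-Type header, a program can find the charset with an ordinary header parser. The sketch below is only an illustration of that idea, not how any browser does it: it uses Python's standard html.parser and email.message modules, and the names PAGE and MetaCharsetFinder are hypothetical.

```python
from email.message import Message
from html.parser import HTMLParser

# Hypothetical sample page with the META command between the HEAD tags.
PAGE = """<HTML>
<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<TITLE>My English Page</TITLE>
</HEAD>
<BODY>Hello!</BODY>
</HTML>"""


class MetaCharsetFinder(HTMLParser):
    """Hypothetical helper: records the charset named by a Content-Type META tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so META/http-equiv still match.
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "content-type":
            # Treat the content value as a Content-Type header and read its charset parameter.
            header = Message()
            header["Content-Type"] = attrs.get("content") or ""
            self.charset = header.get_content_charset()


finder = MetaCharsetFinder()
finder.feed(PAGE)
print(finder.charset)  # iso-8859-1
```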


Other Charsets

The first thing that popped into my mind when I began really looking at this command was: how many other charsets are there? The most complete list I found described the charsets understood by the Netscape Navigator browser (source: http://people.netscape.com/ftang/meta.html):

For English:

  • us-ascii

For Western-European Languages:
  • iso-8859-1
  • x-mac-roman (only for Macintosh version)
  • iso-8859-9 (3.0 and above)
  • x-mac-turkish (3.0 and above only for Macintosh version)

For Traditional Chinese:
  • big5
  • x-euc-tw

For Japanese:
  • Shift_JIS (3.0 and above; 2.0 only knows x-sjis)
  • x-euc-jp
  • iso-2022-jp

For Korean:
  • euc-kr
  • iso-2022-kr

For Simplified Chinese:
  • gb2312

For Eastern (or Central) European Languages:
  • iso-8859-2
  • x-mac-ce (only for Macintosh version)

For Cyrillic:
  • iso-8859-5 (3.0 and above)
  • koi8-r (3.0 and above)
  • x-mac-cyrillic (3.0 and above, only for Macintosh version)

For Greek:
  • iso-8859-7 (3.0 and above)
  • x-mac-greek (3.0 and above, only for Macintosh version)
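The names in this list matter because different charsets assign different byte values to the same letter. As a rough illustration in Python (our choice of language; Python's codec names do not match Netscape's list everywhere), here is one sample word encoded under two of the Cyrillic charsets above, then read back with the wrong one.

```python
# Sketch: one Russian word saved under two of the Cyrillic charsets listed above.
word = "Привет"  # Russian for "hello"; sample text chosen for illustration

as_iso = word.encode("iso-8859-5")
as_koi = word.encode("koi8-r")

# Same six letters, two different byte sequences.
print(as_iso)
print(as_koi)

# Reading the koi8-r bytes with the iso-8859-5 table gives the wrong letters,
# which is what happens when a browser guesses the charset incorrectly.
garbled = as_koi.decode("iso-8859-5", errors="replace")
print(garbled)
```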


Why iso-8859-1 Rather Than us-ascii?

I think I'd better address this, or I'm going to get email asking the question. If you look at the listing of charsets above, you'll notice that there is a charset for "English." Well, my pages are in English, so why not use that? Why use iso-8859-1?

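The reason is that the us-ascii charset contains only the 128 ASCII characters. Look at your keyboard. See all those letters, numbers, and punctuation marks? Counting upper and lower case as two, they plus the space make up 95 of those 128 codes; the other 33 are invisible control codes such as tab and carriage return. That's not enough.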
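A quick way to check those numbers is to count them. This Python sketch (Python is our choice here; the price string is made up) lists every us-ascii code, counts the printable ones, and shows that a symbol such as the yen sign simply has no us-ascii code.

```python
# Sketch: what fits in us-ascii and what does not.
ascii_codes = [chr(code) for code in range(128)]  # every us-ascii code, 0 through 127
printable = [ch for ch in ascii_codes if ch.isprintable()]

print(len(ascii_codes))  # 128
print(len(printable))    # 95 (letters, digits, punctuation and the space)

# Plain English text encodes fine...
print("Hello, World!".encode("us-ascii"))

# ...but a symbol outside those 128 codes cannot be represented at all.
# "\u00a5" is the yen sign.
try:
    "Price: 100\u00a5".encode("us-ascii")
except UnicodeEncodeError as err:
    bad_char = err.object[err.start]  # the first character us-ascii could not encode
    print("Not in us-ascii:", bad_char)
```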

The iso-8859-1 charset expands on us-ascii. It keeps the same first 128 characters and uses most of the remaining byte values (128 through 255) for accented letters and special characters like ¥, ®, and Ø. You can see a full list of the special characters in my ASCII Command tutorial.
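Here is a small Python sketch of that relationship (Python is only our illustration language). Plain English comes out as the same bytes under either charset, while the three special characters each fit in a single iso-8859-1 byte.

```python
# Sketch: iso-8859-1 keeps ASCII as-is and adds one-byte codes for extra symbols.
english = "Hello, World!"

# The first 128 codes are identical, so plain English produces the same bytes either way.
print(english.encode("iso-8859-1") == english.encode("us-ascii"))  # True

specials = "\u00a5\u00ae\u00d8"  # the yen sign, registered sign, and O with stroke
encoded = specials.encode("iso-8859-1")
print(len(encoded))               # 3: one byte per symbol
print([hex(b) for b in encoded])  # ['0xa5', '0xae', '0xd8']
```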

For a short answer: iso-8859-1 is a standard superset of us-ascii, so it displays plain English exactly the same way while also covering the accented letters and symbols that us-ascii lacks. It's the best choice for pages written in English.


That's That

If your pages are in English, I would suggest that you make a point of adding the META command above in between your HEAD tags. You may never know it, but you'll probably be making someone's surfing life a whole lot easier and more enjoyable.

 Enjoy!
