Giganews Newsgroups
Subject: Charset
Posted by:  Stan Brown (the_stan_bro…
Date: Thu, 15 Oct 2020

I'm trying, and failing, to write the proper charset in my meta tag.
Help, please!

I have this line in the <head> of my Web pages:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

But perfectly decent characters like é, ×, ² show up as a question
mark in a lozenge. I figured out that that's because my HTML files
are all plain text, 8 bits per character, which is not UTF-8 when I
use characters above 127.
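
To make the mismatch concrete, here is a minimal Python sketch (Python is
not something used elsewhere in this post, and the function name
describe is made up for illustration). It shows the single byte each of
those three characters occupies in a Latin-1 file, what a UTF-8 decoder
does with that lone byte, and the two bytes genuine UTF-8 would use.

```python
def describe(ch):
    """Return (latin-1 bytes, text a UTF-8 decoder shows, real UTF-8 bytes)."""
    raw = ch.encode("iso-8859-1")                  # one byte per character
    shown = raw.decode("utf-8", errors="replace")  # lone high byte is invalid UTF-8
    utf8 = ch.encode("utf-8")                      # two bytes for U+0080..U+07FF
    return raw, shown, utf8

for ch in "\u00e9\u00d7\u00b2":                    # é, ×, ²
    raw, shown, utf8 = describe(ch)
    print(f"U+{ord(ch):04X} latin-1={raw.hex()} utf-8={utf8.hex()} shown-as={shown!a}")
```

Each lone byte comes back as U+FFFD, the replacement character, which is
the question mark in a lozenge.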

So I changed the charset to latin-1, and then to iso-8859-1. With
each of them, characters 160-255 display correctly, but the W3C's
validator gives this error message:
    Bad value “text/html; charset=iso-8859-1” for attribute
“content” on element “meta”: “charset=” must be followed by “utf-8”

So what charset should I use to represent a file where every
character is 8 bits, and those 8 bits match the iso-8859-1 or latin-1
character set?

To make things even more murky, at
I found this gem: "If the attribute is present, its value must be an
ASCII case-insensitive match for the string "utf-8", because UTF-8 is
the only valid encoding for HTML5 documents."
If that's true, it sounds very much like I can't generate my web
pages unless I code every 160-255 character as a six-byte &#nnn;
string, which is not only a pain but makes editing harder.
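
For what it's worth, that &#nnn; fallback would not have to be typed by
hand. A rough Python sketch follows, assuming a conversion step can be
bolted on at all; to_char_refs is a hypothetical name, not part of any
existing tool chain here. It turns every character above 127 into a
decimal numeric reference and leaves plain ASCII alone.

```python
def to_char_refs(text):
    """Replace each non-ASCII character with a decimal &#nnn; reference."""
    # "xmlcharrefreplace" is a standard Python codec error handler.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(to_char_refs("caf\u00e9"))   # prints caf&#233;
```

The result is pure ASCII, so it reads the same under a utf-8 or an
iso-8859-1 label, but the editing pain in the source files remains unless
this runs as a final output step.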

(I tried looking at character encodings in Vim, and indeed it does
have a utf-8 option, but after I do my editing I run all my pages
through a very complicated awk script, and it looks like awk can't
handle UTF-8, at least not in Windows.)
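
One possible way around the awk limitation, sketched under assumptions:
keep editing and running awk on the single-byte files, then transcode the
finished output to UTF-8 as a last step, so that charset=utf-8 in the
meta tag is actually true. The Python below is only an illustration; the
function name latin1_to_utf8 is hypothetical, and it assumes the awk
output really is ISO-8859-1.

```python
def latin1_to_utf8(data):
    """Re-encode ISO-8859-1 bytes as UTF-8 bytes."""
    # Every byte value 0-255 is a valid ISO-8859-1 character, so the
    # decode step cannot fail; bytes above 127 become two UTF-8 bytes.
    return data.decode("iso-8859-1").encode("utf-8")

print(latin1_to_utf8(b"\xe9").hex())   # prints c3a9
```

Applied to a whole file, this would mean reading it in binary mode,
passing the bytes through the function, and writing the result back in
binary mode.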

Stan Brown, Tehachapi, California, USA