ISO 8859-1

ISO 8859-1, more formally cited as ISO/IEC 8859-1 or less formally as Latin-1, is part 1 of ISO/IEC 8859, a standard character encoding defined by ISO. It encodes what the standard calls Latin alphabet no. 1, consisting of 191 characters from the Latin script, each encoded as a single 8-bit code value. These code values can be used in almost any data interchange system to communicate in the following European languages (although many of them lack their correct quotation marks and apostrophe): Albanian, Basque, Catalan, Danish, Dutch (missing Ĳ and ĳ), English, Faroese, French (missing Œ and œ), Finnish (missing Š, š, Ž, and ž), German, Icelandic (missing „ and “), Irish, Italian, Latin, Norwegian, Portuguese, Rhaeto-Romanic, Scottish, Spanish, Swedish. Other languages covered include Afrikaans and Swahili. As a result, this character encoding is used throughout the Americas, Western Europe, Oceania, and much of Africa.

Contents

1 Differences with ISO/IEC 8859-15

2 Code table

3 ISO 8859-1 vs ISO-8859-1

4 Windows-1252

5 Macintosh character sets

6 External links

Differences with ISO/IEC 8859-15

ISO/IEC 8859-1 has several deficiencies. It omits a few French letters, a single-glyph representation of the letter IJ, and two Finnish letters used for transcription of some foreign names and in a few loanwords. It also lacks common glyphs such as the dagger (†), typographic quotation marks and dashes, and the euro sign. For these reasons, ISO/IEC 8859-15 was developed as an update of ISO/IEC 8859-1 that adds the euro sign and other needed characters. Making room for them required removing some less-used characters from ISO/IEC 8859-1, including fraction symbols and letter-free diacritics: ¤, ¦, ¨, ´, ¸, ¼, ½, and ¾.
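The replaced positions can be checked mechanically. The following is a minimal Python sketch, assuming that the standard library codecs named latin-1 and iso8859-15 faithfully implement the two character maps; the variable names are illustrative only.

```python
import unicodedata

# Decode every byte value under both codecs and collect the positions
# where ISO-8859-1 and ISO-8859-15 yield different characters.
differences = []
for value in range(256):
    raw = bytes([value])
    old = raw.decode("latin-1")
    new = raw.decode("iso8859-15")
    if old != new:
        differences.append((value, old, new))

# Print the Unicode names so the output stays readable on any terminal.
for value, old, new in differences:
    print(f"{value:02X}: {unicodedata.name(old)} -> {unicodedata.name(new)}")
```

Under that assumption, the positions reported are exactly the eight whose characters are listed above as removed.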

Code table

Since all 191 characters encoded by ISO/IEC 8859-1 are graphic and compatible with most web browsers, they can be shown as glyphs in the following table. Since they would not normally be visible, the space character, the no-break space character, and the soft hyphen character are represented by abbreviations for their names. All other characters are represented literally. In the table, the row and column headings indicate the hexadecimal digit combinations to produce the 8-bit code value; e.g., the letter L is at code point 4C (hex), or binary 01001100.

ISO/IEC 8859-1
	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC	xD	xE	xF
0x	unused
1x	unused
2x	SP	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3x	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4x	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5x	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
6x	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7x	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~
8x	unused
9x	unused
Ax	NBSP	¡	¢	£	¤	¥	¦	§	¨	©	ª	«	¬	SHY	®	¯
Bx	°	±	²	³	´	µ	¶	·	¸	¹	º	»	¼	½	¾	¿
Cx	À	Á	Â	Ã	Ä	Å	Æ	Ç	È	É	Ê	Ë	Ì	Í	Î	Ï
Dx	Ð	Ñ	Ò	Ó	Ô	Õ	Ö	×	Ø	Ù	Ú	Û	Ü	Ý	Þ	ß
Ex	à	á	â	ã	ä	å	æ	ç	è	é	ê	ë	ì	í	î	ï
Fx	ð	ñ	ò	ó	ô	õ	ö	÷	ø	ù	ú	û	ü	ý	þ	ÿ

Code values 00-1F, 7F, and 80-9F are not assigned to characters by ISO/IEC 8859-1.
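As an illustration of how the row and column headings combine into a code value, here is a small Python sketch. The helper names code_value, table_position, and is_assigned are hypothetical. Note that Python's latin-1 codec implements the full 256-value character map, so the assigned ranges of ISO/IEC 8859-1 itself are written out by hand.

```python
def code_value(ch):
    """Return the 8-bit code value of a single character."""
    return ch.encode("latin-1")[0]


def table_position(ch):
    """Return the (row, column) headings that locate ch in the code table."""
    value = code_value(ch)
    return f"{value >> 4:X}x", f"x{value & 0x0F:X}"


def is_assigned(value):
    """True if ISO/IEC 8859-1 assigns a graphic character to this value."""
    return 0x20 <= value <= 0x7E or 0xA0 <= value <= 0xFF


row, column = table_position("L")
binary = format(code_value("L"), "08b")
assigned_count = sum(is_assigned(v) for v in range(256))
print(row, column, binary, assigned_count)
```

The high four bits select the row and the low four bits select the column, which is why the letter L (hex 4C) sits in row 4x, column xC.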

ISO 8859-1 was based on the Multinational Character Set used by Digital Equipment Corporation in the popular VT220 terminal. It was developed within ECMA, the European Computer Manufacturers Association, and published along with ISO 8859-2, ISO 8859-3, and ISO 8859-4 as part of the specification ECMA-94 (http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-094.pdf), by which name it is still sometimes known.

ISO 8859-1 vs ISO-8859-1

The IANA has approved ISO-8859-1 (note the extra hyphen), a superset of ISO/IEC 8859-1, for use on the Internet. This character map, or character set or code page, supplements the assignments made by ISO/IEC 8859-1, mapping control characters to code values 00-1F, 7F, and 80-9F. It thus provides for 256 characters via every possible 8-bit value. The IANA allows all of the following aliases for ISO-8859-1 to be used case-insensitively:

ISO_8859-1:1987
ISO_8859-1
ISO-8859-1
iso-ir-100
csISOLatin1
latin1
l1
IBM819
CP819
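Many programming environments accept these aliases as well. The sketch below assumes Python's standard codecs module, whose alias table is maintained by the Python project and is not itself part of the IANA registration. It looks up each name in deliberately mixed case.

```python
import codecs

# The aliases from the list above, with the letter case varied on purpose.
aliases = [
    "ISO_8859-1:1987", "iso_8859-1", "ISO-8859-1", "ISO-IR-100",
    "csisolatin1", "Latin1", "L1", "ibm819", "cp819",
]

# codecs.lookup raises LookupError for a name the runtime does not know.
resolved = {alias: codecs.lookup(alias).name for alias in aliases}
for alias, name in resolved.items():
    print(f"{alias} -> {name}")
```

If every lookup succeeds, all names resolve to the same underlying codec.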

The name Latin-1 is an informal alias that is not recognized by ISO or the IANA, although some software may accept it. The Unicode standard uses "Latin-1" to refer to characters in the U+0000 to U+00FF range. The following table shows the ISO-8859-1 character map. Control characters, the space character, the no-break space character, and the soft hyphen character are represented by 2-, 3-, or 4-letter abbreviations for their names. All other characters are represented literally.

ISO-8859-1
	-0	-1	-2	-3	-4	-5	-6	-7	-8	-9	-A	-B	-C	-D	-E	-F
0-	NUL	SOH	STX	ETX	EOT	ENQ	ACK	BEL	BS	TAB	LF	VT	FF	CR	SO	SI
1-	DLE	DC1	DC2	DC3	DC4	NAK	SYN	ETB	CAN	EM	SUB	ESC	FS	GS	RS	US
2-	SP	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3-	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4-	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5-	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
6-	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7-	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~	DEL
8-	PAD	HOP	BPH	NBH	IND	NEL	SSA	ESA	HTS	HTJ	VTS	PLD	PLU	RI	SS2	SS3
9-	DCS	PU1	PU2	STS	CCH	MW	SPA	EPA	SOS	SGCI	SCI	CSI	ST	OSC	PM	APC
A-	NBSP	¡	¢	£	¤	¥	¦	§	¨	©	ª	«	¬	SHY	®	¯
B-	°	±	²	³	´	µ	¶	·	¸	¹	º	»	¼	½	¾	¿
C-	À	Á	Â	Ã	Ä	Å	Æ	Ç	È	É	Ê	Ë	Ì	Í	Î	Ï
D-	Ð	Ñ	Ò	Ó	Ô	Õ	Ö	×	Ø	Ù	Ú	Û	Ü	Ý	Þ	ß
E-	à	á	â	ã	ä	å	æ	ç	è	é	ê	ë	ì	í	î	ï
F-	ð	ñ	ò	ó	ô	õ	ö	÷	ø	ù	ú	û	ü	ý	þ	ÿ

There are additional parts to the ISO/IEC 8859 standard that have corresponding IANA-approved character maps; for example, ISO/IEC 8859-10 (Latin alphabet no. 6) is very similar to the character map ISO-8859-10. Each of the ISO/IEC 8859-x parts encodes characters in the same way: it covers the ASCII range (hex 20-7E) plus up to 96 additional characters in the A0-FF range, for a total of up to 191 characters. The ISO-8859-x maps each add the ISO 646 C0 "control" characters from 00-1F, a control character at 7F, and control characters in the 80-9F range, thus encompassing a total of 256 code values. ISO-8859-1 is unique among these maps in that its coded characters are equivalent to the first 256 code points of Unicode. ISO-8859-1 is the standard encoding used by the X Window System on most Unix machines.
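The equivalence with the first 256 Unicode code points can be demonstrated directly. This is a minimal sketch, assuming Python's latin-1 codec implements the ISO-8859-1 character map; the variable names are illustrative.

```python
# Decode every possible byte value and compare with the Unicode code points.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in text]

# Each byte value N decodes to code point U+00NN, and the mapping round-trips.
round_trip_ok = text.encode("latin-1") == all_bytes

# A character beyond U+00FF, such as the euro sign, cannot be encoded.
try:
    "\u20ac".encode("latin-1")
    euro_encodable = True
except UnicodeEncodeError:
    euro_encodable = False

print(code_points == list(range(256)), round_trip_ok, euro_encodable)
```

One practical consequence is that decoding ISO-8859-1 never fails: every byte sequence is valid, whether or not the data was really written in that encoding.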

Windows-1252

The legacy components of Microsoft Windows in English and some other Western languages use, by default, an encoding that is a superset of ISO/IEC 8859-1 but differs from ISO-8859-1 in that it assigns displayable characters rather than control characters to the 0x80 to 0x9F range. This encoding is known to Windows as code page 1252 and has the IANA-approved name Windows-1252.

Many web browsers treat ISO-8859-1 as Windows-1252 (the extra control codes in ISO-8859-1 are forbidden in HTML anyway), and so codes from it are often seen in web pages that declare their encoding as ISO-8859-1.
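The effect is easy to reproduce. The sketch below assumes Python's latin-1 and cp1252 codecs; the byte string is a made-up example of text saved on Windows with typographic quotation marks.

```python
# A word wrapped in curly quotes, as stored in Windows-1252.
raw = b"\x93quoted\x94"

# Decoded strictly as ISO-8859-1, bytes 93 and 94 become C1 control characters.
as_declared = raw.decode("latin-1")

# Decoded as Windows-1252, the same bytes become the intended quotation marks.
as_browser = raw.decode("cp1252")

print([f"U+{ord(ch):04X}" for ch in (as_declared[0], as_browser[0])])
```

Here the strict reading yields U+0093, an invisible control character, while the Windows-1252 reading yields U+201C, the left double quotation mark.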

A popular misconception is that ANSI is synonymous with this code page. In fact, Windows uses the term ANSI to refer to the system's ANSI code page, which is 1252 only in the English versions and a few Western localizations.

The following table shows Windows-1252. It differs from ISO-8859-1 only in rows 8x and 9x, where it assigns displayable characters instead of control characters:

Windows-1252 (CP1252)
	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC	xD	xE	xF
0x	NUL	SOH	STX	ETX	EOT	ENQ	ACK	BEL	BS	TAB	LF	VT	FF	CR	SO	SI
1x	DLE	DC1	DC2	DC3	DC4	NAK	SYN	ETB	CAN	EM	SUB	ESC	FS	GS	RS	US
2x	SP	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3x	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4x	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5x	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
6x	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7x	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~	DEL
8x	€		‚	ƒ	„	…	†	‡	ˆ	‰	Š	‹	Œ		Ž
9x		‘	’	“	”	•	–	—	˜	™	š	›	œ		ž	Ÿ
Ax	NBSP	¡	¢	£	¤	¥	¦	§	¨	©	ª	«	¬	SHY	®	¯
Bx	°	±	²	³	´	µ	¶	·	¸	¹	º	»	¼	½	¾	¿
Cx	À	Á	Â	Ã	Ä	Å	Æ	Ç	È	É	Ê	Ë	Ì	Í	Î	Ï
Dx	Ð	Ñ	Ò	Ó	Ô	Õ	Ö	×	Ø	Ù	Ú	Û	Ü	Ý	Þ	ß
Ex	à	á	â	ã	ä	å	æ	ç	è	é	ê	ë	ì	í	î	ï
Fx	ð	ñ	ò	ó	ô	õ	ö	÷	ø	ù	ú	û	ü	ý	þ	ÿ

In Windows-1252, positions 81, 8D, 8F, 90, and 9D are unused. The euro character at position 80 was not present in earlier versions of this code page.
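The unused positions can be listed programmatically. This sketch assumes Python's cp1252 codec, which treats those positions as undefined and raises an error; other implementations may instead map them to control characters.

```python
# Try to decode each byte in the 80-9F range and record the failures.
unused = []
for value in range(0x80, 0xA0):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        unused.append(value)

print([f"{value:02X}" for value in unused])
```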

Macintosh character sets

In 1984, the original Apple Macintosh computer introduced a character encoding called Mac Roman, or Mac-Roman, meant to be suitable for Western European desktop publishing. Like ISO-8859-1, it was a superset of ASCII, but it had nothing else in common with the ISO standards. A later version, differentiated by the unhyphenated name MacRoman, replaced the generic currency symbol with the euro sign.

The distinction between ISO-8859-1, Windows-1252, and MacRoman is a common source of confusion among computer programmers and on the Internet.
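A short experiment shows why the three are so easily confused. The sketch below assumes Python's latin-1, cp1252, and mac_roman codecs and decodes the same two bytes, chosen for illustration, under each of them.

```python
# Two byte values that the three encodings interpret differently.
raw = bytes([0x8E, 0xE9])

results = {}
for codec in ("latin-1", "cp1252", "mac_roman"):
    decoded = raw.decode(codec)
    # Store code points rather than glyphs so the output is unambiguous.
    results[codec] = [f"U+{ord(ch):04X}" for ch in decoded]
    print(codec, results[codec])
```

Byte E9 is é (U+00E9) in ISO-8859-1 and Windows-1252 but È (U+00C8) in MacRoman, whereas byte 8E is a control character, Ž (U+017D), and é respectively. Text that displays correctly under one assumption therefore shows wrong accented letters under another.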

External links

ISO/IEC 8859-1:1998 (http://anubis.dkuug.dk/JTC1/SC2/WG3/docs/n411.pdf) final draft of the standard (PDF)
Windows Codepages (http://www.microsoft.com/globaldev/reference/WinCP.asp)
Differences between ANSI, ISO-8859-1 and MacRoman Character Sets (http://www.alanwood.net/demos/charsetdiffs.html)
The Letter Database (http://www.eki.ee/letter/)
ASCII - ISO 8859-1 Table with HTML Entity Names (http://www.bbsinc.com/iso8859.html)
The ISO 8859 Alphabet Soup (http://czyborra.com/charsets/iso8859.html) - Roman Czyborra's history of ISO character sets
