Locale objects
Locale objects provide methods that return strings for the country and the language.
Name: ISO-10646-UCS-2
MIBenum: 1000
Source: the 2-octet Basic Multilingual Plane, aka Unicode; this needs to specify network byte order: the standard does not specify (it is a 16-bit integer space)
Alias: csUnicode
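The byte-order gap in this registry entry is easy to see in practice. Here is a minimal sketch, assuming Python 3.8 or later. Python has no `ucs-2` codec, so the sketch uses UTF-16, which produces the same 16-bit units as UCS-2 for characters inside the BMP. It serializes the same two code points in both byte orders. It then shows that reading them in the wrong order gives different characters rather than an error.

```python
# Python has no "ucs-2" codec; for characters inside the BMP (no surrogate
# pairs) UTF-16 produces exactly the same 16-bit units as UCS-2.
text = "A\u00e9"  # U+0041 and U+00E9, both in the Basic Multilingual Plane

big_endian = text.encode("utf-16-be")     # network byte order: high octet first
little_endian = text.encode("utf-16-le")  # low octet first

print(big_endian.hex(" "))     # 00 41 00 e9
print(little_endian.hex(" "))  # 41 00 e9 00

# Reading big-endian octets with the wrong byte order silently yields
# different code points (U+4100 and U+E900) instead of an error.
misread = big_endian.decode("utf-16-le")
print([f"U+{ord(c):04X}" for c in misread])  # ['U+4100', 'U+E900']
```

Because both readings are valid 16-bit values, the receiver cannot detect the mistake. The byte order has to be agreed in advance, which is why the registry entry says it needs to be specified.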
A language is a way that humans interact ... the most common of which are speech, writing and signing. Language identifiers for use in Internet protocols are defined in RFC 3066.
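RFC 3066 tags have a simple syntax. The tag starts with a primary subtag of 1 to 8 letters. It is followed by zero or more hyphen-separated subtags of 1 to 8 letters or digits, and tags are case-insensitive. The following is a minimal sketch of a syntax check, assuming Python 3.9 or later. The function name `parse_language_tag` is hypothetical, and it checks only the syntax. It does not check that a subtag such as `en` is a registered ISO 639 language code.

```python
import re

# RFC 3066 syntax: Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

def parse_language_tag(tag: str) -> list[str]:
    """Hypothetical helper: split a tag into lower-cased subtags.

    Tags are case-insensitive, so the subtags are normalized to lower case.
    Raises ValueError if the text does not match the RFC 3066 syntax.
    """
    if not _TAG.fullmatch(tag):
        raise ValueError(f"not an RFC 3066 language tag: {tag!r}")
    return tag.lower().split("-")

print(parse_language_tag("en-AU"))       # ['en', 'au']  (English, Australia)
print(parse_language_tag("de-CH-1901"))  # ['de', 'ch', '1901']
```

A form such as `en_AU` (with an underscore) is rejected, because the syntax allows only hyphens between subtags.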
A script is a set of graphic characters used for the written form of one or more languages.
A Coded Character Set (CCS) is a mapping from a set of abstract characters to a set of integers.
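Unicode (ISO 10646) is the usual example of a coded character set. Each abstract character, identified by its name, is assigned an integer called its code point. This is a minimal sketch, assuming Python 3.9 or later and using only the standard `unicodedata` module; the helper name `code_points` is hypothetical. It lists the integer and the character name for a few characters. The mapping also runs the other way, with `chr` and `unicodedata.lookup`.

```python
import unicodedata

def code_points(text: str) -> list[tuple[str, int, str]]:
    """Hypothetical helper: (character, integer, character name) per character."""
    return [(ch, ord(ch), unicodedata.name(ch)) for ch in text]

for ch, n, name in code_points("Aé中"):
    print(f"U+{n:04X} {n:>6} {ch} {name}")

# The mapping runs both ways: integer -> character, name -> character.
print(chr(233))                                            # é
print(unicodedata.lookup("LATIN SMALL LETTER E WITH ACUTE"))  # é
```

Note that the CCS alone says nothing about octets. That step belongs to a character encoding scheme.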
A Character Encoding Scheme (CES) is a mapping from a Coded Character Set or several coded character sets to a set of octets.
A definition of a character encoding scheme consists of:
- A description of an algorithm which transforms every possible sequence of octets to either a sequence of pairs
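To separate the two layers, the sketch below keeps the code points fixed and varies only the encoding scheme. It is a minimal sketch, assuming Python 3.8 or later and standard codecs only. It encodes the same three code points with UTF-8, UTF-16BE and UTF-32BE, and gets three different octet sequences. It also shows one case a decoder must handle: an octet sequence that is not valid in the scheme.

```python
text = "Aé中"  # three code points: U+0041, U+00E9, U+4E2D

# One coded character set, three encoding schemes, three octet sequences.
encoded = {scheme: text.encode(scheme) for scheme in ("utf-8", "utf-16-be", "utf-32-be")}
for scheme, octets in encoded.items():
    print(f"{scheme:<10} {octets.hex(' ')}")

# Decoding must also cope with octet sequences that are not valid in the
# scheme: a lone 0xC3 starts a two-byte UTF-8 sequence that never finishes.
try:
    b"\xc3".decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8:", err.reason)
```

UTF-8 spends one, two and three octets on these characters. The fixed-width schemes use two or four octets each.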
The term "charset" means a set of rules for mapping from a sequence of octets to a sequence of characters, such as the combination of a coded character set and a character encoding scheme; this is also what is used as an identifier in MIME "charset=" parameters, and registered in the IANA charset registry ... (Note that this is NOT a term used by other standards bodies, such as ISO).
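In MIME, the charset name is what a receiver uses to turn the body octets back into characters. Here is a minimal sketch, assuming Python 3 and the standard `email.message.Message` class. The helper `decode_body` is hypothetical. It reads the `charset=` parameter from a Content-Type value and decodes the octets with it. It falls back to US-ASCII, the RFC 2046 default for `text/plain`, when no charset is given.

```python
from email.message import Message

def decode_body(content_type: str, body: bytes) -> str:
    """Hypothetical helper: decode body octets using the Content-Type charset."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset parameter, or None.
    charset = msg.get_content_charset() or "us-ascii"  # RFC 2046 default for text/plain
    return body.decode(charset)

octets = b"caf\xe9"
print(decode_body('text/plain; charset="iso-8859-1"', octets))  # café
```

The same octets `caf\xe9` are not valid UTF-8. The charset label, not the octets alone, determines the characters.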
www may appear in domain names, as in www.monash.edu.au, yet www is not a word in any language.
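toASCII and toUnicode
IDNA (RFC 3490) defines two operations: toASCII converts an internationalized domain name into the ASCII form carried by the DNS, and toUnicode converts that ASCII form back into Unicode.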
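Python's built-in `idna` codec implements IDNA 2003 (RFC 3490), so it can stand in for the two operations. This is a minimal sketch, assuming Python 3; the wrappers `to_ascii` and `to_unicode` are hypothetical names. Newer IDNA 2008 rules are available through the third-party `idna` package, which this sketch does not use.

```python
def to_ascii(domain: str) -> str:
    """Hypothetical helper: Unicode domain name -> ASCII-compatible (xn--) form."""
    return domain.encode("idna").decode("ascii")

def to_unicode(domain: str) -> str:
    """Hypothetical helper: ASCII-compatible form -> Unicode domain name."""
    return domain.encode("ascii").decode("idna")

print(to_ascii("bücher.example"))           # xn--bcher-kva.example
print(to_unicode("xn--bcher-kva.example"))  # bücher.example
print(to_unicode("www.monash.edu.au"))      # www.monash.edu.au
```

Labels that are already plain ASCII, such as `www`, pass through both operations unchanged.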
This means that any application that does, for example,
print getHostByName()
should be altered to
print toUnicode(getHostByName())
or even, to display the name on a Big5 terminal, to
print toBig5(toUnicode(getHostByName()))
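The same pipeline can be run end to end without a network. This is a minimal sketch, assuming Python 3. `getHostByName` and `toBig5` in the slide are illustrative names; here `looked_up_name` is a hypothetical stand-in for whatever the resolver returns, simulated by converting a Unicode name to its xn-- form. Each step maps onto one call in the slide's pseudocode.

```python
def looked_up_name() -> str:
    """Hypothetical stand-in for a resolver result.

    Names come back from the DNS in ASCII-compatible (xn--) form, so one is
    simulated here by converting a Unicode name with the idna codec.
    """
    return "中文.example".encode("idna").decode("ascii")

ace = looked_up_name()                     # what the unmodified program prints
name = ace.encode("ascii").decode("idna")  # toUnicode
big5 = name.encode("big5")                 # toBig5: octets for a Big5 display

print(ace)
print(name)
print(big5)
```

Characters that Big5 cannot represent make `encode("big5")` raise `UnicodeEncodeError`. A display program would need to choose an `errors=` policy such as `"replace"`.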