Tutorial week 4
-
There is no reliable way to tell, from the bytes alone, which encoding a text file is written in.
You need a convention, such as a filename extension or a standardised start.
Assume that files start with a string such as "language: .....".
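As a rough illustration of the "standardised start" convention, the sketch below pulls out whatever follows "language:" on the first line of a file. The class name HeaderLine and the method declaredValue are hypothetical, and the code assumes the header line itself uses only ASCII bytes, which would not hold for a UTF-16 file, for example.

```java
import java.nio.charset.StandardCharsets;

class HeaderLine {
    // Assumed header prefix, taken from the convention in the text.
    static final String PREFIX = "language:";

    // Return the text after "language:" on the first line, or null if
    // the first line does not start with that prefix.
    static String declaredValue(byte[] data) {
        int end = 0;
        while (end < data.length && data[end] != '\n' && data[end] != '\r') {
            end++;
        }
        // ISO 8859-1 maps every byte to a character, so this never fails,
        // and ASCII bytes come out unchanged.
        String first = new String(data, 0, end, StandardCharsets.ISO_8859_1);
        return first.startsWith(PREFIX)
                ? first.substring(PREFIX.length()).trim()
                : null;
    }
}
```

Reading the first line byte by byte avoids having to pick a decoder before the header has been seen.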
-
Save files in a variety of different encodings, such as ISO 8859-1,
ISO 8859-2, ISO 2022, BIG-5, UTF-8, etc.
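If no editor with an encoding option is to hand, the files can also be produced from Java. The following is a minimal sketch: the class name EncodingSamples, the sample text and the output file names are all made up for illustration, and ISO-2022-JP is used as one concrete member of the ISO 2022 family. Some of these charsets are optional in a JDK, so the code checks Charset.isSupported first.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class EncodingSamples {
    // Charset names to try; ISO-2022-JP and Big5 may be missing from a JVM.
    static final String[] CHARSETS = {
        "ISO-8859-1", "ISO-8859-2", "ISO-2022-JP", "Big5", "UTF-8"
    };

    // Encode the text in the named charset. Characters the charset cannot
    // represent are replaced by its default replacement bytes.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) throws IOException {
        String sample = "caf\u00e9";  // hypothetical sample text
        for (String name : CHARSETS) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not supported by this JVM");
                continue;
            }
            Path out = Paths.get("sample-" + name + ".txt");
            Files.write(out, encode(sample, name));
            System.out.println(name + ": wrote " + out);
        }
    }
}
```

A sample containing only ASCII letters would produce identical bytes in most of these encodings, so it helps to include at least one character outside ASCII.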
-
Do these look different? (In Unix, the command
od -b file
will show the bytes in a file as octal values.)
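On a system without od, a similar dump can be sketched in Java. The class OctalDump below is hypothetical and only approximates the od -b layout: a 7-digit octal offset, then up to 16 bytes per line as 3-digit octal values. It leaves out the final offset line that od prints.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class OctalDump {
    // Format the bytes as lines of "offset b0 b1 ...", all in octal.
    static String dump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i % 16 == 0) {
                if (i > 0) {
                    sb.append('\n');
                }
                sb.append(String.format("%07o", i));
            }
            // Mask with 0xFF so bytes above 0x7F are not printed as negatives.
            sb.append(String.format(" %03o", data[i] & 0xFF));
        }
        if (data.length > 0) {
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OctalDump file");
            return;
        }
        System.out.print(dump(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```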
-
Can you devise an algorithm to detect encoding from these results?
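One possible starting point, offered as a sketch and not as a complete answer: look for bytes with the high bit set, look for ESC bytes (which ISO 2022 encodings use to switch character sets), and test whether the data is well-formed UTF-8. The class EncodingGuess and its result labels are invented for this illustration. The heuristic cannot tell ISO 8859-1 from ISO 8859-2 or from Big5; that would need knowledge of the expected text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingGuess {
    static final String SEVEN_BIT = "7-bit only (ASCII-compatible)";
    static final String ESCAPES = "7-bit with ESC bytes (possibly ISO 2022)";
    static final String UTF8 = "probably UTF-8";
    static final String OTHER = "other 8-bit or multi-byte encoding";

    // True if the bytes decode as UTF-8 with no malformed sequences.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Classify the data using the three checks described above.
    static String guess(byte[] data) {
        boolean highBit = false;
        boolean escape = false;
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                highBit = true;
            }
            if (b == 0x1B) {
                escape = true;
            }
        }
        if (!highBit) {
            return escape ? ESCAPES : SEVEN_BIT;
        }
        return isValidUtf8(data) ? UTF8 : OTHER;
    }
}
```

Text in ISO 8859-1 rarely happens to form valid UTF-8 sequences, which is why the UTF-8 check is a reasonably strong signal, but short inputs can still be misclassified.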
-
What would be a good string to use to test encoding?
-
Which Java fonts support display of the following characters?
-
Thai characters such as
U+0E01 THAI CHARACTER KO KAI 'ก'?
-
Tamil U+0B86 TAMIL LETTER AA 'ஆ'
-
Tagalog U+1700 TAGALOG LETTER A 'ᜀ'
-
The Chinese character for dog U+72AC
CJK UNIFIED IDEOGRAPH-72AC '犬' (not the same code point as U+2F5D KANGXI RADICAL DOG)
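A minimal sketch of how the question might be explored: ask every installed font whether it can display each code point, using Font.canDisplay(int). The class name FontCoverage and the helper fontsDisplaying are hypothetical. The result depends entirely on the fonts installed on the machine, so no output is shown here.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;

class FontCoverage {
    // The Thai, Tamil, Tagalog and Chinese code points listed above.
    static final int[] CODE_POINTS = {0x0E01, 0x0B86, 0x1700, 0x72AC};

    // Names of the fonts in the array that have a glyph for the code point.
    static List<String> fontsDisplaying(Font[] fonts, int codePoint) {
        List<String> names = new ArrayList<>();
        for (Font f : fonts) {
            if (f.canDisplay(codePoint)) {
                names.add(f.getFontName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Font[] fonts =
            GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        for (int cp : CODE_POINTS) {
            System.out.printf("U+%04X %s: %s%n",
                cp, Character.getName(cp), fontsDisplaying(fonts, cp));
        }
    }
}
```

Java's logical fonts (such as "Dialog" or "Serif") are composites of several physical fonts, so they may report wider coverage than any single physical font does.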
-
Write a program to determine the set of fonts that can display
characters in the current locale.
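One way to begin, sketched under a loud assumption: "characters in the current locale" is approximated here by the localised month and weekday names from DateFormatSymbols, which is only a small sample of what a locale may need. The class LocaleFonts and the method sampleText are hypothetical names. Font.canDisplayUpTo(String) returns -1 when the font can display the whole string.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.text.DateFormatSymbols;
import java.util.Locale;

class LocaleFonts {
    // Build sample text from the locale's month and weekday names.
    // This is an assumption about which characters the locale uses.
    static String sampleText(Locale locale) {
        DateFormatSymbols symbols = DateFormatSymbols.getInstance(locale);
        StringBuilder sb = new StringBuilder();
        for (String month : symbols.getMonths()) {
            sb.append(month);
        }
        for (String day : symbols.getWeekdays()) {
            sb.append(day);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = sampleText(Locale.getDefault());
        Font[] fonts =
            GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        for (Font f : fonts) {
            // -1 means every character in the sample can be displayed.
            if (f.canDisplayUpTo(sample) == -1) {
                System.out.println(f.getFontName());
            }
        }
    }
}
```

Running it with a different default locale, for example via the user.language system property, gives a quick way to compare results across locales.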
Jan Newmarch (http://jan.newmarch.name)
jan@newmarch.name
Last modified: Mon Mar 21 20:47:50 EST 2005
Copyright © Jan Newmarch, Monash University, 2007
This work is licensed under a Creative Commons License.
The moral right of Jan Newmarch to be identified as the author of this page has been asserted.