The Java char type is a 16-bit unsigned integer that holds one UTF-16 code unit.
On its own it can only represent characters from the Basic Multilingual Plane (BMP).
The String type is a sequence of char values (UTF-16 code units), each of 2 bytes.
To represent a character outside the BMP, you need a sequence of two Java chars, known as a surrogate pair: one for the high surrogate and the other for the low surrogate.
The class Character can be used to construct an array of chars of length one or two from a code point, using the static method
char[] Character.toChars(int codePoint)
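As a minimal sketch of the difference between chars and code points, the following compares a BMP character with one outside the BMP. The class name SurrogateDemo and the helper encode are made up for this illustration; the code points U+00E9 and U+1F600 are arbitrary choices.

```java
public class SurrogateDemo {

    // Hypothetical helper: encode one code point as one or two UTF-16 chars.
    public static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int inBmp = 0x00E9;       // U+00E9, inside the BMP
        int outsideBmp = 0x1F600; // U+1F600, outside the BMP

        System.out.println(encode(inBmp).length);      // 1
        char[] pair = encode(outsideBmp);
        System.out.println(pair.length);               // 2
        System.out.println(Character.isHighSurrogate(pair[0])); // true
        System.out.println(Character.isLowSurrogate(pair[1]));  // true

        // length() counts chars, codePointCount() counts characters
        String s = new String(pair);
        System.out.println(s.length());                      // 2
        System.out.println(s.codePointCount(0, s.length())); // 1
    }
}
```

Code that indexes a String by char position should keep this in mind, since a single user-visible character may occupy two positions.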
The Java class java.text.Normalizer can be used to convert a string to a normalized form:
String normalized_string = Normalizer.normalize(target_chars, Normalizer.Form.NFD);
where target_chars can be a String or any other CharSequence.
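A short sketch of why normalization matters when comparing strings: the same visible character can be stored composed or decomposed. The class NormalizeDemo and its helpers toNfd and toNfc are names invented for this example.

```java
import java.text.Normalizer;

public class NormalizeDemo {

    // Hypothetical helpers wrapping Normalizer.normalize
    public static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "\u00E9";           // e-acute as a single code point
        String decomposed = toNfd(composed);  // 'e' followed by combining acute U+0301

        System.out.println(composed.length());           // 1
        System.out.println(decomposed.length());         // 2
        System.out.println(composed.equals(decomposed)); // false
        System.out.println(toNfc(decomposed).equals(composed)); // true
    }
}
```

Normalizing both sides to the same form before comparing avoids mismatches between strings that look identical.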
Strings can be converted to an array of UTF-8 bytes by
byte[] bytes = string.getBytes(StandardCharsets.UTF_8);
To convert the other way, use
String string = new String(bytes, StandardCharsets.UTF_8);
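The two conversions above can be combined into a round trip. This is only a sketch; the class Utf8RoundTrip and the helpers encode and decode are hypothetical names, and the sample text is arbitrary.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    // Hypothetical helper: String to UTF-8 bytes
    public static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: UTF-8 bytes back to a String
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'a' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes), U+1F600 (4 bytes)
        String text = "a\u00E9\u20AC\uD83D\uDE00";

        byte[] bytes = encode(text);
        System.out.println(text.length());  // 5 chars (the last character is a surrogate pair)
        System.out.println(bytes.length);   // 10 bytes
        System.out.println(decode(bytes).equals(text)); // true
    }
}
```

The number of bytes, the number of chars and the number of characters are three different quantities, which matters when sizing network buffers.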
Many Java classes that convert between strings and bytes fall back to the default charset when none is given. Its value can be seen with
Charset.defaultCharset().displayName();
The page Guide to Character Encoding claims that on macOS this will be UTF-8 while on Windows systems it will be Windows-1252. Since JDK 18 (JEP 400) the default is UTF-8 on all platforms unless configured otherwise, but older runtimes still follow the platform setting.
To avoid errors, pass the charset explicitly to every class that accepts one.
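To illustrate the kind of error a mismatched charset produces, the sketch below encodes a string as UTF-8 and decodes it with a different charset. ISO-8859-1 is used here only as a stand-in for a non-UTF-8 platform default, because it is guaranteed to be available; the class CharsetMismatch and the helper decodeAs are invented names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {

    // Hypothetical helper: decode bytes with whichever charset the caller names
    public static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        System.out.println("Default charset: "
                           + Charset.defaultCharset().displayName());

        // U+00E9 becomes the two bytes 0xC3 0xA9 in UTF-8
        byte[] utf8 = "\u00E9".getBytes(StandardCharsets.UTF_8);

        // Same charset on both sides: the original string comes back
        String good = decodeAs(utf8, StandardCharsets.UTF_8);
        System.out.println(good.equals("\u00E9"));     // true

        // Different charset: each byte is read as a separate character
        String bad = decodeAs(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(bad.equals("\u00C3\u00A9")); // true
        System.out.println(bad.length());               // 2
    }
}
```

No exception is thrown in the mismatched case; the text is silently corrupted, which is why relying on the default is risky between machines.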
To read UTF-8 strings from an InputStream, wrap it in an InputStreamReader with the character encoding:
new InputStreamReader(inputStream, StandardCharsets.UTF_8)
To write UTF-8 strings to an OutputStream, wrap it in an OutputStreamWriter with the character encoding:
new OutputStreamWriter(outputStream, StandardCharsets.UTF_8)
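The two wrappers can be tried without a network connection. In this sketch, in-memory byte streams stand in for the socket streams; the class Utf8Streams and the helpers writeLine and readLine are hypothetical names, and IOExceptions are rethrown unchecked only to keep the example short.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class Utf8Streams {

    // Hypothetical helper: write one line as UTF-8 and return the raw bytes
    public static byte[] writeLine(String line) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new BufferedWriter(
                 new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            writer.write(line);
            writer.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Hypothetical helper: read one line back, decoding the bytes as UTF-8
    public static String readLine(byte[] bytes) {
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(new ByteArrayInputStream(bytes),
                                       StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String line = "na\u00EFve \u20AC";
        byte[] wire = writeLine(line);
        System.out.println(wire.length);                 // bytes on the wire
        System.out.println(readLine(wire).equals(line)); // true
    }
}
```

With real sockets, the same wrappers would be applied to sock.getOutputStream() and sock.getInputStream().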
Other relevant classes are dealt with similarly.
A revised EchoClient using this has only a few lines changed: the socket reader and writer are now created with an explicit UTF-8 charset. EchoClient.java:
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

public class EchoClient {

    public static final int SERVER_PORT = 2000;

    public static void main(String[] args){

        if (args.length != 1) {
            System.err.println("Usage: Client address");
            System.exit(1);
        }

        InetAddress address = null;
        try {
            address = InetAddress.getByName(args[0]);
        } catch(UnknownHostException e) {
            e.printStackTrace();
            System.exit(2);
        }

        Socket sock = null;
        try {
            sock = new Socket(address, SERVER_PORT);
        } catch(IOException e) {
            e.printStackTrace();
            System.exit(3);
        }

        InputStream in = null;
        try {
            in = sock.getInputStream();
        } catch(IOException e) {
            e.printStackTrace();
            System.exit(4);
        }

        OutputStream out = null;
        try {
            out = sock.getOutputStream();
        } catch(IOException e) {
            e.printStackTrace();
            System.exit(5);
        }

        // Text sent to and received from the server is always UTF-8.
        // PrintStream(OutputStream, boolean, Charset) needs Java 10 or later.
        BufferedReader socketReader =
            new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        PrintStream socketWriter =
            new PrintStream(out, false, StandardCharsets.UTF_8);

        // The console is read using the platform default charset.
        BufferedReader consoleReader =
            new BufferedReader(new InputStreamReader(System.in));

        String line = null;
        while (true) {
            line = null;
            try {
                System.out.print("Enter line:");
                line = consoleReader.readLine();
                System.out.println("Read '" + line + "'");
            } catch(IOException e) {
                e.printStackTrace();
                System.exit(6);
            }

            // readLine() returns null at end of input
            if (line == null || line.equals("BYE"))
                break;

            try {
                socketWriter.println(line);
            } catch(Exception e) {
                e.printStackTrace();
                System.exit(7);
            }

            try {
                System.out.println(socketReader.readLine());
            } catch(IOException e) {
                e.printStackTrace();
                System.exit(8);
            }
        }
        System.exit(0);
    }
} // EchoClient
To convert a Unicode hostname to its ASCII (Punycode) IDN form, use the static method
String IDN.toASCII(hostname)
To convert back, use
String IDN.toUnicode(hostname)
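A brief sketch of both directions, using java.net.IDN. The hostname below is a made-up example under the reserved .example domain, and the class IdnDemo with its helpers toAscii and toUnicode is invented for the illustration.

```java
import java.net.IDN;

public class IdnDemo {

    // Hypothetical helper: Unicode hostname to its ASCII-compatible form
    public static String toAscii(String hostname) {
        return IDN.toASCII(hostname);
    }

    // Hypothetical helper: ASCII-compatible form back to Unicode
    public static String toUnicode(String hostname) {
        return IDN.toUnicode(hostname);
    }

    public static void main(String[] args) {
        String unicodeName = "b\u00FCcher.example"; // contains U+00FC
        String asciiName = toAscii(unicodeName);

        System.out.println(asciiName);  // xn--bcher-kva.example
        System.out.println(toUnicode(asciiName).equals(unicodeName)); // true
    }
}
```

Only labels containing non-ASCII characters are rewritten; plain ASCII labels such as "example" pass through unchanged.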
Copyright © Jan Newmarch, jan@newmarch.name
"Network Programming using Java, Go, Python, Rust, JavaScript and Julia" by Jan Newmarch is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://jan.newmarch.name/NetworkProgramming/.