Text: Characters and Strings


Character and string representations

The Java char type is a 16-bit (2 byte) unsigned integer holding one UTF-16 code unit. A single char can only represent characters from the Basic Multilingual Plane (BMP). The String type is a sequence of chars, each of 2 bytes.

To represent characters outside of the BMP, you need to use a string of two Java chars, one for the high surrogate and the other for the low surrogate. The class Character can be used to construct an array of chars of length one or two, using the static method

      char[] Character.toChars(int codepoint)
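
As a small illustration of the surrogate mechanism, the following sketch encodes one code point inside the BMP and one outside it. The class name SurrogateDemo, the helper encode and the two sample code points are chosen here for illustration only.

```java
public class SurrogateDemo {

    // Returns the Java chars (UTF-16 code units) that encode one code point.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int insideBmp = 0x00E9;    // U+00E9, fits in a single char
        int outsideBmp = 0x1F600;  // U+1F600, needs a surrogate pair

        System.out.println(encode(insideBmp).length);            // prints 1

        char[] pair = encode(outsideBmp);
        System.out.println(pair.length);                          // prints 2
        System.out.println(Character.isHighSurrogate(pair[0]));   // prints true
        System.out.println(Character.isLowSurrogate(pair[1]));    // prints true

        // length() counts chars, codePointCount() counts Unicode code points.
        String s = new String(pair);
        System.out.println(s.length());                           // prints 2
        System.out.println(s.codePointCount(0, s.length()));      // prints 1
        System.out.println(Integer.toHexString(s.codePointAt(0))); // prints 1f600
    }
}
```

Note that String.length() reports the number of chars, so code that walks a string char by char will see the two halves of a surrogate pair separately.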

Unicode normalization

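The Java class java.text.Normalizer can be used to convert a string to a normalized form:

      String normalized_string = Normalizer.normalize(target_chars, form);

where target_chars is any CharSequence, such as a String, and form is one of the Normalizer.Form constants NFC, NFD, NFKC or NFKD.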
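
To see why normalization matters, here is a minimal sketch comparing a precomposed character with its decomposed equivalent. The class name NormalizeDemo and the helpers nfc and nfd are illustrative names, and the choice of the NFC and NFD forms is just for the example.

```java
import java.text.Normalizer;

public class NormalizeDemo {

    // Canonical composition: combining sequences become precomposed characters.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Canonical decomposition: precomposed characters become combining sequences.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT
        String composed = "\u00e9";    // LATIN SMALL LETTER E WITH ACUTE

        // The two strings look identical when displayed but differ as char sequences.
        System.out.println(decomposed.equals(composed));       // prints false
        System.out.println(nfc(decomposed).equals(composed));  // prints true
        System.out.println(nfd(composed).equals(decomposed));  // prints true
    }
}
```

Normalizing both sides to the same form before comparing them is the usual way to make such strings compare equal.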

Converting strings to and from UTF-8

Strings can be converted to an array of UTF-8 bytes by

      byte[] bytes = string.getBytes(StandardCharsets.UTF_8);

To convert the other way, use

      String string = new String(bytes, StandardCharsets.UTF_8);
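
The two conversions can be combined into a round trip, as in the sketch below. The class name Utf8RoundTrip, the helpers encode and decode, and the sample text are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "na\u00efve \u20ac";  // contains U+00EF and U+20AC (euro sign)

        byte[] bytes = encode(text);
        // UTF-8 is variable width: U+00EF takes 2 bytes, U+20AC takes 3.
        System.out.println(text.length());                // prints 7
        System.out.println(bytes.length);                 // prints 10
        System.out.println(decode(bytes).equals(text));   // prints true
    }
}
```

The number of bytes generally differs from the number of chars, so buffer sizes and length prefixes in a protocol need to be computed from the byte array, not from String.length().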

Reading and writing UTF-8 strings

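Java uses the default charset in many classes that convert between strings and bytes. Its value is returned by the static method Charset.defaultCharset() in the package java.nio.charset.

The page Guide to Character Encoding claims that on macOS this will be UTF-8 while on Windows systems it will be Windows-1252. From JDK 18 onwards the default is UTF-8 on all platforms unless it is overridden, but code may still run on older versions. To avoid errors, specify the charset explicitly whenever one of these classes is used.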
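
A quick way to check what a particular JVM is doing is to print the default charset, as in this sketch. The class name DefaultCharsetDemo and the helper defaultCharsetName are illustrative, and the output depends on the platform and JDK version, so none is shown.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {

    // Name of the charset used when no charset is given explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        // The file.encoding system property is one way the default can be influenced.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```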

To read UTF-8 strings from an InputStream, wrap it in an InputStreamReader with the character encoding:

      new InputStreamReader(inputStream, StandardCharsets.UTF_8)

To write UTF-8 strings to an OutputStream, wrap it in an OutputStreamWriter with the character encoding:

      new OutputStreamWriter(outputStream, StandardCharsets.UTF_8)

Other relevant classes are dealt with similarly.
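
The reader and writer wrappers can be tried out without a network connection by using in-memory streams. The following is a minimal sketch under that assumption: the byte array streams stand in for a socket's streams, and the class and method names (Utf8StreamsDemo, writeLines, readLines) are made up for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8StreamsDemo {

    // Writes each line, followed by '\n', to an in-memory stream as UTF-8.
    static byte[] writeLines(String... lines) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedWriter writer = new BufferedWriter(
                 new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            for (String line : lines) {
                writer.write(line);
                writer.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and so flushed) before the bytes are taken.
        return sink.toByteArray();
    }

    // Reads UTF-8 encoded lines back from a byte array.
    static List<String> readLines(byte[] data) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(new ByteArrayInputStream(data),
                                       StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = writeLines("gr\u00fc\u00df dich", "\u65e5\u672c\u8a9e");
        System.out.println(data.length + " bytes written");
        System.out.println(readLines(data));
    }
}
```

Because both ends name UTF-8 explicitly, the result does not depend on the default charset of the machine running the code.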

A revised EchoClient using this has only a few lines changed: EchoClient.java:

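import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

public class EchoClient {

    public static final int SERVER_PORT = 2000;

    public static void main(String[] args) {

        if (args.length != 1) {
            System.err.println("Usage: Client address");
            System.exit(1);
        }

        // On any failure below, report the error and exit with a non-zero status
        InetAddress address = null;
        try {
            address = InetAddress.getByName(args[0]);
        } catch (UnknownHostException e) {
            e.printStackTrace();
            System.exit(2);
        }

        Socket sock = null;
        try {
            sock = new Socket(address, SERVER_PORT);
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(3);
        }

        InputStream in = null;
        try {
            in = sock.getInputStream();
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(4);
        }

        OutputStream out = null;
        try {
            out = sock.getOutputStream();
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(5);
        }

        // Decode bytes from the server as UTF-8 rather than the default charset
        BufferedReader socketReader =
            new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));

        // Encode text sent to the server as UTF-8 (this constructor needs Java 10 or later)
        PrintStream socketWriter = new PrintStream(out, false, StandardCharsets.UTF_8);

        // The console is still read using the default charset
        BufferedReader consoleReader =
            new BufferedReader(new InputStreamReader(System.in));

        String line = null;
        while (true) {
            line = null;
            try {
                System.out.print("Enter line:");
                line = consoleReader.readLine();
                System.out.println("Read '" + line + "'");
            } catch (IOException e) {
                e.printStackTrace();
                System.exit(6);
            }

            // readLine() returns null at end of input
            if (line == null || line.equals("BYE"))
                break;

            // Send the line to the server
            try {
                socketWriter.println(line);
            } catch (Exception e) {
                e.printStackTrace();
                System.exit(7);
            }

            // Print the line echoed back by the server
            try {
                System.out.println(socketReader.readLine());
            } catch (IOException e) {
                e.printStackTrace();
                System.exit(8);
            }
        }

        System.exit(0);
    }
} // EchoClient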

Internationalized domain names

To convert a Unicode hostname to its ASCII-compatible (Punycode) form, use

      String IDN.toASCII(hostname)

To convert back, use

      String IDN.toUnicode(hostname)
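
As a short illustration, the sketch below converts a hostname containing a non-ASCII character and then converts it back. The class name IdnDemo, its helper methods and the hostname under the reserved .example domain are chosen for the example only.

```java
import java.net.IDN;

public class IdnDemo {

    // Unicode hostname to its ASCII-compatible encoding.
    static String toAscii(String hostname) {
        return IDN.toASCII(hostname);
    }

    // ASCII-compatible encoding back to Unicode.
    static String toUnicode(String hostname) {
        return IDN.toUnicode(hostname);
    }

    public static void main(String[] args) {
        String unicodeName = "b\u00fccher.example";  // "bucher" with u-umlaut

        String asciiName = toAscii(unicodeName);
        System.out.println(asciiName);                                 // prints xn--bcher-kva.example
        System.out.println(toUnicode(asciiName).equals(unicodeName));  // prints true

        // Labels that are already plain ASCII are returned unchanged.
        System.out.println(toAscii("www.example"));                    // prints www.example
    }
}
```

The ASCII form is the one to use wherever a name is handed to the DNS or sent in a protocol that only allows ASCII hostnames.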

Java Resources

Copyright © Jan Newmarch, jan@newmarch.name
Creative Commons License
"Network Programming using Java, Go, Python, Rust, JavaScript and Julia" by Jan Newmarch is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://jan.newmarch.name/NetworkProgramming/.
