Text: Characters and Strings

Go

Character and string representations

A character is represented by a rune, which is an alias for int32. A rune holds a single Unicode code point. The rune itself is not UTF-8 encoded; UTF-8 is the encoding used when code points are stored in a string.
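A minimal sketch of the distinction between a rune's value and its UTF-8 size, using only the standard unicode/utf8 package. The helper name describeRune is hypothetical and chosen for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describeRune (a hypothetical helper) returns the rune's code point as an
// int32 and the number of bytes its UTF-8 encoding would occupy.
func describeRune(r rune) (codePoint int32, utf8Len int) {
	return int32(r), utf8.RuneLen(r)
}

func main() {
	for _, r := range []rune{'A', '日'} {
		cp, n := describeRune(r)
		fmt.Printf("%c is code point U+%04X and needs %d byte(s) in UTF-8\n", r, cp, n)
	}
}
```

The rune value is always a 32-bit integer, while the encoded length varies with the code point: 'A' needs one byte and '日' needs three.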

A string is a sequence of bytes, usually holding text encoded in UTF-8. This means it can be accessed in two ways:

As a sequence of bytes

str := "日本語"

// Indexing a string yields individual bytes, not characters.
for i := 0; i < len(str); i++ {
    fmt.Printf("%x  starts at byte position %d\n", str[i], i)
}
      
with output

e6  starts at byte position 0
97  starts at byte position 1
a5  starts at byte position 2
e6  starts at byte position 3
9c  starts at byte position 4
ac  starts at byte position 5
e8  starts at byte position 6
aa  starts at byte position 7
9e  starts at byte position 8
      
As a sequence of runes

for index, runeValue := range str {
    fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
}
      
with output

U+65E5 '日' starts at byte position 0
U+672C '本' starts at byte position 3
U+8A9E '語' starts at byte position 6
      
These can be checked against a site such as Unicode Converter, which shows that '日', for example, has the UTF-8 encoding '\xe6\x97\xa5'.
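The same check can also be made locally. The sketch below assumes only the standard unicode/utf8 package; the helper name encodeRune is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeRune (a hypothetical helper) returns the UTF-8 encoding of r
// as space-separated hex bytes, for example "e6 97 a5".
func encodeRune(r rune) string {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r)
	return fmt.Sprintf("% x", buf[:n])
}

func main() {
	for _, r := range "日本語" {
		fmt.Printf("%#U -> %s\n", r, encodeRune(r))
	}
}
```

For '日' this prints the bytes e6 97 a5, matching the first three bytes of the byte-by-byte listing.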

Unicode normalization

Normalization is done using the package norm, imported as "golang.org/x/text/unicode/norm". For example, to normalize a byte slice b to NFC, use norm.NFC.Bytes(b).

Converting strings to and from UTF-8

Strings will usually contain characters encoded in UTF-8. The UTF-8 bytes are obtained by converting the string to a byte slice: []byte(s). A slice of UTF-8 bytes can be converted back to a string with string(b). Neither conversion validates the bytes. If Go cannot decode bytes as UTF-8 when a string is read as runes, for example in a range loop or a []rune conversion, it yields the Unicode replacement character \uFFFD in their place.
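The conversions and the replacement-character behaviour can be sketched as follows. The helper name decodeRunes and the sample inputs are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeRunes (a hypothetical helper) decodes s into runes.
// Bytes that are not valid UTF-8 come out as U+FFFD.
func decodeRunes(s string) []rune {
	return []rune(s)
}

func main() {
	// String to bytes and back again; both conversions copy the data.
	b := []byte("日本語")
	fmt.Printf("% x\n", b)
	fmt.Println(string(b))

	// 0xff can never appear in valid UTF-8. The conversion to string
	// succeeds anyway because it does not validate.
	bad := string([]byte{'a', 0xff, 'b'})
	fmt.Println(utf8.ValidString(bad))
	fmt.Printf("%q\n", decodeRunes(bad))
}
```

When input comes from an untrusted source, utf8.ValidString or utf8.Valid can be used to detect bad data before any replacement characters appear.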

Reading and writing UTF-8 strings

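Nothing special has to be done. Go strings already hold UTF-8 bytes, and the standard I/O functions read and write those bytes unchanged.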

Internationalized domain names

Go has the package golang.org/x/net/idna, with the functions ToASCII() and ToUnicode().

Go Resources


Copyright © Jan Newmarch, jan@newmarch.name
Creative Commons License
"Network Programming using Java, Go, Python, Rust, JavaScript and Julia" by Jan Newmarch is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://jan.newmarch.name/NetworkProgramming/.