A character is represented by a rune, which is an alias for int32. A rune holds a Unicode code point as an integer value. It is not itself UTF-8: the UTF-8 encoding applies when runes are stored in a string or a byte slice.
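The difference between a code point and its UTF-8 encoding can be seen in a minimal sketch using the standard unicode/utf8 package. The helper name encodeRune is hypothetical and only for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeRune returns the UTF-8 encoding of a single code point.
// (Hypothetical helper, not part of any library.)
func encodeRune(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // UTFMax is the longest possible encoding (4 bytes)
	n := utf8.EncodeRune(buf, r)     // n is the number of bytes actually written
	return buf[:n]
}

func main() {
	var r rune = '日' // a rune literal is just an integer code point
	fmt.Printf("code point %U, as integer %d\n", r, r)
	fmt.Printf("UTF-8 bytes: % x\n", encodeRune(r))
}
```

The rune is one 32-bit integer, while its encoding occupies between one and four bytes depending on the code point.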
A string is a sequence of bytes, usually used to hold text encoded in UTF-8. This means it can be accessed in two ways: byte by byte using an index, or rune by rune using range. The examples here use the string "日本語".
for i := 0; i < len(str); i++ {
	fmt.Printf("%x starts at byte position %d\n", str[i], i)
}
with output
e6 starts at byte position 0
97 starts at byte position 1
a5 starts at byte position 2
e6 starts at byte position 3
9c starts at byte position 4
ac starts at byte position 5
e8 starts at byte position 6
aa starts at byte position 7
9e starts at byte position 8
for index, runeValue := range str {
	fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
}
with output
U+65E5 '日' starts at byte position 0
U+672C '本' starts at byte position 3
U+8A9E '語' starts at byte position 6
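The two views also give different lengths. The following is a small sketch, assuming only the standard library. The helper name counts is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the length of s in bytes and in runes.
// (Hypothetical helper for illustration.)
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	str := "日本語"
	b, r := counts(str)
	fmt.Printf("%d bytes, %d runes\n", b, r) // 9 bytes, 3 runes

	// Converting to []rune decodes the whole string,
	// so indexing is then by character rather than by byte.
	runes := []rune(str)
	fmt.Printf("second character: %c\n", runes[1]) // 本
}
```

Converting to []rune allocates a new slice, so for a single pass over the text the range loop is usually the cheaper choice.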
Normalization is done using the package norm, imported as "golang.org/x/text/unicode/norm". For example, to normalize a byte slice b to NFC form, use norm.NFC.Bytes(b).
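To see why normalization matters, here is a sketch that uses only the standard library and does not call the norm package itself. It compares two spellings of "é" that look identical on screen but are different byte sequences. The names composed, decomposed and equalBytes are hypothetical.

```go
package main

import "fmt"

const (
	composed   = "\u00e9"  // é as a single code point
	decomposed = "e\u0301" // e followed by U+0301 COMBINING ACUTE ACCENT
)

// equalBytes reports whether two strings hold identical bytes,
// which is what Go's == does for strings.
func equalBytes(a, b string) bool {
	return a == b
}

func main() {
	fmt.Println(equalBytes(composed, decomposed)) // false
	fmt.Println(len(composed), len(decomposed))   // 2 3
	fmt.Printf("% x\n% x\n", composed, decomposed)
}
```

NFC is the composing form, so normalizing both strings to NFC would be expected to produce the single code point U+00E9 in each case, after which the comparison succeeds.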
Strings will usually contain characters encoded in UTF-8.
The UTF-8 bytes are obtained by converting the string to a byte slice, []byte(str), or by indexing the string directly.
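A slice of UTF-8 bytes can be converted to a string with the conversion string(b), where b is a []byte. The conversion copies the bytes as they are and does not validate them. If Go later cannot decode some bytes as UTF-8, for example in a range loop or a []rune conversion, it yields the Unicode Replacement Character \uFFFD for each invalid byte.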
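The following sketch shows where the replacement character appears when the bytes are not valid UTF-8. It uses only the standard library. The helper name decode is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decode converts raw bytes to a string and then to runes.
// (Hypothetical helper for illustration.)
func decode(b []byte) (string, []rune) {
	s := string(b)      // copies the bytes unchanged, with no validation
	return s, []rune(s) // decoding happens here; each bad byte becomes U+FFFD
}

func main() {
	b := []byte{0xe6, 0x97, 0xa5, 0xff} // "日" followed by an invalid byte
	s, runes := decode(b)
	fmt.Println(len(s), utf8.Valid(b)) // 4 false
	fmt.Printf("%U\n", runes)          // [U+65E5 U+FFFD]
}
```

When input comes from the network, calling utf8.Valid before decoding is one way to reject bad data rather than silently accepting replacement characters.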
Nothing special has to be done.
Go has the package golang.org/x/net/idna with functions ToASCII() and ToUnicode().
Copyright © Jan Newmarch, jan@newmarch.name
"Network Programming using Java, Go, Python, Rust, JavaScript and Julia" by Jan Newmarch is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://jan.newmarch.name/NetworkProgramming/.