A Char is 32 bits in size and can represent
all Unicode characters.
Int(ch) returns the Unicode code point value of a character,
while
Char(int) converts a Unicode code point back to a character.
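As a small sketch of the two conversions, the snippet below round-trips one character through its code point. The variable names are only illustrative, and the character is written as an escape so the example does not depend on how the file is saved.

```julia
# Convert between a character and its Unicode code point.
ch = '\u00e9'            # 'é', written as an escape (U+00E9)
cp = Int(ch)             # code point as an integer
back = Char(cp)          # convert the code point back to a Char

println(cp)              # prints 233
println(back == ch)      # prints true
println(sizeof(Char))    # prints 4, i.e. 32 bits
```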
Strings are encoded in UTF-8. A string is indexed by byte
position, as in s[1], which returns the character that starts at that byte.
Non-ASCII characters occupy two or more bytes,
so an index that falls in the middle of a character is invalid and throws a StringIndexError.
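The following sketch illustrates valid and invalid byte indices on a short mixed string. The string contents are an arbitrary choice for illustration; isvalid and nextind are the standard functions for checking an index and stepping to the next valid one.

```julia
# "a", "é" and "☃" take 1, 2 and 3 bytes respectively in UTF-8.
s = "a\u00e9\u2603"

println(s[1])            # the character starting at byte 1: a
println(s[2])            # the character starting at byte 2: é
println(isvalid(s, 3))   # false: byte 3 is the second byte of 'é'

# Indexing at an invalid position throws a StringIndexError.
threw = try
    s[3]
    false
catch err
    err isa StringIndexError
end
println(threw)           # true

# nextind steps from one valid index to the next valid one.
i = nextind(s, 2)        # 4, where '☃' starts
println(s[i])
```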
The length of a string,
length(s), is the number of characters
it contains, which may be less than the number of bytes, ncodeunits(s).
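To make the difference between characters and bytes concrete, here is a brief sketch comparing the two counts. The example word is arbitrary; ncodeunits and sizeof both report the byte count for a String.

```julia
# "naïve": the 'ï' (U+00EF) takes 2 bytes in UTF-8, the rest take 1 each.
s = "na\u00efve"

println(length(s))       # 5 characters
println(ncodeunits(s))   # 6 UTF-8 code units (bytes)
println(sizeof(s))       # 6, the same byte count
```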
However, a string is an iterable object, so you can loop through all the
characters in a string by
    for c in s
        println(c)
    end
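When the byte index of each character is also needed, one option is to iterate over pairs(s), which yields only the valid indices. This is a sketch with an arbitrary sample string, not the only way to do it.

```julia
# Iterate over (byte index, character) pairs; only valid indices appear.
s = "a\u00e9\u2603"      # 1-byte, 2-byte and 3-byte characters

for (i, c) in pairs(s)
    println(i, " => ", c)
end

# The valid indices on their own:
idx = collect(eachindex(s))
println(idx)             # [1, 2, 4]
```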
Julia Unicode strings can be normalized using the function
Unicode.normalize(s, normalform) from the Unicode standard library,
where the normalform is one of :NFC, :NFD, :NFKC or :NFKD.
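A short sketch of normalization, assuming the standard library function Unicode.normalize with the :NFC and :NFD forms. It shows that two strings that look identical can differ until they are normalized to the same form.

```julia
using Unicode

composed   = "\u00e9"      # 'é' as one precomposed code point
decomposed = "e\u0301"     # 'e' followed by a combining acute accent

# The two strings render the same but are different code point sequences.
println(composed == decomposed)                 # false

# NFC composes, NFD decomposes.
nfc = Unicode.normalize(decomposed, :NFC)
nfd = Unicode.normalize(composed, :NFD)
println(nfc == composed)                        # true
println(length(nfd))                            # 2
```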
Julia strings are already in UTF-8 form.
Julia reads and writes in UTF-8 anyway.
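As an illustration that no explicit encoding step is involved, the sketch below writes a string to an in-memory buffer and compares the result with the string's own bytes. The IOBuffer stands in for a file or socket here, which is an assumption made only to keep the example self-contained.

```julia
s = "\u2603"                            # snowman, U+2603

# The raw UTF-8 bytes that make up the string.
bytes = Vector{UInt8}(codeunits(s))
println(bytes)                          # UInt8[0xe2, 0x98, 0x83]

# Writing the string emits exactly those bytes.
io = IOBuffer()
write(io, s)
raw = take!(io)
println(raw == bytes)                   # true

# Reading the bytes back as a String needs no decoding step.
roundtrip = String(copy(raw))
println(roundtrip == s)                 # true
```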
Non-ASCII hostnames do not appear to have been dealt with yet.
There is a package,
Punycoder.jl, which should do it.
There is also a post on GitHub,
stdlib/Sockets: `getaddrinfo("☃.net")` non-ASCII hostname (RFC 3492),
suggesting that as of 2018 upstream handles
this automatically, but that hasn't flowed through to Julia
on my machine yet.
Copyright © Jan Newmarch, email@example.com
Based on a work at https://jan.newmarch.name/NetworkProgramming/ .