The Julia Char is 32 bits in size and can represent all Unicode characters. The function Int(ch) returns the Unicode code point value of a character, while the function Char(int) converts a Unicode code point to a Char.
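A short sketch of these conversions, assuming Julia 1.x. The variable names (ch, cp, snowman) are illustrative only.

```julia
# A Char is a 32-bit value, whichever character it holds
ch = 'é'
println(sizeof(ch))        # 4

# Char -> Unicode code point
cp = Int(ch)
println(cp)                # 233, i.e. U+00E9

# Unicode code point -> Char
snowman = Char(0x2603)
println(snowman)           # ☃

# The two conversions are inverses of each other
println(Char(Int('☃')) == '☃')   # true
```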
Strings are encoded in UTF-8. A string can be indexed by byte position, as in str[1], which returns the character that starts at that byte. Non-ASCII characters occupy two or more bytes, so an index that falls inside such a character is invalid and throws an error. The length of a string, length(s), is the number of characters it contains, which may be less than the number of bytes.
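The following sketch, assuming Julia 1.x and an arbitrary sample string "héllo", shows which byte indices are valid and how the character count differs from the byte count.

```julia
s = "héllo"                       # 'é' (U+00E9) takes two bytes in UTF-8

println(s[1])                     # h, byte 1 starts a character
println(s[2])                     # é, which occupies bytes 2 and 3

# Byte 3 is the second byte of 'é', so indexing there throws
err = try
    s[3]
catch e
    e
end
println(err isa StringIndexError) # true

println(isvalid(s, 3))            # false
println(nextind(s, 2))            # 4, the index of the next character

println(length(s))                # 5 characters
println(ncodeunits(s))            # 6 bytes
bytes_of_e = codeunits(s)[2:3]    # the two UTF-8 bytes of 'é': 0xc3, 0xa9
```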
However, a string is an iterable object, so you can loop through all the
characters in a string by
```julia
for c in s
    println(c)
end
```
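When the byte positions are needed as well as the characters, eachindex yields only the valid indices. This is a small sketch using the same hypothetical sample string as before; iterating codeunits(s) instead gives the raw bytes.

```julia
s = "héllo"

# eachindex yields only the byte positions at which a character starts
for i in eachindex(s)
    println(i, " => ", s[i])      # 1 => h, 2 => é, 4 => l, 5 => l, 6 => o
end

positions = collect(eachindex(s))
println(positions)                # [1, 2, 4, 5, 6]

# Iterating the string yields one Char per character ...
chars = [c for c in s]
println(length(chars))            # 5

# ... while iterating its code units yields the UTF-8 bytes
println(length(codeunits(s)))     # 6
```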
Julia Unicode strings can be normalized using the function Unicode.normalize(s::AbstractString, normalform::Symbol) from the Unicode standard library (loaded with using Unicode), where normalform is one of :NFC, :NFD, :NFKC, or :NFKD.
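As an illustration, assuming Julia 1.x, the character é can be written either as one precomposed code point or as e followed by a combining accent. The two strings compare unequal until they are normalized to the same form. The "fi" ligature example shows what the compatibility forms add.

```julia
using Unicode

composed   = "\u00e9"        # é as a single code point
decomposed = "e\u0301"       # e followed by COMBINING ACUTE ACCENT

println(composed == decomposed)                           # false
println(length(composed), " ", length(decomposed))        # 1 2

# NFC composes, NFD decomposes
println(Unicode.normalize(decomposed, :NFC) == composed)  # true
println(Unicode.normalize(composed, :NFD) == decomposed)  # true

# NFKC also replaces compatibility characters such as the ligature U+FB01
println(Unicode.normalize("\ufb01", :NFKC))               # fi
println(Unicode.normalize("\ufb01", :NFC) == "\ufb01")    # true, NFC keeps it
```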
Julia strings are already in UTF-8 form, and Julia reads and writes text as UTF-8, so no separate conversion step is needed.
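A minimal sketch of this, assuming Julia 1.x. An IOBuffer stands in here for a network stream such as a TCPSocket; both are IO objects, so write and read are used on them in the same way. The sample string is arbitrary.

```julia
buf = IOBuffer()
nbytes = write(buf, "héllo")      # writes the string's UTF-8 bytes as they are
println(nbytes)                   # 6 bytes for 5 characters

bytes = take!(buf)                # the raw bytes that were written
hex = bytes2hex(bytes)
println(hex)                      # 68c3a96c6c6f, where c3a9 encodes 'é'

# Reading the bytes back as a String needs no decoding step
received = read(IOBuffer(bytes), String)
println(received == "héllo")      # true
```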
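Internationalized domain names, which have to be converted to an ASCII form using Punycode (RFC 3492), do not appear to have been dealt with yet. There is a Punycoder.jl package which should do it.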
There is also a post on GitHub, stdlib/Sockets: `getaddrinfo("☃.net")` non-ASCII hostname (RFC 3492), which suggested in 2018 that upstream libuv would do this automatically; however, this has not yet flowed through to Julia on my machine.
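To show what such a conversion involves, here is a minimal sketch of the RFC 3492 encoding step in plain Julia. It is not the Punycoder.jl API: the names punycode_encode, punycode_adapt and to_ascii_hostname are hypothetical, and the sketch performs none of the IDNA mapping (case folding, normalization, validity checks) that a real resolver front end would apply, so it only handles labels already in their final form.

```julia
# Punycode parameters from RFC 3492, section 5
const BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
const INITIAL_BIAS, INITIAL_N = 72, 128

# Bias adaptation function (RFC 3492, section 6.1)
function punycode_adapt(delta::Int, numpoints::Int, firsttime::Bool)
    delta = firsttime ? delta ÷ DAMP : delta ÷ 2
    delta += delta ÷ numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) ÷ 2
        delta ÷= BASE - TMIN
        k += BASE
    end
    return k + ((BASE - TMIN + 1) * delta) ÷ (delta + SKEW)
end

# Digit values 0..25 map to 'a'..'z', 26..35 to '0'..'9'
encode_digit(d::Int) = d < 26 ? 'a' + d : '0' + (d - 26)

# Hypothetical helper: Punycode-encode one label (no "xn--" prefix)
function punycode_encode(input::AbstractString)
    codepoints = [Int(c) for c in input]
    output = IOBuffer()
    for cp in codepoints                  # copy the basic (ASCII) code points first
        cp < 128 && print(output, Char(cp))
    end
    b = count(<(128), codepoints)
    h = b
    b > 0 && print(output, '-')
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < length(codepoints)
        m = minimum(cp for cp in codepoints if cp >= n)
        delta += (m - n) * (h + 1)
        n = m
        for cp in codepoints
            cp < n && (delta += 1)
            if cp == n                    # emit delta as a variable-length integer
                q = delta
                k = BASE
                while true
                    t = k <= bias ? TMIN : (k >= bias + TMAX ? TMAX : k - bias)
                    q < t && break
                    print(output, encode_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) ÷ (BASE - t)
                    k += BASE
                end
                print(output, encode_digit(q))
                bias = punycode_adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
            end
        end
        delta += 1
        n += 1
    end
    return String(take!(output))
end

# Hypothetical helper: replace each non-ASCII label with its "xn--" form
to_ascii_hostname(host::AbstractString) =
    join((isascii(l) ? l : "xn--" * punycode_encode(l) for l in split(host, '.')), '.')

println(punycode_encode("bücher"))     # bcher-kva
println(to_ascii_hostname("☃.net"))    # xn--n3h.net
```

With a helper along these lines, getaddrinfo from the Sockets standard library could be given to_ascii_hostname("☃.net"), that is xn--n3h.net, instead of the raw Unicode name from the GitHub post.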
Copyright © Jan Newmarch, jan@newmarch.name
"Network Programming using Java, Go, Python, Rust, JavaScript and Julia" by Jan Newmarch is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://jan.newmarch.name/NetworkProgramming/.