Text: Characters and Strings

Julia

Character and string representations

The Julia Char type is 32 bits in size and can represent any Unicode character. The function Int(ch) returns the Unicode code point of a Char, while Char(n) converts an integer code point to a Char.
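Here is a short sketch of the round trip between characters and code points. The sample characters are arbitrary choices.

```julia
# Round trip between a Char and its Unicode code point
c = 'é'
println(Int(c))             # 233, i.e. U+00E9
println(Char(0x2603))       # ☃ (U+2603 SNOWMAN)
println(Char(Int(c)) == c)  # true
println(sizeof(c))          # 4: every Char occupies 32 bits
```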

Strings are encoded in UTF-8. A string is indexed by byte position, as in str[1], and str[i] returns the Char whose encoding starts at byte i. A non-ASCII character occupies two to four bytes, so any index that falls inside such a character is invalid and throws a StringIndexError.
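A small sketch of byte indexing on a string containing one two-byte character. The sample string is an arbitrary choice. nextind gives the next valid index, and isvalid tests whether an index starts a character.

```julia
s = "naïve"                # 'ï' (U+00EF) takes two bytes in UTF-8
println(ncodeunits(s))     # 6 bytes for 5 characters
println(s[3])              # ï, which starts at byte 3
println(isvalid(s, 4))     # false: byte 4 is inside 'ï'
try
    s[4]
catch err
    println(typeof(err))   # StringIndexError
end
println(s[nextind(s, 3)])  # v, the character following 'ï'
```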

The length of a string, length(s), is the number of characters it contains, which may be less than the number of bytes. Because a string is iterable, you can loop over all of its characters without dealing with byte indices:


      for c in s
           println(c)
      end      
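When the byte positions themselves are needed, eachindex(s) yields only the valid indices, that is, the byte at which each character starts. Here is a small sketch using an arbitrary sample string:

```julia
s = "naïve"
println(length(s))          # 5 characters
println(ncodeunits(s))      # 6 bytes
for i in eachindex(s)       # valid indices only: 1, 2, 3, 5, 6
    println(i, " => ", s[i])
end
```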
  

Unicode normalization

Julia strings can be normalized with the following function from the Unicode standard library (load it first with using Unicode):


      Unicode.normalize(s::AbstractString, normalform::Symbol)
  
where normalform is one of :NFC, :NFD, :NFKC or :NFKD.
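As an illustration, the same visible text can be stored as different code point sequences, and normalization makes them compare equal. The sample strings are arbitrary choices.

```julia
using Unicode

composed   = "caf\u00e9"    # 'é' as one code point, U+00E9
decomposed = "cafe\u0301"   # 'e' plus COMBINING ACUTE ACCENT, U+0301

println(composed == decomposed)                           # false
println(length(composed), " ", length(decomposed))        # 4 5
println(Unicode.normalize(decomposed, :NFC) == composed)  # true
println(Unicode.normalize(composed, :NFD) == decomposed)  # true
println(Unicode.normalize("\ufb01", :NFKC))               # fi: the ligature U+FB01 becomes two letters
```

The canonical forms (NFC, NFD) only merge or split equivalent sequences, while the compatibility forms (NFKC, NFKD) also replace characters such as ligatures with their plain equivalents.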

Converting strings to and from UTF-8

Julia strings are already encoded in UTF-8, so no conversion is needed.
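Because the encoding is already UTF-8, getting the bytes of a string or building a string from bytes only copies the existing representation. Here is a sketch with an arbitrary sample string:

```julia
s = "☃.net"
bytes = Vector{UInt8}(s)                   # a copy of the UTF-8 encoding
println(length(bytes))                     # 7: three bytes for '☃', four for ".net"
println(bytes[1:3] == [0xe2, 0x98, 0x83])  # true: UTF-8 encoding of U+2603
t = String(copy(bytes))                    # String() takes ownership of its argument, so pass a copy
println(t == s)                            # true
println(isvalid(String, UInt8[0xff, 0x41]))  # false: 0xff never occurs in UTF-8
```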

Reading and writing UTF-8 strings

Julia reads and writes strings as UTF-8 by default, so no special handling is needed.
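Here is a minimal sketch of a write and read round trip. It uses an in-memory IOBuffer in place of a file or socket. Any IO stream, including a TCPSocket from the Sockets library, accepts the same write and readline calls.

```julia
# An IOBuffer stands in for a file or socket; all are IO streams in Julia
io = IOBuffer()
nbytes = write(io, "héllo ☃\n")  # writes the UTF-8 encoding, returns the byte count
println(nbytes)                   # 11 bytes for 8 characters
seekstart(io)
line = readline(io)               # reads up to the newline and drops it
println(line == "héllo ☃")        # true
```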

Internationalized domain names

Internationalized domain names do not appear to be handled yet. The package Punycoder.jl should provide the necessary encoding. There is also a GitHub issue, stdlib/Sockets: `getaddrinfo("☃.net")` non-ASCII hostname (RFC 3492), suggesting that as of 2018 upstream libuv would handle this automatically, but the change has not yet reached Julia on my machine.
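The following from-scratch sketch shows what the missing step involves. It implements the Punycode encoding from RFC 3492 and does not use the API of Punycoder.jl. The names punycode_encode and to_ascii_hostname are hypothetical. The sketch also omits full IDNA processing (case mapping, normalization, validation) and the RFC's overflow checks, so it only illustrates the idea.

```julia
# Punycode parameters from RFC 3492, section 5
const BASE = 36
const TMIN = 1
const TMAX = 26
const SKEW = 38
const DAMP = 700
const INITIAL_BIAS = 72
const INITIAL_N = 128

# Map a digit value 0..35 to 'a'..'z', '0'..'9'
encode_digit(d::Integer) = d < 26 ? 'a' + d : '0' + (d - 26)

# Bias adaptation function (RFC 3492, section 6.1)
function adapt(delta::Int, numpoints::Int, firsttime::Bool)
    delta = firsttime ? delta ÷ DAMP : delta ÷ 2
    delta += delta ÷ numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) ÷ 2
        delta ÷= BASE - TMIN
        k += BASE
    end
    return k + ((BASE - TMIN + 1) * delta) ÷ (delta + SKEW)
end

# Hypothetical helper: encode one label (RFC 3492, section 6.3), without overflow checks
function punycode_encode(label::AbstractString)
    cps = [Int(c) for c in label]
    out = IOBuffer()
    for cp in cps                      # basic (ASCII) code points are copied as-is
        cp < 0x80 && print(out, Char(cp))
    end
    b = h = count(cp -> cp < 0x80, cps)
    b > 0 && print(out, '-')
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < length(cps)
        m = minimum(cp for cp in cps if cp >= n)  # next code point to insert
        delta += (m - n) * (h + 1)
        n = m
        for cp in cps
            cp < n && (delta += 1)
            if cp == n                 # emit delta as a variable-length integer
                q, k = delta, BASE
                while true
                    t = k <= bias ? TMIN : (k >= bias + TMAX ? TMAX : k - bias)
                    q < t && break
                    print(out, encode_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) ÷ (BASE - t)
                    k += BASE
                end
                print(out, encode_digit(q))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
            end
        end
        delta += 1
        n += 1
    end
    return String(take!(out))
end

# Hypothetical helper: convert a hostname to its ASCII ("xn--") form, label by label
to_ascii_hostname(host::AbstractString) =
    join((isascii(l) ? l : "xn--" * punycode_encode(l) for l in split(host, '.')), '.')
```

With this sketch, to_ascii_hostname("☃.net") gives "xn--n3h.net". That all-ASCII name could then be passed to Sockets.getaddrinfo.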



Copyright © Jan Newmarch, jan@newmarch.name
" Network Programming using Java, Go, Python, Rust, JavaScript and Julia" by Jan Newmarch is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License .
Based on a work at https://jan.newmarch.name/NetworkProgramming/ .
