Text: Characters and Strings

Python

Character and string representations

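Python does not have a char type: single characters are strings of length one. In Python 3, a string (str) is a sequence of Unicode code points. UTF-8 is one of several encodings used to convert a string to or from bytes; it is not the format of the string itself. See Unicode HOWTO.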
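As a small sketch of what this means in practice, the following indexes a string, converts between characters and code points with the built-in ord() and chr(), and compares the length in code points with the length in UTF-8 bytes. The sample text and variable names are only illustrative.

```python
# Illustrative sample: "naïve", with the i-diaeresis written as the
# single code point U+00EF.
word = "na\u00efve"

first = word[0]                       # indexing yields a str of length one
assert isinstance(first, str) and len(first) == 1

code_point = ord(word[2])             # character -> integer code point
same_char = chr(code_point)           # integer code point -> character

length_in_code_points = len(word)
length_in_utf8_bytes = len(word.encode("utf-8"))

print(hex(code_point), same_char)
print(length_in_code_points, length_in_utf8_bytes)
```

The two lengths differ because U+00EF takes two bytes in UTF-8, while len() on a str counts code points.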

Unicode normalization

A string can be converted to one of the Unicode normalization forms (NFC, NFD, NFKC or NFKD) using the unicodedata.normalize() function:


      import unicodedata
      unicodedata.normalize('NFD', s)
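Normalization matters because the same visible character can be written with different code point sequences, and such strings do not compare equal. A minimal sketch, using an illustrative "é" in composed and decomposed form:

```python
import unicodedata

composed = "\u00e9"       # e-acute as one code point
decomposed = "e\u0301"    # "e" followed by a combining acute accent

# The two strings look the same when displayed but are not equal.
equal_before = (composed == decomposed)

# NFD splits characters into base + combining marks; NFC recombines them.
nfd = unicodedata.normalize("NFD", composed)
nfc = unicodedata.normalize("NFC", decomposed)

# After normalizing both sides to the same form, comparison succeeds.
equal_after = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(equal_before, equal_after, len(nfd), len(nfc))
```

Which form to choose depends on the application; the point here is only that both sides of a comparison should use the same one.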
  

Converting strings to and from UTF-8

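A str has no encoding of its own: it is a sequence of Unicode code points, and how the interpreter stores them internally is an implementation detail. UTF-8 only comes into play when a string is converted to bytes, or bytes are converted to a string.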
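A minimal sketch of the conversion in both directions, using str.encode() and bytes.decode(). The sample text is illustrative, and the malformed input is an assumption chosen to show what happens when bytes are not valid UTF-8:

```python
# Illustrative sample text: "Grüße, 世界"
text = "Gr\u00fc\u00dfe, \u4e16\u754c"

data = text.encode("utf-8")          # str -> bytes
round_trip = data.decode("utf-8")    # bytes -> str

print(len(text), len(data))          # code points vs. bytes

# Bytes that are not valid UTF-8 raise UnicodeDecodeError by default.
bad = b"\xff\xfe"
try:
    bad.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD instead of raising.
replaced = bad.decode("utf-8", errors="replace")
```

For data arriving from a network peer, deciding between strict decoding and a lenient errors= policy is a design choice; strict decoding is the default.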

Reading and writing UTF-8 strings

To encode a string s to a bytes object use


      s.encode('utf-8')
  
To decode a bytes object b to a string use

      b.decode('utf-8')

The call str(b, encoding='utf-8') is equivalent.
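When reading from or writing to a file, the conversion can be left to open() by passing encoding='utf-8' in text mode. The sketch below writes a string, then reads it back both as raw bytes and as text; the file name and sample text are hypothetical.

```python
import os
import tempfile

text = "Gr\u00fc\u00dfe, \u4e16\u754c"   # illustrative sample text

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "greeting.txt")   # hypothetical file name

    # Text mode with an explicit encoding: str in, UTF-8 bytes on disk.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Binary mode returns the bytes exactly as stored.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode decodes the bytes back into a str.
    with open(path, encoding="utf-8") as f:
        read_back = f.read()

print(raw == text.encode("utf-8"), read_back == text)
```

Passing the encoding explicitly avoids depending on the platform's default, which is not UTF-8 everywhere.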
  

Internationalized domain names

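The module encodings.idna has the functions ToASCII() and ToUnicode(), each of which converts a single label of a domain name (the text between dots). A whole domain name can be converted with the 'idna' codec, for example name.encode('idna'). Both follow the older IDNA 2003 rules of RFC 3490.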
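A short sketch of both approaches, assuming the illustrative name "bücher.example": the functions convert one label at a time, while the codec handles the full dotted name.

```python
import encodings.idna

# A single label (no dots), illustrative only.
label = "b\u00fccher"
ascii_label = encodings.idna.ToASCII(label)             # str -> ASCII bytes
unicode_label = encodings.idna.ToUnicode(ascii_label)   # back to str

# A whole domain name goes through the "idna" codec instead.
domain = "b\u00fccher.example"
encoded = domain.encode("idna")
decoded = encoded.decode("idna")

print(ascii_label, encoded)
```

The ASCII form is what is sent in DNS queries; the Unicode form is what is shown to users.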

Python Resources


Copyright © Jan Newmarch, jan@newmarch.name
"Network Programming using Java, Go, Python, Rust, JavaScript and Julia" by Jan Newmarch is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://jan.newmarch.name/NetworkProgramming/.
