Text: Characters and Strings


Character and string representations

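Python does not have a char type: single characters are strings of length one. In Python 3, a str is a sequence of Unicode code points, not of UTF-8 encoded bytes; how the interpreter stores those code points internally is an implementation detail. See the Unicode HOWTO.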
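A short sketch of these points: indexing a string yields another str of length one, and len() counts code points rather than UTF-8 bytes. The sample string is arbitrary.

```python
s = "h\u00e9llo"  # "héllo", with é written as the single code point U+00E9

c = s[1]
print(type(c).__name__, len(c))  # str 1: a "character" is just a length-one string
print(len(s))                    # 5: len() counts code points
print(len(s.encode("utf-8")))    # 6: é needs two bytes once encoded as UTF-8
print(ord(c), hex(ord(c)))       # 233 0xe9: the code point behind the character
print(chr(0xE9) == c)            # True: chr() is the inverse of ord()
```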

Unicode normalization

This can be done with the unicodedata.normalize() function, which takes a normalization form ('NFC', 'NFKC', 'NFD' or 'NFKD') and a string:

      import unicodedata
      unicodedata.normalize('NFD', s)
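Normalization matters because the same visible text can be spelled with different code points. The sketch below, using é as an illustrative character, shows that the precomposed and decomposed forms compare unequal until both are brought to the same normal form.

```python
import unicodedata

composed = "\u00e9"     # é as one precomposed code point
decomposed = "e\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT

print(composed == decomposed)  # False: different code point sequences

nfd = unicodedata.normalize("NFD", composed)    # decompose
nfc = unicodedata.normalize("NFC", decomposed)  # compose
print(len(nfd), nfd == decomposed)  # 2 True
print(len(nfc), nfc == composed)    # 1 True
```

Normalizing both sides to one form (NFC is a common choice) before comparing or hashing strings avoids this kind of mismatch.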

Converting strings to and from UTF-8

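A str holds Unicode code points, not UTF-8 bytes. UTF-8 only comes into play when a string is encoded to a bytes object with str.encode('utf-8') or a bytes object is decoded back with bytes.decode('utf-8'); UTF-8 is the default encoding for both methods.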
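A minimal round-trip sketch with an arbitrary sample string. It also shows that decoding invalid UTF-8 raises UnicodeDecodeError under the default 'strict' error handler, while errors='replace' substitutes U+FFFD REPLACEMENT CHARACTER.

```python
s = "na\u00efve caf\u00e9"  # "naïve café"

data = s.encode("utf-8")     # str -> bytes
print(type(data).__name__, len(s), len(data))  # bytes 10 12: ï and é take two bytes each
back = data.decode("utf-8")  # bytes -> str
print(back == s)             # True

bad = b"caf\xe9"  # Latin-1 é: not a complete UTF-8 sequence on its own
try:
    bad.decode("utf-8")      # 'strict' error handling by default
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)

# An explicit error handler swaps the bad byte for U+FFFD instead of raising.
print(ascii(bad.decode("utf-8", errors="replace")))  # 'caf\ufffd'
```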

Reading and writing UTF-8 strings

To encode a string to a bytes object, use

      s.encode('utf-8')

To decode a bytes object to a string, use

      str(b, encoding='utf-8')

or, equivalently, b.decode('utf-8').
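For files, the usual approach is to pass encoding='utf-8' to open() and let the file object encode and decode, since in current Python versions open() otherwise picks its default encoding from the locale. The sketch below writes and reads back a temporary file; the file name and sample text are made up for illustration.

```python
import os
import tempfile

text = "\u00a1Hola, se\u00f1or!"  # "¡Hola, señor!"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "greeting.txt")  # hypothetical file name

    # Text mode with an explicit encoding: str is encoded to UTF-8 on write.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Binary mode shows the raw bytes actually stored on disk.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode again: the bytes are decoded back to str on read.
    with open(path, encoding="utf-8") as f:
        read_back = f.read()

print(raw == text.encode("utf-8"), read_back == text)  # True True
```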

Internationalized domain names

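The module encodings.idna has the functions ToASCII() and ToUnicode(), which convert a single domain-name label according to IDNA 2003 (RFC 3490). Whole domain names can be converted with the built-in 'idna' codec, e.g. name.encode('idna') and b.decode('idna').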
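A small sketch using the illustrative label "bücher" and domain "bücher.example". ToASCII() returns bytes carrying the ACE prefix "xn--", ToUnicode() reverses it, and the 'idna' codec handles a dotted name label by label.

```python
import encodings.idna

label = "b\u00fccher"  # "bücher"

ace = encodings.idna.ToASCII(label)            # one label -> ASCII-compatible bytes
print(ace)                                      # b'xn--bcher-kva'
print(encodings.idna.ToUnicode(ace) == label)  # True: back to the Unicode label

# Whole domain names: the "idna" codec splits on dots and converts each label.
name = "b\u00fccher.example"
encoded = name.encode("idna")
print(encoded)                           # b'xn--bcher-kva.example'
print(encoded.decode("idna") == name)    # True
```

Both the module and the codec implement IDNA 2003; code that needs the newer IDNA 2008 rules typically relies on a third-party package instead.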

Python Resources

