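Characters are (normally) represented as 16-bit UTF-16 code units. A single code unit covers only the Basic Multilingual Plane (BMP) of Unicode; a character outside the BMP takes two code units (a surrogate pair).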
Strings are a sequence of 16-bit integer values. Normally this would be a sequence of UTF-16 encoded characters.
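The difference between 16-bit values and characters can be seen in a short sketch. The sample characters are chosen only for illustration: one lies inside the BMP and one outside it.

```javascript
// A character inside the BMP occupies one 16-bit code unit.
const bmp = '\u00e9';            // é (U+00E9)
// A character outside the BMP occupies two code units (a surrogate pair).
const astral = '\u{1F600}';      // 😀 (U+1F600)

console.log(bmp.length);                         // 1
console.log(astral.length);                      // 2 (code units, not characters)
console.log(astral.charCodeAt(0).toString(16));  // d83d (high surrogate)
console.log(astral.charCodeAt(1).toString(16));  // de00 (low surrogate)
console.log(astral.codePointAt(0).toString(16)); // 1f600 (the full code point)

// Iterating a string walks code points, so spreading gives one element.
console.log([...astral].length);                 // 1
```

The length property counts 16-bit values, so code that needs to count characters should iterate by code point instead.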
JavaScript simplifies normalized text handling by leaving it to others: source code is assumed to be in Unicode Normalised Form C, and the intent is that "textual data coming into the execution environment from outside (e.g., user input, text read from a file or received over the network, etc.) be converted to Unicode Normalised Form C before the running program sees it." (ECMA: The String Type)
However, there is also the function String.prototype.normalize() to convert a string to a Unicode normalization form (NFC by default).
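As a minimal sketch of why normalization matters, the following compares two spellings of the same visible character. The variable names are illustrative only.

```javascript
// Two ways to write "é": one precomposed code point,
// or a plain "e" followed by a combining acute accent.
const composed = '\u00e9';
const decomposed = 'e\u0301';

// They look the same but are different sequences of code units.
console.log(composed === decomposed);            // false

// After normalizing both to the same form they compare equal.
const a = composed.normalize('NFC');
const b = decomposed.normalize('NFC');
console.log(a === b);                            // true

// NFC composes, NFD decomposes.
console.log(decomposed.normalize('NFC').length); // 1
console.log(composed.normalize('NFD').length);   // 2
```

Text received from the network may arrive in either form, so normalizing before comparing or storing strings avoids mismatches like the first comparison.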
The function socket.write() by default writes a string in UTF-8 format.
For reading, the socket can have its encoding set by socket.setEncoding('utf8'), and then the data read will be decoded from UTF-8 into JavaScript strings rather than delivered as raw bytes.
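The write and read sides can be tried together against a loopback echo server. This is only a sketch: the helper name roundTrip and the throwaway echo server are assumptions made for the illustration, not part of the original text.

```javascript
const net = require('net');

// Hypothetical helper: send `text` to a local echo server and
// resolve with whatever string comes back.
function roundTrip(text) {
  return new Promise((resolve, reject) => {
    // Echo server: pipes the received bytes straight back to the client.
    const server = net.createServer((conn) => conn.pipe(conn));
    server.on('error', reject);

    // Port 0 asks the operating system for any free port.
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      const socket = net.connect(port, '127.0.0.1');

      // With an encoding set, 'data' events deliver strings, not Buffers.
      socket.setEncoding('utf8');

      let received = '';
      socket.on('data', (chunk) => { received += chunk; });
      socket.on('end', () => { server.close(); resolve(received); });
      socket.on('error', reject);

      // The string is written as UTF-8 (the default), then the
      // writing side of the connection is closed.
      socket.end(text);
    });
  });
}
```

A call such as roundTrip('h\u00e9llo').then(console.log) should print the original string, since the bytes are encoded to UTF-8 on the way out and decoded again on the way in.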
Converting between strings and byte arrays has been discussed on Stack Overflow: How to convert UTF8 string to byte array?
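One possible approach, sketched here with the standard TextEncoder and TextDecoder classes and with the Node.js Buffer class, is shown below. The sample text is illustrative only.

```javascript
// "héllo €": the é takes 2 bytes in UTF-8 and the € takes 3.
const text = 'h\u00e9llo \u20ac';

// String to bytes: TextEncoder always produces UTF-8.
const bytes = new TextEncoder().encode(text);     // Uint8Array
console.log(text.length);                         // 7 code units
console.log(bytes.length);                        // 10 bytes

// Bytes back to a string.
const decoded = new TextDecoder('utf-8').decode(bytes);
console.log(decoded === text);                    // true

// The Node.js Buffer class offers the same conversions.
const buf = Buffer.from(text, 'utf8');
console.log(buf.length);                          // 10
console.log(buf.toString('utf8') === text);       // true
```

The byte count differs from the string length because UTF-8 uses a variable number of bytes per character, which matters when a protocol carries a length field.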
The Node.js punycode module has been deprecated; the recommended replacement is the user-supplied punycode.js module, described as "A robust Punycode converter that fully complies to RFC 3492 and RFC 5891."
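For the common case of converting internationalized domain names, a sketch that avoids any extra module is possible with the built-in url module. This is a different route from the punycode.js module recommended above, and it assumes a Node.js build with ICU support (the default).

```javascript
const url = require('url');

// Unicode domain name to its ASCII (Punycode) form.
const ascii = url.domainToASCII('espa\u00f1ol.com');
console.log(ascii);      // xn--espaol-zwa.com

// And back again to the Unicode form.
const unicode = url.domainToUnicode(ascii);
console.log(unicode);    // español.com
```

These two functions work on whole domain names only; encoding or decoding arbitrary Punycode strings still needs a module such as punycode.js.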
Copyright © Jan Newmarch, jan@newmarch.name
"Network Programming using Java, Go, Python, Rust, JavaScript and Julia" by Jan Newmarch is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://jan.newmarch.name/NetworkProgramming/.