This commit fixes two bugs in the generation of $'...' shellquoted
strings:
1. A bug introduced in f9d28935. In UTF-8 locales, a byte that is
invalid in UTF-8, e.g. hex byte 86, would be shellquoted as
\u[86], which denotes the Unicode character U+0086 (UTF-8 bytes
c2 86) rather than the single byte produced by the correct
quoting, \x86.
2. A bug inherited from 93u+. Single bytes (e.g. hex 11) were
always quoted as \x11 and never as \x[11], even if the next
character was a hexadecimal digit. However, the parser does not
stop after two hexadecimal digits, so that following digit was
absorbed into the escape (\x111 is read as U+0111) and we got:
    $ printf '%q\n' $'\x[11]1'
    $'\x111'
    $ printf $'\x111' | od -t x1
    0000000 c4 91
    0000002
After the bug fix, this works correctly:
    $ printf '%q\n' $'\x[11]1'
    $'\x[11]1'
    $ printf $'\x[11]1' | od -t x1
    0000000 11 31
    0000002
src/cmd/ksh93/sh/string.c: sh_fmtq():
- Make the multibyte code for $'...' more readable, eliminating the
'isbyte' flag.
- When in a multibyte locale, make sure to shellquote both invalid
multibyte characters and unprintable ASCII characters as
hexadecimal bytes (\xNN). This reinstates 93u+ behaviour.
- When quoting bytes, use isxdigit(3) to determine if the next
character is a hex digit, and if so, protect the quoted byte with
square brackets.
src/cmd/ksh93/tests/quoting2.sh:
- Move the 'printf %q' shellquoting regression tests here from
builtins.sh; they test the shellquoting algorithm, not so much
the printf builtin itself.
- Add regression tests for these bugs.