Mirror of git://git.code.sf.net/p/cdesktopenv/code (synced 2025-03-09 15:50:02 +00:00)
printf: Fix HTML and URI encoding (%H, %#H)
This applies a number of fixes to the printf formatting directives
%H and %#H (as well as their equivalents %(html)q and %(url)q):
1. Both formatters have been made multibyte/UTF-8 aware, and no
longer delete multibyte characters. Invalid UTF-8 byte sequences
are rendered as ASCII question marks.
2. %H no longer wrongly encodes spaces as non-breaking spaces
   (&nbsp;) and instead encodes only the actual UTF-8 non-breaking
   space (U+00A0) as &nbsp;.
3. %H now converts the single quote (') to '&#39;' instead of
   '&apos;', which is not a valid entity in all HTML versions.
4. %#H failed to encode some reserved characters (e.g. '?') while
   encoding some unreserved ones (e.g. '~'). It now percent-encodes
   all characters except those designated 'unreserved' by RFC 3986
   (ASCII alphanumerics plus -._~).
Prior discussion:
ce8d1467-4a6d-883b-45ad-fc3c7b90e681%40inlv.org
src/cmd/ksh93/include/defs.h:
src/cmd/ksh93/sh/string.c:
- defs.h: If compiling without SHOPT_MULTIBYTE, redefine the
mbwide() macro (which tests if we're in a multibyte locale) as 0.
This lets the compiler optimiser do the work that would otherwise
require a lot of tedious '#if SHOPT_MULTIBYTE' directives.
- string.c: Remove some now-unneeded '#if SHOPT_MULTIBYTE' stuff.
- defs.h, string.c: Rename is_invisible() to sh_isprint(), invert
the boolean return value, and make it an extern for use in
fmthtml() -- see below. If compiling without SHOPT_MULTIBYTE,
simply #define sh_isprint() as equivalent to isprint(3).
- defs.h: Add URI_RFC3986_UNRESERVED macro for fmthtml() containing
the characters "unreserved" for purposes of URI percent-encoding.
src/cmd/ksh93/bltins/print.c: fmthtml():
- Remove kludge that skipped all multibyte characters (!).
- Complete rewrite to implement fixes described above.
- Don't bother with '#if SHOPT_MULTIBYTE' directives (see above).
src/cmd/ksh93/data/builtins.c:
- sh_optprintf[]: %H: Add single quote to encoded chars doc.
- Edit credits and bump version date.
src/cmd/ksh93/tests/builtins.sh:
- Update and tweak old regression tests.
- Add a number of new tests for UTF-8 HTML and URI encoding, which
are only run when running tests in a UTF-8 locale (shtests -u).
Parent: aff63e382d
Commit: 8477d2ce22
7 changed files with 149 additions and 61 deletions
@@ -1180,7 +1180,7 @@ USAGE_LICENSE
 ;
 
 const char sh_optprintf[] =
-"[-1c?\n@(#)$Id: printf (AT&T Research) 2009-02-02 $\n]"
+"[-1c?\n@(#)$Id: printf (AT&T Research/ksh93) 2020-08-10 $\n]"
 USAGE_LICENSE
 "[+NAME?printf - write formatted output]"
 "[+DESCRIPTION?\bprintf\b writes each \astring\a operand to "
@@ -1211,7 +1211,7 @@ USAGE_LICENSE
 "[+%B?Treat the argument as a variable name and output the value "
 "without converting it to a string. This is most useful for "
 "variables of type \b-b\b.]"
-"[+%H?Output \astring\a with characters \b<\b, \b&\b, \b>\b, "
+"[+%H?Output \astring\a with characters \b<\b, \b&\b, \b>\b, \b'\b, "
 "\b\"\b, and non-printable characters properly escaped for "
 "use in HTML and XML documents. The alternate flag \b#\b "
 "formats the output for use as a URI.]"