mirror of git://git.code.sf.net/p/cdesktopenv/code, synced 2025-03-09 15:50:02 +00:00
printf: Fix HTML and URI encoding (%H, %#H)

This applies a number of fixes to the printf formatting directives
%H and %#H (as well as their equivalents %(html)q and %(url)q):

1. Both formatters have been made multibyte/UTF-8 aware, and no
   longer delete multibyte characters. Invalid UTF-8 byte sequences
   are rendered as ASCII question marks.

2. %H no longer wrongly encodes spaces as non-breaking spaces
   (&nbsp;) and instead correctly encodes the UTF-8 non-breaking
   space as such.

3. %H now converts the single quote (') to '&#39;' instead of
   '&apos;', which is not a valid entity in all HTML versions.

4. %#H failed to encode some reserved characters (e.g. '?') while
   encoding some unreserved ones (e.g. '~'). It now percent-encodes
   all characters except those 'unreserved' as per RFC 3986 (ASCII
   alphanumeric plus -._~).
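
The rule in item 4 can be sketched in C as follows. This is only an
illustration of RFC 3986 percent-encoding, not the fmthtml() code from
this commit: uri_encode() and UNRESERVED are hypothetical names, the
sketch works byte by byte, and it does not validate UTF-8 (so it does
not reproduce the question-mark substitution from item 1).

```c
#include <stddef.h>
#include <string.h>

/* Characters RFC 3986 calls "unreserved"; every other byte is encoded. */
#define UNRESERVED "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"

/*
 * Hypothetical helper (not part of ksh): percent-encode the
 * NUL-terminated string 'in' into 'out', which holds 'outsz' bytes.
 * Returns the encoded length, or (size_t)-1 if 'out' is too small.
 */
size_t uri_encode(const char *in, char *out, size_t outsz)
{
	static const char hex[] = "0123456789ABCDEF";
	const unsigned char *p;
	size_t n = 0;

	for (p = (const unsigned char *)in; *p; p++)
	{
		if (strchr(UNRESERVED, *p))
		{
			/* unreserved: copy as-is, leaving room for the NUL */
			if (n + 1 >= outsz)
				return (size_t)-1;
			out[n++] = (char)*p;
		}
		else
		{
			/* anything else becomes %XX, one triplet per byte */
			if (n + 3 >= outsz)
				return (size_t)-1;
			out[n++] = '%';
			out[n++] = hex[*p >> 4];
			out[n++] = hex[*p & 0x0F];
		}
	}
	if (n >= outsz)
		return (size_t)-1;
	out[n] = '\0';
	return n;
}
```

With this sketch, the input "a b?~" becomes "a%20b%3F~": the space and
'?' are encoded, while '~' is left alone because it is unreserved.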

Prior discussion:
ce8d1467-4a6d-883b-45ad-fc3c7b90e681%40inlv.org

src/cmd/ksh93/include/defs.h:
src/cmd/ksh93/sh/string.c:
- defs.h: If compiling without SHOPT_MULTIBYTE, redefine the
  mbwide() macro (which tests if we're in a multibyte locale) as 0.
  This lets the compiler optimiser do the work that would otherwise
  require a lot of tedious '#if SHOPT_MULTIBYTE' directives.
- string.c: Remove some now-unneeded '#if SHOPT_MULTIBYTE' stuff.
- defs.h, string.c: Rename is_invisible() to sh_isprint(), invert
  the boolean return value, and make it an extern for use in
  fmthtml() -- see below. If compiling without SHOPT_MULTIBYTE,
  simply #define sh_isprint() as equivalent to isprint(3).
- defs.h: Add URI_RFC3986_UNRESERVED macro for fmthtml() containing
  the characters "unreserved" for purposes of URI percent-encoding.
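
The mbwide() trick described above can be sketched as follows. This is
a minimal illustration, not the ksh source: demo_multibyte_locale,
utf8_count() and count_chars() are hypothetical names, and the sketch
defaults SHOPT_MULTIBYTE to 0 so that it compiles on its own. Because
mbwide() expands to the constant 0 in that configuration, the compiler
can discard the multibyte branch without any '#if' around the call.

```c
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: the build option defaults to "off". */
#ifndef SHOPT_MULTIBYTE
#define SHOPT_MULTIBYTE 0
#endif

#if SHOPT_MULTIBYTE
/* hypothetical stand-in for the real "are we in a multibyte locale" test */
static int demo_multibyte_locale = 1;
#define mbwide()	(demo_multibyte_locale)
#else
#define mbwide()	(0)	/* constant: callers' multibyte branches become dead code */
#endif

/* Count UTF-8 characters by skipping continuation bytes (10xxxxxx). */
static size_t utf8_count(const char *s)
{
	size_t n = 0;

	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}

/* Hypothetical caller: no '#if SHOPT_MULTIBYTE' is needed here. */
size_t count_chars(const char *s)
{
	if (mbwide())
		return utf8_count(s);	/* eliminated when mbwide() is (0) */
	return strlen(s);
}
```

Built with the default SHOPT_MULTIBYTE of 0, count_chars() reduces to
strlen(); built with -DSHOPT_MULTIBYTE=1, the same source takes the
character-counting path.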

src/cmd/ksh93/bltins/print.c: fmthtml():
- Remove kludge that skipped all multibyte characters (!).
- Complete rewrite to implement fixes described above.
- Don't bother with '#if SHOPT_MULTIBYTE' directives (see above).
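
The HTML side of the behaviour listed at the top of this message can
be sketched as follows. This is not the rewritten fmthtml(); it is a
simplified illustration with hypothetical names (utf8_len(),
html_encode()). The UTF-8 check is deliberately incomplete (it does
not reject overlong three-byte forms or surrogates), and handling of
non-printable characters is left out.

```c
#include <stddef.h>
#include <string.h>

/* Length (1..4) of the UTF-8 sequence at s, or 0 if it looks invalid. */
static size_t utf8_len(const unsigned char *s)
{
	size_t i, len;

	if (s[0] < 0x80)
		return 1;
	else if (s[0] >= 0xC2 && s[0] <= 0xDF)
		len = 2;
	else if ((s[0] & 0xF0) == 0xE0)
		len = 3;
	else if (s[0] >= 0xF0 && s[0] <= 0xF4)
		len = 4;
	else
		return 0;
	for (i = 1; i < len; i++)
		if ((s[i] & 0xC0) != 0x80)	/* also stops at the NUL */
			return 0;
	return len;
}

/*
 * Hypothetical helper: HTML-escape 'in' into 'out' ('outsz' bytes).
 * Returns the output length, or (size_t)-1 if 'out' is too small.
 */
size_t html_encode(const char *in, char *out, size_t outsz)
{
	const unsigned char *p = (const unsigned char *)in;
	size_t n = 0;

	while (*p)
	{
		const char *rep = NULL;
		size_t len = utf8_len(p), replen;

		switch (*p)
		{
		case '<':  rep = "&lt;";   break;
		case '>':  rep = "&gt;";   break;
		case '&':  rep = "&amp;";  break;
		case '"':  rep = "&quot;"; break;
		case '\'': rep = "&#39;";  break;	/* not &apos; */
		}
		if (len == 0)
		{
			rep = "?";	/* invalid byte: emit '?' and skip it */
			len = 1;
		}
		else if (len == 2 && p[0] == 0xC2 && p[1] == 0xA0)
			rep = "&nbsp;";	/* U+00A0, not the ASCII space */
		replen = rep ? strlen(rep) : len;
		if (n + replen >= outsz)
			return (size_t)-1;
		/* copy either the replacement or the untouched sequence */
		memcpy(out + n, rep ? rep : (const char *)p, replen);
		n += replen;
		p += len;
	}
	if (n >= outsz)
		return (size_t)-1;
	out[n] = '\0';
	return n;
}
```

In this sketch a valid multibyte character such as "é" passes through
unchanged, an ordinary space stays a space, and only the two-byte
sequence for U+00A0 is written as "&nbsp;".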

src/cmd/ksh93/data/builtins.c:
- sh_optprintf[]: %H: Add single quote to encoded chars doc.
- Edit credits and bump version date.

src/cmd/ksh93/tests/builtins.sh:
- Update and tweak old regression tests.
- Add a number of new tests for UTF-8 HTML and URI encoding, which
  are only run when running tests in a UTF-8 locale (shtests -u).

parent aff63e382d
commit 8477d2ce22
7 changed files with 149 additions and 61 deletions

@@ -29,6 +29,11 @@
 #define defs_h_defined
 
 #include <ast.h>
+#if !SHOPT_MULTIBYTE
+# undef mbwide
+# define mbwide() (0) /* disable multibyte without need for further '#if SHOPT_MULTIBYTE' */
+#endif
+
 #include <sfio.h>
 #include <error.h>
 #include "FEATURE/externs"
@@ -441,6 +446,14 @@ extern int sh_whence(char**,int);
 extern Namval_t *sh_fsearch(Shell_t*,const char *,int);
 #endif /* SHOPT_NAMESPACE */
 
+#if SHOPT_MULTIBYTE
+extern int sh_isprint(int);
+#else
+# define sh_isprint(c) isprint(c)
+#endif /* SHOPT_MULTIBYTE */
+
+#define URI_RFC3986_UNRESERVED "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
+
 #ifndef ERROR_dictionary
 # define ERROR_dictionary(s) (s)
 #endif

@@ -17,4 +17,4 @@
 * David Korn <dgk@research.att.com> *
 * *
 ***********************************************************************/
-#define SH_RELEASE "93u+m 2020-08-09"
+#define SH_RELEASE "93u+m 2020-08-10"