mirror of git://git.code.sf.net/p/cdesktopenv/code (synced 2025-03-09 15:50:02 +00:00)

printf: Fix HTML and URI encoding (%H, %#H)

This applies a number of fixes to the printf formatting directives
%H and %#H (as well as their equivalents %(html)q and %(url)q):
1. Both formatters have been made multibyte/UTF-8 aware, and no
   longer delete multibyte characters. Invalid UTF-8 byte sequences
   are rendered as ASCII question marks.
2. %H no longer wrongly encodes spaces as non-breaking spaces
   (&nbsp;) and instead correctly encodes the UTF-8 non-breaking
   space as such.
3. %H now converts the single quote (') to '&#39;' instead of
   '&apos;', which is not a valid entity in all HTML versions.
4. %#H failed to encode some reserved characters (e.g. '?') while
   encoding some unreserved ones (e.g. '~'). It now percent-encodes
   all characters except those 'unreserved' as per RFC3986 (ASCII
   alphanumeric plus -._~).
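
To illustrate item 4, here is a minimal sketch of percent-encoding against
the RFC 3986 'unreserved' set. It is not the code from this commit: the
names UNRESERVED_SKETCH and uri_encode_sketch() are hypothetical, and the
use of uppercase hex digits is an assumption based on the RFC's
recommendation. Every byte outside the unreserved set is encoded, so a
multibyte UTF-8 character becomes one %XX triplet per byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro: the 'unreserved' characters of RFC 3986. */
#define UNRESERVED_SKETCH \
	"ABCDEFGHIJKLMNOPQRSTUVWXYZ" \
	"abcdefghijklmnopqrstuvwxyz" \
	"0123456789-._~"

/*
 * Percent-encode the NUL-terminated string src into dst, which has room
 * for dstsize bytes including the terminating NUL. Returns the encoded
 * length, or (size_t)-1 if dst is too small.
 */
static size_t uri_encode_sketch(const char *src, char *dst, size_t dstsize)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t n = 0;

	assert(src != NULL && dst != NULL);
	for (; *src; src++)
	{
		unsigned char c = (unsigned char)*src;

		if (strchr(UNRESERVED_SKETCH, c))
		{
			/* Unreserved: copy through unchanged. */
			if (n + 1 >= dstsize)
				return (size_t)-1;
			dst[n++] = (char)c;
		}
		else
		{
			/* Anything else becomes %XX, one triplet per byte. */
			if (n + 3 >= dstsize)
				return (size_t)-1;
			dst[n++] = '%';
			dst[n++] = hex[c >> 4];
			dst[n++] = hex[c & 0x0F];
		}
	}
	if (n >= dstsize)
		return (size_t)-1;
	dst[n] = '\0';
	return n;
}
```

With this sketch, the input "a b?~" encodes to "a%20b%3F~": the space and
'?' are encoded while '~' passes through.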
Prior discussion:
ce8d1467-4a6d-883b-45ad-fc3c7b90e681%40inlv.org
src/cmd/ksh93/include/defs.h:
src/cmd/ksh93/sh/string.c:
- defs.h: If compiling without SHOPT_MULTIBYTE, redefine the
  mbwide() macro (which tests if we're in a multibyte locale) as 0.
  This lets the compiler optimiser do the work that would otherwise
  require a lot of tedious '#if SHOPT_MULTIBYTE' directives.
- string.c: Remove some now-unneeded '#if SHOPT_MULTIBYTE' stuff.
- defs.h, string.c: Rename is_invisible() to sh_isprint(), invert
  the boolean return value, and make it an extern for use in
  fmthtml() -- see below. If compiling without SHOPT_MULTIBYTE,
  simply #define sh_isprint() as equivalent to isprint(3).
- defs.h: Add URI_RFC3986_UNRESERVED macro for fmthtml() containing
  the characters "unreserved" for purposes of URI percent-encoding.
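
The mbwide() redefinition described above can be sketched as follows. This
is an illustration only: mbwide_sketch() and count_chars_sketch() are
hypothetical names, the test MB_CUR_MAX > 1 is an assumed stand-in for
"we're in a multibyte locale", and SHOPT_MULTIBYTE is given a default here
purely so the fragment compiles on its own. The point is that the multibyte
branch is written as ordinary code, and when the macro is the constant 0
the optimiser can discard it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: default the build option so this fragment stands alone. */
#ifndef SHOPT_MULTIBYTE
#define SHOPT_MULTIBYTE 0
#endif

#if SHOPT_MULTIBYTE
/* Assumed run-time test for a multibyte locale. */
#define mbwide_sketch()	(MB_CUR_MAX > 1)
#else
/* Compile-time constant: branches guarded by it become dead code. */
#define mbwide_sketch()	0
#endif

/* Count characters in s; no '#if SHOPT_MULTIBYTE' needed in the body. */
static size_t count_chars_sketch(const char *s)
{
	size_t n = 0;

	assert(s != NULL);
	if (!mbwide_sketch())
		return strlen(s);	/* single-byte: one character per byte */

	(void)mblen(NULL, 0);		/* reset any shift state */
	while (*s)
	{
		int len = mblen(s, MB_CUR_MAX);
		s += (len > 0) ? len : 1;	/* step over an invalid byte */
		n++;
	}
	return n;
}
```

One trade-off of this pattern is that the multibyte branch must still
compile in single-byte builds, which a '#if' block would not require.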
src/cmd/ksh93/bltins/print.c: fmthtml():
- Remove kludge that skipped all multibyte characters (!).
- Complete rewrite to implement fixes described above.
- Don't bother with '#if SHOPT_MULTIBYTE' directives (see above).
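
As a rough illustration of the behaviour described for %H, the sketch below
escapes the five special ASCII characters, passes valid UTF-8 sequences
through, encodes the UTF-8 non-breaking space as &nbsp; and renders invalid
bytes as '?'. It is not the rewritten fmthtml(): the function names are
hypothetical, the UTF-8 check is simplified (it does not reject every
overlong or surrogate encoding), consuming exactly one byte per invalid
sequence is an assumption, and non-printable characters are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the UTF-8 sequence at s, or 0 if it is invalid (simplified). */
static int utf8_seq_len_sketch(const unsigned char *s)
{
	int len, i;

	if (s[0] < 0x80)
		return 1;
	else if (s[0] >= 0xC2 && s[0] <= 0xDF)
		len = 2;
	else if (s[0] >= 0xE0 && s[0] <= 0xEF)
		len = 3;
	else if (s[0] >= 0xF0 && s[0] <= 0xF4)
		len = 4;
	else
		return 0;
	/* Continuation bytes are 10xxxxxx; a NUL fails this test too. */
	for (i = 1; i < len; i++)
		if ((s[i] & 0xC0) != 0x80)
			return 0;
	return len;
}

/*
 * HTML-escape src into dst (dstsize bytes including the NUL).
 * Returns the output length, or (size_t)-1 if dst is too small.
 */
static size_t html_escape_sketch(const char *src, char *dst, size_t dstsize)
{
	const unsigned char *s = (const unsigned char *)src;
	size_t n = 0, rlen;

	assert(src != NULL && dst != NULL);
	while (*s)
	{
		const char *rep = NULL;
		char raw[5];
		int len = utf8_seq_len_sketch(s);

		if (len == 0)
		{
			rep = "?";	/* invalid byte: question mark */
			len = 1;
		}
		else if (len == 2 && s[0] == 0xC2 && s[1] == 0xA0)
			rep = "&nbsp;";	/* U+00A0 non-breaking space */
		else if (len == 1)
		{
			switch (*s)
			{
			case '<':  rep = "&lt;";   break;
			case '>':  rep = "&gt;";   break;
			case '&':  rep = "&amp;";  break;
			case '"':  rep = "&quot;"; break;
			case '\'': rep = "&#39;";  break;
			}
		}
		if (!rep)
		{
			/* Ordinary character: copy its bytes unchanged. */
			memcpy(raw, s, (size_t)len);
			raw[len] = '\0';
			rep = raw;
		}
		rlen = strlen(rep);
		if (n + rlen >= dstsize)
			return (size_t)-1;
		memcpy(dst + n, rep, rlen);
		n += rlen;
		s += len;
	}
	if (n >= dstsize)
		return (size_t)-1;
	dst[n] = '\0';
	return n;
}
```

Note that an ordinary space is copied through unchanged here; only the
two-byte UTF-8 sequence for U+00A0 produces &nbsp;.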
src/cmd/ksh93/data/builtins.c:
- sh_optprintf[]: %H: Add single quote to encoded chars doc.
- Edit credits and bump version date.
src/cmd/ksh93/tests/builtins.sh:
- Update and tweak old regression tests.
- Add a number of new tests for UTF-8 HTML and URI encoding, which
  are only run when running tests in a UTF-8 locale (shtests -u).
			
			
parent aff63e382d
commit 8477d2ce22
7 changed files with 149 additions and 61 deletions

--- a/src/cmd/ksh93/data/builtins.c
+++ b/src/cmd/ksh93/data/builtins.c
@@ -1180,7 +1180,7 @@ USAGE_LICENSE
 ;
 
 const char sh_optprintf[] =
-"[-1c?\n@(#)$Id: printf (AT&T Research) 2009-02-02 $\n]"
+"[-1c?\n@(#)$Id: printf (AT&T Research/ksh93) 2020-08-10 $\n]"
 USAGE_LICENSE
 "[+NAME?printf - write formatted output]"
 "[+DESCRIPTION?\bprintf\b writes each \astring\a operand to "
@@ -1211,7 +1211,7 @@ USAGE_LICENSE
 	"[+%B?Treat the argument as a variable name and output the value "
 		"without converting it to a string.  This is most useful for "
 		"variables of type \b-b\b.]"
-	"[+%H?Output \astring\a with characters \b<\b, \b&\b, \b>\b, "
+	"[+%H?Output \astring\a with characters \b<\b, \b&\b, \b>\b, \b'\b, "
 		"\b\"\b, and non-printable characters properly escaped for "
 		"use in HTML and XML documents.  The alternate flag \b#\b "
 		"formats the output for use as a URI.]"