mirror of
git://git.code.sf.net/p/cdesktopenv/code
synced 2025-03-09 15:50:02 +00:00
Multibyte character handling overhaul; allow global disable
The SHOPT_MULTIBYTE compile-time option did not make much sense as
disabling it only disabled multibyte support for ksh/libshell, not
libast or libcmd built-in commands. This commit allows disabling
multibyte support for the entire codebase by defining the macro
AST_NOMULTIBYTE (e.g. via CCFLAGS). This slightly speeds up the code
and makes an optimised binary about 5% smaller.

src/lib/libast/include/ast.h:
- Add non-multibyte fallback versions of the multibyte macros that
  are used if AST_NOMULTIBYTE is defined. This should cause most
  multibyte handling to be automatically optimised out everywhere.
- Reformat the multibyte macros for legibility.
- Simplify the mbchar() and mbsize() macros by defining them in terms
  of mbnchar() and mbnsize(), eliminating code duplication.
- Correct non-multibyte fallback of mbwidth(). For consistent
  behaviour, control characters and out-of-range values should
  return -1 as they do for UTF-8. The fallback is now the same as
  default_wcwidth() in src/lib/libast/comp/setlocale.c.

src/lib/libast/comp/setlocale.c:
- If AST_NOMULTIBYTE is defined, do not compile in the debug and
  UTF-8 locale conversion functions, including several large
  conversion tables. Define their fallback macros as 0 as these are
  used as function pointers.

src/cmd/ksh93/SHOPT.sh,
src/cmd/ksh93/Mamfile:
- Change the SHOPT_MULTIBYTE default to empty, indicating "probe".
- Synchronise SHOPT_MULTIBYTE with !AST_NOMULTIBYTE by default.

src/cmd/ksh93/include/defs.h:
- When SHOPT_MULTIBYTE is zero but AST_NOMULTIBYTE is not non-zero,
  then enable AST_NOMULTIBYTE here to use the ast.h non-multibyte
  fallbacks for ksh. When this is done, the effect is that multibyte
  is optimised out for ksh only, as before.
- Remove previous fallback for disabling multibyte (re:c2cb0eae).

src/cmd/ksh93/include/lexstates.h,
src/cmd/ksh93/sh/lex.c:
- Define SETLEN() macro to assign to LEN (i.e. _Fcin.fclen) for
  multibyte only and do not assign to it directly. With no
  SHOPT_MULTIBYTE, define that macro as empty. This allows removing
  multiple '#if SHOPT_MULTIBYTE' directives from lex.c, as that code
  will all be optimised out automatically if it's disabled.

src/cmd/ksh93/include/national.h,
src/cmd/ksh93/sh/string.c:
- Fix flagrantly incorrect non-multibyte fallback for sh_strchr().
  The latter returns an integer offset (-1 if not found), whereas
  strchr(3) returns a char pointer (NULL if not found). Incorporate
  the fallback into the function for correct handling instead of
  falling back to strchr(3) directly.

src/cmd/ksh93/sh/macro.c:
- lastchar() optimisation: avoid function call if SHOPT_MULTIBYTE is
  enabled but we're not actually in a multibyte locale.

src/cmd/ksh93/sh/name.c:
- Use ja_size() even with SHOPT_MULTIBYTE disabled (re:2182ecfa).
  Though no regression tests failed, the non-multibyte fallback for
  typeset -L/-R/-Z length calculation was probably not quite correct
  as ja_size() does more. The ast.h change to mbwidth() ensures
  correct behaviour for non-multibyte locales.

src/cmd/ksh93/tests/shtests:
- Since its value in SHOPT.sh is now empty by default, add a quick
  feature test (for the length of the UTF-8 character 'é') to check
  if SHOPT_MULTIBYTE needs to be enabled for the regression tests.
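
To illustrate the sh_strchr() point in the message above: an offset-returning search and strchr(3) signal "not found" differently (-1 versus NULL), so one cannot simply stand in for the other. The following is only a minimal sketch of a single-byte fallback that keeps the offset convention. The name offset_of_char() and its signature are hypothetical and are not taken from the ksh source.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical single-byte stand-in for an offset-returning search.
 * Returns the index of the first occurrence of c in s, or -1 if c
 * does not occur. This is NOT the real sh_strchr() signature.
 */
static int offset_of_char(const char *s, int c)
{
	const char *p = strchr(s, c);	/* char pointer, or NULL if absent */

	/* Convert the pointer result into the integer offset convention. */
	return p ? (int)(p - s) : -1;
}
```

A caller that expects an offset can then test for a negative result. Using the pointer returned by strchr(3) in that position would conflate NULL with a valid value, which appears to be the kind of mismatch the commit message describes.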
parent 59e79dc026
commit 7c4418ccdc
16 changed files with 147 additions and 101 deletions
@@ -476,7 +476,17 @@ typeset -l x=
 
 unset x
 typeset -L4 x=$'\001abcdef'
-[[ ${#x} == 5 ]] || err_exit "width of character '\001' is not zero"
+exp=$'\001abcd'
+[[ e=${#x} -eq 5 && $x == "$exp" ]] || err_exit "typeset -L: width of control character '\001' is not zero" \
+	"(expected length 5 and $(printf %q "$exp"), got length $e and $(printf %q "$x"))"
+typeset -R10 x=$'a\tb'
+exp=$'        a\tb'
+[[ e=${#x} -eq 11 && $x == "$exp" ]] || err_exit "typeset -R: width of control character '\t' is not zero" \
+	"(expected length 11 and $(printf %q "$exp"), got length $e and $(printf %q "$x"))"
+typeset -Z10 x=$'1\t2'
+exp=$'000000001\t2'
+[[ e=${#x} -eq 11 && $x == "$exp" ]] || err_exit "typeset -Z: width of control character '\t' is not zero" \
+	"(expected length 11 and $(printf %q "$exp"), got length $e and $(printf %q "$x"))"
 
 unset x
 typeset -L x=-1
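
The regression tests in this hunk exercise control-character widths under typeset -L/-R/-Z. According to the commit message, the non-multibyte mbwidth() fallback now returns -1 for control characters and out-of-range values. A minimal sketch of a width function with that behaviour follows. The name single_byte_width() is hypothetical, and the body is an assumption based on that description, not a copy of default_wcwidth().

```c
#include <ctype.h>

/*
 * Hypothetical single-byte width function: 1 column for a byte value
 * that is not a control character, -1 for control characters and for
 * values outside the 0..255 range. The range check comes first so that
 * iscntrl() is only ever called with a valid unsigned char value.
 */
static int single_byte_width(int w)
{
	return (w >= 0 && w <= 255 && !iscntrl(w)) ? 1 : -1;
}
```

With a function of this shape, '\001' and '\t' both report -1 rather than 1, so a caller computing field widths can tell that they occupy no printable column.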
@@ -312,8 +312,9 @@ SHOPT()
 }
 . "${SHOPTFILE:-../SHOPT.sh}"
 unset -f SHOPT
+[[ -n $SHOPT_MULTIBYTE ]] || SHOPT_MULTIBYTE=$( LC_ALL=C.UTF-8; x=$'\xc3\xa9'; print $(( ${#x}==1 )) )
 if (( !SHOPT_MULTIBYTE && utf8 && !posix && !compile ))
-then echo "The -u/--utf8 option is unavailable as SHOPT_MULTIBYTE is turned off in ${SHOPTFILE:-SHOPT.sh}." >&2
+then echo "-u/--utf8 is unavailable because multibyte support was not compiled in." >&2
 	exit 1
 fi
 