From 7c4418ccdc2c7018f2ac04594e8be61702fb365e Mon Sep 17 00:00:00 2001
From: Martijn Dekker
Date: Thu, 7 Jul 2022 21:58:23 +0200
Subject: [PATCH] Multibyte character handling overhaul; allow global disable
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The SHOPT_MULTIBYTE compile-time option did not make much sense, as
disabling it only disabled multibyte support for ksh/libshell, not for
libast or the libcmd built-in commands. This commit allows disabling
multibyte support for the entire codebase by defining the macro
AST_NOMULTIBYTE (e.g. via CCFLAGS). This slightly speeds up the code
and makes an optimised binary about 5% smaller.

src/lib/libast/include/ast.h:
- Add non-multibyte fallback versions of the multibyte macros, used
  if AST_NOMULTIBYTE is defined. This should cause most multibyte
  handling to be optimised out automatically everywhere.
- Reformat the multibyte macros for legibility.
- Simplify the mbchar() and mbsize() macros by defining them in terms
  of mbnchar() and mbnsize(), eliminating code duplication.
- Correct the non-multibyte fallback of mbwidth(). For consistent
  behaviour, control characters and out-of-range values should return
  -1, as they do for UTF-8. The fallback is now the same as
  default_wcwidth() in src/lib/libast/comp/setlocale.c.

src/lib/libast/comp/setlocale.c:
- If AST_NOMULTIBYTE is defined, do not compile in the debug and UTF-8
  locale conversion functions, including several large conversion
  tables. Define their fallback macros as 0, as these are used as
  function pointers.

src/cmd/ksh93/SHOPT.sh, src/cmd/ksh93/Mamfile:
- Change the SHOPT_MULTIBYTE default to empty, indicating "probe".
- Synchronise SHOPT_MULTIBYTE with !AST_NOMULTIBYTE by default.

src/cmd/ksh93/include/defs.h:
- When SHOPT_MULTIBYTE is zero but AST_NOMULTIBYTE is zero or
  undefined, enable AST_NOMULTIBYTE here to use the ast.h
  non-multibyte fallbacks for ksh.
  The effect is that multibyte handling is optimised out for ksh
  only, as before.
- Remove the previous fallback for disabling multibyte (re: c2cb0eae).

src/cmd/ksh93/include/lexstates.h, src/cmd/ksh93/sh/lex.c:
- Define a SETLEN() macro that assigns to LEN (i.e. _Fcin.fclen) for
  multibyte only, and do not assign to LEN directly. Without
  SHOPT_MULTIBYTE, define that macro as empty. This allows removing
  multiple '#if SHOPT_MULTIBYTE' directives from lex.c, as that code
  is optimised out automatically when multibyte is disabled.

src/cmd/ksh93/include/national.h, src/cmd/ksh93/sh/string.c:
- Fix the flagrantly incorrect non-multibyte fallback for sh_strchr().
  sh_strchr() returns an integer offset (-1 if not found), whereas
  strchr(3) returns a char pointer (NULL if not found). Incorporate
  the fallback into the function for correct handling instead of
  falling back to strchr(3) directly.

src/cmd/ksh93/sh/macro.c:
- lastchar() optimisation: avoid a function call if SHOPT_MULTIBYTE is
  enabled but we're not actually in a multibyte locale.

src/cmd/ksh93/sh/name.c:
- Use ja_size() even with SHOPT_MULTIBYTE disabled (re: 2182ecfa).
  Though no regression tests failed, the non-multibyte fallback for
  the typeset -L/-R/-Z length calculation was probably not quite
  correct, as ja_size() does more. The ast.h change to mbwidth()
  ensures correct behaviour for non-multibyte locales.

src/cmd/ksh93/tests/shtests:
- Since the SHOPT_MULTIBYTE value in SHOPT.sh is now empty by default,
  add a quick feature test (for the length of the UTF-8 character 'é')
  to check whether SHOPT_MULTIBYTE needs to be enabled for the
  regression tests.
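The SETLEN() change for lexstates.h and lex.c relies on a common preprocessor pattern: a macro that expands to nothing when a feature is disabled. The following is a minimal, hypothetical sketch of that pattern, not the actual lexstates.h code. DEMO_MULTIBYTE, demo_len and DEMO_SETLEN() are made-up stand-ins for SHOPT_MULTIBYTE, _Fcin.fclen and SETLEN().

```c
/*
 * Illustrative sketch only: DEMO_MULTIBYTE, demo_len and DEMO_SETLEN()
 * are hypothetical stand-ins for SHOPT_MULTIBYTE, _Fcin.fclen and SETLEN().
 */
#ifndef DEMO_MULTIBYTE
#define DEMO_MULTIBYTE 1	/* default: multibyte support compiled in */
#endif

static int demo_len;	/* stands in for the lexer's character length field */

#if DEMO_MULTIBYTE
#define DEMO_SETLEN(x)	(demo_len = (x))	/* record the character length */
#else
#define DEMO_SETLEN(x)	/* expands to nothing: the assignment is compiled out */
#endif
```

Call sites write `DEMO_SETLEN(n);` unconditionally. With DEMO_MULTIBYTE defined as 0, the statement reduces to an empty `;`, so no `#if` is needed around it. In that configuration the argument is not evaluated at all, so it must not carry side effects that the non-multibyte build depends on.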
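The sh_strchr() fix concerns a mismatch of return types: the function's contract is an integer offset, or -1 when the character is absent, while strchr(3) yields a pointer or NULL. The following hypothetical single-byte sketch honours that contract. The name sb_strchr_offset and its plain `int` character parameter are assumptions made for illustration. The real sh_strchr() signature and implementation may differ.

```c
#include <string.h>

/*
 * Hypothetical single-byte sketch of the sh_strchr() contract:
 * return the byte offset of the first occurrence of c in string,
 * or -1 if c does not occur. Not the actual ksh code.
 */
int sb_strchr_offset(const char *string, int c)
{
	const char *p = strchr(string, c);	/* pointer to the match, or NULL */

	return p ? (int)(p - string) : -1;	/* convert the pointer to an offset */
}
```

A caller that checks the result with `< 0` or uses it as an index works correctly with this. A fallback that expands directly to strchr(3) would instead hand such a caller a pointer (or NULL) where an offset is expected.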
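The corrected non-multibyte mbwidth() rule can be sketched as follows: control characters and out-of-range values return -1, and every other character occupies one column. The name sb_width is hypothetical, and classifying control characters with iscntrl(3) is an assumption. The actual ast.h fallback and default_wcwidth() may be written differently. This sketch also returns -1 for the NUL character, whereas POSIX wcwidth(3) returns 0 for a null wide character.

```c
#include <ctype.h>
#include <limits.h>

/*
 * Hypothetical single-byte width function following the rule in the
 * commit message: control characters and out-of-range values yield -1
 * (as for UTF-8); every other byte value occupies one column.
 * iscntrl(3) depends on the current LC_CTYPE locale.
 */
int sb_width(int c)
{
	if (c < 0 || c > UCHAR_MAX)
		return -1;	/* not a valid single-byte character value */
	if (iscntrl(c))
		return -1;	/* control characters have no printable width */
	return 1;	/* one byte, one column */
}
```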
---
 src/cmd/ksh93/Mamfile             |  6 ++++
 src/cmd/ksh93/README              | 10 ++++--
 src/cmd/ksh93/SHOPT.sh            |  2 +-
 src/cmd/ksh93/include/defs.h      | 17 ++++------
 src/cmd/ksh93/include/fcin.h      |  4 +--
 src/cmd/ksh93/include/lexstates.h |  5 ++-
 src/cmd/ksh93/include/national.h  | 10 ++----
 src/cmd/ksh93/sh/fcin.c           |  2 ++
 src/cmd/ksh93/sh/lex.c            | 21 ++++-------
 src/cmd/ksh93/sh/macro.c          |  5 +--
 src/cmd/ksh93/sh/name.c           | 53 ++++++++++---------------------
 src/cmd/ksh93/sh/string.c         | 16 +++++++---
 src/cmd/ksh93/tests/attributes.sh | 12 ++++++-
 src/cmd/ksh93/tests/shtests       |  3 +-
 src/lib/libast/comp/setlocale.c   | 30 +++++++++++++++--
 src/lib/libast/include/ast.h      | 52 ++++++++++++++++++++++--------
 16 files changed, 147 insertions(+), 101 deletions(-)

diff --git a/src/cmd/ksh93/Mamfile b/src/cmd/ksh93/Mamfile
index 1a8f688a9..516c9be90 100644
--- a/src/cmd/ksh93/Mamfile
+++ b/src/cmd/ksh93/Mamfile
@@ -27,6 +27,12 @@ make install
 exec - SHOPT()
 exec - {
 exec - case $1 in
+ exec - 'MULTIBYTE=')
+ exec - echo
+ exec - echo '#if !defined(SHOPT_MULTIBYTE) && !AST_NOMULTIBYTE'
+ exec - echo '#define SHOPT_MULTIBYTE 1'
+ exec - echo '#endif'
+ exec - ;;
 exec - *=?*) echo
 exec - echo "#ifndef SHOPT_${1%%=*}"
 exec - echo "#define SHOPT_${1%%=*} ${1#*=}"
diff --git a/src/cmd/ksh93/README b/src/cmd/ksh93/README
index b41a23f84..567fbb86e 100644
--- a/src/cmd/ksh93/README
+++ b/src/cmd/ksh93/README
@@ -100,8 +100,9 @@ The options have the following defaults and meanings:
 As of 2021-05-10, no tool that can parse this database is known. If you know of any, please contact us.
- MULTIBYTE on Multibyte character handling. Requires mblen() and
- mbctowc().
+ MULTIBYTE Multibyte character handling. This is on by default unless
+ the flag -DAST_NOMULTIBYTE is passed to the compiler via
+ CCFLAGS. The UTF-8 character set is fully supported.
 NAMESPACE on Adds a 'namespace' reserved word that allows defining name spaces. Variables and functions defined within a block like
@@ -191,6 +192,11 @@ Note: Do not add compiler flags that cause the compiler to emit
 terminal escape codes, such as -fdiagnostics-color=always; this will cause the build to fail as the probing code greps compiler diagnostics.
+If you are certain that you don't need support for UTF-8 and other multibyte
+character locales and really want to save some memory and CPU cycles, add
+'-DAST_NOMULTIBYTE' to CCFLAGS to compile out all multibyte character
+handling in ksh and supporting libraries. Not recommended for most users.
+
 For more information, run: bin/package help
diff --git a/src/cmd/ksh93/SHOPT.sh b/src/cmd/ksh93/SHOPT.sh
index fa7c60ca7..1380de49c 100644
--- a/src/cmd/ksh93/SHOPT.sh
+++ b/src/cmd/ksh93/SHOPT.sh
@@ -25,7 +25,7 @@ SHOPT GLOBCASEDET= # -o globcasedetect: adapt globbing/completion to case-inse
 SHOPT HISTEXPAND=1 # csh-style history file expansions
 SHOPT KIA= # ksh -R