mirror of git://git.code.sf.net/p/cdesktopenv/code synced 2025-03-09 15:50:02 +00:00

Multibyte character handling overhaul; allow global disable

The SHOPT_MULTIBYTE compile-time option did not make much sense as
disabling it only disabled multibyte support for ksh/libshell, not
libast or libcmd built-in commands. This commit allows disabling
multibyte support for the entire codebase by defining the macro
AST_NOMULTIBYTE (e.g. via CCFLAGS). This slightly speeds up the
code and makes an optimised binary about 5% smaller.

src/lib/libast/include/ast.h:
- Add non-multibyte fallback versions of the multibyte macros that
  are used if AST_NOMULTIBYTE is defined. This should cause most
  multibyte handling to be automatically optimised out everywhere.
- Reformat the multibyte macros for legibility.
- Simplify the mbchar() and mbsize() macros by defining them in
  terms of mbnchar() and mbnsize(), eliminating code duplication.
- Correct non-multibyte fallback of mbwidth(). For consistent
  behaviour, control characters and out-of-range values should
  return -1 as they do for UTF-8. The fallback is now the same as
  default_wcwidth() in src/lib/libast/comp/setlocale.c.
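
For illustration only, a minimal sketch of a single-byte width
fallback with the behaviour described above. The name sb_width() is
hypothetical and this is not the ast.h macro or default_wcwidth()
itself. It assumes that only ASCII control characters count as
control characters; how NUL and the range 0x80-0x9f are treated in
the real code is not stated here.

```c
#include <limits.h>

/*
 * Hypothetical single-byte width function (sketch only).
 * Returns 1 for a printable single-byte value and -1 for
 * control characters and out-of-range values.
 */
static int sb_width(int c)
{
	if(c < 0 || c > UCHAR_MAX)
		return -1;	/* out of range for a single byte */
	if(c < 0x20 || c == 0x7f)
		return -1;	/* ASCII control character (assumption: C1 range not checked) */
	return 1;		/* everything else occupies one column */
}
```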

src/lib/libast/comp/setlocale.c:
- If AST_NOMULTIBYTE is defined, do not compile in the debug and
  UTF-8 locale conversion functions, including several large
  conversion tables. Define their fallback macros as 0 as these are
  used as function pointers.

src/cmd/ksh93/SHOPT.sh,
src/cmd/ksh93/Mamfile:
- Change the SHOPT_MULTIBYTE default to empty, indicating "probe".
- Synchronise SHOPT_MULTIBYTE with !AST_NOMULTIBYTE by default.

src/cmd/ksh93/include/defs.h:
- When SHOPT_MULTIBYTE is zero but AST_NOMULTIBYTE is zero or
  undefined, enable AST_NOMULTIBYTE here so that ksh uses the ast.h
  non-multibyte fallbacks. The effect is that multibyte is
  optimised out for ksh only, as before.
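
A rough sketch of that preprocessor logic, assuming both macros are
tested by value (an undefined macro evaluates to 0 in #if). The
helper multibyte_compiled_in() is hypothetical and only makes the
result observable; the actual defs.h code may differ.

```c
/* Assumption for this sketch: ksh multibyte support was switched off. */
#define SHOPT_MULTIBYTE 0

/* If ksh does not want multibyte and libast has not disabled it, disable it here. */
#if !SHOPT_MULTIBYTE && !AST_NOMULTIBYTE
#undef AST_NOMULTIBYTE		/* harmless if it was never defined */
#define AST_NOMULTIBYTE 1
#endif

/* Hypothetical helper: report whether multibyte code would be compiled in. */
static int multibyte_compiled_in(void)
{
#if AST_NOMULTIBYTE
	return 0;
#else
	return 1;
#endif
}
```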
- Remove previous fallback for disabling multibyte (re: c2cb0eae).

src/cmd/ksh93/include/lexstates.h,
src/cmd/ksh93/sh/lex.c:
- Define SETLEN() macro to assign to LEN (i.e. _Fcin.fclen) for
  multibyte only and do not assign to it directly. With no
  SHOPT_MULTIBYTE, define that macro as empty. This allows removing
  multiple '#if SHOPT_MULTIBYTE' directives from lex.c, as that
  code will all be optimised out automatically if it's disabled.
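
As an illustration, the macro pair might look roughly like the
sketch below. The variable demo_fclen is a hypothetical stand-in for
_Fcin.fclen, and defining LEN as the constant 1 in the non-multibyte
case is an assumption made for this sketch, not something confirmed
by this commit message.

```c
#ifndef SHOPT_MULTIBYTE
#define SHOPT_MULTIBYTE 1	/* assumption for this sketch: multibyte enabled */
#endif

static int demo_fclen;		/* hypothetical stand-in for _Fcin.fclen */

#if SHOPT_MULTIBYTE
#define LEN		demo_fclen
#define SETLEN(x)	(LEN = (x))	/* assign only when multibyte is compiled in */
#else
#define LEN		1		/* assumption: every character is one byte */
#define SETLEN(x)			/* expands to nothing */
#endif

/* Clamp the current character length to at least 1, without any #if at the call site. */
static int clamp_len(int initial)
{
	demo_fclen = initial;
	if(LEN < 1)
		SETLEN(1);
	return LEN;
}
```

With LEN a constant, the compiler can discard the whole if statement
in the non-multibyte build, which is why the call sites no longer
need their own '#if SHOPT_MULTIBYTE' blocks.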

src/cmd/ksh93/include/national.h,
src/cmd/ksh93/sh/string.c:
- Fix flagrantly incorrect non-multibyte fallback for sh_strchr().
  The latter returns an integer offset (-1 if not found), whereas
  strchr(3) returns a char pointer (NULL if not found). Incorporate
  the fallback into the function for correct handling instead of
  falling back to strchr(3) directly.
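
To make the mismatch concrete, a byte-only search with the "offset
or -1" return convention could be sketched as follows. The name
byte_strchr_offset() and its signature are hypothetical; this is not
the sh_strchr() code from string.c.

```c
#include <string.h>

/*
 * Hypothetical byte-only search (sketch only).
 * Returns the offset of the first occurrence of c in s,
 * or -1 if c does not occur, instead of a pointer or NULL.
 */
static int byte_strchr_offset(const char *s, int c)
{
	const char *p = strchr(s, c);

	return p ? (int)(p - s) : -1;
}
```

A caller that tests the result with '>= 0' gets the wrong answer if
it is handed a raw strchr(3) pointer, which is the kind of bug the
old fallback invited.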

src/cmd/ksh93/sh/macro.c:
- lastchar() optimisation: avoid function call if SHOPT_MULTIBYTE
  is enabled but we're not actually in a multibyte locale.
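
The general idea can be sketched in standard C as below. This is not
the macro.c code: last_char() and last_char_slow() are hypothetical
names, and MB_CUR_MAX is used here as the locale check where ksh
uses its own macros.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Slow path: walk the string one multibyte character at a time. */
static wchar_t last_char_slow(const char *s)
{
	wchar_t wc = 0, last = 0;
	size_t left = strlen(s);

	mbtowc(NULL, NULL, 0);			/* reset the conversion state */
	while(left > 0)
	{
		int n = mbtowc(&wc, s, left);
		if(n <= 0)			/* invalid sequence: take one raw byte */
		{
			mbtowc(NULL, NULL, 0);
			wc = (unsigned char)*s;
			n = 1;
		}
		last = wc;
		s += n;
		left -= (size_t)n;
	}
	return last;
}

/* Return the last character of s, or 0 for an empty string. */
static wchar_t last_char(const char *s)
{
	size_t len = strlen(s);

	if(len == 0)
		return 0;
	if(MB_CUR_MAX == 1)			/* single-byte locale: no function call */
		return (unsigned char)s[len - 1];
	return last_char_slow(s);
}
```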

src/cmd/ksh93/sh/name.c:
- Use ja_size() even with SHOPT_MULTIBYTE disabled (re: 2182ecfa).
  Though no regression tests failed, the non-multibyte fallback for
  typeset -L/-R/-Z length calculation was probably not quite
  correct as ja_size() does more. The ast.h change to mbwidth()
  ensures correct behaviour for non-multibyte locales.

src/cmd/ksh93/tests/shtests:
- Since its value in SHOPT.sh is now empty by default, add a quick
  feature test (for the length of the UTF-8 character 'é') to check
  if SHOPT_MULTIBYTE needs to be enabled for the regression tests.
Martijn Dekker 2022-07-07 21:58:23 +02:00
parent 59e79dc026
commit 7c4418ccdc
16 changed files with 147 additions and 101 deletions

@@ -260,9 +260,7 @@ int sh_lex(Lex_t* lp)
 	register int n, c, mode=ST_BEGIN, wordflags=0;
 	int inlevel=lp->lexd.level, assignment=0, ingrave=0;
 	int epatchar=0;
-#if SHOPT_MULTIBYTE
-	LEN=1;
-#endif /* SHOPT_MULTIBYTE */
+	SETLEN(1);
 	if(lp->lexd.paren)
 	{
 		lp->lexd.paren = 0;
@@ -1819,18 +1817,15 @@ static int here_copy(Lex_t *lp,register struct ionod *iop)
 		if(n!=S_NL)
 		{
 			/* skip over regular characters */
-#if SHOPT_MULTIBYTE
 			do
 			{
-				if(fcleft()< MB_LEN_MAX && mbsize(fcseek(0))<0)
+				if(mbsize(fcseek(0)) < 0 && fcleft() < MB_LEN_MAX)
 				{
 					n = S_EOF;
-					LEN = -fcleft();
+					SETLEN(-fcleft());
 					break;
 				}
 			}
-#endif /* SHOPT_MULTIBYTE */
 			while((n=STATE(state,c))==0);
 		}
 		if(n==S_EOF || !(c=fcget()))
@@ -1846,17 +1841,15 @@ static int here_copy(Lex_t *lp,register struct ionod *iop)
 				if(!lp->lexd.dolparen && (c=sfwrite(sp,bufp,c))>0)
 					iop->iosize += c;
 			}
-#if SHOPT_MULTIBYTE
 			if(LEN==0)
-				LEN=1;
+				SETLEN(1);
 			if(LEN < 0)
 			{
 				n = LEN;
 				c = fcmbget(&LEN);
-				LEN += n;
+				SETLEN(LEN + n);
 			}
 			else
-#endif /* SHOPT_MULTIBYTE */
 				c = lexfill(lp);
 			if(c<0)
 				break;
@@ -1874,10 +1867,8 @@ static int here_copy(Lex_t *lp,register struct ionod *iop)
 					sfputc(sp,'\\');
 				}
 			}
-#if SHOPT_MULTIBYTE
 			if(LEN < 1)
-				LEN = 1;
-#endif
+				SETLEN(1);
 			bufp = fcseek(-LEN);
 		}
 		else