mirror of git://git.code.sf.net/p/cdesktopenv/code synced 2025-03-09 15:50:02 +00:00

Add support for multibyte characters to $IFS (#92)

This commit fixes BUG_MULTIBIFS, which had two bug reports in the ksh2020 branch.

src/cmd/ksh93/sh/macro.c:
- Backport Eric Scrivner's fix for multibyte IFS characters (slightly modified
  for compatibility with C89). Explanation from https://github.com/att/ast/pull/737:

  Previously, the varsub method used for the macro expansion of $param, ${param},
  and ${param op word} would incorrectly expand the internal field separator (IFS)
  if it was a multibyte character. This was due to truncation based on the
  incorrect assumption that the IFS would never be larger than a single byte.

  This change fixes this issue by carefully tracking the number of bytes that
  should be persisted in the IFS case and ensuring that all bytes are written
  during expansion and substitution.

  Bug report: https://github.com/att/ast/issues/13

- Fix another bug that caused multibyte characters with the same initial byte
  to be treated as the same character by the IFS. This bug occurred because
  the first byte of a multibyte character wasn't written to the stack when
  the IFS delimiter had the same initial byte:

  $ IFS=£
  $ v='§'
  $ set -- $v
  $ v="${1-}"
  $ echo "$v" | hd # The first byte should be c2, but it isn't due to the bug
  00000000  a7 0a                                             |..|
  00000002

  Bug report: https://github.com/att/ast/issues/1372

src/cmd/ksh93/tests/variables.sh:
- Add (reworked) regression tests from ksh2020 for the multibyte IFS bugs.
- Add a regression test for att/ast#1372 based on the reproducer.
Johnothan King 2020-07-25 11:46:11 -07:00 committed by GitHub
parent 8c16f38a88
commit 8b5f11dcd7
GPG key ID: 4AEE18F83AFDEB23
5 changed files with 75 additions and 7 deletions

@@ -1954,10 +1954,34 @@ retry2:
		}
		else if(d)
		{
#if SHOPT_MULTIBYTE
			Sfio_t *sfio_ptr = (mp->sp) ? mp->sp : stkp;
			/*
			 * We know from above that if we are not performing @-expansion
			 * then we assigned `d` the value of `mp->ifs`, here we check
			 * whether or not we have a valid string of IFS characters to
			 * write as it is possible for `d` to be set to `mp->ifs` and
			 * yet `mp->ifsp` to be NULL.
			 */
			if(mode != '@' && mp->ifsp)
			{
				/*
				 * Handle multi-byte characters being used for the internal
				 * field separator (IFS).
				 */
				int i;
				for(i = 0; i < mbsize(mp->ifsp); i++)
					sfputc(sfio_ptr,mp->ifsp[i]);
			}
			else
				sfputc(sfio_ptr,d);
#else
			if(mp->sp)
				sfputc(mp->sp,d);
			else
				sfputc(stkp,d);
#endif
		}
	}
	if(arrmax)
@@ -2403,7 +2427,16 @@ static void mac_copy(register Mac_t *mp,register const char *str, register int s
			if(n==S_MBYTE)
			{
				if(sh_strchr(mp->ifsp,cp-1)<0)
				{
					/*
					 * The multi-byte character that was found has the same initial
					 * byte as the IFS delimiter, but it's a different character. Put
					 * the first byte onto the stack and continue; multi-byte characters
					 * otherwise lose their initial byte.
					 */
					sfputc(stkp,c);
					continue;
				}
				n = mbsize(cp-1) - 1;
				if(n==-2)
					n = 0;