The '-o nolog' option (which prevented function definitions from being
recorded in the history file) was removed a long time ago, leaving
only a stub for backwards compatibility to stop 'set' from erroring
out if the option is set. But some other vestiges remained.
src/cmd/ksh93/sh/path.c,
src/cmd/ksh93/sh/subshell.c,
src/cmd/ksh93/sh/xec.c:
- Remove a few pointless 'sh_onstate(SH_NOLOG)' statements. As of
93u+ or earlier, this is never checked for anywhere.
src/cmd/ksh93/sh.1:
- They forgot to remove the 'nolog' option documentation here.
Specify that it's obsolete and has no effect.
src/cmd/ksh93/data/builtins.c: sh_set[]:
- Be more concise.
Virtual/non-forking subshells that change the present working
directory (PWD) with 'cd' suffer from a serious race condition. The
PWD is changed within the same process. This means it may not be
possible to change back to the original PWD when exiting the
subshell, as some other process may destroy the PWD or modify its
permissions in the meantime. ksh did not handle this error
condition at all, so, after exiting a subshell that invoked 'cd',
it could silently end up running the script's following command(s)
in the wrong directory. Which might be 'rm -rf *'. So, ouch.
The proper and obvious fix is never to allow a virtual subshell to
change the PWD, as it can never be guaranteed you can return to a
previous directory. If the PWD is changed in a child process, there
is no need to restore it in the parent process, and this whole
problem is avoided. So subshells really should always fork on
encountering a 'cd' command.
But forking is slow. It is not uncommon for scripts to 'cd' in a
subshell that is run repeatedly in a loop.
There is also the issue of custom builtins that can be added to ksh
via shared libraries. In the standard shell language, 'cd' is the
only command that changes the PWD, so we could just make that
command fork the subshell it is run from. But there's no telling
what a custom builtin might do.
So this commit implements a compromise that will not affect
performance unless there is the pathological condition of a PWD
that has been rendered inaccessible in some way:
1. When entering a virtual subshell, if the parent shell's PWD
proves inaccessible upon saving it, the subshell will now fork into
a separate process, avoiding the unrestorable PWD problem.
2. If some attack renders the parent shell's PWD unrestorable
*after* ksh enters a virtual subshell, ksh will now error out when
exiting it. There is nothing else left to do then. Continuing would
mean running arbitrary commands in the wrong PWD.
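A conceptual sketch of the entry-time check (names are illustrative;
in the real code the descriptor is kept for a later fchdir(3) and the
fork happens just before the subshell's commands run):
        /* O_SEARCH falls back to O_PATH or O_RDONLY on systems without it */
        int pwdfd = open(".", O_SEARCH);  /* fd on the PWD for fchdir(3) later */
        if(pwdfd < 0)
                sh_subfork();             /* PWD cannot be restored: fork */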
src/cmd/ksh93/sh/subshell.c:
- Put all the code/variables only needed for fchdir() behind '#if
_lib_fchdir'. This makes it clearer what's what.
(I don't know if there is still any system out there without
fchdir(3); I haven't found any. The chdir(3) fallback version may
be removed later as there is no way to make it remotely secure.)
- Fix the attempt to use the O_PATH mode for open(2) as a fallback
for nonexistent O_SEARCH on Linux. Define _GNU_SOURCE on Linux,
or <fcntl.h> (which is included indirectly) won't define O_PATH.
- Fix use of O_SEARCH. The code was simply wrong, repeating an
open(".",O_RDONLY) instead. Since a nonexistent O_SEARCH is now
redefined as either O_PATH or O_RDONLY, we can simply
open(".",O_SEARCH) and be done with it.
- Fix fatal error handling. Introduce fatal error condition for
failure to fchdir(3) back to the parent's PWD; rename 'duped' to
'fatalerror' and use it for error numbers; save and restore errno
on fatal error so the message will report the cause. (We must
call errormsg() near the end of sh_subshell() to avoid crashes.)
- If open(".",O_SEARCH) was not able get a file descriptor to our
PWD on entry, then call sh_subfork() immediately before running
the subshell commands. (Forking earlier causes a crash.)
- When restoring the PWD, if fchdir(3) fails, do *not* fall back to
chdir(3). We already know the PWD is inaccessible, so if chdir(3)
"succeeds" then, it's very likely to be a substitute injected by
an attacker.
src/cmd/ksh93/bltins/cd_pwd.c:
- If we don't have fchdir(3), then sh_subshell() must fall back to
chdir(2) to restore the PWD. That is highly vulnerable, as a
well-timed rename would allow an attacker to usurp the PWD. We
can't do anything about that if some custom builtin changes the
PWD, but we can at least make 'cd' always fork a subshell, which
slows down ksh but removes the need for the parent shell ever to
restore the PWD. (There is certainly no popular system where this
is relevant, and there may not be any such current system at all.)
This commit adds no regression test because a portable regression
test is not really doable. Different kernels, external /bin/pwd
utilities, etc. all have quite different behaviour under the
pathological condition of an inaccessible PWD, so both the
before-fix and the after-fix behaviours differ between systems. See
the link below.
Resolves: https://github.com/ksh93/ksh/issues/141
Thanks to Stéphane Chazelas for the bug report.
src/cmd/ksh93/features/options:
- Fix unportable SHELLMAGIC test:
1. /bin/echo does not work on all systems, but /usr/bin/env is a
de-facto standard path (even NixOS gave in and has it).
2. Do not try to execute a temp file in /tmp as it might be
mounted noexec. Use our own $EXECROOT instead.
- Remove unnecessary 'exec 9<&-' (file descriptor 9 was never opened
globally).
- Remove broken SHOPT_UCB test. It had a syntax error, but the
result is not used anywhere.
src/cmd/ksh93/bltins/typeset.c: b_typeset():
- For integer bases, change the argnum check so that base values
that are < 2 or > 64 default to 10, instead of allowing invalid
base values that ksh cannot process.
src/cmd/ksh93/bltins/typeset.c: setall():
- Remove the argnum check for an integer base of 1, as the base
cannot be 1.
- Relocate the strlen(name) call to inside the conditional check for
np->nvfun, as this code does not need to run in all cases.
- Remove no-op oldname code.
src/cmd/ksh93/tests/attributes.sh:
- Add bounds checks verifying that out-of-range 'typeset -i' bases
default to base 10.
src/cmd/ksh93/tests/builtins.sh:
- Fix a test so it doesn't fail if 'whence -a' finds multiple paths
for 'ls'.
src/cmd/ksh93/tests/coprocess.sh:
- Update known failure comment with current information.
src/cmd/ksh93/sh/args.c: sh_argprocsub():
- Save and restore state more efficiently by just saving and
restoring all the state bits in one go using the
sh_{get,set}state() macros, which are defined in defs.h as:
#define sh_getstate() (sh.st.states)
#define sh_setstate(x) (sh.st.states = (x))
(and there is yet more evidence that it doesn't matter whether
we use a 'shp->' pointer or 'sh.' direct access).
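In other words, sh_argprocsub() can now do roughly the following (a
minimal sketch; the variable name is illustrative):
        int savestates = sh_getstate();  /* save all state bits in one int */
        /* ...switch whatever states the process substitution needs... */
        sh_setstate(savestates);         /* restore them all in one go */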
src/cmd/ksh93/sh/main.c: exfile():
- Remove a no-op 'sh_offstate(SH_INTERACTIVE);'. It was in the
'else' clause of 'if(sh_isstate(SH_INTERACTIVE))' so if we get
there, it is known that this flag is already off.
- To properly disable job control, we also have to save and restore
the job.jobcontrol variable.
src/cmd/ksh93/sh/xec.c: sh_exec():
- Remove some no-op flaggery from this highly performance-sensitive
point in the code. Given the immediately preceding:
volatile int was_errexit = sh_isstate(SH_ERREXIT);
volatile int was_monitor = sh_isstate(SH_MONITOR);
the following:
sh_offstate(SH_ERREXIT);
if(was_errexit&flags)
sh_onstate(SH_ERREXIT);
can be reformulated as:
if(!(flags & sh_state(SH_ERREXIT)))
sh_offstate(SH_ERREXIT);
(IOW, if it was already on, don't turn it off and then on again)
...and the following:
if(was_monitor&flags)
sh_onstate(SH_MONITOR);
can be removed; it's a no-op because it wasn't preceded by an
sh_offstate() and if 'was_monitor' is true, this option is known
to be on. (I considered they may have forgotten an 'sh_offstate'
there like in the SH_ERREXIT case, but adding that causes several
regressions in a shtests run.)
src/cmd/ksh93/include/defs.h:
- Remove comment that is evidently long outdated; there is not (or
no longer) a Shscoped_t type defined anywhere, nor are these
struct fields replicated in any other type definition.
- Add comment to clarify what the 'states' int in 'struct
sh_scoped' is for.
By definition, subshells are never interactive, so they should
disable behaviour associated with interactive shells even if the
main shell is interactive.
Most visibly, running a background job from a subshell like
( some_command & )
now no longer prints a job ID that you cannot use in the main shell.
This behaviour change matches pdksh/mksh, bash, zsh, dash, et al.
Prior discussion:
https://www.mail-archive.com/austin-group-l@opengroup.org/msg06456.html
(plus the preceding thread)
src/cmd/ksh93/sh/subshell.c: sh_subshell():
- Before running the command(s) in the subshell using sh_exec(),
turn off the SH_INTERACTIVE shell state flag. (No need to add
code to restore it as this function already saves and restores
the entire shell state.)
src/cmd/ksh93/bltins/misc.c: b_bg():
- If there is no job control when using 'bg', 'fg' or 'disown',
always print the "no job control" error message and not only if
the shell is in the interactive state. This is also what
pdksh/mksh, bash and zsh do.
Mildly interesting: apparently there was once an idea to implement
shared-state command substitutions as a shell option like 'set -o
subshare'. They were implemented using a new ${ syntax; } instead,
but there is a vestigial SH_SUBSHARE option ID in shell.h plus a
check for it in subshell.c that would cause backtick-style command
substitutions (comsub==1) to share their state. That option isn't
defined in data/options.c so it's impossible for a user to set it.
src/cmd/ksh93/include/shell.h,
src/cmd/ksh93/sh/subshell.c:
- Remove SH_SUBSHARE option vestiges.
src/cmd/ksh93/include/defs.h:
- Correct my comment on 'comsub' flag; I was wrong about what the
values meant. 2 is for a shared-state comsub. (re: 4ce486a7)
Autoloading a function caused the calling script's $LINENO to be
off by the number of lines in the function definition file. In
addition, while running autoloaded functions, errors/warnings were
reported with wrong line numbers.
src/cmd/ksh93/sh/path.c:
- Save $LINENO (shp->inlineno) before autoloading a function, reset
it to 1 so that the correct line number offset is remembered for
the function definition, and restore it after.
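A minimal sketch of that save/restore pattern (structure
approximate; the real code lives in the autoload branch of path.c):
        int save_lineno = shp->inlineno;
        shp->inlineno = 1;            /* definition file starts at line 1 */
        /* ...source the function definition file found on $FPATH... */
        shp->inlineno = save_lineno;  /* caller's $LINENO is unaffected */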
src/cmd/ksh93/tests/variables.sh:
- Add regression test for $LINENO, directly and in error messages,
within and outside a non-autoloaded and an autoloaded function.
Fixes: https://github.com/ksh93/ksh/issues/116
ksh 93u+ introduced a regression in the combination of the
'set -o pipefail' and 'set -e'/'set -o errexit' options:
$ ksh93 -o errexit -o pipefail -c \
'(exit 3) | true; echo "still here despite $? status"'
still here despite 3 status
The bug is in how the huge sh_exec() function in xec.c handles
the 'echeck' flag. Near the end of sh_exec(), this flag triggers a
sh_chktrap() call to check whether to trigger any traps, including
the ERR trap -- and that same function also handles the errexit
option, which is basically the same as 'trap "exit" ERR'.
We can learn more easily how sh_exec() works by inserting debug
warnings in all its 'switch(type&COMMSK)' cases, like:
case TCOM:
errormsg(SH_DICT,ERROR_warn(0),"[DEBUG] TCOM");
... and same for all the others. With that done, the output
of a very simple dummy pipeline looks as follows:
$ arch/*/bin/ksh -c 'true | true | true'
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] TFIL
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] TFORK
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] TFORK
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] TSETIO
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] TCOM
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] TCOM
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] TCOM
So, it looks like sh_exec() handles this pipeline as follows:
TFIL
|_____TFORK
| |_____TCOM
|_____TFORK
| |_____TCOM
|_____TSETIO
|_____TCOM
Each time a pipeline like command1 | command2 | ... is executed,
sh_exec() is invoked with type TFIL; this then recursively invokes
sh_exec() to handle the individual elements. The last element of
the pipe triggers a sh_exec() run with type TSETIO; since it is run
in the current shell environment, it is effectively treated as a
command with an input redirection. All the previous elements are of
type TFORK instead, because they are executed asynchronously in
separate, forked subshell processes. Finally, the TFORK or TSETIO
code then recursively calls sh_exec() again with type TCOM to
actually execute the commands.
When reading the code, we find that the 'echeck' flag is set as
part of the TSETIO code. This explains why only an error in the
last element of the pipe triggers the errexit/ERR trap action.
So that's the bug: the flag is set in the wrong place.
This can be fixed by setting that flag in the TFIL handling code
instead, as this is what calls everything else and collects all the
exit statuses. So the sh_chktrap() call is now executed after
handling the entire pipeline, at the TFIL recursion level.
This also allows getting rid of the special-casing in the buggy
TSETIO version. The SH_ERREXIT state is restored at the end of each
sh_exec() call, so since we're now doing this at a lower recursion
level, it will already have been restored.
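Conceptually, the change amounts to the following (a sketch, not the
literal diff):
        case TFIL:
                /* run each pipeline element via recursive sh_exec() calls,
                   collecting the exit statuses */
                echeck = 1;        /* previously set in the TSETIO case */
                break;
        /* ...then, near the end of sh_exec(): */
        if(echeck)
                sh_chktrap();      /* handles the ERR trap and errexit */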
src/cmd/ksh93/sh/xec.c: sh_exec():
- Fix the bug as per the above.
src/cmd/ksh93/tests/options.sh:
- Add tests for errexit and ERR trap combined with pipefail.
src/cmd/ksh93/tests/basic.sh:
- Tweak a couple of tests that reported a trap wasn't triggered
even if it was actually triggered more than once.
Fixes: https://github.com/ksh93/ksh/issues/121
Thanks to Stéphane Chazelas for the bug report.
'typeset -xu' and 'typeset -xl' would export the variable but fail
to change case in the value under certain conditions.
Original patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20120801-xufix.patch
This applies the patch essentially without change and adds a
regression test based on the reproducer provided in the RH bug.
Unfortunately there is no description of how the patch works and
it's a little obscure to me. As far as I can figure out, the cause
of the problem was that nv_newattr() erroneously processed a
nonexistent size option-argument such as what can be given to
options like typeset -F, e.g. typeset -F3 for 3 digits after the
dot. A nonexistent size argument is represented by the value of -1.
Red Hat erratum: https://bugzilla.redhat.com/1221766
"Previously, the gcc utility optimized out a non-NULL test in the
ksh implementation of the strdup() function. This caused an
unexpected termination when ksh was executed in a clean chroot
environment. With this update, ksh compilation parameters have been
updated to prevent optimizing out a non-NULL test, and ksh no
longer crashes in clean chroot environments."
The optimizer bug occurs in that function's single-line body:
return (s && (t = oldof(0, char, n = strlen(s) + 1, 0))) ? (char*)memcpy(t, s, n) : (char*)0;
So it must be the test for non-NULL 's' that fails. And 's' is
declared in the function definition, as follows:
extern char*
strdup(register const char* s)
So that makes me wonder if we can work around the bug by simply
removing the 'const' (and the 'register' while we're at it).
However, I have no easy way to verify that at the moment. The Red
Hat patch instead tells gcc to disable optimization for this
function using a #pragma directive.
I have no idea if that gcc optimiser bug has been fixed in the
meantime, but experience from c258a04f has shown that we cannot
trust that it has been fixed (that other optimizer bug is at least
a decade old and still not fixed). So, in it goes, until someone
shows evidence that we no longer need it.
Original patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20120801-badgcc.patch
src/lib/libast/string/strdup.c:
- Tell GCC to disable all optimisations for strdup().
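For reference, the pragma-based approach looks roughly like this (a
sketch; the Red Hat patch may differ in detail):
        #pragma GCC push_options
        #pragma GCC optimize ("O0")  /* keep gcc from dropping the NULL test */
        extern char*
        strdup(const char* s)
        {
                /* ...the single-line body quoted above... */
        }
        #pragma GCC pop_options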
Discussion/analysis: https://bugzilla.redhat.com/1506344
iousepipe() might write out of bounds, causing a crash, if
subpipe[2] is set to a value >= sh.gd.lim.open_max.
src/cmd/ksh93/sh/xec.c: iousepipe():
- Validate the FD using sh_iovalidfd() before the write.
Prior discussion:
https://bugzilla.redhat.com/1454804
On 2017-05-23 13:33:25 UTC, Paulo Andrade wrote:
> In previous ksh versions, when exiting the scope of a ksh
> (not posix) function, it would restore the trap table of
> the "calling context" and if the reason the function exited
> was a signal, it would call sh_fault() passing as argument
> the signal value.
> Newer ksh checks it, but calls kill(getpid(), signal_number)
> after restoring the trap table, but only calls for SIGINT and
> SIGQUIT.
[...]
> The old way appears to have been more appropriate, but there
> must be a reason to only pass SIGINT and SIGQUIT as it is an
> explicit patch.
The last paragraph is where I differ. This would not be the first
example of outright breakage that appeared to be added deliberately
and that 93u+m has fixed or removed, see e.g. 8477d2ce ('printf %H'
had code that deleted all multibyte characters), cefe087d, or
781f0a39. Sometimes it seems the developers added a little
experiment and then forgot all about it, so it became a misfeature.
In this instance, the correct pre-2012 ksh behaviour is still
explicitly documented in (k)sh.1: "A trap condition that is not
caught or ignored by the function causes the function to terminate
and the condition to be passed on to the caller". Meaning, if there
is no function-local trap, the signal defaults to the parent scope.
There is no language that limits this to SIGINT and SIGQUIT only.
It also makes no sense at all to do so -- signals such as SIGPIPE,
SIGTERM, or SIGSEGV need to be passed on to the caller by default as
well, and failing to do so results in misbehaviour.
src/cmd/ksh93/sh/xec.c: sh_funscope():
- When resending a signal after restoring the global traps state,
remove the spurious check that limits this to SIGINT and SIGQUIT.
- Replace it with a check for nsig!=0, as that means there were
parent trap states to restore. Otherwise 'kill' may be called
with an invalid signal argument, causing a crash on macOS.
src/cmd/ksh93/tests/signal.sh:
- Update a test to check that a function-local SIGTERM trap is
triggered correctly when signalled from another process.
- Complete the tests for 3aee10d7; this bug needed fixing before
we could test that previous fix in a ksh function scope.
- Add a test for triggering global traps from ksh functions,
testing multiple POSIX-standard signals.
The ksh-20120801-trapcom.patch patch contains an off-by-one error,
which was also imported into 93u+m. When saving signals:
ceb77b136f/src/cmd/ksh93/sh/subshell.c (L572-L592)
572 if((nsig=shp->st.trapmax)>0 || shp->st.trapcom[0])
573 {
574 ++nsig;
575 savsig = malloc(nsig * sizeof(char*));
576 /*
577 * the data is, usually, modified in code like:
578 * tmp = buf[i]; buf[i] = strdup(tmp); free(tmp);
579 * so shp->st.trapcom needs a "deep copy" to properly save/restore pointers.
580 */
581 for (isig = 0; isig < nsig; ++isig)
582 {
583 if(shp->st.trapcom[isig] == Empty)
584 savsig[isig] = Empty;
585 else if(shp->st.trapcom[isig])
586 savsig[isig] = strdup(shp->st.trapcom[isig]);
587 else
588 savsig[isig] = NULL;
589 }
On line 574, the number of signals 'nsig' is increased by one. That
increase is permanent, so the 'for' loop on line 581 tries to save
one signal state too many.
The increase was a holdover from the ksh93 code from before the
patch. After the patch, it is not required; it is fine to malloc as
many records as there are trapcom elements to save. So it should
simply be removed. xec.c has the same code to save trap states for
ksh functions, and the same applies.
src/cmd/ksh93/sh/subshell.c: sh_subshell():
- Don't increase nsig.
src/cmd/ksh93/sh/xec.c: sh_funscope():
- Same.
src/cmd/ksh93/tests/signal.sh:
- Add test.
Original patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20140801-diskfull.patch
Prior discussion:
https://www.mail-archive.com/ast-users@lists.research.att.com/msg01037.html
https://www.mail-archive.com/ast-users@lists.research.att.com/msg01038.html
https://www.mail-archive.com/ast-users@lists.research.att.com/msg01042.html
https://bugzilla.redhat.com/1212992
On Fri, 08 May 2015 14:37:45 -0700, Paulo Andrade wrote:
> I have a user with a ksh crashing problem, and that has
> some "Write error: No space left on device" messages
> in /var/log/messages.
>
> After some debugging, and creating a chroot on a file
> disk image, and a test user, and slowly filling the
> "on file" filesystem, e.g.
>
> dd if=/dev/zero of=/mnt/tmp/zerosN bs=1M count=1024
> dd if=/dev/zero of=/mnt/tmp/zerosN bs=1K count=2
>
> until leaving just around 12K, I managed to reproduce the
> problem, and be able to debug it with valgrind and vgdb;
> debugging on these conditions is tricky, as cannot tell
> valgrind to spawn gdb, because then gdb itself would fail
> to start.
>
> So, after following the code enough, I learned that at places
> it handles SH_JMPEXIT, there was almost non existing
> handling of SH_JMPERREXIT.
>
> ksh would evently cause a crash due to the struct
> subshell allocated on stack, in sh/subshell.c:sh_subshell
> kept set to the global subshell_data, after it siglongjmp
> back the stack due to, not fully handling the out of disk
> space errors. It would print a few messages, everytime
> a pipe was created, e.g.:
>
> /etc/profile: line 28: write to 3 failed [No space left on device]
>
> until eventually crashing due to corrupted memory; e.g. the
> references to stack data from sh_subsell in the global
> subshell_data. One strange thing to me in coredump analysis
> was that subshell_data prev field was pointing to itself when
> it eventually crashed, what later was understood and expected...
>
> The attached patch handles SH_JMPERREXIT in the code
> paths SH_JMPEXIT is handled, and the failed login, on
> full disk, ends in a pause() call:
>
> ---terminal 1---
> $ valgrind -q --leak-check=full --free-fill=0x5a --vgdb=full
> --vgdb-error=0 /bin/ksh -l
> ==17730== (action at startup) vgdb me ...
> ==17730==
> ==17730== TO DEBUG THIS PROCESS USING GDB: start GDB like this
> ==17730== /path/to/gdb /bin/ksh
> ==17730== and then give GDB the following command
> ==17730== target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=17730
> ==17730== --pid is optional if only one valgrind process is running
> ==17730==
> ==17730== Syscall param mount(type) points to unaddressable byte(s)
> ==17730== at 0x563377A: mount (in /usr/lib64/libc-2.17.so)
> ==17730== by 0x493E58: fs3d_mount (fs3d.c:115)
> ==17730== by 0x493C8B: fs3d (fs3d.c:57)
> ==17730== by 0x423E41: sh_init (init.c:1302)
> ==17730== by 0x405CD3: sh_main (main.c:141)
> ==17730== by 0x405B84: main (pmain.c:45)
> ==17730== Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==17730==
> ==17730== (action on error) vgdb me ...
> ==17730== Continuing ...
> /etc/profile: line 28: write to 3 failed [No space left on device]
> ---8<---
>
> ---terminal 2---
> (gdb) c
> Continuing.
> ^C
> Program received signal SIGTRAP, Trace/breakpoint trap.
> 0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
> (gdb) bt
> #0 0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
> #1 0x000000000041e73d in sh_done (ptr=0x793360 <sh>, sig=255) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/fault.c:665
> #2 0x0000000000407407 in exfile (shp=0x4542, iop=0xff, fno=0) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:604
> #3 0x0000000000405c43 in sh_source (shp=0x793360 <sh>, iop=0x0,
> file=0x524804 <e_sysprofile> "/etc/profile")
> at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:109
> #4 0x00000000004060e4 in sh_main (ac=2, av=0xfff000498, userinit=0x0)
> at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:202
> #5 0x0000000000405b85 in main (argc=2, argv=0xfff000498) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/pmain.c:45
> (gdb)
> ---8<---
Prior discussion:
https://bugzilla.redhat.com/1217236
Summary: doing
( some_simple_command & )
i.e., launching a background job from within a subshell, on a ksh
interactive login shell, caused misbehaviour (command not run).
For me on 93u+m, the misbehaviour was different -- an outright
crash in the job handling code following SIGCHLD, backtracing to:
0 ksh job_unpost + 49 (jobs.c:1655)
1 ksh job_reap + 1632 (jobs.c:468)
2 libsystem_platform.dylib _sigtramp + 29
3 ??? 0 + 4355528544
4 ksh ed_getchar + 102 (edit.c:1048)
5 ksh ed_emacsread + 659 (emacs.c:280)
[...]
Original patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20120801-nohupfork.patch
Lines 1874-1886 in sh_exec() optimise the highly specific case of
'( simple_command & )' by avoiding a sh_subshell() call that sets
up a virtual subshell before forking:
https://github.com/ksh93/ksh/blob/bd283959/src/cmd/ksh93/sh/xec.c#L1874-L1886
1874 else if(((type=t->par.partre->tre.tretyp)&FAMP) && ((type&COMMSK)==TFORK))
1875 {
1876 pid_t pid;
1877 sfsync(NIL(Sfio_t*));
1878 while((pid=fork())< 0)
1879 _sh_fork(shp,pid,0,0);
1880 if(pid==0)
1881 {
1882 sh_exec(t->par.partre,flags);
1883 shp->st.trapcom[0]=0;
1884 sh_done(shp,0);
1885 }
1886 }
1887 else
1888 sh_subshell(shp,t->par.partre,flags,0);
The original patch inserts the following before the sh_done call on
line 1884:
sh_offoption(SH_INTERACTIVE);
sh_done() checks for SH_INTERACTIVE and only runs job handling code
if that option is on.
Also, I had missed the need for an update of shgd->current_pid
here. Since 843b546c replaced lots of getpid() calls by reading
that variable, this could cause ksh to SIGCHLD the wrong process.
But even after adding the shgd->current_pid update to the RH patch,
I was still able to make it crash.
src/cmd/ksh93/sh/xec.c: sh_exec(): case TPAR:
- Disable this optimisation outright for interactive or job
control-enabled shells. I really don't trust it at all, but there
have been no problem reports for scripts without job control, so
keep it until such reports surface.
- Update shgd->current_pid so the child doesn't end up signalling
the wrong process (re: 843b546c).
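With that update added, the child branch of the optimisation quoted
above looks roughly like this (the new check that disables the whole
branch for interactive/job control shells is not shown):
        if(pid==0)
        {
                shgd->current_pid = getpid();  /* re: 843b546c */
                sh_exec(t->par.partre,flags);
                shp->st.trapcom[0]=0;
                sh_done(shp,0);
        }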
___________________________________________________________________
P.S.:
It was noted in the Red Hat discussion that ( ... & ) does a double
fork, i.e., a virtual/non-forked subshell always forks before
forking the background job. This extra fork is done by the
following two lines under 'case TFORK:' in sh_exec() in xec.c:
https://github.com/ksh93/ksh/blob/bd283959/src/cmd/ksh93/sh/xec.c#L1534-L1535
1534 if((type&(FAMP|TFORK))==(FAMP|TFORK))
1535 sh_subfork();
This is executed if we're in a virtual/non-forked subshell, i.e.
shp->subshell is true (note it is false for forked subshells). So
it forks the virtual subshell (the first fork) before running the
background job (the second fork). A background job is recognised by
'type' having both the FAMP (AMP == ampersand == &) and TFORK bits
set and no others.
So the obvious way to remove the double fork is to comment out
these two lines. Indeed, testing that confirms the double fork is
gone, and ksh
appears to work fine at first glance. Does it really? Nearly!
There are just four regression failures in a 'bin/shtests -p' run:
options.sh[394]: & job delayed by --pipefail, expected '1212 or 1221', got '1122'
signal.sh[280]: subshell ignoring signal does not send signal to parent (expected 'SIGUSR1', got 'done')
signal.sh[289]: subshell catching signal does not send signal to parent (expected 'SIGUSR1', got 'done')
subshell.sh[467]: sleep& in subshell hanging
So, those two lines cannot be removed now and we have to keep the
double fork. But removing them doesn't appear to break very much,
so it may be possible to add some tests so we only do an extra fork
when really necessary. That is a project for another day.
For one Red Hat customer, the following reproducer consistently
crashed, though I was not able to reproduce it and neither was RH.
However, the crash analysis is sound (see below).
function dlog
{
fc -ln -0
}
trap dlog DEBUG
>/tmp/blah
Original patch: https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20140801-arraylen.patch
The Red Hat bug thread is closed to the public as it also contains
some correspondence with their customer. But it has an excellent
crash analysis from Thomas Gardner which I'm including here for the
record (the line numbers are for their ksh at the time, not 93u+m).
===begin analysis===
> The creation of an empty file instead of a command that executes
> anything causes the coredump.
[...]
> Here is my analysis on the core that was provided by the customer:
>
> (gdb) bt
> #0 sh_fmtq (string=0x1 <Address 0x1 out of bounds>)
> at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/string.c:340
> #1 0x0000000000457e96 in out_string (cp=<value optimized out>, c=32,
> quoted=<value optimized out>, iop=<value optimized out>)
> at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:444
> #2 0x000000000045804c in sh_debug (shp=0x76d180, trap=0x7f2f13a821e0 "dlog",
> name=<value optimized out>, subscript=<value optimized out>,
> argv=0x76e070, flags=<value optimized out>)
> at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:548
> #3 0x000000000045a867 in sh_exec (t=0x7f2f13aafad0, flags=4)
> at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:1265
> [...need go no further...]
>
> In frame 2, we can see it cycling through your classic
> (char **)argv array like:
>
> 543 while(cp = *argv++)
> 544 {
> 545 if((flags&ARG_EXP) && argv[1]==0)
> 546 out_pattern(iop, cp,' ');
> 547 else
> 548 out_string(iop, cp,' ',n?0: (flags&(ARG_RAW|ARG_NOGLOB))||*argv);
> 549 }
> 550 if(flags&ARG_ASSIGN)
> 551 sfputc(iop,')');
> 552 else if(iop==stkstd)
>
> (we seg-fault after going down the out_string function in line
> 548 up there). The string pointer that points to = 0x1 up in
> frame #0 (sh_fmtq) traces back to the "cp" variable in line 548
> up there. The "argv" variable being referenced up there just gets
> passed in as the fifth argument to this function.
>
> In frame #3 (sh_exec, line 1265), the line that makes the call
> that takes us to frame 2 is:
>
> 1265 int n = sh_debug(shp,trap,(char*)0,(char*)0, com, ARG_R AW);
>
> so "com" (the fifth argument) is what's going wrong as it
> descends down through these calls. Looking at where it comes
> from, well, it's assigned here:
>
> 1241 if(argn==0)
> 1242 {
> 1243 /* fake 'true' built-in */
> 1244 np = SYSTRUE;
> 1245 *argv = nv_name(np);
> 1246 com = argv;
> 1247 }
>
> because as we can see:
>
> (gdb) f 3
> #3 0x000000000045a867 in sh_exec (t=0x7f2f13aafad0, flags=4)
> at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:1265
> 1265 int n = sh_debug(shp,trap,(char*)0,(char*)0, com, ARG_RAW);
> (gdb) p argn
> $2 = 0
> (gdb)
>
> argn is == 0 here. The tip-off here is that nv_name clearly
> returns a simple pointer to an array of characters, not an array
> of pointers to arrays of characters as is evidenced by the fact
> that the assignment is "*argv = nv_name(np);" not "argv =
> nv_name(np);". Looking at the function nv_name proves that it
> does indeed return a single pointer to an array of characters,
> not a pointer to an array of pointers to arrays of characters.
> Now, com is defined as a 'char **':
>
> 1002 char *cp=0, **com=0, *comn;
>
> (as it is expected to be in the calls that follow) also, that
> argv is also defined as the effective equivalent a 'char **':
>
> 1237 static char *argv[1];
>
> Yup, argv is actually an array of pointers (char ** equivalent),
> but that array is restricted to having exactly one element.
> Recalling the assignment in the previously quoted line:
>
> 1245 *argv = nv_name(np);
>
> we see that the one and only element in that argv array is
> getting assigned a pointer to an array of characters here.
> Nothing necessarily wrong with that, but remember the loop we
> looked at earlier in frame #2 (sh_debug). It went like:
>
> 543 while(cp = *argv++)
> 544 {
> 545 if((flags&ARG_EXP) && argv[1]==0)
> 546 out_pattern(iop, cp,' ');
> 547 else
> 548 out_string(iop, cp,' ',n?0: (flags&(ARG_RAW|ARG_NOGLOB))||*argv);
> 549 }
>
> which is clearly expecting argv in this context (com in frame 3,
> which really points to that static local single element array
> that is also pointed to by argv in frame 2) to be an array of
> pointers of indefinite size, each element being a pointer, but
> whose last element will be a null pointer. Well, in frame 3 it is
> clearly an array with only a single element, and that one element
> is NOT pointing to null. Watch this:
>
> (gdb) f 3
> #3 0x000000000045a867 in sh_exec (t=0x7f2f13aafad0, flags=4)
> at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:1265
> 1265 int n = sh_debug(shp,trap,(char*)0,(char*)0, com, ARG_RAW);
> (gdb) p com
> $8 = (char **) 0x76e060
> (gdb) p &argv
> $9 = (char *(*)[1]) 0x76e060
> (gdb) p com[0]
> $11 = 0x5009c6 "true"
> (gdb) p com[1]
> $10 = 0x1 <Address 0x1 out of bounds>
> (gdb) p argv[0]
> $12 = 0x5009c6 "true"
> (gdb) p argv[1]
> $13 = 0x1 <Address 0x1 out of bounds>
> (gdb)
>
> So, as expected, com and &argv point to the same place, the first
> element points to the constant string "true", but since the array
> is defined as having only one element, when you refer to a second
> element in that array, you get well, whatever random crap happens
> to be in that memory location. When we try to reproduce this
> problem, apparently we're getting 0 there (or we're not quite
> following this same code path, which is also possible), but the
> customer happens to have a "1" in that memory location.
===end analysis===
src/cmd/ksh93/sh/xec.c: sh_exec():
- When processing TCOM (simple command) with an empty/null command,
increase the size of the static dummy argv[1] array to argv[2],
ensuring a terminating NULL element so that 'while(cp = *argv++)'
loops don't crash. (Note that static objects are automatically
initialised to zero in C.)
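Based on the lines quoted in the analysis, the fake-'true' setup now
amounts to roughly this (a sketch):
        /* fake 'true' built-in */
        static char *argv[2];   /* was argv[1]; argv[1] stays NULL because
                                   static storage is zero-initialised */
        np = SYSTRUE;
        *argv = nv_name(np);
        com = argv;             /* 'while(cp = *argv++)' now terminates */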
src/cmd/ksh93/tests/io.sh:
- Adapt the reproducer, testing a null-command redirection 1000x.
Of course I was wrong to say the bug had nothing to do with
functions; traps in ksh functions are local, are handled the same
way as traps that are local to virtual subshells, and had the same
crashing bug. So this adds a test for that as well.
Contrary to the RH bug report, this is yet another bug with
virtual/non-forked subshells and has nothing to do with functions.
If a signal is ignored (empty trap) in the main shell while any
trap (empty or not) is set on the same signal in a subshell, a
crash eventually occurred upon restoring state when leaving the
subshell.
Original patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20120801-trapcom.patch
Prior discussion:
https://bugzilla.redhat.com/1117404
Paulo Andrade wrote there:
> The problem is that the sh_subshell function was saving pointers
> that could change, and when restoring, bad things would happen.
[...]
> The only comment I added:
> /* contents of shp->st.trapcom may change */
> may be a bit misleading, the "bad" save/restore already knows it,
> probably I should have added a better description telling that the
> data is, usually, modified in code like:
>
> tmp = buf[i]; buf[i] = strdup(tmp); free(tmp);
>
> so the shp->st.trapcom needs a "deep copy", as done in the
> patch, to properly save/restore pointers.
src/cmd/ksh93/sh/subshell.c,
src/cmd/ksh93/sh/xec.c:
- sh_subshell(), sh_funscope(): Make *savsig/*savstak into a
**savsig array. Use strdup(3) to save the data and get known
pointers that will not change. Free these upon restore.
- Change the comment from the patch as Paulo wished he had done.
src/cmd/ksh93/tests/subshell.sh:
- Test 2500 times. This should trigger the crash most of the time.
Another Red Hat patch. "Prior to this update, the result of a
command substitution was lost if a file descriptor used for the
substitution was previously explicitly closed. With this update,
ksh no longer reuses file descriptors that were closed during the
execution of a command substitution. Now, command substitutions
work as expected in the described situation."
Prior discussion:
https://bugzilla.redhat.com/1116072
Original patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20140929-safefd.patch
src/cmd/ksh93/include/io.h,
src/cmd/ksh93/sh/io.c:
- Add sh_iosafefd() function to get a file descriptor that is not
in use or otherwise occupied (including marked as closed).
src/cmd/ksh93/sh/subshell.c: sh_subshell():
- Use that function to obtain a safe FD upon restoring state when
exiting a command substitution. I don't really know the how and
why -- all that I/O magic is still beyond me and the code is
uncommented as usual.
src/cmd/ksh93/tests/subshell.sh:
- Add regression test from the reproducer in the bug, reduced to
the minimum necessary.
The undocumented alarm builtin executes actions unsafely, so 'read'
with an IFS assignment could crash when an alarm was triggered.
This applies an edited version of a Red Hat patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20120801-alarmifs.patch
Prior discussion:
https://bugzilla.redhat.com/1176670
src/cmd/ksh93/bltins/alarm.c:
- Add a TODO note based on dgk's 2014 email cited in the RH bug.
- When executing the trap function, save and restore the IFS table.
src/cmd/ksh93/sh/init.c: get_ifs():
- Remove now-unnecessary SHOPT_MULTIBYTE preprocessor directives as
8477d2ce lets the compiler optimise out multibyte code if needed.
- Initialise the 0 position of the IFS table to S_EOF. This
corresponds with the static state tables in data/lexstates.c.
src/cmd/ksh93/tests/builtins.sh:
- Crash test.
This applies the following Red Hat patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20120801-cdfork.patch
The associated bug report is public, but nearly all info (such as
a reproducer) has been wiped: https://bugzilla.redhat.com/1168611
However, the errata blurb is mildly informative:
"Previously, ksh sometimes incorrectly initialized a variable
holding the path of the working directory. If a program changed the
working directory between forking and ksh execution, then ksh could
contain an incorrect value in the working directory variable. With
this update, initialization of the working directory variable has
been corrected, and ksh now contains the correct value in the
aforementioned situation."
Also, the patch makes a lot of sense on the face of it. It removes
an optimisation in path_pwd() that checks for the directory defined
by e_crondir[] in data/msg.c, which is:
const char e_crondir[] = "/usr/spool/cron/atjobs";
Of course, /usr/spool has not existed on any system for decades: it
is common to mount /usr read-only, so all the writable stuff was
moved to /var. So that check would never succeed.
optimizing 'count++' is executed regardless of whether that
directory exists, ensuring that it never gets the real PWD and
defaults to returning ".".
src/cmd/ksh93/sh/path.c:
- Apply patch as described.
- Mark 'flag' variable as NOT_USED to suppress compiler warning.
Keep it for backwards compat, as some programs that link with
libshell might use this function (though it's undocumented).
src/cmd/ksh93/include/path.h,
src/cmd/ksh93/data/msg.c:
- Remove now-unused e_crondir[].
This imports a new version of the code to import environment
variable values that was sent to Red Hat from upstream in 2014.
It avoids importing environment variables whose names are not valid
in the shell language, as it would be impossible to change or unset
them. However, they stay in the environment to be passed to child
processes.
Prior discussion: https://bugzilla.redhat.com/1147645
Original patch: https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20120801-oldenvinit.patch
src/cmd/ksh93/sh/init.c:
- env_init(): Import new, simplified code to import environment
variable name/value pairs. Instead of doing the heavy lifting
itself, this version uses nv_open(), passing the NV_IDENT flag to
reject and skip invalid names.
- Get rid of gotos and a static var by splitting off the code to
import attributes into a new env_import_attributes() function.
This is a better way to avoid importing attributes when
initialising the shell in POSIX mode (re: 00d43960).
- Remove an nv_mapchar() call that was based on some unclear
flaggery which was also removed by upstream as sent to Red Hat.
I don't know what that did, if anything; looks like it might have
had something to do with typeset -u/-l, but those particular
attributes have never been successfully inherited through the
environment.
(Maybe that's another bug, or maybe I just don't care as
inheriting attributes is a misfeature anyway; we have to put up
with it because legacy scripts might use it. Maybe someone can
prove it's an unacceptable security risk to import attributes
like readonly from an environment variable that is inherently
vulnerable to manipulation. That would be nice, as a CVE ID
would give us a solid reason to get rid of this nonsense.)
- Remove an 'else cp += 2;' that was very clearly a no-op; 'cp' is
immediately overwritten on the next loop iteration and not used
past the loop.
src/cmd/ksh93/tests/variables.sh:
- Test.
According to 'whence --man', 'whence -f' should ignore functions:
-f Do not check for functions.
Right now this is only accomplished partially. As of commit
a329c22d 'whence -f' avoids any output when encountering a
function (in ksh93u+ 'whence -f' has incorrect output). The
return value is still wrong though:
$ foo() { true; }
$ whence -f foo; echo $?
0
This commit fixes the return value and makes 'type -f' error out
when given a function (like in Bash).
src/cmd/ksh93/bltins/whence.c:
- If -f was passed, set 'cp' to NULL since functions should be
ignored (as documented).
- Simplify return value by avoiding bitwise logic.
src/cmd/ksh93/tests/builtins.sh:
- Add regression tests for 'whence -f' and 'type -f'.
Co-authored-by: Martijn Dekker <martijn@inlv.org>
Another Red Hat patch of a patch. With the new comsub mechanism,
functions could sometimes return the wrong exit status when invoked
from a command substitution.
Original patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20120801-fununset.patch
I have determined that the extra exitset() in the Red Hat patch,
which copies the current exit status to $?, is not needed, as the
code for running functions already sets $? on termination. I've
added extra regression tests to prove this.
By the way, the exitset() macro is defined like this in defs.h:
#define exitset() (sh.savexit=sh.exitval)
That's more evidence (see also 3654ee73) that it does not
matter whether you address the shell's status struct via a
pointer. That macro is used in places that use shp pointers.
But, that aside...
src/cmd/ksh93/sh/subshell.c: sh_subshell():
- When waiting within a command substitution for a forked process
to end, save & restore sh.exitval (the exit status of the command
currently being run) so that job_wait() cannot override it.
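Conceptually (variable names approximate):
        int saveexit = sh.exitval;  /* status of the command being run */
        job_wait(pid);              /* wait for the forked comsub process */
        sh.exitval = saveexit;      /* job_wait() can no longer clobber it */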
src/cmd/ksh93/tests/functions.sh:
- Add tests based in part on the reproducer from rhbz#1116508.
src/cmd/ksh93/sh/xec.c: sh_funct():
- The np->nvalue.rp pointer was dereferenced before the check that
it is non-null. Do this check before dereferencing it.
Since at least 1999, whence -v on pdksh (and its successor mksh)
reports the path where an autoloadable function may be found:
$ mkdir ~/fun; FPATH=~/fun
$ echo 'myfn() { echo hi; }' >~/fun/myfn
$ whence -v myfn
myfn is a undefined (autoload from /home/user/fun/myfn) function
Whereas ksh93 only reports, rather uselessly:
myfn is an undefined function
As of this commit, whence -v/-a on ksh 93u+m does the same as
pdksh, but with correct grammar:
myfn is an undefined function (autoload from /home/user/fun/myfn)
This may be a small violation of my own "no new features" policy
for 93u+m, but I couldn't resist. This omission has been annoying
me, and it's just embarrassing to lack a pdksh feature :)
src/cmd/ksh93/include/path.h,
src/cmd/ksh93/data/msg.c:
- Add e_autoloadfrom[] = " (autoload from %s)" message.
src/cmd/ksh93/bltins/whence.c: whence():
- Report the path (if any) when reporting an undefined function.
This needs to be done in two places:
1. When a function has been explicitly marked undefined with
'autoload', we need to do a quick path_search() loop to find
the path. (These undefined functions take precedence over
regular commands, so are reported first.)
2. When a function is not explicitly autoloaded but merely
available in $FPATH, that path search was already done, so all
we need to do is report it. (These are reported last.)
Note that the output remains as on 93u+ if no function definition
file is found on $FPATH. This is also like pdksh/mksh.
src/cmd/ksh93/data/builtins.c:
- Bump 'whence' version date. The inline docs never specified
exactly what 'whence -v' reports, so no further edits are needed.
src/cmd/ksh93/tests/path.sh:
- Regress-test the new whence behaviour plus actual autoloading,
including the command override behaviour of autoloaded functions.
The fixargs() function is invoked when ksh needs to run a script
without a #!/hashbang/path. Instead of letting the kernel invoke a
shell, ksh exfile()s the script itself from sh_main(). In the
forked child, it calls fixargs() to set the argument list in the
environment to the args of the new script, so that 'ps' and
/proc/PID/cmdline show the expected output.
But fixargs() is broken because, on systems other than HP-UX (on
which ksh uses pstat(2)), ksh simply inserts a terminating zero.
But the arguments list is not a single zero-terminated C string:
Unix systems expect the entire arguments buffer to be zeroed out,
otherwise 'ps'
and /proc/*/cmdline will have fragments of previous command lines
in the output.
The Red Hat patch for this bug is:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20120801-argvfix.patch
However, that fix is incomplete because 'command_len' was also
hardcoded to be limited to 64 characters (!), which still gave
invalid 'ps' output if the erased command line was longer.
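The underlying technique is simple; here is an illustrative,
self-contained sketch (names are hypothetical, not the actual ksh
code):
        #include <string.h>
        /* Write the new command line into the contiguous buffer that held
           the original arguments and zero out the remainder, so 'ps' and
           /proc/PID/cmdline show no stale fragments. */
        static void set_cmdline(char *argbuf, size_t bufsize, const char *newcmd)
        {
                size_t len;
                if(bufsize == 0)
                        return;
                len = strlen(newcmd);
                if(len > bufsize - 1)
                        len = bufsize - 1;               /* truncate if too long */
                memcpy(argbuf, newcmd, len);
                memset(argbuf + len, 0, bufsize - len);  /* erase the rest */
        }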
src/cmd/ksh93/sh/main.c: fixargs():
- Remove CMD_LENGTH macro which was defined as 64.
- Remove code that limited the erasure of the arguments buffer to
CMD_LENGTH characters. That code also had quite a dodgy strdup()
call -- it copies arguments to the heap, but they are never freed
(or even used), so it's a memory leak. Also, none of this is
ever done if the length is calculated using pstat(2) on HP-UX,
which is a clear indication that it's unnecessary.
(I think this code block must have been some experiment they
forgot to remove. One reason why I think so is that a 64 byte
arguments limit never made sense, even in the 1980s when they
wrote ksh on 80-column CRT displays. Another indication of this
is that fixing it didn't require adding anything; the code to do
the right thing was already there, it was just being overridden.)
- Zero out the full arguments length as in the Red Hat patch.
src/cmd/ksh93/tests/basic.sh:
- Add test. It's sort of involved because 'ps' is one of the least
portable commands in practice, in spite of standardisation.
This fixes a regression introduced in commit f9c127e3.
When the legacy code for older versions of libast was
removed, the fmtident wrapper wasn't removed. As a result,
the version string output by Ctrl+Alt+V is garbled because
the fmtident wrapper doesn't do any formatting:
$ <Ctrl+Alt+V>
^J@(#)$Id: Version AJM 93u+m 2020-09-14
src/cmd/ksh93/sh/string.c:
- Remove the old version of fmtident that was overriding
the current version of fmtident provided by libast
(in src/lib/libast/string/fmtident.c).
There was no check for the -B/braceexpand option before calling
path_generate() to process brace expansion, making it impossible to
turn off brace expansion within command substitutions. Normally the
lexer flags brace expansion so that this code is not reached, but
shell code within command substitutions is handled differently.
Red Hat patches this by adding this check to the function itself:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20140301-fikspand.patch
But I think it's more logical to patch it at the point of decision.
src/cmd/ksh93/sh/macro.c: endfield():
- Decide to call either path_generate() or path_expand() based on
the state of the SH_BRACEEXPAND shell option.
- Fix '#if SHOPT_BRACEPAT' preprocessor check that previously
hardcoded this decision at compile time.
src/cmd/ksh93/tests/options.sh:
- Add tests.
The new command substitution mechanism imported in 970069a6 from
Red Hat patches introduced this bug: backtick-style command
substitutions hang when processing about 117KiB of data or more.
It is fixed by another Red Hat patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20140415-hokaido.patch
It saves the value of the shp->comsub flag so that it is set to 2
(usually meaning new-style $(comsubs)) in two specific cases even
when processing backtick comsubs. This stops the sh_subtmpfile()
function in subshell.c from creating a /tmp file. However, I think
that approach is quite ugly, so I'm taking a slightly different one
that has the same effect.
src/cmd/ksh93/include/defs.h,
src/cmd/ksh93/sh/subshell.c:
- Redefine sh_subtmpfile() to pass the comsub flag as an argument.
(Remove the shp pointer argument, which is redundant; a pointer
to the shell state can easily be obtained in the function.)
src/cmd/ksh93/sh/xec.c: sh_exec():
- Apply the Red Hat fix by passing flag 2 to sh_subtmpfile().
src/cmd/ksh93/tests/subshell.sh:
- Move regress test from ce68e1be from basic.sh to here; this is
the place for command substitution tests as they are subshells.
- Add regress test for this bug.
All other changed files:
- Update sh_subtmpfile() calls to pass on the shp->comsub flag.
When using typeset -l or -u on a variable that cannot be changed
when the shell is in restricted mode, ksh crashed.
This fix is inspired by this Red Hat fix, which is incomplete:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20120801-tpstl.patch
The crash was caused by the nv_shell() function. It walks through a
discipline function tree to get the pointer to the interpreter
associated with it. Evidently, the problem is that some pointer in
that walk is not set correctly for all special variables.
Thing is, ksh only has one shell language interpreter, and only one
global data structure (called 'sh') to keep its main state[*]. Yet,
the code is full of 'shp' pointers to that structure. Most (not
all) functions pass that pointer around to each other, accessing
that struct indirectly, ostensibly to account for the non-existent
possibility that there might be more than one interpreter state.
The "why" of that is an interesting cause for speculation that I
may get to sometime. For now, it is enough to know that, in the
code as it is, it matters not one iota what pointer to the shell
interpreter state is used; they all point to the same thing (unless
it's broken, as in this bug).
So, rather than fixing nv_shell() and/or associated pointer
assignments, this commit simply removes it, and replaces it with
calls to sh_getinterp(), which always returns a pointer to sh (see
init.c, where that function is defined as literally 'return &sh').
[*] Defined in shell.h, with the _SH_PRIVATE part in defs.h
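In each discipline function, the change boils down to this (a
sketch; nv_shell()'s argument and the surrounding code vary per call
site):
        Shell_t *shp;
        shp = nv_shell(np);      /* before: walk the discipline tree */
        shp = sh_getinterp();    /* after: always returns &sh */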
src/cmd/ksh93/include/defs.h,
src/cmd/ksh93/sh/name.c:
- Remove nv_shell().
src/cmd/ksh93/sh/init.c:
- In all the discipline functions for special variables, initialise
shp using sh_getinterp() instead of nv_shell().
src/cmd/ksh93/tests/variables.sh:
- Add regression test for typeset -l/-u on all special variables.
Now that we have ${.sh.pid} a.k.a. shgd->current_pid, which is
updated using getpid() whenever forking a new process, there is no
need for anything else to ever call getpid(); we can use the stored
value instead. There were a lot of these syscalls kicking around,
some of them in performance-sensitive places.
The following lists only changes *other* than changing getpid() to
shgd->current_pid.
src/cmd/ksh93/include/defs.h:
- Comments: clarify what shgd->{pid,ppid,current_pid} are for.
src/cmd/ksh93/sh/main.c,
src/cmd/ksh93/sh/init.c:
- On reinit for a new script, update shgd->{pid,ppid,current_pid}
in the sh_reinit() function itself instead of calling sh_reinit()
from sh_main() and then updating those immediately after that
call. It just makes more sense this way. Nothing else ever calls
sh_reinit() so there are no side effects.
src/cmd/ksh93/sh/xec.c: _sh_fork():
- Update shgd->current_pid in the child early, so that the rest of
the function can use it instead of calling getpid() again.
- Remove reassignment of SH_PIDNOD->nvalue.lp value pointer to
shgd->current_pid (which makes ${.sh.pid} work in the shell).
It's constant and was already set on init.
This imports another fix from Red Hat/Fedora. Original patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20120801-crash.patch
src/cmd/ksh93/include/jobs.h,
src/cmd/ksh93/sh/jobs.c,
src/cmd/ksh93/sh/subshell.c,
src/cmd/ksh93/sh/xec.c:
- Import the Red Hat fix with these differences:
- Rename the 'hack1_waitall' variable to 'bktick_waitall' and add
a comment describing what it's for.
- Remove unused 'pipefail' variable.
src/cmd/ksh93/tests/basic.sh:
- Regression test from reproducer given in the Red Hat bug report.
- Add special handling to SIGKILL it, as it might freeze hard.