Prior discussion:
https://bugzilla.redhat.com/1217236
Summary: doing
( some_simple_command & )
i.e., launching a background job from within a subshell, on a ksh
interactive login shell, caused misbehaviour (command not run).
For me on 93u+m, the misbehaviour was different -- an outright
crash in the job handling code following SIGCHLD, backtracing to:
0 ksh job_unpost + 49 (jobs.c:1655)
1 ksh job_reap + 1632 (jobs.c:468)
2 libsystem_platform.dylib _sigtramp + 29
3 ??? 0 + 4355528544
4 ksh ed_getchar + 102 (edit.c:1048)
5 ksh ed_emacsread + 659 (emacs.c:280)
[...]
Original patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20120801-nohupfork.patch
Lines 1874-1886 in sh_exec() optimise the highly specific case of
'( simple_command & )' by avoiding a sh_subshell() call that sets
up a virtual subshell before forking:
https://github.com/ksh93/ksh/blob/bd283959/src/cmd/ksh93/sh/xec.c#L1874-L1886
1874 else if(((type=t->par.partre->tre.tretyp)&FAMP) && ((type&COMMSK)==TFORK))
1875 {
1876 pid_t pid;
1877 sfsync(NIL(Sfio_t*));
1878 while((pid=fork())< 0)
1879 _sh_fork(shp,pid,0,0);
1880 if(pid==0)
1881 {
1882 sh_exec(t->par.partre,flags);
1883 shp->st.trapcom[0]=0;
1884 sh_done(shp,0);
1885 }
1886 }
1887 else
1888 sh_subshell(shp,t->par.partre,flags,0);
The original patch inserts the following before the sh_done call on
line 1884:
sh_offoption(SH_INTERACTIVE);
sh_done() checks for SH_INTERACTIVE and only runs job handling code
if that option is on.
Also, I had missed the need for an update of shgd->current_pid
here. Since 843b546c replaced lots of getpid() calls by reading
that variable, this could cause ksh to SIGCHLD the wrong process.
But even after adding the shgd->current_pid update to the RH patch,
I was still able to make it crash.
src/cmd/ksh93/sh/xec.c: sh_exec(): case TPAR:
- Disable this optimisation outright for interactive or job
control-enabled shells. I really don't trust it at all, but there
have been no problem reports for scripts without job control, so
keep it until such reports surface.
- Update shgd->current_pid so the child doesn't end up signalling
the wrong process (re: 843b546c).
___________________________________________________________________
P.S.:
It was noted in the Red Hat discussion that ( ... & ) does a double
fork, i.e., a virtual/non-forked subshell always forks before
forking the background job. This extra fork is done by the
following two lines in under 'case TFORK:' in sh_exec() in xec.c:
https://github.com/ksh93/ksh/blob/bd283959/src/cmd/ksh93/sh/xec.c#L1534-L1535
1534 if((type&(FAMP|TFORK))==(FAMP|TFORK))
1535 sh_subfork();
This is executed if we're in a virtual/non-forked subshell, i.e.
shp->subshell is true (note it is false for forked subshells). So
it forks the virtual subshell (the first fork) before running the
background job (the second fork). A background job is recognised by
'type' having both the FAMP (AMP == ampersand == &) and TFORK bits
set and no others.
So the obvious way to remove the double fork is to comment out
these two lines. Indeed, testing that confirms it gone and ksh
appears to work fine at first glance. Does it really? Nearly!
There are just four regression failures in a 'bin/shtests -p' run:
options.sh[394]: & job delayed by --pipefail, expected '1212 or 1221', got '1122'
signal.sh[280]: subshell ignoring signal does not send signal to parent (expected 'SIGUSR1', got 'done')
signal.sh[289]: subshell catching signal does not send signal to parent (expected 'SIGUSR1', got 'done')
subshell.sh[467]: sleep& in subshell hanging
So, those two lines cannot be removed now and we have to keep the
double fork. But removing them doesn't appear to break very much,
so it may be possible to add some tests so we only do an extra fork
when really necessary. That is a project for another day.