Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide what to do about the unknown getn discipline #435

Closed
McDutchie opened this issue Jan 21, 2022 · 20 comments
Closed

Decide what to do about the unknown getn discipline #435

McDutchie opened this issue Jan 21, 2022 · 20 comments
Labels
TODO Things to be done before releasing

Comments

@McDutchie
Copy link

McDutchie commented Jan 21, 2022

I just discovered that, contrary to all the documentation, ksh understands not four initial kinds of discipline functions, but five. There is an undocumented getn discipline:

const char *nv_discnames[] = { "get", "set", "append", "unset", "getn", 0 };

I've experimentally found that the getn discipline is like get, but for arithmetic expressions. For instance, for $foo, the discipline function foo.get is called, but for $((foo)) (with no dollar sign before foo), the discipline function foo.getn is called (with no fallback to foo.get if foo.getn is not defined) (edit: yeah, that's wrong, see further below). This is not documented anywhere and I cannot even find a single mention on Google.

$ foo.get() { echo GET; }
$ foo.getn() { echo GETN; }
$ : $foo
GET
$ : $((foo))
GETN
$ : $(($foo))
GET

A search in the ast-open-archive repo reveals that the getn discipline was quietly added in version 2009-08-21 93t+.

Annoyingly, there is no corresponding distinction for the set discipline. There is no setn discipline and the foo.set discipline function is called for both foo=bar and ((foo=1337)).

So, there are some things for the community to consider. Here are my views:

  1. I think it's a bug that foo.get is not called for $((foo)). It's contrary to all the documentation to date, which says that foo.get is called if the foo variable is referenced. Referencing it in an arithmetic expression without a preceding dollar sign is still referencing it.
  2. I think it's wrong that $((foo)) and $(($foo)) call different disciplines. These expansions should be and are widely assumed to be equivalent. (See also 5da8eb3)

So I think we should remove getn and make arithmetic expressions call get instead. I don't think backwards compatibility is a major concern here as getn is completely undocumented and no one seems to have discovered it before me. It's not even used in any of the other AST code in ast-open-archive, regression tests included. If anyone can find a trace of prior use, that might change things.

On the other hand, if we decide to keep and document getn, perhaps we should introduce a corresponding setn and make ksh call foo.setn instead of foo.set for $((foo)). But that would be a greater compatibility concern as the current behaviour of the set discipline is widely known and documented.

Or we could change nothing and just document the current situation. It does not seem to be possible to introduce consistency without introducing some incompatibility or another.

I tend to put a high priority on consistency, so all things considered, removing getn and folding it into get seems to be the lesser evil to me.

Thoughts, opinions?

@McDutchie McDutchie added the TODO Things to be done before releasing label Jan 21, 2022
@JohnoKing
Copy link

The following thread on the old mailing list seems relevant:
https://www.mail-archive.com/[email protected]/msg00601.html

@hyenias
Copy link

hyenias commented Jan 22, 2022

There is a difference when referencing numeric values within arithmetic expressions as compared to using their parameter substitutions besides being faster--namely precision. So, perhaps these get/getn disciplines allow someone when working with math to code using full precision (or a different level of precision/rounding/etc.) as compared to outside of arithmetic expressions, maybe performing formatting equal to or greater than what can be accomplished by typeset's option combinations. Reference #353

$ typeset -F3 x=2./3
$ echo $((x))
0.66666666666666663
$ echo $(($x))
0.667

# Making foo behave like a numeric
$ foo.get() { .sh.value=0.667; }
$ foo.getn() { (( .sh.value=2./3 )); }
$ echo $foo
0.667
$ echo $(($foo))
0.667
$ echo $((foo))
0.66666666666666663

@hyenias
Copy link

hyenias commented Jan 22, 2022

As what happens with $KSH_VERSION, it demonstrates some of this flexible behavior benefitting the programmer.

$ echo $KSH_VERSION
Version AJM 93u+m/1.1.0-alpha+4e6d1433 2021-12-17
$ echo $((KSH_VERSION))
20211217

@McDutchie
Copy link
Author

McDutchie commented Jan 22, 2022

For instance, for $foo, the discipline function foo.get is called, but for $((foo)) (with no dollar sign before foo), the discipline function foo.getn is called (with no fallback to foo.get if foo.getn is not defined).

Wait, it appears I was completely wrong about the 'no fallback' part. Sorry.

$ unset -f foo.getn
$ foo.get() { ((.sh.value=2.0/3)); }
$ echo $foo $((foo)) $(($foo))
0.666666666666666667 0.666666666666666667 0.666666666666666667

Well, that does change things.

@JohnoKing
Copy link

While experimenting with getn discipline functions I found a crash (it's intermittent, so it might take a few tries for the reproducer to segfault).

$ cat /tmp/a
foo=one
foo.getn()
{
	.sh.value=1
}
foo.get()
{
	.sh.value=one
}
unset -f foo.getn

$ ENV=/dev/null gdb arch/linux.i386-64/bin/ksh
GNU gdb (GDB) 11.2
[Redacted unnecessary info]
Reading symbols from arch/linux.i386-64/bin/ksh...
(gdb) run
Starting program: /home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh
$ . /tmp/a
$ echo $((foo))
/home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh: one: parameter not set
$ set -o emacs
$ [Detaching after vfork from child process 2315287]
. /tmp/a

Program received signal SIGSEGV, Segmentation fault.
nv_arrayptr (np=0x0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/array.c:929
929	        if(nv_isattr(np,NV_ARRAY))
(gdb) bt
#0  nv_arrayptr (np=0x0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/array.c:929
#1  0x000055555557813e in nv_putsub (np=0x0, sp=0x0, mode=268435456) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/array.c:1179
#2  0x000055555556922d in block_done (bp=0x7fffffffcc10) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/nvdisc.c:215
#3  0x0000555555569e42 in assign (np=0x55555572ec70, val=0x555555703585 "one", flags=0, handle=0x55555572ed30) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/nvdisc.c:371
#4  0x0000555555569082 in nv_putv (np=0x55555572ec70, value=0x555555703585 "one", flags=0, nfp=0x0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/nvdisc.c:145
#5  0x00005555555a52d7 in nv_putval (np=0x55555572ec70, string=0x555555703585 "one", flags=0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/name.c:1631
#6  0x00005555555a4fa9 in nv_open (name=0x555555703581 "foo=one", root=0x555555706720, flags=131584) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/name.c:1550
#7  0x00005555555a1f97 in nv_setlist (arg=0x555555703570, flags=131584, typ=0x0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/name.c:612
#8  0x00005555555c4549 in sh_exec (t=0x0, flags=4) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1117
#9  0x00005555555c75d6 in sh_exec (t=0x0, flags=4) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:2079
#10 0x00005555555c3128 in sh_eval (iop=0x0, mode=0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:717
#11 0x00005555555dba9d in b_dot_cmd (n=0, argv=0x555555703338, context=0x5555556eab10 <sh+1456>) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:324
#12 0x00005555555c5440 in sh_exec (t=0x0, flags=4) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1366
#13 0x00005555555684a5 in exfile (iop=0x0, fno=0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:591
#14 0x00005555555679d9 in sh_main (ac=1, av=0x7fffffffe508, userinit=0x0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:352
#15 0x0000555555566dbe in main (argc=1, argv=0x7fffffffe508) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:45
(gdb) p np
$1 = (Namval_t *) 0x0
(gdb) frame 2
#2  0x000055555556922d in block_done (bp=0x7fffffffcc10) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/nvdisc.c:215
215		                nv_putsub(bp->np, bp->sub,(bp->isub<0?0:bp->isub)|ARRAY_SETSUB);
(gdb) p bp->np
$2 = (Namval_t *) 0x0

@McDutchie
Copy link
Author

So nv_putsub() is called from block_done() with np==NULL. Seems easy enough to prevent that:

--- a/src/cmd/ksh93/sh/nvdisc.c
+++ b/src/cmd/ksh93/sh/nvdisc.c
@@ -211,7 +211,7 @@ static struct blocked *block_info(Namval_t *np, struct blocked *pp)
 static void block_done(struct blocked *bp)
 {
 	blist = bp = bp->next;
-	if(bp && (bp->isub>=0 || bp->sub))
+	if(bp && (bp->isub>=0 || bp->sub) && bp->np)
 		nv_putsub(bp->np, bp->sub,(bp->isub<0?0:bp->isub)|ARRAY_SETSUB);
 }

That still leaves problems though. After that patch, a third . /tmp/a produces a hang.

@JohnoKing
Copy link

I ran the reproducer under ASan and it crashes with a buffer overflow (when run with and without the above patch applied).

$ . /tmp/a
$ echo $((foo))
arch/linux.i386-64/bin/ksh: one: parameter not set
=================================================================
==834360==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd7de7f998 at pc 0x55f1431be2b1 bp 0x7ffd7de7c410 sp 0x7ffd7de7c400
READ of size 8 at 0x7ffd7de7f998 thread T0
    #0 0x55f1431be2b0 in block_info /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/nvdisc.c:196
    #1 0x55f1431be8b6 in assign /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/nvdisc.c:246
    #2 0x55f1431be16c in nv_putv /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/nvdisc.c:145
    #3 0x55f143265537 in nv_putval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/name.c:1631
    #4 0x55f143257c3f in tilde_expand2 /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:2635
    #5 0x55f143246372 in copyto /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:665
    #6 0x55f143241902 in sh_macexpand /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:239
    #7 0x55f143243d3c in sh_macpat /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:416
    #8 0x55f1432c3d8b in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:2367
    #9 0x55f1432c0e2d in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:2083
    #10 0x55f1433017b4 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318
    #11 0x55f1432cd07c in sh_funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:3305
    #12 0x55f1432bbf3a in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1531
    #13 0x55f1432c0db0 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:2079
    #14 0x55f1433017b4 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318
    #15 0x55f1432cd07c in sh_funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:3305
    #16 0x55f1432ce5c6 in sh_fun /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:3395
    #17 0x55f1431c112c in lookup /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/nvdisc.c:416
    #18 0x55f1431c1b7f in lookups /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/nvdisc.c:461
    #19 0x55f1431bd5cc in nv_getv /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/nvdisc.c:60
    #20 0x55f14326cf59 in nv_getval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/name.c:2768
    #21 0x55f14321e23b in io_prompt /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/io.c:2145
    #22 0x55f14321d2ab in slowread /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/io.c:1974
    #23 0x55f143400c76 in sfrd /home/johno/GitRepos/KornShell/ksh/src/lib/libast/sfio/sfrd.c:242
    #24 0x55f1433f002d in _sffilbuf /home/johno/GitRepos/KornShell/ksh/src/lib/libast/sfio/sffilbuf.c:101
    #25 0x55f143403214 in sfreserve /home/johno/GitRepos/KornShell/ksh/src/lib/libast/sfio/sfreserve.c:144
    #26 0x55f1431bb5a5 in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:527
    #27 0x55f1431b9c26 in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:352
    #28 0x55f1431b7ffd in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:45
    #29 0x7f26413abb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
    #30 0x55f1431b7f0d in _start (/home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh+0x5cf0d)

Address 0x7ffd7de7f998 is located in stack of thread T0 at offset 88 in frame
    #0 0x55f1432b5a0c in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:916

  This frame has 37 object(s):
    [32, 36) 'argn' (line 936)
    [48, 52) 'jobid' (line 1568)
    [64, 68) 'savein' (line 2235)
    [80, 84) 'was_interactive' (line 938)
    [96, 100) 'was_errexit' (line 939) <== Memory access at offset 88 underflows this variable
    [112, 116) 'was_monitor' (line 940)
    [128, 132) 'echeck' (line 941)
    [144, 148) 'scope' (line 1239)
    [160, 164) 'share' (line 1239)
    [176, 180) 'indx' (line 1457)
    [192, 196) 'jmpval' (line 1703)
    [208, 212) 'r' (line 2230)
    [224, 232) 'cp' (line 935)
    [256, 264) 'nq' (line 965)
    [288, 296) 'was_vi' (line 1246)
    [320, 328) 'was_emacs' (line 1249)
    [352, 360) 'was_gmacs' (line 1249)
    [384, 392) 'nullptr' (line 2109)
    [416, 432) 'ta' (line 2396)
    [448, 464) 'tb' (line 2396)
    [480, 496) 'before_usr' (line 2398)
    [512, 528) 'before_sys' (line 2398)
    [544, 560) 'after_usr' (line 2398)
    [576, 592) 'after_sys' (line 2398)
    [608, 648) 'nr' (line 1464)
    [688, 744) 'node' (line 1460)
    [784, 796) 'pipes' (line 1569)
    [816, 828) 'pvo' (line 1970)
    [848, 860) 'pvn' (line 1971)
    [880, 912) 'arg' (line 2312)
    [944, 976) 'av' (line 2352)
    [1008, 1048) 'av' (line 2111)
    [1088, 1136) 'tm' (line 2398)
    [1168, 1216) 'argv' (line 2667)
    [1248, 1392) 'statb' (line 1251)
    [1456, 1600) 'stata' (line 1411)
    [1664, 1667) 'unop' (line 2685)
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/nvdisc.c:196 in block_info
Shadow bytes around the buggy address:
  0x10002fbc7ee0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10002fbc7ef0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10002fbc7f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10002fbc7f10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10002fbc7f20: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 04 f2
=>0x10002fbc7f30: 04 f2 04[f2]04 f2 04 f2 04 f2 04 f2 04 f2 04 f2
  0x10002fbc7f40: 04 f2 04 f2 00 f2 f2 f2 00 f2 f2 f2 00 f2 f2 f2
  0x10002fbc7f50: 00 f2 f2 f2 00 f2 f2 f2 00 f2 f2 f2 00 00 f2 f2
  0x10002fbc7f60: 00 00 f2 f2 00 00 f2 f2 00 00 f2 f2 00 00 f2 f2
  0x10002fbc7f70: 00 00 f2 f2 00 00 00 00 00 f2 f2 f2 f2 f2 00 00
  0x10002fbc7f80: 00 00 00 00 00 f2 f2 f2 00 00 00 04 00 00 00 04
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==834360==ABORTING

@McDutchie McDutchie added the 1.0 label Feb 17, 2022
@McDutchie
Copy link
Author

I don't understand the cause of that last crash. The backtrace suggests it's in the middle of performing a tilde expansion (tilde_expand2()) but no tilde expansion happens in the reproducer.

I was able to reproduce the first crash with ASan on clang. With the patch below it does not crash any more. The change is that blist is set to NULL if bp->np is NULL. This patch may not be the best possible fix as bp->np should probably never be NULL to begin with, but I haven't yet been able to figure out how that happens.

diff --git a/src/cmd/ksh93/sh/nvdisc.c b/src/cmd/ksh93/sh/nvdisc.c
index 725ff079e..c6ad4d50b 100644
--- a/src/cmd/ksh93/sh/nvdisc.c
+++ b/src/cmd/ksh93/sh/nvdisc.c
@@ -211,8 +211,13 @@ static struct blocked *block_info(Namval_t *np, struct blocked *pp)
 static void block_done(struct blocked *bp)
 {
 	blist = bp = bp->next;
-	if(bp && (bp->isub>=0 || bp->sub))
-		nv_putsub(bp->np, bp->sub,(bp->isub<0?0:bp->isub)|ARRAY_SETSUB);
+	if(bp)
+	{
+		if(!bp->np)
+			blist = NIL(struct blocked *);
+		else if(bp->isub>=0 || bp->sub)
+			nv_putsub(bp->np, bp->sub,(bp->isub<0?0:bp->isub)|ARRAY_SETSUB);
+	}
 }

 /*

@McDutchie
Copy link
Author

More crashing (including after applying the patch above):

$ LC_NUMERIC=C ENV=/./dev/null arch/*/bin/ksh
$ foo.getn() { .sh.value=2.3*4.5; }
$ typeset -F foo
Memory fault

Backtrace:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   ksh                           	0x0000000100d89bf2 assign + 2802 (nvdisc.c:375)
1   ksh                           	0x0000000100d88cb8 nv_putv + 520 (nvdisc.c:145)
2   ksh                           	0x0000000100dd6e2e _nv_unset + 974 (name.c:2520)
3   ksh                           	0x0000000100dde792 nv_newattr + 1522 (name.c:3023)
4   ksh                           	0x0000000100d7fbf6 setall + 5350
5   ksh                           	0x0000000100d815de b_typeset + 4638 (typeset.c:564)
6   ksh                           	0x0000000100dff615 sh_exec + 8261 (xec.c:1360)
7   ksh                           	0x0000000100d8807f exfile + 3247 (main.c:605)
8   ksh                           	0x0000000100d8707f sh_main + 3551 (main.c:367)
9   ksh                           	0x0000000100d6c5b6 main + 38 (pmain.c:45)
10  libdyld.dylib                 	0x00007fff70bfe3d5 start + 1

@McDutchie
Copy link
Author

This is clearly still experimental and nowhere near ready for release, so I'm removing the getn discipline from the 1.0 branch. I'll leave it in the dev branch in hopes of fixing it.

@McDutchie
Copy link
Author

On the other hand, that crash is quite easy to patch – it's a straightforward failure to check for a non-null pointer. Maybe I won't remove it from 1.0 just yet.

--- a/src/cmd/ksh93/sh/nvdisc.c
+++ b/src/cmd/ksh93/sh/nvdisc.c
@@ -367,7 +367,7 @@ static void assign(Namval_t *np,const char* val,int flags,Namfun_t *handle)
 done:
        if(bp== &block)
                block_done(bp);
-       if(nq && nq->nvalue.rp->running==1)
+       if(nq && nq->nvalue.rp && nq->nvalue.rp->running==1)
        {
                nq->nvalue.rp->running=0;
                _nv_unset(nq,0);

McDutchie added a commit that referenced this issue Jun 7, 2022
The 'getn' discipline is experimental and undocumented, the only
mention of it being an old mailing list post from David Korn:
https://www.mail-archive.com/[email protected]/msg00601.html

But it still should not crash.

$ LC_NUMERIC=C ENV=/./dev/null arch/*/bin/ksh
$ foo.getn() { .sh.value=2.3*4.5; }
$ typeset -F foo
Memory fault

src/cmd/ksh93/sh/nvdisc.c: assign():
- Check that the nvalue union has a non-NULL pointer before
  using it.

Progresses: #435
McDutchie added a commit that referenced this issue Jun 7, 2022
The 'getn' discipline is experimental and undocumented, the only
mention of it being an old mailing list post from David Korn:
https://www.mail-archive.com/[email protected]/msg00601.html

But it still should not crash.

$ LC_NUMERIC=C ENV=/./dev/null arch/*/bin/ksh
$ foo.getn() { .sh.value=2.3*4.5; }
$ typeset -F foo
Memory fault

src/cmd/ksh93/sh/nvdisc.c: assign():
- Check that the nvalue union has a non-NULL pointer before
  using it.

Progresses: #435
@McDutchie
Copy link
Author

Here is a patch for removing the getn discipline. This does not mean I will actually remove it. It can help to see what code is involved in it. The ksh93-history repo helped me in identifying it. It's actually not very much code at all.

Patch to remove `getn` discipline
--- a/src/cmd/ksh93/data/variables.c
+++ b/src/cmd/ksh93/data/variables.c
@@ -108,7 +108,7 @@ const struct shtable2 shtab_variables[] =
 	"",	0,					(char*)0
 };

-const char *nv_discnames[] = { "get", "set", "append", "unset", "getn", 0 };
+const char *nv_discnames[] = { "get", "set", "append", "unset", 0 };

 #if SHOPT_STATS
 const Shtable_t shtab_stats[] =
--- a/src/cmd/ksh93/sh/nvdisc.c
+++ b/src/cmd/ksh93/sh/nvdisc.c
@@ -157,13 +157,12 @@ void nv_putv(Namval_t *np, const char *value, int flags, register Namfun_t *nfp)
 #define	ASSIGN		1
 #define	APPEND		2
 #define	UNASSIGN	3
-#define	LOOKUPN		4
 #define BLOCKED		((Namval_t*)&nv_local)

 struct	vardisc
 {
 	Namfun_t	fun;
-	Namval_t	*disc[5];
+	Namval_t	*disc[4];
 };

 struct blocked
@@ -378,15 +377,15 @@ done:
  * This function executes a lookup disc and then performs
  * the lookup on the given node <np>
  */
-static char*	lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
+static char*	lookup(Namval_t *np, Sfdouble_t *dp,Namfun_t *handle)
 {
 	register struct vardisc	*vp = (struct vardisc*)handle;
 	struct blocked		block, *bp = block_info(np, &block);
-	register Namval_t	*nq = vp->disc[type];
+	register Namval_t	*nq = vp->disc[LOOKUPS];
 	register char		*cp=0;
 	Namval_t		node;
 	union Value		*up = np->nvalue.up;
-	if(nq && !isblocked(bp,type))
+	if(nq && !isblocked(bp,LOOKUPS))
 	{
 		struct checkpt	checkpoint;
 		int		jmpval;
@@ -401,12 +400,7 @@ static char*	lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
 			nv_onattr(SH_VALNOD,NV_NOFREE);
 			_nv_unset(SH_VALNOD,0);
 		}
-		if(type==LOOKUPN)
-		{
-			nv_onattr(SH_VALNOD,NV_DOUBLE|NV_INTEGER);
-			nv_setsize(SH_VALNOD,10);
-		}
-		block(bp,type);
+		block(bp,LOOKUPS);
 		block(bp, UNASSIGN);   /* make sure nv_setdisc doesn't invalidate 'vp' by freeing it */
 		sh_pushcontext(&checkpoint, 1);
 		jmpval = sigsetjmp(checkpoint.buff, 0);
@@ -416,15 +410,10 @@ static char*	lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
 		if(sh.topfd != checkpoint.topfd)
 			sh_iorestore(checkpoint.topfd, jmpval);
 		unblock(bp,UNASSIGN);
-		unblock(bp,type);
-		if(!vp->disc[type])
+		unblock(bp,LOOKUPS);
+		if(!vp->disc[LOOKUPS])
 			chktfree(np,vp);
-		if(type==LOOKUPN)
-		{
-			cp = (char*)(SH_VALNOD->nvalue.cp);
-			*dp = nv_getnum(SH_VALNOD);
-		}
-		else if(cp = nv_getval(SH_VALNOD))
+		if(cp = nv_getval(SH_VALNOD))
 			cp = stkcopy(stkstd,cp);
 		_nv_unset(SH_VALNOD,NV_RDONLY);
 		if(!nv_isnull(&node))
@@ -438,12 +427,7 @@ static char*	lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
 	if(nv_isarray(np))
 		np->nvalue.up = up;
 	if(!cp)
-	{
-		if(type==LOOKUPS)
-			cp = nv_getv(np,handle);
-		else
-			*dp = nv_getn(np,handle);
-	}
+		cp = nv_getv(np,handle);
 	if(bp== &block)
 		block_done(bp);
 	if(nq && nq->nvalue.rp && nq->nvalue.rp->running==1)
@@ -456,14 +440,7 @@ static char*	lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)

 static char*	lookups(Namval_t *np, Namfun_t *handle)
 {
-	return(lookup(np,LOOKUPS,(Sfdouble_t*)0,handle));
-}
-
-static Sfdouble_t lookupn(Namval_t *np, Namfun_t *handle)
-{
-	Sfdouble_t	d;
-	lookup(np,LOOKUPN, &d ,handle);
-	return(d);
+	return(lookup(np,(Sfdouble_t*)0,handle));
 }


@@ -561,10 +538,7 @@ char *nv_setdisc(register Namval_t* np,register const char *event,Namval_t *acti
 	else if(action)
 	{
 		Namdisc_t *dp = (Namdisc_t*)vp->fun.disc;
-		if(type==LOOKUPS)
-			dp->getval = lookups;
-		else if(type==LOOKUPN)
-			dp->getnum = lookupn;
+		dp->getval = lookups;
 		vp->disc[type] = action;
 	}
 	else

@McDutchie
Copy link
Author

I was able to reproduce the first crash with ASan on clang. With the patch below it does not crash any more.

The patch is not actually sufficient. When I compile ksh with ASan and that patch, and replicate the reproducer, then a very similar crash occurs, intermittently, with a non-null pointer that is invalid.

This is all taking much more of my time than it's actually worth. :/

@McDutchie
Copy link
Author

Considering that:

  • this is yet another undocumented, unfinished, forgotten about and abandoned experiment (the planned setn discipline was never implemented),
  • it's crashy,
  • the intermittent crashes are not realistically solvable (at least not by me),
  • virtually no one even knows about this, and
  • this is taking too much time and effort for too little benefit,

I now think it's best to just get rid of the whole thing.

I'm also not actually convinced this is the best design. Defining additional disciplines will confuse most users and a correct usage in scripts will be the exception rather than the rule.

The problem it was meant to solve could be perhaps be solved in a different way. In the normal get and set disciplines, .sh.value could inherit any integer/float typeset attributes of the parent variable, which would fix the string conversion problem. But that's a project for a post-1.0 release.

McDutchie added a commit that referenced this issue Jul 16, 2022
*.getn discipline functions cause .sh.value to have a float type
for arithmetic expressions that get the value of foo, avoiding the
problem of having to convert between floats and strings (e.g.
rounding errors). There is no corresponding .setn discipline.

A search in the ast-open-archive repo reveals that the getn
discipline was quietly added in version 2009-08-21 93t+, with not
even a mention in the RELEASE file.

The one available mention on the internet is this old thread:
https://www.mail-archive.com/[email protected]/msg00601.html
Apparently a setn discipline *was* planned, but never implemented.

getn discipline functions may also crash in several ways. I've been
unsuccessful at solving all the crashes, particularly as one of
them is intermittent. This should not be in the 1.0 release.

Further discussion: #435
@McDutchie McDutchie removed the 1.0 label Jul 16, 2022
@McDutchie
Copy link
Author

So, getn is now gone from the 1.0 branch. I'm leaving it on dev for now.

I've found a relatively simple way to fold the getn discipline into the regular get discipline. The patch below is against the dev branch.

When applied, if you define a foo.get discipline function, it works as normal if you get the string value ($foo) but works like the former getn, with .sh.value being a float, if you get the numeric value in an arithmetic expression (e.g., $((foo))).

This seems much more logical to me than a separate getn discipline. But it's a (small) incompatibility that could conceivably require some script to be changed, so it's definitely for a post-1.0 release.

@hyenias, would you like to test this?

diff --git a/src/cmd/ksh93/data/variables.c b/src/cmd/ksh93/data/variables.c
index a4b061b65..17acfc83b 100644
--- a/src/cmd/ksh93/data/variables.c
+++ b/src/cmd/ksh93/data/variables.c
@@ -109,7 +109,7 @@ const struct shtable2 shtab_variables[] =
 	"",	0,					(char*)0
 };

-const char *nv_discnames[] = { "get", "set", "append", "unset", "getn", 0 };
+const char *nv_discnames[] = { "get", "set", "append", "unset", 0 };

 #if SHOPT_STATS
 const Shtable_t shtab_stats[] =
diff --git a/src/cmd/ksh93/sh/nvdisc.c b/src/cmd/ksh93/sh/nvdisc.c
index b0e3b5849..de5a04eee 100644
--- a/src/cmd/ksh93/sh/nvdisc.c
+++ b/src/cmd/ksh93/sh/nvdisc.c
@@ -154,16 +154,15 @@ void nv_putv(Namval_t *np, const char *value, int flags, register Namfun_t *nfp)
 	}
 }

-#define	LOOKUPS		0
+#define	LOOKUP		0
 #define	ASSIGN		1
 #define	APPEND		2
 #define	UNASSIGN	3
-#define	LOOKUPN		4

 struct	vardisc
 {
 	Namfun_t	fun;
-	Namval_t	*disc[5];
+	Namval_t	*disc[4];
 };

 struct blocked
@@ -290,8 +289,8 @@ static void	assign(Namval_t *np,const char* val,int flags,Namfun_t *handle)
 		savelex = *lexp;
 		sh_lexopen(lexp, 0);   /* needs full init (0), not what it calls reinit (1) */
 		block(bp,type);
-		if(bflag = (type==APPEND && !isblocked(bp,LOOKUPS)))
-			block(bp,LOOKUPS);
+		if(bflag = (type==APPEND && !isblocked(bp,LOOKUP)))
+			block(bp,LOOKUP);
 		sh_pushcontext(&checkpoint, 1);
 		jmpval = sigsetjmp(checkpoint.buff, 0);
 		if(!jmpval)
@@ -301,7 +300,7 @@ static void	assign(Namval_t *np,const char* val,int flags,Namfun_t *handle)
 			sh_iorestore(checkpoint.topfd, jmpval);
 		unblock(bp,type);
 		if(bflag)
-			unblock(bp,LOOKUPS);
+			unblock(bp,LOOKUP);
 		if(!vp->disc[type])
 			chktfree(np,vp);
 		*lexp = savelex;
@@ -377,16 +376,17 @@ done:
 /*
  * This function executes a lookup disc and then performs
  * the lookup on the given node <np>
+ * If <dp> is non-NULL, a numeric lookup is performed
  */
-static char*	lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
+static char*	lookup(Namval_t *np, Sfdouble_t *dp, Namfun_t *handle)
 {
 	register struct vardisc	*vp = (struct vardisc*)handle;
 	struct blocked		block, *bp = block_info(np, &block);
-	register Namval_t	*nq = vp->disc[type];
+	register Namval_t	*nq = vp->disc[LOOKUP];
 	register char		*cp=0;
 	Namval_t		node;
 	union Value		*up = np->nvalue.up;
-	if(nq && !isblocked(bp,type))
+	if(nq && !isblocked(bp,LOOKUP))
 	{
 		struct checkpt	checkpoint;
 		int		jmpval;
@@ -401,12 +401,12 @@ static char*	lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
 			nv_onattr(SH_VALNOD,NV_NOFREE);
 			_nv_unset(SH_VALNOD,0);
 		}
-		if(type==LOOKUPN)
+		if(dp)
 		{
 			nv_onattr(SH_VALNOD,NV_DOUBLE|NV_INTEGER);
 			nv_setsize(SH_VALNOD,10);
 		}
-		block(bp,type);
+		block(bp,LOOKUP);
 		block(bp, UNASSIGN);   /* make sure nv_setdisc doesn't invalidate 'vp' by freeing it */
 		sh_pushcontext(&checkpoint, 1);
 		jmpval = sigsetjmp(checkpoint.buff, 0);
@@ -416,10 +416,10 @@ static char*	lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
 		if(sh.topfd != checkpoint.topfd)
 			sh_iorestore(checkpoint.topfd, jmpval);
 		unblock(bp,UNASSIGN);
-		unblock(bp,type);
-		if(!vp->disc[type])
+		unblock(bp,LOOKUP);
+		if(!vp->disc[LOOKUP])
 			chktfree(np,vp);
-		if(type==LOOKUPN)
+		if(dp)
 		{
 			cp = (char*)(SH_VALNOD->nvalue.cp);
 			*dp = nv_getnum(SH_VALNOD);
@@ -439,10 +439,10 @@ static char*	lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
 		np->nvalue.up = up;
 	if(!cp)
 	{
-		if(type==LOOKUPS)
-			cp = nv_getv(np,handle);
-		else
+		if(dp)
 			*dp = nv_getn(np,handle);
+		else
+			cp = nv_getv(np,handle);
 	}
 	if(bp== &block)
 		block_done(bp);
@@ -456,13 +456,13 @@ static char*	lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)

 static char*	lookups(Namval_t *np, Namfun_t *handle)
 {
-	return(lookup(np,LOOKUPS,(Sfdouble_t*)0,handle));
+	return(lookup(np,(Sfdouble_t*)0,handle));
 }

 static Sfdouble_t lookupn(Namval_t *np, Namfun_t *handle)
 {
 	Sfdouble_t	d;
-	lookup(np,LOOKUPN, &d ,handle);
+	lookup(np,&d,handle);
 	return(d);
 }

@@ -563,10 +563,11 @@ char *nv_setdisc(register Namval_t* np,register const char *event,Namval_t *acti
 	else if(action)
 	{
 		Namdisc_t *dp = (Namdisc_t*)vp->fun.disc;
-		if(type==LOOKUPS)
+		if(type==LOOKUP)
+		{
 			dp->getval = lookups;
-		else if(type==LOOKUPN)
 			dp->getnum = lookupn;
+		}
 		vp->disc[type] = action;
 	}
 	else

@hyenias
Copy link

hyenias commented Jul 16, 2022

Sure, I will check it out. I was unable to git apply the getn_removal patch above. So, I took the patch from 1.0 branch via https://github.com/ksh93/ksh/commit/42fc5c4c0da82b4427c34f2e87a76e0171e560b4.diff and successfully applied it to the dev branch.

@McDutchie
Copy link
Author

McDutchie commented Aug 20, 2024

The getn discipline was never removed after all, because doing so was causing regressions, IIRC (it's two years later and apparently I got sick of it and never posted).

There is only one known corner-case crash now anyway, this one. It is still reproducible, so I decided to take another look with fresh eyes and two more years of experience.

The problem seems to be that the global struct blocked *blist variable in nvdisc.c, which is the starting point for all the bp pointers, gets invalidated when the arithmetic subsystem throws the one: parameter not set error message and longjmps back to the prompt. And of course it gets invalidated then, because the struct blocked items in the list are all local/automatic variables in the assign() and lookup() functions, which no longer exist after a longjmp!

So, the fix should be simple: reset blist to NULL when the shell throws the error. This should be done in sh_exit() which is in fault.c, which cannot access the global variables of nvdisc.c. So the blist variable needs to move to the global sh state struct and become sh.blist. The following patch does this, and my testing says it fixes the crash.

WIth that, there are no more known crashing scenarios with the getn discipline, so I think this issue can be closed. Of course there is still the fundamental issue that it uses (long) double floating point values only, which cannot represent all possible integer values. But that is issue #771, not this one. Disciplines will have to be redesigned as part of solving that bug as well. :-(

Patch v1
diff --git a/src/cmd/ksh93/include/shell.h b/src/cmd/ksh93/include/shell.h
index 0792a0471..900ba02d8 100644
--- a/src/cmd/ksh93/include/shell.h
+++ b/src/cmd/ksh93/include/shell.h
@@ -389,6 +389,7 @@ struct Shell_s
 	/* nv_putsub() hack for nv_create() to avoid double arithmetic evaluation */
 	char		nv_putsub_already_called_sh_arith;
 	int		nv_putsub_idx;	/* saves array index obtained by nv_putsub() using sh_arith() */
+	void		*blist;		/* 'struct blocked' list for nvdisc.c */
 #if SHOPT_OPTIMIZE
 	char		**argaddr;	/* pointer to arguments for the loop invariants optimizer */
 	void		*optlist;	/* linked list of invariant nodes */
diff --git a/src/cmd/ksh93/sh/fault.c b/src/cmd/ksh93/sh/fault.c
index 4c5b2cbae..1c32ee7fb 100644
--- a/src/cmd/ksh93/sh/fault.c
+++ b/src/cmd/ksh93/sh/fault.c
@@ -607,6 +607,7 @@ void sh_exit(int xno)
 	sh.mktype = 0;
 	sh.invoc_local = 0;
 	sh.tilde_block = 0;
+	sh.blist = NULL;
 	if(job.in_critical)
 		job_unlock();
 	if(pp->mode == SH_JMPSCRIPT && !pp->prev) 
diff --git a/src/cmd/ksh93/sh/nvdisc.c b/src/cmd/ksh93/sh/nvdisc.c
index d015b9c7b..be185b7f7 100644
--- a/src/cmd/ksh93/sh/nvdisc.c
+++ b/src/cmd/ksh93/sh/nvdisc.c
@@ -173,8 +173,6 @@ struct blocked
 	int		isub;
 };
 
-static struct blocked	*blist;
-
 #define isblocked(bp,type)	((bp)->flags & (1<<(type)))
 #define block(bp,type)		((bp)->flags |= (1<<(type)))
 #define unblock(bp,type)	((bp)->flags &= ~(1<<(type)))
@@ -189,7 +187,7 @@ static struct blocked *block_info(Namval_t *np, struct blocked *pp)
 	int		isub=0;
 	if(nv_isarray(np) && (isub=nv_aindex(np)) < 0)
 		sub = nv_associative(np,NULL,NV_ACURRENT);
-	for(bp=blist ; bp; bp=bp->next)
+	for(bp = sh.blist; bp; bp = bp->next)
 	{
 		if(bp->np==np && bp->sub==sub && bp->isub==isub)
 			return bp;
@@ -200,15 +198,15 @@ static struct blocked *block_info(Namval_t *np, struct blocked *pp)
 		pp->flags = 0;
 		pp->isub = isub;
 		pp->sub = sub;
-		pp->next = blist;
-		blist = pp;
+		pp->next = sh.blist;
+		sh.blist = pp;
 	}
 	return pp;
 }
 
 static void block_done(struct blocked *bp)
 {
-	blist = bp = bp->next;
+	sh.blist = bp = bp->next;
 	if(bp && (bp->isub>=0 || bp->sub))
 		nv_putsub(bp->np, bp->sub,(bp->isub<0?0:bp->isub)|ARRAY_SETSUB);
 }

@McDutchie
Copy link
Author

Patch v1 works, but is actually a kludge. Further research turned up that the problem is that nv_getv or nv_getn calls in lookup() may throw an error, as they do ("parameter not set") in the crash reproducer. When that happens, block_done is not getting called, which is the real problem.

It's fine for nv_get{v,n} to longjmp, but not until after restoring state in lookup(). This patch does that, and I think it's a better way to fix this crash.

Patch v2
diff --git a/src/cmd/ksh93/sh/nvdisc.c b/src/cmd/ksh93/sh/nvdisc.c
index d015b9c7b..f16ad9bba 100644
--- a/src/cmd/ksh93/sh/nvdisc.c
+++ b/src/cmd/ksh93/sh/nvdisc.c
@@ -434,13 +434,6 @@ static char*	lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
 	}
 	if(nv_isarray(np))
 		np->nvalue.up = up;
-	if(!cp)
-	{
-		if(type==LOOKUPS)
-			cp = nv_getv(np,handle);
-		else
-			*dp = nv_getn(np,handle);
-	}
 	if(bp== &block)
 		block_done(bp);
 	if(nq && nq->nvalue.rp && nq->nvalue.rp->running==1)
@@ -451,6 +444,14 @@ static char*	lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
 	if(jmpval >= SH_JMPFUN)
 		siglongjmp(*sh.jmplist,jmpval);
 	sh_sigcheck();
+	/* nv_get{v,n} may throw an error and longjmp, so must come after restoring state */
+	if(!cp)
+	{
+		if(type==LOOKUPS)
+			cp = nv_getv(np,handle);
+		else
+			*dp = nv_getn(np,handle);
+	}
 	return cp;
 }
 

McDutchie added a commit that referenced this issue Aug 21, 2024
Reproducer: see the test added in tests/variables.sh.

Analysis: The global 'struct blocked *blist' variable in nvdisc.c,
which is the starting point for all the bp pointers, gets
invalidated when the arithmetic subsystem throws an error in the
reproducer. Which is obvious, because the struct blocked items in
the list are all local/automatic variables in the assign() and
lookup() functions, which no longer exist after a longjmp.

The root problem is that an error is thrown during the nv_getn()
call, so that block_done() is not getting called after the longjmp,
so the global blist variable is not reset, causing a probable crash
on any subsequent use of a discipline function.

src/cmd/ksh93/sh/nvdisc.c: lookup():
- Delay calling nv_getv or nv_getn until after restoring all
  state, including after calling block_done().

src/cmd/ksh93/sh.1:
- Finally add documentation for the getn discipline.

Resolves: #435
@dnewhall
Copy link

I just found this looking for documentation on getn for some Ksh documentation I am working on. Is this now at a state where it can be documented officially?

I'm glad this got fixed, because I was that one Ksh93 power user who has this in a couple scripts from back in the early '10s. I have no clue where I first learned about getn, however.

@McDutchie
Copy link
Author

Is this now at a state where it can be documented officially?

I documented it in the man page in e36e817.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TODO Things to be done before releasing
Projects
None yet
Development

No branches or pull requests

4 participants