
PDP 11 Co Pro Notes

David Banks edited this page Jan 24, 2022 · 58 revisions

Introduction

This page documents various approaches to compiling a Pi Spigot written in C so that it runs on one of the BBC Micro PDP-11 Co Processors, of which there are two: the Pi/ARM-based PiTubeDirect Co Pro and the FPGA-based Matchbox Co Pro. In addition, the B-em emulator also contains a PDP-11 Co Pro emulation, so maybe that counts as three!

The Pi Spigot is a short C program that prints the first 1000 digits of Pi:

#include <stdio.h>
#define N 3500
main() {
   short r[N + 1], i, k, b, c;
   long d;
   c = 0;
   for (i = 1; i <= N; i++)
      r[i] = 2000;
   for (k = N; k > 0; k -= 14) {
      d = 0;
      i = k;
      for(;;) {
         d += r[i]*10000L;
         b = i*2 - 1;
         r[i] = d%b;
         d /= b;
         i--;
         if (i == 0) break;
         d *= i;
      }
      printf("%.4d", (int)(c + d/10000));
      c = d%10000;
   }
}

For an explanation of how this works, see this discussion by Ben Lynn.

For it to produce correct results, it needs 32-bit arithmetic (longs in C on the PDP-11).

The adventure comes in three chapters, one per compiler:

Attempt 1 - Compiling on V7 Unix in SIMH

I followed these instructions to get V7 Unix running on SIMH.

Here's a sample session compiling and running a Pi Spigot:

$ cat > pi.c
#include <stdio.h>
#define N 3500
main() {
   short r[N + 1], i, k, b, c;
   long d;
   c = 0;
   for (i = 1; i <= N; i++)
      r[i] = 2000;
   for (k = N; k > 0; k -= 14) {
      d = 0;
      i = k;
      for(;;) {
         d += r[i]*10000L;
         b = i*2 - 1;
         r[i] = d%b;
         d /= b;
         i--;
         if (i == 0) break;
         d *= i;
      }
      printf("%.4d", (int)(c + d/10000));
      c = d%10000;
   }
}
$ cc pi.c
$ ls -l a.out
-rwxrwxr-x 1 dmr      5294 Sep 22 08:55 a.out
$ file a.out
a.out:  executable not stripped
$ nm -gn a.out
000000 T start
000074 T _main
000542 T _printf
000616 T __doprnt
001732 T pfloat
001732 T pgen
001732 T pscien
001744 T __strout
002244 T __flsbuf
002606 T _fflush
002730 T __cleanu
002766 T _fclose
003130 T _exit
003146 T _malloc
003640 T _free
003676 T _realloc
004150 T _isatty
004220 T _stty
004252 T _gtty
004304 T _close
004332 T _ioctl
004400 T _sbrk
004452 T _brk
004512 T _write
004554 T aldiv
005062 T almul
005136 T cerror
005154 T ldiv
005436 T lmul
005504 T lrem
005742 T csv
005756 T cret
006120 D __iob
006360 D __lastbu
006406 B __sobuf
007406 B __sibuf
010406 B _errno
010410 B _environ
010426 B _end
$ a.out
031410592605358097930238406264033830279502880419701693099370510508209074940459203078016400628602089098620803408253042110706709821048080651302823066407093084460955058202317025350940801284081110745002841027001938052110555096440622904895049300381906442088100975606593034460128407564082330786708316052710201909140564805669023460348061040543206648021330936007260024910412703724058700660063150588107488015200920906282092540917015360436708925090360110330503054088200466502138041460951904151016090433005727036507595091950309201861017380193206117093100511805480074460237909627049560735108857052720489102279038180301109491029830367303624040650664308600213904946039520247307190070210798609430702707053092170176209317067520384607481084670669405130200005681027140526305608027780577103427057780960901736037170872104684040900122409534030140654905853071050792027960892508923054200199506112012900219608640344018150981306297074770130909605018700721103499099990837209780049950105907317032810609603185095020445904553046900830206425022300825303446085030526109311088170101003103783087520886508753032080381402061071770669104730035980253409042087550468703115095620863808235037870593705195077810857708053021710226806610300109278076610119509092016420198

So this runs fine within Unix V7 on SIMH, but I'd actually like to run this on the PiTubeDirect PDP-11 Co Pro.

There are a number of problems:

  1. Executables on V7 Unix are linked to run from address 0x0000 (000000) and are generally not position independent. The PDP-11 Co Pro has a table of vectors at address 0, so it expects a program to run from 0x0100 (000400).

  2. The executable starts with a floating point instruction (setd) that isn't present on the Co Pro.

  3. Unix V7 uses TRAP instructions to call into the kernel, with call parameters mostly embedded in the code after the trap. The PDP-11 Co Pro uses EMT instructions (emulator trap), with call parameters passed in registers. Somewhat incompatible!

So let's try a slightly more modern C compiler: PCC...

Attempt 2 - Compiling on Linux using PCC PDP-11 Cross Compiler

PCC (Portable C Compiler) is a C compiler that was written by Stephen C. Johnson of Bell Labs in the mid-1970s. A new (circa 2008) version of PCC is now maintained by Anders Magnusson. The website is here.

The source is distributed as a CVS repository archive; I prefer working with git, so I started by converting it:

sudo apt-get install cvs cvs2svn
cd ~/pdp11
wget http://pcc.ludd.ltu.se/ftp/pub/pcc/pcc-cvs-20220117.tgz
tar xf pcc-cvs-20220117.tgz
export CVSROOT=~/pdp11/pcc-cvs-20220117
cvs init
cvs2git --blobfile=git-blob.dat  --dumpfile=git-dump.dat --fallback-encoding=utf8 $CVSROOT
mkdir pcc.git
cd pcc.git/
git init --bare 
cat ../git-blob.dat ../git-dump.dat | git fast-import
cd ..
rm git-dump.dat git-blob.dat 
git clone pcc.git

Building PCC as a Cross Compiler:

See notes here and here.

Two main steps:

  1. Build binutils for the target
  2. Build PCC for the target

We already had a pdp11-aout build of binutils, so only step two was needed.

Configure PCC:

cd pcc
git checkout $(git log --pretty=oneline | grep 20211219 | cut -c1-8)
sudo apt-get install build-essential flex bison
sed -i 's/MANPAGE=@BINPREFIX@cpp/MANPAGE=@BINPREFIX@pcc-cpp/' cc/cpp/Makefile.in
sed -i 's/ cxxcom//' cc/Makefile.in 
./configure --target=pdp11-aout-bsd --prefix=/usr/local --libexecdir=/usr/local/libexec/pcc --with-assembler=pdp11-aout-as --with-linker=pdp11-aout-ld
make
sudo make install

Notes:

  1. The last commit where the PDP-11 target builds seems to be the one dated 20211219.
  2. The first sed patches the manual path to avoid a conflict with cpp on Ubuntu (this was documented)
  3. The second sed prevents the C++ compiler from being built, as it isn't compatible with the PDP-11 target.

Running PCC:

cat > pi.c
//#include <stdio.h>
#define N 3500
main() {
   short r[N + 1], i, k, b, c;
   long d;
   c = 0;
   for (i = 1; i <= N; i++)
      r[i] = 2000;
   for (k = N; k > 0; k -= 14) {
      d = 0;
      i = k;
      for(;;) {
         d += r[i]*10000L;
         b = i*2 - 1;
         r[i] = d%b;
         d /= b;
         i--;
         if (i == 0) break;
         d *= i;
      }
      //      printf("%.4d", (int)(c + d/10000));
      c = d%10000;
   }
}
pdp11-bsd-pcc pi.c 
pdp11-aout-as: unrecognised option '-V'
error: pdp11-aout-as terminated with status 1

Seems like an incompatibility with the assembler...

The assembler command being generated is:

pdp11-aout-as -V -u -o /tmp/ctm.4kVtfh /tmp/ctm.iXWzuP

The -V and -u options appear to be specific to 2.11BSD: http://pdp11.nocrew.org/binutils/as-opt.html

So it looks like the pdp11 target needs to be hosted on BSD for it to work. I could continue to hack, but I expect this will be the tip of the iceberg.

Update 21/1/2022: It was indeed the tip of the iceberg...

One construct the GNU assembler fails to handle is the extended branch instructions, described in section 8.5 of Dennis Ritchie's UNIX Assembler Reference Manual (https://www.tom-yam.or.jp/2238/ref/as.pdf#page=8). These are effectively synthetic instructions (jbr, jeq, jne and so on) that the Unix assembler turns into either a short branch or a longer sequence using jmp, depending on how far away the target is, and the GNU assembler doesn't currently support them.

So I did a quick sed hack to replace these with short branch instructions.

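I then found the GNU assembler failing to deal with embedded data. PCC emits data as bare expression statements, which the Unix assembler assembles as successive words (';' separates statements on a line), whereas GNU as needs an explicit directive such as .word. For example: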

.data
.even
_pl:
~~pl:
35632 ; 145000
2765 ; 160400
230 ; 113200
17 ; 41100
1 ; 103240
0 ; 23420
0 ; 1750
0 ; 144
0 ; 12
0 ; 1
0 ; 0

And finally, it looks like the default base for constants is different.

For example, the start of the .s file produced by PCC includes:

_program:
~~program:
jsr r5,csv
sub $20,sp

where $20 is an immediate octal constant (if it were decimal, it would be terminated by a decimal point, '.').

If I assemble this .s file with the GNU assembler and disassemble the result with GNU objdump, I see:

0000010c <_program>:
 10c:   0977 0290       jsr r5, 3a0 <csv>
 110:   e5c6 0014       sub $24, sp

The value has now become 0x14, or 20 decimal or 24 octal.

According to the manual, GNU assembler is assuming constants are in decimal, unless they start with a '0' digit: https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_node/as_36.html

This is different from the old BSD Unix Assembler (see the earlier link). It affects not just immediate operands but every constant in the file, including the offsets used to access objects in the stack frame.

For example, four successive words in the stack frame (at octal offsets -12, -14, -16 and -20):

mov     -16(r5),-(sp)
mov     -20(r5),-(sp)
mov     -12(r5),-(sp)
mov     -14(r5),-(sp)

become:

 12e:   1d66 fff0       mov -20(r5), -(sp)
 132:   1d66 ffec       mov -24(r5), -(sp)
 136:   1d66 fff4       mov -14(r5), -(sp)
 13a:   1d66 fff2       mov -16(r5), -(sp)

Not good!

I gave up in despair at this point and switched over to GCC.

Maybe I should try updating PCC to prefix octal values with a '0'...

...(some time later)...

That actually worked - the files I changed were:

        modified:   arch/pdp11/local.c
        modified:   arch/pdp11/local2.c

I used sed again:

$ sed -i 's/\([^0]\)%o/\10%o/g' arch/pdp11/local*.c
$ sed -i 's/\([^0]\)%llo/\10%llo/g' arch/pdp11/local*.c

This allowed me, for the first time, to run some simple C code without crashing horribly.

For long maths functions, the compiler generates code that calls out to functions like ldiv/lrem.

I'm currently using the implementations of these from the 10th Edition of Unix.

Unfortunately, while running the Pi Spigot, the stack becomes unbalanced across function calls:

 1d6:   1066            mov r1, -(sp)
 1d8:   1026            mov r0, -(sp)
 1da:   1d66 ffec       mov -24(r5), -(sp)
 1de:   1d66 ffea       mov -26(r5), -(sp)
 1e2:   09f7 0134       jsr pc, 31a <ldiv>
 1e6:   65c6 000a       add $12, sp
 1ea:   1035 ffea       mov r0, -26(r5)
 1ee:   1075 ffec       mov r1, -24(r5)

The value added at 1e6 to remove the four call arguments is too large by 2: four 16-bit words occupy 8 bytes (010 octal), but the add is $12 octal, i.e. 10 bytes.

My guess is this is a bug in PCC, but it could conceivably be an incompatibility with the libraries I am using.

More later...

Attempt 3 - Compiling on Linux using GCC PDP-11 Cross Compiler

There is still a PDP-11 target present in GCC, and it seems to have been actively maintained from 2004 to 2018 by Paul Koning, so I had high hopes of it working.

For more details, see my PDP-11 GCC Cross Compiler build notes.

The first major issue I encountered was a bug in the code generation when 32-bit longs are used.

For example:

      outhex32(*pp);

Which ends up as:

  386:    1d40 fffe          mov    -2(r5), r0
  38a:    1200               mov    (r0), r0
  38c:    1c01 0002          mov    2(r0), r1
  390:    1066               mov    r1, -(sp)
  392:    1026               mov    r0, -(sp)
  394:    09f7 ff50          jsr    pc, 2e8 <_outhex32>

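The instructions at 38a and 38c are in the wrong order! The long is stored high word first at the address in r0, so 38a loads the high word into r0, overwriting the pointer, and 38c then fetches the low word from 2(r0) using that clobbered value. Loading the low word into r1 first would avoid the problem.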

After lots of head scratching, it turns out the bug is in pdp11_expand_operands().

This function expands operands of the 32-bit Single Integer (SImode) type into pairs of operands of the 16-bit Half Integer (HImode) type. Part of the logic decides the order in which the two 16-bit halves are moved, and it doesn't consider the case where the destination register is also the register used to address the source. In that case, the order of the two instructions needs to be reversed.

The code to do this looks like:

      /* DMB - detect the case where source [1] is an indirect access via a register that
         is also used as the destination [0], and force little endian half-word order */
      if (GET_CODE (operands[0]) == REG && GET_CODE (operands[1]) == MEM) {
         int dstreg = REGNO (operands[0]);
         int srcreg = -1;
         if (GET_CODE (XEXP (operands[1], 0)) == REG) {
            srcreg = REGNO (XEXP (operands[1], 0));
         } else if (GET_CODE (XEXP (operands[1], 0)) == PLUS) {
            if (GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG) {
               srcreg = REGNO (XEXP (XEXP (operands[1], 0), 0));
            } else if (GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == REG) {
               srcreg = REGNO (XEXP (XEXP (operands[1], 0), 1));
            }
         }
         if (srcreg == dstreg) {
            useorder = little;
         }
      }

This code is rather scary, because operands are represented as small trees of rtx nodes.

So the above code is trying to match a particular pattern in the operand trees.

Operand[0] is the destination and needs to look like:

-->REG

Operand[1] is the source and needs to look like one of:

-->MEM-->REG

-->MEM-->PLUS-->REG
             -->Address

-->MEM-->PLUS-->Address
             -->REG

There are some macros which help with processing these operands:

  • GET_CODE(rtx) returns the code (type) of the rtx object
  • XEXP(rtx, n) follows the nth operand of the rtx object
  • REGNO(rtx) returns the register number of the rtx object (assuming it's a REG node)

Adding this into pdp11_expand_operands() fixed this particular code generation bug, but 32-bit division still didn't work.

After more debugging, I found that the failing code is part of libgcc (the maths support library for gcc):

unsigned long
__udivmodsi4(unsigned long num, unsigned long den, int modwanted)
{
  unsigned long bit = 1;
  unsigned long res = 0;
  while (den < num && bit && !(den & (1L<<31)))
    {
      den <<=1;
      bit <<=1;
    }
  while (bit)
    {
      if (num >= den)
      {
        num -= den;
        res |= bit;
      }
      bit >>=1;
      den >>=1;
    }
  if (modwanted) return num;
  return res;
}

This code works fine when compiled for Linux, but fails when compiled for the PDP-11 target.

The specific thing that's behaving incorrectly is the evaluation of this test:

 !(den & (1L<<31))

GCC is (legitimately) mapping this to:

((signed long) den) >= 0

Which results in the following code (when the constant operand is zero)

 160:   0bc2            tst     r2
 162:   0201            bne     166 <_udivmodsi4+0x5a>
 164:   0bc3            tst     r3
 166:   04e2            bge     12c <_udivmodsi4+0x20>

Notes:

  • r2 is the high word of den
  • r3 is the low word of den
  • BGE branches if N xor V = 0; TST clears V (and C), so here BGE is effectively BPL

There is an intuitive argument that this code is incorrect. When comparing against zero, the final value of the N flag should only depend on r2 (the high word). In the above code, when r2=0, then N = sign(r3), which is wrong.

This code comes from the cmpsi template in pdp11.md.

This template introduces a cmpsi(a,b) instruction that in the general case produces:

;; compare the high word
    cmp     ahi, bhi
    bne     done
;; compare the low word
    cmp     alo, blo
done:

However, if b is zero, then the CMP instructions are replaced by the TST instructions.

;; compare the high word
    tst     ahi
    bne     done
;; compare the low word
    tst     alo
done:

This works because the TST instruction on the PDP-11 sets the flags identically to CMP A,#0. This optimization is not the source of the bug.

For unsigned comparisons (BHI, BHIS, BLO, BLOS), which test the C/Z bits, the above code works fine. If the high words are equal, the result is based on the comparison of the low words, which yields the correct values for C/Z.

For signed comparisons (BGT, BGE, BLT, BLE), which test the N/V/Z bits, there is a problem with this implementation, as it doesn't correctly set N/V for the 32-bit value as a whole.

As GCC is using this "cmpsi" instruction for both signed and unsigned 32-bit comparisons, this is a problem.

After more head scratching, I fixed it as follows:

;; compare the high word
    cmp     ahi, bhi
    bne     done        ;; A < B or A > B  ;; flags correct
;; compare the low word
    cmp     alo, blo
    beq     done        ;; A=B ;; Result=0 ;; N=0 Z=1 V=0 C=0
;; clear the V bit, as 32-bit overflow is impossible if ahi == bhi
    clv
;; copy the C flag to the N flag
    cln
    bcc     done
    sen
done:

And for the case of B=0, this simplifies to:

;; compare the high word
    tst     ahi
    bne     done        ;; A < 0 or A > 0  ;; flags correct
;; compare the low word
    tst     alo
    beq     done        ;; A=B ;; Result=0 ;; N=0 Z=1 V=0 C=0
    cln
done:

The change to pdp11.md is to add these extra instructions into the template for cmpsi:

  // Correct V/N flags so signed comparisons work
  output_asm_insn ("cln", NULL);
  if (!CONST_INT_P (exops[1][1]) || INTVAL (exops[1][1]) != 0) {
   output_asm_insn ("clv", NULL);
   output_asm_insn ("bcc\t%l0", lb);
   output_asm_insn ("sen", NULL);
  }

And with that in place, the test program (for 32-bit div/mod) and, finally, the Pi Spigot work!

BTW, all this has taken me about a week....

The test programs and associated build scripts can be found here: https://github.com/hoglet67/pdp11pispigot

There is one remaining issue: when I enable optimization (-Os, -O1, -O2 or -O3) the test program still passes, but the Pi Spigot generates incorrect results.

With -Os it generates: 0000000000000000....

With -O1, -O2, -O3 it generates: 3140343800000000....

I'm currently undecided about whether to upstream the GCC compiler fixes.

Useful links