journal

2024-10-13: FAT16 traversal phase 2
===================================

I finally committed the last chunk of FAT16 traversal code, this time for
processing directory entries. I implemented a trivial "dir" command that
just lists the entries in the root directory.

What this means is that I now have the building blocks I need to begin
pivoting toward making the ROM code just a code library and boot code.

So next I need to update the assembler so that I can generate executables
instead of ROM images, then write a loader routine that can load and
execute them.

Reviewing my notes from 2024-09-02 I think the Odyssey executable format can
be a bit simpler than what I proposed, because the directory entry already
lists the size of the file, down to the byte, which means it's possible to
allocate memory for executing the binary based on just that value. Then the
executable itself only needs to contain:

 * marker to indicate that it's actually an odyssey executable
 * flags, initially just containing memory allocation option in lowest two bits:
   0: malloc
   1: extmalloc 0xd000
   2: extmalloc 0xe000
   3: extmalloc 0xd000+0xe000
 * number of symbol rewrites (with each being a 16-bit word)

So the loader would use this process:

 1. Read the number of bytes
 2. Allocate temporary memory (512 bytes)
 3. Read the first sector
 4. Verify that this is an Odyssey executable
 5. Read memory allocation preference
 6. Free temporary memory
 7. Allocate memory
 8. Read entire executable into allocated memory
 9. Calculate the address of the first byte of the program $first
 10. Iterate through symbol rewrites; for each 16-bit value $offset:
    a. read 16-bit value $ptr at $offset
    b. add $first to $ptr
    c. write back to $offset
 11. Put $first into D
 12. CALL_D to jump into the program
 13. Free the allocated memory
 14. Return to caller

An example of an executable:

EXEOFFSET BINOFFSET DATA
0000-0002           "ODY"
0003                flags (1 byte)
0004-0005           0003  number of symbol rewrites (2 bytes)
0006-0007           000e  rewrite 0 binoffset
0008-0009           0013  rewrite 1 binoffset
000a-000b           0019  rewrite 2 binoffset
000c      0000      NOP   $first byte of program
000d      0001      JMP
000e-000f 0002-0003 000c  (.lab1)
0010      0004      NOP
0011      0005      NOP   <-- .lab2
0012      0006      JMP
0013-0014 0007-0008 000f  (.lab3)
0015      0009      NOP
0016      000a      NOP
0017      000b      NOP
0018      000c      JMP    <-- .lab1
0019-001a 000d-000e 0005  (.lab2)
001b      000f      RET    <-- .lab3

Let's assume this executable is loaded into extended memory at 0xe000. After
the loader rewrites symbols, it should look like this:

ADDRESS   EXEOFFSET BINOFFSET DATA
e000-e002 0000-0002           "ODY"
e003      0003                flags (1 byte)
e004-e005 0004-0005           0003  number of symbol rewrites (2 bytes)
e006-e007 0006-0007           000e  rewrite 0 binoffset
e008-e009 0008-0009           0013  rewrite 1 binoffset
e00a-e00b 000a-000b           0019  rewrite 2 binoffset
e00c      000c      0000      NOP   $first byte of program
e00d      000d      0001      JMP
e00e-e00f 000e-000f 0002-0003 e018  *** (.lab1; e000 [$base] + 000c [$first] + 000c [$ptr] = e018)
e010      0010      0004      NOP
e011      0011      0005      NOP   <-- .lab2
e012      0012      0006      JMP
e013-e014 0013-0014 0007-0008 e01b  *** (.lab3; e000 [$base] + 000c [$first] + 000f [$ptr] = e01b)
e015      0015      0009      NOP
e016      0016      000a      NOP
e017      0017      000b      NOP
e018      0018      000c      JMP    <-- .lab1
e019-e01a 0019-001a 000d-000e e011  *** (.lab2; e000 [$base] + 000c [$first] + 0005 [$ptr] = e011)
e01b      001b      000f      RET    <-- .lab3

Then $base+$first (e00c) is the CALL_D target


2024-09-22: FAT16 traversal phase 1
===================================

Good progress today! I was able to get the code size reduced enough
to be able to work on additional FAT16 computation code. This involved
rewriting heap.asm to be more compact (but slightly slower as it now uses
the stack and calls), and to remove the clock and console commands from
the shell.

I then added functions to fat16_calc.asm for converting cluster numbers
to LBA and vice-versa. I added shell functions for testing this out
and confirmed that I am able to traverse the filesystem:

    > mount

    "mounts" the FAT16 filesystem on ATA device 0 (the SD card)  "mounting"
    just means a 128-byte memory region is allocated and now contains the
    filesystem data (which is printed for convenience)

    The output shows that the root directory starts at LBA 0x0000007c.

    > ata-read 0 0 0x007c 0xd000

    Reads a 512-byte sector from ATA device 0, at LBA address 0x0000007c into
    memory at address 0xd000 - this is the first sector of the root directory.

    > hexdump 0xd000 +127

    Shows the first four directory entries of the root directory.  Each entry
    is 32 bytes (two rows).  The fourth entry is FOO.TXT and at offset 0x1a is
    the value 0x0005 (in little endian format) which is the FAT16 cluster
    number of the file.

    > cluster2lba 0x6180 0x0005

    Convert the cluster number to an LBA sector number (0x000000a8)

    > ata-read 0 0 0x00a8 0xd000

    Read sector 0x000000a8 into memory at 0xd000

    > hexdump 0xd000 +31

    Read the first 31 bytes of the sector out of memory, showing the file
    contents.

    > next_cluster 0x6180 5

    Reads the FAT and returns the value of the fifth cluster entry, in this
    case 0xffff meaning "end of file".  Reading other FAT entries works
    correctly, including entry 0 = 0xfff8, and entry 1 = 0xffff. Entries 2, 3,
    4 all are 0x0000 (free) and 5 is 0xffff (end of file).

The next thing to do is to implement a "dir" command that can print the
contents of the current directory. This isn't strictly necessary to move
on to the next steps, but it is a utility that will be handy to have in the
BIOS in the event that the system boots without being able to find a shell.

Finally I'll start working on a loader that can read a file from disk into
memory, resolve its address stuff, then call the first address.


2024-09-08: FAT16 mount stub
============================

I got the initial code done for mounting a FAT16 filesystem.  The
filesystem is "mounted" by reading the boot sector and transcribing
the filesytem data into a 128-byte data structure, which will then
be passed around to the various other FAT16 functions to perform
filesystem operations.  The data structure also contains the current
state of the filesytem, such as the current working directory.

Next I need to write the conversion utilities, at least cluster2sector
and sector2cluster. Then I can start working on the "dir" utility.

One challenge is that I'm almost out of ROM space.  I may have to
take a diversion and start stripping down the os_shell stuff
to get closer to the endgame of a basic BIOS that only has the code
to boot from the SD card, which means removing most of the shell
code and maybe even some library code that is only used by
certain applications.


2024-09-02: ATA read/write sectors
==================================

Finished up the ata.asm code for reading and writing arbitrary sectors.
Confirmed operation by writing some known content onto an SD card, and
showing that I could read it and the contents looked sane.  Then I used
`poke` to change the data, write it back, and then confirmed I could
read the updated data on my laptop.

Next step is to get the FAT16 mount operation working, which will involve
a series of global variables that store the various LBA offsets for
things like the FAT, root directory, etc. Finally a "dir" command that
can list the contents of a directory, and a "cd" command that can change
directories. And a "pwd" command to print the current directory, along
with updating the shell to display the current directory.

Once it is possible to navigate the filesystem, I need to update the
assembler so it can generate a symbol table, and use a symbol table
provided as input during assembly. I can then build the "BIOS" and flash
it to the ROM, then take the resulting symbol table to assemble other
binaries that will be writen to the SD card and executed from RAM.
To do this, I'll need to create an executable file format and a loader
that the shell/BIOS can call. The executable format will be something
like the following:

Offset Size Description
------ ---- -----------
0x0000  4   'ODEX' indicator that this is an Odyssey executable
0x0004  2   Size of binary, ignoring the first six bytes
...
last        last byte of binary
last+1  2   number of addresses to be updated
last+3  2   offset of address
last+5  2   offset of address
last+7  2   offset of address
...

So the loader will inspect the first sector of the executable and
ensure that it is an Odyssey executable.  If so, it is able to pull
the size of the binary and thus allocate sufficient RAM to load it.
It can then load the binary into RAM.  However, all the memory
addresses that refer to things inside the program, need to be updated
to use the new base address of the program.  So after the program
is a list of offsets that need to be updated.  The loader goes
to each of those offsets and adds the program's base address to
the value present at that address, thus making the address valid.

Once all that is done, the loader saves the CPU state and CALLs 
the executable in RAM.


2024-08-17: Clock/Reset repair and SD-adapter mounting
======================================================

Fixed up the CLK circuitry so that it buffers the output of the AND gate
that defines the 75% duty cycle through a D-FF.  Works like a champ, but
the clock signal doesn't look too much different.  There's still a weird
jitter at the midpoint of the rising and falling edges of the clock.
This work, plus replacing the inverter in column A with a proper schmitt
trigger inverter, fixed up the reset signaling too.

I then tackled the issue of unstable startup and random hangs.  I was
able to configure the logic analyzer to trigger on a HLT instruction
and observed that the ROM's /OE signal was flapping just before the HLT
instruction was triggered.  I traced this back to a burned out AND gate
on the video card board.  I replaced it with a 74AC08 and everything
looks to be super stable now.

And finally, I mounted the CF-card adapter to the acrylic backpanel, so
I'm set up now for doing the FAT16 implementation.


2024-08-11: ATA detection on boot
=================================

I finished writing the ATA detection code in lib/ata.asm, and so now when
the Odyssey boots up, it displays the connected ATA devices.  The master
and slave devices both work!  And I was able to successfully test both
the CF card adapter and the SD card adapter, simultaneously. The timeout
is working so that if no device is present, it times out after 500ms and
reports the error.

The next step is to write data read and write routines that take an LBA
address as input. These should be pretty quick as I already have the sector_read
function used for the identify commands.

However, the big glaring problem is with the RST and CLK signals. The
Odyssey can barely start from a cold boot, and pressing the RST key does
not reliably come out of reset.  I need to scrap the whole stack of circuitry
in the A-B columns and try again.  I don't think there is enough room for
all the chips I need, but maybe I can get clever and cram it all in.

I can take the original output from the VCO, and feed that into one D-FF
to cut the cycle time in half.  Then take the original VCO signal NAND
the half-time signal, to get the CLK output.  Feed that into a NAND inverter
to get /CLK.  RST can use the other side of the D-FF to latch its output.
And I can replace the hex inverter with a schmitt trigger variety again
so the RST signal is properly debounced.  If I tear out everything I've
got done now and start from scratch, I think I can fit that in.  Especially
if I move the reset button elsewhere to free up a few more pins.


2024-08-04: ATA first driver testing
====================================

The first test looks very good!  The CF-to-IDE adapter responds on the
ATA bus correctly, and the drive ID is read in and looks sane.

Some outstanding items for ATA:
 * The .ata_wait_data_request_ready function needs a timeout so that
   a drive that is not connected does not cause a hard lock and instead
   just returns an error.
 * I found this reference describing what the device ID data contains:
   https://github.com/ps2dev/ps2sdk/blob/master/common/include/atahw.h#L267
   I need to write a decoder that returns this data in a suitable format.
 * The ATA spec:
   https://people.freebsd.org/~imp/asiabsdcon2015/works/d2161r5-ATAATAPI_Command_Set_-_3.pdf
   is very clear about the byte ordering of the drive ID command's serial
   number, firmware, and description fields. I updated .ata_read_sector
   so that these fields appear correctly garbled, which I believe should
   mean that "real" data, e.g. FAT filesystem data, is in the right order.
 * After getting device ID out of the way, I need to create generic functions
   for reading and writing arbitrary LBA sectors, which should let me at least
   start probing a CF card's FAT filesystem to see if it looks like what I
   expect it to contain.

Before I was able to get the ATA port tested, I found problems with the
RST signal...the RST signal seems to get latched low and won't let go
sometimes. This is probably due to replacing the schmitt trigger inverter
used for debouncing the RST signal, as well as the fact that the CLK
signal coming out of the NAND gate way over in section F of the board,
is noisy as hell. Replacing the NAND gate with an AC part seems to have
helped clean up the signal a bit and is at least getting the board running
again. But at some point I need to go back and do something about clock
generation and distribution to clean it up. What is really weird is that
the two input signals to the NAND gate are pretty clean, but the output
is noisy. Maybe there's just too much fanout and I need to use a device
with higher drive current.


2024-07-21: ATA hardware complete
=================================

After a couple weeks of wiring up the ATA port, I was able to test it out
today and it looks like the signal timings are working as expected.  The
high register is working too.

Next step is to write some utility functions for performing ATA reads and
writes. Probably stacked, so a set of read/write functions for reading
or writing a single word (or a byte, for the control registers), and then
wrap those functions with commands for reading and writing an entire 512-byte
sector.  With that, I should be able to write an ATA identify utility.
I'd like to be able to support both the master and slave drive, so I'll
make sure the utilities can do that as well.

If all goes well, then I should be able to add ATA identification info
in the boot sequence, and can then start working on a FAT implementation.


2024-07-07: Hexdump command
===========================

Hexdump command is implemented.  It's not quite where I want it, but it's
good enough to peek at larger memory ranges without getting a headache.


2024-07-06: Musings on _SLOW opcodes
====================================

I've already bumped into a case where I need a _SLOW instruction that doesn't
exist, and I'm running out of address space to implement it.  So I'm thinking
that since the instructions are already implied to be "slow", why not just
make all the instructions read and write to the stack? That way I don't need
duplicate opcodes just for different register source/destination.  Using a
_SLOW instruction would imply having to first put the value to be written
onto the stack (or, the value being read is deposited onto the stack). This
further slows things down my making each operation two instructions, but
it saves a ton of opcode space and since the _SLOW instructions are only used
in a few select locations, the extra PUSH or POP instruction should not be
problematic.

I've made the necessary changes to the opcodes and the current os_shell
assembly, so that the UART instructions use the new push/pop based slow
instructions. I've also updated the peek and poke commands to use _SLOW
instructions.

I need to write a "hexdump" command. Last time I tried this I was unable to fit
everything on screen at once, but I think I can make it work if I leverage
the color display so that alternating bytes are different colors.

1                                                              64
+--------------------------------------------------------------+
ADDR HEX VALUES                          ASCII
    | 0 1 2 3  4 5 6 7  8 9 a b  c d e f|0123 4567 89ab cdef|
d000|xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx|.... .... .... ....|

Each row can display 16 bytes. The command line will accept a 16-bit
starting address and a 16-bit ending address.  Or the second
argument can be in the form of +<integer> which is how many bytes
to display from the starting address.

This is one of those cases where having a 64-color palette is going
to be handy, because I can use slightly different shades of colors on
alternating rows, as well as alternating the colors in the hex values
to differentiate them.

Address odd row: R3/G1/B0 even row: R3/G2/B0
Hex value odd row, even byte: R0/G2/B0 odd byte: R0/G2/B3
Hex value even row, even byte: R2/G2/B2 odd byte: R0/G3/B3


2024-06-08: UART flow control research
======================================

I replaced the crystal in the UART with a 14.7546MHz model and fixed up the
serial settings, and confirmed that all the baud rates from 1200 to 115200 now
work.  However, anything faster than 9600 results in lost characters when using
the console utility.  Slow input/output (like typing at the command line) works
fine, but as soon as a steady stream of text is being sent by the computer,
the Odyssey skips over characters and the output gets garbled.

I believe this to be because the UART receive buffer is overflowing, since there
is nothing in the code to prevent the write buffer pointer from overtaking the
read buffer pointer. Once that happens, 256 bytes of data just get ignored and
lost.

What I need, then, is some kind of flow control to tell the computer to stop
sending data when the buffer gets full.  I spent all afternoon trying to get
RTS/CTS flow control to work with gtkterm, and so far I have failed.

I did, however, confirm that toggling the /RTS bit in the modem control register
(MCR) correctly causes the computer side to see its CTS signal toggling, meaning
that a hardware flow control mechanism does appear to be wired up correctly.
The UART circuitry is wired up assuming the possibility of a "five wire" serial
interface, which uses the RTS/CTS hardware flow control mechanism.

But every time I configure gtkterm to enable RTS/CTS flow control, it de-asserts
its own /RTS signal, which causes the Odyssey to see a de-asserted /CTS signal,
and so the UART (correctly) refuses to send any data. Further, I can't get the
computer to send data to the Odyssey either, despite the Odyssey setting its
own /RTS bit.

The RS232 wiki page describes RTS/CTS (really RTR/CTS, as the "RTS" signal was
coopted by this scheme, which is why it seems weird for the UART to set "ready
to send" in order to tell the remote side to "start sending data").

  In this scheme, commonly called "RTS/CTS flow control" or "RTS/CTS
  handshaking" (though the technically correct name would be "RTR/CTS"), the
  DTE asserts RTS whenever it is ready to receive data from the DCE, and the
  DCE asserts CTS whenever it is ready to receive data from the DTE. Unlike the
  original use of RTS and CTS with half-duplex modems, these two signals
  operate independently from one another. This is an example of hardware flow
  control. However, "hardware flow control" in the description of the options
  available on an RS-232-equipped device does not always mean RTS/CTS
  handshaking.

I need to read through this more carefully (and perhaps try to find an example
of an actual transmit/receive example) to see where I'm implementing it
incorrectly.

The other problem might be near the end of that paragraph, where it talks
about the "original use" of CTS/RTS for half-duplex operations. It might be
that the 80C52 is sufficiently old that it can _only be used for half duplex
operation_ using RTS/CTS signals. Which is probably fine, but it means I
need to again figure out what that protocol looks like. Unfortunately it's
likely not the same as the RTS/CTS flow control used by gtkterm.

The other option might be to use software flow control (Xon/Xoff).  But this
has a distinct disadvantage in that there are then two ASCII codes that cannot
be transmitted over the wire, because they are interpreted as Xon or Xoff.
So I would really prefer to have RTS/CTS flow control operating.

I don't think it's going to be safe to just ignore it and not depend on some
kind of flow control, because regardless of the baud rate used, there's always
going to be the possibility of a race condition where more data is received
than the buffer can hold.  And once the receive buffer overflows, the only
options are to either make the sender stop sending data, or to stop writing
to the buffer, or to let the buffer overflow.  And the only option in that
list that doesn't cause data corruption or lost data is flow control of some
kind.


2024-06-02: Heavy interrupt handling crash debugging
====================================================

Now that the UART is working again, it's back to trying to find out why receiving
long streams of text over serial causes a crash.

Observations:
  * The crash appears to happen when a `RET` or `RETI` call returns to the wrong address
  * It can crash on boot as well, so both :uart_clear_dr and :uart_irq_dr_buf are affected

I captured a few cases of this occurring and saw a common pattern -- just
before an interrupt instruction would occur, there would be an undefined opcode
like 0xFE, then the 0xEE opcode would take over.

Looking at this event in timing mode I can see that this occurs when the IRQOP
signal is asserted too close to a rising clock edge -- this does not give enough
time for the data bus to settle and an invalid opcode gets loaded into the OPCODE
register.

To resolve this, the IRQOP signal needs to be latched. A transparent latch would
be ideal, and allow locking out a transition during the low clock period. But
given we actually have a spare D flip flop from the work fixing the UART, I can
just reroute the current IRQOP signal into that spare flip flop, and send the output
of it back to the original destination.

I did this, and it works!  The system is now completely stable running at 2.2MHz
over a serial connection.  No more random crashes, at least not while interacting
with the serial port. There are probably still some gremlins elsewhere in the
memory bus due to the high propagation time of control signals and the slow rise
and fall times for the data and address buses.

The next issue I am sorting out is the fact that only 9600 baud (and a few
slower ones) actually work.  After some research it looks like this is because
around 1.3% error in the baud rate is acceptable, and all but 9600 baud is
at 1.7% error rate, so the bits just aren't making it across.  I need a different
crystal to achieve <1% error rates.  The 82C52 has a maximum clock speed of
16MHz. After some poking around I found that a 14.7456MHz crystal will do the
trick:

{14745.6: {300: [(4, '768', 0.0)],
           600: [(3, '512', 0.0)],
           1200: [(1, '768', 0.0), (3, '256', 0.0), (4, '192', 0.0)],
           2400: [(3, '128', 0.0)],
           4800: [(1, '192', 0.0), (3, '64', 0.0)],
           9600: [(3, '32', 0.0)],
           19200: [(3, '16', 0.0)],
           38400: [(3, '8', 0.0)],
           57600: [(1, '16', 0.0), (3, '16/3', 0.0)],
           115200: [(1, '8', 0.0), (4, '2', 0.0)]}}

So I'll order one of these and swap it out to get get the rest of the baud
rates working.


2024-06-01: UART read/write timing fix (part 2)
===============================================

The circuitry updated on May 27 was tested, and unfortunately does not work.
This is because the ADDR_C1 (and thus, /CS0) signal is not stable across
SEQ increments, and neither is /WRAM, and neither are ADDR0/ADDR1.
The /WRAM signal can also flap (but not nearly as bad).  The DATA lines
can flap too, between seqences. So to create the proper "slow" signaling,
it's going to require adding some D flip-flops to gate the /CS0, /RD, /WR,
ADDR0, and ADDR1 signals.

The patchwork that ended up finally working was to bodge in a 74HC377 where
six of its eight flip flops could be accessed.  Five of these flip flops are
used to store ADDR0, ADDR1, ADDR_C1, /RDGEN, and /WRGEN. /RDGEN and /WRGEN
depend on the latched output of the ADDR_C1 flip-flop, and thus will always
be one clock behind (as expected).

There is still one more flip-flop available, which can be used to latch
ADDR2 if needed for some other slow device.  And there are still two more
flip-flops available, but they'd require soldering some wires to wrap pins.

All said, this succeeds at getting the serial console working again, with
the faster asymmetric clock.  And it eliminates the race condition between
the UART timing and the system clock.

However, much to my disappointment, processing long streams of text in
the console program still triggers crashes.  This is still the case even
with the CPU clocked as low as 1MHz. So it must be a coding bug (perhaps
something in the scrolling routines?)


2024-05-27: UART read/write timing fix
======================================

The problem boils down to the need for the data/address to be stable when
/RD or /WR go low, and must _stay_ stable when /RD or /WR go high. To accomplish
this, some "slow" LD (read) and ST/ALUOP_ADDR (write) opcodes will be created,
which include an extra setup step in sequence 0x4, then perform the desired
operation in sequence 0x5, and then continue holding the addr/data into
sequence 0x6 before moving to the next instruction.

This allows the /RD and /WR signals to be controlled by the sequence counter,
not the clock. The sequence counter's transition from 0b0100 to 0b0101 should
not cause any glitches due to just one bit having to transition, so the initial
/RD / /WR assertion will be well coordinated and take place very shortly after
the rising clock edge.  The transition from 0b0101 to 0b0110 could glitch, but
the only glitch state would be 0b0111, which still results in the desired
behavior of the signal being de-asserted.  Thus /RD or /WR would not be
re-asserted in a glitch at the end of sequence 0x5.

The logisim/uart_glue_logic.circ schematic shows how the 4-input NOR gate in
group C G-J can be used to generate the SEQ5 signal.  Two inverters will be
needed to generate the inputs for the 4-in NOR gate.  There is sufficient board
space in group D C-D for a hex inverter.

The existing quad NAND gate in group F E-F can be reused as it is already
generating the /RD and /WR signals.

F26-F27 gate needs F27 changed to the SEQ5 signal
F29-F30 gate can stay the same
E25-E26 gate can stay the same
E28-E29 gate can stay the same

The wiring was completed and the control ROMs updated.  New _SLOW opcodes
have been created in the 0xaN opcode range. The UART library has been
updated to use these opcodes.

However, when running the console command, it's not working, and the
system hangs when sending a character.  It's likely locked in a loop
waiting for the acknowledgement that the character was sent.

Need to do a few things next:
  1. Create `peek-slow` and `poke-slow` shell commands
  2. Update logic analyzer with new _SLOW opcodes in symbol table
  3. Probe the /RD and /WR signals on the UART to verify timings
  4. Check for lingering places where non-SLOW opcodes are used
     to talk to the UART - regular reads/writes won't work at all
     because /RD and /WR are now gated on the SEQ5 signal, not
     the clock.