mc09 L2 Port status Nov 2016
These notes describe the state of NitrOS-9 Level 2 for mc09 as of November 2016. They are of historic interest only (if that) because the port is now complete and working.
I deliberately made the mc09 MMU behave like the one on the COCO with the aim of allowing a Level 2 port.
Debug of NitrOS-9 is tricky for me because my emulator (exec09) does not support the CWAI instruction, which is used in the kernel. For the Level 1 port I did lots of debugging in the emulator to get the kernel up and running, then I had to debug on real hardware.
(Update: the Level 2 port activity included removing that restriction from exec09; both Level 1 and Level 2 can boot and run in exec09 now).
I started looking at a Level 2 port but did not make a lot of progress. NitrOS-9 was written a number of years ago and there seems to be little or no active development. I posted a few questions on the coco mailing list but failed to get much technical engagement. The only other people who were interested in this port did not have the knowledge/skills/time to contribute. I kind of lost interest and started looking at FUZIX instead.
The one person who did some actual coding on this was Bill Nobel. He had also given me some help in fixing a bug in my SD driver that came from my limited understanding of the device driver data structures.
22Jan2016 Bill explained to me that the top address range in the MMU has a special behaviour (this is not covered in the NitrOS-9 documentation that I used to specify the behaviour of my design):
- The top 512 bytes of address space ($FE00-$FFFF) are always RAM and always fixed, no matter what pages are mapped by the MMU. (The Coco3 uses block $3F as this top page, but it could be any block.)
(In addition, on mc09, ffd0-ffdf are always I/O no matter what pages are mapped by the MMU).
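The fixed regions described above can be summarised as a small address decoder. The following Python sketch is only a behavioural model for illustration, not the RTL. The names `decode`, `BLOCK_SIZE` and `FRT_BLOCK` are my own, and it assumes 8 KByte blocks, physical block 7 as the backing store for the fixed region (as on mc09), and that the boot ROM overlay is already disabled. It models reads only; it does not show that I/O writes also land in RAM.

```python
BLOCK_SIZE = 0x2000   # 8 KByte blocks (assumption taken from the MMU description)
FRT_BLOCK = 7         # physical block that backs the fixed region on mc09


def decode(addr, mapping, frt=True):
    """Return ("io", addr) or ("ram", physical_address) for a 16-bit CPU address.

    mapping is a list of 8 physical block numbers, indexed by logical block.
    """
    if 0xFFD0 <= addr <= 0xFFDF:
        return ("io", addr)                              # I/O is always decoded
    offset = addr % BLOCK_SIZE
    if frt and addr >= 0xFE00:
        return ("ram", FRT_BLOCK * BLOCK_SIZE + offset)  # fixed top of RAM
    block = mapping[addr // BLOCK_SIZE]
    return ("ram", block * BLOCK_SIZE + offset)
```

For example, with logical block 7 mapped to physical block 6 and logical block 6 mapped to physical block 7, `decode(0xFF00, ...)` and `decode(0xDF00, ...)` give the same physical address, while `decode(0xFD00, ...)` still lands in physical block 6.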
02Feb2016 I committed modified RTL to implement this additional MMU behaviour through a special register sequence. Once enabled it cannot be disabled except by reset - Bill said that the MMU is never disabled, so that restriction is fine. In the RTL, I named this mode Fixed RAM Top (FRT). The description of the MMU behaviour is attached here as APPENDIX 1 (it is simply cut from the header of the RTL) but, in summary:
- FRT is disabled by default.
- When enabled it will map the top 512 bytes of physical block 7 (the top of the block mapped to $E000-$FFFF under the default 1-1 mapping) permanently to the top 512 bytes of the CPU address space.
- To enable FRT, do a byte store of $20 to address $FFDE; do this early in the track34 code. After this, any subsequent writes that you do to $FFDE should always have bits [7] and [5] set to 1.
- One of the LEDs is connected up to the FRT signal, for debug (it is LED_7). You should see the LED light when you do that byte store, and stay lit from then on.
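The enable sequence is easier to follow as a state machine. The Python sketch below is a toy model of the ROMDIS/MMUEN/FRT bits as I read the table in APPENDIX 1; it is not derived from the RTL itself, and the class and attribute names are made up for illustration. It ignores the TR and MAPSEL bits entirely.

```python
class MmuAdr:
    """Toy model of the ROMDIS/MMUEN/FRT bits driven by writes to MMUADR ($FFDE)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.romdis = self.mmuen = self.frt = 0

    def write(self, value):
        if value & 0x10:             # NMI write: the other bits are ignored
            return
        b7 = (value >> 7) & 1
        b5 = (value >> 5) & 1
        if not b5:                   # any write with MMUEN=0 clears FRT
            self.romdis, self.mmuen, self.frt = b7, 0, 0
        elif self.romdis and self.mmuen and not b7:
            self.frt = 1             # the "magic" write: ROM stays disabled
        else:
            self.romdis, self.mmuen = b7, 1


m = MmuAdr()
m.write(0xA0)    # ROM off, MMU on
m.write(0x20)    # magic write: FRT on, ROMDIS and MMUEN unchanged
```

Note that in this model a store of $20 only sets FRT if the ROM is already disabled and the MMU already enabled; from the reset state it simply enables the MMU.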
24Jan2016 Bill wrote: "I have never disabled the MMU on any machine once I turn it on. I will have to enable [the FRT] in order for the RAM at that $ffe0-$ffff to get the initial vectors setup at the point REL kicks in and copies the kernel from $2600 to the top of RAM. Once setup, they very rarely get changed after KRNP2 is finished initializing."
APPENDIX 2 has some FORTH code that I wrote to demonstrate that FRT is behaving as I expect.
03Feb2016 Bill wrote: "Looks good so far. I have already started to create a stripped Level 2 make structure in the Level 2 repo (I based it off your Level 1 makefiles)"
"It’s looking good for the conversion also. I started to convert the routines in the vector page I will probably strip the GrfDrv task routine as We don’t need it (this gives me some room in the page). REL & Boot though gives me a lot of space back to the system space even though it ends up being padded out. Even keeping the Level 2 Boot screen debug system."
14Feb2016 Bill wrote: "I have finally got a mc09 makefile working for Level 2. I copied the coco3 make structure to the mc09, and made the changes needed. It has a lot of extra crap included in building the disk images, but it functions."
Bill: Is there a specific reason why you split clock.asm out to a separate asm for the mc09, vs using conditionals?
Neal: When I was coding the L1 version it seemed to have little in common with the existing code so I thought it was cleaner to have a separate file. That might not be true for the L2 version in which case it might make sense to revisit the L1 decision. Let me know your opinion.
Bill: Splitting clock isn’t a problem, I was just curious.
Bill: I have REL converted, but I need to know if there is any pre-process I need to worry about coming from Camelforth. I am assuming that when Camelforth loads the Bookrack that it just releases execution to REL, no restrictions.
Neal: Your assumption is correct. Moving from CamelForth to NitrOS-9 is a one-way journey with no requirement to preserve any state.
26Feb2016 Bill wrote: "Update for the Level 2 upgrade - I have been moving along quite well until I discovered something tonight with your MMU. Nothing wrong just unexpected behaviour. I found the Level 2 memory size test in KRN doesn’t complete (just gets stuck in a endless loop). It relies on Ghosting of unused (non-existent) banks which you have stated in the MMU doc as undefined (unknown). Well it does not ghost the banks as expected. I will have to find another way to test memory size on the mc09, or hard code it to a specific size. One way or another I will find a way. I have it hard coded for my prototype at 512k right now for development purposes."
"Here is a quick video of the progress: [not attached]"
"The characters printed at the top left of screen start with the ‘R’ which means boot is in REL, next is K which is the KRN init routine, next the 'REL Boot KRN' is the module validation routine and addition of these modules to the module directory, from that point things go haywire, but I have progress. The vector page has been successfully updated to use the Coco3 style for interrupts (just not locked with FRT yet)"
Here is a memory sizing algorithm that I came up with but have not coded/tested:
- assumption: MMU is set up with "flat" 1-1 mapping
- pass 1: using one logical block, select each physical block in turn working down from 31 to 0. Write a unique value into (eg) location 0,1 in the block. For example, write the block number and inverse block number
- pass 2: using one logical block, select each physical block in turn working up from 0 to 31. Check for the unique value and stop when you fail to find it
The aliasing makes real memory (in low address space) repeat at high addresses. Doing pass 1 high->low means that the aliased views are overwritten by the real views so that pass 2 sees the correct value.
This is destructive but that can be solved by optimisation 1: assume that at least 64K is present and mapped in to logical address space. Just test block 31..8. Now the fact that these (unmapped) blocks have been messed with should not matter.
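As a sanity check of the two-pass idea (without optimisation 1), here is a Python simulation. It is a sketch only: `FakeRam` is a made-up stand-in for the hardware that either aliases non-existent blocks onto real ones or ignores writes and returns $FF for them (the $FF value is an assumption about what a floating bus might read). `size_memory` follows the two passes described above, using the block number and its inverse as the signature.

```python
class FakeRam:
    """Physical RAM of `real_blocks` blocks; higher block numbers alias or float."""

    def __init__(self, real_blocks, alias=True):
        self.real_blocks = real_blocks
        self.alias = alias
        self.cells = {}                      # (real block, offset) -> byte

    def _real(self, block):
        if block < self.real_blocks:
            return block
        return block % self.real_blocks if self.alias else None

    def write(self, block, offset, value):
        real = self._real(block)
        if real is not None:                 # writes to missing RAM are lost
            self.cells[(real, offset)] = value & 0xFF

    def read(self, block, offset):
        real = self._real(block)
        if real is None:
            return 0xFF                      # assumed floating-bus value
        return self.cells.get((real, offset), 0)


def size_memory(ram, max_blocks=32):
    """Return the number of physical blocks that really exist."""
    # Pass 1: work down, so that real blocks overwrite their aliases last.
    for block in range(max_blocks - 1, -1, -1):
        ram.write(block, 0, block)
        ram.write(block, 1, ~block)
    # Pass 2: work up and stop at the first block without its signature.
    for block in range(max_blocks):
        if ram.read(block, 0) != block or ram.read(block, 1) != (~block & 0xFF):
            return block
    return max_blocks
```

In this model the same routine gives the right answer whether or not the missing blocks alias, which is relevant given Bill's observation that mc09 does not ghost the way the Coco3 does.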
(Update: the memory sizing routine that I ended up implementing is much simpler than that; it assumes that there are only 2 legal memory sizes, corresponding to 1 or 2 memory chips fitted, and works out which of the two exist).
10Sep2016 Bill wrote: "I am having trouble packing the MMU task switching from Level 2 to fit into the Multicomp MMU. the original routine just uses a loop writing 8 bytes consecutively into the mmu. Your scheme is a little different, It needs a routine that writes 2 bytes. I am not saying I am defeated (I am a determined sob)"
"The problem is in KRN. The following routines sit at the very top of memory in the Vector page. Changing to the new MMU makes it too large to fit in that area. They are the main switching routines for the task registers and MMU. There are shadow registers in system direct page (variables that start with <D.xxx)" -- APPENDIX 3 has the code in question.
So, in conclusion, the current coding challenge is to rewrite the code in APPENDIX 3 so that it will control the Multicomp MMU and fit in the space available; that is what Bill got stuck on. He suggested previously that it's a problem that the MMU registers cannot be read back but I'm not sure whether that's really true: it seems to me that when you come to set up a mapping, you should not care what the old mapping was. In any case, I have not looked over that code in detail. I can change the programming interface to the MMU (and maybe even make it read/write) if that would solve the problem, but what I cannot do is expand the address space used by the MMU: it's stuck at 2 bytes.
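To make the size of the problem concrete, the sketch below works out the sequence of 16-bit stores to $FFDE that would load one group of 8 mappings through the two-byte interface. It is a Python model for counting and checking values only; it is not Bill's code and not the routine that eventually went into the port. The function name and parameters are hypothetical, the bit positions are taken from the MMUADR/MMUDAT description in APPENDIX 1, and the default `control` value assumes FRT is already enabled (so bits 7 and 5 stay set).

```python
ROMDIS = 0x80
TR = 0x40
MMUEN = 0x20


def task_image_stores(blocks, group, active_tr, control=ROMDIS | MMUEN):
    """Return the 16-bit values to store at $FFDE to load one group of mappings.

    blocks    : 8 physical block numbers, indexed by logical block
    group     : 0 or 1, the mapping group being programmed (MAPSEL 0-7 or 8-15)
    active_tr : 0 or 1, the group that must stay in use while programming
    control   : shadow of the ROMDIS/MMUEN bits (the registers are write-only)
    """
    adr_base = control | (TR if active_tr else 0)
    stores = []
    for logical, physical in enumerate(blocks):
        mmuadr = adr_base | (group * 8 + logical)   # high byte goes to $FFDE
        mmudat = physical & 0x7F                    # low byte goes to $FFDF
        stores.append((mmuadr << 8) | mmudat)
    return stores
```

Each mapping costs one 16-bit store, so a full task image is 8 stores, compared with the 4 `std ,x++` stores that the Coco3 loop in APPENDIX 3 uses. The model also shows why shadow copies are still needed even if the old mapping is irrelevant: every write to MMUADR carries the ROMDIS, TR and MMUEN bits along with MAPSEL.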
I think that Bill has some other stuff coded (not sure how much, as I have not seen it). It seems that he is still interested in getting this working but (as with all of us) he has restricted time to spend on his "hobby".
APPENDIX 1
-- OVERVIEW OF MMU OPERATION
-- =========================
--
-- The 6809 can address 64KBytes of memory directly, through a 16-bit address
-- bus. This will be referred to as the "logical address space". The MMU
-- considers the logical address space as 8, 8KByte blocks. Address bits
-- [15:13] identify a logical block number (0-7, 8 in total).
--
-- Up to 1MByte of RAM is supported, referred to as the "physical
-- address space" and needing 20-bit address bus. Address bits [19:13]
-- identify a physical block number (0-127, 128 in total).
--
-- Within the MMU, 6809 address lines [15:13] are used to index a
-- programmable look-up table. Each entry in the table holds a physical
-- block number, which is driven out as address lines/chip selects to RAM.
--
-- There are 16 entries in the table, arranged in two groups of 8. A register
-- bit "TR" is used to select which group is used. This allows software to
-- switch rapidly between two sets of mappings.
--
-- To program a table entry you first select the table entry using a write
-- to one register then select the physical block number for that
-- entry using a write to another register. These two operations can be
-- combined into a single 16-bit write.
--
-- Each physical block can be write-protected so that it acts like ROM.
--
-- Logical block 7 ($E000-$FFFF) acts differently in three ways:
-- 1. The boot ROM sits in this block, overlaying any RAM that is mapped
-- there. The ROM is enabled after reset but can be disabled by a
-- register write.
-- 2. The multicomp I/O is decoded in this block, in address range
-- $FFD0-$FFDF. The I/O is always present. If you map ROM to this
-- block, accesses to ROM are ignored and I/O is accessed instead.
-- If you map RAM to this block, write accesses go to I/O and to
-- RAM (ie, the RAM locations at $FFD0-$FFDF are corrupted).
-- 3. When the "Fixed RAM Top" (FRT) is enabled, the address range
-- $FE00-FFCF, $FFE0-$FFFF are *always* mapped to physical RAM
-- block 7. This 256byte region is the "vector page" on the COCO
-- (interrupted here by the I/O space). This special mapping is
-- performed for both reads and writes. Furthermore, when this
-- mapping is enabled, I/O writes will corrupt the associated
-- locations in physical RAM block 7, regardless of what RAM block
-- is mapped into logical block 7.
--
-- At reset, the MMU is disabled (giving a 1-1 mapping) but the
-- mapping registers themselves are NOT reset.
--
-- MMU PROGRAMMING INTERFACE
-- =========================
--
-- The software interface is through 2 write-only registers that
-- occupy unused addresses in the SDCARD address space:
-- $FFDE MMUADR
-- $FFDF MMUDAT
--
-- MMUADR
-- b7 ROMDIS Disable ROM. 0 after reset.
-- b6 TR Select upper group of mapping registers.
-- b5 MMUEN Enable MMU. 0 after reset.
-- b4 NMI bit.
-- b3 } MAPSEL Select mapping register to
-- b2 } write through MMUDAT. MAPSEL values 0-7 control
-- b1 } the address translation when TR=0, MAPSEL values
-- b0 } 8-15 control the address translation when TR=1.
--
-- MMUDAT
-- b7 WRPROT When 1 the physical block is read-only
-- b6 } Physical block number associated with the logical
-- b5 } block selected by the current value of MAPSEL.
-- b4 }
-- b3 }
-- b2 }
-- b1 }
-- b0 }
--
-- Magic: for NitrosL2, want a fixed 512byte region of r/w memory
-- at the top of the address space. There is no space to provide
-- an enable for this behaviour (which I call FRT for FixedRamTop)
-- and so some special magic is used, as follows:
--
-- IF ROMDIS=1 & MMUEN=1 then a write with b4=0 (see NMI behaviour
-- below) and b7=0 and b5=1 does NOT enable the ROM but actually
-- sets FRT=1. Any write with MMUEN=0 sets FRT=0 again. In summary:
--      Current            Action          End State
-- ------------------+--------------+------------------
-- ROMDIS MMUEn FRT  | ROMDIS MMUEn | ROMDIS MMUEn FRT
--   x      x    x   |    RESET     |   0      0    0
--   x      x    x   |   0      1   |   0      1    x
--   x      x    x   |   1      1   |   1      1    x
--   x      x    x   |   x      0   |   x      0    0
--   1      1    x   |   0      1   |   1      1    1
--
-- If you select a physical block that is outside the actual size
-- of your RAM, the behaviour is undefined (it will probably alias).
--
-- When MMUEN=0, logical block 0-7 selects physical block 0-7.
--
-- You can write MMUDAT, MMUADR as separate 8-bit stores or as a 16-bit
-- store.
--
-- The NMI bit should be set using an 8-bit store. On writes to
-- MMUADR with bit4=1, the state of the other data bits is ignored
-- (they do not change). This avoids the need to know the current
-- state of any of the other bits. The NMI bit is self-clearing and
-- generates an NMI edge after a specific delay. As part of a
-- carefully-controlled code sequence it can be used to interrupt
-- after execution of a single instruction (see SINGLE STEP, below)
--
-- Remember, these two registers are WRITE-ONLY!
APPENDIX 2
\ test special MMU magic from FORTH
\ from the slash to end of line is a comment - no need to type it
HEX
\ enable MMU with 1-1 mapping
MMUMAP
\ copy FORTH ROM image to block 6 at address C000-DFFF
E000 C000 2000 MOVE
\ map block 6 to address E000 and block 7 to address C000
\ ! is "store" .. <data> <address> !
2706 FFDE ! \ 7 is the CPU block, 6 is the RAM block
2607 FFDE ! \ so this is block 7 going to C000
\ copy FORTH ROM image to block 7 at address C000-DFFF
E000 C000 2000 MOVE
\ branch through RAM, disable ROM, branch back through
\ reset vector (trust me on this one..)
PIVOTRST
\ should be back at the CAMELFORTH banner, but now
\ we are running from RAM.
\ we lost all state so..
HEX
\ should be identical copies. The 20's are unused space in the ROM
DF00 20 DUMP
FF00 20 DUMP
\ write and see it written: ie the destinations are unique
1234 DF00 !
5678 FF00 !
DF00 20 DUMP
FF00 20 DUMP
\ MMU is enabled and ROM is disabled so we are in the right
\ state to enable magic behaviour
20 FFDE C! \ C! is char-store -- ie, 8-bit
\ now, the top of memory should show data from page 7
DF00 20 DUMP \ should be as before
FF00 20 DUMP \ should show data from page 7 -- see the 1234 NOT the 5678
\ two bytes straddling the border
ABCD DDFF ! \ write to page 7
DDF0 20 DUMP \ page 7 -- see both bytes cross the border
FDF0 20 DUMP \ page 6 -- see the CD because it's in the top $200 bytes
\ and again
789A FDFF ! \ write to page 6
DDF0 20 DUMP \ page 7 -- see the 9A because it's in the top $200 bytes
FDF0 20 DUMP \ page 6 -- see both bytes cross the border
APPENDIX 3
* The following routines must appear no earlier than $E00 when assembled, as
* they have to always be in the vector RAM page ($FE00-$FEFF)
* Default routine for D.SysIRQ
S.SysIRQ
lda <D.SSTskN Get current task's GIME task # (0 or 1)
beq FastIRQ Use super-fast version for system state
clr <D.SSTskN Clear out memory copy (task 0)
jsr [>D.SvcIRQ] (Normally routine in Clock calling D.Poll)
inc <D.SSTskN Save task # for system state
lda #1 Task 1
ora <D.TINIT Merge task bit's into Shadow version
sta <D.TINIT Update shadow
sta >DAT.Task Save to GIME as well & return
bra DoneIRQ Check for error and exit
FastIRQ jsr [>D.SvcIRQ] (Normally routine in Clock calling D.Poll)
DoneIRQ bcc L0E28 No error on IRQ, exit
IFNE H6309
oim #IntMasks,0,s Setup RTI to shut interrupts off again
ELSE
lda ,s
ora #IntMasks
sta ,s
ENDC
L0E28 rti
* return from a system call
L0E29 clra Force System task # to 0 (non-GRFDRV)
L0E2B ldx <D.SysPrc Get system process dsc. ptr
lbsr TstImg check image, and F$SetTsk (PRESERVES A)
orcc #IntMasks Shut interrupts off
sta <D.SSTskN Save task # for system state
beq Fst2 If task 0, skip subroutine
ora <D.TINIT Merge task bit's into Shadow version
sta <D.TINIT Update shadow
sta >DAT.Task Save to GIME as well & return
Fst2 leas ,u Stack ptr=U & return
rti
* Switch to new process, X=Process descriptor pointer, U=Stack pointer
L0E4C equ *
IFNE H6309
oim #$01,<D.TINIT switch GIME shadow to user state
lda <D.TINIT
ELSE
lda <D.TINIT
ora #$01
sta <D.TINIT
ENDC
sta >DAT.Task save it to GIME
leas ,y point to new stack
tstb is the stack at SWISTACK?
bne MyRTI no, we're doing a system-state rti
IFNE H6309
ldf #R$Size E=0 from call to L0E8D before
ldu #Where+SWIStack point to the stack
tfm u+,y+ move the stack from top of memory to user memory
ELSE
ldb #R$Size
ldu #Where+SWIStack point to the stack
RtiLoop lda ,u+
sta ,y+
decb
bne RtiLoop
ENDC
MyRTI rti return from IRQ
* Execute routine in task 1 pointed to by U
* comes from user requested SWI vectors
L0E5E equ *
IFNE H6309
oim #$01,<D.TINIT switch GIME shadow to user state
ldb <D.TINIT
ELSE
ldb <D.TINIT
orb #$01
stb <D.TINIT
ENDC
stb >DAT.Task
jmp ,u
* Flip to task 1 (used by GRF/WINDInt to switch to GRFDRV) (pointed to
* by <D.Flip1). All regs are already preserved on stack for the RTI
S.Flip1 ldb #2 get Task image entry numberx2 for Grfdrv (task 1)
bsr L0E8D copy over the DAT image
IFNE H6309
oim #$01,<D.TINIT
lda <D.TINIT get copy of GIME Task side
ELSE
lda <D.TINIT
ora #$01
sta <D.TINIT
ENDC
sta >DAT.Task save it to GIME register
inc <D.SSTskN increment system state task number
rti return
* Setup MMU in task 1, B=Task # to swap to, shifted left 1 bit
L0E8D cmpb <D.Task1N are we going back to the same task
beq L0EA3 without the DAT image changing?
stb <D.Task1N nope, save current task in map type 1
ldx #$FFA8 get MMU start register for process's
ldu <D.TskIPt get task image pointer table
ldu b,u get address of DAT image
L0E93 leau 1,u point to actual MMU block
IFNE H6309
lde #4 get # banks/2 for task
ELSE
lda #4
pshs a
ENDC
L0E9B lda ,u++ get a bank
ldb ,u++ and next one
std ,x++ Save it to MMU
IFNE H6309
dece done?
ELSE
dec ,s
ENDC
bne L0E9B no, keep going
IFEQ H6309
leas 1,s
ENDC
L0EA3 rts return
* Execute FIRQ vector (called from $FEF4)
FIRQVCT ldx #D.FIRQ get DP offset of vector
bra L0EB8 go execute it
* Execute IRQ vector (called from $FEF7)
IRQVCT orcc #IntMasks disable IRQ's
ldx #D.IRQ get DP offset of vector
* Execute interrupt vector, B=DP Vector offset
L0EB8 clra (faster than CLR >$xxxx)
sta >DAT.Task Force to Task 0 (system state)
IFNE H6309
tfr 0,dp setup DP
ELSE
tfr a,dp
ENDC
MapGrf equ *
IFNE H6309
aim #$FE,<D.TINIT switch GIME shadow to system state
lda <D.TINIT set GIME again just in case timer is used
ELSE
lda <D.TINIT
anda #$FE
sta <D.TINIT
ENDC
MapT0 sta >DAT.Task
jmp [,x] execute it
* Execute SWI3 vector (called from $FEEE)
SWI3VCT orcc #IntMasks disable IRQ's
ldx #D.SWI3 get DP offset of vector
bra SWICall go execute it
* Execute SWI2 vector (called from $FEF1)
SWI2VCT orcc #IntMasks disable IRQ's
ldx #D.SWI2 get DP offset of vector
* This routine is called from an SWI, SWI2, or SWI3
* saves 1 cycle on system-system calls
* saves about 200 cycles (calls to I.LDABX and L029E) on grfdrv-system,
* or user-system calls.
SWICall ldb [R$PC,s] get callcode of the system call
* NOTE: Alan DeKok claims that this is BAD. It crashed Colin McKay's
* CoCo 3. Instead, we should do a clra/sta >DAT.Task.
* clr >DAT.Task go to map type 1
clra
sta >DAT.Task
* set DP to zero
IFNE H6309
tfr 0,dp
ELSE
tfr a,dp
ENDC
* These lines add a total of 81 additional cycles to each SWI(2,3) call,
* and 36 bytes+12 for R$Size in the constant page at $FExx
* It takes no more time for a SWI(2,3) from system state than previously,
* ... and adds 14 cycles to each SWI(2,3) call from grfdrv... not a problem.
* For processes that re-vector SWI, SWI3, it adds 81 cycles. BUT SWI(3)
* CANNOT be vectored to L0EBF cause the user SWI service routine has been
* changed
lda <D.TINIT get map type flag
bita #$01 check it without changing it
* Change to LBEQ R.SysSvc to avoid JMP [,X]
* and add R.SysSvc STA >DAT.Task ???
beq MapT0 in map 0: restore hardware and do system service
tst <D.SSTskN get system state 0,1
bne MapGrf if in grfdrv, go to map 0 and do system service
* the preceding few lines are necessary, as all SWI's still pass thru
* here before being vectored to the system service routine... which
* doesn't copy the stack from user state.
sta >DAT.Task go to map type X again to get user's stack
* a byte less, a cycle more than ldy #$FEED-R$Size, or ldy #$F000+SWIStack
leay <SWIStack,pc where to put the register stack: to $FEDF
tfr s,u get a copy of where the stack is
IFNE H6309
ldw #R$Size get the size of the stack
tfm u+,y+ move the stack to the top of memory
ELSE
pshs b
ldb #R$Size
Looper lda ,u+
sta ,y+
decb
bne Looper
puls b
ENDC
bra L0EB8 and go from map type 1 to map type 0
* Execute SWI vector (called from $FEFA)
SWIVCT ldx #D.SWI get DP offset of vector
bra SWICall go execute it
* Execute NMI vector (called from $FEFD)
NMIVCT ldx #D.NMI get DP offset of vector
bra L0EB8 go execute it
* The end of the kernel module is here
emod
eom equ *
* What follows after the kernel module is the register stack, starting
* at $FEDD (6309) or $FEDF (6809). This register stack area is used by
* the kernel to save the caller's registers in the $FEXX area of memory
* because it doesn't get "switched out" no matter the contents of the
* MMU registers.
SWIStack
fcc /REGISTER STACK/ same # bytes as R$Size for 6809
IFNE H6309
fcc /63/ if 6309, add two more spaces
ENDC
fcb $55 D.ErrRst
* This list of addresses ends up at $FEEE after the kernel track is loaded
* into memory. All interrupts come through the 6809 vectors at $FFF0-$FFFE
* and get directed to here. From here, the BRA takes CPU control to the
* various handlers in the kernel.
bra SWI3VCT SWI3 vector comes here
nop
bra SWI2VCT SWI2 vector comes here
nop
bra FIRQVCT FIRQ vector comes here
nop
bra IRQVCT IRQ vector comes here
nop
bra SWIVCT SWI vector comes here
nop
bra NMIVCT NMI vector comes here
nop