65c816 for 6502 developers
This guide is meant to introduce the 65c816 to someone who is already familiar with the 6502. NES developers should be able to pick up the new things introduced by this processor pretty easily. You can view the 65c816 as a superset of the 6502. Most 6502 code will continue to work on it without changes (provided only official instructions are used), but this guide intends to highlight some differences that you should be aware of. It's not intended to be comprehensive, and you should look at more complete 65c816 documentation later on.
16-bit registers!
The biggest, most obvious change from the 6502 is that it can do 16-bit operations. To enable this, the accumulator and index registers can change between being 8-bit and 16-bit. Sort of. I'll go into more detail in the following sections.
Two additional flags are added to the flags register to control register size. One flag ("M") controls the size of accumulator operations, as well as read-modify-write operations like INC or ASL directly on a memory location. Another flag ("X") controls the size of both the X and Y index registers.
Instead of getting dedicated instructions like SEC/CLC, register sizes are changed with the new REP and SEP instructions. They take an 8-bit immediate operand and REset or SEt bits in the flags register. You can set or reset multiple bits at once by ORing their values together. For example, SEP #$30 sets M and X simultaneously. One potentially confusing thing is that M and X indicate bigger registers when clear rather than when set.
REP/SEP value | Flag | Purpose |
---|---|---|
#$01 | C | Carry |
#$02 | Z | Zero |
#$04 | I | Interrupt disable |
#$08 | D | Decimal mode |
#$10 | X | 8-bit index registers |
#$20 | M | 8-bit accumulator/memory |
#$40 | V | Overflow |
#$80 | N | Negative |
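As a quick illustration of the table, a few typical REP/SEP combinations might look like this. It's only a sketch in ca65 syntax, and the particular combinations are picked for illustration:

```asm
  rep #$30        ; clear X and M: 16-bit index registers and 16-bit accumulator
  sep #$20        ; set M: back to an 8-bit accumulator, index registers stay 16-bit
  rep #$21        ; clear M and C together: 16-bit accumulator with carry clear, ready for ADC
  sep #$11        ; set X and C together: 8-bit index registers with carry set, ready for SBC
```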
Personally I find it more friendly to change register sizes with macros, though you can certainly use REP/SEP directly if you wish (and may want to if you want to change register size at the same time as one or more other flags in the same instruction, for example putting carry into a known state for ADC/SBC). For the usual case though, macros let me avoid caring about what number corresponds to what flag, so I can just write seta8, setxy16, setaxy16 and more in my code. There are some ca65 versions of these macros in lorom-template.
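A minimal sketch of what such macros could look like in ca65 follows. The macro names are the ones mentioned above, but the bodies are an assumption written for illustration and are not copied from lorom-template. Each macro also tells the assembler about the new register size, so immediates that follow get the right width:

```asm
.p816               ; enable 65816 instructions in ca65

.macro seta8
  sep #$20          ; M = 1: 8-bit accumulator/memory
  .a8
.endmacro

.macro seta16
  rep #$20          ; M = 0: 16-bit accumulator/memory
  .a16
.endmacro

.macro setxy8
  sep #$10          ; X = 1: 8-bit index registers
  .i8
.endmacro

.macro setxy16
  rep #$10          ; X = 0: 16-bit index registers
  .i16
.endmacro

.macro setaxy8
  sep #$30          ; M = 1 and X = 1
  .a8
  .i8
.endmacro

.macro setaxy16
  rep #$30          ; M = 0 and X = 0
  .a16
  .i16
.endmacro
```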
One very important thing to consider with register sizes is that the size of an immediate operand changes too! This means that you need to let the assembler know if immediate values should be 8-bit or 16-bit. It seems like most assemblers make you specify the size of each immediate value, but ca65 works differently here. ca65 provides .a8, .a16, .i8 and .i16 directives that change the size of all immediate values that follow the directive. There's also an optional feature (named .smart) that allows REP/SEP to inform the assembler of the state of the registers. Using this feature helps, but there are still cases where you need to use the directives, especially when register sizes are changed while branching.
I personally find it good practice to use directives that specify register size at the start of every subroutine, which both documents what sizes the code expects and ensures that they'll be correct regardless of whatever code you insert before the subroutine later.
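For example, a subroutine that expects a 16-bit accumulator and 8-bit index registers could start like the sketch below. FillValue and Buffer are hypothetical names used only for illustration:

```asm
.proc FillValue
.a16                ; the caller must enter with a 16-bit accumulator
.i8                 ; ...and 8-bit index registers
  lda #$1234        ; assembled as a 3-byte instruction because of .a16
  ldx #$0E          ; assembled as a 2-byte instruction because of .i8
loop:
  sta Buffer,x      ; stores two bytes per iteration
  dex
  dex
  bpl loop          ; stops once X wraps below zero
  rts
.endproc
```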
Bigger accumulator
The accumulator is always 16-bit. When you resize the accumulator, you are actually specifying if you will use the lower 8 bits or the entire 16 when you perform operations using the accumulator. The upper byte always exists no matter what, and you can use XBA to swap (or eXchange) the upper byte (B) with the lower byte (A). In 8-bit mode this can act as a temporary storage spot, letting you load something else into the accumulator and get the old value back without using the stack.
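Here is a small sketch of the temporary-storage idea in 8-bit mode. Health, Damage and Scratch are hypothetical variables:

```asm
.a8
  lda Health        ; low byte of the accumulator = Health
  xba               ; park it in the hidden upper byte
  lda Damage        ; the low byte is free for other work; the upper byte is untouched
  sta Scratch
  xba               ; bring Health back into the low byte
```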
This means that 16-bit math, 16-bit comparisons, and pointer operations can all be simplified a lot and use a lot fewer instructions. You can simply use a 16-bit INC to step a pointer in memory forward.
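As a rough sketch of what that looks like in practice (Pointer, PositionX, VelocityX and OffScreen are hypothetical names):

```asm
.a16
  inc Pointer       ; 16-bit increment: the carry into the high byte is handled for you

  lda PositionX     ; load both bytes at once
  clc
  adc VelocityX     ; one 16-bit add, no separate high-byte step
  sta PositionX

  cmp #$0100        ; 16-bit comparison against a 16-bit immediate
  bcs OffScreen     ; taken when PositionX >= $0100
```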
Some transfer instructions always operate on the whole 16-bit accumulator regardless of its current size. These include any transfer instruction that refers to the accumulator as "C" as well as TAX/TAY when the accumulator is 8-bit but the index registers are 16-bit. In these cases you'll have to make sure that the upper 8 bits of the accumulator contain the data you want.
8-bit variables in 16-bit mode
You don't need the accumulator to be in 8-bit mode to work with 8-bit variables. You can do a 16-bit LDA on an 8-bit value, then simply AND #$00FF to clear the upper byte to make it zero instead of whatever the next byte was. I generally switch to 8-bit mode if I want to do any stores, but depending on the situation that may not be needed.
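A short sketch of this, assuming a hypothetical 8-bit variable Lives and a hypothetical table of 16-bit entries named PointsTable:

```asm
.a16
.i16
  lda Lives         ; 8-bit variable: this also reads the byte after it into the upper half
  and #$00FF        ; discard the unrelated upper byte
  asl               ; now a clean 16-bit value, doubled to index a table of words
  tax
  lda PointsTable,x
```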
Bigger index registers
X and Y's size bit works differently from the accumulator's. Their 8-bit mode effectively acts as if the upper byte is forced to zero, so if they contain a 16-bit value and are resized to 8-bit and back, they will act as if they have been ANDed with $00FF. I have run into problems before where I made X and Y 8-bit to do something with Y and forgot to save and restore X, which was modified by that resize.
16-bit index registers simplify accessing arrays that are bigger than 256 items, because you can continue using regular indexed addressing modes for them instead of needing pointers.
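For instance, copying a 1024-byte block needs nothing more than ordinary absolute,x addressing. Source and Dest are hypothetical labels:

```asm
.a8
.i16
  ldx #0
loop:
  lda Source,x      ; X can go past 255, so no pointer is needed
  sta Dest,x
  inx
  cpx #1024         ; 16-bit compare because of .i16
  bne loop
```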
Index registers as pointers
Now that index registers can be 16-bit, they're big enough to act as a pointer themselves. One technique that becomes viable on the 65c816 is storing the base address of a structure in the X register. From here, you can use 0,x to access the first byte of the structure, 1,x for the second byte, etc. and they will be able to take advantage of the smaller zeropage instructions.
Another nice thing you can do with an index register treated as a pointer is actually make the (zeropage,x) addressing mode useful! If X points to a structure's base, you can access any pointers contained inside the structure easily. You could also actually put an array of pointers somewhere other than zeropage and use (zeropage,x) with it that way, or just use it for a double-indirect pointer.
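The two ideas above can be sketched together. The field offsets and the ActorTable label are hypothetical, and the sketch assumes that the direct page register is zero and that the structure lives in memory reachable through bank 0, since zeropage,x accesses always go there:

```asm
; Hypothetical field offsets within one actor structure
ActorX      = 0     ; 16-bit
ActorY      = 2     ; 16-bit
ActorType   = 4     ; 8-bit
ActorScript = 6     ; 16-bit pointer

.a16
.i16
  ldx #ActorTable       ; X = base address of the structure
  lda ActorX,x          ; small constant offsets assemble to the zeropage,x form
  clc
  adc #2
  sta ActorX,x
  lda (ActorScript,x)   ; read the word that the structure's pointer field points to
```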
Arrays of structures
On the 6502, parallel arrays (or "structure of arrays") are usually the most efficient way to lay out things like enemy state. However on the 65c816, it actually might be a good idea to consider an array of structures instead. Consider a structure that contains both 8-bit and 16-bit fields. On the 6502 the 16-bit values would just be spread across separate arrays, but here you would want to take advantage of being able to access a 16-bit number all at once, so you would probably prefer to have the two bytes sequential in memory. The biggest (maybe only?) downside is that now you need to do a TXA \ CLC \ ADC #Size \ TAX sequence to iterate through the list, but it's probably worth it considering the other advantages.
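A loop over such an array might look roughly like this. The sizes, the field offset and the ActorTable label are all made up for the example:

```asm
ActorSize  = 8      ; hypothetical size of one structure in bytes
ActorCount = 16     ; hypothetical number of structures
ActorTimer = 2      ; hypothetical offset of a 16-bit field

.a16
.i16
  ldx #ActorTable
loop:
  dec ActorTimer,x  ; do something with a field of the current entry
  txa
  clc
  adc #ActorSize    ; advance X to the next structure
  tax
  cpx #ActorTable + ActorSize * ActorCount
  bne loop
```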
Transfers between differently sized registers
What happens if you use TAX/TAY/TXA/TYA when the accumulator and index registers are different sizes?
- 8-bit A, copied to 16-bit X/Y - Both bytes of A are copied over despite A being in 8-bit mode. You must make sure that the upper byte of A contains the value you want.
- 8-bit X/Y, copied to 16-bit A - Low byte of A = low byte of X/Y. High byte of A is zero.
- 16-bit A, copied to 8-bit X/Y - Low byte of X/Y = low byte of A. High byte of X/Y is zero.
- 16-bit X/Y, copied to 8-bit A - Low byte of A = low byte of X/Y. High byte of A is unmodified.
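The first case is the one that tends to cause bugs. One common way to handle it, sketched here with a hypothetical 8-bit variable named TileNumber, is to clear the upper byte of A before the transfer:

```asm
.a8
.i16
  lda #0
  xba               ; upper byte of A is now zero (the low byte holds the old upper byte)
  lda TileNumber    ; low byte = the 8-bit value
  tax               ; X = $0000-$00FF, with no stale upper byte
```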
Register size saving
One practice you may find useful is to start a routine with PHP and end it with PLP if you change the register size inside of it. This way, a caller will get the registers back in the same sizes they were before.
    php
    ; Insert code here
    plp
    rts
If you want to save the register values as well, you should push them before you push the register state, so that the correct sizes are restored before they are pulled. Pushing a 16-bit accumulator and then pulling an 8-bit accumulator will probably lead to a crash.
    pha
    phx
    phy
    php
    ; Insert code here
    plp
    ply
    plx
    pla
    rts
Small optimizations
There are a lot of little changes over the original 6502 that make life easier, and allow you to use fewer or smaller instructions. Most of these were introduced with the 65c02.
- INA - Increment accumulator.
- DEA - Decrement accumulator.
- PHX - Push X register.
- PHY - Push Y register.
- PLX - Pull X register.
- PLY - Pull Y register.
- TXY - Copy X register to Y register.
- TYX - Copy Y register to X register.
- STZ - Store zero. Can be indexed with X, and can be zeropage or absolute.
- BRA - Unconditional branch. You no longer have to take advantage of a flag reliably being in a given state or do a JMP in order to perform an unconditional branch.
- Indirect addressing - Indexing is no longer mandatory on indirect addressing. For example you can write things like LDA ($00).
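A few of these in use, next to the 6502 sequences they replace. Counter, Pointer (assumed to be in zeropage) and Done are hypothetical names:

```asm
.a8
  stz Counter       ; instead of lda #0 / sta Counter
  lda (Pointer)     ; instead of ldy #0 / lda (Pointer),y
  ina               ; instead of clc / adc #1
  phx               ; instead of txa / pha
  sta Counter
  plx
  bra Done          ; always taken, 2 bytes instead of the 3 a JMP needs
```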
Bit tests
The 65c816 provides a few more tools for bit tests. BIT becomes much more useful because it can now be indexed, and you can use it with an immediate operand. With an immediate operand it acts like AND without changing the accumulator, except that only the zero flag is affected.
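For example, several bits of one value can be tested in a row without reloading it. Buttons, HandleB and HandleY are hypothetical names:

```asm
.a8
  lda Buttons
  bit #$80          ; test bit 7 without destroying A
  bne HandleB
  bit #$40          ; A still holds Buttons, so test the next bit
  bne HandleY
```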
TRB and TSB are two of my favorite new instructions. They take a zeropage or absolute address (non-indexed) and do a bit test, setting the zero flag as if an AND had been performed between the accumulator and memory. Next, the memory address is changed, with TRB clearing all bits that are set in the accumulator, and TSB setting all bits that are set in the accumulator. TSB can be a drop-in replacement for most places you would use ORA followed by STA on the same address. I've found it incredibly useful for piecing a value together from multiple parts, as well as just turning flags on and off in variables.
    lda XPos
    sta Temp        ; low four bits come from XPos
    lda YPos
    asl
    asl
    asl
    asl             ; move YPos into the high four bits
    tsb Temp        ; merge it into Temp
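Turning individual flags on and off works the same way. In this sketch StatusFlags and WasAlreadyClear are hypothetical names:

```asm
.a8
  lda #%00000100
  tsb StatusFlags       ; set bit 2, leave the other bits alone
  lda #%00000001
  trb StatusFlags       ; clear bit 0; Z is set if bit 0 was already clear
  beq WasAlreadyClear
```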
Jump tables
Jump tables on the 6502 require you to either push the address you want to jump to onto the stack, or store it to an address before using an indirect jump. On the 65c816, there are instructions that are specifically for jump tables. JMP (absolute,x) and JSR (absolute,x) both exist, though you'll still have to do it the old way if you need to preserve X or want to jump to a 24-bit address. One important thing to note is that RTI expects a 24-bit address now, so that needs to be taken into account if you're using the RTI variant of the RTS trick.
      asl             ; each table entry is two bytes
      tax
      jmp (Table,x)
    Table:
      .addr Routine1
      .addr Routine2
      .addr Routine3
Banks and 24-bit addresses
Banks work very differently on the 65c816 than they would on a 6502 system like the NES. Different parts of memory are not normally swapped in and out of visibility, because the address space is now 24-bit, which provides a whole 16 megabytes.
24-bit program counter
To go with the bigger address space, the program counter is now 24-bit too, with a "bank" byte added to it. JMP, JSR, and RTS still use 16-bit addresses, which keep the program counter within the current bank. There are now JML, JSL and RTL instructions that jump to a full 24-bit address and change the program counter's bank. RTI also now takes a 24-bit address.
If you have a subroutine that you want to have callable from any bank, it needs to be called with JSL rather than JSR, and use RTL rather than RTS. The routine's choice of RTS/RTL must match with the instruction used to call it, or the return address will be wrong and you'll get a crash.
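A sketch of the pairing follows. The segment names "CODE" and "CODE2" and all routine names are hypothetical, and the sketch assumes a linker configuration that places the two segments in different banks:

```asm
.segment "CODE"
.proc Main
  jsl PlaySound         ; pushes a 3-byte return address
  jsr LocalHelper       ; pushes a 2-byte return address
  rts
.endproc

.proc LocalHelper
  rts                   ; pulls 2 bytes: only callable with JSR from the same bank
.endproc

.segment "CODE2"
.proc PlaySound
  rtl                   ; pulls 3 bytes: must be called with JSL
.endproc
```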
The data bank
Similar to the situation with JMP, loads, stores and other data accesses with 16-bit addresses (but not zeropage ones) get extended out to 24-bit with a bank byte. In this case it's called the "data bank" register. You can only interact with it through the PHB and PLB instructions, so setting the data bank has to involve pushing the bank number to the stack. If you want the data bank to equal the program bank, you can use the PHK instruction, which pushes the program bank. An example follows:
    php ; Save register sizes
    phb ; Save original data bank
    phk ; Push the program counter's bank
    plb ; Store it to the data bank
    ; Insert code that changes the data bank and register sizes to something else
    plb ; Restore data bank
    plp ; Restore register sizes
    rtl
You need to keep the data bank in mind when calling code that's in another bank. If you JSL somewhere, the data bank won't necessarily be correct for any lookup tables in the target code bank.
In ca65, in addition to < and > to fetch the bottom 8 bits or next 8 bits of a value/label, you can use ^ to get the bank byte. This can be used both for setting up 24-bit pointers and for setting the data bank to be correct for a specific label.
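Both uses can be sketched like this, assuming a hypothetical label LevelData and a hypothetical 3-byte zeropage variable named Pointer:

```asm
.a8
  ; Build a 24-bit pointer to LevelData in three consecutive bytes
  lda #<LevelData       ; bits 0-7
  sta Pointer+0
  lda #>LevelData       ; bits 8-15
  sta Pointer+1
  lda #^LevelData       ; bits 16-23 (the bank)
  sta Pointer+2

  ; Make the data bank match the bank that LevelData is in
  lda #^LevelData
  pha
  plb
```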
If you want to set the data bank to something other than the program bank, you can push a value with an 8-bit register, but the PEA instruction is probably your best option. It takes a 16-bit immediate value and pushes it to the stack. It's kind of annoying because you have an instruction that pushes 16-bit values but none that push 8-bit values, so the best you can do is either PLB twice or (better) push the next two values you intend on pulling. Following is a ca65 macro from lorom-template which makes this easier to work with:
    ;;
    ; Pushes two constant bytes in the order second, first
    ; to be pulled in the order first, second.
    .macro ph2b first, second
    .local first_, second_, arg
      first_ = first
      second_ = second
      arg = (first_ & $FF) | ((second_ & $FF) << 8)
      pea arg
    .endmacro
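One way this macro might be used is to line up two data banks in advance. TableA and TableB are hypothetical labels assumed to sit in different banks:

```asm
  phb                       ; save the caller's data bank
  ph2b ^TableA, ^TableB     ; one PEA pushes both bank bytes
  plb                       ; data bank = bank of TableA
  ; ... read from TableA ...
  plb                       ; data bank = bank of TableB
  ; ... read from TableB ...
  plb                       ; restore the caller's data bank
```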
Addressing modes
LDA, STA, ADC, SBC, ORA, AND, EOR, and CMP get new addressing modes that provide access to 24-bit addresses, ignoring the data bank. What follows is ca65 syntax:
- f:absolute - 24-bit address
- f:absolute, x - 24-bit address with indexing
- [zeropage] - 24-bit version of (zeropage)
- [zeropage],y - 24-bit version of (zeropage) with indexing. [zeropage,x] does not exist.
If you need to access data from a bank different from the one you have set as the data bank, you'll have to plan out how you want to use the X and Y registers, given that only X can be used with far absolute addressing.
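In ca65 these modes look like the sketch below. FarTable and Local are hypothetical labels, and Pointer is assumed to be a 3-byte pointer stored in zeropage:

```asm
.a8
.i16
  lda f:FarTable,x      ; 24-bit absolute,x: reads from FarTable's own bank
  sta Local,x           ; 16-bit absolute,x: still uses the data bank
  lda [Pointer]         ; read through the 24-bit pointer
  lda [Pointer],y       ; the same, with Y added to the pointed-to address
```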
Variables on the stack
The 65c816 makes it more feasible to put function arguments or local variables on the stack with new addressing modes on LDA, STA, ADC, SBC, ORA, AND, EOR, and CMP as well as a new 16-bit stack pointer. The new addressing modes are stack,s and (stack,s),y which add an 8-bit offset to the stack pointer. Remember that the stack pointer points to the next available slot, so 1,s will go to the most recently pushed byte, 2,s will go to the next recently pushed byte and so on.
If you want to work with values on the stack, you should be aware of the TSC and TCS instructions. With these you can easily copy the stack pointer into the accumulator, subtract for however many local variables you want to make room for, and copy back to the stack pointer.
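A rough sketch of a routine that reserves two 16-bit locals this way, assuming it was called with JSR and entered with a 16-bit accumulator:

```asm
.a16
  tsc
  sec
  sbc #4            ; make room for two 16-bit locals
  tcs

  lda #100
  sta 1,s           ; first local (offsets 1-2)
  lda #0
  sta 3,s           ; second local (offsets 3-4)

  lda 1,s
  clc
  adc 3,s           ; ADC is one of the instructions that has the stack,s mode
  tax               ; keep the result somewhere before A is reused

  tsc
  clc
  adc #4            ; release the locals again
  tcs
  rts
```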
Probably the biggest downside to using the stack like this is that only the above instructions work with it. LDX, INC and such don't have the addressing modes available.
Movable "zeropage"/direct page
The 65c816 allows you to move zeropage to anywhere in the first 64KB of the address space. As a result, it's usually renamed to the "direct page". You're provided the TCD and TDC instructions to copy the accumulator to/from the base of the direct page. Direct page does not even need to start on a page boundary, but if it doesn't then there is a cycle penalty on direct page instructions.
You could move the direct page to the start of a structure, but I personally wouldn't do this. I would actually generally leave the direct page at zero.
Decimal mode
This isn't new to the 65c816, but will be new to NES developers. The SNES has a functional decimal mode! You should consider using it for values that are mostly for displaying, like money amounts or the score. One thing to keep in mind is that decimal mode only applies to ADC and SBC, so increments and decrements must be done using those.
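As a sketch, adding points to a score kept as four BCD digits could look like this. Score and Lives are hypothetical variables:

```asm
.a16
  sed               ; decimal mode on
  lda Score         ; four BCD digits, so $0990 means 990
  clc
  adc #$0025        ; add 25 points
  sta Score         ; $0990 + $0025 gives $1015 in decimal mode
  cld               ; decimal mode off again

.a8
  sed
  lda Lives
  clc
  adc #1            ; INC would not do a decimal adjust, so use ADC
  sta Lives
  cld
```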
SNES-specific math
The SNES has multiplication and division I/O registers. You get unsigned 8-bit × 8-bit = 16-bit, unsigned 16-bit ÷ 8-bit = 16-bit, and a signed 16-bit × 8-bit = 24-bit multiplier that reuses hardware from Mode 7. The unsigned math functions have a delay before the results are valid, in which you must either find other work to do or just waste time, and the signed multiplier provides results immediately but is not usable while Mode 7 is in use.
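A minimal sketch of the unsigned multiplier follows. The register names and addresses come from SNES hardware documentation rather than from this guide, and the sketch assumes the data bank is one where the I/O registers are visible:

```asm
WRMPYA = $4202      ; multiplicand
WRMPYB = $4203      ; multiplier; writing this starts the multiplication
RDMPYL = $4216      ; 16-bit product (low byte here, high byte at $4217)

.a8
  lda #12
  sta WRMPYA
  lda #10
  sta WRMPYB
  nop               ; waste time until the result is valid
  nop
  nop
  nop
  rep #$20          ; switch to a 16-bit accumulator
.a16
  lda RDMPYL        ; 12 * 10 = 120
```

In real code the NOPs would ideally be replaced with useful work that doesn't touch the multiplier.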