We've just updated MediaWiki and its underlying software. If anything doesn't look or work quite right, please mention it to us. --RanAS


From SnesLab
Revision as of 09:07, 5 December 2023 by Runic Rain (talk | contribs) (I know that the overflow flag is supposed to signal this condition. Still feels like black magic tho.)
Jump to: navigation, search

Bithacks are optimization tricks that utilize information in bits and bit manipulation to accomplish their tasks. Usually they work in a slightly non-obvious way, (the most famous being the fast inverse sqrt), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.
Note: cycle counts are intended to be a worst case measure.
See also: Useful Code Snippets

Math Bithacks

Signed Division By 2

7 bytes / 8 cycles
inputs: A
outputs: A

	CMP #$80
	BPL +
	ADC #$00

note: Rounds toward zero.

Arithmetic Shift Right

3 bytes / 4 cycles
inputs: A
outputs: A

	CMP #$80

note: This is similar to division by 2, but rounds toward negative infinity.

Arithmetic Shift Right, multiple steps

6+n bytes / 6+2n cycles
inputs: A
outputs: A

; signed division by two, n times
macro ASR_multi(n)
	LSR #<n>
	BIT.b #$80>><n>
	BEQ ?positive
	ORA.b #$FF00>><n>    ; sign extension

; -1 cycle and +n bytes, but must have N flag set before use
macro ASR_multi(n)
	BMI ?negative
	LSR #<n>
	BRA ?end
	LSR #<n>
	ORA.b #$FF00>><n>    ; sign extension

Absolute Value

5 bytes / 6 cycles
inputs: A, (N Flag)
outputs: A

macro abs()
	BPL ?plus
?plus:		; only 3 cycles if branch taken

Absolute Value (SEC)

4 bytes / 4 cycles
inputs: A, (Carry Set)
outputs: A

; compared to the branching version this is 1 byte smaller
; it's either 2 cycles slower/faster depending on branch taken
	EOR #$7F
;	SEC		; the instant you add this in it becomes worse than the branching version
	SBC #$7F

Magnitude/Extents Check

~7 bytes / 12 cycles
inputs: A
outputs: (none)

; asks "Is [A] on the zero-side of value [X] or the far side?"
; good for magnitude checks, smaller *AND* faster than alternatives
; NOTE: in the event that it is exactly [X] it will have that value at branch
; doesn't need to be an indexed CMP but is most useful this way
; this can be used to combine the BPL and BMI checks for both signs into one
	SEC : SBC Extents,x
	EOR Extents,x
	BMI .zero_side
	; do things
	; do things

	db -$23, $23

Sign Extend

13 bytes / 18 cycles
inputs: 8bit value in $10
outputs: A

	REP #$20
	LDA $10-1 ; load $10 into A high, and garbage in low
	AND #$FF00 ; discard garbage
	BPL +
	ORA #$00FF

Clamp Signed (To Constants)

16 bytes/15 cycles
inputs: A
outputs: A

; clamp signed value in A to [min,max] if min/max are signed constants
macro clamp_const(min,max)
	EOR #$80
	CMP #$80^<min> : BCS ?+
	LDA #$80^<min>
?+	CMP #$80^<max> : BCC ?+
	LDA #$80^<max>
?+	EOR #$80

Misc. Tricks

As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!


12 bytes / 16 cycles
inputs: A
outputs: A

; eXchaNge Nibble without a LUT
	ASL : ADC #$00
	ASL : ADC #$00
	ASL : ADC #$00
	ASL : ADC #$00

Clear Low Byte of Accumulator

1 byte / 2 cycles
inputs: (none)
outputs: A

; "Trashes" A but clears low byte

Direction/Facing As Index

4 bytes / 6 cycles
inputs: A
outputs: A

; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.
; The input here is specifically a signed speed, or similar value.
	AND #$01

Check N Conditions True

n+7 bytes / 2n+7 cycles
inputs: A
outputs: A

; You can test for multiple conditions being true (7 conditions true, at least 5 conditions, etc.) by simply using a counter and rounding to the next power of 2 and test if that bit is set.
; You can also test for "Less than N True", "More than N", etc. with variations.
; This is almost more a coding technique, but it's super helpful, so worth pointing out.
; It can allow you to re-arrange branches of code as independent blocks among other useful things.
; You can also use any RAM instead of A at a small cost.

; Example Test For 5 True Conditions:
!Next_Highest_Power_of_2 = $08
!N_True_Target = $05
	LDA #!Next_Highest_Power_of_2!-!N_True_Target-1		; here we set up our rounding, the -1 isn't strictly necessary *most* of the time
	BCC +	; here we're going to say our test just returns carry set on true (but it could directly INC inside the code as well)
;	... repeat the above 5 times for different tests

	INC	; replace our -1 to bring us up to a full power of 2 if we had enough True
	AND #!Next_Highest_Power_of_2
	BEQ .false
	; N Tests were True
	; Not exactly N tests were true

Skip Dead Code

1-2 bytes / 2-3 cycles
inputs: (none)
outputs: (none)

; If you need to skip just one byte of dead code (due to a hijack or whatever reason) you can use:
	NOP		; 1 byte, 2 cycles

; But if you need to skip two bytes the most efficient is:
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!
	WDM		; 2 bytes, 2 cycles

; Finally, if you need to skip a large amount of dead code you can use BRA/JMP instead
; JMP is as fast as BRA on the SNES CPU, but will be slightly slower on SA-1, and 1 cycle slower on SPC. So BRA is recommended
; (The extra byte used for JMP in this case doesn't matter)
	BRA +		; 2 bytes, 3 cycles
	; dead code

Check 3 Conditions

2 bytes / 2 cycles
inputs: A
outputs: (none)

; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)
; it's worth noting that you can do up to 3 tests with a single opcode though!
; Just As A Reminder: the V & N flag are set by the *operand* to BIT not the result of the AND!
	BIT $00
	BMI .bit7_set
	BVS .bit6_set
	BNE .bit5_set	; assuming #$20 is in $00

Combine Carry Flag

4 bytes / 8 cycles
inputs: (Flag, On Stack)
outputs: (Carry Flag)

; flag on stack via PHP (8-Bit A if this), etc.
	; code that alters Carry Flag
	PLA : BCS +

Transfer Carry Flag To Overflow Flag

2 bytes / 2 cycles
inputs: (Carry Flag)
outputs: (Overflow Flag)

	ADC #$7F	; $7FFF for 16-bit