We've just updated MediaWiki and its underlying software. If anything doesn't look or work quite right, please mention it to us. --RanAS

Bithacks: Difference between revisions

From SnesLab
Jump to: navigation, search
Line 15: Line 15:
</pre>
</pre>


'''Signed Division By 2<sup>n</sup>''' ''(6+2n bytes / 5+2n cycles)''
'''Signed Division By 2<sup>n</sup>''' ''(6+n bytes / 6+2n cycles)''
<br>
<br>
<u>inputs:</u> A, n
<u>inputs:</u> A, n
Line 23: Line 23:
; signed division by two, n times
; signed division by two, n times
macro SignedDiv_2N(n)
macro SignedDiv_2N(n)
BMI ?negative
LSR #<n>
LSR #<n>
BRA ?end
BIT.b #$80>><n>
BEQ +
ORA.b #$FF00>><n>    ; sign extension
+
endmacro
 
; -1 cycle and +n bytes, but must have N flag set before use
macro SignedDiv_2N(n)
    BMI ?negative
    LSR #<n>
    BRA ?end
?negative:
?negative:
LSR #<n>
    LSR #<n>
ORA #($FF<<(8-<n>)) ; sign extension
ORA.b #$FF00>><n>   ; sign extension
?end:
?end:
endmacro
endmacro

Revision as of 12:56, 12 April 2021

Bithacks are optimization tricks that utilize information in bits and bit manipulation to accomplish their tasks. Usually they work in a slightly non-obvious way, (the most famous being the fast inverse sqrt), and bit manipulation in general is harder on the 65c816. To that end here is a collection of some useful tricks.
Note: cycle counts are intended to be a worst case measure.

Math Bithacks

Signed Division By 2 (3 bytes / 4 cycles)
inputs: A
outputs: A

	CMP #$80
	ROR

Signed Division By 2n (6+n bytes / 6+2n cycles)
inputs: A, n
outputs: A

; signed division by two, n times
macro SignedDiv_2N(n)
	LSR #<n>
	BIT.b #$80>><n>
	BEQ +
	ORA.b #$FF00>><n>    ; sign extension
+
endmacro

; -1 cycle and +n bytes, but must have N flag set before use
macro SignedDiv_2N(n)
    BMI ?negative
    LSR #<n>
    BRA ?end
?negative:
    LSR #<n>
	ORA.b #$FF00>><n>    ; sign extension
?end:
endmacro

Absolute Value (5 bytes / 6 cycles)
inputs: A, n
outputs: A

macro abs()
	BPL ?plus
	EOR #$FF
	INC
?plus:		; only 3 cycles if branch taken
endmacro

Absolute Value (SEC) (4 bytes / 4 cycles)
inputs: A, (Carry Set)
outputs: A

; compared to the branching version this is 1 byte smaller
; it's either 2 cycles slower/faster depending on branch taken
	EOR #$7F
;	SEC		; the instant you add this in it becomes worse than the branching version
	SBC #$7F

Magnitude/Extents Check (~7 bytes / 12 cycles)
inputs: A
outputs: (none)

; asks "Is [A] on the zero-side of value [X] or the far side?"
; good for magnitude checks, smaller *AND* faster than alternatives
; NOTE: in the event that it is exactly [X] it will have that value at branch
; doesn't need to be an indexed CMP but is most useful this way
; this can be used to combine the BPL and BMI checks for both signs into one
	SEC : SBC Extents,x
	EOR Extents,x
	BMI .zero_side
.far_side:
	; do things
.zero_side:
	; do things

Extents:
	db -$23, $23

Misc. Tricks

As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!

Direction/Facing As Index (4 bytes / 6 cycles)
inputs: A
outputs: A

; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.
	ASL
	ROL
	AND #$01

Skip Dead Code (1-2 bytes / 2-3 cycles)
inputs: (none)
outputs: (none)

; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:
	NOP		; 1 byte, 2 cycles

; But if you need to skip just 2 bytes the most efficient is:
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!
	WDM		; 2 bytes, 2 cycles

; Finally, if you need to skip a large amount of dead code you can use JMP instead
; (BRA is just as fast but has less range, it's advantage is mostly in reducing code size, but that's not an issue at this point)
	JMP +		; 3 bytes, 3 cycles
	; dead code
+

Check 3 Conditions (2 bytes / 2 cycles)
inputs: A
outputs: (none)

; just the opcode as normal here (not counting the conditions), you can use other operands instead of #$20
; it's worth noting that you can do up to 3 tests with a single opcode though!
; just as a reminder: the V and N flag are set by the *operand* to BIT not the result of the AND!
; this means you can check the input [A] on bit 6 with #$40 for example
	BIT #$20
	BMI .bit7_set
	BVS .bit6_set
	BNE .bit5_set
.bit7_set:
.bit6_set:
.bit5_set: