We've just updated MediaWiki and its underlying software. If anything doesn't look or work quite right, please mention it to us. --RanAS

Bithacks: Difference between revisions

From SnesLab
Jump to: navigation, search
m (Went to reference this and realized... I know what I was talking about, but it was not incredibly clear.)
(I know that the overflow flag is supposed to signal this condition. Still feels like black magic tho.)
(12 intermediate revisions by 3 users not shown)
Line 1: Line 1:
Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]
Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]
to accomplish their tasks. Usually they work in a slightly non-obvious way, (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end here is a collection of some useful tricks.
to accomplish their tasks. Usually they work in a slightly non-obvious way, (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the [[65c816]]. To that end, here is a collection of some useful tricks.
<br>
<br>
'''Note: cycle counts are intended to be a worst case measure.'''
'''Note: cycle counts are intended to be a worst case measure.'''
Line 8: Line 8:
= Math Bithacks =
= Math Bithacks =
== Signed Division By 2 ==
== Signed Division By 2 ==
''7 bytes / 8 cycles''
<br>
<u>inputs:</u> A
<br>
<u>outputs:</u> A
<pre>
CMP #$80
ROR
BPL +
ADC #$00
+
</pre>
note: Rounds toward zero.
== Arithmetic Shift Right ==
''3 bytes / 4 cycles''
''3 bytes / 4 cycles''
<br>
<br>
Line 17: Line 32:
ROR
ROR
</pre>
</pre>
note: This is a sign-preserving division so -1÷2 will stay -1 since 0 is positive
note: This is similar to division by 2, but rounds toward negative infinity.


== Signed Division By 2<sup>n</sup> ==
== Arithmetic Shift Right, multiple steps ==
''6+n bytes / 6+2n cycles''
''6+n bytes / 6+2n cycles''
<br>
<br>
Line 27: Line 42:
<pre>
<pre>
; signed division by two, n times
; signed division by two, n times
macro SignedDiv_2N(n)
macro ASR_multi(n)
LSR #<n>
LSR #<n>
BIT.b #$80>><n>
BIT.b #$80>><n>
Line 36: Line 51:


; -1 cycle and +n bytes, but must have N flag set before use
; -1 cycle and +n bytes, but must have N flag set before use
macro SignedDiv_2N(n)
macro ASR_multi(n)
BMI ?negative
BMI ?negative
LSR #<n>
LSR #<n>
Line 46: Line 61:
endmacro
endmacro
</pre>
</pre>
note: This is a sign-preserving division so -1÷2 will stay -1 since 0 is positive


== Absolute Value ==
== Absolute Value ==
Line 99: Line 113:
Extents:
Extents:
db -$23, $23
db -$23, $23
</pre>
== Sign Extend ==
''13 bytes / 18 cycles''
<br>
<u>inputs:</u> 8bit value in $10
<br>
<u>outputs:</u> A
<pre>
REP #$20
LDA $10-1 ; load $10 into A high, and garbage in low
AND #$FF00 ; discard garbage
BPL +
ORA #$00FF
+
XBA
</pre>
== Clamp Signed (To Constants) ==
''16 bytes/15 cycles''
<br>
<u>inputs:</u> A
<br>
<u>outputs:</u> A
<pre>
; clamp signed value in A to [min,max] if min/max are signed constants
macro clamp_const(min,max)
EOR #$80
CMP #$80^<min> : BCS ?+
LDA #$80^<min>
?+ CMP #$80^<max> : BCC ?+
LDA #$80^<max>
?+ EOR #$80
endmacro
</pre>
</pre>


= Misc. Tricks =
= Misc. Tricks =
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small>
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small>
== XCN ==
''12 bytes / 16 cycles''
<br>
<u>inputs:</u> A
<br>
<u>outputs:</u> A
<pre>
; eXchaNge Nibble without a LUT
ASL : ADC #$00
ASL : ADC #$00
ASL : ADC #$00
ASL : ADC #$00
</pre>


== Clear Low Byte of Accumulator ==
== Clear Low Byte of Accumulator ==
Line 169: Line 231:
<u>outputs:</u> (none)
<u>outputs:</u> (none)
<pre>
<pre>
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:
; If you need to skip just one byte of dead code (due to a hijack or whatever reason) you can use:
NOP ; 1 byte, 2 cycles
NOP ; 1 byte, 2 cycles


; But if you need to skip just 2 bytes the most efficient is:
; But if you need to skip two bytes the most efficient is:
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!
WDM ; 2 bytes, 2 cycles
WDM ; 2 bytes, 2 cycles
Line 202: Line 264:
.bit5_set:
.bit5_set:
</pre>
</pre>
== Combine Carry Flag ==
''4 bytes / 8 cycles''
<br>
<u>inputs:</u> (Flag, On Stack)
<br>
<u>outputs:</u> (Carry Flag)
<pre>
; flag on stack via PHP (8-Bit A if this), etc.
; code that alters Carry Flag
PLA : BCS +
LSR
+</pre>
== Transfer Carry Flag To Overflow Flag ==
''2 bytes / 2 cycles''
<br>
<u>inputs:</u> (Carry Flag)
<br>
<u>outputs:</u> (Overflow Flag)
<pre>
ADC #$7F ; $7FFF for 16-bit
</pre>
[[Category:ASM]]

Revision as of 09:07, 5 December 2023

Bithacks are optimization tricks that utilize information in bits and bit manipulation to accomplish their tasks. Usually they work in a slightly non-obvious way, (the most famous being the fast inverse sqrt), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.
Note: cycle counts are intended to be a worst case measure.
See also: Useful Code Snippets

Math Bithacks

Signed Division By 2

7 bytes / 8 cycles
inputs: A
outputs: A

	CMP #$80
	ROR
	BPL +
	ADC #$00
	+

note: Rounds toward zero.

Arithmetic Shift Right

3 bytes / 4 cycles
inputs: A
outputs: A

	CMP #$80
	ROR

note: This is similar to division by 2, but rounds toward negative infinity.

Arithmetic Shift Right, multiple steps

6+n bytes / 6+2n cycles
inputs: A
outputs: A

; signed division by two, n times
macro ASR_multi(n)
	LSR #<n>
	BIT.b #$80>><n>
	BEQ ?positive
	ORA.b #$FF00>><n>    ; sign extension
?positive:
endmacro

; -1 cycle and +n bytes, but must have N flag set before use
macro ASR_multi(n)
	BMI ?negative
	LSR #<n>
	BRA ?end
?negative:
	LSR #<n>
	ORA.b #$FF00>><n>    ; sign extension
?end:
endmacro

Absolute Value

5 bytes / 6 cycles
inputs: A, (N Flag)
outputs: A

macro abs()
	BPL ?plus
	EOR #$FF
	INC
?plus:		; only 3 cycles if branch taken
endmacro

Absolute Value (SEC)

4 bytes / 4 cycles
inputs: A, (Carry Set)
outputs: A

; compared to the branching version this is 1 byte smaller
; it's either 2 cycles slower/faster depending on branch taken
	EOR #$7F
;	SEC		; the instant you add this in it becomes worse than the branching version
	SBC #$7F

Magnitude/Extents Check

~7 bytes / 12 cycles
inputs: A
outputs: (none)

; asks "Is [A] on the zero-side of value [X] or the far side?"
; good for magnitude checks, smaller *AND* faster than alternatives
; NOTE: in the event that it is exactly [X] it will have that value at branch
; doesn't need to be an indexed CMP but is most useful this way
; this can be used to combine the BPL and BMI checks for both signs into one
	SEC : SBC Extents,x
	EOR Extents,x
	BMI .zero_side
.far_side:
	; do things
.zero_side:
	; do things

Extents:
	db -$23, $23

Sign Extend

13 bytes / 18 cycles
inputs: 8bit value in $10
outputs: A

	REP #$20
	LDA $10-1 ; load $10 into A high, and garbage in low
	AND #$FF00 ; discard garbage
	BPL +
	ORA #$00FF
	+
	XBA

Clamp Signed (To Constants)

16 bytes/15 cycles
inputs: A
outputs: A

; clamp signed value in A to [min,max] if min/max are signed constants
macro clamp_const(min,max)
	EOR #$80
	CMP #$80^<min> : BCS ?+
	LDA #$80^<min>
?+	CMP #$80^<max> : BCC ?+
	LDA #$80^<max>
?+	EOR #$80
endmacro

Misc. Tricks

As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!

XCN

12 bytes / 16 cycles
inputs: A
outputs: A

; eXchaNge Nibble without a LUT
	ASL : ADC #$00
	ASL : ADC #$00
	ASL : ADC #$00
	ASL : ADC #$00

Clear Low Byte of Accumulator

1 byte / 2 cycles
inputs: (none)
outputs: A

; "Trashes" A but clears low byte
	TDC

Direction/Facing As Index

4 bytes / 6 cycles
inputs: A
outputs: A

; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.
; The input here is specifically a signed speed, or similar value.
	ASL
	ROL
	AND #$01

Check N Conditions True

n+7 bytes / 2n+7 cycles
inputs: A
outputs: A

; You can test for multiple conditions being true (7 conditions true, at least 5 conditions, etc.) by simply using a counter and rounding to the next power of 2 and test if that bit is set.
; You can also test for "Less than N True", "More than N", etc. with variations.
; This is almost more a coding technique, but it's super helpful, so worth pointing out.
; It can allow you to re-arrange branches of code as independent blocks among other useful things.
; You can also use any RAM instead of A at a small cost.

; Example Test For 5 True Conditions:
!Next_Highest_Power_of_2 = $08
!N_True_Target = $05
	LDA #!Next_Highest_Power_of_2!-!N_True_Target-1		; here we set up our rounding, the -1 isn't strictly necessary *most* of the time
	%TestSomeCondition()
	BCC +	; here we're going to say our test just returns carry set on true (but it could directly INC inside the code as well)
	INC
+
;	... repeat the above 5 times for different tests

N_True_Test:
	INC	; replace our -1 to bring us up to a full power of 2 if we had enough True
	AND #!Next_Highest_Power_of_2
	BEQ .false
.true:
	; N Tests were True
.false:
	; Not exactly N tests were true

Skip Dead Code

1-2 bytes / 2-3 cycles
inputs: (none)
outputs: (none)

; If you need to skip just one byte of dead code (due to a hijack or whatever reason) you can use:
	NOP		; 1 byte, 2 cycles

; But if you need to skip two bytes the most efficient is:
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!
	WDM		; 2 bytes, 2 cycles

; Finally, if you need to skip a large amount of dead code you can use BRA/JMP instead
; JMP is as fast as BRA on the SNES CPU, but will be slightly slower on SA-1, and 1 cycle slower on SPC. So BRA is recommended
; (The extra byte used for JMP in this case doesn't matter)
	BRA +		; 2 bytes, 3 cycles
	; dead code
+

Check 3 Conditions

2 bytes / 2 cycles
inputs: A
outputs: (none)

; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)
; it's worth noting that you can do up to 3 tests with a single opcode though!
; Just As A Reminder: the V & N flag are set by the *operand* to BIT not the result of the AND!
	BIT $00
	BMI .bit7_set
	BVS .bit6_set
	BNE .bit5_set	; assuming #$20 is in $00
.bit7_set:
.bit6_set:
.bit5_set:

Combine Carry Flag

4 bytes / 8 cycles
inputs: (Flag, On Stack)
outputs: (Carry Flag)

; flag on stack via PHP (8-Bit A if this), etc.
	; code that alters Carry Flag
	PLA : BCS +
	LSR
+

Transfer Carry Flag To Overflow Flag

2 bytes / 2 cycles
inputs: (Carry Flag)
outputs: (Overflow Flag)

	ADC #$7F	; $7FFF for 16-bit