https://sneslab.net/mw/api.php?action=feedcontributions&user=Runic+Rain&feedformat=atomSnesLab - User contributions [en]2024-03-19T02:25:46ZUser contributionsMediaWiki 1.39.5https://sneslab.net/mw/index.php?title=Direct_Page&diff=14492Direct Page2024-01-12T04:47:06Z<p>Runic Rain: </p>
<hr />
<div>The '''Direct Page''' is much like the [[zeropage]] on the 6502, but it can be relocated anywhere within the first 64K [[bank]] (bank 0).<br />
Strictly speaking, as the name suggests, it is the 256 bytes accessible via '''Direct Page Addressing''', which has special wrapping behavior.<br />
Colloquially, the term almost always refers to the '''Direct Page Register''' itself and, by extension, to the group of Direct addressing modes it affects, since the other addressing modes are not limited to a 256-byte window.<br />
The wrapping behavior ''always'' confines accesses to Bank 0; in emulation mode it may additionally confine them to a single page.<br />
<br />
On the [[SPC700]], the direct page can only be in one of two places: either coincident with page zero or page one.<br />
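
The wrapping rules can be sketched with a small Python model. This is only an illustration, not a cycle-accurate emulator: the function names are made up for this example, only the plain direct and direct-indexed modes are modelled, and the page-wrap case assumes emulation mode with the low byte of the Direct Page Register equal to zero.

```python
def direct_address(d, offset, index=0, emulation=False):
    """Bank-0 address reached by a direct (or direct,X / direct,Y) access.

    d      -- 16-bit Direct Page Register
    offset -- 8-bit operand of the instruction
    index  -- optional index register value (0 for plain direct)
    """
    if emulation and (d & 0x00FF) == 0:
        # Assumed case: emulation mode with a page-aligned D wraps inside one page.
        return (d & 0xFF00) | ((offset + index) & 0xFF)
    # Otherwise the access wraps at the end of bank 0.
    return (d + offset + index) & 0xFFFF


def spc700_direct_address(p_flag, offset):
    """SPC700: the direct page flag selects page zero ($00xx) or page one ($01xx)."""
    return (0x0100 if p_flag else 0x0000) | (offset & 0xFF)
```

For example, with D = $FFF0 and an operand of $20 the access wraps around to $0010 in bank 0 rather than spilling into bank 1.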
<br />
=== See Also ===<br />
* [[PHD]]<br />
* [[PLD]]<br />
* [[TCD]]<br />
* [[TDC]]<br />
* [[Direct Page Flag]]<br />
* [[Direct Page Register]]<br />
* [[Direct Page Addressing]]<br />
* [[Uppermost Page]]<br />
<br />
=== Reference ===<br />
* [[Eyes & Lichty]] page 198, https://archive.org/details/0893037893ProgrammingThe65816/page/n224<br />
* [[6502 Reference]] 5.1.1, http://www.6502.org/tutorials/65c816opcodes.html#5.1.1<br />
* [[6502 Reference]] 5.1.2, http://www.6502.org/tutorials/65c816opcodes.html#5.1.2<br />
[[Category:ASM]]<br />
[[Category:65c816 additions]]</div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=11862Bithacks2023-12-05T09:07:53Z<p>Runic Rain: I know that the overflow flag is supposed to signal this condition. Still feels like black magic tho.</p>
<hr />
<div>Bithacks are optimization tricks that exploit the bit-level representation of data and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation] to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the [[65c816]]. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br><br />
See also: [[Useful_Code_Snippets|Useful Code Snippets]]<br />
<br />
= Math Bithacks =<br />
== Signed Division By 2 ==<br />
''7 bytes / 8 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
BPL +<br />
ADC #$00<br />
+<br />
</pre><br />
note: Rounds toward zero.<br />
<br />
== Arithmetic Shift Right ==<br />
''3 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
note: This is similar to division by 2, but rounds toward negative infinity.<br />
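
The rounding difference between the two snippets above can be checked with a small Python model of an 8-bit accumulator. This is a sketch for illustration only; the helper names are made up here and are not part of any assembler or library.

```python
def to_signed(byte):
    """Interpret an 8-bit value as two's complement."""
    return byte - 0x100 if byte & 0x80 else byte


def asr(a):
    """Model of CMP #$80 / ROR. Returns (result, carry)."""
    carry_in = 1 if a >= 0x80 else 0           # CMP #$80 copies the sign bit into carry
    return (carry_in << 7) | (a >> 1), a & 1   # ROR: carry -> bit 7, bit 0 -> carry


def signed_div2(a):
    """Model of CMP #$80 / ROR / BPL + / ADC #$00."""
    result, carry = asr(a)
    if result & 0x80:                          # BPL not taken: the result is negative
        result = (result + carry) & 0xFF       # ADC #$00 adds back the bit shifted out
    return result
```

With A = $FD (-3), the plain shift gives $FE (-2, rounded toward negative infinity), while the division variant gives $FF (-1, rounded toward zero).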
<br />
== Arithmetic Shift Right, multiple steps ==<br />
''6+n bytes / 6+2n cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; arithmetic shift right by n (signed division by 2^n, rounding toward negative infinity)
macro ASR_multi(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must already reflect A before use
macro ASR_multi(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
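
As a sanity check, the first macro can be modelled in Python for an 8-bit accumulator. This is a sketch under the assumption that <code>n</code> is between 1 and 7; the function names are hypothetical.

```python
def to_signed(byte):
    """Interpret an 8-bit value as two's complement."""
    return byte - 0x100 if byte & 0x80 else byte


def asr_multi(a, n):
    """Model of LSR n times / BIT #$80>>n / BEQ / ORA #$FF00>>n (8-bit A, 1 <= n <= 7)."""
    a >>= n                              # n logical shifts move the sign bit to bit 7-n
    if a & (0x80 >> n):                  # BIT tests the relocated sign bit
        a |= (0xFF00 >> n) & 0xFF        # ORA fills the top n bits (sign extension)
    return a
```

For example, <code>asr_multi(0xC0, 2)</code> returns <code>0xF0</code>, that is -64 shifted right twice gives -16.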
<br />
== Absolute Value ==<br />
''5 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A, (N Flag)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
== Absolute Value (SEC) ==<br />
''4 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; compared to the branching version this is 1 byte smaller<br />
; it's either 2 cycles slower/faster depending on branch taken<br />
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
== Magnitude/Extents Check ==<br />
''~7 bytes / 12 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
== Sign Extend ==<br />
''13 bytes / 18 cycles''<br />
<br><br />
<u>inputs:</u> 8bit value in $10<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
REP #$20<br />
LDA $10-1 ; load $10 into A high, and garbage in low<br />
AND #$FF00 ; discard garbage<br />
BPL +<br />
ORA #$00FF<br />
+<br />
XBA<br />
</pre><br />
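
The data flow of the snippet above can be followed in a short Python model. It is only a sketch: <code>mem</code> stands in for RAM as a byte array, and the function name is made up for this example.

```python
def sign_extend(mem, addr=0x10):
    """Model of the 16-bit LDA $10-1 / AND / BPL / ORA / XBA sequence."""
    a = mem[addr - 1] | (mem[addr] << 8)   # 16-bit load: low byte is garbage, high byte is the value
    a &= 0xFF00                            # discard the garbage low byte
    if a & 0x8000:                         # BPL not taken: the value is negative
        a |= 0x00FF                        # pre-fill what will become the high byte
    return ((a << 8) | (a >> 8)) & 0xFFFF  # XBA swaps the two bytes
```

With <code>$FE</code> stored at $10 the result is <code>$FFFE</code>, and with <code>$7F</code> it is <code>$007F</code>, regardless of what is stored at $0F.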
<br />
== Clamp Signed (To Constants) ==<br />
''16 bytes/15 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; clamp signed value in A to [min,max] if min/max are signed constants<br />
macro clamp_const(min,max)<br />
EOR #$80<br />
CMP #$80^<min> : BCS ?+<br />
LDA #$80^<min><br />
?+ CMP #$80^<max> : BCC ?+<br />
LDA #$80^<max><br />
?+ EOR #$80<br />
endmacro<br />
</pre><br />
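
The reason for the <code>EOR #$80</code> is that flipping the sign bit maps signed order onto unsigned order, so plain <code>CMP</code>/<code>BCS</code>/<code>BCC</code> can be used. A Python sketch of the macro (hypothetical function names, 8-bit accumulator assumed):

```python
def to_signed(byte):
    """Interpret an 8-bit value as two's complement."""
    return byte - 0x100 if byte & 0x80 else byte


def clamp_const(a, lo, hi):
    """Model of the clamp_const macro; lo and hi are signed constants with lo <= hi."""
    lo_biased = (lo & 0xFF) ^ 0x80       # #$80^<min>
    hi_biased = (hi & 0xFF) ^ 0x80       # #$80^<max>
    a ^= 0x80                            # bias A so unsigned compares follow signed order
    if a < lo_biased:                    # CMP / BCS skips the load when A >= min
        a = lo_biased
    if a >= hi_biased:                   # CMP / BCC skips the load when A < max
        a = hi_biased
    return a ^ 0x80                      # undo the bias
```

For example, clamping <code>$80</code> (-128) to [-$23, $23] yields <code>$DD</code> (-$23).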
<br />
= Misc. Tricks =<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
== XCN ==<br />
''12 bytes / 16 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; eXchaNge Nibble without a LUT<br />
ASL : ADC #$00<br />
ASL : ADC #$00<br />
ASL : ADC #$00<br />
ASL : ADC #$00<br />
</pre><br />
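
Each <code>ASL : ADC #$00</code> pair is an 8-bit rotate left (the bit shifted out is added back in at bit 0), so four of them swap the nibbles. A Python sketch of the same steps, with a made-up function name:

```python
def xcn(a):
    """Model of four ASL : ADC #$00 pairs on an 8-bit accumulator."""
    for _ in range(4):
        carry = a >> 7                     # ASL moves bit 7 into carry
        a = ((a << 1) & 0xFF) + carry      # ADC #$00 adds the carry into the now-empty bit 0
    return a
```

For example, <code>xcn(0xA5)</code> returns <code>0x5A</code>.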
<br />
== Clear Low Byte of Accumulator ==<br />
''1 byte / 2 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; "Trashes" A but clears low byte (TDC copies the Direct Page register into A, so this assumes its low byte is zero)<br />
TDC<br />
</pre><br />
<br />
== Direction/Facing As Index ==<br />
''4 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
; The input here is specifically a signed speed, or similar value.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
== Check N Conditions True ==<br />
''n+7 bytes / 2n+7 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; You can test for multiple conditions being true (7 conditions true, at least 5 conditions, etc.) by using a counter, offsetting it so the target count lands on the next power of 2, and testing whether that bit is set.<br />
; You can also test for "Less than N True", "More than N", etc. with variations.<br />
; This is almost more a coding technique, but it's super helpful, so worth pointing out.<br />
; It can allow you to re-arrange branches of code as independent blocks among other useful things.<br />
; You can also use any RAM instead of A at a small cost.<br />
<br />
; Example Test For 5 True Conditions:<br />
!Next_Highest_Power_of_2 = $08<br />
!N_True_Target = $05<br />
LDA #!Next_Highest_Power_of_2-!N_True_Target-1 ; here we set up our rounding, the -1 isn't strictly necessary *most* of the time<br />
%TestSomeCondition()<br />
BCC + ; here we're going to say our test just returns carry set on true (but it could directly INC inside the code as well)<br />
INC<br />
+<br />
; ... repeat the above 5 times for different tests<br />
<br />
N_True_Test:<br />
INC ; replace our -1 to bring us up to a full power of 2 if we had enough True<br />
AND #!Next_Highest_Power_of_2<br />
BEQ .false<br />
.true:<br />
; N Tests were True<br />
.false:<br />
; Not exactly N tests were true<br />
</pre><br />
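
The counting trick can be illustrated with a Python sketch of the example above. The function name and the list of booleans standing in for the individual tests are assumptions made for this illustration.

```python
def n_true_test(results, target=5, next_pow2=8):
    """Model of the counter trick: True when the counter reaches the power-of-2 bit."""
    a = next_pow2 - target - 1        # LDA #!Next_Highest_Power_of_2-!N_True_Target-1
    for passed in results:
        if passed:                    # BCC + skips the INC when the test failed
            a += 1
    a += 1                            # INC at N_True_Test restores the -1
    return (a & next_pow2) != 0       # AND #!Next_Highest_Power_of_2 / BEQ .false
```

With five tests this returns True only when all five pass. If more tests than the target are run, the bit stays set for any count from the target upward until the counter overflows into the next bit, which gives the "at least N" variation.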
<br />
== Skip Dead Code ==<br />
''1-2 bytes / 2-3 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip just one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip two bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use BRA/JMP instead<br />
; JMP is as fast as BRA on the SNES CPU, but will be slightly slower on SA-1, and 1 cycle slower on SPC. So BRA is recommended<br />
; (The extra byte used for JMP in this case doesn't matter)<br />
BRA + ; 2 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
== Check 3 Conditions ==<br />
''2 bytes / 3 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; Just As A Reminder: the V & N flag are set by the *operand* to BIT not the result of the AND!<br />
BIT $00<br />
BMI .bit7_set<br />
BVS .bit6_set<br />
BNE .bit5_set ; assuming A holds #$20 (Z comes from A AND $00)<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre><br />
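
Where each flag comes from can be made explicit with a Python sketch of a non-immediate 8-bit <code>BIT</code>. The function name is made up for this example.

```python
def bit_flags(a, mem):
    """Flags after an 8-bit BIT with a memory operand (not immediate)."""
    return {
        "N": bool(mem & 0x80),    # copied from bit 7 of the operand
        "V": bool(mem & 0x40),    # copied from bit 6 of the operand
        "Z": (a & mem) == 0,      # set when A AND operand is zero
    }
```

So with A = $20 and the operand $E0, <code>BMI</code>, <code>BVS</code> and <code>BNE</code> would all be taken; with the operand $00 none of them would.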
<br />
== Combine Carry Flag ==<br />
''4 bytes / 8 cycles''<br />
<br><br />
<u>inputs:</u> (Flag, On Stack)<br />
<br><br />
<u>outputs:</u> (Carry Flag)<br />
<pre><br />
; flags saved on the stack via PHP (A must be 8-bit for the PLA below), etc.<br />
; code that alters Carry Flag<br />
PLA : BCS +<br />
LSR<br />
+</pre><br />
<br />
== Transfer Carry Flag To Overflow Flag ==<br />
''2 bytes / 2 cycles''<br />
<br><br />
<u>inputs:</u> (Carry Flag)<br />
<br><br />
<u>outputs:</u> (Overflow Flag)<br />
<pre><br />
ADC #$7F ; $7FFF for 16-bit; assumes A is zero on entry (A is overwritten by the sum)<br />
</pre><br />
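
Why this works can be seen in a Python sketch of an 8-bit binary-mode <code>ADC #$7F</code>. The function name is made up, and the model assumes decimal mode is off. Starting from A = 0, the sum is $7F without carry (no signed overflow) and $80 with carry (signed overflow), so V ends up equal to the incoming carry.

```python
def adc_7f(a, carry):
    """8-bit binary ADC #$7F. Returns (result, overflow_flag)."""
    operand = 0x7F
    result = (a + operand + carry) & 0xFF
    # Signed overflow: both inputs share a sign that differs from the result's sign.
    overflow = bool((a ^ result) & (operand ^ result) & 0x80)
    return result, overflow
```

Note that any other starting value of A breaks the equivalence; for instance A = $01 overflows regardless of the carry.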
<br />
[[Category:ASM]]</div>Runic Rainhttps://sneslab.net/mw/index.php?title=David_Whittaker_Sound_Engine&diff=9767David Whittaker Sound Engine2023-08-23T04:08:33Z<p>Runic Rain: Added P</p>
<hr />
<div>__TOC__<br />
<br />
'''David Whittaker Sound Engine''' is a sound driver for the SPC700 programmed by David Whittaker.<br />
<br />
Many games have multiple different builds stored in the game at once, sometimes with code differences. Almost every single game has at least one unique build not shared by any of the other games: the exceptions are...<br />
* The opening logos for ''Lemmings 2: The Tribes'' and ''Apocalypse II''<br />
* ''Kick Off 3'' Beta's Title Screen and ''Lawnmower Man/Virtual Wars''' US Beta<br />
* ''Lemmings 2: The Tribes''' most common build and ''Riddick Bowe Boxing'''s Japanese version<br />
<br />
The raw build sorting notes can be found [[David Whittaker Sound Engine/Build Sorting|here]].<br />
<br />
These are the games where the sound engine was used:<br />
{| class="wikitable sortable"<br />
|-<br />
! Game Name !! Version !! VCMD Code Location !! ROM Offset<br />
|-<br />
| ''Dream TV'' || 0.0 (Beta 2)<br>2.1 (all other versions) || <tt>0x098A</tt> (Beta 2)<br><tt>0x0614</tt> (all other versions) || <tt>0x0196A2</tt> (Beta 2, uncompressed)<br><tt>0x039B16</tt> (all other versions, ''RNC compressed'')<br />
|-<br />
| ''Krusty's Super Fun House'' || 1.0 || <tt>0x04F1</tt> || <tt>0x0E8000</tt> (all versions)<br />
|-<br />
| ''Kick Off/Super Kick Off'' || 1.0 || <tt>0x04E6</tt> || <tt>0x098000</tt> (all versions)<br />
|-<br />
| ''Batman: Revenge of the Joker'' || 2.0 || <tt>0x0610</tt> || <tt>0x068000</tt><br />
|-<br />
| ''Super SWIV/Firepower 2000'' || 2.2 || <tt>0x06AD</tt> || <tt>0x158000</tt><br />
|-<br />
| ''Gods'' || 2.3 || <tt>0x0500</tt>/<tt>0x0600</tt>/<tt>0x0612</tt> || <tt>0x980000</tt>/<tt>0x98AFAC</tt>/<tt>0x99D448</tt>/<tt>0x9A8000</tt> (all versions, three unique build variants, ''RNC compressed'')<br />
|-<br />
| ''World Class Rugby'' || 2.3 || <tt>0x0608</tt> || <tt>0x0B8000</tt> (all versions)<br />
|-<br />
| ''Battle Cars'' || 3.0 || <tt>0x0717</tt> || <tt>0x0A8000</tt><br />
|-<br />
| ''Chavez/Riddick Bowe Boxing'' || 3.0 (US version)<br>3.1 (all other versions) || <tt>0x0621</tt> (Japanese version)<br><tt>0x067C</tt> (US version)<br><tt>0x067D</tt> (Chavez) || <tt>0x0D8000</tt> (all versions)/<tt>0x0E8000</tt> (all versions except ''Chavez'')<br>(three unique build variants, one per version between ''Chavez'', US & Japanese versions)<br />
|-<br />
| ''Lawnmower Man/Virtual Wars'' || 3.1 || <tt>0x06BD</tt> (US beta)<br><tt>0x06A4</tt> (all other versions) || <tt>0x9C8000</tt> (all versions except US beta)/<tt>0x9E8000</tt> (all versions)<br />
|-<br />
| ''Apocalypse II'' || 3.1 || <tt>0x05AF</tt>/<tt>0x0624</tt> || <tt>0x118000</tt>/<tt>0x168000</tt><br />
|-<br />
| ''Elite Soccer/World Cup Striker'' || 3.1 || <tt>0x0601</tt> (US and European version In-Game)<br><tt>0x060B</tt> (US and Japanese version Title Screen)<br><tt>0x0704</tt> (Japanese version In-Game)<br><tt>0x070B</tt> (European version Title Screen) || <tt>0x1B8000</tt> (all versions)/<tt>0x08D500</tt> (US & Japanese version)/<tt>0x08D550</tt> (European beta version)/<tt>0x08D640</tt> (European version)<br>(Four unique build variants: two Title Screen (US/JP compared to EU) and two In-Game (US/EU compared to JP))<br />
|-<br />
| ''Kick Off 3'' (Beta version only) || 3.1 || <tt>0x0658</tt>/<tt>0x06BD</tt> || <tt>0x1A8000</tt>/<tt>0x1D8000</tt><br />
|-<br />
| ''Lemmings 2: The Tribes'' || 3.1 || <tt>0x05AF</tt>/<tt>0x060F</tt>/<tt>0x0614</tt>/<tt>0x0621</tt>/<tt>0x0697</tt> || ''(omitted for now: there are 16 copies of the code in the ROM, with five different build variants being present: four of them are used in one instance each, that being the opening logo, the Title Screen, the Beach Tribe and the Cavelem Tribe, and the others use a single common copy of the code with modifications of constants and non-code pointers)''<br />
|-<br />
| ''Porky Pig's Haunted Holiday'' (Beta version only) || 3.1 || <tt>0x06CA</tt> || ''(omitted for now: there are 36 copies of the code in the ROM, none of which are unique except for the sound data and modifications of constants and non-code pointers)''<br />
|-<br />
| ''Shaq Fu'' || 4.0a || <tt>0x0539</tt> || <tt>0xD96408</tt> (all versions)<br />
|-<br />
| ''Michael Jordan: Chaos in the Windy City'' || 4.0b || <tt>0x0539</tt> || <tt>0xD2B18F</tt> (all versions)<br />
|}<br />
<br />
==Communication with the SNES==<br />
Each command is triggered by sending the parameters to $2141-$2143 first, then the command ID to $2140. The latches are cleared afterwards on all registers, which means the same command IDs can be sent again, provided the parameters are present.<br />
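
The write order described above can be sketched in Python. Everything here is a hypothetical stand-in for illustration: <code>ApuPorts</code> simply records writes instead of touching hardware, and <code>send_command</code> is not a routine from any of the games.

```python
class ApuPorts:
    """Hypothetical stand-in for the SNES-side ports $2140-$2143; records every write."""

    def __init__(self):
        self.writes = []

    def write(self, address, value):
        self.writes.append((address, value & 0xFF))


def send_command(ports, command_id, param1=0, param2=0, param3=0):
    """Send the parameters first, then the command ID that triggers the command."""
    ports.write(0x2141, param1)
    ports.write(0x2142, param2)
    ports.write(0x2143, param3)
    ports.write(0x2140, command_id)    # written last: this is what triggers the command
```

Because the driver clears the latches afterwards, repeating a command means calling <code>send_command</code> again with the parameters included, not just rewriting $2140.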
<br />
===Output to the SNES===<br />
====V0.0====<br />
There is no acknowledgement to the SNES beyond the latches being cleared.<br><br />
====V1.0====<br />
On initialization, a zero is sent to $2140-$2141 while the sound driver sets itself up. When initialization finishes, a <tt>$55</tt> is sent to $2140-$2141.<br><br />
After the first command ID is sent, $2140-$2142 are updated every time a command ID is sent, and $2143 is updated every timer 0 tick.<br><br />
When a command ID is sent, the latches are cleared, allowing the SNES to send the same command ID again. However, this also means that the parameters have to be resent if the same parameters are to be used again.<br />
<pre>xx yy zz %00000abc</pre><br />
* <tt>xx</tt> is the command ID that was just sent.<br />
* <tt>yy</tt> is a parameter that was sent to $2141 to execute the command.<br />
* <tt>zz</tt> is the command ID that was just sent.<br />
* <tt>%a</tt> indicates that a piece of SFX is playing on channel 7.<br />
* <tt>%b</tt> indicates that a piece of SFX is playing on channel 8.<br />
* <tt>%c</tt> indicates that music is playing.<br />
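
A minimal Python sketch of decoding the fourth byte (<tt>%00000abc</tt>, read from $2143) on the SNES side. The function and key names are made up for this illustration.

```python
def decode_v1_status(port3):
    """Decode the V1.0 status byte %00000abc read from $2143."""
    return {
        "sfx_channel_7": bool(port3 & 0b100),   # %a
        "sfx_channel_8": bool(port3 & 0b010),   # %b
        "music_playing": bool(port3 & 0b001),   # %c
    }
```

For example, a value of <tt>%00000101</tt> means SFX is playing on channel 7 and music is playing.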
<br />
====V2.0-V4.0====<br />
On initialization, a zero is sent to $2140-$2141 while the sound driver sets itself up. When initialization finishes, a <tt>$55</tt> is sent to $2140-$2141, and the internal command counter is initialized to the same value.<br><br />
After the first command ID is sent, $2140 is updated every time a command ID is sent, and $2141-$2143 are updated every timer 0 tick.<br><br />
When a command ID is sent, the latches are cleared, allowing the SNES to send the same command ID again. However, this also means that the parameters have to be resent if the same parameters are to be used again.<br />
<br />
=====V2.0-V4.0a=====<br />
<pre>xx %yyyyyyyy zz %aaaaaaaa</pre><br />
* <tt>xx</tt> is a counter that is incremented every time a command is received.<br />
* <tt>%yyyyyyyy</tt> is a set of bits for each channel that is playing a note for music. ''Not used in V4.0a.''<br />
* <tt>zz</tt> indicates whether or not music is playing. It is <tt>$FF</tt> when music is playing, and zero otherwise.<br />
* <tt>%aaaaaaaa</tt> is a set of bits for each channel that is playing SFX. These bits are cleared only when the SFX has elapsed all of its ticks, not when the sample is done playing if non-looping. ''Not used in V4.0a.''<br />
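
A Python sketch of reading the four bytes in the V2.0-V4.0a layout. The names are hypothetical, and the two channel bit sets are kept as raw masks because the bit-to-channel order is not stated here.

```python
def decode_v2_status(port0, port1, port2, port3):
    """Decode the V2.0-V4.0a output bytes read from $2140-$2143."""
    return {
        "command_counter": port0,        # xx: incremented on every received command
        "music_note_mask": port1,        # %yyyyyyyy: raw per-channel bits (unused in V4.0a)
        "music_playing": port2 == 0xFF,  # zz: $FF while music plays, zero otherwise
        "sfx_mask": port3,               # %aaaaaaaa: raw per-channel bits (unused in V4.0a)
        "sfx_active": port3 != 0,
    }
```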
<br />
=====V4.0b=====<br />
<pre>xx yy ?? ??</pre><br />
* <tt>xx</tt> is a counter that is incremented every time a command is received.<br />
* <tt>yy</tt> indicates whether or not music is playing. It is <tt>$FF</tt> when music is playing, and zero otherwise.<br />
<br />
===Command IDs (V0.0-V1.0)===<br />
{| class="wikitable sortable"<br />
|-<br />
! Command ID !! Description !! Register Values & Arguments !! Minimum Version<br />
|-<br />
|<tt>$00</tt> || NOP || <tt>$00 ?? ?? ??</tt> || 0.0<br />
|-<br />
|<tt>$01</tt> || Play Music || <tt>$01 xx ?? ??</tt> || 0.0<br />
|-<br />
|<tt>$02</tt> || Pause Music || <tt>$02 ?? ?? ??</tt> || 0.0<br />
|-<br />
|<tt>$03</tt> || Continue Music || <tt>$03 ?? ?? ??</tt> || 0.0<br />
|-<br />
|<tt>$04</tt> || Play SFX (Channel 7) || <tt>$04 xx yz ??</tt> || 0.0<br />
|-<br />
|<tt>$05</tt> || Play SFX (Channel 8) || <tt>$05 xx yz ??</tt> || 0.0<br />
|-<br />
|<tt>$06</tt> || Fast Forward On || <tt>$06 ?? ?? ??</tt> || 0.0<br />
|-<br />
|<tt>$07</tt> || Fast Forward Off || <tt>$07 ?? ?? ??</tt> || 0.0<br />
|-<br />
|<tt>$08</tt> || Main Volume || <tt>$08 xx ?? ??</tt> || 1.0<br />
|-<br />
|<tt>$09-$FF</tt> || Latch Clear || <tt>$09-$FF ?? ?? ??</tt> || 0.0<br />
|}<br />
<br />
===Command IDs (V2.0-V4.0a)===<br />
{| class="wikitable sortable"<br />
|-<br />
! Command ID !! Description !! Register Values & Arguments !! Minimum Version<br />
|-<br />
|<tt>$00</tt> || NOP || <tt>$00 ?? ?? ??</tt> || 0.0<br />
|-<br />
|<tt>$01</tt> || Play Music || <tt>$01 xx ?? ??</tt> || 0.0<br />
|-<br />
|<tt>$02</tt> || Pause Music || <tt>$02 ?? ?? ??</tt> || 0.0<br />
|-<br />
|<tt>$03</tt> || Continue Music || <tt>$03 ?? ?? ??</tt> || 0.0<br />
|-<br />
|<tt>$04</tt> || Play SFX (Channel 1) || <tt>$04 xx yz ??</tt> || 2.0<br />
|-<br />
|<tt>$05</tt> || Play SFX (Channel 2) || <tt>$05 xx yz ??</tt> || 2.0<br />
|-<br />
|<tt>$06</tt> || Play SFX (Channel 3) || <tt>$06 xx yz ??</tt> || 2.0<br />
|-<br />
|<tt>$07</tt> || Play SFX (Channel 4) || <tt>$07 xx yz ??</tt> || 2.0<br />
|-<br />
|<tt>$08</tt> || Play SFX (Channel 5) || <tt>$08 xx yz ??</tt> || 2.0<br />
|-<br />
|<tt>$09</tt> || Play SFX (Channel 6) || <tt>$09 xx yz ??</tt> || 2.0<br />
|-<br />
|<tt>$0A</tt> || Play SFX (Channel 7) || <tt>$0A xx yz ??</tt> || 2.0<br />
|-<br />
|<tt>$0B</tt> || Play SFX (Channel 8) || <tt>$0B xx yz ??</tt> || 2.0<br />
|-<br />
|<tt>$0C</tt> || Fast Forward On || <tt>$0C ?? ?? ??</tt> || 2.0<br />
|-<br />
|<tt>$0D</tt> || Fast Forward Off || <tt>$0D ?? ?? ??</tt> || 2.0<br />
|-<br />
|<tt>$0E</tt> || Main & Echo Volume || <tt>$0E xx ?? ??</tt> || 2.0<br />
|-<br />
|<tt>$0F</tt> || Echo Feedback || <tt>$0F xx ?? ??</tt> || 2.0<br />
|-<br />
|<tt>$10</tt> || Go to IPL Boot Program || <tt>$10 ?? ?? ??</tt> || 2.0<br />
|-<br />
|<tt>$11</tt> || Set Tempo || <tt>$11 xx ?? ??</tt> || 2.1<br />
|-<br />
|<tt>$12</tt> || Set SFX Volume (Channel 1) || <tt>$12 ?? xy ??</tt> || 3.0<br />
|-<br />
|<tt>$13</tt> || Set SFX Volume (Channel 2) || <tt>$13 ?? xy ??</tt> || 3.0<br />
|-<br />
|<tt>$14</tt> || Set SFX Volume (Channel 3) || <tt>$14 ?? xy ??</tt> || 3.0<br />
|-<br />
|<tt>$15</tt> || Set SFX Volume (Channel 4) || <tt>$15 ?? xy ??</tt> || 3.0<br />
|-<br />
|<tt>$16</tt> || Set SFX Volume (Channel 5) || <tt>$16 ?? xy ??</tt> || 3.0<br />
|-<br />
|<tt>$17</tt> || Set SFX Volume (Channel 6) || <tt>$17 ?? xy ??</tt> || 3.0<br />
|-<br />
|<tt>$18</tt> || Set SFX Volume (Channel 7) || <tt>$18 ?? xy ??</tt> || 3.0<br />
|-<br />
|<tt>$19</tt> || Set SFX Volume (Channel 8) || <tt>$19 ?? xy ??</tt> || 3.0<br />
|-<br />
|<tt>$1A</tt> || Set SFX Pitch (Channel 1) || <tt>$1A xx yy ??</tt> || 3.0<br />
|-<br />
|<tt>$1B</tt> || Set SFX Pitch (Channel 2) || <tt>$1B xx yy ??</tt> || 3.0<br />
|-<br />
|<tt>$1C</tt> || Set SFX Pitch (Channel 3) || <tt>$1C xx yy ??</tt> || 3.0<br />
|-<br />
|<tt>$1D</tt> || Set SFX Pitch (Channel 4) || <tt>$1D xx yy ??</tt> || 3.0<br />
|-<br />
|<tt>$1E</tt> || Set SFX Pitch (Channel 5) || <tt>$1E xx yy ??</tt> || 3.0<br />
|-<br />
|<tt>$1F</tt> || Set SFX Pitch (Channel 6) || <tt>$1F xx yy ??</tt> || 3.0<br />
|-<br />
|<tt>$20</tt> || Set SFX Pitch (Channel 7) || <tt>$20 xx yy ??</tt> || 3.0<br />
|-<br />
|<tt>$21</tt> || Set SFX Pitch (Channel 8) || <tt>$21 xx yy ??</tt> || 3.0<br />
|-<br />
|<tt>$22-$23</tt> || Load New Data || <tt>$22-$23 xx yy ??</tt> || 4.0a<br />
|-<br />
|<tt>$24-$FF</tt> || Latch Clear/Increment Counter || <tt>$24-$FF ?? ?? ??</tt> || 0.0<br />
|}<br />
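
In the table above, the three per-channel command groups are consecutive ranges, so the command ID for a channel can be computed from a base value. A Python sketch with hypothetical constant and function names:

```python
PLAY_SFX_BASE = 0x04         # $04-$0B, V2.0 and up
SET_SFX_VOLUME_BASE = 0x12   # $12-$19, V3.0 and up
SET_SFX_PITCH_BASE = 0x1A    # $1A-$21, V3.0 and up


def sfx_command_id(base, channel):
    """Command ID for a 1-based channel number (V2.0-V4.0a numbering)."""
    if not 1 <= channel <= 8:
        raise ValueError("channel must be between 1 and 8")
    return base + (channel - 1)
```

For example, <code>sfx_command_id(SET_SFX_PITCH_BASE, 8)</code> gives <tt>$21</tt>.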
<br />
===Command IDs (V4.0b)===<br />
{| class="wikitable sortable"<br />
|-<br />
! Command ID !! Description !! Register Values & Arguments<br />
|-<br />
|<tt>$00</tt> || NOP || <tt>$00 ?? ?? ??</tt><br />
|-<br />
|<tt>$01</tt> || Play Music || <tt>$01 xx ?? ??</tt><br />
|-<br />
|<tt>$02</tt> || Pause Music || <tt>$02 ?? ?? ??</tt><br />
|-<br />
|<tt>$03</tt> || Continue Music || <tt>$03 ?? ?? ??</tt><br />
|-<br />
|<tt>$04</tt> || Play SFX || <tt>$04 xx yz ??</tt><br />
|-<br />
|<tt>$05</tt> || Stop All SFX || <tt>$05 ?? ?? ??</tt><br />
|-<br />
|<tt>$06</tt> || Main Volume || <tt>$06 xx ?? ??</tt><br />
|-<br />
|<tt>$07</tt> || Set Tempo || <tt>$07 xx ?? ??</tt><br />
|-<br />
|<tt>$08</tt> || Fast Forward On || <tt>$08 xx ?? ??</tt><br />
|-<br />
|<tt>$09</tt> || Fast Forward Off || <tt>$09 xx ?? ??</tt><br />
|-<br />
|<tt>$0A</tt> || Stop Music & Load New Data || <tt>$0A ?? ?? ??</tt><br />
|-<br />
|<tt>$0B</tt> || Load New Data || <tt>$0B ?? ?? ??</tt><br />
|-<br />
|<tt>$0C</tt> || Panning Separation || <tt>$0C xx ?? ??</tt><br />
|-<br />
|<tt>$0D-$FF</tt> || Latch Clear/Increment Counter || <tt>$0D-$FF ?? ?? ??</tt><br />
|}<br />
<br />
===Latch Clear/Increment Counter===<br />
Clears the latches on the CPUIO registers and does nothing else except increment the counter on V2.0 and up, thus making this effectively a NOP. This occurs on almost all invalid command IDs: there is exactly one exception, noted under Invalid.<br />
<br />
===Invalid (Command <tt>$22</tt> - Porky Pig's Haunted Holiday (beta) only)===<br />
Crashes the sound driver. Only Porky Pig's Haunted Holiday (beta) can do this: the command jumps directly to a RET opcode with an empty stack, causing a stack underflow and a jump to a memory location not used for code.<br />
<br />
===NOP (Command <tt>$00</tt>)===<br />
Does absolutely nothing.<br />
<br />
===Play Music (Command <tt>$01</tt>)===<br />
<pre>$01 xx ?? ??</pre><br />
<br />
Stops a previous piece of music, then plays a piece of music.<br />
<br />
* <tt>xx</tt> is the music ID.<br />
** V1.0 and up have boundary checks that prevent invalid music IDs from being played.<br />
<br />
The following builds don't support this command; in them it is effectively a NOP:<br />
* ''Apocalypse II'' (Opening logo)/''Lemmings 2: The Tribes'' (Opening logo)<br />
* ''Kick Off 3'' (Beta) (In-Game)<br />
<br />
===Pause Music (Command <tt>$02</tt>)===<br />
Stops music, allowing it to be resumed later.<br />
<br />
===Continue Music (Command <tt>$03</tt>)===<br />
Resumes a paused song.<br />
<br />
===Play SFX (Command <tt>$04-$05</tt> (V0.0-V1.0), <tt>$04-$0B</tt> (V2.0-V4.0a), <tt>$04</tt> (V4.0b))===<br />
<pre>ii xx yz ??</pre><br />
Plays a piece of SFX.<br />
<br />
* <tt>ii</tt>, the command ID used for the SFX, defines the channel used.<br />
** <u>V0.0-V1.0:</u><br />
*** Channels 1-6: Not supported<br />
*** Channel 7: <tt>$04</tt><br />
*** Channel 8: <tt>$05</tt><br />
** <u>V2.0-V4.0a:</u><br />
*** Channel 1: <tt>$04</tt><br />
*** Channel 2: <tt>$05</tt><br />
*** Channel 3: <tt>$06</tt><br />
*** Channel 4: <tt>$07</tt><br />
*** Channel 5: <tt>$08</tt><br />
*** Channel 6: <tt>$09</tt><br />
*** Channel 7: <tt>$0A</tt><br />
*** Channel 8: <tt>$0B</tt><br />
** V4.0b instead dynamically allocates the channel that the SFX is played on across channels 5-8, and has a fixed ID of <tt>$04</tt>.<br />
* <tt>xx</tt> defines the SFX ID. Setting the highest bit (covering IDs <tt>$80-$FF</tt>) keys off the SFX instead of playing it.<br />
** V1.0 and up have boundary checks that prevent invalid SFX IDs from being played, and only accept <tt>$FF</tt> as an ID to key off SFX on that channel. V4.0b doesn't support <tt>$80-$FF</tt> at all.<br />
* <tt>y</tt> defines the left volume. This is not signed.<br />
* <tt>z</tt> defines the right volume. This is not signed.<br />
** Each 4-bit volume value is converted into the raw value for the VxVOLL/VxVOLR DSP registers by one of the following per-game multipliers (implemented with XCN and shift opcodes):<br />
*** ''Dream TV'' (beta and final): '''2X'''<br />
*** ''Batman: Revenge of the Joker'', ''Krusty's Super Fun House'', ''Michael Jordan: Chaos in the Windy City'' ''(SFX IDs <tt>$00</tt> and <tt>$4B</tt> only)'', ''Shaq Fu'', ''Super SWIV/Firepower 2000'': '''8X'''<br />
*** All other games: '''4X'''<br />
<br />
The following builds don't support this command:<br />
* ''Gods'' (Ending)<br />
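
The <tt>yz</tt> volume byte and its scaling can be sketched in Python. The function names are made up, and plain multiplication is used here in place of the XCN and shift opcodes the driver actually uses.

```python
def pack_sfx_volume(left, right):
    """Build the yz parameter byte: y (high nibble) = left, z (low nibble) = right."""
    return ((left & 0x0F) << 4) | (right & 0x0F)


def dsp_volumes(yz, multiplier):
    """Raw (VxVOLL, VxVOLR) values for a yz byte and a per-game multiplier (2, 4 or 8)."""
    return (yz >> 4) * multiplier, (yz & 0x0F) * multiplier
```

For example, <tt>$F8</tt> with the 4X multiplier becomes a left volume of 60 and a right volume of 32.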
<br />
====SFX Data Format====<br />
Each SFX entry is a pointer to a 14-byte entry containing a series of parameters to utilize on a per-SFX basis. Only one channel's worth is defined here.<br />
<br />
<pre>xx yy zz aa bb cc dd ee ff gg hh ii jj kk</pre><br />
<br />
* <tt>xx</tt> is the number of timer 0 ticks (times 10) to play the SFX for.<br />
* <tt>yy</tt> is the number of timer 0 ticks (times 10) to slide pitches before resetting and doing the slide again.<br />
* <tt>zz</tt> is a starting value for either the PITCHL DSP register or a noise frequency.<br />
* <tt>aa</tt> is a starting value for the PITCHH DSP register.<br />
* <tt>bb</tt> is the offset to apply to either the PITCHL DSP register or noise frequency per step.<br />
* <tt>cc</tt> is an offset to apply to the PITCHH DSP register per step.<br />
* <tt>dd</tt> is a direct write to the SRCN DSP register. <tt>$80</tt> and up instead cause this to use noise.<br />
* <tt>ee</tt> enables RNG-based pitch modification when non-zero. Before V3.0 any non-zero value behaves the same; in V3.0 and up the value itself is used as a mask. How the pitch is modified depends on the version.<br />
** V0.0-V2.0 overwrites the pitch settings and instead uses two bytes of RNG values for the pitch on a per-step basis.<br />
** V2.1 adds a random offset of the RNG value ANDed by <tt>$3F</tt> to the high byte of the final pitch output on a per-step basis.<br />
** V2.2 adds a random offset of the RNG value ANDed by <tt>$3F</tt> to the low byte of the starting pitch for the first step only.<br />
** V2.3 does nothing.<br />
** V3.0 and up apply a random offset of the RNG value ANDed by <tt>ee</tt> to the low byte of the starting pitch for the first step only.<br />
* <tt>ff</tt> defines the sign for the pitch offset.<br />
** <tt>$00</tt> turns off pitch slides for the SFX.<br />
** <tt>$01-$7F</tt> apply a positive offset.<br />
** <tt>$80-$FF</tt> apply a negative offset.<br />
* <tt>gg</tt> is the number of timer 0 ticks (times 10) to perform the pitch slide for.<br />
* <tt>hh</tt> is a direct write to the ADSR1 DSP register.<br />
* <tt>ii</tt> is a direct write to the ADSR2 DSP register.<br />
* <tt>jj</tt> is generally an endless flag, but its operation differs between versions.<br />
** For V0.0-V4.0a, when this value is non-zero, it causes the SFX to be played endlessly until keyed off or interrupted. It also causes pitch slides to run forever.<br />
** For V4.0b, this acts as SFX priority. If all four channels are being used for SFX and the priority of the incoming SFX is greater than that of an SFX currently playing, then that SFX is overwritten. The SFX plays forever if this value is greater than <tt>$7F</tt>, as a non-zero value does in V0.0-V4.0a.<br />
* <tt>kk</tt> is the number of timer 0 ticks (times 10) per step.<br />
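<br />
The 14-byte layout above can be unpacked as in the following minimal Python sketch. The names <tt>SfxDefinition</tt> and <tt>parse_sfx</tt>, and all field names, are illustrative assumptions and do not come from the driver; only the byte order follows the list above.<br />

```python
from dataclasses import dataclass


@dataclass
class SfxDefinition:
    """One 14-byte SFX definition. Field names are hypothetical."""
    duration: int        # xx: play time, in units of 10 timer 0 ticks
    slide_reset: int     # yy: slide restart period, in units of 10 timer 0 ticks
    pitch_l: int         # zz: starting PITCHL value or noise frequency
    pitch_h: int         # aa: starting PITCHH value
    offset_l: int        # bb: per-step offset for PITCHL / noise frequency
    offset_h: int        # cc: per-step offset for PITCHH
    srcn: int            # dd: SRCN value, $80 and up selects noise
    rng: int             # ee: RNG pitch modifier (version dependent)
    slide_sign: int      # ff: sign of the pitch offset
    slide_duration: int  # gg: slide time, in units of 10 timer 0 ticks
    adsr1: int           # hh: ADSR1 value
    adsr2: int           # ii: ADSR2 value
    endless: int         # jj: endless flag or priority (version dependent)
    step_ticks: int      # kk: time per step, in units of 10 timer 0 ticks

    @property
    def uses_noise(self) -> bool:
        return self.srcn >= 0x80

    @property
    def slide_direction(self) -> int:
        """0 = no slide, +1 = positive offset, -1 = negative offset."""
        if self.slide_sign == 0:
            return 0
        return 1 if self.slide_sign < 0x80 else -1


def parse_sfx(data: bytes) -> SfxDefinition:
    if len(data) != 14:
        raise ValueError("an SFX definition is exactly 14 bytes")
    return SfxDefinition(*data)
```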
<br />
===Stop All SFX (Command <tt>$05</tt> (V4.0b))===<br />
Stops all SFX from playing in channels 5-8.<br />
<br />
===Fast Forward On (Command <tt>$06</tt> (V0.0-V1.0), <tt>$0C</tt> (V2.0-V4.0a), <tt>$08</tt> (V4.0b))===<br />
Causes the music to play at maximum tempo, which means one tempo tick every ten timer 0 ticks. This is equivalent to a tempo of 256.<br />
<br />
===Fast Forward Off (Command <tt>$07</tt> (V0.0-V1.0), <tt>$0D</tt> (V2.0-V4.0a), <tt>$09</tt> (V4.0b))===<br />
Causes the music to play at normal tempo.<br />
<br />
===Main & Echo Volume (Command <tt>$08</tt> (V1.0), <tt>$0E</tt> (V2.0-V4.0a), <tt>$06</tt> (V4.0b))===<br />
<pre>ii xx ?? ??</pre><br />
<br />
* <tt>xx</tt> is a direct write to the MVOLL and MVOLR DSP registers.<br />
** V2.0 and up may also set the EVOLL and EVOLR DSP registers, but only if echo is supported in that build. The echo volume is always the main volume scaled by a fixed factor (implemented with shifts), which depends on the build:<br />
*** ''Batman: Revenge of the Joker'': '''1/4'''<br />
*** ''Dream TV'': '''1/4'''<br />
*** ''Gods'' (all builds): '''3/16'''<br />
*** ''World Class Rugby'': '''3/16'''<br />
*** ''Lawnmower Man/Virtual Wars'' (Driving): '''3/16'''<br />
*** ''Lemmings 2: The Tribes'' (Cavelem Tribe): '''3/16'''<br />
<br />
===Echo Feedback (Command <tt>$0F</tt> (V2.0-V4.0a))===<br />
<pre>$0F xx ?? ??</pre><br />
<br />
* <tt>xx</tt> is a direct write to the EFB DSP register.<br />
<br />
This is only ever supported if echo is supported in the ROM. That means only these builds support this command:<br />
* ''Batman: Revenge of the Joker''<br />
* ''Dream TV''<br />
* ''Gods'' (all builds)<br />
* ''World Class Rugby''<br />
* ''Lawnmower Man/Virtual Wars'' (Driving)<br />
* ''Lemmings 2: The Tribes'' (Cavelem Tribe)<br />
<br />
These other games have a reference for the ID, but effectively make it a NOP:<br />
* ''Battle Cars''<br />
* ''Riddick Bowe Boxing/Chavez'' (all versions except Japanese)<br />
* ''Shaq-Fu''<br />
<br />
===Go to IPL Boot Program (Command <tt>$10</tt> (V2.0-V4.0a))===<br />
Jumps directly to the [[SPC700/IPL ROM|IPL Boot Program]].<br />
<br />
===Set Tempo (Command <tt>$11</tt> (V2.1-V2.2, V3.0-V4.0a))===<br />
<pre>$11 xx ?? ??</pre><br />
<br />
* <tt>xx</tt> is the tempo to set the song at. This overwrites the song's tempo settings.<br />
<br />
===Set SFX Volume (Command <tt>$12-$19</tt> (V3.0-V4.0a))===<br />
<pre>ii ?? xy ??</pre><br />
<br />
* <tt>ii</tt>, the command ID used for the SFX, defines the channel used.<br><br />
** Channel 1: <tt>$12</tt><br />
** Channel 2: <tt>$13</tt><br />
** Channel 3: <tt>$14</tt><br />
** Channel 4: <tt>$15</tt><br />
** Channel 5: <tt>$16</tt><br />
** Channel 6: <tt>$17</tt><br />
** Channel 7: <tt>$18</tt><br />
** Channel 8: <tt>$19</tt><br />
* <tt>x</tt> defines the left volume. This is not signed.<br />
* <tt>y</tt> defines the right volume. This is not signed.<br />
** For both volume values, the 4-bit values are translated into the raw values for the VxVOLL/VxVOLR DSP registers via the following multipliers (in actuality, using XCNs and shift opcodes) on a per-game basis...<br />
*** All other games: '''4X'''<br />
*** ''Michael Jordan Chaos in the Windy City'' ''(SFX IDs <tt>$00</tt> and <tt>$4B</tt> only)'', ''Shaq-Fu'': '''8X'''<br />
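<br />
As a rough illustration of the decoding described above, the sketch below maps a command ID to its channel and expands the two 4-bit volumes. The function name and the plain multiplication are assumptions for readability; the driver itself reportedly uses XCN and shift opcodes.<br />

```python
def decode_sfx_volume(command_id: int, xy: int, multiplier: int = 4):
    """Return (channel, left, right) for a Set SFX Volume command.

    multiplier is 4 for most games and 8 for the exceptions listed above.
    """
    if not 0x12 <= command_id <= 0x19:
        raise ValueError("not a Set SFX Volume command ID")
    channel = command_id - 0x12 + 1          # $12 -> channel 1 ... $19 -> channel 8
    left = ((xy >> 4) & 0x0F) * multiplier   # high nibble: left volume (unsigned)
    right = (xy & 0x0F) * multiplier         # low nibble: right volume (unsigned)
    return channel, left, right
```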
<br />
===Set SFX Pitch (Command <tt>$1A-$21</tt> (V3.0-V4.0a))===<br />
<pre>ii xx yy ??</pre><br />
<br />
* <tt>ii</tt>, the command ID used for the SFX, defines the channel used.<br><br />
** Channel 1: <tt>$1A</tt><br />
** Channel 2: <tt>$1B</tt><br />
** Channel 3: <tt>$1C</tt><br />
** Channel 4: <tt>$1D</tt><br />
** Channel 5: <tt>$1E</tt><br />
** Channel 6: <tt>$1F</tt><br />
** Channel 7: <tt>$20</tt><br />
** Channel 8: <tt>$21</tt><br />
* <tt>xx</tt> overwrites the PITCHL value used in the SFX data.<br />
* <tt>yy</tt> overwrites the PITCHH value used in the SFX data.<br />
<br />
This command is only supported in the following builds...<br />
* ''Battle Cars''<br />
* ''World Cup Striker'' (European version) (Title Screen)<br />
* ''Elite Soccer/World Cup Striker'' (European version) (In-Game)<br />
* ''Kick Off 3'' (beta) (In-Game)<br />
* ''Kick Off 3'' (beta) (Title Screen)<br />
* ''Porky Pig's Haunted Holiday'' (beta)<br />
* ''Shaq-Fu''<br />
<br />
Not all IDs are supported in these builds...<br />
* ''Lawnmower Man/Virtual Wars'' (In-Game) ''(does not support $1A-$1D, not even in the code)''<br />
* ''World Cup Striker (Japanese)'' (In-Game) ''(skips over $1A-$1D via a longer branching distance, though the code still exists)''<br />
<br />
===Load New Data (Command <tt>$22-$23</tt> (V4.0a), <tt>$0B</tt> (V4.0b))===<br />
Loads new data from the SNES using an IPL Boot ROM variant.<br><br />
''TODO Loading New Data (the loading protocol should be nearly identical to IPL Boot ROM with a few differences or so, since the program seems like it's using a similar variant)''<br />
<br />
===Stop Music & Load New Data (Command <tt>$0A</tt> (V4.0b))===<br />
Stops the music, then loads new data from the SNES using an IPL Boot ROM variant.<br />
<br />
===Panning Separation (Command <tt>$0C</tt> (V4.0b))===<br />
<pre>$0C xx ?? ??</pre><br />
<br />
* <tt>xx</tt> sets the amount that the panning will influence the left and right volumes. Zero gives the maximum separation, while <tt>$FF</tt> gives the minimum, making the output effectively mono.<br />
<br />
==Song Entry==<br />
For each song entry in an array of song definitions, one byte is defined as a starting tempo, followed by a series of pointers defined [[little endian]] style for each channel. For ''Dream TV'' Beta and ''Kick Off'', only five pointers are defined. For ''Krusty's Super Fun House'', only six pointers are defined. For all other games, eight pointers are defined.<br />
<br />
<u>Pre-V2.0 (''Dream TV Beta'' and ''Kick Off'')</u><br />
<pre>xx yy yy zz zz aa aa bb bb cc cc</pre><br />
<u>Pre-V2.0 (''Krusty's Super Fun House'')</u><br />
<pre>xx yy yy zz zz aa aa bb bb cc cc dd dd</pre><br />
<u>V2.0 and up</u><br />
<pre>xx yy yy zz zz aa aa bb bb cc cc dd dd ee ee ff ff</pre><br />
* <tt>xx</tt> is the starting tempo.<br />
* <tt>yy yy</tt> is a [[little endian]] pointer to a pattern order list for channel 1.<br />
* <tt>zz zz</tt> is a [[little endian]] pointer to a pattern order list for channel 2.<br />
* <tt>aa aa</tt> is a [[little endian]] pointer to a pattern order list for channel 3.<br />
* <tt>bb bb</tt> is a [[little endian]] pointer to a pattern order list for channel 4.<br />
* <tt>cc cc</tt> is a [[little endian]] pointer to a pattern order list for channel 5.<br />
* <tt>dd dd</tt> is a [[little endian]] pointer to a pattern order list for channel 6. ''(''Krusty's Super Fun House'' only pre-V2.0)''<br />
* <tt>ee ee</tt> is a [[little endian]] pointer to a pattern order list for channel 7. ''(not supported pre-V2.0)''<br />
* <tt>ff ff</tt> is a [[little endian]] pointer to a pattern order list for channel 8. ''(not supported pre-V2.0)''<br />
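<br />
A song entry can be split into its tempo byte and channel pointers as sketched below. The function name and the <tt>channels</tt> parameter are assumptions made for illustration; pass 5, 6 or 8 depending on the build, as described above.<br />

```python
import struct


def parse_song_entry(data: bytes, channels: int = 8):
    """Split one song entry into (tempo, [pattern order list pointers])."""
    size = 1 + 2 * channels
    if len(data) < size:
        raise ValueError("song entry is truncated")
    tempo = data[0]
    # Each pointer is a 16-bit little endian value, one per channel.
    pointers = list(struct.unpack_from("<%dH" % channels, data, 1))
    return tempo, pointers
```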
<br />
Two builds don't support song entries at all:<br />
* The opening logos for ''Lemmings 2: The Tribes'' and ''Apocalypse II''<br />
* ''Kick Off 3'' Beta (In-Game)<br />
<br />
==Pattern Order List==<br />
Each channel has its own list of [[little endian]] pattern pointers. If a pointer is zero, playback jumps back to either the beginning of the pattern order list for that channel or the loop marker that was set by the <tt>$F5</tt> VCMD.<br />
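<br />
The looping rule can be sketched as follows. This is only a model: the function name, the <tt>ram</tt> byte array and the way the loop point is passed in are assumptions, not the driver's actual variables.<br />

```python
def next_pattern(ram, order_pos: int, list_start: int, loop_point=None):
    """Fetch the next pattern pointer for one channel.

    Returns (pattern_pointer, new_order_pos). A zero pointer sends the
    read position back to the loop point if one was marked, otherwise
    to the start of the channel's pattern order list.
    """
    pointer = ram[order_pos] | (ram[order_pos + 1] << 8)
    if pointer == 0:
        order_pos = loop_point if loop_point is not None else list_start
        pointer = ram[order_pos] | (ram[order_pos + 1] << 8)
    return pointer, order_pos + 2
```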
<br />
==Instrument Format==<br />
The instrument format at V0.0 is N-SPC/Kankichi-kun compatible minus noise support for SRCN values above 127. Post V1.0, one of the bytes loses its usage.<br />
<br />
The instrument format is defined as direct writes to DSP registers for the first four bytes, followed by two pitch-related bytes. They are defined like this...<br><br />
<u>V0.0-V1.0</u><br />
<pre>xx yy zz aa bb cc</pre><br />
<u>V2.0 and up</u><br />
<pre>xx yy zz ?? bb cc</pre><br />
* <tt>xx</tt> is a direct write to the VxSRCN DSP register.<br />
* <tt>yy</tt> is a direct write to the VxADSR1 DSP register.<br />
* <tt>zz</tt> is a direct write to the VxADSR2 DSP register.<br />
* <tt>aa</tt> is a direct write to the VxGAIN DSP register... but only for V1.0 and older. Otherwise, it's an unused byte.<br />
* <tt>bb</tt> is a pitch base multiplier.<br />
* <tt>cc</tt> is a fractional pitch base multiplier, defined in 256ths.<br />
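<br />
A small sketch of reading one instrument entry follows. The function name, the <tt>has_gain</tt> switch and the dictionary keys are assumptions; combining <tt>bb</tt> and <tt>cc</tt> as <tt>bb + cc/256</tt> follows from the descriptions above.<br />

```python
def parse_instrument(data: bytes, has_gain: bool = False) -> dict:
    """Decode a 6-byte instrument entry.

    has_gain should be True for V0.0-V1.0, where the fourth byte is
    written to VxGAIN; in V2.0 and up that byte is unused.
    """
    if len(data) != 6:
        raise ValueError("an instrument entry is exactly 6 bytes")
    srcn, adsr1, adsr2, byte4, mult_int, mult_frac = data
    return {
        "srcn": srcn,
        "adsr1": adsr1,
        "adsr2": adsr2,
        "gain": byte4 if has_gain else None,
        # bb is the whole part, cc the fraction in 256ths
        "pitch_multiplier": mult_int + mult_frac / 256,
    }
```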
<br />
==Voice Command Format==<br />
{| class="wikitable sortable"<br />
|-<br />
! VCMD ID !! Description !! Arguments !! Minimum Version<br />
|-<br />
| <tt>$00-$5F</tt> || Note || || 0.0<br />
|-<br />
| <tt>%011xxxxx</tt><br><tt>$60-$7F</tt> || Note Length || || 0.0<br />
|-<br />
| <tt>$80-$EC</tt> || Invalid || || 0.0<br />
|-<br />
| <tt>$ED</tt> || Instant Pitch Change to Note || <tt>xx</tt> || 4.0a only<br />
|-<br />
| <tt>$ED</tt> || Set Main Volume || <tt>xx</tt> || 4.0b only<br />
|-<br />
| <tt>$EE</tt> || Volume Scaler by Fraction ||<tt>xx yy</tt> || 3.0<br />
|-<br />
| <tt>$EF</tt> || Set ADSR ||<tt>xx yy</tt> || 2.0<br />
|-<br />
| <tt>$F0</tt> || Fine Tune ||<tt>xx</tt> || 1.0<br />
|-<br />
| <tt>$F1</tt> || Pitch Bend ||<tt>xx</tt> || 1.0<br />
|-<br />
| <tt>$F2</tt> || Pitch Envelope ID ||<tt>xx</tt> || 0.0<br />
|-<br />
| <tt>$F3</tt> || L/R Voice Volume ||<tt>xx yy</tt> || 0.0<br />
|-<br />
| <tt>$F4</tt> || Tempo ||<tt>xx</tt> || 0.0<br />
|-<br />
| <tt>$F5</tt> || Jump to Pattern in Order List + Mark Loop Point ||<tt>xx xx</tt> || 0.0<br />
|-<br />
| <tt>$F6-$F7</tt> || Invalid || || 0.0<br />
|-<br />
| <tt>$F8</tt> || Key Off || || 0.0<br />
|-<br />
| <tt>$F9</tt> || One Note Delay || || 0.0<br />
|-<br />
| <tt>$FA</tt> || Instrument ||<tt>xx</tt> || 0.0<br />
|-<br />
| <tt>$FB</tt> || Absolute Global Transposition ||<tt>xx</tt> || 0.0<br />
|-<br />
| <tt>$FC</tt> || Absolute Transposition ||<tt>xx</tt> || 0.0<br />
|-<br />
| <tt>$FD</tt> || Invalid || || 0.0<br />
|-<br />
| <tt>$FE</tt> || Song End || || 0.0<br />
|-<br />
| <tt>$FF</tt> || Pattern End || || 0.0<br />
|}<br />
<br />
===Invalid (VCMDs <tt>$80-$EC</tt>, <tt>$F6-$F7</tt>, <tt>$FD</tt>, <tt>$ED</tt> (pre-V4.0), <tt>$EE</tt> (pre-V3.0), <tt>$EF</tt> (pre-V2.0), <tt>$F0-$F1</tt> (pre-V1.0))===<br />
The sound driver deliberately freezes itself in an infinite branch-always loop rather than crashing or skipping the byte. The same applies to VCMD slots that are not yet filled in earlier versions.<br />
<br />
===Note (VCMDs <tt>$00-$5F</tt>)===<br />
Plays a note and delays the channel for one note length.<br />
<br />
The initial pitch of the note is calculated using a pitch table that contains one octave (plus one note) of pitch values. The table value is shifted according to the octave used, then multiplied by the instrument's pitch base to get the pitch for the note.<br />
<br />
===Note Length (VCMDs <tt>$60-$7F</tt>)===<br />
<pre>%011xxxxx</pre><br />
* <tt>%xxxxx</tt> represents your note length (as five bits).<br />
Note lengths are defined as the number of tempo ticks plus one. Notes are keyed off either on another note or on the key off VCMD (<tt>$F8</tt>).<br />
<br />
===Instant Pitch Change to Note (VCMD <tt>$ED</tt> (V4.0a))===<br />
Requires V4.0a to use. Lower versions freeze the sound driver instead.<br />
<br />
<pre>$ED xx</pre><br />
<br />
* <tt>xx</tt> defines the note to change the pitch to.<br />
<br />
===Set Main Volume (VCMD <tt>$ED</tt> (V4.0b))===<br />
Requires V4.0b to use. Lower versions freeze the sound driver instead.<br />
<br />
<pre>$ED xx</pre><br />
<br />
* <tt>xx</tt> defines the value to directly write to the MVOL DSP registers.<br />
<br />
===Volume Scaler by Fraction (VCMD <tt>$EE</tt>)===<br />
Requires V3.0 or later to use. Lower versions freeze the sound driver instead.<br />
<br />
<pre>$EE xx yy</pre><br />
<br />
* <tt>xx</tt> is the numerator, and is the value to multiply the volume by. The scaling process is skipped if this is zero.<br />
* <tt>yy</tt> is the denominator, and is the value to divide the multiplied volume by.<br />
Put the two together, and you get the current volume multiplied by <math>\frac{x}{y}</math>.<br />
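<br />
A minimal model of the scaling is shown below. The function name is hypothetical, and truncating integer division is an assumption; how the driver behaves with a zero denominator is not documented here, so the sketch rejects it.<br />

```python
def scale_volume(volume: int, xx: int, yy: int) -> int:
    """Scale a volume by the fraction xx/yy, as VCMD $EE describes."""
    if xx == 0:
        return volume            # scaling is skipped for a zero numerator
    if yy == 0:
        raise ValueError("behaviour for a zero denominator is not documented")
    return (volume * xx) // yy   # assumption: result is truncated
```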
<br />
===Set ADSR (VCMD <tt>$EF</tt>)===<br />
Requires V2.0 or later to use. Lower versions freeze the sound driver instead.<br />
<br />
<pre>$EF xx yy</pre><br />
* <tt>xx</tt> is a direct write to the VxADSR1 DSP register.<br />
* <tt>yy</tt> is a direct write to the VxADSR2 DSP register.<br />
<br />
===Fine Tune (VCMD <tt>$F0</tt>)===<br />
Requires V1.0 or later to use. Lower versions freeze the sound driver instead.<br />
<br />
<pre>$F0 xx</pre><br />
* <tt>xx</tt> is a signed delta value for the pitch. These are in units used directly for the VxPITCH DSP registers.<br />
<br />
===Pitch Bend (VCMD <tt>$F1</tt>)===<br />
Requires V1.0 or later to use. Lower versions freeze the sound driver instead.<br />
<br />
<pre>$F1 xx</pre><br />
* <tt>xx</tt> is a signed delta value for the pitch. These are in units used directly for the VxPITCH DSP registers.<br />
<br />
Pitch bends are performed every 10 timer 0 ticks independently of the tempo. They are automatically terminated after one note.<br />
<br />
===Pitch Envelope ID (VCMD <tt>$F2</tt>)===<br />
<pre>$F2 xx</pre><br />
* <tt>xx</tt> is an index into an array of pointers to pitch envelopes. The pitch envelopes always loop and are clocked every 10 timer 0 ticks, independently of the tempo ticker.<br />
<br />
====Pitch Envelope Format====<br />
=====Pitch Offset Tick (CMD <tt>$00-$7F</tt>, <tt>$81-$FF</tt>)=====<br />
<pre>xx</pre><br />
* <tt>xx</tt> is a signed offset in units used directly for the VxPITCH DSP registers.<br />
<br />
=====Restart Pitch Envelope (CMD <tt>$80</tt>)=====<br />
<pre>$80</pre><br />
Jumps back to the beginning of the pitch envelope.<br />
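<br />
The two envelope commands can be decoded as sketched here. The function name is an assumption, and the sketch only lists the offsets of one loop iteration; how the driver accumulates them is not covered.<br />

```python
def decode_pitch_envelope(data: bytes):
    """Return the signed pitch offsets up to the $80 restart command."""
    offsets = []
    for byte in data:
        if byte == 0x80:
            return offsets                      # restart: the envelope loops from here
        # $00-$7F are positive, $81-$FF are negative (two's complement)
        offsets.append(byte - 0x100 if byte > 0x80 else byte)
    raise ValueError("pitch envelope has no $80 restart command")
```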
<br />
===L/R Voice Volume (VCMD <tt>$F3</tt>)===<br />
<pre>$F3 xx yy</pre><br />
* This is a direct DSP register write to the VxVOL registers, with <tt>xx</tt> being the left volume and <tt>yy</tt> being the right volume.<br />
This VCMD's operation changes depending on the version:<br />
* V0.0-V2.2: Just a straight write with no modifications made to the input.<br />
* V2.3: The input values are shifted left once.<br />
* V3.0-V4.0b: The input values are scaled by VCMD <tt>$EE</tt>, then shifted left once.<br />
* V4.0a: The input values are scaled by the values used for VCMD <tt>$EE</tt>. The shifting operation is not done here.<br />
<br />
===Tempo (VCMD <tt>$F4</tt>)===<br />
<pre>$F4 xx</pre><br />
* <tt>xx</tt> defines one tempo tick as <math>10*\frac{256}{x}</math> timer 0 ticks. Zero freezes the song except for pitch bends and envelopes, which are not affected by the tempo ticker.<br />
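<br />
The tempo formula can be evaluated as below. The function name is hypothetical, and returning <tt>None</tt> for a frozen song is a choice made for this sketch only.<br />

```python
def timer_ticks_per_tempo_tick(xx: int):
    """Timer 0 ticks per tempo tick for a tempo value, or None if frozen."""
    if xx == 0:
        return None          # zero freezes the song
    return 10 * 256 / xx
```

For example, a tempo of <tt>$80</tt> gives 20 timer 0 ticks per tempo tick, and a tempo of 256 gives 10.<br />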
<br />
===Jump to Pattern in Order List + Mark Loop Point (VCMD <tt>$F5</tt>)===<br />
<pre>$F5 xx xx</pre><br />
* <tt>xx xx</tt> is a [[little endian]] pointer to a pattern pointer in the order list. The loop point is marked here.<br />
<br />
===Key Off (VCMD <tt>$F8</tt>)===<br />
Keys off the note and delays the channel for one note length.<br />
<br />
===One Note Delay (VCMD <tt>$F9</tt>)===<br />
Delays the channel for one note length.<br />
<br />
===Instrument (VCMD <tt>$FA</tt>)===<br />
<pre>$FA xx</pre><br />
* <tt>xx</tt> is an instrument ID to an array of instruments. See Instrument Format above for the format.<br />
<br />
===Absolute Global Transposition (VCMD <tt>$FB</tt>)===<br />
<pre>$FB xx</pre><br />
* <tt>xx</tt> is a signed note offset to apply to all music channels.<br />
Global transposition is not applied to channel 4 for V0.0-V2.1.<br />
<br />
===Absolute Transposition (VCMD <tt>$FC</tt>)===<br />
<pre>$FC xx</pre><br />
* <tt>xx</tt> is a signed note offset to apply for the channel.<br />
<br />
===Song End (VCMD <tt>$FE</tt>)===<br />
Terminates the song for all of the channels.<br />
<br />
===Pattern End (VCMD <tt>$FF</tt>)===<br />
Ends the pattern and goes to the pattern order list to fetch another pattern.<br />
<br />
[[Category:SPC Sound Engines]]</div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=2263Bithacks2022-11-10T21:12:31Z<p>Runic Rain: Combine Carry Flag</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. Usually they work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br><br />
See also: [[Useful_Code_Snippets|Useful Code Snippets]]<br />
<br />
= Math Bithacks =<br />
== Signed Division By 2 ==<br />
''7 bytes / 8 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
BPL +<br />
ADC #$00<br />
+<br />
</pre><br />
note: Rounds toward zero.<br />
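<br />
To check the rounding behaviour, the instruction sequence can be modelled in Python. This is a sketch assuming an 8-bit accumulator; the helper names are made up for the example.<br />

```python
def to_signed(byte: int) -> int:
    """Interpret an 8-bit value as two's complement."""
    return byte - 0x100 if byte & 0x80 else byte


def signed_div2(a: int) -> int:
    carry = 1 if a >= 0x80 else 0                 # CMP #$80: carry = sign bit
    a, carry = (carry << 7) | (a >> 1), a & 1     # ROR: carry in at bit 7, bit 0 out
    if a & 0x80:                                  # BPL + is not taken for negatives
        a = (a + carry) & 0xFF                    # ADC #$00: add the shifted-out bit
    return a
```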
<br />
== Arithmetic Shift Right ==<br />
''3 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
note: This is similar to division by 2, but rounds toward negative infinity.<br />
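<br />
The same kind of model shows the difference in rounding. Again this assumes an 8-bit accumulator, and the function names are only illustrative.<br />

```python
def to_signed(byte: int) -> int:
    """Interpret an 8-bit value as two's complement."""
    return byte - 0x100 if byte & 0x80 else byte


def asr(a: int) -> int:
    carry = 1 if a >= 0x80 else 0      # CMP #$80: carry = sign bit
    return (carry << 7) | (a >> 1)     # ROR: sign bit is shifted back in
```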
<br />
== Arithmetic Shift Right, multiple steps ==<br />
''6+n bytes / 6+2n cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro ASR_multi(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must already reflect the sign of A on entry
macro ASR_multi(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
== Absolute Value ==<br />
''5 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A, (N Flag)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
== Absolute Value (SEC) ==<br />
''4 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; compared to the branching version this is 1 byte smaller
; it's either 2 cycles slower/faster depending on branch taken
; NOTE: (A EOR #$7F) - #$7F equals -A for every 8-bit A, so the result is |A| only when A is negative on entry
EOR #$7F
; SEC ; the instant you add this in it becomes worse than the branching version
SBC #$7F
</pre><br />
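<br />
A minimal Python model of the two instructions (8-bit A, carry set on entry) can be used to check what they compute. For every input it returns the two's complement negation of A, so the result matches the absolute value only for negative inputs. The function name is made up for the example.<br />

```python
def eor_sbc_7f(a: int) -> int:
    a ^= 0x7F                    # EOR #$7F
    return (a - 0x7F) & 0xFF     # SBC #$7F with carry set (no borrow)
```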
<br />
== Magnitude/Extents Check ==<br />
''~7 bytes / 12 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
== Sign Extend ==<br />
''13 bytes / 18 cycles''<br />
<br><br />
<u>inputs:</u> 8bit value in $10<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
REP #$20<br />
LDA $10-1 ; load $10 into A high, and garbage in low<br />
AND #$FF00 ; discard garbage<br />
BPL +<br />
ORA #$00FF<br />
+<br />
XBA<br />
</pre><br />
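<br />
The sequence can be traced with a small Python model. It assumes a 16-bit accumulator after <tt>REP #$20</tt> and treats <tt>ram</tt> as a plain byte array; both names are illustrative.<br />

```python
def sign_extend(ram, addr: int = 0x10) -> int:
    a = ram[addr - 1] | (ram[addr] << 8)      # LDA $10-1: value lands in the high byte
    a &= 0xFF00                               # AND #$FF00: discard the garbage low byte
    if a & 0x8000:                            # BPL + is not taken for negatives
        a |= 0x00FF                           # ORA #$00FF: fill with sign bits
    return ((a << 8) | (a >> 8)) & 0xFFFF     # XBA: swap the two bytes
```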
<br />
== Clamp Signed (To Constants) ==<br />
''16 bytes / 15 cycles''
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; clamp signed value in A to [min,max] if min/max are signed constants<br />
macro clamp_const(min,max)<br />
EOR #$80<br />
CMP #$80^<min> : BCS ?+<br />
LDA #$80^<min><br />
?+ CMP #$80^<max> : BCC ?+<br />
LDA #$80^<max><br />
?+ EOR #$80<br />
endmacro<br />
</pre><br />
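<br />
The trick relies on <tt>EOR #$80</tt> turning a signed comparison into an unsigned one. The Python sketch below models the macro for an 8-bit accumulator; the function names are assumptions.<br />

```python
def to_signed(byte: int) -> int:
    """Interpret an 8-bit value as two's complement."""
    return byte - 0x100 if byte & 0x80 else byte


def clamp_const(a: int, lo: int, hi: int) -> int:
    """Clamp the signed 8-bit value in a to [lo, hi]."""
    a ^= 0x80                        # EOR #$80: bias so unsigned order == signed order
    lo_biased = (lo & 0xFF) ^ 0x80   # #$80^<min>
    hi_biased = (hi & 0xFF) ^ 0x80   # #$80^<max>
    if a < lo_biased:                # CMP / BCS skips the load when a >= min
        a = lo_biased
    if a >= hi_biased:               # CMP / BCC skips the load when a < max
        a = hi_biased
    return a ^ 0x80                  # EOR #$80: remove the bias
```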
<br />
= Misc. Tricks =<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
== XCN ==<br />
''12 bytes / 16 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; eXchaNge Nibble without a LUT<br />
ASL : ADC #$00<br />
ASL : ADC #$00<br />
ASL : ADC #$00<br />
ASL : ADC #$00<br />
</pre><br />
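<br />
Each <tt>ASL : ADC #$00</tt> pair is an 8-bit rotate left, and four of them swap the nibbles. A Python model, with an illustrative function name:<br />

```python
def xcn(a: int) -> int:
    for _ in range(4):
        carry = a >> 7                   # ASL moves bit 7 into carry
        a = ((a << 1) & 0xFF) + carry    # ADC #$00 adds it back in at bit 0
    return a
```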
<br />
== Clear Low Byte of Accumulator ==<br />
''1 byte / 2 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
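; "Trashes" A but clears low byte
; TDC copies the Direct Page register into the 16-bit accumulator, so this relies on D having a zero low byte (e.g. D = $0000)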
TDC<br />
</pre><br />
<br />
== Direction/Facing As Index ==<br />
''4 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
; The input here is specifically a signed speed, or similar value.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
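<br />
The three instructions simply move the sign bit into bit 0. A Python model for an 8-bit accumulator, with an assumed function name:<br />

```python
def facing_index(a: int) -> int:
    carry = a >> 7                                   # ASL: sign bit goes to carry
    a = (a << 1) & 0xFF
    a, carry = ((a << 1) & 0xFF) | carry, a >> 7     # ROL: carry comes in at bit 0
    return a & 0x01                                  # AND #$01
```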
<br />
== Check N Conditions True ==<br />
''n+7 bytes / 2n+7 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; You can test for multiple conditions being true (7 conditions true, at least 5 conditions, etc.) by starting a counter at an offset that rounds the target up to the next power of 2, then testing whether that bit is set.
; You can also test for "Less than N True", "More than N", etc. with variations.<br />
; This is almost more a coding technique, but it's super helpful, so worth pointing out.<br />
; It can allow you to re-arrange branches of code as independent blocks among other useful things.<br />
; You can also use any RAM instead of A at a small cost.<br />
<br />
; Example Test For 5 True Conditions:<br />
!Next_Highest_Power_of_2 = $08<br />
!N_True_Target = $05<br />
LDA #!Next_Highest_Power_of_2-!N_True_Target-1 ; here we set up our rounding, the -1 isn't strictly necessary *most* of the time
%TestSomeCondition()<br />
BCC + ; here we're going to say our test just returns carry set on true (but it could directly INC inside the code as well)<br />
INC<br />
+<br />
; ... repeat the above 5 times for different tests<br />
<br />
N_True_Test:<br />
INC ; replace our -1 to bring us up to a full power of 2 if we had enough True<br />
AND #!Next_Highest_Power_of_2<br />
BEQ .false<br />
.true:<br />
; N Tests were True<br />
.false:<br />
; Not exactly N tests were true<br />
</pre><br />
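<br />
The counting technique can be modelled in Python to see which totals set the tested bit. The function and parameter names are assumptions; <tt>results</tt> stands in for the outcomes of the individual tests.<br />

```python
def n_true_test(results, target: int, power_of_2: int) -> bool:
    counter = power_of_2 - target - 1      # initial LDA: pre-bias the counter
    for passed in results:
        if passed:                         # BCC + skips the INC on a failed test
            counter += 1
    counter += 1                           # INC: put back the -1
    return (counter & power_of_2) != 0     # AND / BEQ .false
```

With a target of 5 and a power of 2 of 8, five passing tests bring the counter to exactly 8, so the bit is set; four leave it at 7.<br />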
<br />
== Skip Dead Code ==<br />
''1-2 bytes / 2-3 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use BRA/JMP instead<br />
; JMP is as fast as BRA on the SNES CPU, but will be slightly slower on SA-1, and 1 cycle slower on SPC. So BRA is recommended<br />
; (The extra byte used for JMP in this case doesn't matter)<br />
BRA + ; 2 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
== Check 3 Conditions ==<br />
''2 bytes / 3 cycles''
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; Just As A Reminder: the V & N flag are set by the *operand* to BIT not the result of the AND!<br />
BIT $00<br />
BMI .bit7_set<br />
BVS .bit6_set<br />
BNE .bit5_set ; assuming A holds #$20 (Z reflects A AND $00)
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre><br />
<br />
== Combine Carry Flag ==<br />
''4 bytes / 8 cycles''<br />
<br><br />
<u>inputs:</u> (Flag, On Stack)<br />
<br><br />
<u>outputs:</u> (Carry Flag)<br />
<pre><br />
; flags were pushed on the stack earlier via PHP (A must be 8-bit for the PLA below), etc.
; code that alters Carry Flag<br />
PLA : BCS +<br />
LSR<br />
+</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=2238Bithacks2022-10-21T22:59:59Z<p>Runic Rain: /* Clamp (To Constants) */</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. Usually they work in a slightly non-obvious way, (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br><br />
See also: [[Useful_Code_Snippets|Useful Code Snippets]]<br />
<br />
= Math Bithacks =<br />
== Signed Division By 2 ==<br />
''7 bytes / 8 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
BPL +<br />
ADC #$00<br />
+<br />
</pre><br />
note: Rounds toward zero.<br />
<br />
== Arithmetic Shift Right ==<br />
''3 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
note: This is similar to division by 2, but rounds toward negative infinity.<br />
<br />
== Arithmetic Shift Right, multiple steps ==<br />
''6+n bytes / 6+2n cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro ASR_multi(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but must have N flag set before use<br />
macro ASR_multi(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
== Absolute Value ==<br />
''5 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A, (N Flag)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
== Absolute Value (SEC) ==<br />
''4 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; compared to the branching version this is 1 byte smaller<br />
; it's either 2 cycles slower/faster depending on branch taken<br />
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
== Magnitude/Extents Check ==<br />
''~7 bytes / 12 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
== Sign Extend ==<br />
''13 bytes / 18 cycles''<br />
<br><br />
<u>inputs:</u> 8bit value in $10<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
REP #$20<br />
LDA $10-1 ; load $10 into A high, and garbage in low<br />
AND #$FF00 ; discard garbage<br />
BPL +<br />
ORA #$00FF<br />
+<br />
XBA<br />
</pre><br />
<br />
== Clamp Signed (To Constants) ==<br />
''16 bytes/15 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; clamp signed value in A to [min,max] if min/max are signed constants<br />
macro clamp_const(min,max)<br />
EOR #$80<br />
CMP #$80^<min> : BCS ?+<br />
LDA #$80^<min><br />
?+ CMP #$80^<max> : BCC ?+<br />
LDA #$80^<max><br />
?+ EOR #$80<br />
endmacro<br />
</pre><br />
<br />
= Misc. Tricks =<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
== XCN ==<br />
''12 bytes / 16 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; eXchaNge Nibble without a LUT<br />
ASL : ADC #$00<br />
ASL : ADC #$00<br />
ASL : ADC #$00<br />
ASL : ADC #$00<br />
</pre><br />
<br />
== Clear Low Byte of Accumulator ==<br />
''1 byte / 2 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; "Trashes" A but clears low byte<br />
TDC<br />
</pre><br />
<br />
== Direction/Facing As Index ==<br />
''4 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
; The input here is specifically a signed speed, or similar value.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
== Check N Conditions True ==<br />
''n+7 bytes / 2n+7 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; You can test for multiple conditions being true (7 conditions true, at least 5 conditions, etc.) by simply using a counter and rounding to the next power of 2 and test if that bit is set.<br />
; You can also test for "Less than N True", "More than N", etc. with variations.<br />
; This is almost more a coding technique, but it's super helpful, so worth pointing out.<br />
; It can allow you to re-arrange branches of code as independent blocks among other useful things.<br />
; You can also use any RAM instead of A at a small cost.<br />
<br />
; Example Test For 5 True Conditions:<br />
!Next_Highest_Power_of_2 = $08<br />
!N_True_Target = $05<br />
LDA #!Next_Highest_Power_of_2!-!N_True_Target-1 ; here we set up our rounding, the -1 isn't strictly necessary *most* of the time<br />
%TestSomeCondition()<br />
BCC + ; here we're going to say our test just returns carry set on true (but it could directly INC inside the code as well)<br />
INC<br />
+<br />
; ... repeat the above 5 times for different tests<br />
<br />
N_True_Test:<br />
INC ; replace our -1 to bring us up to a full power of 2 if we had enough True<br />
AND #!Next_Highest_Power_of_2<br />
BEQ .false<br />
.true:<br />
; N Tests were True<br />
.false:<br />
; Not exactly N tests were true<br />
</pre><br />
<br />
== Skip Dead Code ==<br />
''1-2 bytes / 2-3 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use BRA/JMP instead<br />
; JMP is as fast as BRA on the SNES CPU, but will be slightly slower on SA-1, and 1 cycle slower on SPC. So BRA is recommended<br />
; (The extra byte used for JMP in this case doesn't matter)<br />
BRA + ; 2 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
== Check 3 Conditions ==<br />
''2 bytes / 2 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; Just As A Reminder: the V & N flag are set by the *operand* to BIT not the result of the AND!<br />
BIT $00<br />
BMI .bit7_set<br />
BVS .bit6_set<br />
BNE .bit5_set ; Z reflects A AND $00, so this tests bit 5 of $00 when A holds #$20<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=2237Bithacks2022-10-21T22:52:59Z<p>Runic Rain: Added Clamp2Const</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. Usually they work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br><br />
See also: [[Useful_Code_Snippets|Useful Code Snippets]]<br />
<br />
= Math Bithacks =<br />
== Signed Division By 2 ==<br />
''7 bytes / 8 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
BPL +<br />
ADC #$00<br />
+<br />
</pre><br />
note: Rounds toward zero.<br />
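As a hand-traced illustration (the input $FD is an arbitrary value picked for this example; 8-bit A and decimal mode off are assumed), this is how the snippet turns -3 into -1:<br />
<pre><br />
LDA #$FD ; A = -3 (example input, not part of the trick)<br />
CMP #$80 ; A >= $80, so carry is set: the sign bit is now in C<br />
ROR ; A = $FE (-2), carry = old bit 0 = 1<br />
BPL + ; not taken, the shifted value is negative<br />
ADC #$00 ; A = $FE + carry = $FF (-1), i.e. -1.5 rounded toward zero<br />
+<br />
</pre><br />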
<br />
== Arithmetic Shift Right ==<br />
''3 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
note: This is similar to division by 2, but rounds toward negative infinity.<br />
<br />
== Arithmetic Shift Right, multiple steps ==<br />
''6+n bytes / 6+2n cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro ASR_multi(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must reflect the value in A before use<br />
macro ASR_multi(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
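A possible usage sketch, assuming Asar's %macro() call syntax, 8-bit A, and an example input of $F4 (-12) that is not from the original text. Either definition above should give the same result here, since LDA leaves the N flag matching A:<br />
<pre><br />
LDA #$F4 ; A = -12 (example input)<br />
%ASR_multi(2) ; LSR twice gives $3D, the old sign bit now sits in bit 5, so ORA #$C0 gives $FD<br />
; A = $FD (-3), which is -12 / 4<br />
</pre><br />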
<br />
== Absolute Value ==<br />
''5 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A, (N Flag)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
== Absolute Value (SEC) ==<br />
''4 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; compared to the branching version this is 1 byte smaller<br />
; it's either 2 cycles slower/faster depending on branch taken<br />
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
== Magnitude/Extents Check ==<br />
''~7 bytes / 12 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
== Sign Extend ==<br />
''13 bytes / 18 cycles''<br />
<br><br />
<u>inputs:</u> 8bit value in $10<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
REP #$20<br />
LDA $10-1 ; load $10 into A high, and garbage in low<br />
AND #$FF00 ; discard garbage<br />
BPL +<br />
ORA #$00FF<br />
+<br />
XBA<br />
</pre><br />
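A hand-traced sketch of the same sequence, assuming the 8-bit value -2 ($FE) has been stored in $10 beforehand (the setup lines and the input value are assumptions made for this example):<br />
<pre><br />
SEP #$20<br />
LDA #$FE : STA $10 ; example input: -2 as an 8-bit value<br />
REP #$20<br />
LDA $10-1 ; A = $FExx (high byte from $10, low byte from $0F)<br />
AND #$FF00 ; A = $FE00, N is set from bit 15<br />
BPL + ; not taken<br />
ORA #$00FF ; A = $FEFF<br />
+<br />
XBA ; A = $FFFE, i.e. -2 as a 16-bit value<br />
</pre><br />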
<br />
== Clamp (To Constants) ==<br />
''16 bytes/15 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; clamp value in A to [min,max] if min/max are constants<br />
macro clamp_const(min,max)<br />
EOR #$80<br />
CMP #$80^<min> : BCS ?+<br />
LDA #$80^<min><br />
?+ CMP #$80^<max> : BCC ?+<br />
LDA #$80^<max><br />
?+ EOR #$80<br />
endmacro<br />
</pre><br />
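A possible usage sketch, assuming Asar's %macro() call syntax and 8-bit A; the bounds and the input value are example choices, not from the original text:<br />
<pre><br />
LDA #$20 ; A = 32 (example input)<br />
%clamp_const($F0,$10) ; clamp to [-16, 16]<br />
; A = $10 (16): after EOR #$80 the value $A0 is >= $90, so the max branch loads $90, and the final EOR #$80 gives $10<br />
</pre><br />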
<br />
= Misc. Tricks =<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
== XCN ==<br />
''12 bytes / 16 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; eXchaNge Nibble without a LUT<br />
ASL : ADC #$00<br />
ASL : ADC #$00<br />
ASL : ADC #$00<br />
ASL : ADC #$00<br />
</pre><br />
<br />
== Clear Low Byte of Accumulator ==<br />
''1 byte / 2 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; "Trashes" A (it becomes a copy of D) but clears the low byte, provided the low byte of the Direct Page register is $00<br />
TDC<br />
</pre><br />
<br />
== Direction/Facing As Index ==<br />
''4 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
; The input here is specifically a signed speed, or similar value.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
== Check N Conditions True ==<br />
''n+7 bytes / 2n+7 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; You can test for multiple conditions being true (7 conditions true, at least 5 conditions, etc.) by simply using a counter and rounding to the next power of 2 and test if that bit is set.<br />
; You can also test for "Less than N True", "More than N", etc. with variations.<br />
; This is almost more a coding technique, but it's super helpful, so worth pointing out.<br />
; It can allow you to re-arrange branches of code as independent blocks among other useful things.<br />
; You can also use any RAM instead of A at a small cost.<br />
<br />
; Example Test For 5 True Conditions:<br />
!Next_Highest_Power_of_2 = $08<br />
!N_True_Target = $05<br />
LDA #!Next_Highest_Power_of_2-!N_True_Target-1 ; here we set up our rounding, the -1 isn't strictly necessary *most* of the time<br />
%TestSomeCondition()<br />
BCC + ; here we're going to say our test just returns carry set on true (but it could directly INC inside the code as well)<br />
INC<br />
+<br />
; ... repeat the above 5 times for different tests<br />
<br />
N_True_Test:<br />
INC ; replace our -1 to bring us up to a full power of 2 if we had enough True<br />
AND #!Next_Highest_Power_of_2<br />
BEQ .false<br />
.true:<br />
; N Tests were True<br />
.false:<br />
; Not exactly N tests were true<br />
</pre><br />
<br />
== Skip Dead Code ==<br />
''1-2 bytes / 2-3 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use BRA/JMP instead<br />
; JMP is as fast as BRA on the SNES CPU, but will be slightly slower on SA-1, and 1 cycle slower on SPC. So BRA is recommended<br />
; (The extra byte used for JMP in this case doesn't matter)<br />
BRA + ; 2 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
== Check 3 Conditions ==<br />
''2 bytes / 2 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; Just As A Reminder: the V & N flag are set by the *operand* to BIT not the result of the AND!<br />
BIT $00<br />
BMI .bit7_set<br />
BVS .bit6_set<br />
BNE .bit5_set ; Z reflects A AND $00, so this tests bit 5 of $00 when A holds #$20<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=2224Bithacks2022-06-16T23:37:30Z<p>Runic Rain: /* Misc. Tricks */</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. Usually they work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br><br />
See also: [[Useful_Code_Snippets|Useful Code Snippets]]<br />
<br />
= Math Bithacks =<br />
== Signed Division By 2 ==<br />
''7 bytes / 8 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
BPL +<br />
ADC #$00<br />
+<br />
</pre><br />
note: Rounds toward zero.<br />
<br />
== Arithmetic Shift Right ==<br />
''3 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
note: This is similar to division by 2, but rounds toward negative infinity.<br />
<br />
== Arithmetic Shift Right, multiple steps ==<br />
''6+n bytes / 6+2n cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro ASR_multi(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must reflect the value in A before use<br />
macro ASR_multi(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
== Absolute Value ==<br />
''5 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A, (N Flag)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
== Absolute Value (SEC) ==<br />
''4 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; compared to the branching version this is 1 byte smaller<br />
; it's either 2 cycles slower/faster depending on branch taken<br />
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
== Magnitude/Extents Check ==<br />
''~7 bytes / 12 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
== Sign Extend ==<br />
''13 bytes / 18 cycles''<br />
<br><br />
<u>inputs:</u> 8bit value in $10<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
REP #$20<br />
LDA $10-1 ; load $10 into A high, and garbage in low<br />
AND #$FF00 ; discard garbage<br />
BPL +<br />
ORA #$00FF<br />
+<br />
XBA<br />
</pre><br />
<br />
= Misc. Tricks =<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
== XCN ==<br />
''12 bytes / 16 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; eXchaNge Nibble without a LUT<br />
ASL : ADC #$00<br />
ASL : ADC #$00<br />
ASL : ADC #$00<br />
ASL : ADC #$00<br />
</pre><br />
<br />
== Clear Low Byte of Accumulator ==<br />
''1 byte / 2 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; "Trashes" A (it becomes a copy of D) but clears the low byte, provided the low byte of the Direct Page register is $00<br />
TDC<br />
</pre><br />
<br />
== Direction/Facing As Index ==<br />
''4 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
; The input here is specifically a signed speed, or similar value.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
== Check N Conditions True ==<br />
''n+7 bytes / 2n+7 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; You can test for multiple conditions being true (7 conditions true, at least 5 conditions, etc.) by simply using a counter and rounding to the next power of 2 and test if that bit is set.<br />
; You can also test for "Less than N True", "More than N", etc. with variations.<br />
; This is almost more a coding technique, but it's super helpful, so worth pointing out.<br />
; It can allow you to re-arrange branches of code as independent blocks among other useful things.<br />
; You can also use any RAM instead of A at a small cost.<br />
<br />
; Example Test For 5 True Conditions:<br />
!Next_Highest_Power_of_2 = $08<br />
!N_True_Target = $05<br />
LDA #!Next_Highest_Power_of_2-!N_True_Target-1 ; here we set up our rounding, the -1 isn't strictly necessary *most* of the time<br />
%TestSomeCondition()<br />
BCC + ; here we're going to say our test just returns carry set on true (but it could directly INC inside the code as well)<br />
INC<br />
+<br />
; ... repeat the above 5 times for different tests<br />
<br />
N_True_Test:<br />
INC ; replace our -1 to bring us up to a full power of 2 if we had enough True<br />
AND #!Next_Highest_Power_of_2<br />
BEQ .false<br />
.true:<br />
; N Tests were True<br />
.false:<br />
; Not exactly N tests were true<br />
</pre><br />
<br />
== Skip Dead Code ==<br />
''1-2 bytes / 2-3 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use BRA/JMP instead<br />
; JMP is as fast as BRA on the SNES CPU, but will be slightly slower on SA-1, and 1 cycle slower on SPC. So BRA is recommended<br />
; (The extra byte used for JMP in this case doesn't matter)<br />
BRA + ; 2 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
== Check 3 Conditions ==<br />
''2 bytes / 2 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; Just As A Reminder: the V & N flag are set by the *operand* to BIT not the result of the AND!<br />
BIT $00<br />
BMI .bit7_set<br />
BVS .bit6_set<br />
BNE .bit5_set ; Z reflects A AND $00, so this tests bit 5 of $00 when A holds #$20<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=2146Bithacks2022-02-13T14:30:49Z<p>Runic Rain: Went to reference this and realized... I know what I was talking about, but it was not incredibly clear.</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. Usually they work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br><br />
See also: [[Useful_Code_Snippets|Useful Code Snippets]]<br />
<br />
= Math Bithacks =<br />
== Signed Division By 2 ==<br />
''3 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
note: This rounds toward negative infinity, so -1÷2 stays -1 instead of becoming 0.<br />
<br />
== Signed Division By 2<sup>n</sup> ==<br />
''6+n bytes / 6+2n cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must reflect the value in A before use<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
note: This rounds toward negative infinity, so -1÷2 stays -1 instead of becoming 0.<br />
<br />
== Absolute Value ==<br />
''5 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A, (N Flag)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
== Absolute Value (SEC) ==<br />
''4 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; compared to the branching version this is 1 byte smaller<br />
; it's either 2 cycles slower/faster depending on branch taken<br />
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
== Magnitude/Extents Check ==<br />
''~7 bytes / 12 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
= Misc. Tricks =<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
== Clear Low Byte of Accumulator ==<br />
''1 byte / 2 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; "Trashes" A (it becomes a copy of D) but clears the low byte, provided the low byte of the Direct Page register is $00<br />
TDC<br />
</pre><br />
<br />
== Direction/Facing As Index ==<br />
''4 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
; The input here is specifically a signed speed, or similar value.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
== Check N Conditions True ==<br />
''n+7 bytes / 2n+7 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; You can test for multiple conditions being true (7 conditions true, at least 5 conditions, etc.) by simply using a counter and rounding to the next power of 2 and test if that bit is set.<br />
; You can also test for "Less than N True", "More than N", etc. with variations.<br />
; This is almost more a coding technique, but it's super helpful, so worth pointing out.<br />
; It can allow you to re-arrange branches of code as independent blocks among other useful things.<br />
; You can also use any RAM instead of A at a small cost.<br />
<br />
; Example Test For 5 True Conditions:<br />
!Next_Highest_Power_of_2 = $08<br />
!N_True_Target = $05<br />
LDA #!Next_Highest_Power_of_2-!N_True_Target-1 ; here we set up our rounding, the -1 isn't strictly necessary *most* of the time<br />
%TestSomeCondition()<br />
BCC + ; here we're going to say our test just returns carry set on true (but it could directly INC inside the code as well)<br />
INC<br />
+<br />
; ... repeat the above 5 times for different tests<br />
<br />
N_True_Test:<br />
INC ; replace our -1 to bring us up to a full power of 2 if we had enough True<br />
AND #!Next_Highest_Power_of_2<br />
BEQ .false<br />
.true:<br />
; N Tests were True<br />
.false:<br />
; Not exactly N tests were true<br />
</pre><br />
<br />
== Skip Dead Code ==<br />
''1-2 bytes / 2-3 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use BRA/JMP instead<br />
; JMP is as fast as BRA on the SNES CPU, but will be slightly slower on SA-1, and 1 cycle slower on SPC. So BRA is recommended<br />
; (The extra byte used for JMP in this case doesn't matter)<br />
BRA + ; 2 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
== Check 3 Conditions ==<br />
''2 bytes / 2 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; Just As A Reminder: the V & N flag are set by the *operand* to BIT not the result of the AND!<br />
BIT $00<br />
BMI .bit7_set<br />
BVS .bit6_set<br />
BNE .bit5_set ; Z reflects A AND $00, so this tests bit 5 of $00 when A holds #$20<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=2142Bithacks2022-01-27T20:21:23Z<p>Runic Rain: </p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. Usually they work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br><br />
See also: [[Useful_Code_Snippets|Useful Code Snippets]]<br />
<br />
= Math Bithacks =<br />
== Signed Division By 2 ==<br />
''3 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
note: This rounds toward negative infinity, so -1÷2 stays -1 instead of becoming 0.<br />
<br />
== Signed Division By 2<sup>n</sup> ==<br />
''6+n bytes / 6+2n cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must reflect the value in A before use<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
note: This rounds toward negative infinity, so -1÷2 stays -1 instead of becoming 0.<br />
<br />
== Absolute Value ==<br />
''5 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A, (N Flag)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
== Absolute Value (SEC) ==<br />
''4 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; compared to the branching version this is 1 byte smaller<br />
; it's either 2 cycles slower/faster depending on branch taken<br />
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
== Magnitude/Extents Check ==<br />
''~7 bytes / 12 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
= Misc. Tricks =<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
== Clear Low Byte of Accumulator ==<br />
''1 byte / 2 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; "Trashes" A (it becomes a copy of D) but clears the low byte, provided the low byte of the Direct Page register is $00<br />
TDC<br />
</pre><br />
<br />
== Direction/Facing As Index ==<br />
''4 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
== Check N Conditions True ==<br />
''n+7 bytes / 2n+7 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; You can test for multiple conditions being true (7 conditions true, at least 5 conditions, etc.) by simply using a counter and rounding to the next power of 2 and test if that bit is set.<br />
; You can also test for "Less than N True", "More than N", etc. with variations.<br />
; This is almost more a coding technique, but it's super helpful, so worth pointing out.<br />
; It can allow you to re-arrange branches of code as independent blocks among other useful things.<br />
; You can also use any RAM instead of A at a small cost.<br />
<br />
; Example Test For 5 True Conditions:<br />
!Next_Highest_Power_of_2 = $08<br />
!N_True_Target = $05<br />
LDA #!Next_Highest_Power_of_2-!N_True_Target-1 ; here we set up our rounding, the -1 isn't strictly necessary *most* of the time<br />
%TestSomeCondition()<br />
BCC + ; here we're going to say our test just returns carry set on true (but it could directly INC inside the code as well)<br />
INC<br />
+<br />
; ... repeat the above 5 times for different tests<br />
<br />
N_True_Test:<br />
INC ; replace our -1 to bring us up to a full power of 2 if we had enough True<br />
AND #!Next_Highest_Power_of_2<br />
BEQ .false<br />
.true:<br />
; N Tests were True<br />
.false:<br />
; Not exactly N tests were true<br />
</pre><br />
<br />
== Skip Dead Code ==<br />
''1-2 bytes / 2-3 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use BRA/JMP instead<br />
; JMP is as fast as BRA on the SNES CPU, but will be slightly slower on SA-1, and 1 cycle slower on SPC. So BRA is recommended<br />
; (The extra byte used for JMP in this case doesn't matter)<br />
BRA + ; 2 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
== Check 3 Conditions ==<br />
''2 bytes / 2 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; Just As A Reminder: the V & N flag are set by the *operand* to BIT not the result of the AND!<br />
BIT $00<br />
BMI .bit7_set<br />
BVS .bit6_set<br />
BNE .bit5_set ; Z reflects A AND $00, so this tests bit 5 of $00 when A holds #$20<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=2141Bithacks2022-01-27T20:21:08Z<p>Runic Rain: Linked to another page I was unaware of.</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. Usually they work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
See also: [[Useful_Code_Snippets|Useful Code Snippets]]<br />
<br />
= Math Bithacks =<br />
== Signed Division By 2 ==<br />
''3 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
Note: this is a sign-preserving (arithmetic) shift, so the result rounds toward negative infinity: -1÷2 stays -1 instead of becoming 0, because 0 counts as positive.<br />
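As a worked example (a hand trace, assuming an 8-bit accumulator), dividing -5 shows the rounding direction:<br />
<pre><br />
LDA #$FB ; A = -5<br />
CMP #$80 ; carry = 1 because $FB >= $80, i.e. A is negative<br />
ROR ; carry rotates into bit 7: A = $FD = -3 (-2.5 rounded down)<br />
</pre><br />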
<br />
== Signed Division By 2<sup>n</sup> ==<br />
''6+n bytes / 6+2n cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must already reflect the sign of A before use<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
Note: this is a sign-preserving (arithmetic) shift, so the result rounds toward negative infinity: -1÷2 stays -1 instead of becoming 0, because 0 counts as positive.<br />
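Expanding the first macro by hand for n = 2 (a sketch, assuming an 8-bit accumulator; the input value is only an example) shows how the sign bits are restored:<br />
<pre><br />
LDA #$F0 ; A = -16<br />
LSR ; A = $78<br />
LSR ; A = $3C, the old sign bit now sits in bit 5<br />
BIT #$20 ; $80>>2: test the shifted sign bit<br />
BEQ + ; not taken, the bit is set<br />
ORA #$C0 ; low byte of $FF00>>2: refill the top two bits, A = $FC = -4<br />
+<br />
</pre><br />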
<br />
== Absolute Value ==<br />
''5 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A, (N Flag)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
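A hand trace of the macro body for a negative input (assuming an 8-bit accumulator; the LDA is only there to set the N flag):<br />
<pre><br />
LDA #$FB ; A = -5, N flag set<br />
BPL + ; not taken<br />
EOR #$FF ; A = $04<br />
INC ; A = $05<br />
+<br />
</pre><br />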
<br />
== Absolute Value (SEC) ==<br />
''4 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
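; compared to the branching version this is 1 byte smaller<br />
; it's 2 cycles faster when that branch is not taken and 1 cycle slower when it is<br />
; NOTE: with carry set this computes 0 - A for any A, so it only gives |A| when A is already known to be negative<br />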
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
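A hand trace for an input that is already known to be negative (assuming an 8-bit accumulator; the SEC is only here to make the trace self-contained). Note that the same sequence would turn $05 into $FB, so a non-negative input is not left unchanged:<br />
<pre><br />
LDA #$FB ; A = -5<br />
SEC ; the trick itself assumes carry is already set<br />
EOR #$7F ; A = $84<br />
SBC #$7F ; A = $84 - $7F = $05<br />
</pre><br />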
<br />
== Magnitude/Extents Check ==<br />
''~7 bytes / 12 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
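A hand trace of the check against the table above (assuming an 8-bit accumulator; the input values are only examples) shows both extents giving the same answer for the same magnitude:<br />
<pre><br />
; X = 1 selects the positive extent $23<br />
LDA #$10<br />
SEC : SBC Extents,x ; A = $10 - $23 = $ED<br />
EOR Extents,x ; A = $ED ^ $23 = $CE, N set: BMI goes to .zero_side<br />
<br />
LDA #$30<br />
SEC : SBC Extents,x ; A = $30 - $23 = $0D<br />
EOR Extents,x ; A = $0D ^ $23 = $2E, N clear: falls through to .far_side<br />
<br />
; X = 0 selects the negative extent -$23 = $DD<br />
LDA #$F0 ; A = -$10<br />
SEC : SBC Extents,x ; A = $F0 - $DD = $13<br />
EOR Extents,x ; A = $13 ^ $DD = $CE, N set: .zero_side again<br />
</pre><br />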
<br />
= Misc. Tricks =<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
== Clear Low Byte of Accumulator ==<br />
''1 byte / 2 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; "Trashes" A by copying the direct page register into it, which clears the low byte as long as the low byte of D is $00<br />
TDC<br />
</pre><br />
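One possible use, sketched here under the assumptions that A is 8-bit, X is 16-bit and the direct page register is $0000 (the address $10 is only a placeholder), is building a clean 16-bit index from an 8-bit value:<br />
<pre><br />
TDC ; A = $0000, both bytes cleared<br />
LDA $10 ; 8-bit load only replaces the low byte<br />
TAX ; X = $00xx, since TAX copies all 16 bits when X is 16-bit<br />
</pre><br />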
<br />
== Direction/Facing As Index ==<br />
''4 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
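A hand trace, assuming the input is an 8-bit signed value such as a speed (the values are only examples):<br />
<pre><br />
LDA #$F0 ; speed = -16, moving left<br />
ASL ; bit 7 goes into carry: carry = 1, A = $E0<br />
ROL ; carry goes into bit 0: A = $C1<br />
AND #$01 ; A = $01 = left<br />
<br />
LDA #$10 ; speed = +16, moving right<br />
ASL ; carry = 0, A = $20<br />
ROL ; A = $40<br />
AND #$01 ; A = $00 = right<br />
</pre><br />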
<br />
== Check N Conditions True ==<br />
''n+7 bytes / 2n+7 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; You can test for multiple conditions being true (7 conditions true, at least 5 conditions, etc.) by using a counter, offsetting it so that the target count lands on the next power of 2, and testing whether that bit is set.<br />
; You can also test for "Less than N True", "More than N", etc. with variations.<br />
; This is almost more a coding technique, but it's super helpful, so worth pointing out.<br />
; It can allow you to re-arrange branches of code as independent blocks among other useful things.<br />
; You can also use any RAM instead of A at a small cost.<br />
<br />
; Example Test For 5 True Conditions:<br />
!Next_Highest_Power_of_2 = $08<br />
!N_True_Target = $05<br />
LDA #!Next_Highest_Power_of_2-!N_True_Target-1 ; here we set up our rounding, the -1 isn't strictly necessary *most* of the time<br />
%TestSomeCondition()<br />
BCC + ; here we're going to say our test just returns carry set on true (but it could directly INC inside the code as well)<br />
INC<br />
+<br />
; ... repeat the above 5 times for different tests<br />
<br />
N_True_Test:<br />
INC ; replace our -1 to bring us up to a full power of 2 if we had enough True<br />
AND #!Next_Highest_Power_of_2<br />
BEQ .false<br />
.true:<br />
; N Tests were True<br />
.false:<br />
; Not exactly N tests were true<br />
</pre><br />
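The counter arithmetic for the example above (target 5, power of 2 = $08, five tests in total) can be followed by hand:<br />
<pre><br />
; counter starts at $08 - $05 - 1 = $02<br />
; 5 of 5 tests true: $02 + 5 = $07, final INC gives $08, AND #$08 = $08 -> .true<br />
; 4 of 5 tests true: $02 + 4 = $06, final INC gives $07, AND #$08 = $00 -> .false<br />
; 0 of 5 tests true: $02 + 0 = $02, final INC gives $03, AND #$08 = $00 -> .false<br />
</pre><br />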
<br />
== Skip Dead Code ==<br />
''1-2 bytes / 2-3 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use BRA/JMP instead<br />
; JMP is as fast as BRA on the SNES CPU, but will be slightly slower on SA-1, and 1 cycle slower on SPC. So BRA is recommended<br />
; (The extra byte used for JMP in this case doesn't matter)<br />
BRA + ; 2 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
== Check 3 Conditions ==<br />
''2 bytes / 3 cycles''
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; Just as a reminder: the N and V flags are copied from bits 7 and 6 of the *operand* of BIT, not from the result of the AND!<br />
BIT $00<br />
BMI .bit7_set<br />
BVS .bit6_set<br />
BNE .bit5_set ; Z comes from A AND the operand, so this tests bit 5 of $00 assuming #$20 is in A<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=2125Bithacks2021-11-28T08:26:15Z<p>Runic Rain: Thanks to JamesD28 for noticing something I never did. Figure it would be good to note as a gotcha.</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse square root]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
= Math Bithacks =<br />
== Signed Division By 2 ==<br />
''3 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
Note: this is a sign-preserving (arithmetic) shift, so the result rounds toward negative infinity: -1÷2 stays -1 instead of becoming 0, because 0 counts as positive.<br />
<br />
== Signed Division By 2<sup>n</sup> ==<br />
''6+n bytes / 6+2n cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must already reflect the sign of A before use<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
Note: this is a sign-preserving (arithmetic) shift, so the result rounds toward negative infinity: -1÷2 stays -1 instead of becoming 0, because 0 counts as positive.<br />
<br />
== Absolute Value ==<br />
''5 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A, (N Flag)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
== Absolute Value (SEC) ==<br />
''4 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
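; compared to the branching version this is 1 byte smaller<br />
; it's 2 cycles faster when that branch is not taken and 1 cycle slower when it is<br />
; NOTE: with carry set this computes 0 - A for any A, so it only gives |A| when A is already known to be negative<br />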
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
== Magnitude/Extents Check ==<br />
''~7 bytes / 12 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
= Misc. Tricks =<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
== Clear Low Byte of Accumulator ==<br />
''1 byte / 2 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; "Trashes" A by copying the direct page register into it, which clears the low byte as long as the low byte of D is $00<br />
TDC<br />
</pre><br />
<br />
== Direction/Facing As Index ==<br />
''4 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
== Check N Conditions True ==<br />
''n+7 bytes / 2n+7 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; You can test for multiple conditions being true (7 conditions true, at least 5 conditions, etc.) by using a counter, offsetting it so that the target count lands on the next power of 2, and testing whether that bit is set.<br />
; You can also test for "Less than N True", "More than N", etc. with variations.<br />
; This is almost more a coding technique, but it's super helpful, so worth pointing out.<br />
; It can allow you to re-arrange branches of code as independent blocks among other useful things.<br />
; You can also use any RAM instead of A at a small cost.<br />
<br />
; Example Test For 5 True Conditions:<br />
!Next_Highest_Power_of_2 = $08<br />
!N_True_Target = $05<br />
LDA #!Next_Highest_Power_of_2-!N_True_Target-1 ; here we set up our rounding, the -1 isn't strictly necessary *most* of the time<br />
%TestSomeCondition()<br />
BCC + ; here we're going to say our test just returns carry set on true (but it could directly INC inside the code as well)<br />
INC<br />
+<br />
; ... repeat the above 5 times for different tests<br />
<br />
N_True_Test:<br />
INC ; replace our -1 to bring us up to a full power of 2 if we had enough True<br />
AND #!Next_Highest_Power_of_2<br />
BEQ .false<br />
.true:<br />
; N Tests were True<br />
.false:<br />
; Not exactly N tests were true<br />
</pre><br />
<br />
== Skip Dead Code ==<br />
''1-2 bytes / 2-3 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use BRA/JMP instead<br />
; JMP is as fast as BRA on the SNES CPU, but will be slightly slower on SA-1, and 1 cycle slower on SPC. So BRA is recommended<br />
; (The extra byte used for JMP in this case doesn't matter)<br />
BRA + ; 2 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
== Check 3 Conditions ==<br />
''2 bytes / 3 cycles''
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; Just as a reminder: the N and V flags are copied from bits 7 and 6 of the *operand* of BIT, not from the result of the AND!<br />
BIT $00<br />
BMI .bit7_set<br />
BVS .bit6_set<br />
BNE .bit5_set ; Z comes from A AND the operand, so this tests bit 5 of $00 assuming #$20 is in A<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=2120Bithacks2021-11-01T08:49:58Z<p>Runic Rain: /* Check N Conditions True */</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse square root]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
= Math Bithacks =<br />
== Signed Division By 2 ==<br />
''3 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
<br />
== Signed Division By 2<sup>n</sup> ==<br />
''6+n bytes / 6+2n cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must already reflect the sign of A before use<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
== Absolute Value ==<br />
''5 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A, (N Flag)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
== Absolute Value (SEC) ==<br />
''4 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
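; compared to the branching version this is 1 byte smaller<br />
; it's 2 cycles faster when that branch is not taken and 1 cycle slower when it is<br />
; NOTE: with carry set this computes 0 - A for any A, so it only gives |A| when A is already known to be negative<br />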
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
== Magnitude/Extents Check ==<br />
''~7 bytes / 12 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
= Misc. Tricks =<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
== Clear Low Byte of Accumulator ==<br />
''1 byte / 2 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; "Trashes" A by copying the direct page register into it, which clears the low byte as long as the low byte of D is $00<br />
TDC<br />
</pre><br />
<br />
== Direction/Facing As Index ==<br />
''4 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
== Check N Conditions True ==<br />
''n+7 bytes / 2n+7 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; You can test for multiple conditions being true (7 conditions true, at least 5 conditions, etc.) by using a counter, offsetting it so that the target count lands on the next power of 2, and testing whether that bit is set.<br />
; You can also test for "Less than N True", "More than N", etc. with variations.<br />
; This is almost more a coding technique, but it's super helpful, so worth pointing out.<br />
; It can allow you to re-arrange branches of code as independent blocks among other useful things.<br />
; You can also use any RAM instead of A at a small cost.<br />
<br />
; Example Test For 5 True Conditions:<br />
!Next_Highest_Power_of_2 = $08<br />
!N_True_Target = $05<br />
LDA #!Next_Highest_Power_of_2-!N_True_Target-1 ; here we set up our rounding, the -1 isn't strictly necessary *most* of the time<br />
%TestSomeCondition()<br />
BCC + ; here we're going to say our test just returns carry set on true (but it could directly INC inside the code as well)<br />
INC<br />
+<br />
; ... repeat the above 5 times for different tests<br />
<br />
N_True_Test:<br />
INC ; replace our -1 to bring us up to a full power of 2 if we had enough True<br />
AND #!Next_Highest_Power_of_2<br />
BEQ .false<br />
.true:<br />
; N Tests were True<br />
.false:<br />
; Not exactly N tests were true<br />
</pre><br />
<br />
== Skip Dead Code ==<br />
''1-2 bytes / 2-3 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use BRA/JMP instead<br />
; JMP is as fast as BRA on the SNES CPU, but will be slightly slower on SA-1, and 1 cycle slower on SPC. So BRA is recommended<br />
; (The extra byte used for JMP in this case doesn't matter)<br />
BRA + ; 2 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
== Check 3 Conditions ==<br />
''2 bytes / 3 cycles''
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; Just as a reminder: the N and V flags are copied from bits 7 and 6 of the *operand* of BIT, not from the result of the AND!<br />
BIT $00<br />
BMI .bit7_set<br />
BVS .bit6_set<br />
BNE .bit5_set ; Z comes from A AND the operand, so this tests bit 5 of $00 assuming #$20 is in A<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=2119Bithacks2021-11-01T08:42:16Z<p>Runic Rain: </p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse square root]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
= Math Bithacks =<br />
== Signed Division By 2 ==<br />
''3 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
<br />
== Signed Division By 2<sup>n</sup> ==<br />
''6+n bytes / 6+2n cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must already reflect the sign of A before use<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
== Absolute Value ==<br />
''5 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A, (N Flag)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
== Absolute Value (SEC) ==<br />
''4 bytes / 4 cycles''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
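; compared to the branching version this is 1 byte smaller<br />
; it's 2 cycles faster when that branch is not taken and 1 cycle slower when it is<br />
; NOTE: with carry set this computes 0 - A for any A, so it only gives |A| when A is already known to be negative<br />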
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
== Magnitude/Extents Check ==<br />
''~7 bytes / 12 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
= Misc. Tricks =<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
== Clear Low Byte of Accumulator ==<br />
''1 byte / 2 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; "Trashes" A by copying the direct page register into it, which clears the low byte as long as the low byte of D is $00<br />
TDC<br />
</pre><br />
<br />
== Direction/Facing As Index ==<br />
''4 bytes / 6 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
== Check N Conditions True ==<br />
''n+6 bytes / 2n+6 cycles''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; You can test for multiple conditions being true (7 conditions true, at least 5 conditions, etc.) by using a counter, offsetting it so that the target count lands on the next power of 2, and testing whether that bit is set.<br />
; You can also test for "Less than N True", "More than N", etc. with variations.<br />
; This is almost more a coding technique, but it's super helpful, so worth pointing out.<br />
; It can allow you to re-arrange branches of code as independent blocks among other useful things.<br />
; You can also use any RAM instead of A at a small cost.<br />
<br />
; Example Test For 5 True Conditions:<br />
!Next_Highest_Power_of_2 = $08<br />
!N_True_Target = $05<br />
LDA #!Next_Highest_Power_of_2-!N_True_Target-1 ; here we set up our rounding, the -1 isn't strictly necessary *most* of the time<br />
%TestSomeCondition()<br />
BCC + ; here we're going to say our test just returns carry set on true (but it could directly INC inside the code as well)<br />
INC<br />
+<br />
; ... repeat the above 5 times for different tests<br />
<br />
N_True_Test:<br />
INC ; replace our -1 to bring us up to a full power of 2 if we had enough True<br />
AND #!Next_Highest_Power_of_2<br />
BEQ .false<br />
.true:<br />
; N Tests were True<br />
.false:<br />
; Not exactly N tests were true<br />
</pre><br />
<br />
== Skip Dead Code ==<br />
''1-2 bytes / 2-3 cycles''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use BRA/JMP instead<br />
; JMP is as fast as BRA on the SNES CPU, but will be slightly slower on SA-1, and 1 cycle slower on SPC. So BRA is recommended<br />
; (The extra byte used for JMP in this case doesn't matter)<br />
BRA + ; 2 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
== Check 3 Conditions ==<br />
''2 bytes / 3 cycles''
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; Just as a reminder: the N and V flags are copied from bits 7 and 6 of the *operand* of BIT, not from the result of the AND!<br />
BIT $00<br />
BMI .bit7_set<br />
BVS .bit6_set<br />
BNE .bit5_set ; Z comes from A AND the operand, so this tests bit 5 of $00 assuming #$20 is in A<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1889Bithacks2021-04-25T03:51:09Z<p>Runic Rain: small correction with big implications (wrote this as immediate mode even though I've never used this trick that way, turns out that doesn't work!)</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse square root]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(3 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+n bytes / 6+2n cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must already reflect the sign of A before use<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value''' ''(5 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A, (N Flag)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value (SEC)''' ''(4 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
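; compared to the branching version this is 1 byte smaller<br />
; it's 2 cycles faster when that branch is not taken and 1 cycle slower when it is<br />
; NOTE: with carry set this computes 0 - A for any A, so it only gives |A| when A is already known to be negative<br />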
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
'''Magnitude/Extents Check''' ''(~7 bytes / 12 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
== Misc. Tricks ==<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
'''Clear High Byte of Accumulator''' ''(1 byte / 2 cycles)''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; "Trashes" A but clears high byte<br />
TDC<br />
</pre><br />
<br />
'''Direction/Facing As Index''' ''(4 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
'''Skip Dead Code''' ''(1-2 bytes / 2-3 cycles)''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use BRA/JMP instead<br />
; JMP is as fast as BRA on the SNES CPU, but will be slightly slower on SA-1, and 1 cycle slower on SPC. So BRA is recommended<br />
; (The extra byte used for JMP in this case doesn't matter)<br />
BRA + ; 2 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
'''Check 3 Conditions''' ''(2 bytes / 3 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; Just as a reminder: the N and V flags are copied from bits 7 and 6 of the *operand* of BIT, not from the result of the AND!<br />
BIT $00<br />
BMI .bit7_set<br />
BVS .bit6_set<br />
BNE .bit5_set ; Z comes from A AND the operand, so this tests bit 5 of $00 assuming #$20 is in A<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1888Bithacks2021-04-25T03:15:12Z<p>Runic Rain: found confirmation info on BRA vs JMP</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(3 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
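A small host-side model can confirm what the two instructions compute. The sketch below is in Python rather than 65c816 assembly, assumes an 8-bit accumulator, and uses the hypothetical helper names `to_signed` and `signed_div2`; it is a sanity check, not part of the trick itself.

```python
def to_signed(b):
    """Interpret an 8-bit value (0..255) as two's complement."""
    return b - 0x100 if b & 0x80 else b


def signed_div2(a):
    """Model of CMP #$80 / ROR on an 8-bit accumulator."""
    a &= 0xFF
    carry = 1 if a >= 0x80 else 0      # CMP #$80: carry set when A >= $80 (negative)
    return (carry << 7) | (a >> 1)     # ROR: carry enters bit 7, A shifts right once
```

Under those assumptions the result matches floor division, so negative odd values round toward negative infinity (for example, -1 stays -1).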
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+n bytes / 6+2n cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must already reflect the sign of A before use<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
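The first macro can be modelled the same way. This Python sketch assumes an 8-bit accumulator, n between 1 and 7, and that the `.b` suffix keeps only the low byte of the `#$FF00>><n>` expression; `to_signed` and `signed_div_2n` are hypothetical names used only for illustration.

```python
def to_signed(b):
    """Interpret an 8-bit value (0..255) as two's complement."""
    return b - 0x100 if b & 0x80 else b


def signed_div_2n(a, n):
    """Model of LSR #<n> / BIT / BEQ / ORA for an 8-bit accumulator, 1 <= n <= 7."""
    a = (a & 0xFF) >> n                # LSR repeated n times
    if a & (0x80 >> n):                # BIT.b #$80>><n>: the old sign bit now sits at bit 7-n
        a |= (0xFF00 >> n) & 0xFF      # ORA.b #$FF00>><n>: fill the top n bits with ones
    return a
```

With those assumptions the model agrees with an arithmetic shift right by n, which is division by 2<sup>n</sup> rounded toward negative infinity.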
<br />
'''Absolute Value''' ''(5 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A, (N Flag)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value (SEC)''' ''(4 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; compared to the branching version this is 1 byte smaller<br />
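; it's either 2 cycles slower/faster depending on branch taken<br />
; NOTE: with carry set, (A EOR #$7F) - #$7F works out to -A for every 8-bit A,<br />
; so this only yields the absolute value when A is already known to be negative<br />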
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
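Both absolute value snippets are easy to check with a host-side model. The Python sketch below assumes an 8-bit accumulator, decimal mode off, and carry set on entry to the SBC; the function names are hypothetical. It is only a model of the arithmetic, not a replacement for testing on hardware or an emulator.

```python
def to_signed(b):
    """Interpret an 8-bit value (0..255) as two's complement."""
    return b - 0x100 if b & 0x80 else b


def abs_branching(a):
    """Model of BPL ?plus / EOR #$FF / INC."""
    a &= 0xFF
    if a & 0x80:                         # BPL not taken: A is negative
        a = ((a ^ 0xFF) + 1) & 0xFF      # EOR #$FF / INC: two's-complement negate
    return a


def eor_sbc_7f(a):
    """Model of EOR #$7F / SBC #$7F with carry set (no borrow)."""
    a &= 0xFF
    return ((a ^ 0x7F) - 0x7F) & 0xFF
```

In this model the branching version returns |A| for every input except $80 (-128), which has no positive 8-bit counterpart and is returned unchanged. The EOR/SBC pair returns -A for every input, so it matches |A| only when A is negative on entry.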
<br />
'''Magnitude/Extents Check''' ''(~7 bytes / 12 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
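The extents check can also be modelled on the host. This Python sketch assumes an 8-bit accumulator, carry set before the SBC, and that A minus the extent does not overflow the signed 8-bit range; `zero_side` is a hypothetical name.

```python
def zero_side(a, extent):
    """Model of SEC : SBC Extents,x / EOR Extents,x / BMI .zero_side.

    a and extent are signed 8-bit values; returns True when BMI would be taken.
    """
    diff = (a - extent) & 0xFF          # SEC : SBC Extents,x
    mixed = diff ^ (extent & 0xFF)      # EOR Extents,x: flips the sign bit when the extent is negative
    return bool(mixed & 0x80)           # BMI tests bit 7 of the result
```

With the table from the example, `zero_side(a, 0x23)` is true when a is below $23 and `zero_side(a, -0x23)` is true when a is at or above -$23, so the boundary value itself lands on different sides for the two signs.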
<br />
== Misc. Tricks ==<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
'''Clear High Byte of Accumulator''' ''(1 byte / 2 cycles)''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> A<br />
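<pre><br />
; Copies D into the full 16-bit accumulator, so with D = $0000 the high byte is cleared (the low byte of A is "trashed" to $00 as well)<br />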
TDC<br />
</pre><br />
<br />
'''Direction/Facing As Index''' ''(4 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
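A quick model shows why the three instructions reduce to "sign bit as index". The Python sketch below assumes an 8-bit accumulator holding a signed value such as a horizontal speed (that interpretation is an assumption, the original only mentions facing flags), and `facing_index` is a hypothetical name.

```python
def facing_index(a):
    """Model of ASL / ROL / AND #$01 on an 8-bit accumulator."""
    a &= 0xFF
    carry = a >> 7                       # ASL pushes bit 7 into carry
    a = (a << 1) & 0xFF                  # ASL result
    a = ((a << 1) | carry) & 0xFF        # ROL brings that carry back in at bit 0
    return a & 0x01                      # AND #$01 keeps only the old sign bit
```

So a negative input gives index 1 and a non-negative input gives index 0, which lines up with the 0=right / 1=left convention in the comment.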
<br />
'''Skip Dead Code''' ''(1-2 bytes / 2-3 cycles)''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use BRA/JMP instead<br />
; JMP is as fast as BRA on the SNES CPU, but will be slightly slower on SA-1, and 1 cycle slower on SPC. So BRA is recommended<br />
; (The extra byte used for JMP in this case doesn't matter)<br />
BRA + ; 2 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
'''Check 3 Conditions''' ''(2 bytes / 2 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; just as a reminder: the V and N flags are set by the *operand* to BIT, not the result of the AND!<br />
; BIT #imm only affects Z on the 65c816, so bits 7 and 6 have to come from a memory operand<br />
BIT $00<br />
BMI .bit7_set ; bit 7 of $00<br />
BVS .bit6_set ; bit 6 of $00<br />
BNE .bit5_set ; A AND $00 is non-zero, e.g. bit 5 of $00 when A holds #$20<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1887Bithacks2021-04-25T01:16:12Z<p>Runic Rain: Learned about a new opcode from Catador and a FF6/ChronoTrigger trick.</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(3 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+n bytes / 6+2n cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must already reflect the sign of A before use<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value''' ''(5 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A, (N Flag)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value (SEC)''' ''(4 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; compared to the branching version this is 1 byte smaller<br />
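; it's either 2 cycles slower/faster depending on branch taken<br />
; NOTE: with carry set, (A EOR #$7F) - #$7F works out to -A for every 8-bit A,<br />
; so this only yields the absolute value when A is already known to be negative<br />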
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
'''Magnitude/Extents Check''' ''(~7 bytes / 12 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
== Misc. Tricks ==<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
'''Clear High Byte of Accumulator''' ''(1 byte / 2 cycles)''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> A<br />
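<pre><br />
; Copies D into the full 16-bit accumulator, so with D = $0000 the high byte is cleared (the low byte of A is "trashed" to $00 as well)<br />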
TDC<br />
</pre><br />
<br />
'''Direction/Facing As Index''' ''(4 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
'''Skip Dead Code''' ''(1-2 bytes / 2-3 cycles)''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use JMP instead<br />
; (BRA is just as fast but has less range; its advantage is mostly in reducing code size, but that's not an issue at this point)<br />
JMP + ; 3 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
'''Check 3 Conditions''' ''(2 bytes / 2 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; just as a reminder: the V and N flags are set by the *operand* to BIT, not the result of the AND!<br />
; BIT #imm only affects Z on the 65c816, so bits 7 and 6 have to come from a memory operand<br />
BIT $00<br />
BMI .bit7_set ; bit 7 of $00<br />
BVS .bit6_set ; bit 6 of $00<br />
BNE .bit5_set ; A AND $00 is non-zero, e.g. bit 5 of $00 when A holds #$20<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1842Bithacks2021-04-12T13:03:06Z<p>Runic Rain: /* Math Bithacks */</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(3 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+n bytes / 6+2n cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must already reflect the sign of A before use<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value''' ''(5 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A, (N Flag)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value (SEC)''' ''(4 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; compared to the branching version this is 1 byte smaller<br />
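; it's either 2 cycles slower/faster depending on branch taken<br />
; NOTE: with carry set, (A EOR #$7F) - #$7F works out to -A for every 8-bit A,<br />
; so this only yields the absolute value when A is already known to be negative<br />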
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
'''Magnitude/Extents Check''' ''(~7 bytes / 12 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
== Misc. Tricks ==<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
'''Direction/Facing As Index''' ''(4 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
'''Skip Dead Code''' ''(1-2 bytes / 2-3 cycles)''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use JMP instead<br />
; (BRA is just as fast but has less range; its advantage is mostly in reducing code size, but that's not an issue at this point)<br />
JMP + ; 3 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
'''Check 3 Conditions''' ''(2 bytes / 2 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; just as a reminder: the V and N flags are set by the *operand* to BIT, not the result of the AND!<br />
; BIT #imm only affects Z on the 65c816, so bits 7 and 6 have to come from a memory operand<br />
BIT $00<br />
BMI .bit7_set ; bit 7 of $00<br />
BVS .bit6_set ; bit 6 of $00<br />
BNE .bit5_set ; A AND $00 is non-zero, e.g. bit 5 of $00 when A holds #$20<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1841Bithacks2021-04-12T13:02:32Z<p>Runic Rain: /* Math Bithacks */</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(3 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+n bytes / 6+2n cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must already reflect the sign of A before use<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value''' ''(5 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value (SEC)''' ''(4 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; compared to the branching version this is 1 byte smaller<br />
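; it's either 2 cycles slower/faster depending on branch taken<br />
; NOTE: with carry set, (A EOR #$7F) - #$7F works out to -A for every 8-bit A,<br />
; so this only yields the absolute value when A is already known to be negative<br />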
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
'''Magnitude/Extents Check''' ''(~7 bytes / 12 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
== Misc. Tricks ==<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
'''Direction/Facing As Index''' ''(4 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
'''Skip Dead Code''' ''(1-2 bytes / 2-3 cycles)''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use JMP instead<br />
; (BRA is just as fast but has less range; its advantage is mostly in reducing code size, but that's not an issue at this point)<br />
JMP + ; 3 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
'''Check 3 Conditions''' ''(2 bytes / 2 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; just as a reminder: the V and N flags are set by the *operand* to BIT, not the result of the AND!<br />
; BIT #imm only affects Z on the 65c816, so bits 7 and 6 have to come from a memory operand<br />
BIT $00<br />
BMI .bit7_set ; bit 7 of $00<br />
BVS .bit6_set ; bit 6 of $00<br />
BNE .bit5_set ; A AND $00 is non-zero, e.g. bit 5 of $00 when A holds #$20<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1839Bithacks2021-04-12T12:57:07Z<p>Runic Rain: /* Math Bithacks */</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(3 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+n bytes / 6+2n cycles)''<br />
<br><br />
<u>inputs:</u> A, n<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ ?positive<br />
ORA.b #$FF00>><n> ; sign extension<br />
?positive:<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must already reflect the sign of A before use<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value''' ''(5 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A, n<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value (SEC)''' ''(4 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; compared to the branching version this is 1 byte smaller<br />
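; it's either 2 cycles slower/faster depending on branch taken<br />
; NOTE: with carry set, (A EOR #$7F) - #$7F works out to -A for every 8-bit A,<br />
; so this only yields the absolute value when A is already known to be negative<br />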
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
'''Magnitude/Extents Check''' ''(~7 bytes / 12 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
== Misc. Tricks ==<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
'''Direction/Facing As Index''' ''(4 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
'''Skip Dead Code''' ''(1-2 bytes / 2-3 cycles)''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use JMP instead<br />
; (BRA is just as fast but has less range; its advantage is mostly in reducing code size, but that's not an issue at this point)<br />
JMP + ; 3 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
'''Check 3 Conditions''' ''(2 bytes / 2 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; just as a reminder: the V and N flags are set by the *operand* to BIT, not the result of the AND!<br />
; BIT #imm only affects Z on the 65c816, so bits 7 and 6 have to come from a memory operand<br />
BIT $00<br />
BMI .bit7_set ; bit 7 of $00<br />
BVS .bit6_set ; bit 6 of $00<br />
BNE .bit5_set ; A AND $00 is non-zero, e.g. bit 5 of $00 when A holds #$20<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1838Bithacks2021-04-12T12:56:05Z<p>Runic Rain: /* Math Bithacks */</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(3 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
CMP #$80<br />
ROR<br />
</pre><br />
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+n bytes / 6+2n cycles)''<br />
<br><br />
<u>inputs:</u> A, n<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
LSR #<n><br />
BIT.b #$80>><n><br />
BEQ +<br />
ORA.b #$FF00>><n> ; sign extension<br />
+<br />
endmacro<br />
<br />
; -1 cycle and +n bytes, but the N flag must already reflect the sign of A before use<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA.b #$FF00>><n> ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value''' ''(5 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A, n<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value (SEC)''' ''(4 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; compared to the branching version this is 1 byte smaller<br />
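; it's either 2 cycles slower/faster depending on branch taken<br />
; NOTE: with carry set, (A EOR #$7F) - #$7F works out to -A for every 8-bit A,<br />
; so this only yields the absolute value when A is already known to be negative<br />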
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
'''Magnitude/Extents Check''' ''(~7 bytes / 12 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
== Misc. Tricks ==<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
'''Direction/Facing As Index''' ''(4 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
'''Skip Dead Code''' ''(1-2 bytes / 2-3 cycles)''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use JMP instead<br />
; (BRA is just as fast but has less range; its advantage is mostly in reducing code size, but that's not an issue at this point)<br />
JMP + ; 3 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
'''Check 3 Conditions''' ''(2 bytes / 2 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), using any operand that's not immediate (#)<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; just as a reminder: the V and N flags are set by the *operand* to BIT, not the result of the AND!<br />
; BIT #imm only affects Z on the 65c816, so bits 7 and 6 have to come from a memory operand<br />
BIT $00<br />
BMI .bit7_set ; bit 7 of $00<br />
BVS .bit6_set ; bit 6 of $00<br />
BNE .bit5_set ; A AND $00 is non-zero, e.g. bit 5 of $00 when A holds #$20<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1836Bithacks2021-04-12T12:26:48Z<p>Runic Rain: /* Misc. Tricks */</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(5 bytes / 10 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
STA $00<br />
ASL $00<br />
ROR<br />
</pre><br />
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+2n bytes / 5+2n cycles)''<br />
<br><br />
<u>inputs:</u> A, n<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA #($FF<<(8-<n>)) ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value''' ''(5 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value (SEC)''' ''(4 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; compared to the branching version this is 1 byte smaller<br />
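; it's either 2 cycles slower/faster depending on branch taken<br />
; NOTE: with carry set, (A EOR #$7F) - #$7F works out to -A for every 8-bit A,<br />
; so this only yields the absolute value when A is already known to be negative<br />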
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
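<br />
Tracing the two instructions by hand (8-bit A, carry set on entry) for one negative and one positive input:<br />
<pre><br />
; A = $FD (-3)<br />
; EOR #$7F:  $FD -> $82<br />
; SBC #$7F:  $82 - $7F = $03        ; 3<br />
<br />
; A = $03 (3)<br />
; EOR #$7F:  $03 -> $7C<br />
; SBC #$7F:  $7C - $7F = $FD        ; -3<br />
</pre><br />
In this trace the positive input comes out negated, so the result matches the absolute value only for negative inputs.<br />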
<br />
'''Magnitude/Extents Check''' ''(~7 bytes / 12 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: if [A] is exactly the extent, the subtraction gives 0, so [A] holds the extent value itself at the branch<br />
; the operand doesn't need to be indexed, but the check is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
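<br />
A hand trace with the table above, assuming 8-bit A and X selecting the extent (X=0 gives -$23 = $DD, X=1 gives $23):<br />
<pre><br />
; A = $10, X = 1:  $10 - $23 = $ED,  $ED EOR $23 = $CE  -> N set, zero side<br />
; A = $30, X = 1:  $30 - $23 = $0D,  $0D EOR $23 = $2E  -> N clear, far side<br />
; A = $F0, X = 0:  $F0 - $DD = $13,  $13 EOR $DD = $CE  -> N set, zero side<br />
; A = $D0, X = 0:  $D0 - $DD = $F3,  $F3 EOR $DD = $2E  -> N clear, far side<br />
</pre><br />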
<br />
== Misc. Tricks ==<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
'''Direction/Facing As Index''' ''(4 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
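<br />
One way this might be used is to pick a per-direction value from a two-entry table. The names !Speed and Offsets below are hypothetical, and 8-bit A and X are assumed:<br />
<pre><br />
!Speed = $00        ; hypothetical address of a signed speed byte<br />
<br />
LDA !Speed<br />
ASL                 ; sign bit (bit 7) -> carry<br />
ROL                 ; carry -> bit 0<br />
AND #$01            ; A = 0 when moving right (positive), 1 when moving left (negative)<br />
TAX<br />
LDA Offsets,x       ; fetch the value for that direction<br />
RTS<br />
<br />
Offsets:<br />
db $08, $F8         ; right, left<br />
</pre><br />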
<br />
'''Skip Dead Code''' ''(1-3 bytes / 2-3 cycles)''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use JMP instead<br />
; (BRA is just as fast but has less range; its advantage is mostly in reducing code size, but that's not an issue at this point)<br />
JMP + ; 3 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
'''Check 3 Conditions''' ''(2 bytes / 3 cycles)''<br />
<br><br />
<u>inputs:</u> A (bit mask), $00 (value to test)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), you can use other masks in [A] instead of #$20<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; just as a reminder: the V and N flags are copied from bits 6 and 7 of the *memory operand* to BIT, not from the result of the AND!<br />
; NOTE: BIT with an immediate operand (e.g. BIT #$20) only changes Z, so a memory operand is required for the N and V tests<br />
; Z still comes from [A] AND operand, so with [A] = #$20 the BNE checks bit 5<br />
BIT $00<br />
BMI .bit7_set<br />
BVS .bit6_set<br />
BNE .bit5_set<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1835Bithacks2021-04-12T12:24:10Z<p>Runic Rain: /* Misc. Tricks */</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(5 bytes / 10 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
STA $00<br />
ASL $00<br />
ROR<br />
</pre><br />
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+2n bytes / 5+2n cycles)''<br />
<br><br />
<u>inputs:</u> A, n<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA #($FF<<(8-<n>)) ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value''' ''(5 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value (SEC)''' ''(4 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
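; compared to the branching version this is 1 byte smaller<br />
; it's 2 cycles faster when the branch is not taken and 1 cycle slower when it is<br />
; NOTE: this negates [A] for any input, so it only yields the absolute value when [A] is known to be negative<br />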
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
'''Magnitude/Extents Check''' ''(~7 bytes / 12 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
== Misc. Tricks ==<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
'''Direction/Facing As Index''' ''(4 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
'''Skip Dead Code''' ''(1-3 bytes / 2-3 cycles)''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use JMP instead<br />
; (BRA is just as fast but has less range; its advantage is mostly in reducing code size, but that's not an issue at this point)<br />
JMP + ; 3 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
'''Check 3 Conditions''' ''(2 bytes / 3 cycles)''<br />
<br><br />
<u>inputs:</u> A (bit mask), $00 (value to test)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), you can use other masks in [A] instead of #$20<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; just as a reminder: the V and N flags are copied from bits 6 and 7 of the *memory operand* to BIT, not from the result of the AND!<br />
; NOTE: BIT with an immediate operand (e.g. BIT #$20) only changes Z, so a memory operand is required for the N and V tests<br />
; Z still comes from [A] AND operand, so with [A] = #$20 the BNE checks bit 5<br />
BIT $00<br />
BMI .bit7_set<br />
BVS .bit6_set<br />
BNE .bit5_set<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1834Bithacks2021-04-12T12:23:20Z<p>Runic Rain: /* Misc. Tricks */</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(5 bytes / 10 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
STA $00<br />
ASL $00<br />
ROR<br />
</pre><br />
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+2n bytes / 5+2n cycles)''<br />
<br><br />
<u>inputs:</u> A, n<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA #($FF<<(8-<n>)) ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value''' ''(5 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value (SEC)''' ''(4 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
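; compared to the branching version this is 1 byte smaller<br />
; it's 2 cycles faster when the branch is not taken and 1 cycle slower when it is<br />
; NOTE: this negates [A] for any input, so it only yields the absolute value when [A] is known to be negative<br />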
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
'''Magnitude/Extents Check''' ''(~7 bytes / 12 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
== Misc. Tricks ==<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
'''Direction/Facing As Index''' ''(4 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre><br />
<br />
'''Skip Dead Code''' ''(1-3 bytes / 2-3 cycles)''<br />
<br><br />
<u>inputs:</u> (none)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; If you need to skip one byte of dead code (due to a hijack or whatever reason) you can use:<br />
NOP ; 1 byte, 2 cycles<br />
<br />
; But if you need to skip just 2 bytes the most efficient is:<br />
; NOTE: many times WDM is used as a breakpoint for debugging so only do this as a final pass to speed up your code!<br />
WDM ; 2 bytes, 2 cycles<br />
<br />
; Finally, if you need to skip a large amount of dead code you can use JMP instead<br />
; (BRA is just as fast but has less range; its advantage is mostly in reducing code size, but that's not an issue at this point)<br />
JMP + ; 3 bytes, 3 cycles<br />
; dead code<br />
+<br />
</pre><br />
<br />
'''Check 3 Conditions''' ''(2 bytes / 3 cycles)''<br />
<br><br />
<u>inputs:</u> A (bit mask), $00 (value to test)<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; just the opcode as normal here (not counting the conditions), you can use other masks in [A] instead of #$20<br />
; it's worth noting that you can do up to 3 tests with a single opcode though!<br />
; just as a reminder: the V and N flags are copied from bits 6 and 7 of the *memory operand* to BIT, not from the result of the AND!<br />
; NOTE: BIT with an immediate operand (e.g. BIT #$20) only changes Z, so a memory operand is required for the N and V tests<br />
; Z still comes from [A] AND operand, so with [A] = #$20 the BNE checks bit 5<br />
BIT $00<br />
BMI .bit7_set<br />
BVS .bit6_set<br />
BNE .bit5_set<br />
.bit7_set:<br />
.bit6_set:<br />
.bit5_set:<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1833Bithacks2021-04-12T10:44:38Z<p>Runic Rain: /* Misc. Tricks */</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(5 bytes / 10 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
STA $00<br />
ASL $00<br />
ROR<br />
</pre><br />
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+2n bytes / 5+2n cycles)''<br />
<br><br />
<u>inputs:</u> A, n<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA #($FF<<(8-<n>)) ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value''' ''(5 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value (SEC)''' ''(4 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
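; compared to the branching version this is 1 byte smaller<br />
; it's 2 cycles faster when the branch is not taken and 1 cycle slower when it is<br />
; NOTE: this negates [A] for any input, so it only yields the absolute value when [A] is known to be negative<br />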
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
'''Magnitude/Extents Check''' ''(~7 bytes / 12 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
== Misc. Tricks ==<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
'''Direction/Facing As Index''' ''(4 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL<br />
ROL<br />
AND #$01<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1832Bithacks2021-04-12T10:43:54Z<p>Runic Rain: /* Math Bithacks */</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(5 bytes / 10 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
STA $00<br />
ASL $00<br />
ROR<br />
</pre><br />
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+2n bytes / 5+2n cycles)''<br />
<br><br />
<u>inputs:</u> A, n<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA #($FF<<(8-<n>)) ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value''' ''(5 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
macro abs()<br />
BPL ?plus<br />
EOR #$FF<br />
INC<br />
?plus: ; only 3 cycles if branch taken<br />
endmacro<br />
</pre><br />
<br />
'''Absolute Value (SEC)''' ''(4 bytes / 4 cycles)''<br />
<br><br />
<u>inputs:</u> A, (Carry Set)<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
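; compared to the branching version this is 1 byte smaller<br />
; it's 2 cycles faster when the branch is not taken and 1 cycle slower when it is<br />
; NOTE: this negates [A] for any input, so it only yields the absolute value when [A] is known to be negative<br />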
EOR #$7F<br />
; SEC ; the instant you add this in it becomes worse than the branching version<br />
SBC #$7F<br />
</pre><br />
<br />
'''Magnitude/Extents Check''' ''(~7 bytes / 12 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> (none)<br />
<pre><br />
; asks "Is [A] on the zero-side of value [X] or the far side?"<br />
; good for magnitude checks, smaller *AND* faster than alternatives<br />
; NOTE: in the event that it is exactly [X] it will have that value at branch<br />
; doesn't need to be an indexed CMP but is most useful this way<br />
; this can be used to combine the BPL and BMI checks for both signs into one<br />
SEC : SBC Extents,x<br />
EOR Extents,x<br />
BMI .zero_side<br />
.far_side:<br />
; do things<br />
.zero_side:<br />
; do things<br />
<br />
Extents:<br />
db -$23, $23<br />
</pre><br />
<br />
== Misc. Tricks ==<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
'''Direction As Index''' ''(4 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL : ROL : AND #$01 ; get facing as index<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1831Bithacks2021-04-12T10:22:49Z<p>Runic Rain: /* Misc. Tricks */</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(5 bytes / 10 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
STA $00<br />
ASL $00<br />
ROR<br />
</pre><br />
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+2n bytes / 5+2n cycles)''<br />
<br><br />
<u>inputs:</u> A, n<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA #($FF<<(8-<n>)) ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
== Misc. Tricks ==<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small><br />
<br />
'''Direction As Index''' ''(4 bytes / 6 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; Ever wonder why facing flags are 0=right and 1=left? This is why. It's incredibly cheap.<br />
ASL : ROL : AND #$01 ; get facing as index<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1800Bithacks2021-04-06T08:00:10Z<p>Runic Rain: </p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br><br />
'''Note: cycle counts are intended to be a worst case measure.'''<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(5 bytes / 10 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
STA $00<br />
ASL $00<br />
ROR<br />
</pre><br />
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+2n bytes / 5+2n cycles)''<br />
<br><br />
<u>inputs:</u> A, n<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA #($FF<<(8-<n>)) ; sign extension<br />
?end:<br />
endmacro<br />
</pre><br />
<br />
== Misc. Tricks ==<br />
<small>As this list grows tricks here will be consolidated into their own sections. Clever optimization tricks that aren't necessarily what someone might personally call a "bithack" are okay here as well!</small></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1799Bithacks2021-04-06T07:48:25Z<p>Runic Rain: /* Math Bithacks */</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(5 bytes / 10 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
STA $00<br />
ASL $00<br />
ROR<br />
</pre><br />
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+2n bytes / 5+2n cycles)''<br />
<br><br />
<u>inputs:</u> A, n<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA #($FF<<(8-<n>)) ; sign extension<br />
?end:<br />
endmacro<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1798Bithacks2021-04-06T07:47:36Z<p>Runic Rain: /* Math Bithacks */</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(5 bytes / 10 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
STA $00<br />
ASL $00<br />
ROR<br />
</pre><br />
<br />
'''Signed Division By 2<sup>n</sup>''' ''(6+2n bytes / 5+2n cycles)''<br />
<br><br />
<u>inputs:</u> A, n<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
; signed division by two, n times<br />
macro SignedDiv_2N(n)<br />
BMI ?negative<br />
LSR #<n><br />
BRA ?end<br />
?negative:<br />
LSR #<n><br />
ORA #($FF<<(8-<n>)) ; sign extension<br />
?end:<br />
endmacro<br />
</pre></div>Runic Rainhttps://sneslab.net/mw/index.php?title=Bithacks&diff=1797Bithacks2021-04-06T07:24:44Z<p>Runic Rain: A community page containing bithacks for the 65c816</p>
<hr />
<div>Bithacks are optimization tricks that utilize information in bits and [https://en.wikipedia.org/wiki/Bit_manipulation bit manipulation]<br />
to accomplish their tasks. They usually work in a slightly non-obvious way (the most famous being the [https://en.wikipedia.org/wiki/Fast_inverse_square_root fast inverse sqrt]), and bit manipulation in general is harder on the 65c816. To that end, here is a collection of some useful tricks.<br />
<br />
== Math Bithacks ==<br />
'''Signed Division By 2''' ''(5 bytes / 10 cycles)''<br />
<br><br />
<u>inputs:</u> A<br />
<br><br />
<u>outputs:</u> A<br />
<pre><br />
STA $00<br />
ASL $00<br />
ROR<br />
</pre></div>Runic Rain