This is the documentation for the 6502, 65C02, 65CE02, 45GS02 and Z80 assembler jAsm. It was written in 2015 by me, Jonas Hultén, because I used DAsm before and over the years it started to feel outdated. I missed namespaces and I was annoyed by some bugs, like its inability to use local variables as macro arguments. It felt a bit slow on my slow laptop computer as well, although that wasn't a strong argument. Also, I hadn't written a language before and it felt like an interesting challenge.
jAsm was written with two main goals. It should be fast enough to assemble a large program in under a second and it should support everything DAsm can do and more. To reach the required speed it tries to use memory as linearly as possible and it proved to be faster than DAsm in the end.
The assembler was made for Commodore 64 programming and some features were specifically made to help programming for that computer. However, it shouldn't stop anyone from making programs for other computers with it.
jAsm looks a lot like C. It wasn't meant to do that but over the course of development it moved closer and closer because it was easier to solve parsing problems that way.
It took 7 months to complete this first version of the assembler. It is still a bit rough around the edges but has some power under the hood. Let's start!
This documentation covers the language and syntax provided by the assembler but not any details about specific supported processors. It was written when only 6502 was supported so the document is heavily geared towards that processor.
jAsm supports all regular instructions of the 6502. Instructions are written in lower case.
lda #0
sta $d020
The brk instruction takes an optional immediate argument, since rti actually returns to the instruction after that argument (this goes for 65C02, 65CE02 and 45GS02 as well).
brk // valid but rti won't return directly after this instruction
brk #0 // optional argument makes rti return to next instruction
Due to the large amount of source code with upper case instruction keywords, a python script is provided to convert upper case keywords in all .asm files in a directory. Run that like this.
python3 tools/convert_6502_keyword_case.py <my_source_directory>
jAsm supports all regular instructions of the Western Design Center 65C02. Instructions are written in lower case.
stz $d020
bra loop
The bit operation instructions don't have the bit in the instruction name as some assemblers do. Instead it is a separate argument. To follow convention, there is no '#' before the bit number to indicate immediate mode, even if that would be more consistent.
bbr 0, zp, label
bbs 1, zp, label
rmb 2, zp
smb 3, zp
jAsm supports all regular instructions of the Commodore Semiconductor Group 65CE02. Instructions are written in lower case.
ldz $d020
bru loop
Just like 65C02, the bit operation instructions don't have the bit in the instruction name as some assemblers do. Instead it is a separate argument. To follow convention, there is no '#' before the bit number to indicate immediate mode, even if that would be more consistent.
bbr 0, zp, label
bbs 1, zp, label
rmb 2, zp
smb 3, zp
The aug instruction isn't available since that's intended to extend the processor with more instructions in the future.
The stack pointer relative access addressing mode is written like this.
lda ($55,sp),y
jAsm has experimental support for the new Mega65 instructions of the 45GS02, along with the instructions of CSG4510.
Instructions are written in lower case.
ldz $d020
bru loop
Just like 65CE02, the bit operation instructions don't have the bit in the instruction name as some assemblers do. Instead it is a separate argument. To follow convention, there is no '#' before the bit number to indicate immediate mode, even if that would be more consistent.
bbr 0, zp, label
bbs 1, zp, label
rmb 2, zp
smb 3, zp
The stack pointer relative access addressing mode is written like this.
lda ($55,sp),y
The indirect quad addressing mode is written using brackets.
lda [$55],z
jAsm supports all regular instructions of the Z80. Instructions are written in lower case.
ld a, 0
ld (hl), a
There's also a script to convert Z80 uppercase keywords to lowercase. Run that like this.
python3 tools/convert_z80_keyword_case.py <my_source_directory>
We'll start by creating a small program in a text file.
processor "6502"
section code, "main", $8000
{
inc $d020
rts
}
Save this to a file named main.jasm. Use utf-8 format, because this is what jAsm expects. 7-bit ASCII is also ok since that is compatible with the utf-8 format. Now we'll assemble it into a binary. Open a command line window and change the current directory to where the main.jasm file is. Type this on the command line.
jasm -hla main.jasm main.prg
Now you have a program that changes the border color on a Commodore 64. Load it into an emulator or onto a real machine.
LOAD"MAIN.PRG",8,1
Now start it.
SYS32768
The border color changes.
If you want to start it on a Commodore 64 with a BASIC line, you need to add the necessary data to produce a SYS line at the BASIC start. This is specific to the Commodore BASIC v2. This example shows how to do that in jAsm.
processor "6502"
section code, "main", $0801
{
define word = .next_basic_line // next BASIC line
define word = 2016 // line number
define byte = $9e // SYS token
define byte[] = { string(.start) }
define byte = 0 // end of line
.next_basic_line:
define word = 0 // zero next BASIC line to mark end of program
.start:
inc $d020
rts
}
Anything written after // is a comment and is completely ignored by the assembler.
.next_basic_line and .start are labels that represent the addresses in memory where they are placed. The dot before the name means the label is local to the space between the closest surrounding curly braces. define places variable data into the program. A word is two bytes long. The SYS token is written in hexadecimal form, which is what the dollar sign indicates. string(.start) means "call the built-in function string with the argument .start". The function returns a string representation of .start.
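To see which bytes these define statements end up producing, here is a minimal Python sketch that builds the same BASIC line outside the assembler. The function name basic_sys_line_bytes is made up for this illustration, and the sketch assumes that string() yields the decimal digits of the address, which is what the SYS command expects.

```python
def basic_sys_line_bytes(load_address: int, line_number: int) -> tuple[bytes, int]:
    """Return the BASIC stub bytes and the address of the code that follows it."""
    # The start address depends on how many digits it has, so try each length.
    for digits in range(1, 6):
        # link word + line number word + SYS token + digits + end of line + end word
        start = load_address + 2 + 2 + 1 + digits + 1 + 2
        if len(str(start)) == digits:
            break
    else:
        raise ValueError("no consistent start address found")

    next_basic_line = start - 2  # address of the final zero word
    stub = (
        next_basic_line.to_bytes(2, "little")   # pointer to next BASIC line
        + line_number.to_bytes(2, "little")     # line number
        + bytes([0x9E])                         # SYS token
        + str(start).encode("ascii")            # address as decimal digits
        + bytes([0])                            # end of line
        + (0).to_bytes(2, "little")             # zero link marks end of program
    )
    return stub, start


stub, start = basic_sys_line_bytes(0x0801, 2016)
print(stub.hex(" "), hex(start))
```

For load address $0801 and line number 2016 this gives a start address of $080d, so the stub contains the digits 2061 after the SYS token.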
This BASIC line thing will be used a lot in programs since almost all programs loaded from disk will need it. Let's break out this code into a handy macro that we can reuse. The macro will need two arguments, one is the line number and one is the address to start the program from.
processor "6502"
macro basic_sys_line(.line_number, .sys_address)
{
define word = .next_basic_line // next BASIC line
define word = .line_number
define byte = $9e // SYS token
define byte[] = { string(.sys_address) }
define byte = 0 // end of line
.next_basic_line:
define word = 0 // zero next BASIC line to mark end of program
}
section code, "main", $0801
{
basic_sys_line(2016, .start)
.start:
inc $d020
rts
}
The start of the main section invokes the macro and this inserts the code in the macro at the place of invocation.
The main section of our example looks a lot cleaner now. We can now move the macro to its own file. We can build a small library of handy macros to help us be productive and avoid solving the same problem several times.
Move the macro code into a file called macros.jasm and place it where main.jasm lies. We can now include the macros in main.jasm.
processor "6502"
include "macros.jasm"
section code, "main", $0801
{
basic_sys_line(2016, .start)
.start:
inc $d020
rts
}
The border color address isn't exactly self-explanatory. The BASIC start address is also a naked constant whose meaning isn't obvious. Let's make this a bit better.
processor "6502"
include "macros.jasm"
const BASIC_START = $0801
const BORDER_COLOR = $d020
section code, "main", BASIC_START
{
basic_sys_line(2016, .start)
.start:
inc BORDER_COLOR
rts
}
I use uppercase characters for fixed address constants (basically any naked constant) to make it easy to identify them. BASIC_START and BORDER_COLOR can now be used instead of the naked constants. Let's move the constants out into their own file as well. Call this c64.jasm since it holds constants specific to Commodore 64. We'll include this as well in the program.
processor "6502"
include "macros.jasm"
include "c64.jasm"
section code, "main", BASIC_START
{
basic_sys_line(2016, .start)
.start:
inc BORDER_COLOR
rts
}
Now, what if we wanted to port this to VIC20? We would only need to create a vic20.jasm file with different BASIC_START and BORDER_COLOR addresses and then include that instead of the c64.jasm file. We can also support both at the same time. Let's put this in the vic20.jasm file.
const BASIC_START = $1001
const BORDER_COLOR = $900f // this address controls both background and border colors
Now, what we need is a way to include either the c64.jasm or vic20.jasm file based on an option somewhere. Let's add the selection first.
processor "6502"
include "macros.jasm"
if (C64_BUILD) {
include "c64.jasm"
} else {
include "vic20.jasm"
}
section code, "main", BASIC_START
{
basic_sys_line(2016, .start)
.start:
inc BORDER_COLOR
rts
}
The if statement wants a boolean expression within the parentheses. If it is true the first block of code is used, otherwise the second block is used. We can feed constants from the command line to solve this. The command line option is -d and it needs to be followed by an assignment. In this case we want to assign C64_BUILD to true or false.
jasm -d C64_BUILD=true main.jasm main.prg
jasm -d C64_BUILD=false main.jasm main.prg
Let's try a hello world example. We'll drop the VIC20 support to make the code shorter. We will define the string "hello world!" and print it, character by character. We have already seen how to define a string in memory in the BASIC line. Printing is done with a jump to $ffd2, which prints a single character. Let's add the following naked constant to the c64.jasm file.
const CHROUT = $ffd2
Now we'll add the loop to print the text.
processor "6502"
include "macros.jasm"
include "c64.jasm"
section code, "main", BASIC_START
{
basic_sys_line(2016, .start)
.start:
ldx #0
.loop:
lda hello_world_text,x
jsr CHROUT
inx
cpx #sizeof(hello_world_text)
bne .loop
rts
define byte[] hello_world_text = { "HELLO WORLD!" }
}
The define now has a name before the equal sign. This becomes a special kind of label. It can be used as a normal label but it also contains information about the defined data. sizeof is a function that returns the size in bytes of such a labeled object or array.
This works but is hard to read. It isn't obvious where the loop starts and ends unless we read the instructions. Let's improve it using indentation.
processor "6502"
include "macros.jasm"
include "c64.jasm"
section code, "main", BASIC_START
{
basic_sys_line(2016, .start)
.start:
ldx #0
.loop:
    lda hello_world_text,x
    jsr CHROUT
    inx
    cpx #sizeof(hello_world_text)
    bne .loop
rts
define byte[] hello_world_text = { "HELLO WORLD!" }
}
This is better but can be improved further. jAsm supports an automatic @loop label at the beginning of a scope defined by curly braces.
processor "6502"
include "macros.jasm"
include "c64.jasm"
section code, "main", BASIC_START
{
basic_sys_line(2016, .start)
.start:
ldx #0
{
lda hello_world_text,x
jsr CHROUT
inx
cpx #sizeof(hello_world_text)
bne @loop
}
rts
define byte[] hello_world_text = { "HELLO WORLD!" }
}
It's now much easier to read the loop and we got rid of the explicitly defined label .loop.
If we want to print more text we need to move the loop into a subroutine which can be called with a jsr instruction and some parameters in registers.
processor "6502"
include "macros.jasm"
include "c64.jasm"
section code, "main", BASIC_START
{
basic_sys_line(2016, .start)
.start:
ldx #>hello_world_text // high byte from address
lda #<hello_world_text // low byte from address
ldy #sizeof(hello_world_text)
jsr print_text
rts
define byte[] hello_world_text = { "HELLO WORLD!" }
// -> xa: address to text
// -> y: size of text
subroutine print_text
{
// self modifying code
sta .addr
stx .addr + 1
sty .size
ldx #0
{
const .addr = * + 1
lda $ffff,x // just a dummy address, it will be overwritten
jsr CHROUT
inx
const .size = * + 1
cpx #0 // just a dummy value, it will be overwritten
bne @loop
}
rts
}
}
* in the subroutine represents the current program counter. * + 1 points one byte into the next instruction, which is where the instruction argument is. All is well, except that it doesn't assemble!
main.jasm(25,7) : Error 3004 : Reference to undefined symbol .addr
main.jasm(26,7) : Error 3004 : Reference to undefined symbol .addr
main.jasm(26,13) : Error 3000 : Operator + is not defined for left hand side unknown type.
main.jasm(27,7) : Error 3004 : Reference to undefined symbol .size
There is something wrong with .addr and .size. The reason is that local constants are not accessible outside the scope they are defined in. Local constants are always accessible inside the scope they are defined in, even in inner scopes. The scope is defined by the closest enclosing curly braces. So .addr and .size are accessible inside the loop but not outside.
To solve this we can declare the symbol names in the subroutine scope but define the constants inside the loop. This is the working subroutine.
processor "6502"
// -> xa: address to text
// -> y: size of text
subroutine print_text
{
// declaring constants
declare .addr
declare .size
// self modifying code
sta .addr
stx .addr + 1
sty .size
ldx #0
{
const .addr = * + 1
lda $ffff,x // just a dummy address, it will be overwritten
jsr CHROUT
inx
const .size = * + 1
cpx #0 // just a dummy value, it will be overwritten
bne @loop
}
rts
}
There is a more intuitive way to declare the .addr and .size addresses. Instruction data labels can point directly to the instruction argument by placing a label definition between the instruction and the argument.
processor "6502"
// -> xa: address to text
// -> y: size of text
subroutine print_text
{
// declaring constants
declare .addr
declare .size
// self modifying code
sta .addr
stx .addr + 1
sty .size
ldx #0
{
lda .addr: $ffff,x // just a dummy address, it will be overwritten
jsr CHROUT
inx
cpx .size: #0 // just a dummy value, it will be overwritten
bne @loop
}
rts
}
This subroutine can be reused so let's move it to its own file. Name a new file screen_io.jasm and paste the subroutine into it. Now we'll modify the main file to include this new file. Note that we now must include the file inside the section because otherwise generated code or data would lie outside any section and that isn't allowed. Only code sections can contain code or data. The other include files only contain constant definitions and macros and they don't directly produce any code or data themselves. That's why they can be outside a section.
processor "6502"
include "macros.jasm"
include "c64.jasm"
section code, "main", BASIC_START
{
basic_sys_line(2016, .start)
.start:
ldx #>hello_world_text
lda #<hello_world_text
ldy #sizeof(hello_world_text)
jsr print_text
rts
define byte[] hello_world_text = { "HELLO WORLD!" }
include "screen_io.jasm"
}
Self modifying code is handy and can improve efficiency but it doesn't work if the code is in a cartridge ROM, because it can't be modified. Let's try modifying the code to use the zero page instead. To do this we need to reserve some space for variables in the zero page area. This is done with a bss section. BSS stands for "Block Started by Symbol" and means a static memory block that is part of the program, but without its content stored in the executable file. The bss section doesn't generate any code or data, it just reserves uninitialized space. I reserved the last 5 bytes in the zero page area from $fb to, but not including, $100.
processor "6502"
include "macros.jasm"
include "c64.jasm"
section bss, "zero page", $fb, $100
{
reserve word addr
}
section code, "main", BASIC_START
{
basic_sys_line(2016, .start)
.start:
ldx #>hello_world_text
lda #<hello_world_text
ldy #sizeof(hello_world_text)
jsr print_text
rts
define byte[] hello_world_text = { "HELLO WORLD!" }
include "screen_io.jasm"
}
The reserve statement can reserve one type or an array of them, just like the define statement. The difference is that a reserve statement can't put actual values into anything. Also, you must specify array sizes with a number between the brackets.
The addr constant has no leading dot. This means that it is a global constant. It is accessible from anywhere in the program. Making it global is necessary since it doesn't exist in the same scope as the code that uses it.
Note that the bss section header has an extra value added after the start address. This is the end of the section. If the section grows beyond this value, an error is generated. This is an effective way to keep the variables under control.
Now we need to modify the print subroutine to not modify itself and instead use the allocated pointer.
// -> xa: address to text
// -> y: size of text
subroutine print_text
{
sta addr
stx addr + 1
tya
tax // size left in x
ldy #0 // pointer offset
{
lda (addr),y
jsr CHROUT
iny
dex
bne @loop
}
rts
}
It would also be nice to avoid having to specify the length of the string when printing it. The code became a bit kludgy when swapping registers. We can solve this by removing the need for the size argument. If we zero terminate the string we can get rid of it (or swap argument registers).
processor "6502"
include "macros.jasm"
include "c64.jasm"
section bss, "zero page", $fb, $100
{
reserve word addr
}
section code, "main", BASIC_START
{
basic_sys_line(2016, .start)
.start:
ldx #>hello_world_text
lda #<hello_world_text
jsr print_text
rts
define byte[] hello_world_text = { "HELLO WORLD!", 0 }
include "screen_io.jasm"
}
Now the zero is added. Let's update the subroutine.
// -> xa: address to text
subroutine print_text
{
sta addr
stx addr + 1
ldy #0 // pointer offset
{
lda (addr),y
beq @continue
jsr CHROUT
iny
bne @loop
}
rts
}
Now that looks better. @continue is another automatic label. It is defined at the closing curly brace of the closest surrounding scope.
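One detail of this loop is easy to miss: Y is an 8-bit register, so bne @loop falls through when iny wraps from 255 to 0, and the subroutine prints at most 256 characters even if no zero byte is found. The following Python model of the loop is only an illustration; the function name and the memory bytearray are assumptions of the sketch, not part of jAsm.

```python
def print_text(memory: bytearray, addr: int) -> bytes:
    """Model of the zero-terminated print loop; returns the characters printed."""
    printed = bytearray()
    y = 0                                   # ldy #0
    while True:
        a = memory[(addr + y) & 0xFFFF]     # lda (addr),y
        if a == 0:                          # beq @continue
            break
        printed.append(a)                   # jsr CHROUT
        y = (y + 1) & 0xFF                  # iny (8-bit wrap-around)
        if y == 0:                          # bne @loop not taken after wrap
            break
    return bytes(printed)


memory = bytearray(0x10000)
text = b"HELLO WORLD!\x00"
memory[0x0815:0x0815 + len(text)] = text
print(print_text(memory, 0x0815))
```

If longer texts are needed, the high byte of the pointer would have to be incremented when Y wraps, which this version of the subroutine does not do.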
One thing that isn't really great is that it isn't obvious what addr is used for. It would be nice if it was connected to the print subroutine somehow. We can make that connection by creating partial sections in the screen_io.jasm file that add to the sections in the main file. We do that by moving the reserve into the screen_io.jasm file. We also move the include outside the main section, because we can't define a partial section within a section.
This is what main.jasm looks like after the change.
processor "6502"
include "macros.jasm"
include "c64.jasm"
section bss, "zero page", $fb, $100
{
}
section code, "main", BASIC_START
{
basic_sys_line(2016, .start)
.start:
ldx #>hello_world_text
lda #<hello_world_text
jsr print_text
rts
define byte[] hello_world_text = { "HELLO WORLD!", 0 }
}
include "screen_io.jasm"
The screen_io.jasm file now needs to define two section parts, one for the zero page reservation and one for the code.
section part, "zero page"
{
// temporary text address when printing
reserve word addr
}
section part, "main"
{
// -> xa: address to text
subroutine print_text
{
sta addr
stx addr + 1
ldy #0 // pointer offset
{
lda (addr),y
beq @continue
jsr CHROUT
iny
bne @loop
}
rts
}
}
We now have a kind of module with the print subroutine and its zero page variable. It can sit beside other potential modules in a larger program without overlapping. We don't have to specify a single address and it will be optimally packed together.
What if some other module also wants to have a temporary address called addr? That could be a problem. One solution for this is to put the print related names in a namespace.
We'll enclose the contents of the screen_io.jasm file in a screen namespace.
namespace screen
{
section part, "zero page"
{
// temporary text address when printing
reserve word addr
}
section part, "main"
{
// -> xa: address to text
subroutine print_text
{
sta addr
stx addr + 1
ldy #0 // pointer offset
{
lda (addr),y
beq @continue
jsr CHROUT
iny
bne @loop
}
rts
}
}
}
The reference to the print subroutine must now specify the namespace in one way or another. One way would be to explicitly type it in front of the print name like this:
jsr screen::print_text
If print_text is used a lot in one place it is also possible to specify that a namespace should be used in a scope. As long as other names don't start to collide, this is just as good.
processor "6502"
include "macros.jasm"
include "c64.jasm"
section bss, "zero page", $fb, $100
{
}
section code, "main", BASIC_START
{
using namespace screen
basic_sys_line(2016, .start)
.start:
ldx #>hello_world_text
lda #<hello_world_text
jsr print_text
rts
define byte[] hello_world_text = { "HELLO WORLD!", 0 }
}
include "screen_io.jasm"
A namespace exposes everything to the outside world. Sometimes that's what you want but it could also be nice to control the module's interface. This can be done using modules instead of namespaces. In a module, all global variables are local to the module unless they are marked for export.
In our example, addr doesn't need to be exposed outside, but the print subroutine must be.
module screen
{
section part, "zero page"
{
// temporary text address when printing
reserve word addr
}
section part, "main"
{
// -> xa: address to text
export subroutine print_text
{
sta addr
stx addr + 1
ldy #0 // pointer offset
{
lda (addr),y
beq @continue
jsr CHROUT
iny
bne @loop
}
rts
}
}
}
Accessing the print_text subroutine in the module is done exactly the same way it was accessed in the namespace so the rest of the program can be left unchanged.
jAsm can assist debugging in the VICE emulator by exporting the names of addresses for use in the emulator. Add --dump-vice-symbols and a filename to the command line arguments to export this information.
jasm --dump-vice-symbols main.vs main.jasm main.prg
Now a symbol file called main.vs will be created. Let's start the emulator (install it first if you don't have it) and use the file.
x64sc -moncommands main.vs -autostart main.prg
Hello world should be printed on the screen.
Start the monitor (alt-h in Linux) and type d 080d.
(C:$e5d1) d 080d
.C:080d .sys_address:
.C:080d A2 08 LDX #$08
.C:080f A9 15 LDA #$15
.C:0811 20 22 08 JSR .print_text
.C:0814 60 RTS
.C:0815 .hello_world_text:
.C:0815 48 PHA
.C:0816 45 4C EOR $4C
.C:0818 4C 4F 20 JMP $204F
.C:081b 57 4F SRE $4F,X
.C:081d 52 JAM
.C:081e 4C 44 21 JMP $2144
.C:0821 00 BRK
.C:0822 .print_text:
.C:0822 85 FB STA $FB
.C:0824 86 FC STX $FC
.C:0826 A0 00 LDY #$00
.C:0828 B1 FB LDA (.addr),Y
.C:082a F0 06 BEQ $0832
.C:082c 20 D2 FF JSR $FFD2
.C:082f C8 INY
.C:0830 D0 F6 BNE $0828
.C:0832 60 RTS
.C:0833 00 BRK
.C:0834 00 BRK
.C:0835 00 BRK
(C:$0836)
You'll get a disassembled listing of the program and some of the labels are visible in the listing! The zero page addresses didn't get a name. That's a limitation in VICE so we can't help that. The CHROUT address didn't get a name either. How come? Well, the constant is only a number and not all numbers should be exported to VICE because those would act as addresses and it would get very confusing. There is a work-around for this. You can explicitly set a value to be an address like this.
const address BASIC_START = $0801
const address BORDER_COLOR = $d020
const address CHROUT = $ffd2
Change the c64.jasm file to this, assemble and restart the emulator.
.C:0822 .print_text:
.C:0822 85 FB STA $FB
.C:0824 86 FC STX $FC
.C:0826 A0 00 LDY #$00
.C:0828 B1 FB LDA (.addr),Y
.C:082a F0 06 BEQ $0832
.C:082c 20 D2 FF JSR .CHROUT
.C:082f C8 INY
.C:0830 D0 F6 BNE $0828
.C:0832 60 RTS
Problem solved!
To aid debugging you can set breakpoints in your program. This makes it easy to stop the program in a specific subroutine and single step through it. You do this by creating a label with a name that begins with breakpoint. Let's try this. Add a label somewhere in the print_text subroutine, like this.
// -> xa: address to text
subroutine print_text
{
.breakpoint:
sta addr
stx addr + 1
ldy #0 // pointer offset
{
lda (addr),y
beq @continue
jsr CHROUT
iny
bne @loop
}
rts
}
The emulator stops almost immediately.
BREAK: 1 C:$0822 (Stop on exec)
#1 (Stop on exec 0822) 141 016
.C:0822 85 FB STA $FB - A:15 X:08 Y:00 SP:f4 ..-..... 3114547
(C:$0822)
You can step through the instructions with the z command in the monitor.
There are two more types of breakpoints. A label beginning with read_breakpoint will stop execution when that memory address is read. A label beginning with write_breakpoint will stop execution when that memory address is written.
Now you know the basics of jAsm and should be able to start experimenting yourself. The language has more to offer and the complete syntax is described in the reference section. Good luck!
You need to fetch the source code from SourceHut to get started. If you have a command line Mercurial client you can clone the repository like this.
hg clone https://hg.sr.ht/~bjonte/jasm
jAsm compiles using CMake and Clang.
To build with CMake you need CMake 3.5, Clang, Mercurial and python3 installed. On Debian, Ubuntu or Mint systems you can use apt-get to fetch the dependencies like this.
sudo apt-get install cmake clang mercurial python3
Clone the repository into a directory called 'jasm' and build it like this.
hg clone https://hg.sr.ht/~bjonte/jasm
cd jasm
export CXX=/usr/bin/clang++
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
sudo make install
If you want to cross compile binaries for Windows you need to install MingW.
sudo apt-get install mingw-w64
Cross compile like this.
hg clone https://hg.sr.ht/~bjonte/jasm
cd jasm
mkdir build
cd build
cmake -DCMAKE_TOOLCHAIN_FILE=../win64_cross_compile_toolchain.txt -DCMAKE_BUILD_TYPE=Release ..
make
You will find the binaries in build/jasm. You will also need the MingW dynamic link libraries found here in Linux Mint.
/usr/lib/gcc/x86_64-w64-mingw32/7.3-win32/libgcc_s_seh-1.dll
/usr/lib/gcc/x86_64-w64-mingw32/7.3-win32/libstdc++-6.dll
jAsm is a command line tool. It prints the arguments it accepts if started without any. At a minimum it needs an input file and an output file.
jasm input.jasm output.bin
There are some flags to tweak how the assembler behaves.
When working with several memory banks it is handy to place them after each other in memory. That way it is possible to check which bank code or data belongs to just by looking at the address. For example, cartridge bank 0 could be located at $08000-$0a000 and bank 1 at $18000-$1a000. However, jAsm will generate an error when trying to reference bank 1 in data definitions or instructions because the addresses exceed 16 bits. This can be overridden with the --bank-mode flag, which automatically truncates long addresses.
jasm --bank-mode input.jasm output.bin
A shortcut alternative is -bm.
This also has implications for the high byte unary operator (>). Without bank mode '>addr' means the same as 'addr>>8', but with bank mode enabled it means '(addr>>8)&$ff' to make sure the result is an eight bit value suitable for instructions taking a byte argument.
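The difference matters as soon as an address has more than 16 bits. As a rough Python illustration of the two interpretations (the function high_byte is hypothetical and only mirrors the two expressions given in the text):

```python
def high_byte(addr: int, bank_mode: bool) -> int:
    """Model of the unary > operator with and without --bank-mode."""
    value = addr >> 8
    if bank_mode:
        value &= 0xFF  # keep only eight bits
    return value


bank1_address = 0x18000
print(hex(high_byte(bank1_address, bank_mode=False)))  # 0x180
print(hex(high_byte(bank1_address, bank_mode=True)))   # 0x80
```

Without bank mode the result $180 does not fit in a byte, which is why an instruction such as lda #>addr would be rejected for a bank 1 address.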
You can instruct the assembler to create some initial constants that can be accessed in the source code with the --define flag.
jasm --define INFINITE_LIVES=true --define STARTING_LIVES=3 input.jasm output.bin
jasm --define DEFAULT_NAME=bobo input.jasm output.bin
You can feed it with integers, booleans and strings, like in the examples above.
A shortcut alternative is -d.
The constants and variables in the assembled program can be written to text files in these formats.
Dump jAsm symbols like this.
jasm --dump-symbols symbols.txt input.jasm output.bin
A shortcut alternative is -ds.
Dump VICE symbols like this.
jasm --dump-vice-symbols symbols.vs input.jasm output.bin
A shortcut alternative is -dv.
Dump No$GBA symbols like this.
jasm --dump-gba-symbols symbols.sym input.jasm output.bin
A shortcut alternative is -dg.
The assembled program can be written as a hex file interleaved with the source lines that produced the output, to help understand what the assembler generated.
Write hex output like this.
jasm --dump-hex hex_output.txt input.jasm output.bin
A shortcut alternative is -dh.
The file contains all source lines that generate data. The first column is the program counter, followed by up to four columns of binary data. After that come the line number and the source code that produced the data.
./source/main_loop.jasm
--------------------------------------------------------------------------------
0400: 20 17 04 7: jsr setup_cpu
8:
0403: 20 46 04 9: jsr blank_screen
10:
0406: 20 00 1f 11: jsr mmu::setup
0409: 20 6b 04 12: jsr init_reset_vector
When the source file changes, the file name and a line with dashes will be added. In case there is a longer jump in line numbers or a jump backwards, a partially dashed line is printed.
046b: ad 06 d5 51: lda MMURCR
046e: 48 52: pha
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
046f: ad 06 d5 67: lda MMURCR
0472: 29 f7 68: and #~MMURCR_COMMON_TOP
0474: 09 04 69: ora #MMURCR_COMMON_BOTTOM
0476: 8d 06 d5 70: sta MMURCR
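If you want to post-process such a file, the layout is simple enough to parse with a regular expression. This Python sketch assumes that columns are separated by whitespace and that addresses and bytes are lowercase hexadecimal, as in the excerpts above. The exact column widths are not taken from jAsm itself, and parse_hex_line is a name invented for the example.

```python
import re

# address, one or more data bytes, source line number, source text
HEX_LINE = re.compile(r"^([0-9a-f]{4}):\s+((?:[0-9a-f]{2}\s+)+)(\d+):\s*(.*)$")


def parse_hex_line(line: str):
    """Return (address, data, line_number, source) or None for other lines."""
    match = HEX_LINE.match(line)
    if match is None:
        return None  # file names, dashed separators and lines without data
    address, data, line_number, source = match.groups()
    data_bytes = bytes(int(byte, 16) for byte in data.split())
    return int(address, 16), data_bytes, int(line_number), source


print(parse_hex_line("0400: 20 17 04 7: jsr setup_cpu"))
```

Lines that carry no generated data, such as the file name header or the dashed separators, simply yield None.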
By default, jAsm outputs only the binary data without any header. To generate a program file for Commodore 64 that can be loaded from BASIC, a two byte header must be added containing the load address in little endian format. You can add this header using --header-little-endian-address.
jasm --header-little-endian-address input.jasm output.prg
A shortcut alternative is -hla.
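The header itself is nothing more than the load address stored low byte first. A small Python sketch of the idea, using a hypothetical helper name and the two-instruction program from the starter guide as payload:

```python
def add_load_address_header(load_address: int, binary: bytes) -> bytes:
    """Prepend the two byte little endian load address to the raw binary."""
    return load_address.to_bytes(2, "little") + binary


program = bytes([0xEE, 0x20, 0xD0, 0x60])  # inc $d020, rts
print(add_load_address_header(0x8000, program).hex(" "))  # 00 80 ee 20 d0 60
```

This is the format that LOAD"MAIN.PRG",8,1 relies on to place the program at $8000.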
You can add include paths using the --include-dir flag. jAsm will look in these for included files.
jasm --include-dir some/dir --include-dir other/dir input.jasm output.bin
A shortcut alternative is -i.
With the --max-errors flag, you can specify the number of errors that will be printed before jAsm stops assembling.
jasm --max-errors 4 input.jasm output.bin
A shortcut alternative is -me.
The default output mode will merge all code sections into one big binary and pad the space in between with zeros. With the flag --output-multiple-files, this can be changed to store one file per section instead. Each file will be named after the output file but with the section name added before the file extension.
jasm --output-multiple-files input.jasm output.bin
A shortcut alternative is -om.
You can choose to have jAsm name the files after the sections by not specifying an output file name.
jasm --output-multiple-files input.jasm
You may want to add an extension to the section names when using them as file names. Use the option --file-extension to do that.
jasm --output-multiple-files --file-extension prg input.jasm
A shortcut alternative is -ext.
You can set the default processor to use when assembling the source code using the option --processor. If you do this you won't need to specify the processor in the source code, unless you need to switch it.
jasm --processor 6502 input.jasm output.bin
or
jasm --processor z80 input.jasm output.bin
A shortcut alternative is -p.
You can enable a number of extra instructions to simplify programming using the option --pseudo-instructions. The result differs depending on the processor.
A shortcut alternative is -pi.
These are the pseudo instructions for 6502.
bhs addr // branch if higher or same
blt addr // branch if lower
These are equivalent to bcs and bcc, respectively.
These are the pseudo instructions for 65C02.
bhs addr // branch if higher or same
blt addr // branch if lower
These are equivalent to bcs and bcc, respectively.
dea // decrement A register
ina // increment A register
These are equivalent to the implied mode dec and inc, respectively.
These are the pseudo instructions for 65CE02.
bhs addr // branch if higher or same
blt addr // branch if lower
These are equivalent to bcs and bcc, respectively.
dea // decrement A register
ina // increment A register
These are equivalent to the implied mode dec and inc, respectively.
bra label // branch unconditionally
This is equivalent to the bru instruction.
These are the pseudo instructions for 45GS02.
bhs addr // branch if higher or same
blt addr // branch if lower
These are equivalent to bcs and bcc, respectively.
dea // decrement A register
ina // increment A register
These are equivalent to the implied mode dec and inc, respectively.
bra label // branch unconditionally
This is equivalent to the bru instruction.
These are the pseudo instructions for Z80.
ld bc,de
ld bc,hl
ld de,bc
ld de,hl
ld hl,bc
ld hl,de
They are implemented using two instructions under the hood. First the high register part is loaded and then the low.
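As a rough illustration of that expansion, the following Python sketch maps a 16 bit pseudo load to the two 8 bit loads it stands for. The helper name expand_pseudo_ld and the PAIRS table are made up for this example and are not part of jAsm; only the high-then-low order comes from the text above.

```python
# Hypothetical helper, not part of jAsm: models how a 16 bit pseudo load
# could be expanded into two 8 bit register loads, high part first.
PAIRS = {"bc": ("b", "c"), "de": ("d", "e"), "hl": ("h", "l")}

def expand_pseudo_ld(dst, src):
    """Return the two real Z80 instructions for the pseudo 'ld dst,src'."""
    if dst == src or dst not in PAIRS or src not in PAIRS:
        raise ValueError("not a pseudo instruction: ld %s,%s" % (dst, src))
    (dst_hi, dst_lo), (src_hi, src_lo) = PAIRS[dst], PAIRS[src]
    # High register part first, then the low part.
    return ["ld %s,%s" % (dst_hi, src_hi), "ld %s,%s" % (dst_lo, src_lo)]

print(expand_pseudo_ld("bc", "de"))  # ['ld b,d', 'ld c,e']
```

The practical consequence is that each pseudo load costs two instructions in the assembled output.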
jAsm supports several levels of output during assembly. This is controlled by the -v0
, -v1
, -v2
and -v3
flags.
jasm -v2 input.jasm output.bin
Flag | Meaning |
---|---|
-v0 | Show errors |
-v1 | Show errors and warnings |
-v2 | Show errors, warnings, printouts and general information |
-v3 | Show errors, warnings, general information and debugging information |
jAsm returns with return code 0 for success and non-zero if an error occurred.
This section documents the entire syntax. Have a look at the starter guide first to get a grasp of the basics before digging into this.
jAsm only accepts UTF-8 encoded text files. If you provide something that can't be interpreted as UTF-8, an error is reported.
To assemble instructions jAsm needs to know what processor to target. This is done by either specifying the processor using command line flags or by a keyword in the source code. Specify the processor in a source file like this.
processor "6502"
After this statement, the assembler can handle 6502 processor instructions. You can switch processor in a source file several times.
processor "6502"
rts
processor "z80"
ret
It is also possible to momentarily change the processor and switch back to whatever it was before. The processor pop
statement is used to change back to the previously set processor.
processor "6502"
rts
processor "z80"
ret
processor pop
rts
processor pop
Included files inherit the processor from the file with the include statement but the processor set in the included file won't affect the file where the include statement is.
Suppose we have a file named test.jasm:
// processor 6502 inherited from main.jasm
rts
processor "z80"
// processor is now set to z80
ret
and a file named main.jasm:
processor "6502"
// processor is now set to 6502
include "test.jasm"
// processor is still 6502
lda #0
When including test.jasm
, the rts
instruction is assembled using 6502 because it was inherited from main.jasm
. The ret
instruction is assembled as z80 since the processor was changed in the included file before the instruction. After the included file the processor is 6502 since the included file won't affect the file it is included from.
jAsm supports C style single line comments. They span the rest of the line.
lda #0 // this is a comment
rts
Multiline comments are also supported with the same syntax as C. They are started with /*
and ended with */
.
lda #0 /* this
is a
comment */
A notable difference from C is that jAsm supports nested comments.
/* this is
/* also */
a comment */
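Nesting means the comment only ends when every opening marker has been closed. The Python sketch below shows one way to get that behaviour with a depth counter. It is an illustration under simplifying assumptions, not jAsm's lexer: the function name is invented, and single line comments and string literals are ignored.

```python
# A sketch only: strips /* ... */ comments that may nest, using a depth counter.
# It ignores // comments and string literals, which a real lexer must handle.
def strip_block_comments(source):
    out = []
    depth = 0  # number of comments currently open
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(source[i])  # keep text outside comments
            i += 1
    if depth != 0:
        raise ValueError("unterminated comment")
    return "".join(out)

text = "lda #0 /* this is /* also */ a comment */ rts"
print(repr(strip_block_comments(text)))  # 'lda #0  rts'
```

With C rules the comment would end at the first closing marker and the rest of the comment text would be parsed as code.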
This documentation doesn't cover the actual instructions, their meaning and so on. You will have to find that elsewhere.
The assembler instructions are entered using lowercase letters. All standard opcodes and addressing modes are supported. Examples:
lda #0
tay
sta $d020
ldx #5
lda ($fb),y
sta $9000,x
jmp $8000
Some assemblers use brackets as expression parentheses to avoid colliding with the indirect addressing modes but jAsm uses normal parentheses for both.
lda ($fb + 1) + 1,y // not indirect addressing
lda ($fb + 1 + 1),y // indirect addressing
jmp ($1fff + 1)*2 // not indirect addressing
jmp (($1fff + 1)*2) // indirect addressing
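One way to read the four examples above is that an operand is indirect only when a single pair of parentheses encloses the whole address expression. The Python sketch below encodes that reading. It is a guess at the rule made for illustration, with invented function names, and not a description of how jAsm's parser is implemented.

```python
# Hypothetical check, not jAsm's parser: an operand counts as indirect only
# when one pair of parentheses encloses the whole address expression.
def split_top_level_comma(operand):
    """Split 'expr,index' at the first comma outside parentheses."""
    depth = 0
    for i, ch in enumerate(operand):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return operand[:i].strip(), operand[i + 1:].strip()
    return operand.strip(), None

def is_indirect(operand):
    expr, _index = split_top_level_comma(operand)
    if not expr.startswith("("):
        return False
    depth = 0
    for i, ch in enumerate(expr):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # Indirect only if the first parenthesis closes at the very end.
                return i == len(expr) - 1
    return False

for op in ["($fb + 1) + 1,y", "($fb + 1 + 1),y", "($1fff + 1)*2", "(($1fff + 1)*2)"]:
    print(op, "->", is_indirect(op))
```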
Instructions end with a new line or a semicolon. You can stack together instructions on one line like this.
inx; inx; inx;
Named constants can be defined in the source code to replace naked constants. This is encouraged as much as possible since it makes the source code much more readable. Two types exist. 1) Labels set to the current program counter and 2) constant declarations. Labels are names followed by a colon.
setup:
The label will contain the address of the next byte in memory. A label can also be placed between an instruction and its argument, in which case it is assigned the address of the instruction argument.
lda value: #0
inc value // increment what is loaded by the previous instruction next time it is executed
Z80 instructions take at most two arguments, so in some cases a label can be defined for each of the two arguments. Note how the label is placed when using indirect addressing modes. It has to be placed before the parenthesis.
ld hl, index
ld bc, data
ld index:(ix+0), data:0
inc (hl) // increase index in instruction above
Instruction labels become memory storage types, which means they have a defined size, and the low and high parts of word addresses can be accessed directly through the lo and hi properties.
lda value:$ffff
inc value.lo // the low part of the previous instruction's argument
Constant declarations look like this.
const NUM_LIVES = 5 // constant declaration
With this defined we can use them to produce more readable assembler code.
lda #NUM_LIVES
jsr setup
rts
Sometimes it is convenient to define constants that are exported as addresses. This is done using the address keyword.
const address SCREEN = $0400
Address constants can be exported to the VICE emulator via a command line option. Labels are automatically marked as addresses since they point into the source code address space but constants do not by default.
Constants and labels can be defined and used in any order, unlike C where everything needs to be declared before use.
A constant cannot change its value throughout the source.
const A = 4
A = 3 // Error 3017 : Cannot reassign constant.
To do that you need a variable.
var A = 4
A = 3 // it works!
Since the value is dependent on the parsing order (top to bottom), variables must be declared before they are used.
lda #A // Error 3004 : Reference to undefined symbol A
var A = 3
There are local and global symbols (constants, labels and variables). Local symbols always start with a period character.
const A = 3 // global constant
const .A = 4 // local constant
Symbols cannot be used outside their scope. Global symbols have the entire source code as their scope. Local symbols can only be used inside the closest outer scope, which is delimited by curly braces.
{
const .A = 1 // a local .A
{
lda #.A // this will be 2
const .A = 2 // another local .A
lda #.A // this will be 2
}
lda #.A // this will be 1
}
lda #.A // Error 3004 : Reference to undefined symbol .A
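The lookup behaviour in this example can be modelled as a chain of scopes that is searched from the innermost scope outwards. The Python sketch below is a toy model under that assumption; the Scope class and its methods are invented for illustration and say nothing about jAsm's internals.

```python
# A toy model, not jAsm internals: each scope maps local names to values and
# lookups walk outward from the innermost scope.
class Scope:
    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}

    def define(self, name, value):
        self.symbols[name] = value

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise KeyError("Reference to undefined symbol " + name)

root = Scope()          # file level, no local .A defined here
outer = Scope(root)
outer.define(".A", 1)
inner = Scope(outer)
inner.define(".A", 2)   # all constants are collected before any lookup

print(inner.lookup(".A"))  # 2
print(outer.lookup(".A"))  # 1
```

Because every constant in a scope is registered before lookups happen in this model, both references in the inner scope see the value 2, regardless of where the definition is written.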
The program counter is always represented by an asterisk (*). This can be used to refer to things relative to the current address.
lda #0
inc *-1 // increment the value 0 in the previous instruction
Every scope has automatic local variables generated to simplify loop constructs in the code without the need to make up names for the loop labels. At the start of the scope @loop
is created and at the end @continue
is created. This makes it possible to loop and exit the loop without explicitly creating labels.
{
lda text,x
beq @continue // break out if zero is found
jsr CHROUT
inx
bne @loop // loop back to the scope start
}
rts
Constant and variable names can be constructed by an expression if the dynamic
keyword is used when defining the symbols.
const dynamic "begin" + "end" = 100
lda #beginend // this will be 100
This can be used together with the symbol
function to create and access dynamically generated symbols.
repeat 3
{
const .addr = symbol("data" + string(@i))
lda #<.addr
ldx #>.addr
jsr print_text
}
rts
define byte[] data0 = { "one", 0 }
define byte[] data1 = { "next", 0 }
define byte[] data2 = { "last", 0 }
Global symbols have a tendency to collide name-wise with each other in large programs, especially if there are several people working on the same code. jAsm supports namespaces to reduce these problems. For example, setup
is a common name and different systems may provide their own setup
. If the symbols are placed in their own namespaces they can coexist.
namespace random
{
setup:
// initialize randomizer
rts
}
namespace raster
{
setup:
// initialize raster system
rts
}
Outside the namespaces you need to specify which setup
you are referring to. This is done like this.
jsr random::setup
jsr raster::setup
Global symbols are fetched relative to the current namespaces. Sometimes you need to use absolute namespace references to resolve ambiguity. This starts with ::
.
const A = 1
namespace aa
{
const A = 2
}
namespace bb
{
const A = 3
lda #::A // this is 1
}
Namespaces can be nested to form deeper namespaces.
namespace aa
{
namespace bb
{
}
}
You can declare that you will be using a namespace in a scope to avoid having to specify it every time you reference it.
namespace system_a
{
const A = 0
}
namespace system_b
{
using namespace ::system_a
const B = A // this is ok because we are now using namespace system_a as well
}
Note that this doesn't resolve ambiguity so, in some cases, you may still need to specify the absolute namespace for symbols.
Namespaces encapsulate global symbols in their own space but expose all of them outside the namespace. Modules exist to solve the problem of exposing too much.
All global symbols are by default private to the module. The keyword export
is used to make symbols accessible from outside the module.
module system
{
const A = 0
export const B = 1
}
const C = system::A // Error 3004: Reference to undefined symbol A
const D = system::B // this is ok because B has been exported
Place the export
keyword before a statement that creates a symbol to export it.
export subroutine func
{
export lbl:
export define byte number = 3
export macro exit()
{
rts
}
}
A module can also declare that it needs values from the user of the module. This is done with the import
keyword at the beginning of the module definition.
module system
import color
{
subroutine init
{
lda #color
sta $d020
rts
}
}
The value system::color
must be assigned somewhere once.
const BLACK = 0
system::color = BLACK
Import several variables like this.
module system
import color1, color2
{
}
Local symbols cannot be accessed outside their scope and sometimes they need to be defined in an inner scope. This is common when using self modifying code. In the following example, it isn't possible to access .char
outside the loop.
const address SCREEN = $0400
lda #0
sta .char // Error 3004 : Reference to undefined symbol .char
ldx #NUM_ELEMENTS - 1
{
lda .char: #0 // address to the value to be loaded into the accumulator
sta SCREEN,x
inc .char
dex
bpl @loop // loops back to enclosing scope beginning
}
This problem can be solved by declaring the symbol .char
in the outermost scope where it needs to be accessed.
const address SCREEN = $0400
declare .char // declaring .char to be used in this scope
lda #0
sta .char // works!
ldx #NUM_ELEMENTS - 1
{
lda .char: #0 // define the value of the declared symbol
sta SCREEN,x
inc .char
dex
bpl @loop
}
This technique can be used with any type of local symbol to move its scope.
Expressions in jAsm are similar to expressions in C. They can contain assignments and assignments return the value assigned. This has the side effect that you can do multiple assignments.
var aa = 0
var bb = 1
aa = bb = 2
Normal parentheses are always used in expressions, not brackets.
jAsm supports a number of operators, similar to C but not exactly the same. The operators are, in the order of precedence:
Operator | Operation | Operand types | Example |
---|---|---|---|
() | call operator | macro, subroutine | mac(aa,bb) |
[] | array indexing operator | string, list | aa[3] |
. | property operator | string, list, dict | aa.length |
++ | postfix increment | integer | aa++ |
-- | postfix decrement | integer | aa-- |
++ | prefix increment | integer | ++aa |
-- | prefix decrement | integer | --aa |
! | boolean not | boolean | !aa |
~ | bitwise not | integer | ~aa |
+ | unary addition | integer, float | +3 |
- | unary subtraction | integer, float | -3 |
< | unary low byte | integer | <aa |
> | unary high byte | integer | >aa |
* | multiplication | integer, float | aa * bb |
/ | division | integer, float | aa / bb |
+ | addition | integer, float, string, list | aa + bb |
- | subtraction | integer, float | aa - bb |
<< | logical left shift | integer | aa << 1 |
>> | logical right shift | integer | aa >> 1 |
>>> | arithmetic right shift | integer | aa >>> 1 |
& | bitwise and | integer | aa & bb |
^ | bitwise exclusive or | integer | aa ^ bb |
| | bitwise or | integer | aa | bb |
< | less than comparison | integer, float, string | aa < bb |
> | greater than comparison | integer, float, string | aa > bb |
<= | less or equal comparison | integer, float, string | aa <= bb |
>= | greater or equal comparison | integer, float, string | aa >= bb |
== | equal comparison | boolean, integer, float, string | aa == bb |
!= | not equal comparison | boolean, integer, float, string | aa != bb |
&& | boolean and | boolean | aa && bb |
|| | boolean or | boolean | aa || bb |
= | assignment | all | aa = 1 |
+= | add and assign | integer, float, string, list | aa += 1 |
-= | subtract and assign | integer, float | aa -= 1 |
*= | multiply and assign | integer, float | aa *= 2 |
/= | divide and assign | integer, float | aa /= 2 |
&&= | boolean and, and assign | boolean | aa &&= bb |
||= | boolean or, and assign | boolean | aa ||= bb |
&= | bitwise and, and assign | integer | aa &= bb |
|= | bitwise or, and assign | integer | aa |= bb |
^= | bitwise exclusive or, and assign | integer | aa ^= bb |
<<= | logical left shift, and assign | integer | aa <<= 1 |
>>= | logical right shift, and assign | integer | aa >>= 1 |
>>>= | arithmetic right shift, and assign | integer | aa >>>= 1 |
Statements are blocks of code that control the generation of instructions or change the assembler state. Statements can optionally be separated by a semicolon, just like in C. Newline characters only matter in instructions where it is impossible for the assembler to know if some instructions (like rol) have an address following or not. In all other cases newlines are completely ignored. The following is valid in jAsm.
const
aa
=
1
;
const bb = 2 const cc = 3
In some cases you may be confused by the greedy parser which tries to include as much as possible in the current statement. Look at this.
var aa = 0
var bb = 1 + aa
++aa // Error 3044 : Expression must have side effect.
The parser tries to include as much as possible in the variable declaration for bb. The ++ operator is therefore applied as a postfix increment to the aa on the second line, which leaves a lone aa on the third line. The assembler tries to be helpful by pointing out that the result is meaningless. This case needs to be resolved by a semicolon to separate the statements.
var aa = 0
var bb = 1 + aa;
++aa // ok!
jAsm has a couple of built in data types.
Booleans can only be either true
or false
. Comparison operators return boolean values. They are well suited for conditional assembly.
const USE_DEBUG_OUTPUT = true
Integer numbers are 32 bit signed numbers in the range [-2147483648, 2147483647].
const NUMBER = 123
Integer numbers can be expressed in several formats. All the following constants will evaluate to decimal number 31.
const DEC_NUMBER = 31 // decimal number with base 10
const HEX_NUMBER = $1f // hexadecimal number with base 16
const BIN_NUMBER = %11111 // binary number with base 2
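To check a literal by hand, the three notations can be decoded with a few lines of Python. The function name parse_integer is invented for this sketch, and it covers only the prefixes shown above.

```python
# Decodes the three integer notations shown above. Sketch only: no range
# checking against the 32 bit limit and no support for negative literals.
def parse_integer(text):
    if text.startswith("$"):
        return int(text[1:], 16)   # hexadecimal
    if text.startswith("%"):
        return int(text[1:], 2)    # binary
    return int(text, 10)           # decimal

for literal in ["31", "$1f", "%11111"]:
    print(literal, "=", parse_integer(literal))  # each line ends in 31
```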
Floating point numbers are 64 bit signed numbers with decimal points. They can represent large numbers and precision increases closer to zero. The largest number is roughly 10^308. The smallest positive number is roughly 10^-308.
const NUMBER1 = 123.0
const NUMBER2 = 0.0
const NUMBER3 = -1e-50
Strings are quoted text. The characters are stored as wide characters (32 bits in Linux and 16 bits in Windows), leaving you a large selection of characters to choose from. This is why utf-8 is used as the file format for source code.
const STRING = "Hello"
There are a number of special characters that can be encoded in strings using a special backslash syntax. Whenever a backslash is encountered in a string the next character is checked to see if a special character should be used.
Code | Result |
---|---|
\\ | The backslash character itself |
\t | The horizontal tab character (9) |
\n | The newline character (10) |
\r | The carriage return character (13) |
\0 | The null character (0) |
\' | The single quote character |
\" | The double quote character |
The backslash has the same special meaning when specifying a single character as it has in strings. This can be used to specify the single quote character itself for example.
const QUOTE_CHAR = '\''
The following operators and methods are supported by the string type.
Function | Argument types | Description | Examples |
---|---|---|---|
+ | string | Returns left and right side strings concatenated. | const a = "Commodore" + "64" // "Commodore64" |
[index] | integer | Returns the character at (zero based) index. | const a = "Commodore64" lda #a[1] // 'o' |
length | | Returns the length of the string. | const a = "Commodore64" lda #a.length // 11 |
substring(start, length) | integer, integer | Returns the part of the string starting at (zero based) start and spanning length characters. The range can be partly outside the string and the result will be the intersection of the string and the range. | const a = "Commodore64" const b = a.substring(3, 4) // "modo" |
To support the character sets of different platforms there is a string()
function that is used to convert Unicode strings, which is the default string type, to other character sets.
const PET_HELLO = string("Hello", "pet", "lowercase")
The function takes a number of arguments. First the string to convert, then a number of conversion properties which specify the format, subformat, locale and flags for the conversion, in any order.
The following format properties are supported:
Format | Comment |
---|---|
ascii7 | 7 bit ascii format. |
pet | The character set used in Commodore PET models after PET 2001. |
pet2001 | The character set used in the Commodore PET 2001 model. |
vic20 | The character set used in the Commodore VIC 20. |
c16 | The character set used in the Commodore 16. |
plus4 | The character set used in the Commodore Plus/4. |
c64 | The character set used in the Commodore 64. |
c128 | The character set used in the Commodore 128. |
zx80 | Sinclair specific character set. |
zx81 | Sinclair specific character set. |
The following optional subformats are supported.
Subformat | Supported formats | Comment |
---|---|---|
uppercase | All | This is the default subformat. |
lowercase | pet2001, pet, vic20, c16, plus4, c64, c128 | The character set with both lower and uppercase characters. |
uppercase_screen | pet2001, pet, vic20, c16, plus4, c64, c128 | The character set as screen codes. |
lowercase_screen | pet2001, pet, vic20, c16, plus4, c64, c128 | The character set with both lower and uppercase characters as screen codes. |
The following optional locale properties are supported.
Locale | Supported formats | Comment |
---|---|---|
english | All | This is the default locale. |
The following optional flag properties are supported.
Flag | Supported formats | Comment |
---|---|---|
high_bit_term | All | The last character in the string is modified to have bit 7 set. This is sometimes used as a cheap terminator for the string. |
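The high_bit_term flag trades a separate terminator byte for one bit of the last character. The Python sketch below shows the idea on plain ASCII bytes. That input encoding and both function names are assumptions made for illustration; jAsm applies the flag as part of its own character set conversion.

```python
# Sketch of the high_bit_term idea on plain ASCII bytes (an assumption made
# for illustration only).
def high_bit_terminate(data):
    """Return a copy of data with bit 7 set on the last byte."""
    if not data:
        return bytes(data)
    return bytes(data[:-1]) + bytes([data[-1] | 0x80])

def read_high_bit_string(memory, start):
    """Read bytes from start up to and including the one with bit 7 set."""
    chars = []
    for value in memory[start:]:
        chars.append(value & 0x7F)  # strip the marker bit again
        if value & 0x80:
            break
    return bytes(chars)

encoded = high_bit_terminate(b"HELLO")
print(encoded.hex())                     # 48454c4ccf
print(read_high_bit_string(encoded, 0))  # b'HELLO'
```

This only works when no character in the string needs bit 7 for itself, which is why the scheme is described as a cheap terminator.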
There are a couple of functions specific to operating on string data or characters.
Function | Argument types | Description | Examples |
---|---|---|---|
uppercase(text [, locale]) | string|integer, string | Returns an uppercase version of the string or character sent as the first argument. | uppercase("Commodore") // "COMMODORE" uppercase("Cåmmodåre", "swedish") // "CÅMMODÅRE" uppercase('a') // 65 |
lowercase(text [, locale]) | string|integer, string | Returns a lowercase version of the string or character sent as the first argument. | lowercase("Commodore") // "commodore" lowercase("ABCÅÄÖ", "swedish") // "abcåäö" lowercase('A') // 97 |
When specifying a locale string in the string functions, these are the currently supported locales.
Locale | Comment |
---|---|
default | This is the default locale. It uses the default C locale. It doesn't handle any characters other than A-Z. |
english | This is the US English locale. This is much like the C locale. |
swedish | This is the Swedish locale. Supports the åäö characters. |
The list type can hold a collection of values with different types. A list is created using the list
function. This constructs a list containing the arguments.
const PRIMES = list(2, 3, 5, 7, 11)
Function | Argument types | Description | Examples |
---|---|---|---|
+ | list | Concatenates two lists. | const aa = list(1, 2) const bb = list(3, 4) aa + bb // [1, 2, 3, 4] |
+= | list | Concatenates two lists. | var aa = list(1, 2) aa += list(3, 4) aa // [1, 2, 3, 4] |
[index] | integer | Returns the item at (zero based) index in the list. | const aa = list(5, 6, 7) aa[1] // 6 |
push(x) | any | Adds x to the end of the list and returns the list. | var aa = list(1, 2, 3) aa.push(4) // [1, 2, 3, 4] |
pop() | | Removes the last element in the list and returns the list. | var aa = list(1, 2, 3) aa.pop() // [1, 2] |
insert(position, value) | integer, any | Inserts value at zero based index position and returns the list. | var aa = list(1, 2, 3) aa.insert(1, 99) // [1, 99, 2, 3] |
erase(position) erase(position, length) | integer, integer | Erases the part of the list defined by position and the optional length argument. Specifying only the position will erase one element. The list is returned. The range can be partly outside the list. | var aa = list(1, 2, 3, 4) aa.erase(1, 2) // [1, 4] aa.erase(0) // [4] |
keep(position) keep(position, length) | integer, integer | Erases everything except the part of the list defined by position and the optional length argument. Specifying only the position will keep one element. The list is returned. The range can be partly outside the list. | var aa = list(1, 2, 3, 4) aa.keep(1, 2) // [2, 3] aa.keep(0) // [2] |
clear() | | Clears the list and returns it. | var aa = list(1, 2, 3) aa.clear() // [] |
sort(before) | macro | Sorts the elements in the list according to an item ordering macro that takes two arguments and returns true if the first argument should be before the second. The list is returned. | const .less = macro(.a, .b) { return .a < .b } var aa = list(8, 4, 5, 1) aa.sort(.less) // [1, 4, 5, 8] |
empty | | Returns true if the list is empty, otherwise false. | list(2, 4, 8).empty // false list().empty // true |
length | | Returns the number of elements in the list. | const aa = list(2, 4, 8) lda #aa.length // 3 |
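The erase and keep range rules are easiest to see side by side. The Python model below reproduces the documented examples and adds two ranges that stick out of the list. How such ranges are clipped is an interpretation of the description above and has not been checked against jAsm itself. The helper names are invented, and unlike the jAsm methods these functions return a new list instead of modifying the original.

```python
# Toy model of the erase/keep range rules. The clipping of ranges that are
# partly outside the list is an assumption based on the description.
def clip(position, length, size):
    start = max(position, 0)
    end = min(position + length, size)
    return start, max(end, start)

def erase(items, position, length=1):
    start, end = clip(position, length, len(items))
    return items[:start] + items[end:]

def keep(items, position, length=1):
    start, end = clip(position, length, len(items))
    return items[start:end]

print(erase([1, 2, 3, 4], 1, 2))   # [1, 4]
print(keep([1, 2, 3, 4], 1, 2))    # [2, 3]
print(erase([1, 2, 3, 4], 3, 10))  # [1, 2, 3]
print(keep([1, 2, 3, 4], -1, 3))   # [1, 2]
```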
The dictionary type can hold a collection of values with different types. A dictionary is created using the dict
function. This constructs a dictionary containing key and value pairs as arguments.
const FRUITS = dict("apples" = 2, "bananas" = 5)
Values can be of any type, but keys must be booleans, integers or strings.
Function | Argument types | Description | Examples |
---|---|---|---|
set(key, value) | boolean|integer|string, any | Adds value to be accessible by key in the dict. | var aa = dict() aa.set("hi", 0) |
get(key) | boolean|integer|string | Fetches the value for a specific key. | var aa = dict("a" = 0, "b" = 1) aa.get("b") // 1 |
erase(key) | boolean|integer|string | Removes a value with the specified key from the dict. | var aa = dict("a" = 0, "b" = 1) aa.erase("b") |
clear() | | Removes all keys and values from the dict. | var aa = dict("a" = 0, "b" = 1) aa.clear() |
has(key) | boolean|integer|string | Returns true if the key exists in the dict, otherwise false. | var aa = dict("a" = 0, "b" = 1) aa.has("a") // true aa.has("c") // false |
empty | | Returns true if the dict is empty, otherwise false. | dict("one" = 1).empty // false dict().empty // true |
length | | Returns the number of elements in the dict. | const aa = dict("one" = 1, "two" = 2) lda #aa.length // 2 |
Values are always passed by value, never by reference. Every time you assign a value to some other variable, a copy is made. This makes it possible to assign constants to variables and variables to constants without problems. Values passed as macro arguments will be copied before executing the macro body as well.
const aa = list(1, 2, 3)
var bb = aa
bb.pop()
print("{} {}\n", aa, bb) // [1, 2, 3] [1, 2]
There is no ghosting or pointers that can mess up the data unexpectedly.
There are a number of functions in the root namespace dedicated to converting between the built-in types.
Function | Accepted input types | Description | Examples |
---|---|---|---|
int(value) | numeric | Strips decimal part from a value and converts it to an integer. | int(5) // 5 int(5.8) // 5 |
float(value) | numeric | Converts value into a floating point value. | float(5) // 5.0 float(5.5) // 5.5 |
string(value [, property, ...]) | numeric|string, string, ... | Converts value to a string according to a specific character set. See String Conversions for details about properties. | string("Hello", "petscii", "lowercase") string(123, "zx81") |
hexstring(value) | int | Converts value into a readable hexadecimal string. | hexstring(100) // "64" |
unicode(value) | int | Converts unicode codepoint value to a string. | unicode(65) // "A" |
There are specific byte
, word
and long
types for memory storage. They are used when reserving or defining data to include in the assembler program.
define byte = 5
The memory storage data types store negative values as signed values and positive values as unsigned.
- byte is an 8 bit data type
- word is a 16 bit data type
- long is a 32 bit data type
The word storage type has a pair of properties to address the first and second byte in the word. These are lo
and hi
.
define word number = 5
const high_addr = number.hi
const low_addr = number.lo
These properties will take number
which points to the 5 in memory, and return the addresses of the low and high bytes of the word, respectively.
There are a number of functions in the root namespace dedicated to converting between the memory storage type ranges.
Function | Accepted input types | Description | Examples |
---|---|---|---|
byte(value) | numeric | Returns value truncated to integer and with the number of bits reduced to 8. | byte(257) // 1 byte(-1) // 255 |
word(value) | numeric | Returns value without decimal part and with the number of bits reduced to 16. | word(128.3) // 128 |
long(value) | numeric | Returns value without decimal part and with the number of bits reduced to 32. | long(1000000) // 1000000 |
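The byte and word examples in the table are consistent with dropping the decimal part and then masking off the upper bits. The Python sketch below reproduces them that way. The function names are invented, and whether jAsm handles every edge case (negative fractions, for example) in the same way is an assumption.

```python
# Illustration only: reproduces the documented byte() and word() examples by
# truncating toward zero and then keeping the lowest 8 or 16 bits.
def to_byte(value):
    return int(value) & 0xFF

def to_word(value):
    return int(value) & 0xFFFF

print(to_byte(257))    # 1
print(to_byte(-1))     # 255
print(to_word(128.3))  # 128
print(to_word(-1))     # 65535
```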
jAsm provides a number of mathematical functions in the root namespace.
Function | Accepted input types | Description | Examples |
---|---|---|---|
abs(x) | numeric | Returns the absolute value of x. | abs(-10) // 10 |
acos(x) | numeric | Returns arc cosine of x in radians. | acos(-1) // 3.1415926536 |
asin(x) | numeric | Returns arc sine of x in radians. | asin(-1) // -1.5707963268 |
atan(x) | numeric | Returns arc tangent of x in radians. | atan(1) // 0.7853981634 |
atan2(y, x) | numeric, numeric | Returns arc tangent of y/x in radians. | atan2(1, 1) // 0.7853981634 |
ceil(x) | numeric | Returns x after rounding it up to the closest integer. | ceil(0.1) // 1.0 ceil(-0.1) // 0.0 |
clamp(t, a, b) | numeric, numeric, numeric | Returns t clamped to the range [a..b]. | clamp(0.1, 1.0, 2.0) // 1.0 clamp(5, 0, 10) // 5 |
cos(x) | numeric | Returns cosine of x radians. | cos(PI) // -1.0 |
cosh(x) | numeric | Returns hyperbolic cosine of x radians. | cosh(PI) // 11.591953276 |
degrees(x) | numeric | Returns radian angle x in degrees. | degrees(PI) // 180.0 |
exp(x) | numeric | Returns e^x. | exp(1) // 2.7182818285 |
floor(x) | numeric | Returns x after rounding it down to the closest integer. | floor(0.9) // 0.0 floor(-0.9) // -1.0 |
lerp(t, a, b) | numeric, numeric, numeric | Linearly interpolates a value between [a..b] using t [0..1] where 0 returns a and 1 returns b. t can also be outside the [0..1] range. | lerp(0.5, 0.0, 10.0) // 5.0 lerp(-1, 0, 10) // -10 |
log(x) | numeric | Returns the natural logarithm of x. | log(10) // 2.302585093 |
log10(x) | numeric | Returns the base-10 logarithm of x. | log10(100) // 2.0 |
logn(x, n) | numeric, numeric | Returns the base-n logarithm of x. | logn(243, 3) // 5.0 |
max(a, ...) | numeric | Returns the largest of the arguments. | max(2, 4) // 4 max(2, 4.0) // 4.0 |
max(a) | list | Returns the largest of the numeric elements in the list. | max(list(2, 4)) // 4 max(list(2, 4.0)) // 4.0 |
min(a, ...) | numeric | Returns the smallest of the arguments. | min(2, 4) // 2 min(2, 4.0) // 2 |
min(a) | list | Returns the smallest of the numeric elements in the list. | min(list(2, 4)) // 2 min(list(2, 4.0)) // 2 |
modulo(a, b) | integer, integer | Returns the remainder from the Euclidean division a/b. | modulo(7, 3) // 1 modulo(-7, 3) // 2 |
pow(a, b) | numeric, numeric | Returns a^b. | pow(2, 4) // 16.0 |
radians(x) | numeric | Returns degree angle x in radians. | radians(90) // 1.570796327 |
remainder(a, b) | integer, integer | Returns the remainder from the truncated division a/b. This is commonly what the % operator does in C. | remainder(7, 3) // 1 remainder(-7, 3) // -1 |
round(x) | numeric | Returns x after rounding it to the closest integer. | round(0.9) // 1.0 round(0.1) // 0.0 |
sin(x) | numeric | Returns sine of x radians. | sin(PI) // 0.0 |
sinh(x) | numeric | Returns hyperbolic sine of x radians. | sinh(PI) // 11.548739357 |
sqrt(x) | numeric | Returns the square root of x. | sqrt(16) // 4.0 |
tan(x) | numeric | Returns the tangent of an angle of x radians. | tan(1.0) // 1.5574077247 |
tanh(x) | numeric | Returns the hyperbolic tangent of an angle of x radians. | tanh(1) // 0.761594156 |
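The difference between modulo and remainder only shows up for negative operands. The Python sketch below reproduces the two documented examples: modulo as a Euclidean remainder that is never negative, and remainder as one that takes the sign of the dividend, as the % operator does in C. These are stand-in implementations written for comparison, not jAsm's code.

```python
# Stand-in implementations that reproduce the documented examples.
def modulo(a, b):
    """Euclidean remainder: always in the range [0, abs(b))."""
    return a % abs(b)

def remainder(a, b):
    """Remainder that takes the sign of a, like the % operator in C."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r

print(modulo(7, 3), modulo(-7, 3))        # 1 2
print(remainder(7, 3), remainder(-7, 3))  # 1 -1
```

For wrapping a table index or a screen coordinate, modulo is usually the one you want, since its result can be used directly as an offset.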
jAsm has a couple of predefined constants in the root namespace.
Constant | Value |
---|---|
E | 2.718281828459045 |
PI | 3.141592653589793 |
A print
function exists to output text when assembling. This can be useful when you want to know about locations or calculations made in the assembled code.
Note that in order to see the output you need to use at least verbose level -v2
. See Verboseness for more information.
There are two functions dedicated to formatting and printing, format
and print
. Both use the same arguments but format
returns the result and print
outputs it. Let's take format
as the example when looking at the arguments.
const WIDTH = 40
format("width: {}", WIDTH) // returns "width: 40"
The first argument is the format string that describes the output format. Each pair of curly brackets in the format string inserts the next argument to the function, as a string, where the brackets are.
const WIDTH = 40
const HEIGHT = 25
format("width: {}, height: {}", WIDTH, HEIGHT) // returns "width: 40, height: 25"
It is possible to control the alignment of the injected text using a format specifier inside the curly brackets.
const WIDTH = 40
format("width: {L4}", WIDTH) // returns "width: 40  "
format("width: {R4}", WIDTH) // returns "width:   40"
When formatting integers you can control the minimum number of digits used.
const WIDTH = 40
format("width: {D4}", WIDTH) // returns "width: 0040"
Integers can also be formatted as hexadecimal numbers.
const WIDTH = 40
format("width: {X4}", WIDTH) // returns "width: 0028"
Floating point numbers will by default be displayed as a short representation of either fixed-point or scientific notation.
format("{}", 0.001) // returns "0.001"
format("{}", 0.0000001) // returns "1e-07"
It is possible to force fixed-point to be used with a specific number of decimal digits.
format("{F4}", 0.0000001) // returns "0.0000"
format("{F4}", 1.23) // returns "1.2300"
format("{F4}", 10) // returns "10.0000"
Alignment and number formatting specifiers can be combined.
format("{R8F4}", 1.23) // returns "  1.2300"
To print an opening curly bracket, prefix it with a backslash.
format("\{{}}", 1.23) // returns "{1.23}"
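If you already know Python's format mini-language, the specifiers above have close analogues there. The mapping below is inferred from the examples in this section and is meant for orientation only. It does not describe how jAsm implements formatting, and padding details may differ.

```python
# Python analogues of the documented specifiers, for orientation only.
WIDTH = 40
left = "width: {:<4}".format(WIDTH)     # like {L4}: left aligned, 4 wide
right = "width: {:>4}".format(WIDTH)    # like {R4}: right aligned, 4 wide
digits = "width: {:04d}".format(WIDTH)  # like {D4}: at least 4 digits
hexed = "width: {:04X}".format(WIDTH)   # like {X4}: 4 hexadecimal digits
fixed = "{:.4f}".format(1.23)           # like {F4}: 4 decimal digits
combined = "{:>8.4f}".format(1.23)      # like {R8F4}
short = "{:g}".format(0.0000001)        # like {} for floating point values

for text in (left, right, digits, hexed, fixed, combined, short):
    print(repr(text))
```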
Function | Argument types | Description | Examples |
---|---|---|---|
format | string, ... | Returns a string with the additional arguments injected into the format string argument. | format("Commodore{}", 64) // "Commodore64" |
print | string, ... | Prints a string with the additional arguments injected into the format string argument. | print("Commodore{}", 64) // Commodore64 |
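Since format returns a string, its result can presumably be used wherever a string is accepted, for instance in a data definition. The following is a minimal sketch under that assumption. The constant START_SCORE and the label score_text are made up for illustration.

```jasm
const START_SCORE = 0
// {D6} pads the integer to at least six digits,
// so this is expected to store the text "SCORE 000000"
define byte[] score_text = { format("SCORE {D6}", START_SCORE) }
```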
Function | Accepted input types | Description | Examples |
---|---|---|---|
symbol(s) | string | Returns the value of the symbol whose name is stored in s. | const .a = 5; symbol(".a") // 5 |
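One possible use of symbol is to pick a constant by a computed name. The sketch below assumes that the string returned by format can be passed directly to symbol. The names .speed_0, .speed_1 and .level are hypothetical.

```jasm
const .speed_0 = 2
const .speed_1 = 5
const .level = 1
// format builds the string ".speed_1" and symbol looks that name up
define byte = symbol(format(".speed_{}", .level)) // expected to store 5
```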
jAsm supports static asserts to help improve the robustness of your programs. Use them to verify the limitations your code depends on. The following example shows a common use case.
subroutine object_offset
{
lda object_index
static_assert(OBJECT_SIZE == 8, "This code only supports object sizes of 8")
asl
asl
asl
tax
rts
}
The first argument is a boolean expression. If it evaluates to false, the assembler generates an error and prints the string in the second argument.
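Another situation where a static assert may help is guarding the size of a table that is indexed with an 8-bit register. This is only a sketch; the table color_table and its contents are invented for the example.

```jasm
define byte[] color_table = { 0, 11, 12, 15, 1 }
// an 8-bit index register can reach at most 256 entries
static_assert(sizeof(color_table) <= 256, "color_table is too large to index with x")
```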
To output anything, a jAsm source file needs to contain a code section. Here is a simple example program that changes the border color on a Commodore 64 and returns.
section code, "main", $8000
{
inc $d020
rts
}
A section has a unique name, a start address and an optional end address. The name is used to name the output files when using the command line option to write one file per section. The filenames will consist of the output name specified on the command line, concatenated with an underscore and the section name. This way, each filename will be unique.
An end address can be specified after the start address. This will enforce that the code within the section actually fits within it. If it overflows, the assembler exits with an error.
section code, "main", $8000, $9000
{
inc $d020
rts
}
Sometimes you want the code to end at a position rather than start at a position. You can do this by setting the section start to the end address minus the length of the section data. The following example shows how it can be done while still enforcing that the code fits between two memory addresses.
const .section_start = $8000
const .section_end = $9000
const .code_size = code_end - code_start
static_assert(.section_end - .code_size >= .section_start, "section overflow")
section code, "main", .section_end - .code_size, .section_end
{
code_start:
// code here
// ...
code_end:
}
Sections can be placed within sections. This is useful in two cases: 1) storing relocated code and 2) reporting the size and placement of code.
In the following example the inner section is stored within the outer section, which starts at $8000, but is assembled as if it were located at address $9000. Copying the code to $9000 at runtime therefore makes it run correctly. This creates one single tight code section, even if jAsm is configured to output one file per section, since that option only affects the outermost sections.
section code, "main", $8000
{
// move the code to the proper location
ldx #end - start - 1
{
lda start,x
sta target,x
dex
bpl @loop
}
jmp target
start:
section code, "reloc", $9000
{
target:
inc $d020
jmp target
}
end:
}
If jAsm is started with the -v2 flag, the output will print the sections like this.
$8000 - $8014 ($0014) code: main
$9000 - $9006 ($0006) code: reloc
The following example measures the size of a piece of code.
const address CHROUT = $ffd2
section code, "main", $8000
{
ldx #0
{
lda str,x
jsr CHROUT
inx
cpx #sizeof(str)
bne @loop
}
rts
// measure the size of string data
section code, "string", *
{
define byte[] str = {
"LONG STRING DATA STORED HERE... ",
"NO ONE KNOWS WHERE IT ENDS..."
}
}
}
The asterisk represents the current program counter value and this relocates the section to the address it is already at, thus it only affects the assembler information output. If jAsm is started with the -v2 flag, the output will print the sections like this.
$8000 - $804a ($004a) code: main
$800e - $804a ($003c) code: string
Bss sections are used to reserve memory for variables in your assembler program. This section type doesn't output anything, it just keeps track of a program counter to measure the size of reserved space. It isn't possible to place instructions or other data generating statements in a bss section. Reservation of space is done with the reserve statement.
section bss, "variables", $9000
{
reserve byte num_lives
reserve byte num_boosts
}
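The reserved names act as labels, so code in another section can refer to them. The following sketch combines the bss section above with a small code section; the start value 3 is arbitrary.

```jasm
section bss, "variables", $9000
{
    reserve byte num_lives
}
section code, "main", $8000
{
    lda #3
    sta num_lives // num_lives is the first reservation, so it sits at $9000
    rts
}
```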
It is possible to add to an existing section later in the source code using section parts.
section code, "main", $8000
{
nop
}
section part, "main"
{
rts // some more code
}
A section part refers to a previously defined section by name and adds its contents to that section. This can be used to create single file modules with code and variable reservations for specific systems. A main file can create empty sections for zero page variables, code and variables, and then include a number of modules. The modules add to these sections to form a complete program.
It is possible to name a module's sections using generic names like "code", "variables" and "zero page" and still have the power to map these to more specific section names in a main program.
Let's say that the main program defines two locations for variable storage.
section bss, "low variables", $1000, $1100
{
}
section bss, "high variables", $2000, $2200
{
}
A generic module can reserve variable storage like this.
section part, "variables"
{
reserve byte lives
}
The main program can then include the generic module inside a section remap like this to get the module's variables stored in the low variable section.
section mapping
"variables" = "low variables"
{
include "module.jasm"
}
Several mappings can be specified like this.
section mapping
"zero page" = "zp",
"variables" = "low variables",
"main" = "code page 1"
{
include "module.jasm"
}
Sections can be used to build large cartridge images with banks. Do that by creating an outer section for all the banks and one inner section per bank.
section code, "main", 0 // start address will not be used
{
section code, "image_1", $e000, $10000
{
// code here
align $2000, $ff
}
section code, "image_2", $e000, $10000
{
// code here
align $2000, $ff
}
// more sections
}
The align keyword is used here to fill up the rest of each bank up to where the next one begins.
An alternative is to use the bank mode and place the sections in their own 64 kB address space.
section code, "main", 0 // start address will not be used
{
section code, "image_1", $0e000, $10000
{
// code here
align $2000, $ff
}
section code, "image_2", $1e000, $20000
{
// code here
align $2000, $ff
}
// more sections
}
That way, you can create macros that generate special code when jumping between banks, based on the program counter and the jump target address.
Sections can also be used when building a game that streams code from disk at runtime. Each streaming code sector gets its own section and the command line option --output-multiple-files is used to output one file per section. If the same code files are used in several streaming code sectors, you use namespaces to keep them apart.
const address PROGRAM_START = $1000
section code, "main", PROGRAM_START
{
// code here
}
section bss, "streaming_buffer", *
{
// Reserve space for the streaming buffer. The size corresponds to the largest of the sectors.
reserve byte[max(sector_1::end - sector_1::start, sector_2::end - sector_2::start)] buffer
}
section code, "sector_1", buffer
{
namespace sector_1
{
start:
// code here
end:
}
}
section code, "sector_2", buffer
{
namespace sector_2
{
start:
// code here
end:
}
}
Code blocks can be selected or rejected with the if statement.
if (USE_FEATURE)
{
jsr feature_update
}
The parentheses must contain a boolean expression that determines whether the code will be used or not. Two different code blocks can be selected in a mutually exclusive fashion, using the if-else statement.
if (USE_FEATURE)
{
jsr feature_update
}
else
{
jsr featureless_update
}
You can choose to use or reject large blocks of code, even entire sections if needed.
Sometimes you need to select between more than two options. This is what the if-elif-else statement does.
if (USE_FEATURE_1)
{
jsr feature1_update
}
elif (USE_FEATURE_2)
{
jsr feature2_update
}
else
{
jsr feature3_update
}
Since if is a statement, it can't be used inside expressions or data definitions. You can use the function select to do that. It takes a boolean as its first argument and if that evaluates to true, the second argument is returned, otherwise the third. This is much like the ternary conditional operator in C.
define byte size = select(USE_FEATURE_1, feature1_size, feature2_size)
Note that the function evaluates its arguments lazily, so the argument that isn't returned will not be evaluated. The following code assembles even though .invalid is never defined since that branch is not evaluated.
const .valid = 5
const .selected = select(true, .valid, .invalid)
Large programs may need to be separated into several files. You can include other source files in a source file using the include statement.
include "some_dir/some_file.jasm"
This will act as if all the text in some_file.jasm was pasted over the include statement. Files are searched for in the current directory first, and then in all additional include directories specified by command line options.
Data like pictures, sprites and character sets can be included in a code section, to be accessible from code. Use the incbin statement for that.
incbin "some_dir/some_file.bin"
The assembler will look in the current directory first and then all additional include directories specified by command line options.
You can add an optional byte offset into the file where to start reading.
incbin "some_dir/some_file.bin", 2
This will skip the first two bytes of the file. It is also possible to set a max size to read.
incbin "some_dir/some_file.bin", 2, 4
This will read at most 4 bytes from offset 2 in the specified file.
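A typical use of the offset argument on the Commodore 64 is to skip the two byte load address that starts a .prg file. The sketch below assumes such a file exists; the filename sprites.prg and the label sprite_data are hypothetical.

```jasm
section code, "main", $8000
{
sprite_data:
    // skip the two byte load address and include the rest of the file
    incbin "sprites.prg", 2
}
```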
Sometimes you need to communicate with other tools when building a large project with several compile steps. jAsm supports exporting JSON data using the json_write function. One dictionary variable (with all contents within) can be exported, as long as it only contains data types matching the JSON specification. Those are dictionaries, lists, strings, numbers and booleans. The first argument is the filename, the second is the data to export and the third states whether to make the output human readable. A non-human readable export is a more compact, single line of data.
const data = dict("lives" = 3, "rooms" = list(1, 4, 8))
const humanreadable = true
json_write("some_dir/some_file.json", data, humanreadable)
Data in JSON format can also be imported into a variable using the json_read function. The entire file will be imported as dictionary data.
const imported_data = json_read("some_dir/some_file.json")
This can be used to get compile settings into the build or data from a previous compile into the next.
You can define data to be included in a code section using the define statement.
define byte max_lives = 3
This adds a single byte with the value 3 and creates a label max_lives pointing to it. All storage types can be used in the define statement. You can also create arrays of data.
define word[] pointers = {
ptr1, ptr2, ptr3
}
You always need to provide curly braces when defining arrays. This is also true if you are defining strings.
define byte[] str = { "HELLO" }
It is possible to specify the size of the array, to verify that the number of elements match.
define word[NUM_POINTERS] pointers = {
ptr1, ptr2, ptr3
}
If NUM_POINTERS doesn't match the number of pointers defined, the assembler reports an error.
pointers acts like a label but you can also index into the array using the array operator [].
lda pointers[1]
sta low_byte
lda pointers[1] + 1
sta high_byte
In the code example above, array indices start at zero, so the second pointer is fetched and stored.
Another way to index is to use the offsetof function. It returns the offset in bytes from the beginning of the array.
lda pointers + offsetof(pointers[1])
sta low_byte
lda pointers + offsetof(pointers[1]) + 1
sta high_byte
This isn't more convenient in this case but there are cases when determining the offset is useful.
Another handy function operating on defined data is sizeof. It returns the size in bytes of the consumed space.
ldx #sizeof(pointers) // 6
Function | Accepted input types | Description | Examples |
---|---|---|---|
offsetof(x) | offset type | Returns the offset in bytes from the beginning of defined or reserved data to x. | define byte[] ints = { 1, 2, 3 } lda #offsetof(ints[2]) |
sizeof(x) | offset type | Returns the size in bytes of x. | define byte[] ints = { 1, 2, 3 } lda #sizeof(ints) |
It is also possible to define data without specifying a name.
define byte = 3
define byte[] = { 1, 2, 3 }
The define statement can also be used to fill a larger memory block with values without specifying each value if they follow a pattern. This will generate 100 bytes of zeroes.
define byte[100] = { 0, ... }
This can also fill using a more complex pattern like this:
define byte[100] = { "HELLO WORLD!", ... }
Multidimensional arrays are also allowed. The following example defines an array of arrays of words.
define word[][] test = {
{0, 1},
{2, 3},
{4, 5}
}
The address of the value 2 is test[1][0] because the first array indexing operator operates on the outermost array.
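As a sketch of how this indexing could be used from code, the following reads both bytes of the value 2, in the same way as the earlier pointers example. The labels low_byte and high_byte are assumed to be reserved elsewhere.

```jasm
define word[][] test = {
    {0, 1},
    {2, 3},
    {4, 5}
}

lda test[1][0]     // low byte of the value 2
sta low_byte
lda test[1][0] + 1 // high byte of the value 2
sta high_byte
```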
In bss sections you can allocate space for variables in your program. You use the reserve statement to do that.
reserve byte lives
This will reserve one byte for lives and create a label to the memory address.
You can reserve an array of a storage data type as well.
reserve long[16] coordinates
It is also possible to reserve space without specifying a name. In this case you will need to provide a semicolon to signal that there will be no name following the type.
reserve byte;
reserve byte[3];
The sizeof and offsetof functions can also be used on reserved memory labels, just like for defined data.
Multidimensional array space can be reserved as well.
reserve byte[40][25] screen
Using this you can create a handy way to address screen coordinates.
lda #'1'
sta screen[y][x]
To create a subroutine you really only need to place a label somewhere and jump to it. jAsm allows you to express it a bit more explicitly using the subroutine keyword.
// -> a: the value to multiply
// <- a: the result
// <> x: preserved
// <> y: preserved
subroutine multiply_by_8
{
asl
asl
asl
rts
}
A subroutine also has the property that it can be called like a macro without arguments. The two following lines are equivalent.
jsr multiply_by_8
multiply_by_8()
The macro style call has the advantage that a module can change a subroutine into a macro to inline the code, or the other way around, without changing the calling code.
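To illustrate, this is roughly what the inlined variant might look like if the module author replaces the subroutine with a macro of the same name. The calling line stays the same; only the generated code changes.

```jasm
// same name as before, but now a macro without arguments and without rts
macro multiply_by_8()
{
    asl
    asl
    asl
}

lda #5
multiply_by_8() // expands to three asl instructions instead of a jsr
```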
jAsm supports enumerated constants. It is a simplified way of assigning a series of numbers without specifying each number. It makes it easier to insert a value in the middle without re-enumerating all following numbers.
enum pause_menu
{
continue,
options,
exit
}
In this example, continue will contain 0, options 1 and exit 2. You access the values using the pause_menu enum like this.
ldx #pause_menu.continue
jsr draw_menu_option
The first enum value is by default 0, but any of the values can be explicitly specified like this.
enum pause_menu
{
continue = 1, // 1
options, // 2
exit = 10 // 10
}
Enum values can be specified relative to other values as well.
enum device
{
joy1,
joy2,
paddle1 = device.joy1,
paddle2 = device.joy1,
paddle3 = device.joy2,
paddle4 = device.joy2
}
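Enum values are plain numbers, so they can be used wherever a constant is expected. The following sketch names the first three Commodore 64 colors and uses one of them to set the border color. The enum name color is made up for the example.

```jasm
enum color
{
    black, // 0
    white, // 1
    red    // 2
}

lda #color.red
sta $d020 // border color register on the Commodore 64
```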
Sometimes it is necessary to write a lot of repetitive code. jAsm supports loops for this purpose.
The for loop is a general form of loop that can be used in a large variety of situations. It is very similar to the C for-loop but with a tighter set of options.
for(var .i = 0; .i < 5; ++.i)
{
nop
}
This creates five nop instructions. The for loop starts with an optional variable declaration, followed by a required ending condition expression and ends with an optional variable modification expression. The loop takes another pass as long as the ending condition expression evaluates to true.
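Loops are also handy for generating tables. This sketch builds a table with the start address of each of the 25 text rows on a Commodore 64 screen, assuming the default screen location $0400 and 40 characters per row.

```jasm
for(var .row = 0; .row < 25; ++.row)
{
    // one word per screen row: $0400, $0428, $0450 and so on
    define word = $0400 + 40 * .row
}
```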
A more specific form of for loop exists which conveniently iterates over lists.
const .bits = list(0, 1, 2, 4, 5, 7)
for(var .b in .bits)
{
define byte = 1 << .b
}
This creates six mask bytes according to the bits in the list. Inside the loop the special variable @i is set to the zero-based index of the value in the list.
const .names = list("PICTURE", "GAME", "LEVEL1", "LEVEL2")
for(var .name in .names)
{
define byte[] = { string(@i), " ", .name, 0 }
}
This will generate data for filenames where each name begins with a number and a space character before the descriptive name.
When the loop starts, a copy of the list will be made. The iteration is done over the copy to avoid problems where the list is accidentally modified inside the loop.
Strings can also be iterated over with this form of loop.
const .message = "SECRET"
for(var .char in .message)
{
define byte[] = { .char ^ $ff }
}
This will store an encoded string where each character has all its bits flipped.
This type of loop can be used on dicts as well like this.
const .colors = dict("black" = 0, "white" = 1, "red" = 2)
for(var .name, .color in .colors)
{
define byte[] = {.name, 0}
define byte = .color
}
Note that the iteration over the dict keys and values will not be done in any particular order.
The repeat loop is a simplified version of the for loop. It can only repeat itself a fixed number of times and doesn't use a complex exit condition expression. It generates an automatic local label @i as a zero-based loop iteration counter.
repeat 5
{
define byte = @i
}
This defines numbers 0, 1, 2, 3 and 4 in memory.
The break statement can be used in any form of loop to exit it prematurely.
repeat 5
{
if (@i == 3)
{
break
}
define byte = @i
}
This defines numbers 0, 1 and 2 in memory since the loop is exited before the end condition is reached.
Macros are a way to generate adaptable and reusable code blocks. A macro is a function type object which generates its contents where it is invoked. This is much like an inline function or a template function in C++.
macro memset(.addr, .size)
{
ldx #.size - 1
{
sta .addr,x
dex
bpl @loop
}
}
This is a simple macro to generate a loop to clear a block of memory. The arguments are put into the local constants .addr and .size when the macro is invoked.
lda #0
memset(data, sizeof(data))
// ...
reserve byte[55] data
A powerful feature is that the macro can change its behavior based on its arguments. What if the size to clear is 2? A loop wouldn't be very efficient in that case. The macro can be changed to solve this more efficiently.
macro memset(.addr, .size)
{
if (.size < 4)
{
repeat .size
{
sta .addr + @i
}
}
elif (.size < 129)
{
ldx #.size - 1
{
sta .addr,x
dex
bpl @loop
}
}
else
{
static_assert(false, "memset doesn't support larger sizes... yet.")
}
}
The loop is unrolled for sizes less than 4. For sizes up to 128 a loop is constructed, and if the size is larger than that, the assert triggers. This can be extended to support all sizes optimally and then you will never again need to write a memory clear loop!
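The two invocations below sketch how the same call syntax leads to different generated code depending on the size argument. The labels ptr1 and buffer are hypothetical and are assumed to be reserved in a bss section.

```jasm
lda #0
memset(ptr1, 2)     // size < 4: unrolled into sta ptr1 and sta ptr1 + 1
memset(buffer, 100) // size < 129: generates the ldx/sta/dex/bpl loop
// ...
reserve byte[2] ptr1
reserve byte[100] buffer
```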
Macros can also be locally defined.
{
macro .write2(.addr)
{
sta .addr
sta .addr + 1
}
lda #0
.write2(ptr1)
.write2(ptr2)
.write2(ptr3)
}
Macros are first class objects and therefore they can be stored as constants or variables. They can also be sent as arguments to functions or macros. This enables code injection in macros, using other macros as arguments.
macro print_char()
{
jsr CHROUT
}
macro print_text(.text, .printer)
{
ldx #0
{
lda .text,x
beq @continue
.printer()
inx
bne @loop
}
}
print_text(text1, print_char)
print_text(text2, print_char)
rts
define byte[] text1 = { "wow", 0 }
define byte[] text2 = { "cool", 0 }
Now this print_text macro is generic enough to be reused even with other types of output devices.
Macros can return values if desired. That makes it possible for macros to also act as pure functions if no instructions are generated within them. Values are returned with the return statement, which takes an optional expression to return as argument. This macro calculates the screen address based on a screen base address and screen coordinates.
macro screen_pos(.start, .x, .y)
{
return .start + .x + 40*.y
}
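A call to such a macro can then be used like any other expression. The sketch below repeats the macro and stores its result in a local constant, assuming the Commodore 64 default screen at $0400; the coordinates are arbitrary.

```jasm
macro screen_pos(.start, .x, .y)
{
    return .start + .x + 40*.y
}

const .pos = screen_pos($0400, 10, 5) // $0400 + 10 + 40*5 = $04d2
lda #'*'
sta .pos
```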
Macros can be called recursively. This example calculates the Fibonacci sequence using a recursive macro.
macro fibonacci(.value)
{
if (.value == 0) {
return 0
}
if (.value == 1) {
return 1
}
return fibonacci(.value - 1) + fibonacci(.value - 2)
}
repeat 10
{
define byte = fibonacci(@i)
}
A macro can be ended early without returning any value like this.
macro send_large(.size)
{
if (.size < 4) {
return;
}
jsr send_it
}
Note that the return statement must end with a semicolon if no value is to be returned, otherwise an expression is expected.
Sometimes code or data needs to be aligned to avoid extra cycles spent on traversing memory block boundaries. This is done with the align statement.
align 256
This will align the program counter so that it ends up where the address modulo 256 is 0. In code sections, the alignment pads with zeros by default. If you need to pad with something else in code sections you can supply an additional fill byte argument.
align 256, 55
This will fill up the gap to next page boundary with the number 55.
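As a sketch of a common use, the following places a 256 byte table on a page boundary so that indexed reads never cross a page. The label table and the section address are made up for the example.

```jasm
section code, "main", $8000
{
    ldx #0
    lda table,x
    rts

    // the code above is 6 bytes, so this pads with zeros from $8006 up to $8100
    align 256
    define byte[256] table = { 0, ... }
}
```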
Sometimes it is necessary to ensure that a block of code or data is within a 256 byte page to avoid extra cycles being spent on indexing or branch instructions. The following macro can be used to verify that a memory block is within alignment.
// Check that the code/data between .start and .end won't cross an alignment border.
macro assert_within_alignment(.start, .end, .alignment)
{
static_assert(.end - .start >= 0, "Content wraps around")
static_assert(.end - .start <= .alignment, "Content too large for alignment")
if (.end - .start > 0) {
static_assert(.start/.alignment == (.end - 1)/.alignment, "Content crosses alignment")
}
}
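The macro could be invoked like this, with one label before and one after the data to check. The labels table_start and table_end and the table contents are hypothetical.

```jasm
table_start:
define byte[] = { 1, 2, 4, 8, 16, 32, 64, 128 }
table_end:

// fails to assemble if the eight bytes straddle a 256 byte page border
assert_within_alignment(table_start, table_end, 256)
```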
When a code block needs to be within a 256 byte page to work, this macro can be used to automatically add dummy data at the macro location until the code block (that needs to be placed later in the same section) is within the alignment boundaries.
// Makes sure the area between .start and .end is within an alignment block.
// It adds bytes of data at the macro position to make sure the area isn't
// crossing any alignment boundaries.
macro align_within_page(.start, .end, .alignment, .fill_byte)
{
const .size = .end - .start
static_assert(.relative_position <= .start, "Macro must be placed earlier in memory")
static_assert(.size >= 0, "Content wraps around")
static_assert(.size <= .alignment, "Content too large for alignment")
if (.size > 0) {
// This is quite tricky because the code cannot make calculations based
// on the aligned position since it will cause variable oscillation. Instead
// it is using the distance to the area to calculate the unaligned position
// and base the alignment size on that.
const .distance = .start - .relative_position
const .start_before = * + .distance
const .end_before = .start_before + .size
const .is_aligned = .start_before/.alignment == (.end_before - 1)/.alignment
if (!.is_aligned) {
repeat .alignment - modulo(.start_before, .alignment) {
define byte = .fill_byte
}
}
}
.relative_position:
}