About jAsm


This is the documentation for the 6502, 65C02, 65CE02, 45GS02 and Z80 assembler jAsm. It was written in 2015 by me, Jonas Hultén. I had used DAsm before, and over the years it had started to feel outdated. I missed namespaces and I was annoyed by some bugs, like its inability to use local variables as macro arguments. It also felt a bit slow on my slow laptop computer, although that wasn't a strong argument. Finally, I hadn't written a language before and it felt like an interesting challenge.

jAsm was written with two main goals: it should be fast enough to assemble a large program in under a second, and it should support everything DAsm can do and more. To reach the required speed it tries to use memory as linearly as possible, and in the end it proved to be faster than DAsm.

The assembler was made for Commodore 64 programming and some features were specifically made to help programming for that computer. However, it shouldn't stop anyone from making programs for other computers with it.

jAsm looks a lot like C. It wasn't meant to do that but over the course of development it moved closer and closer because it was easier to solve parsing problems that way.

It took 7 months to complete this first version of the assembler. It is still a bit rough around the edges but has some power under the hood. Let's start!

About This Document

This documentation covers the language and syntax provided by the assembler but not any details about specific supported processors. It was written when only the 6502 was supported, so the document is heavily geared towards that processor.


Processor Support

6502

jAsm supports all regular instructions of the 6502. Instructions are written in lower case.

lda #0
sta $d020

The brk instruction takes an optional immediate argument, since rti actually returns to the address after that argument (this applies to the 65C02, 65CE02 and 45GS02 as well).

brk // valid but rti won't return directly after this instruction
brk #0 // optional argument makes rti return to next instruction
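The reason is that the 6502 pushes the address of the brk opcode plus 2 when it executes brk. Here is a tiny Python model of just that return address, to show which byte gets skipped. The function names are hypothetical, and the model assumes that brk assembles to one byte and brk #0 to two.

```python
# Minimal model of why brk behaves like a two-byte instruction on the 6502.
# BRK pushes (address of the BRK opcode + 2); RTI later resumes there.


def rti_resume_address(brk_address):
    """Address where rti resumes after a brk at brk_address (16-bit wrap)."""
    return (brk_address + 2) & 0xFFFF


def next_instruction_address(brk_address, has_argument):
    """Address of the instruction written directly after the brk."""
    # Assumption: `brk` is one byte, `brk #0` is opcode + argument byte.
    return brk_address + (2 if has_argument else 1)


if __name__ == "__main__":
    for has_arg in (False, True):
        nxt = next_instruction_address(0x1000, has_arg)
        resume = rti_resume_address(0x1000)
        verdict = "ok" if nxt == resume else "skips a byte"
        print(f"argument={has_arg}: next=${nxt:04x} resume=${resume:04x} {verdict}")
```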

Since a lot of existing source code uses upper case instruction keywords, a Python script is provided to convert them to lower case in all .asm files in a directory. Run it like this.

python3 tools/convert_6502_keyword_case.py <my_source_directory>

65C02

jAsm supports all regular instructions of the Western Design Center 65C02. Instructions are written in lower case.

stz $d020
bra loop

The bit operation instructions don't have the bit in the instruction name as some assemblers do. Instead it is a separate argument. To follow convention, there is no '#' before the bit number to indicate immediate mode, even if that would be more consistent.

bbr 0, zp, label
bbs 1, zp, label
rmb 2, zp
smb 3, zp
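Even though the bit number is written as a separate argument, it still ends up in the opcode: each of these instructions has eight opcodes spaced $10 apart. This Python sketch computes them from the standard 65C02 opcode bases ($0f for bbr, $8f for bbs, $07 for rmb, $87 for smb). The helper name is hypothetical.

```python
# 65C02 bit instructions: the bit number selects one of eight opcodes
# spaced $10 apart, starting from these bases.
BIT_OPCODE_BASE = {"bbr": 0x0F, "bbs": 0x8F, "rmb": 0x07, "smb": 0x87}


def bit_opcode(mnemonic, bit):
    """Opcode for e.g. `bbr 0, zp, label` or `smb 3, zp` (hypothetical helper)."""
    if not 0 <= bit <= 7:
        raise ValueError(f"bit number must be 0-7, got {bit}")
    return BIT_OPCODE_BASE[mnemonic] + bit * 0x10


if __name__ == "__main__":
    for mnemonic, bit in (("bbr", 0), ("bbs", 1), ("rmb", 2), ("smb", 3)):
        print(f"{mnemonic} {bit}, ... -> opcode ${bit_opcode(mnemonic, bit):02x}")
```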

65CE02

jAsm supports all regular instructions of the Commodore Semiconductor Group 65CE02. Instructions are written in lower case.

ldz $d020
bru loop

Just like 65C02, the bit operation instructions don't have the bit in the instruction name as some assemblers do. Instead it is a separate argument. To follow convention, there is no '#' before the bit number to indicate immediate mode, even if that would be more consistent.

bbr 0, zp, label
bbs 1, zp, label
rmb 2, zp
smb 3, zp

The aug instruction isn't available since that's intended to extend the processor with more instructions in the future.

The stack pointer relative access addressing mode is written like this.

lda ($55,sp),y

45GS02

jAsm has experimental support for the new Mega65 instructions of the 45GS02, along with the instructions of CSG4510.

Instructions are written in lower case.

ldz $d020
bru loop

Just like 65CE02, the bit operation instructions don't have the bit in the instruction name as some assemblers do. Instead it is a separate argument. To follow convention, there is no '#' before the bit number to indicate immediate mode, even if that would be more consistent.

bbr 0, zp, label
bbs 1, zp, label
rmb 2, zp
smb 3, zp

The stack pointer relative access addressing mode is written like this.

lda ($55,sp),y

The indirect quad addressing mode is written using brackets.

lda [$55],z

Z80

jAsm supports all regular instructions of the Z80. Instructions are written in lower case.

ld a, 0
ld (hl), a

There's also a script to convert Z80 uppercase keywords to lowercase. Run that like this.

python3 tools/convert_z80_keyword_case.py <my_source_directory>

Starter Guide


We'll start by creating a small program in a text file.

processor "6502"

section code, "main", $8000
{
    inc $d020
    rts
}

Save this to a file named main.jasm. Use UTF-8 encoding, because that is what jAsm expects. 7-bit ASCII also works, since it is a subset of UTF-8. Now we'll assemble it into a binary. Open a command line window and change the current directory to where the main.jasm file is. Type this on the command line.

jasm -hla main.jasm main.prg

Now you have a program that changes the border color on a Commodore 64. Load it into an emulator or onto a real machine.

LOAD"MAIN.PRG",8,1

Now start it.

SYS32768

The border color changes.

Basic Start

If you want to start it on a Commodore 64 with a BASIC line, you need to add the necessary data to produce a SYS line at the BASIC start. This is specific to the Commodore BASIC v2. This example shows how to do that in jAsm.

processor "6502"

section code, "main", $0801
{
    define word = .next_basic_line // next BASIC line
    define word = 2016 // line number
    define byte = $9e // SYS token
    define byte[] = { string(.start) }
    define byte = 0 // end of line
.next_basic_line:
    define word = 0 // zero next BASIC line to mark end of program

.start:
    inc $d020
    rts
}

Anything written after // is a comment and is completely ignored by the assembler.

.next_basic_line and .start are labels that represent the addresses in memory where they are placed. The dot before the name means it is local to the space between the closest surrounding curly braces. define places variable data into the program. A word is two bytes long. The SYS token is written in hexadecimal form, which is what the dollar sign indicates.

string(.start) means "call the built in function string with the argument .start". The function will return a string representation of .start.
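To see what the stub actually puts in memory, here is a Python model of the same bytes. It is only a sketch of the layout, not how jAsm works internally. Since the address of .start depends on how many digits it has, the sketch simply loops until the digit count is stable.

```python
# Model of the bytes the BASIC stub above produces (Commodore BASIC v2).


def basic_sys_stub(load_address, line_number):
    """Return (stub_bytes, start_address) for a single `<line> SYS <start>` line."""
    # The start address depends on how many digits it has, so iterate until
    # the digit count is stable.
    digits = 1
    while True:
        # word next line, word line number, SYS token, digits, 0 terminator
        next_basic_line = load_address + 2 + 2 + 1 + digits + 1
        start = next_basic_line + 2  # skip the zero "end of program" word
        if len(str(start)) == digits:
            break
        digits = len(str(start))

    stub = bytearray()
    stub += next_basic_line.to_bytes(2, "little")  # define word = .next_basic_line
    stub += line_number.to_bytes(2, "little")      # define word = 2016
    stub.append(0x9E)                              # define byte = $9e
    stub += str(start).encode("ascii")             # define byte[] = { string(.start) }
    stub.append(0)                                 # define byte = 0
    stub += bytes(2)                               # define word = 0
    return bytes(stub), start


if __name__ == "__main__":
    stub, start = basic_sys_stub(0x0801, 2016)
    print(stub.hex(" "), f"start=${start:04x}")
```

With a load address of $0801 this places .start at $080d (SYS 2061), the same address the VICE listing later in this guide shows for the program start.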

Basic Macro

This BASIC line thing will be used a lot in programs since almost all programs loaded from disk will need it. Let's break out this code into a handy macro that we can reuse. The macro will need two arguments, one is the line number and one is the address to start the program from.

processor "6502"

macro basic_sys_line(.line_number, .sys_address)
{
    define word = .next_basic_line // next BASIC line
    define word = .line_number
    define byte = $9e // SYS token
    define byte[] = { string(.sys_address) }
    define byte = 0 // end of line
.next_basic_line:
    define word = 0 // zero next BASIC line to mark end of program
}

section code, "main", $0801
{
    basic_sys_line(2016, .start)

.start:
    inc $d020
    rts
}

The start of the main section invokes the macro and this inserts the code in the macro at the place of invocation.

Using Files

The main section of our example looks a lot cleaner now. We can now move the macro to its own file. We can build a small library of handy macros to help us be productive and avoid solving the same problem several times.

Move the macro code into a file called macros.jasm in the same directory as main.jasm. We can now include the macros in main.jasm.

processor "6502"

include "macros.jasm"

section code, "main", $0801
{
    basic_sys_line(2016, .start)

.start:
    inc $d020
    rts
}

Defining Constants

The border color address isn't exactly self-explanatory, and the BASIC start address is another naked constant that doesn't explain itself. Let's make this a bit better.

processor "6502"

include "macros.jasm"

const BASIC_START = $0801
const BORDER_COLOR = $d020

section code, "main", BASIC_START
{
    basic_sys_line(2016, .start)

.start:
    inc BORDER_COLOR
    rts
}

I use uppercase characters for fixed address constants (basically any naked constant) to make it easy to identify them. BASIC_START and BORDER_COLOR can now be used instead of the naked constants. Let's move the constants out into their own file as well. Call this c64.jasm since they describe constants specific to Commodore 64. We'll include this as well in the program.

processor "6502"

include "macros.jasm"
include "c64.jasm"

section code, "main", BASIC_START
{
    basic_sys_line(2016, .start)

.start:
    inc BORDER_COLOR
    rts
}

Conditional Assembly

Now, what if we wanted to port this to VIC20? We would only need to create a vic20.jasm file with different BASIC_START and BORDER_COLOR addresses and then include that instead of the c64.jasm file. We can also support both at the same time. Let's put this in the vic20.jasm file.

const BASIC_START = $1001
const BORDER_COLOR = $900f // this address controls both background and border colors

Now, what we need is a way to include either the c64.jasm or vic20.jasm file based on an option somewhere. Let's add the selection first.

processor "6502"

include "macros.jasm"
if (C64_BUILD) {
    include "c64.jasm"
} else {
    include "vic20.jasm"
}

section code, "main", BASIC_START
{
    basic_sys_line(2016, .start)

.start:
    inc BORDER_COLOR
    rts
}

Command Line Constants

The if statement wants a boolean expression within the parentheses and if true the first block of code is used, otherwise the second block is used. We can feed constants from the command line to solve this. The command line option is -d and it needs to be followed by an assignment. In this case we want to assign C64_BUILD to true or false.

jasm -d C64_BUILD=true main.jasm main.prg
jasm -d C64_BUILD=false main.jasm main.prg

Defining Data

Let's try a hello world example. We'll drop the VIC20 support to make the code shorter. We will define the string "hello world!" and print it, character by character. We have already seen how to define a string in memory in the BASIC line. Printing is done by calling the KERNAL routine at $ffd2 with jsr, which prints the single character in the accumulator. Let's add the following naked constant to the c64.jasm file.

const CHROUT = $ffd2

Now we'll add the loop to print the text.

processor "6502"

include "macros.jasm"
include "c64.jasm"

section code, "main", BASIC_START
{
    basic_sys_line(2016, .start)

.start:
    ldx #0
.loop:
    lda hello_world_text,x
    jsr CHROUT
    inx
    cpx #sizeof(hello_world_text)
    bne .loop
    rts

    define byte[] hello_world_text = { "HELLO WORLD!" }
}

The define now has a name before the equal sign. This becomes a special kind of label. It can be used as a normal label but it also contains information about the defined data. sizeof is a function that returns the size in bytes of such a labeled object or array.

Coding For Readability

This works but is hard to read. It isn't obvious where the loop starts and ends unless we read the instructions. Let's improve it using indentation.

processor "6502"

include "macros.jasm"
include "c64.jasm"

section code, "main", BASIC_START
{
    basic_sys_line(2016, .start)

.start:
    ldx #0
.loop:
        lda hello_world_text,x
        jsr CHROUT
        inx
        cpx #sizeof(hello_world_text)
    bne .loop
    rts

    define byte[] hello_world_text = { "HELLO WORLD!" }
}

Automatic Labels

This is better but can be improved further. jAsm supports an automatic @loop label at the beginning of a scope defined by curly braces.

processor "6502"

include "macros.jasm"
include "c64.jasm"

section code, "main", BASIC_START
{
    basic_sys_line(2016, .start)

.start:
    ldx #0
    {
        lda hello_world_text,x
        jsr CHROUT
        inx
        cpx #sizeof(hello_world_text)
        bne @loop
    }
    rts

    define byte[] hello_world_text = { "HELLO WORLD!" }
}

It's now much easier to read the loop and we got rid of the explicitly defined label .loop.

Subroutines

If we want to print more text we need to move the loop into a subroutine which can be called with a jsr instruction and some parameters in registers.

processor "6502"

include "macros.jasm"
include "c64.jasm"

section code, "main", BASIC_START
{
    basic_sys_line(2016, .start)

.start:
    ldx #>hello_world_text // high byte from address
    lda #<hello_world_text // low byte from address
    ldy #sizeof(hello_world_text)
    jsr print_text
    rts

    define byte[] hello_world_text = { "HELLO WORLD!" }


    // -> xa: address to text
    // -> y: size of text
    subroutine print_text
    {
        // self modifying code
        sta .addr
        stx .addr + 1
        sty .size

        ldx #0
        {
            const .addr = * + 1
            lda $ffff,x // just a dummy address, it will be overwritten
            jsr CHROUT
            inx
            const .size = * + 1
            cpx #0 // just a dummy value, it will be overwritten
            bne @loop
        }
        rts
    }
}

* in the subroutine represents the current program counter. * + 1 points one byte into the next instruction, which is where the instruction argument is. All is well, except that it doesn't assemble!

main.jasm(25,7) : Error 3004 : Reference to undefined symbol .addr
main.jasm(26,7) : Error 3004 : Reference to undefined symbol .addr
main.jasm(26,13) : Error 3000 : Operator + is not defined for left hand side unknown type.
main.jasm(27,7) : Error 3004 : Reference to undefined symbol .size

Declaring Symbols

There is something wrong with .addr and .size. The reason is that local constants are not accessible outside the scope they are defined in. Local constants are always accessible inside the scope they are defined in, even in inner scopes. The scope is defined by the closest enclosing curly braces. So .addr and .size are accessible inside the loop but not outside it.

To solve this we can declare the symbol names in the subroutine scope but define the constants inside the loop. This is the working subroutine.

processor "6502"

// -> xa: address to text
// -> y: size of text
subroutine print_text
{
    // declaring constants
    declare .addr
    declare .size

    // self modifying code
    sta .addr
    stx .addr + 1
    sty .size

    ldx #0
    {
        const .addr = * + 1
        lda $ffff,x // just a dummy address, it will be overwritten
        jsr CHROUT
        inx
        const .size = * + 1
        cpx #0 // just a dummy value, it will be overwritten
        bne @loop
    }
    rts
}

There is a more intuitive way to define the .addr and .size constants. An instruction data label points directly to the instruction argument; you create one by placing a label definition between the instruction and the argument.

processor "6502"

// -> xa: address to text
// -> y: size of text
subroutine print_text
{
    // declaring constants
    declare .addr
    declare .size

    // self modifying code
    sta .addr
    stx .addr + 1
    sty .size

    ldx #0
    {
        lda .addr: $ffff,x // just a dummy address, it will be overwritten
        jsr CHROUT
        inx
        cpx .size: #0 // just a dummy value, it will be overwritten
        bne @loop
    }
    rts
}
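To see what the three stores at the top of print_text actually do, here is a Python model of the loop as raw 6502 machine code. The stores overwrite the operand bytes at .addr and .size before the loop runs. The byte list and helper are my own illustration; the text address $0815 and size 12 are just example values.

```python
# Model of print_text's self-modifying loop: the three stores patch operand
# bytes inside the loop before it runs.

LOOP = bytes([
    0xBD, 0xFF, 0xFF,  # lda $ffff,x  <- .addr is the operand at offset 1
    0x20, 0xD2, 0xFF,  # jsr CHROUT
    0xE8,              # inx
    0xE0, 0x00,        # cpx #0       <- .size is the operand at offset 8
    0xD0, 0xF5,        # bne @loop    (-11 bytes, back to offset 0)
])
ADDR_OFFSET = 1  # const .addr = * + 1, with * at the lda
SIZE_OFFSET = 8  # const .size = * + 1, with * at the cpx


def patch(code, text_address, text_size):
    """Apply the stores at the top of print_text to a copy of the loop."""
    code[ADDR_OFFSET] = text_address & 0xFF             # sta .addr     (a = low byte)
    code[ADDR_OFFSET + 1] = (text_address >> 8) & 0xFF  # stx .addr + 1 (x = high byte)
    code[SIZE_OFFSET] = text_size & 0xFF                # sty .size     (y = size)


if __name__ == "__main__":
    code = bytearray(LOOP)
    patch(code, 0x0815, 12)  # example text address and size
    print(code.hex(" "))
```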

This subroutine can be reused so let's move it to its own file. Name a new file screen_io.jasm and paste the subroutine into it. Now we'll modify the main file to include this new file. Note that we now must include the file inside the section because otherwise generated code or data would lie outside any section and that isn't allowed. Only code sections can contain code or data. The other include files only contain constant definitions and macros and they don't directly produce any code or data themselves. That's why they can be outside a section.

processor "6502"

include "macros.jasm"
include "c64.jasm"

section code, "main", BASIC_START
{
    basic_sys_line(2016, .start)

.start:
    ldx #>hello_world_text
    lda #<hello_world_text
    ldy #sizeof(hello_world_text)
    jsr print_text
    rts

    define byte[] hello_world_text = { "HELLO WORLD!" }

    include "screen_io.jasm"
}

Bss Sections

Self-modifying code is handy and can improve efficiency, but it doesn't work if the code is in a cartridge ROM, because ROM can't be modified. Let's try changing the code to use the zero page instead. To do this we need to reserve some space for variables in the zero page area. This is done with a bss section. BSS stands for "Block Started by Symbol" and means a static memory block that is part of the program, but without its content stored in the executable file. The bss section doesn't generate any code or data; it just reserves uninitialized space. I reserved the last 5 bytes in the zero page area, from $fb up to but not including $100.

processor "6502"

include "macros.jasm"
include "c64.jasm"

section bss, "zero page", $fb, $100
{
    reserve word addr
}

section code, "main", BASIC_START
{
    basic_sys_line(2016, .start)

.start:
    ldx #>hello_world_text
    lda #<hello_world_text
    ldy #sizeof(hello_world_text)
    jsr print_text
    rts

    define byte[] hello_world_text = { "HELLO WORLD!" }

    include "screen_io.jasm"
}

The reserve statement can reserve one type or an array of them, just like the define statement. The difference is that a reserve statement can't put actual values into anything. Also, you must specify array sizes with a number between the brackets.

The addr constant has no leading dot. This means that it is a global constant. It is accessible from anywhere in the program. Making it global is necessary since it doesn't exist in the same scope as the code that uses it.

Note that the bss section header has an extra value added after the start address. This is the end of the section. If the section grows beyond this value, an error is generated. This is an effective way to keep the variables under control.
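A bss section with an end address behaves a bit like a bump allocator with a limit. This Python sketch models that idea for the zero page section above. The class, its allocation order and its error message are assumptions for illustration, not jAsm's actual placement logic.

```python
# Minimal model of a bss section with an end address: reservations are
# handed out in order, and growing past the end is an error.


class BssSection:
    def __init__(self, name, start, end):
        self.name = name
        self.start = start
        self.end = end            # first address NOT in the section
        self.next_free = start
        self.symbols = {}

    def reserve(self, symbol, size):
        """Reserve size bytes for symbol and return its address."""
        address = self.next_free
        if address + size > self.end:
            raise OverflowError(
                f"section {self.name!r} overflows: {symbol} needs {size} bytes "
                f"at ${address:04x}, end is ${self.end:04x}")
        self.symbols[symbol] = address
        self.next_free = address + size
        return address


if __name__ == "__main__":
    zp = BssSection("zero page", 0xFB, 0x100)
    print(hex(zp.reserve("addr", 2)))  # reserve word addr
```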

Now we need to modify the print subroutine to not modify itself and instead use the allocated pointer.

// -> xa: address to text
// -> y: size of text
subroutine print_text
{
    sta addr
    stx addr + 1

    tya
    tax // size left in x
    ldy #0 // pointer offset
    {
        lda (addr),y
        jsr CHROUT
        iny
        dex
        bne @loop
    }
    rts
}

It would also be nice to avoid having to specify the length of the string when printing it, and the register shuffling needed for the size argument made the code a bit kludgy. If we zero-terminate the string we can drop the size argument altogether (swapping the argument registers would have been another option).

processor "6502"

include "macros.jasm"
include "c64.jasm"

section bss, "zero page", $fb, $100
{
    reserve word addr
}

section code, "main", BASIC_START
{
    basic_sys_line(2016, .start)

.start:
    ldx #>hello_world_text
    lda #<hello_world_text
    jsr print_text
    rts

    define byte[] hello_world_text = { "HELLO WORLD!", 0 }

    include "screen_io.jasm"
}

Now the zero is added. Let's update the subroutine.

// -> xa: address to text
subroutine print_text
{
    sta addr
    stx addr + 1

    ldy #0 // pointer offset
    {
        lda (addr),y
        beq @continue
        jsr CHROUT
        iny
        bne @loop
    }
    rts
}

Now that looks better. @continue is another automatic label. It is defined at the closing curly brace of the closest surrounding scope, so beq @continue leaves the loop.
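One detail is easy to miss: y is an 8-bit register, so iny wraps from 255 to 0 and bne @loop then falls through. The loop therefore prints at most 256 characters even if the terminating zero is missing. Here is a Python model of the loop that shows this; the function name and the 64 KB memory array are my own illustration.

```python
# Model of the zero-terminated print loop using (addr),y addressing.


def print_text(memory, addr):
    """Return the bytes the loop would pass to CHROUT."""
    out = bytearray()
    y = 0                                   # ldy #0
    while True:
        ch = memory[(addr + y) & 0xFFFF]    # lda (addr),y
        if ch == 0:
            break                           # beq @continue
        out.append(ch)                      # jsr CHROUT
        y = (y + 1) & 0xFF                  # iny (8-bit register)
        if y == 0:
            break                           # bne @loop not taken
    return bytes(out)


if __name__ == "__main__":
    memory = bytearray(0x10000)
    memory[0x0815:0x0815 + 13] = b"HELLO WORLD!\x00"
    print(print_text(memory, 0x0815))
```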

Section Parts

One thing that isn't really great is that it isn't obvious what addr is used for. It would be nice if it was connected to the print subroutine somehow. We can make that connection by creating partial sections in the screen_io.jasm file that add to the sections in the main file. We do that by moving the reserve into the screen_io.jasm file. We also move the include outside the main section, because we can't define a partial section within a section.

This is what main.jasm looks like after the change.

processor "6502"

include "macros.jasm"
include "c64.jasm"

section bss, "zero page", $fb, $100
{
}

section code, "main", BASIC_START
{
    basic_sys_line(2016, .start)

.start:
    ldx #>hello_world_text
    lda #<hello_world_text
    jsr print_text
    rts

    define byte[] hello_world_text = { "HELLO WORLD!", 0 }
}

include "screen_io.jasm"

The screen_io.jasm file now needs to define two section parts, one for the zero page reservation and one for the code.

section part, "zero page"
{
    // temporary text address when printing
    reserve word addr
}

section part, "main"
{
    // -> xa: address to text
    subroutine print_text
    {
        sta addr
        stx addr + 1

        ldy #0 // pointer offset
        {
            lda (addr),y
            beq @continue
            jsr CHROUT
            iny
            bne @loop
        }
        rts
    }
}

We now have a kind of module with the print subroutine and its zero page variable. It can sit beside other modules in a larger program without overlapping them. We don't have to specify any addresses for the parts; they are packed together within their sections.

Namespaces

What if some other module also wants to have a temporary address called addr? That could be a problem. One solution for this is to put the print related names in a namespace.

We'll enclose the contents of the screen_io.jasm file in a screen namespace.

namespace screen
{
    section part, "zero page"
    {
        // temporary text address when printing
        reserve word addr
    }

    section part, "main"
    {
        // -> xa: address to text
        subroutine print_text
        {
            sta addr
            stx addr + 1

            ldy #0 // pointer offset
            {
                lda (addr),y
                beq @continue
                jsr CHROUT
                iny
                bne @loop
            }
            rts
        }
    }
}

The reference to the print subroutine must now specify the namespace in one way or another. One way would be to explicitly type it in front of the print name like this:

jsr screen::print_text

If print_text is used a lot in one place it is also possible to specify that a namespace should be used in a scope. As long as other names don't start to collide, this is just as good.

processor "6502"

include "macros.jasm"
include "c64.jasm"

section bss, "zero page", $fb, $100
{
}

section code, "main", BASIC_START
{
    using namespace screen

    basic_sys_line(2016, .start)

.start:
    ldx #>hello_world_text
    lda #<hello_world_text
    jsr print_text
    rts

    define byte[] hello_world_text = { "HELLO WORLD!", 0 }
}

include "screen_io.jasm"

Modules

A namespace exposes everything to the outside world. Sometimes that's what you want, but it can also be nice to control a module's interface. This can be done using modules instead of namespaces. In a module, all global symbols are local to the module unless they are marked for export.

In our example, addr doesn't need to be exposed outside, but the print subroutine must be.

module screen
{
    section part, "zero page"
    {
        // temporary text address when printing
        reserve word addr
    }

    section part, "main"
    {
        // -> xa: address to text
        export subroutine print_text
        {
            sta addr
            stx addr + 1

            ldy #0 // pointer offset
            {
                lda (addr),y
                beq @continue
                jsr CHROUT
                iny
                bne @loop
            }
            rts
        }
    }
}

Accessing the print_text subroutine in the module is done exactly the same way it was accessed in the namespace so the rest of the program can be left unchanged.

Debugging in VICE

jAsm can assist debugging in the VICE emulator by exporting the names of addresses for use in the emulator. Add --dump-vice-symbols and a filename to the command line arguments to export this information.

jasm -hla --dump-vice-symbols main.vs main.jasm main.prg

This creates a symbol file called main.vs. Let's start the emulator (install it first if you don't have it) and use the file.

x64sc -moncommands main.vs -autostart main.prg

Hello world should be printed on the screen.

Hello World Example

Start the monitor (alt-h in Linux) and type d 080d.

(C:$e5d1) d 080d
.C:080d   .sys_address:
.C:080d  A2 08       LDX #$08
.C:080f  A9 15       LDA #$15
.C:0811  20 22 08    JSR .print_text
.C:0814  60          RTS
.C:0815   .hello_world_text:
.C:0815  48          PHA
.C:0816  45 4C       EOR $4C
.C:0818  4C 4F 20    JMP $204F
.C:081b  57 4F       SRE $4F,X
.C:081d  52          JAM
.C:081e  4C 44 21    JMP $2144
.C:0821  00          BRK
.C:0822   .print_text:
.C:0822  85 FB       STA $FB
.C:0824  86 FC       STX $FC
.C:0826  A0 00       LDY #$00
.C:0828  B1 FB       LDA (.addr),Y
.C:082a  F0 06       BEQ $0832
.C:082c  20 D2 FF    JSR $FFD2
.C:082f  C8          INY
.C:0830  D0 F6       BNE $0828
.C:0832  60          RTS
.C:0833  00          BRK
.C:0834  00          BRK
.C:0835  00          BRK
(C:$0836)

You'll get a disassembled listing of the program and some of the labels are visible in the listing! The zero page addresses didn't get a name. That's a limitation in VICE so we can't help that. The CHROUT address didn't get a name either. How come? Well, the constant is only a number and not all numbers should be exported to VICE because those would act as addresses and it would get very confusing. There is a work-around for this. You can explicitly set a value to be an address like this.

const address BASIC_START = $0801
const address BORDER_COLOR = $d020
const address CHROUT = $ffd2

Change the c64.jasm file to this, assemble and restart the emulator.

.C:0822   .print_text:
.C:0822  85 FB       STA $FB
.C:0824  86 FC       STX $FC
.C:0826  A0 00       LDY #$00
.C:0828  B1 FB       LDA (.addr),Y
.C:082a  F0 06       BEQ $0832
.C:082c  20 D2 FF    JSR .CHROUT
.C:082f  C8          INY
.C:0830  D0 F6       BNE $0828
.C:0832  60          RTS

Problem solved!

VICE Breakpoints

To aid debugging you can set breakpoints in your program. This makes it easy to stop the program in a specific subroutine and single step through it. You do this by creating a label with a name that begins with breakpoint. Let's try this. Add a label somewhere in the print_text subroutine, like this.

// -> xa: address to text
subroutine print_text
{
.breakpoint:
    sta addr
    stx addr + 1

    ldy #0 // pointer offset
    {
        lda (addr),y
        beq @continue
        jsr CHROUT
        iny
        bne @loop
    }
    rts
}

Assemble the program and restart the emulator as before. It stops almost immediately.

BREAK: 1  C:$0822  (Stop on exec)
#1 (Stop on  exec 0822)  141 016
.C:0822  85 FB       STA $FB        - A:15 X:08 Y:00 SP:f4 ..-.....    3114547
(C:$0822)

You can step through the instructions with the z command in the monitor.

There are two more types of breakpoints. A label beginning with read_breakpoint will stop execution when that memory address is accessed to read. A label beginning with write_breakpoint will stop the execution when that memory address is accessed to write.

Now you know the basics of jAsm and should be able to start experimenting yourself. The language has more to offer and the complete syntax is described in the reference section. Good luck!

Compiling jAsm


Fetching Source Code

You need to fetch the source code from SourceHut to get started. If you have a command line Mercurial client you can clone the repository like this.

hg clone https://hg.sr.ht/~bjonte/jasm

jAsm compiles using CMake and Clang.

Compiling Using CMake

To build with CMake you need CMake 3.5, Clang, Mercurial and python3 installed. On Debian, Ubuntu or Mint systems you can use apt-get to fetch the dependencies like this.

sudo apt-get install cmake clang mercurial python3

Clone the repository into a directory called 'jasm' and build it like this.

hg clone https://hg.sr.ht/~bjonte/jasm
cd jasm
export CXX=/usr/bin/clang++
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
sudo make install

If you want to cross compile binaries for Windows, you need to install MinGW.

sudo apt-get install mingw-w64

Cross compile like this.

hg clone https://hg.sr.ht/~bjonte/jasm
cd jasm
mkdir build
cd build
cmake -DCMAKE_TOOLCHAIN_FILE=../win64_cross_compile_toolchain.txt -DCMAKE_BUILD_TYPE=Release ..
make

You will find the binaries in build/jasm. You will also need the MinGW dynamic link libraries, which on Linux Mint are found here.

/usr/lib/gcc/x86_64-w64-mingw32/7.3-win32/libgcc_s_seh-1.dll
/usr/lib/gcc/x86_64-w64-mingw32/7.3-win32/libstdc++-6.dll

Starting jAsm


jAsm is a command line tool. It will print its arguments if started without any. Basically it needs an input file and an output file.

jasm input.jasm output.bin

There are some flags to tweak how the assembler behaves.

Bank Mode

When working with several memory banks it is handy to place them after each other in memory. That way it is possible to see which bank code or data belongs to just by looking at the address. For example, cartridge bank 0 could be located at $08000-$0a000 and bank 1 at $18000-$1a000. However, jAsm will generate an error when trying to reference bank 1 in data definitions or instructions, because the addresses exceed 16 bits. This can be overridden with the --bank-mode flag, which automatically truncates long addresses.

jasm --bank-mode input.jasm output.bin

A shortcut alternative is -bm.

This also has implications for the high byte unary operator (>). Without bank mode '>addr' means the same as 'addr>>8', but with bank mode enabled it means '(addr>>8)&$ff', to make sure the result is an eight bit value suitable for instructions taking a byte argument.
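Here is the difference in numbers, as a small Python sketch using bank 1 from the example above. The high byte functions follow the two expressions given here. The 16-bit truncation helper is an assumption about what "truncates long addresses" means for a 16-bit processor.

```python
def high_byte(addr, bank_mode):
    """Value of `>addr` as described above."""
    if bank_mode:
        return (addr >> 8) & 0xFF  # (addr>>8)&$ff
    return addr >> 8               # addr>>8


def truncate_address(addr):
    """Assumed effect of --bank-mode on a long address: keep the low 16 bits."""
    return addr & 0xFFFF


if __name__ == "__main__":
    bank1_start = 0x18000  # cartridge bank 1 from the example above
    print(hex(high_byte(bank1_start, False)))  # does not fit in a byte
    print(hex(high_byte(bank1_start, True)))
    print(hex(truncate_address(bank1_start)))
```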

Predefined Constants

You can instruct the assembler to create some initial constants that can be accessed in the source code with the --define flag.

jasm --define INFINITE_LIVES=true --define STARTING_LIVES=3 input.jasm output.bin
jasm --define DEFAULT_NAME=bobo input.jasm output.bin

You can feed it with integers, booleans and strings, like in the examples above.

A shortcut alternative is -d.

Symbol Dumps

The constants and variables in the assembled program can be written to text files in these formats.

jAsm format: contains the most information, including metadata about types.
VICE format: suitable for the VICE emulator, and it can also contain breakpoints.
No$GBA format: suitable for the EightyOne emulator, for example.

Dump jAsm symbols like this.

jasm --dump-symbols symbols.txt input.jasm output.bin

A shortcut alternative is -ds.

Dump VICE symbols like this.

jasm --dump-vice-symbols symbols.vs input.jasm output.bin

A shortcut alternative is -dv.

Dump No$GBA symbols like this.

jasm --dump-gba-symbols symbols.sym input.jasm output.bin

A shortcut alternative is -dg.

Hex Output

The assembled program can be written as a hex file interleaved with the source lines that produced the output, to help you understand what the assembler produced.

Write hex output like this.

jasm --dump-hex hex_output.txt input.jasm output.bin

A shortcut alternative is -dh.

The file will output all source lines that generate data. The first column is the program counter, then up to four columns of binary data. This is followed by a line number and then the source code that produced the generated data.

    ./source/main_loop.jasm
--------------------------------------------------------------------------------
    0400: 20 17 04         7:       jsr setup_cpu
                           8: 
    0403: 20 46 04         9:       jsr blank_screen
                          10: 
    0406: 20 00 1f        11:       jsr mmu::setup
    0409: 20 6b 04        12:       jsr init_reset_vector

When the source file changes, the file name and a line with dashes will be added. In case there is a longer jump in line numbers or a jump backwards, a partially dashed line is printed.

    046b: ad 06 d5        51:       lda MMURCR
    046e: 48              52:       pha
  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
    046f: ad 06 d5        67:       lda MMURCR
    0472: 29 f7           68:       and #~MMURCR_COMMON_TOP
    0474: 09 04           69:       ora #MMURCR_COMMON_BOTTOM
    0476: 8d 06 d5        70:       sta MMURCR

Binary Header

By default, jAsm outputs only the binary data without any header. To generate a program file for Commodore 64 that can be loaded from BASIC, a two byte header must be added containing the load address in little endian format. You can add this header using --header-little-endian-address.

jasm --header-little-endian-address input.jasm output.prg

A shortcut alternative is -hla.
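The header is just the load address split into its low and high bytes, with the low byte first. The following Python sketch is not jAsm code, and the function name add_load_address_header is hypothetical. It models what ends up in front of the program data for a load address of $0801, the usual start of BASIC programs on the Commodore 64.

```python
import struct

def add_load_address_header(load_address: int, program: bytes) -> bytes:
    """Prepend a two byte little endian load address, as in a C64 .prg file."""
    if not 0 <= load_address <= 0xFFFF:
        raise ValueError("load address must fit in 16 bits")
    # "<H" packs an unsigned 16 bit integer little endian: low byte, then high byte
    return struct.pack("<H", load_address) + program

prg = add_load_address_header(0x0801, bytes([0x0B, 0x08]))
print(prg.hex(" "))  # 01 08 0b 08
```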

Include Paths

You can add include paths using the --include-dir flag. jAsm will look in these for included files.

jasm --include-dir some/dir --include-dir other/dir input.jasm output.bin

A shortcut alternative is -i.

Max Errors

With the --max-errors flag, you can specify the number of errors that will be printed before jAsm stops assembling.

jasm --max-errors 4 input.jasm output.bin

A shortcut alternative is -me.

Output Files and Sections

The default output mode merges all code sections into one big binary and pads the space in between with zeros. With the flag --output-multiple-files, this can be changed to store one file per section instead. Each file is named after the output file, with the section name added before the file extension.

jasm --output-multiple-files input.jasm output.bin

A shortcut alternative is -om.

You can choose to have jAsm name the files after the sections by not specifying an output file name.

jasm --output-multiple-files input.jasm

You may want to add an extension to the section names when using them as file names. Use the option --file-extension to do that.

jasm --output-multiple-files --file-extension prg input.jasm

A shortcut alternative is -ext.

Default Processor

You can set the default processor to use when assembling the source code using the option --processor. If you do this you won't need to specify the processor in the source code, unless you need to switch it.

jasm --processor 6502 input.jasm output.bin

or

jasm --processor z80 input.jasm output.bin

A shortcut alternative is -p.

Pseudo Instructions

You can enable a number of extra instructions that simplify programming by using the option --pseudo-instructions. The available pseudo instructions depend on the processor.

A shortcut alternative is -pi.

6502 pseudo instructions

These are the pseudo instructions for 6502.

bhs addr // branch if higher or same
blt addr // branch if lower

These are equivalent to bcs and bcc, respectively.

65C02 pseudo instructions

These are the pseudo instructions for 65C02.

bhs addr // branch if higher or same
blt addr // branch if lower

These are equivalent to bcs and bcc, respectively.

dea // decrement A register
ina // increment A register

These are equivalent to the implied mode dec and inc, respectively.

65CE02 pseudo instructions

These are the pseudo instructions for 65CE02.

bhs addr // branch if higher or same
blt addr // branch if lower

These are equivalent to bcs and bcc, respectively.

dea // decrement A register
ina // increment A register

These are equivalent to the implied mode dec and inc, respectively.

bra label // branch unconditionally

This is equivalent to the bru instruction.

45GS02 pseudo instructions

These are the pseudo instructions for 45GS02.

bhs addr // branch if higher or same
blt addr // branch if lower

These are equivalent to bcs and bcc, respectively.

dea // decrement A register
ina // increment A register

These are equivalent to the implied mode dec and inc, respectively.

bra label // branch unconditionally

This is equivalent to the bru instruction.

Z80 pseudo instructions

These are the pseudo instructions for Z80.

ld bc,de
ld bc,hl
ld de,bc
ld de,hl
ld hl,bc
ld hl,de

Each of them is implemented as two instructions under the hood: first the high register is loaded and then the low register. For example, ld bc,de is assembled as ld b,d followed by ld c,e.

Verboseness

jAsm supports several levels of output during assembly. This is controlled by the -v0, -v1, -v2 and -v3 flags.

jasm -v2 input.jasm output.bin

Flag  Meaning
-v0   Show errors
-v1   Show errors and warnings
-v2   Show errors, warnings, printouts and general information
-v3   Show errors, warnings, general information and debugging information

Return Codes

jAsm returns with return code 0 for success and non-zero if an error occurred.

Language Reference


This section documents the entire syntax. Have a look at the starter guide first to get a grasp of the basics before digging into this.

Input Format

jAsm accepts only UTF-8 encoded text files. If the input can't be interpreted as UTF-8, an error is reported.

To assemble instructions, jAsm needs to know which processor to target. The processor can be specified either with the --processor command line flag or with a keyword in the source code. Specify the processor in a source file like this.

processor "6502"

After this statement, the assembler can handle 6502 processor instructions. You can switch processor in a source file several times.

processor "6502"
    rts
processor "z80"
    ret

It is also possible to momentarily change the processor and switch back to whatever it was before. The processor pop statement is used to change back to the previously set processor.

processor "6502"
    rts
    processor "z80"
        ret
    processor pop
    rts
processor pop

Included files inherit the processor from the file with the include statement but the processor set in the included file won't affect the file where the include statement is.

Suppose we have a file named test.jasm:

// processor 6502 inherited from main.jasm
rts

processor "z80"
// processor is now set to z80
ret

and a file named main.jasm:

processor "6502"
    // processor is now set to 6502
    include "test.jasm"
    // processor is still 6502
    lda #0

When test.jasm is included, the rts instruction is assembled for the 6502 because the processor was inherited from main.jasm. The ret instruction is assembled for the Z80 since the processor was changed in the included file before that instruction. After the include statement the processor is 6502 again, since the included file doesn't affect the file it is included from.

Comments

jAsm supports C style single line comments. They span the rest of the line.

lda #0 // this is a comment
rts

Multiline comments are also supported with the same syntax as C. They are started with /* and ended with */.

lda #0 /* this
is a
comment */

A notable difference from C is that jAsm supports nested comments.

/* this is
  /* also */
a comment */

Assembler Instruction Syntax

This documentation doesn't cover the actual instructions, their meaning and so on. You will have to find that elsewhere.

The assembler instructions are entered using lowercase letters. All standard opcodes and addressing modes are supported. Examples:

lda #0
tay
sta $d020
ldx #5
lda ($fb),y
sta $9000,x
jmp $8000

Some assemblers use brackets as expression parentheses to avoid colliding with the indirect addressing modes but jAsm uses normal parentheses for both.

lda ($fb + 1) + 1,y // not indirect addressing
lda ($fb + 1 + 1),y // indirect addressing
jmp ($1fff + 1)*2 // not indirect addressing
jmp (($1fff + 1)*2) // indirect addressing

Instructions end with a new line or a semicolon. You can stack together instructions on one line like this.

inx; inx; inx;

Constants

Named constants can be defined in the source code to replace bare numbers. This is strongly encouraged since it makes the source code much more readable. There are two kinds: labels, which are set to the current program counter, and constant declarations. Labels are names followed by a colon.

setup:

The label will contain the address of the next byte in memory. A label can also be placed between an instruction and its argument, in which case it is assigned the address of the instruction argument.

lda value: #0
inc value // increment what is loaded by the previous instruction next time it is executed

Z80 instructions take at most two arguments, so in some cases two labels can be attached to the arguments of a single instruction. Note how the label is placed with indirect addressing modes: it must come before the parenthesis.

ld hl, index
ld bc, data
ld index:(ix+0), data:0
inc (hl) // increase index in instruction above

Instruction labels get memory storage types. This means they have a defined size, and the low and high bytes of word arguments can be accessed directly through the lo and hi properties.

lda value:$ffff
inc value.lo // the low part of the previous instruction's argument

Constant declarations look like this.

const NUM_LIVES = 5 // constant declaration

With this defined we can use them to produce more readable assembler code.

lda #NUM_LIVES
jsr setup
rts

Sometimes it is convenient to define constants that are exported as addresses. This is done using the address keyword.

const address SCREEN = $0400

Address constants can be exported to the VICE emulator via a command line option. Labels are automatically marked as addresses since they point into the address space of the assembled code, but constants are not marked as addresses by default.

Constants and labels can be defined and used in any order, unlike C where everything needs to be declared before use.

Variables

A constant cannot change its value throughout the source.

const A = 4
A = 3 // Error 3017 : Cannot reassign constant.

To do that you need a variable.

var A = 4
A = 3 // it works!

Since the value is dependent on the parsing order (top to bottom), variables must be declared before they are used.

lda #A // Error 3004 : Reference to undefined symbol A
var A = 3

Local vs Global

There are local and global symbols (constants, labels and variables). Local symbols always start with a period character.

const A = 3 // global constant
const .A = 4 // local constant

Symbols cannot be used outside their scope. Global symbols have the entire source code as their scope. Local symbols can only be used inside the closest outer scope, which is delimited by curly braces.

{
    const .A = 1 // a local .A
    {
        lda #.A // this will be 2

        const .A = 2 // another local .A

        lda #.A // this will be 2
    }
    lda #.A // this will be 1
}
lda #.A // Error 3004 : Reference to undefined symbol .A

Program Counter

The program counter is always represented by an asterisk (*). It can be used to refer to locations relative to the current address.

lda #0
inc *-1 // increment the value 0 in the previous instruction

Automatic Scope Variables

Every scope has automatic local variables generated to simplify loop constructs in the code without the need to make up names for the loop labels. At the start of the scope @loop is created and at the end @continue is created. This makes it possible to loop and exit the loop without explicitly creating labels.

{
    lda text,x
    beq @continue // break out if zero is found
    jsr CHROUT
    inx
    bne @loop // loop back to the scope start
}
rts

Dynamically Created Symbol Names

Constant and variable names can be constructed by an expression if the dynamic keyword is used when defining the symbols.

const dynamic "begin" + "end" = 100
lda #beginend // this will be 100

This can be used together with the symbol function to create and access dynamically generated symbols.

repeat 3
{
    const .addr = symbol("data" + string(@i))
    lda #<.addr
    ldx #>.addr
    jsr print_text
}
rts

define byte[] data0 = { "one", 0 }
define byte[] data1 = { "next", 0 }
define byte[] data2 = { "last", 0 }

Namespaces

Global symbols have a tendency to collide name-wise with each other in large programs, especially if there are several people working on the same code. jAsm supports namespaces to reduce these problems. For example, setup is a common name and different systems may provide their own setup. If the symbols are placed in their own namespaces they can coexist.

namespace random
{

setup:
    // initialize randomizer
    rts

}

namespace raster
{

setup:
    // initialize raster system
    rts

}

Outside the namespaces you need to specify which setup you are referring to by prefixing it with the namespace name and ::.

jsr random::setup
jsr raster::setup

Global symbols are fetched relative to the current namespaces. Sometimes you need to use absolute namespace references to resolve ambiguity. This starts with ::.

const A = 1
namespace aa
{
    const A = 2
}
namespace bb
{
    const A = 3

    lda #::A // this is 1
}

Namespaces can be nested to form deeper namespaces.

namespace aa
{
    namespace bb
    {

    }
}

You can declare that you will be using a namespace in a scope to avoid having to specify it every time you reference it.

namespace system_a
{
    const A = 0
}

namespace system_b
{
    using namespace ::system_a
    const B = A // this is ok because we are now using namespace system_a as well
}

Note that this doesn't resolve ambiguity so, in some cases, you may still need to specify the absolute namespace for symbols.

Modules

A namespace encapsulates global symbols in its own space but exposes all of them outside the namespace. Modules exist to solve the problem of exposing too much.

All global symbols are by default private to the module. The keyword export is used to make symbols accessible from outside the module.

module system
{
    const A = 0
    export const B = 1
}

const C = system::A // Error 3004: Reference to undefined symbol A
const D = system::B // this is ok because B has been exported

Place the export keyword before a statement that creates a symbol to export it.

export subroutine func
{
    export lbl:

    export define byte number = 3

    export macro exit()
    {
        rts
    }
}

A module can also declare that it needs values from the user of the module. This is done with the import keyword at the beginning of the module definition.

module system
    import color
{
    subroutine init
    {
        lda #color
        sta $d020
        rts
    }
}

The value system::color must be assigned exactly once, somewhere in the program.

const BLACK = 0
system::color = BLACK

Import several variables like this.

module system
    import color1, color2
{
}

Declaring Symbols

Local symbols cannot be accessed outside their scope and sometimes they need to be defined in an inner scope. This is common when using self modifying code. In the following example, it isn't possible to access .char outside the loop.

const address SCREEN = $0400

lda #0
sta .char // Error 3004 : Reference to undefined symbol .char
ldx #NUM_ELEMENTS - 1
{
    lda .char: #0 // address to the value to be loaded into the accumulator
    sta SCREEN,x
    inc .char
    dex
    bpl @loop // loops back to enclosing scope beginning
}

This problem can be solved by declaring the symbol .char in the outermost scope where it needs to be accessed.

const address SCREEN = $0400

declare .char // declaring .char to be used in this scope
lda #0
sta .char // works!
ldx #NUM_ELEMENTS - 1
{
    lda .char: #0 // define the value of the declared symbol
    sta SCREEN,x
    inc .char
    dex
    bpl @loop
}

This technique can be used with any type of local symbol to move its scope.

Expressions

Expressions in jAsm are similar to expressions in C. They can contain assignments, and an assignment returns the assigned value. This makes it possible to chain multiple assignments.

var aa = 0
var bb = 1
aa = bb = 2

Normal parentheses are always used in expressions, not brackets.

Operators

jAsm supports a number of operators, similar to but not exactly the same as those in C. The operators are listed in order of precedence, from highest to lowest:

Operator  Description                         Types                            Example
()        call operator                       macro, subroutine                mac(aa,bb)
[]        array indexing operator             string, list                     aa[3]
.         property operator                   string, list, dict               aa.length
++        postfix increment                   integer                          aa++
--        postfix decrement                   integer                          aa--
++        prefix increment                    integer                          ++aa
--        prefix decrement                    integer                          --aa
!         boolean not                         boolean                          !aa
~         bitwise not                         integer                          ~aa
+         unary plus                          integer, float                   +3
-         unary minus                         integer, float                   -3
<         unary low byte                      integer                          <aa
>         unary high byte                     integer                          >aa
*         multiplication                      integer, float                   aa * bb
/         division                            integer, float                   aa / bb
+         addition                            integer, float, string, list     aa + bb
-         subtraction                         integer, float                   aa - bb
<<        logical left shift                  integer                          aa << 1
>>        logical right shift                 integer                          aa >> 1
>>>       arithmetic right shift              integer                          aa >>> 1
&         bitwise and                         integer                          aa & bb
^         bitwise exclusive or                integer                          aa ^ bb
|         bitwise or                          integer                          aa | bb
<         less than comparison                integer, float, string           aa < bb
>         greater than comparison             integer, float, string           aa > bb
<=        less or equal comparison            integer, float, string           aa <= bb
>=        greater or equal comparison         integer, float, string           aa >= bb
==        equal comparison                    boolean, integer, float, string  aa == bb
!=        not equal comparison                boolean, integer, float, string  aa != bb
&&        boolean and                         boolean                          aa && bb
||        boolean or                          boolean                          aa || bb
=         assignment                          all                              aa = 1
+=        add and assign                      integer, float, string, list     aa += 1
-=        subtract and assign                 integer, float                   aa -= 1
*=        multiply and assign                 integer, float                   aa *= 2
/=        divide and assign                   integer, float                   aa /= 2
&&=       boolean and, and assign             boolean                          aa &&= bb
||=       boolean or, and assign              boolean                          aa ||= bb
&=        bitwise and, and assign             integer                          aa &= bb
|=        bitwise or, and assign              integer                          aa |= bb
^=        bitwise exclusive or, and assign    integer                          aa ^= bb
<<=       logical left shift, and assign      integer                          aa <<= 1
>>=       logical right shift, and assign     integer                          aa >>= 1
>>>=      arithmetic right shift, and assign  integer                          aa >>>= 1

Statements

Statements are blocks of code that control the generation of instructions or change the assembler state. Statements can be separated by semicolons as in C, but the semicolons are optional. Newlines only matter after instructions, such as rol, where the assembler can't otherwise tell whether an address follows. In all other cases newlines are completely ignored. The following is valid in jAsm.

const
aa
=
1
;

const bb = 2 const cc = 3

In some cases you may be confused by the greedy parser which tries to include as much as possible in the current statement. Look at this.

var aa = 0
var bb = 1 + aa
++aa // Error 3044 : Expression must have side effect.

The parser tries to include as much as possible in the variable declaration for bb, so the ++ operator is applied as a postfix increment to the aa on the second line. That leaves a lone aa on the third line, and the assembler tries to be helpful by pointing out that this expression is meaningless. This case needs to be resolved with a semicolon that separates the statements.

var aa = 0
var bb = 1 + aa;
++aa // ok!

Data Types

jAsm has a couple of built-in data types.

Boolean Type

Booleans can only be either true or false. Comparison operators return boolean values. They are well suited for conditional assembly.

const USE_DEBUG_OUTPUT = true

Integer Type

Integer numbers are 32 bit signed numbers in the range [-2147483648, 2147483647].

const NUMBER = 123

Integer numbers can be expressed in several formats. All the following constants will evaluate to decimal number 31.

const DEC_NUMBER = 31 // decimal number with base 10
const HEX_NUMBER = $1f // hexadecimal number with base 16
const BIN_NUMBER = %11111 // binary number with base 2

Floating Point Type

Floating point numbers are 64-bit signed numbers with a fractional part. They can represent large numbers, and precision increases closer to zero. The largest number is roughly 10^308 and the smallest positive number is roughly 10^-308.

const NUMBER1 = 123.0
const NUMBER2 = 0.0
const NUMBER3 = -1e-50
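The range above can be checked concretely. The document doesn't name the floating point standard. Assuming the 64-bit format is IEEE 754 double precision, as is usual, Python's float has the same limits, and the following Python snippet prints them. It illustrates the assumed format and is not a jAsm feature.

```python
import sys

# Python's float is an IEEE 754 double (64 bits), assumed here to match jAsm's floats.
largest = sys.float_info.max    # 1.7976931348623157e+308
smallest = sys.float_info.min   # 2.2250738585072014e-308, smallest normalized positive value
digits = sys.float_info.dig     # 15 decimal digits survive a round trip

print(largest, smallest, digits)
```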

String Type

Strings are quoted text. The characters are stored as wide characters (32 bits in Linux and 16 bits in Windows), leaving you a large selection of characters to choose from. This is why utf-8 is used as the file format for source code.

const STRING = "Hello"

There are a number of special characters that can be encoded in strings using a special backslash syntax. Whenever a backslash is encountered in a string the next character is checked to see if a special character should be used.

Code Result
\\ The backslash character itself
\t The horizontal tab character (9)
\n The newline character (10)
\r The carriage return character (13)
\0 The null character (0)
\' The single quote character
\" The double quote character

The backslash has the same special meaning when specifying a single character as it has in strings. This can be used to specify the single quote character itself for example.

const QUOTE_CHAR = '\''

The following operators and methods are supported by the string type.

Function Argument types Description Examples
+ string Returns left and right side strings concatenated. const a = "Commodore" + "64" // "Commodore64"
[index] integer Returns the character at (zero based) index. const a = "Commodore64"
lda #a[1] // 'o'
length Returns the length of the string. const a = "Commodore64"
lda #a.length // 11
substring(start, length) integer Returns the part of the string starting at (zero based) start and spanning length characters. The range can be partly outside the string, in which case the result is the intersection of the string and the range. const a = "Commodore64"
const b = a.substring(3, 4) // "modo"
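The clipping rule for substring is easiest to see with a few ranges that run off either end. The following Python model is hypothetical. The examples with a range past the end follow the description above, but how jAsm treats a negative start is an assumption here.

```python
def substring(text: str, start: int, length: int) -> str:
    """Hypothetical model of substring(start, length): keep only the part of
    the range [start, start + length) that overlaps the string."""
    begin = max(start, 0)              # clip a range that starts before the string
    end = max(start + length, begin)   # never let the range end before it begins
    return text[begin:end]             # slicing clips an end past the string length

print(substring("Commodore64", 3, 4))    # modo
print(substring("Commodore64", 9, 10))   # 64  (range runs past the end)
print(substring("Commodore64", -2, 4))   # Co  (range starts before the string, assumed behaviour)
```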

String Conversions

To support the character sets of different platforms, there is a string() function that converts Unicode strings, which is the default string type, to other character sets.

const PET_HELLO = string("Hello", "pet", "lowercase")

The function takes a number of arguments. First the string to convert, then a number of conversion properties which specifies the format, subformat, locale and flags for the conversion, in any order.

The following format properties are supported:

Format Comment
ascii7 7 bit ascii format.
pet The character set used in Commodore PET models after PET 2001.
pet2001 The character set used in the Commodore PET 2001 model.
vic20 The character set used in the Commodore VIC 20.
c16 The character set used in the Commodore 16.
plus4 The character set used in the Commodore Plus/4.
c64 The character set used in the Commodore 64.
c128 The character set used in the Commodore 128.
zx80 Sinclair specific character set.
zx81 Sinclair specific character set.

The following optional subformats are supported.

Subformat Supported formats Comment
uppercase All This is the default subformat.
lowercase pet2001, pet, vic20, c16, plus4, c64, c128 The character set with both lower and uppercase characters.
uppercase_screen pet2001, pet, vic20, c16, plus4, c64, c128 The character set as screen codes.
lowercase_screen pet2001, pet, vic20, c16, plus4, c64, c128 The character set with both lower and uppercase characters as screen codes.

The following optional locale properties are supported.

Locale Supported formats Comment
english All This is the default locale.

The following optional flag properties are supported.

Flag Supported formats Comment
high_bit_term All The last character in the string is modified to have bit 7 set. This is sometimes used as a cheap terminator for the string.
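The high_bit_term flag works as a terminator because a reader can stop at the first byte with bit 7 set. The following Python sketch illustrates the idea using plain ASCII bytes instead of a jAsm character set conversion. Both function names are hypothetical, and the reader function only stands in for the kind of scanning loop a 6502 routine might use. The trick assumes that no character in the string already has bit 7 set.

```python
def high_bit_term(encoded: bytes) -> bytes:
    """Set bit 7 of the last byte so it doubles as an end marker."""
    if not encoded:
        return encoded
    return encoded[:-1] + bytes([encoded[-1] | 0x80])

def read_high_bit_terminated(memory: bytes) -> bytes:
    """Read characters until one with bit 7 set, clearing that bit."""
    out = bytearray()
    for value in memory:
        out.append(value & 0x7F)
        if value & 0x80:   # last character reached
            break
    return bytes(out)

text = high_bit_term(b"HELLO")
print(text.hex(" "))                                   # 48 45 4c 4c cf
print(read_high_bit_terminated(text + b"garbage"))     # b'HELLO'
```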

String Functions

There are a couple of functions specific to operating on string data or characters.

Function Argument types Description Examples
uppercase(text [, locale]) string|integer, string Returns an uppercase version of the string or character sent as the first argument. uppercase("Commodore") // "COMMODORE"
uppercase("Cåmmodåre", "swedish") // "CÅMMODÅRE"
uppercase('a') // 65
lowercase(text [, locale]) string|integer, string Returns a lowercase version of the string or character sent as the first argument. lowercase("Commodore") // "commodore"
lowercase("ABCÅÄÖ", "swedish") // "abcåäö"
lowercase('A') // 97

When specifying a locale string in the string functions, these are the currently supported locales.

Locale Comment
default This is the default locale. It uses the default C locale. It doesn't handle any characters other than A-Z.
english This is the US English locale. This is much like the C locale.
swedish This is the Swedish locale. Supports the åäö characters.

List Type

The list type can hold a collection of values with different types. A list is created using the list function. This constructs a list containing the arguments.

const PRIMES = list(2, 3, 5, 7, 11, 13)

Function Argument types Description Examples
+ list Concatenates two lists. const aa = list(1, 2)
const bb = list(3, 4)
aa + bb // [1, 2, 3, 4]
+= list Concatenates two lists. var aa = list(1, 2)
aa += list(3, 4)
aa // [1, 2, 3, 4]
[index] integer Returns the item at (zero based) index in the list. const aa = list(5, 6, 7)
aa[1] // 6
push(x) any Adds x to the end of the list and returns the list. var aa = list(1, 2, 3)
aa.push(4) // [1, 2, 3, 4]
pop() Removes the last element in the list and returns the list. var aa = list(1, 2, 3)
aa.pop() // [1, 2]
insert(position, value) integer, any Inserts value at zero based index position and returns the list. var aa = list(1, 2, 3)
aa.insert(1, 99) // [1, 99, 2, 3]
erase(position)
erase(position, length)
integer, integer Erases the part of the list defined by position and the optional length argument. Specifying only the position erases one element. The list is returned. The range can be partly outside the list. var aa = list(1, 2, 3, 4)
aa.erase(1, 2) // [1, 4]
aa.erase(0) // [4]
keep(position)
keep(position, length)
integer, integer Erases everything except the part of the list defined by position and the optional length argument. Specifying only the position keeps one element. The list is returned. The range can be partly outside the list. var aa = list(1, 2, 3, 4)
aa.keep(1, 2) // [2, 3]
aa.keep(0) // [2]
clear() Clears the list and returns it. var aa = list(1, 2, 3)
aa.clear() // []
sort(before) macro Sorts the elements in the list according to an item ordering macro that takes two arguments and returns true if the first argument should be placed before the second. The list is returned. const .less = macro(.a, .b) { return .a < .b }
var aa = list(8, 4, 5, 1)
aa.sort(.less) // [1, 4, 5, 8]
empty Returns true if the list is empty, otherwise false. list(2, 4, 8).empty // false
list().empty // true
length Returns the number of elements in the list. const aa = list(2, 4, 8)
lda #aa.length // 3
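The position and length arguments of erase and keep describe the same clipped range, and the two methods simply discard opposite sides of it. The following Python model uses hypothetical helper names. It returns new lists rather than modifying a variable in place, and its handling of negative positions is an assumption. It reproduces the examples in the table.

```python
def _clip(items, position, length):
    """Intersect the range [position, position + length) with the list."""
    begin = min(max(position, 0), len(items))
    end = min(max(position + length, begin), len(items))
    return begin, end

def erase(items, position, length=1):
    begin, end = _clip(items, position, length)
    return items[:begin] + items[end:]     # drop the range

def keep(items, position, length=1):
    begin, end = _clip(items, position, length)
    return items[begin:end]                # drop everything but the range

aa = erase([1, 2, 3, 4], 1, 2)   # [1, 4]
aa = erase(aa, 0)                # [4]
bb = keep([1, 2, 3, 4], 1, 2)    # [2, 3]
bb = keep(bb, 0)                 # [2]
print(aa, bb)                    # [4] [2]
```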

Dictionary Type

The dictionary type can hold a collection of values with different types. A dictionary is created using the dict function. This constructs a dictionary containing key and value pairs as arguments.

const FRUITS = dict("apples" = 2, "bananas" = 5)

Values can be of any type, but keys must be booleans, integers or strings.

Function Argument types Description Examples
set(key, value) boolean|integer|string, any Adds value to be accessible by key in the dict. var aa = dict()
aa.set("hi", 0)
get(key) boolean|integer|string Fetches the value for a specific key. var aa = dict("a" = 0, "b" = 1)
aa.get("b") // 1
erase(key) boolean|integer|string Removes a value with the specified key from the dict. var aa = dict("a" = 0, "b" = 1)
aa.erase("b")
clear() Removes all keys and values from the dict. var aa = dict("a" = 0, "b" = 1)
aa.clear()
has(key) boolean|integer|string Returns true if the key exists in the dict, otherwise false. var aa = dict("a" = 0, "b" = 1)
aa.has("a") // true
aa.has("c") // false
empty Returns true if the dict is empty, otherwise false. dict("one" = 1).empty // false
dict().empty // true
length Returns the number of elements in the dict. const aa = dict("one" = 1, "two" = 2)
lda #aa.length // 2

Passing Values

Values are always passed by value, never by reference. Every time you assign a value to another variable, a copy is made. This makes it possible to assign constants to variables and variables to constants without problems. Values passed as macro arguments are also copied before the macro body is executed.

const aa = list(1, 2, 3)
var bb = aa
bb.pop()
print("{} {}\n", aa, bb) // [1, 2, 3] [1, 2]

There is no aliasing, and there are no pointers that can modify the data unexpectedly.

Type Conversions

There are a number of functions in the root namespace dedicated to converting between the built-in types.

Function Accepted input types Description Examples
int(value) numeric Strips decimal part from a value and converts it to an integer. int(5) // 5
int(5.8) // 5
float(value) numeric Converts value into a floating point value. float(5) // 5.0
float(5.5) // 5.5
string(value [, property, ...]) numeric|string, string, ... Converts value to a string according to a specific character set. See String Conversions for details about properties. string("Hello", "pet", "lowercase")
string(123, "zx81")
hexstring(value) int Converts value into a readable hexadecimal string. hexstring(100) // "64"
unicode(value) int Converts unicode codepoint value to a string. unicode(65) // "A"

Memory Storage Types

There are specific byte, word and long types for memory storage. They are used when reserving or defining data to include in the assembler program.

define byte = 5

The memory storage data types store negative values as signed values and positive values as unsigned.

The word storage type has a pair of properties to address the first and second byte in the word. These are lo and hi.

define word number = 5
const high_addr = number.hi
const low_addr = number.lo

Given number, which points to the 5 in memory, these properties return the addresses of the high and low bytes of the word, respectively.
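The 6502 and the Z80 both store words little-endian. As a result, lo refers to the first byte, at the word's own address, and hi refers to the byte after it. The following Python model illustrates this. The address $1000 assigned to number is made up for the example, and the helper word_bytes is hypothetical.

```python
def word_bytes(value: int) -> bytes:
    """Little-endian layout of a 16-bit word: low byte first, then high byte."""
    value &= 0xFFFF
    return bytes([value & 0xFF, value >> 8])

number = 0x1000   # hypothetical address assigned to `define word number = 5`
memory = {number + offset: b for offset, b in enumerate(word_bytes(5))}

low_addr = number        # what number.lo refers to: the first byte
high_addr = number + 1   # what number.hi refers to: the second byte
print(hex(low_addr), memory[low_addr])    # 0x1000 5
print(hex(high_addr), memory[high_addr])  # 0x1001 0
```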

Storage Conversions

There are a number of functions in the root namespace dedicated to converting between the memory storage type ranges.

Function Accepted input types Description Examples
byte(value) numeric Returns value truncated to integer and with the number of bits reduced to 8. byte(257) // 1
byte(-1) // 255
word(value) numeric Returns value without decimal part and with the number of bits reduced to 16. word(128.3) // 128
long(value) numeric Returns value without decimal part and with the number of bits reduced to 32. long(1000000) // 1000000
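Reducing the number of bits amounts to dropping the fraction and masking off the upper bits, which is how byte(-1) becomes 255. The following Python model reproduces the byte and word examples above, using the hypothetical names to_byte and to_word. It assumes the fraction is truncated toward zero. long() is left out because jAsm integers are 32-bit signed, and the document doesn't say how long() represents values above 2147483647.

```python
def to_byte(value: float) -> int:
    """Model of byte(): drop the fraction, then keep the low 8 bits."""
    return int(value) & 0xFF     # int() truncates toward zero; & 0xFF wraps -1 to 255

def to_word(value: float) -> int:
    """Model of word(): drop the fraction, then keep the low 16 bits."""
    return int(value) & 0xFFFF

print(to_byte(257), to_byte(-1), to_word(128.3))   # 1 255 128
```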

Math Functions

jAsm provides a number of mathematical functions in the root namespace.

Function Accepted input types Description Examples
abs(x) numeric Returns the absolute value of x. abs(-10) // 10
acos(x) numeric Returns arc cosine of x in radians. acos(-1) // 3.1415926536
asin(x) numeric Returns arc sine of x in radians. asin(-1) // -1.5707963268
atan(x) numeric Returns arc tangent of x in radians. atan(1) // 0.7853981634
atan2(y, x) numeric, numeric Returns arc tangent of y/x in radians. atan2(1, 1) // 0.7853981634
ceil(x) numeric Returns x after rounding it up to the closest integer. ceil(0.1) // 1.0
ceil(-0.1) // 0.0
clamp(t, a, b) numeric, numeric, numeric Returns t clamped to the range [a..b]. clamp(0.1, 1.0, 2.0) // 1.0
clamp(5, 0, 10) // 5
cos(x) numeric Returns cosine of x radians. cos(PI) // -1.0
cosh(x) numeric Returns hyperbolic cosine of x radians. cosh(PI) // 11.591953344
degrees(x) numeric Returns radian angle x in degrees. degrees(PI) // 180.0
exp(x) numeric Returns e raised to the power of x. exp(1) // 2.7182818285
floor(x) numeric Returns x after rounding it down to the closest integer. floor(0.9) // 0.0
floor(-0.9) // -1.0
lerp(t, a, b) numeric, numeric, numeric Linearly interpolate a value between [a..b] using t [0..1] where 0 returns a and 1 returns b. t can also be outside the [0..1] range. lerp(0.5, 0.0, 10.0) // 5.0
lerp(-1, 0, 10) // -10
log(x) numeric Returns the natural logarithm of x. log(10) // 2.302585093
log10(x) numeric Returns the base-10 logarithm of x. log10(100) // 2.0
logn(x, n) numeric, numeric Returns the base-n logarithm of x. logn(243, 3) // 5.0
max(a, ...) numeric Returns the largest of the arguments. max(2, 4) // 4
max(2, 4.0) // 4.0
max(a) list Returns the largest of the numeric elements in the list. max(list(2, 4)) // 4
max(list(2, 4.0)) // 4.0
min(a, ...) numeric Returns the smallest of the arguments. min(2, 4) // 2
min(2, 4.0) // 2
min(a) list Returns the smallest of the numeric elements in the list. min(list(2, 4)) // 2
min(list(2, 4.0)) // 2
modulo(a, b) integer, integer Returns the remainder from the Euclidean division a/b. modulo(7, 3) // 1
modulo(-7, 3) // 2
pow(a, b) numeric Returns a raised to the power of b. pow(2, 4) // 16.0
radians(x) numeric Returns degree angle x converted to radians. radians(90) // 1.570796327
remainder(a, b) integer, integer Returns the remainder from the truncated division a/b, where the quotient is rounded toward zero. This is what the % operator does in C. remainder(7, 3) // 1
remainder(-7, 3) // -1
round(x) numeric Returns x after rounding it to the closest integer. round(0.9) // 1.0
round(0.1) // 0.0
sin(x) numeric Returns sine of x radians. sin(PI) // 0.0
sinh(x) numeric Returns hyperbolic sine of x radians. sinh(PI) // 11.548739357
sqrt(x) numeric Returns the square root of x. sqrt(16) // 4.0
tan(x) numeric Returns the tangent of an angle of x radians. tan(1.0) // 1.5574077247
tanh(x) numeric Returns the hyperbolic tangent of x radians. tanh(1) // 0.761594156
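The difference between modulo and remainder only shows up with negative operands. The sketch below checks the documented values at assembly time with static_assert and prints a few sine values (visible with -v2). It assumes that print and repeat can be used outside a section.

```jasm
// modulo uses Euclidean division, remainder truncates toward zero.
static_assert(modulo(-7, 3) == 2, "modulo(-7, 3) should be 2")
static_assert(remainder(-7, 3) == -1, "remainder(-7, 3) should be -1")

// Print sin() for 0, 30, 60 and 90 degrees while assembling.
repeat 4
{
    print("sin({}) = {F4}", @i * 30, sin(radians(@i * 30)))
}
```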

Math Constants

jAsm has a couple of predefined constants in the root namespace.

Constant Value
E 2.718281828459045
PI 3.141592653589793

Print and Formatting

A print function exists to output text when assembling. This can be useful when you want to know about locations or calculations made in the assembled code.

Note that in order to see the output you need to use at least verbose level -v2. See Verboseness for more information.

There are two functions dedicated to formatting and printing: format and print. Both take the same arguments, but format returns the result while print outputs it. The examples below use format.

const WIDTH = 40
format("width: {}", WIDTH) // returns "width: 40"

The first argument is the format string that describes the output. Each pair of curly brackets in the format string is replaced by the next argument to the function, converted to a string.

const WIDTH = 40
const HEIGHT = 25
format("width: {}, height: {}", WIDTH, HEIGHT) // returns "width: 40, height: 25"

It is possible to control the alignment of the inserted text using a format specifier inside the curly brackets. L followed by a width aligns the text to the left and R aligns it to the right.

const WIDTH = 40
format("width: {L4}", WIDTH) // returns "width: 40  "
format("width: {R4}", WIDTH) // returns "width:   40"

When formatting integers, the D specifier sets the minimum number of digits, padding with leading zeros.

const WIDTH = 40
format("width: {D4}", WIDTH) // returns "width: 0040"

Integers can also be formatted as hexadecimal numbers with the X specifier.

const WIDTH = 40
format("width: {X4}", WIDTH) // returns "width: 0028"

By default, floating point numbers are displayed in a short representation using either fixed-point or scientific notation.

format("{}", 0.001) // returns "0.001"
format("{}", 0.0000001) // returns "1e-07"

The F specifier forces fixed-point notation with a specific number of decimal digits.

format("{F4}", 0.0000001) // returns "0.0000"
format("{F4}", 1.23) // returns "1.2300"
format("{F4}", 10) // returns "10.0000"

Alignment and number formatting specifiers can be combined.

format("{R8F4}", 1.23) // returns "  1.2300"

To print an opening curly bracket, prefix it with a backslash. A closing curly bracket that doesn't end a placeholder is printed as-is.

format("\{{}}", 1.23) // returns "{1.23}"

Function Accepted input types Description Examples
format string, ... Returns a string with the additional arguments injected into the format string argument. format("Commodore{}", 64) // "Commodore64"
print string, ... Prints a string with the additional arguments injected into the format string argument. print("Commodore{}", 64) // Commodore64
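As a sketch of a typical use, print can report where a piece of code ended up. The labels start and end are illustrative names, label values are assumed to format like plain integers, and the output is only visible with -v2.

```jasm
section code, "main", $8000
{
start:
    inc $d020
    rts
end:
}

// Prints the start and end address in hex and the size in bytes.
print("main: ${X4}-${X4} ({} bytes)", start, end, end - start)
```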

Symbol Functions

Function Accepted input types Description Examples
symbol(s) string Returns the value of the symbol whose name is the string s. const .a = 5; symbol(".a") // 5

Asserts

jAsm supports static asserts to help improve the robustness of your programs. Use them to verify the assumptions your code depends on. The following example shows a common use case.

subroutine object_offset
{
    lda object_index
    static_assert(OBJECT_SIZE == 8, "This code only supports object sizes of 8")
    asl
    asl
    asl
    tax
    rts
}

The first argument is a boolean expression. If this evaluates to false, the assembler will generate an error and print the string in the second argument.
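Asserts are also handy for checking constants that the code depends on. In this sketch, MAX_OBJECTS and OBJECT_SIZE are hypothetical constants, and the assert guards the assumption that the object table can be indexed with an 8-bit register.

```jasm
const MAX_OBJECTS = 16
const OBJECT_SIZE = 8

// The object table is indexed with the x register, so it must fit in 256 bytes.
static_assert(MAX_OBJECTS * OBJECT_SIZE <= 256, "object table too large for 8-bit indexing")
```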

Sections

Code Sections

To output anything, a jAsm source file needs to contain a code section. Here is a simple example program that changes the border color on a Commodore 64 and returns.

section code, "main", $8000
{
    inc $d020
    rts
}

A section has a unique name, a start address and an optional end address. When the command line option to write one file per section is used, the name is used to name the output files: each filename consists of the output name specified on the command line, an underscore and the section name, so every filename is unique.

An end address can be specified after the start address. The assembler then verifies that the code within the section actually fits before it, and exits with an error if it overflows.

section code, "main", $8000, $9000
{
    inc $d020
    rts
}

Sometimes you want the code to end at a position rather than start at one. You can do this by setting the section start to the end address minus the length of the section data. The following example shows how, while still enforcing that the code fits between two addresses.

const .section_start = $8000
const .section_end = $9000
const .code_size = code_end - code_start
static_assert(.section_end - .code_size >= .section_start, "section overflow")
section code, "main", .section_end - .code_size, .section_end
{
code_start:
    // code here
    // ...
code_end:
}

Sections in Sections

Sections can be placed within sections. This is useful in two cases: 1) storing relocated code and 2) reporting the size and placement of code.

In the following example, the inner section is stored inside the outer section that starts at $8000, but it is assembled as if it were located at $9000, so the code runs correctly once it has been copied to $9000. Only one tight code section is output, even if jAsm is configured to output one file per section, because that option only affects the outermost sections.

section code, "main", $8000
{
    // move the code to the proper location
    ldx #end - start - 1
    {
        lda start,x
        sta target,x
        dex
        bpl @loop
    }
    jmp target

start:
    section code, "reloc", $9000
    {
    target:
        inc $d020
        jmp target
    }
end:
}

If jAsm is started with the -v2 flag, the sections are printed like this.

$8000 - $8014 ($0014) code: main
  $9000 - $9006 ($0006) code: reloc

The following example measures the size of a piece of code.

const address CHROUT = $ffd2

section code, "main", $8000
{
    ldx #0
    {
        lda str,x
        jsr CHROUT
        inx
        cpx #sizeof(str)
        bne @loop
    }
    rts

    // measure the size of string data
    section code, "string", *
    {
        define byte[] str = {
            "LONG STRING DATA STORED HERE... ",
            "NO ONE KNOWS WHERE IT ENDS..."
        }
    }
}

The asterisk represents the current program counter value, so the section is relocated to the address it is already at. This only affects the section information printed by the assembler. If jAsm is started with the -v2 flag, the sections are printed like this.

$8000 - $804a ($004a) code: main
  $800e - $804a ($003c) code: string

BSS Sections

Bss sections are used to reserve memory for variables in your assembler program. This section type doesn't output anything; it just keeps track of a program counter to measure the size of reserved space. It isn't possible to place instructions or other data generating statements in a bss section. Space is reserved with the reserve statement.

section bss, "variables", $9000
{
    reserve byte num_lives
    reserve byte num_boosts
}
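Labels created by reserve are used from code like any other address. A minimal sketch, assuming $c000-$c0ff is free RAM (the section and label names are illustrative):

```jasm
section bss, "variables", $c000, $c100
{
    reserve byte num_lives
    reserve byte[8] sprite_x
}

section code, "main", $8000
{
    lda #3
    sta num_lives

    // Clear all eight sprite x coordinates.
    lda #0
    ldx #sizeof(sprite_x) - 1
    {
        sta sprite_x,x
        dex
        bpl @loop
    }
    rts
}
```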

Section Parts

It is possible to add to an existing section later in the source code using section parts.

section code, "main", $8000
{
    nop
}

section part, "main"
{
    rts // some more code
}

A section part refers to the name of a previously defined section and adds its contents to it. This can be used to create single-file modules with code and variable reservations for specific systems. A main file can create empty sections for zero page variables, code and variables, and then include a number of modules. The modules add to these sections to form a complete program.
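A sketch of that layout, with the module written in the same listing for brevity. In a real project the section part statements would live in a separate module file that the main file includes; the names lives and init_lives are hypothetical.

```jasm
// Main file: creates the sections that modules add to.
section bss, "variables", $c000, $c100
{
}

section code, "main", $8000
{
    jsr init_lives
    rts
}

// Module part: would normally be in its own file, included by the main file.
section part, "variables"
{
    reserve byte lives
}

section part, "main"
{
    subroutine init_lives
    {
        lda #3
        sta lives
        rts
    }
}
```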

Section Part Mapping

It is possible to give a module's sections generic names like "code", "variables" and "zero page" and still map them to more specific section names in a main program.

Let's say that the main program defines two locations for variable storage.

section bss, "low variables", $1000, $1100
{
}

section bss, "high variables", $2000, $2200
{
}

A generic module can reserve variable storage like this.

section part, "variables"
{
    reserve byte lives
}

The main program can then include the generic module inside a section mapping block like this, to get the module's variables stored in the low variables section.

section mapping
    "variables" = "low variables"
{
    include "module.jasm"
}

Several mappings can be specified like this.

section mapping
    "zero page" = "zp",
    "variables" = "low variables",
    "main" = "code page 1"
{
    include "module.jasm"
}

Building ROM Images

Sections can be used to build large cartridge images with banks. Do that by creating an outer section for all the banks and one inner section per bank.

section code, "main", 0 // start address will not be used
{
    section code, "image_1", $e000, $10000
    {
        // code here
        align $2000, $ff
    }
    section code, "image_2", $e000, $10000
    {
        // code here
        align $2000, $ff
    }
    // more sections
}

Here the align statement fills the rest of each bank with $ff, up to where the next bank begins.

An alternative is to use bank mode and place each section in its own 64 kB address space.

section code, "main", 0 // start address will not be used
{
    section code, "image_1", $0e000, $10000
    {
        // code here
        align $2000, $ff
    }
    section code, "image_2", $1e000, $20000
    {
        // code here
        align $2000, $ff
    }
    // more sections
}

That way, you can create macros that generate special code when jumping between banks, based on the addresses of the program counter and the jump target.

Building Overlayed Code Sectors

Sections can also be used when building a game that streams code from disk at runtime. Each streaming code sector gets its own section, and the command line option --output-multiple-files is used to output one file per section. If the same code files are used in several streaming code sectors, use namespaces to keep their symbols apart.

const address PROGRAM_START = $1000

section code, "main", PROGRAM_START
{
    // code here
}
section bss, "streaming_buffer", *
{
    // Reserve space for the streaming buffer. The size corresponds to the largest of the sectors.
    reserve byte[max(sector_1::end - sector_1::start, sector_2::end - sector_2::start)] buffer
}

section code, "sector_1", buffer
{
    namespace sector_1
    {
    start:
        // code here
    end:
    }
}

section code, "sector_2", buffer
{
    namespace sector_2
    {
    start:
        // code here
    end:
    }
}

Conditional Assembly

Code blocks can be selected or rejected with the if statement.

if (USE_FEATURE)
{
    jsr feature_update
}

The parentheses must contain a boolean expression that decides whether the code block is assembled. Two different code blocks can be selected in a mutually exclusive fashion using the if-else statement.

if (USE_FEATURE)
{
    jsr feature_update
}
else
{
    jsr featureless_update
}

You can choose to use or reject large blocks of code, even entire sections if needed.

Sometimes you need to select between more than two options. This is what the if-elif-else statement does.

if (USE_FEATURE_1)
{
    jsr feature1_update
}
elif (USE_FEATURE_2)
{
    jsr feature2_update
}
else
{
    jsr feature3_update
}

Since if is a statement, it can't be used inside expressions or data definitions. Use the select function for that. It takes a boolean as its first argument; if that evaluates to true, the second argument is returned, otherwise the third. This is much like the conditional operator ?: in C.

define byte size = select(USE_FEATURE_1, feature1_size, feature2_size)

Note that the function evaluates its arguments lazily, so the argument that isn't returned is never evaluated. The following code assembles even though .invalid is never defined, since that branch is not evaluated.

const .valid = 5
const .selected = select(true, .valid, .invalid)

Include Source

Large programs may need to be separated into several files. You can include other source files in a source file using the include statement.

include "some_dir/some_file.jasm"

This acts as if all the text in some_file.jasm were pasted in place of the include statement. Files are searched for in the current directory first, and then in all additional include directories specified by command line options.

Include Data

Data like pictures, sprites and character sets can be included in a code section, to be accessible from code. Use the incbin statement for that.

incbin "some_dir/some_file.bin"

The assembler will look in the current directory first and then all additional include directories specified by command line options.

You can add an optional byte offset into the file where to start reading.

incbin "some_dir/some_file.bin", 2

This will skip the first two bytes of the file. It is also possible to set a maximum number of bytes to read.

incbin "some_dir/some_file.bin", 2, 4

This will read at most 4 bytes from offset 2 in the specified file.
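A common use of the offset is to skip the two-byte load address at the start of a Commodore PRG file. The sketch below assumes a hypothetical font.prg file and uses labels and static_assert to fail the build if the font grows beyond 2 kB.

```jasm
section code, "main", $8000
{
    // ... program code ...
    rts

    // Skip the two-byte load address at the start of the PRG file.
font:
    incbin "font.prg", 2
font_end:
}

static_assert(font_end - font <= 2048, "font data is larger than 2 kB")
```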

JSON

Large projects with several build steps sometimes need to exchange data with other tools. jAsm supports exporting JSON data using the json_write function. One dictionary variable, including all its contents, can be exported as long as it only contains data types that exist in the JSON specification: dictionaries, lists, strings, numbers and booleans. The first argument is the filename, the second is the data to export and the third states whether to make the output human readable. An export that isn't human readable is a more compact single line of data.

const data = dict("lives" = 3, "rooms" = list(1, 4, 8))
const humanreadable = true
json_write("some_dir/some_file.json", data, humanreadable)

Data in JSON format can also be imported into a variable using the json_read function. The entire file will be imported as dictionary data.

const imported_data = json_read("some_dir/some_file.json")

This can be used to get compile settings into the build or data from a previous compile into the next.
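Going the other way, a build can hand addresses to a later tool. In this sketch the filename and dictionary keys are hypothetical, and label values are assumed to be written as plain JSON numbers.

```jasm
section code, "main", $8000
{
start:
    inc $d020
    rts
end:
}

// Hand the program's load address and size to a later build step.
const build_info = dict("start" = start, "size" = end - start)
json_write("build_info.json", build_info, true)
```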

Defining Data

You can define data to be included in a code section using the define statement.

define byte max_lives = 3

This adds a single byte with the value 3 and creates a label max_lives pointing to it. All storage types can be used in the define statement. You can also create arrays of data.

define word[] pointers = {
    ptr1, ptr2, ptr3
}

You always need to provide curly braces when defining arrays. This is also true if you are defining strings.

define byte[] str = { "HELLO" }

It is possible to specify the size of the array to verify that the number of elements matches.

define word[NUM_POINTERS] pointers = {
    ptr1, ptr2, ptr3
}

If NUM_POINTERS doesn't match the number of pointers defined, the assembler reports an error.

pointers acts like a label but you can also index into the array using the array operator [].

lda pointers[1]
sta low_byte
lda pointers[1] + 1
sta high_byte

In the code example above, array indices start at zero, so the second pointer is fetched and stored.

Another way to index is to use the offsetof function. It will return the offset in bytes from the beginning of the array.

lda pointers + offsetof(pointers[1])
sta low_byte
lda pointers + offsetof(pointers[1]) + 1
sta high_byte

This isn't more convenient in this case but there are cases when determining the offset is useful.
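One such case is indexed addressing, where the offset goes into an index register instead of being added to the address. A sketch, assuming ptr is a word reserved in the free zero page bytes starting at $fb (the names are illustrative):

```jasm
section bss, "zero page", $fb, $ff
{
    reserve word ptr
}

section code, "main", $8000
{
    // x = 4, the byte offset of the third word in the array.
    ldx #offsetof(pointers[2])
    lda pointers,x
    sta ptr
    lda pointers + 1,x
    sta ptr + 1
    rts

    define word[] pointers = { $1000, $2000, $3000 }
}
```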

Another handy function operating on defined data is sizeof. It returns the size in bytes of the consumed space.

ldx #sizeof(pointers) // 6

Function Accepted input types Description Examples
offsetof(x) offset type Returns the offset in bytes from the beginning of defined or reserved data to x. define byte[] ints = { 1, 2, 3 }
lda #offsetof(ints[2])
sizeof(x) offset type Returns the size in bytes of x. define byte[] ints = { 1, 2, 3 }
lda #sizeof(ints)

It is also possible to define data without specifying a name.

define byte = 3
define byte[] = { 1, 2, 3 }

The define statement can also be used to fill a larger memory block with values without specifying each value if they follow a pattern. This will generate 100 bytes of zeroes.

define byte[100] = { 0, ... }

The pattern can also consist of several values. Here the string is repeated to fill the 100 bytes:

define byte[100] = { "HELLO WORLD!", ... }

Multidimensional arrays are also allowed. The following example defines an array of arrays of words.

define word[][] test = {
    {0, 1},
    {2, 3},
    {4, 5}
}

The address to the value 2 is test[1][0] because the first array indexing operator is operating on the outermost array.

Reserving Space

In bss sections you can allocate space for variables in your program. You use the reserve statement to do that.

reserve byte lives

This will reserve one byte for lives and create a label pointing to it.

You can reserve an array of a storage data type as well.

reserve long[16] coordinates

It is also possible to reserve space without specifying a name. In this case you will need to provide a semicolon to signal that there will be no name following the type.

reserve byte;
reserve byte[3];

The sizeof and offsetof functions can also be used on reserved memory labels, just like for defined data.

Multidimensional array space can be reserved as well.

reserve byte[40][25] screen

This gives a handy way to address screen coordinates, where y and x are constant row and column values.

lda #'1'
sta screen[y][x]

Subroutines

To create a subroutine you really only need to place a label somewhere and jump to it. jAsm allows you to express it a bit more explicitly using the subroutine keyword.

// -> a: the value to multiply
// <- a: the result
// <> x: preserved
// <> y: preserved
subroutine multiply_by_8
{
    asl
    asl
    asl
    rts
}

A subroutine can also be called like a macro without arguments. The following two lines are equivalent.

jsr multiply_by_8
multiply_by_8()

The macro style call has the advantage that a module can change a subroutine into a macro to inline the code, or the other way around, without changing the calling code.

Enumerations

jAsm supports enumerated constants. It is a simplified way of assigning a series of numbers without specifying each number. It makes it easier to insert a value in the middle without re-enumerating all following numbers.

enum pause_menu
{
    continue,
    options,
    exit
}

In this example, continue will contain 0, options 1 and exit 2. You access the values using the pause_menu enum like this.

ldx #pause_menu.continue
jsr draw_menu_option

The first enum value is by default 0, but any of the values can be explicitly specified like this.

enum pause_menu
{
    continue = 1, // 1
    options, // 2
    exit = 10 // 10
}

Enum values can be specified relative to other values as well.

enum device
{
    joy1,
    joy2,
    paddle1 = device.joy1,
    paddle2 = device.joy1,
    paddle3 = device.joy2,
    paddle4 = device.joy2
}
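Enum values work well as array sizes. In this sketch, option_colors is a hypothetical table with one entry per menu option, so the size check in define reports an error if an option is added to the enum without a matching color.

```jasm
enum pause_menu
{
    continue,
    options,
    exit
}

section code, "main", $8000
{
    // One color per menu option.
    define byte[pause_menu.exit + 1] option_colors = { 1, 1, 2 }
}
```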

Loops

Sometimes it is necessary to write a lot of repetitive code. jAsm supports loops for this purpose.

For Loop

The for loop is a general form of loop that can be used in a large variety of situations. It is very similar to the C for-loop with a tighter set of options.

for(var .i = 0; .i < 5; ++.i)
{
    nop
}

This creates five nop instructions. The for statement consists of an optional variable declaration, a required loop condition and an optional expression that modifies the variable. The loop takes another pass as long as the condition evaluates to true.
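A typical use is generating lookup tables. This sketch builds a table with the start address of each row of a 40x25 character screen, assuming the screen is at $0400:

```jasm
section code, "main", $8000
{
    // Start address of each of the 25 screen rows.
screen_rows:
    for(var .row = 0; .row < 25; ++.row)
    {
        define word = $0400 + .row * 40
    }
screen_rows_end:
}
```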

Range-based For Loop

A more specific form of for loop exists which conveniently iterates over lists.

const .bits = list(0, 1, 2, 4, 5, 7)

for(var .b in .bits)
{
    define byte = 1 << .b
}

This creates six mask bytes according to the bits in the list. Inside the loop, the special variable @i holds the zero-based index of the current value in the list.

const .names = list("PICTURE", "GAME", "LEVEL1", "LEVEL2")

for(var .name in .names)
{
    define byte[] = { string(@i), " ", .name, 0 }
}

This will generate data for filenames where each name begins with a number and a space character before the descriptive name.

When the loop starts, a copy of the list is made. The iteration is done over the copy to avoid problems if the list is accidentally modified inside the loop.

Strings can also be iterated over with this form of loop.

const .message = "SECRET"

for(var .char in .message)
{
    define byte[] = { .char ^ $ff }
}

This will store an encoded string where each character has all its bits flipped.

This type of loop can be used on dicts as well like this.

const .colors = dict("black" = 0, "white" = 1, "red" = 2)

for(var .name, .color in .colors)
{
    define byte[] = {.name, 0}
    define byte = .color
}

Note that the iteration over the dict keys and values will not be done in any particular order.

Repeat Loop

The repeat loop is a simplified version of the for loop. It can only repeat a fixed number of times and has no exit condition expression. It provides the special variable @i as a zero-based iteration counter.

repeat 5
{
    define byte = @i
}

This defines numbers 0, 1, 2, 3 and 4 in memory.
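The screen row table from the for loop section can also be built with repeat, here split into separate low and high byte tables so both halves can be loaded with the same index register. This assumes the & and >> operators behave as in C; row_lo and row_hi are illustrative names.

```jasm
section code, "main", $8000
{
    // Low bytes of the 25 row addresses of a screen at $0400.
row_lo:
    repeat 25
    {
        define byte = ($0400 + @i * 40) & $ff
    }
    // High bytes of the same addresses.
row_hi:
    repeat 25
    {
        define byte = ($0400 + @i * 40) >> 8
    }
}
```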

Break Loops

The break statement can be used in any form of loop to exit it prematurely.

repeat 5
{
    if (@i == 3)
    {
        break
    }
    define byte = @i
}

This defines numbers 0, 1 and 2 in memory since the loop is exited before the end condition is reached.

Macros

Macros are a way to generate adaptable and reusable code blocks. A macro is a function type object which generates its contents where it is invoked. This is much like an inline function or a template function in C++.

macro memset(.addr, .size)
{
    ldx #.size - 1
    {
        sta .addr,x
        dex
        bpl @loop
    }
}

This is a simple macro to generate a loop to clear a block of memory. The arguments are put into the local constants .addr and .size when the macro is invoked.

lda #0
memset(data, sizeof(data))

// ...

reserve byte[55] data

A powerful feature is that the macro can change its behavior based on its arguments. What if the size to clear is 2? A loop wouldn't be very efficient in that case. The macro can be changed to solve this more efficiently.

macro memset(.addr, .size)
{
    if (.size < 4)
    {
        repeat .size
        {
            sta .addr + @i
        }
    }
    elif (.size < 129)
    {
        ldx #.size - 1
        {
            sta .addr,x
            dex
            bpl @loop
        }
    }
    else
    {
        static_assert(false, "memset doesn't support larger sizes... yet.")
    }
}

For sizes below 4, the stores are unrolled. Sizes up to 128 get a loop, and larger sizes trigger the assert. The macro can be extended to handle all sizes optimally, and then you will never need to write a memory clear loop again!

Macros can also be locally defined.

{
    macro .write2(.addr)
    {
        sta .addr
        sta .addr + 1
    }
    lda #0
    .write2(ptr1)
    .write2(ptr2)
    .write2(ptr3)
}

Macros are first class objects and therefore they can be stored as constants or variables. They can also be sent as arguments to functions or macros. This enables code injection in macros, using other macros as arguments.

macro print_char()
{
    jsr CHROUT
}

macro print_text(.text, .printer)
{
    ldx #0
    {
        lda .text,x
        beq @continue
        .printer()
        inx
        bne @loop
    }
}

print_text(text1, print_char)
print_text(text2, print_char)
rts

define byte[] text1 = { "wow", 0 }
define byte[] text2 = { "cool", 0 }

Now this print_text macro is generic enough to be reused even with other types of output devices.

Macros can return values if desired. That makes it possible for macros to also act as pure functions if no instructions are generated within them. Values are returned with the return statement that takes an optional expression to return as argument. This macro calculates the screen address based on a screen base address and screen coordinates.

macro screen_pos(.start, .x, .y)
{
    return .start + .x + 40*.y
}
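Since screen_pos only computes a value, its result can be used directly as an instruction operand. A sketch, assuming the screen is at $0400:

```jasm
macro screen_pos(.start, .x, .y)
{
    return .start + .x + 40*.y
}

section code, "main", $8000
{
    // Put character 1 at column 10, row 5.
    lda #1
    sta screen_pos($0400, 10, 5) // $0400 + 10 + 5*40 = $04d2
    rts
}
```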

Macros can be called recursively. This example calculates the first ten numbers of the Fibonacci sequence using a recursive macro.

macro fibonacci(.value)
{
    if (.value == 0) {
        return 0
    }
    if (.value == 1) {
        return 1
    }
    return fibonacci(.value - 1) + fibonacci(.value - 2)
}

repeat 10
{
    define byte = fibonacci(@i)
}

A macro can be ended early without returning any value like this.

macro send_large(.size)
{
    if (.size < 4) {
        return;
    }
    jsr send_it
}

Note that the return statement must end with a semicolon if no value is to be returned, otherwise an expression is expected.

Alignment

Sometimes code or data needs to be aligned to avoid extra cycles spent on crossing page boundaries. This is done with the align statement.

align 256

This advances the program counter to the next address where the address modulo 256 is 0. In code sections, the gap is padded with zeros by default. To pad with something else, supply an additional fill byte argument.

align 256, 55

This will fill up the gap to next page boundary with the number 55.

Sometimes it is necessary to ensure that a block of code or data is within a 256 byte page to avoid extra cycles being spent on indexing or branch instructions. The following macro can be used to verify that a memory block is within alignment.

// Check that the code/data between .start and .end won't cross an alignment border.
macro assert_within_alignment(.start, .end, .alignment)
{
    static_assert(.end - .start >= 0, "Content wraps around")
    static_assert(.end - .start <= .alignment, "Content too large for alignment")
    if (.end - .start > 0) {
        static_assert(.start/.alignment == (.end - 1)/.alignment, "Content crosses alignment")
    }
}
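As a usage sketch, the macro above can guard a lookup table that is read with lda table,x, since an indexed read that crosses a page costs an extra cycle. The table name and section address are illustrative, and the sketch assumes assert_within_alignment from above is defined.

```jasm
section code, "main", $80f0
{
    // Eight single-bit masks, read with lda bit_masks,x.
bit_masks:
    define byte[] = { 1, 2, 4, 8, 16, 32, 64, 128 }
bit_masks_end:

    // Fails the build if the table ever straddles a 256 byte page.
    assert_within_alignment(bit_masks, bit_masks_end, 256)
}
```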

When a code block must be within a 256 byte page to work, the following macro can be placed earlier in the same section. It automatically adds fill bytes at the macro location until the code block is within the alignment boundaries.

// Makes sure the area between .start and .end is within an alignment block.
// It adds bytes of data at the macro position to make sure the area isn't
// crossing any alignment boundaries.
macro align_within_page(.start, .end, .alignment, .fill_byte)
{
    const .size = .end - .start
    static_assert(.relative_position <= .start, "Macro must be placed earlier in memory")
    static_assert(.size >= 0, "Content wraps around")
    static_assert(.size <= .alignment, "Content too large for alignment")
    if (.size > 0) {
        // This is quite tricky because the code cannot make calculations based
        // on the aligned position since it will cause variable oscillation. Instead
        // it is using the distance to the area to calculate the unaligned position
        // and base the alignment size on that.
        const .distance = .start - .relative_position
        const .start_before = * + .distance
        const .end_before = .start_before + .size
        const .is_aligned = .start_before/.alignment == (.end_before - 1)/.alignment
        if (!.is_aligned) {
            repeat .alignment - modulo(.start_before, .alignment) {
                define byte = .fill_byte
            }
        }
    }
.relative_position:
}
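A usage sketch, assuming align_within_page from above is defined. Without padding, the two-instruction loop would start at $80fe and straddle the page boundary, so every taken bne would cost an extra cycle. The macro should insert two fill bytes and move the loop to $8100. Using $ea, the 6502 nop opcode, as fill keeps the padding harmless if execution runs through it. The section address and label names are illustrative.

```jasm
section code, "main", $80fc
{
    // Pads with nop bytes until the loop below stays within one page.
    align_within_page(wait_start, wait_end, 256, $ea)
    ldx #0
wait_start:
    {
        dex
        bne @loop
    }
wait_end:
    rts
}
```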