Garter – assembly

gasm

Garter Assembly (gasm) is the intermediate language of the Garter toolchain. Each Garter Assembly instruction closely mirrors a real machine-code instruction while remaining platform agnostic. Unlike many other assembly languages, gasm gives separate instructions separate names, so there's no wondering which opcode a given move corresponds to.

Writing a simple program

At a bare minimum, a *.gasm assembly file contains an executable section, an entry point, and some instructions. The simplest Garter Assembly program looks something like this:

section executable
entry start:
    immediate ar 0
    syscall exit ar

Data, Sections, and Branching

Garter Assembly has an extremely lax syntax. Whitespace and line breaks are not significant, symbols like commas are entirely optional, and most language keywords are full-word names, so you don't have to look up cryptic shorthand.

You can write inline comments with // and multiline comments that begin with /* and end with */.
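
As a small sketch of both points, the program from Writing a simple program could also be written as follows. The comma placement is an assumption based on commas being optional; the instructions themselves are unchanged.

// exit immediately with status 0
section executable
entry start:
    immediate ar, 0   /* the comma is optional
                         and purely cosmetic */
    syscall exit ar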

Sections

When your program is loaded into a computer's memory, that computer needs to know what parts of the program are executable, writable, and readable, and that's the job of sections. In Garter Assembly, we declare our sections with the section keyword, followed by the permissions that section should have: readable, writable, and executable.
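
For illustration, a file with separate sections for constants, mutable data, and code might be laid out like this. Listing several permissions after one section keyword, separated by spaces, is an assumption, and the labels greeting and counter are hypothetical. Data definitions themselves are covered under Data.

section readable
greeting u8 "hi" 0

section readable writable
counter u32 0

section executable
entry start:
    immediate ar 0
    syscall exit ar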

Variables and Labels

Instructions and data are really the same thing, and labels tell us (and the assembler) where to find them. Labels are nicknames for addresses: numbers that we give names so that

our code is easier to read, and
we don't have to manually input and constantly update hard-coded addresses.

Under the hood, labels just get replaced during assembly with the numerical positions they represent. They may contain any alphanumeric characters plus $ and _, but they may not begin with a digit, and they can't reuse any keyword of the Garter Assembly language. To declare a label, we just use the name outside of an instruction. Optionally, you can follow a label definition with a : for readability, but it is not required.

entry is a special label that is expected by the linker to build an executable; it tells the linker where the instruction pointer should be when your program is first launched.

If you'd like multiple labels to refer to the same location, just put multiple labels back to back, like entry and start in the Writing a simple program example.
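
A short sketch of these rules, using hypothetical label names: entry and start share one address, _never$reached shows the extra characters a label may contain, and finish is declared without the optional colon.

section executable
entry start:              // two labels for the same address
    immediate ar 0
    goto finish
_never$reached:           // $ and _ are allowed in label names
    immediate ar 1        // skipped by the goto above
finish                    // the trailing colon is optional
    syscall exit ar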

Data

To define non-instruction data in Garter, you say the type:

unsigned integer types (u8, u16, u32, u64) or uint for the largest available
signed integer types (i8, i16, i32, i64) or int for the largest available
floating point types (f32, f64) or float for the largest available

and then you just list the values off. Data may be listed in the following formats:

hexadecimal (prefix 0x0 or postfix 0h notation)
decimal (0)
floating point (0.0 with an optional postfix 0.0f)
string data (using single ', double ", or tick ` quotes)

If you wanted to define a C-style string in memory, that would look something like

cstring u8 "Hello, World!\n" 0
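
The listing below sketches a few more definitions in the formats above. All label names are hypothetical, and mixing formats within one list of values is an assumption.

section readable
primes   u8  2 3 5 7 11          // decimal values, one byte each
masks    u16 0xFF00 0x00FF       // prefix hexadecimal
limit    u32 100h                // postfix hexadecimal (256 in decimal)
ratios   f32 0.5 1.25f           // floating point, without and with the f postfix
greeting u8  'Hello' 0           // string data followed by a terminating zero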

Logic and Branching

On most machines, branching is done by performing a compare, which sets the zero and carry flags in the CPU, and then executing a conditional-branch instruction which, given a label, will either jump or do nothing depending on those flags. In Garter Assembly, we first compare the values of two registers, then use the appropriate branch instruction to jump (or not jump) to a given label. The compare instruction only accepts registers as arguments.

Here's a list of all the branch instructions and what they do:

goto (goto label always)
if (if a == b, goto label)
not (if a != b, goto label)
gt (if a > b, goto label)
lt (if a < b, goto label)
ge (if a >= b, goto label)
le (if a <= b, goto label)
gotor (goto address in register)
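
As a rough sketch, a countdown loop could be written as follows. The text above does not spell out the mnemonic of the compare instruction, so compare is an assumption here, as is the label name loop.

section executable
entry start:
    immediate cr 3      // loop counter
    immediate br 1      // step
    immediate dr 0      // value to stop at
loop:
    move ar cr
    sub ar br           // ar = ar - br
    move cr ar          // the counter is now one lower
    compare cr dr       // assumed mnemonic; registers only
    not loop            // if cr != dr, goto loop
    syscall exit cr     // cr is 0 once the loop falls through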

Instruction Set & Syntax

Most assembly instructions are going to act upon either a label or a register. Garter Assembly gives us just a few registers to work with:

general purpose registers ar, br, cr, and dr
the stack-pointer register sr

Move Instructions

All instructions are written as instruction {base} {source}. To get data into or out of the general purpose registers, you can use the move instructions:

move (register) (register): copy the value of the source register to the base register
load (register) (register): copy the value at the source address into the base register; optionally, you can specify the size like load u16 ar br
store (register) (register): copy the value of the base register into the source address; optionally, you can specify the size like store u8 ar br
immediate (register) (label or literal): set the value of the base register to the literal value or label address
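
To make the base and source roles concrete, here is a sketch that copies a 16-bit value from one location to another. The labels original and backup are hypothetical, and the placement of the optional size follows the load u16 ar br form above.

section readable writable
original u16 0x1234
backup   u16 0

section executable
entry start:
    immediate br original   // br = address of original
    load u16 ar br          // ar = 16-bit value stored at that address
    immediate br backup     // br = address of backup
    store u16 ar br         // write the 16-bit value in ar to backup
    move cr ar              // cr = ar (register-to-register copy)
    immediate ar 0
    syscall exit ar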

Stack Instructions

push (register): pushes the value of the base register onto the stack
pop (register): puts the value at the top of the stack into the base register
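
Because a stack is last-in, first-out, popping into the registers in the same order they were pushed swaps the two values. A minimal sketch:

section executable
entry start:
    immediate ar 1
    immediate br 2
    push ar             // stack: 1
    push br             // stack: 1 2 (2 is on top)
    pop ar              // ar = 2
    pop br              // br = 1
    immediate ar 0
    syscall exit ar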

Integer Maths

The result of any arithmetic operation is placed in the ar register.

add (base + source)
sub (base - source)
mul (base * source)
div (base / source, remainder will be in dr)
left (base << source)
right (base >> source)
and (base & source)
or (base | source)
xor (base ^ source)
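
For example, dividing 17 by 5 and then combining the results could look like the sketch below. It assumes truncating integer division and uses ar as the base register, so the operand and the result share a register.

section executable
entry start:
    immediate ar 17
    immediate br 5
    div ar br           // ar = 17 / 5 = 3, dr = remainder 2
    move cr dr          // keep the remainder
    immediate br 2
    left ar br          // ar = 3 << 2 = 12
    add ar cr           // ar = 12 + 2 = 14
    syscall exit ar     // exit code 14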

Floating-point Maths

fadd (base + source)
fsub (base - source)
fmul (base * source)
fdiv (base / source)

Number conversion

Each of these accepts one register as an argument.

fload (bitwise move from base to floating-point equivalent register)
fstore (bitwise move to base from floating-point equivalent register)
fcast (cast an integer from base to a floating-point value in the floating-point equivalent register)
fint (cast a floating-point value to an integer in the equivalent register)
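
The sketch below multiplies 2 by 3 using the floating-point instructions. It assumes that floating-point instructions name a general purpose register to refer to its floating-point equivalent, and that the result of fmul lands in the floating-point equivalent of ar, mirroring the integer rule. Neither detail is confirmed above.

section executable
entry start:
    immediate ar 2
    immediate br 3
    fcast ar            // floating-point equivalent of ar = 2.0
    fcast br            // floating-point equivalent of br = 3.0
    fmul ar br          // assumed: the product ends up in ar's floating-point equivalent
    fint ar             // ar = 6 under the assumptions above
    syscall exit ar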

System Interrupts

Garter Assembly takes a few common system calls and gives us wrappers so we can write the logic once, and it will put the arguments in the right order for various architectures so your code is neater and cross-platform. Different syscalls expect different arguments. NOTE: the syscall keyword is the only one that does not directly map to a specific instruction or set of instructions in native machine code; it is more akin to a preprocessor directive.

syscall open: filename (register or label), mode flags (literal or register); puts a file descriptor in ar or negative on failure
syscall close: file descriptor (register); puts 0 in ar or negative on failure
syscall read: file descriptor (register), buffer (label or register), max amount (literal or register); puts number of bytes read in ar or -1 on failure
syscall write: file descriptor (register), buffer (label or register), max amount (literal or register); puts number of bytes written in ar or -1 on failure
syscall break: address (literal or register); if the value is 0, puts the current program break address in ar; otherwise, sets the program break to that address and puts the new break address in ar, or -1 on failure
syscall exit: exit code (literal or register); terminates the process with the specified exit code
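
Putting a few of these together, a sketch of a hello-world program might look like this. It assumes a POSIX-like target where file descriptor 1 is standard output and that \n is stored as a single byte; the label message is hypothetical.

section readable
message u8 "Hello, World!\n"

section executable
entry start:
    immediate ar 1                 // file descriptor 1: standard output (assumption)
    syscall write ar message 14    // the string is 14 bytes long
    syscall exit 0                 // exit code given as a literal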

If you'd prefer to do things the "old fashioned" way, you can use the interrupt keyword. Keep in mind that interrupt takes no arguments on 64-bit systems, but on some 32-bit systems requires an interrupt code (literal) as an argument.

Offsets, Structs, and the Preprocessor

Basic arithmetic (+, -, *, and /) may be done by the preprocessor on literal values and label addresses inline within instructions, and you can use the preprocessor #define directive to give a literal value a name. So if you wanted to access a value from within a C struct

typedef struct {
    uint32_t bar;
    uint8_t baz;
} foo_t;

that might look like

#define foo_bar 0
#define foo_baz 4
load u8 ar foo_label+foo_baz // baz is a uint8_t, so load a single byte

Note that only literal values and labels can have arithmetic performed on them.
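
A fuller sketch of the same idea reads both fields of a foo_t stored at the hypothetical label foo_label. It routes each address through a register with immediate, in case load only accepts the register operands listed under Move Instructions. The offsets assume the C compiler places bar at byte 0 and baz at byte 4, and that two data definitions written back to back are also adjacent in memory.

#define foo_bar 0
#define foo_baz 4

section readable
foo_label u32 1000     // bar
          u8  7        // baz, assumed to follow directly at offset 4

section executable
entry start:
    immediate br foo_label+foo_bar
    load u32 cr br                   // cr = foo.bar
    immediate br foo_label+foo_baz
    load u8 ar br                    // ar = foo.baz
    syscall exit ar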