.. Mars Documentation
Copyright (C) 2011 Matt Giuca
.. This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
.. This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
.. You should have received a copy of the GNU General Public License
along with this program. If not, see .
.. _ref-isa:
Instruction set architecture
============================
.. sectionauthor:: Matt Giuca
Mars internally compiles source code down to "Mars internal representation"
(Mars IR) for processing and code generation. Mars IR is a very high-level
instruction set (more abstract than, say, the Java virtual machine), but still
is broken down much further than the Mars source code.
Very limited processing takes place before Mars source code is converted into
Mars IR. Syntax checking and type checking is done on the Mars source text or
abstract syntax tree formats, and then it is converted into Mars IR. All of
the subsequent analysis and code generation is done on Mars IR. All backends
use the Mars IR format as input, except the pretty printer. Mars IR has the
following properties:
* Preserves Mars types: Functions in Mars IR have the same headers as in Mars
itself. Therefore, all types are the same, and Mars IR is fully aware of
type polymorphism. Similarly, it is aware of CGCs, algebraic data types and
constructors.
* Flattened execution: There are no statements or expressions, only
instructions. Instructions are similar to statements, but they do not nest;
they typically perform the role of a single expression, and assign the
result to a new variable. Arguments to instructions are usually local
variable names (for example, most instructions cannot refer to global
variables; only special instructions designed for this purpose).
* Control flow graph: Functions are stored internally as a set of basic
blocks. Each basic block is a sequence of instructions without any branches,
and the last instruction in the block (the "terminator") is a special
instruction that can return from the function or branch to another block.
This is similar to labels-and-jumps assembly code, except that jumps between
blocks are explicit; it is not possible to "fall" from one block to the
next (and as a corollary, it is not possible to jump into the middle of a
block).
* Explicit "return" variable: Each function has a specific local variable
designated the "return" variable. The result of the function is the value of
this variable when the function ends.
* Explicit entry and exit block: Each function has a specific block designated
the "entry" block, which is the block that is executed initially. Perhaps
less conventionally, there is a specific block designated the "exit" block.
There is no "return" instruction; the function execution ends when control
falls out of the "exit" block.
* Static single assignment: All local variables may be assigned at exactly one
place (statically) in the code. If a variable needs to be "re-assigned,"
instead a new variable must be created with a qualified name (for example,
if `x` is already assigned, assign `x:1`). A special "phi" instruction is
provided to allow a variable to take on the value of one of many source
variables, to deal with branches and loops which assign variables.
An informal textual representation of Mars IR is "Mars assembly" (MAS). The
:program:`marsc` program can output MAS representation using
:option:`--backend=asm `. This is "informal" as it doesn't
use the official instruction names (it tries to look more like Mars syntax),
and there is no parser for MAS; it's just a debugging format.
Internally, Mars IR is referred to as "CFG representation" (control-flow
graph), due to the main data structure used. The ``ast_cfg`` module is
responsible for converting parsed Mars source code (in AST or abstract syntax
tree form) into Mars IR, as well as doing some semantic checks such as
initialised variable checks. There is no specific binary representation for
Mars IR.
What follows is a description of each of the Mars IR instructions, which
appear within basic blocks within functions and CGCs. They are separated into
three categories, which determine the context in which the instructions are
allowed.
``phi`` instruction
-------------------
The ``phi`` instruction is a special instruction that may only appear at the
beginning of a basic block, before any other instructions. Each block may have
zero or more phi instructions.
:samp:`phi {dest} {sources}`
Copies the value of a variable in `sources` to `dest`, depending on which
block the execution just came from. `sources` is a list of (`block` -
`variable`) pairs, which MUST contain an entry for each predecessor block
to the current block. At runtime, the block that was executing previously
before jumping to the current block will be selected from the list, and
the corresponding `variable` will be chosen, and its value copied into
`dest`.
This instruction is necessary due to the static single assignment form of
Mars code. Since it is not possible to assign the same variable in two
places (e.g., `a`), the code must assign two different variables (e.g.,
`a:1` and `a:2`). In order to use the value of this variable no matter
where it was assigned, a ``phi`` should create another variable (e.g.,
`a:3`) from the original two variables.
Basic instructions
------------------
The basic instructions form the body of a basic block. Each block may have
zero or more basic instructions.
Many instructions accept an *atom*, which refers to one of the following:
* A variable name,
* A number (double-precision floating point), or
* A constant symbol (parameterless constructor). This MUST be the name of a
constant symbol, not a constructor function.
:samp:`nop`
Does nothing.
:samp:`mov {dest} {source}`
Copies the value of atom `source` to variable `dest`.
:samp:`ld_arraylit {dest} {sources}`
Creates a new array, with the values of the variables listed in `sources`
as its elements, and assigns it to `dest`. `sources` is a list of atoms.
:samp:`ld_cgc {dest} {cgc}`
Loads the value of a computable global constant `cgc` to `dest`. This may
execute the body of the CGC, which may cause side-effects or fail to
return.
:samp:`ld_field {dest} {source} {ctor} {index}`
Loads a field of an object, by index of a given constructor. Fields are
indexed by parameters of the constructor used to create the object (0 is
the first parameter, etc).
Copies the value of field number `index` from variable `source` to `dest`.
`source` MUST be a variable of an algebraic data type. It MUST be
statically known to have been constructed with constructor `ctor`. This
constructor MUST have more than `index` parameters.
This is a very low-level operation compared to Mars' :ref:`field reference
` expression, because it assumes that the
constructor is known. It is illegal to use this on an arbitrary variable
of an algebraic data type with multiple constructors, as the constructor
will not be known. To implement field reference, first, perform a
``switch`` on the constructor, then use ``ld_field`` within each branch of
the switch. ``ld_field`` is also used to implement variable bindings in
pattern matches, where the constructor is already known. (The Mars
compiler creates special Mars IR functions for high-level field access,
which internally perform the ``switch`` and ``ld_field`` described above.)
:samp:`fieldset {source} {ctor} {index} {value}`
Destructively modifies a field of an object, by index of a given
constructor. Fields are indexed by parameters of the constructor used to
create the object (0 is the first parameter, etc).
Copies the value of atom `value` to field number `index` of variable
`source`, mutating the object referenced by `source`.
`source` MUST be a variable of an algebraic data type. It MUST be
statically known to have been constructed with constructor `ctor`. This
constructor MUST have more than `index` parameters.
See the notes for ``ld_field``.
:samp:`fieldreplace {dest} {source} {ctor} {index} {value}`
Copies an object, changing a single field in the resulting object, by
index of a given constructor.
Fields are indexed by parameters of the constructor used to create the
object (0 is the first parameter, etc).
Creates a new object with constructor `ctor`. Initialises all fields of
the new object with the values of the fields in `source`, except for field
number `index`, which is initialised to the value of atom `value`.
Assigns the new object to `dest`.
`source` MUST be a variable of an algebraic data type. It MUST be
statically known to have been constructed with constructor `ctor`. This
constructor MUST have more than `index` parameters.
See the notes for ``ld_field``.
:samp:`new_closure {dest} {func} {args}`
Creates a new closure object from a closure template function `func`.
Curries arguments `args` into the template's closure vars, assigning the
result to `dest`. Constructs a new function object with the parameters and
result of the closure template.
`func` MUST be the name of a closure template.
`args` is a list of atoms containing the arguments. It MUST contain
the exact number of closure variables expected by `func`, and any
supplied arguments must be of correct types.
Ordinary functions cannot be converted into first-class function objects;
to accomplish this, a closure template with no closure variables must be
created, and loaded with ``new_closure`` and no `args`.
Therefore, backends SHOULD optimise for the special case where `args` is
empty, to avoid constructing a new closure object where possible.
:samp:`call {dest} {func} {args}`
Calls a function from local variable `func`, supplying arguments `args`
and assigning the result to `dest`.
`func` MUST be a variable containing a function value. `args` is a list of
atoms containing the arguments. It MUST contain the exact number of
arguments expected by `func`, of correct types.
The execution may cause side-effects or fail to return.
:samp:`call_ctor {dest} {ctor} {args}`
Calls constructor function `ctor`, supplying arguments `args` and
assigning the result to `dest`.
`ctor` MUST not be the name of a constant symbol.
`args` is a list of atoms containing the arguments. It MUST contain
the exact number of arguments expected by `ctor`, of correct types.
:samp:`call_global {dest} {func} {args}`
Calls global function `func`, supplying arguments `args` and assigning the
result to `dest`.
`func` MUST not be the name of a computable global constant or
constructor.
`args` is a list of atoms containing the arguments. It MUST contain
the exact number of arguments expected by `func`, of correct types.
The execution may cause side-effects or fail to return.
..
Terminator instructions
-----------------------
The terminator instructions may only appear at the end of a basic block, after
any other instructions. These instructions are responsible for control flow
decisions, about where to jump to after the current block. Each block must
have precisely one terminator instruction at the end.
:samp:`switch {control} {cases} [{default}]`
Conditional branch based on the value of the `control` variable. Matches
either numbers (if `control` has type :type:`Num`) or algebraic data
type constructors. `cases` is a list of (`num-or-constructor-name` -
`block`) pairs.
If `control` matches one of the `cases`, branches to that basic block.
Otherwise, branches to `default` (which is a block ID).
`default` is optional, but MUST NOT be omitted unless the `control` is an
algebraic type, and the `cases` cover all of the type's constructors
(in particular, it MUST NOT be omitted for switches over numbers).
Two cases MUST NOT have the same case value.
This is a very low-level operation compared to Mars' :ref:`switch
` statement, because it cannot bind variables or
match recursively. The Mars compiler handles variable bindings using the
``ld_field`` instruction. See :ref:`switch factoring `
for the rather complicated algorithm that performs this translation.
:samp:`cond_branch {if} {then} {else}`
Conditional branch based on the value of the `if` atom, which MUST be
an :type:`Num`. If `if` is not equal to 0, branches to the `then` block.
Otherwise, branches to the `else` block.
:samp:`branch {target}`
Unconditionally branches to the `target` block.
:samp:`null_terminator`
Does nothing. Note that this **does not** imply a fall-through to the
subsequent block. This instruction MUST NOT appear in any block other than
a) the function's designated "exit block", or b) a block which is
statically unreachable. In the case of the exit block, this instruction
signals the end of the function's execution.