.. Mars Documentation Copyright (C) 2011 Matt Giuca .. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. .. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. .. You should have received a copy of the GNU General Public License along with this program. If not, see . .. _ref-isa: Instruction set architecture ============================ .. sectionauthor:: Matt Giuca Mars internally compiles source code down to "Mars internal representation" (Mars IR) for processing and code generation. Mars IR is a very high-level instruction set (more abstract than, say, the Java virtual machine), but still is broken down much further than the Mars source code. Very limited processing takes place before Mars source code is converted into Mars IR. Syntax checking and type checking is done on the Mars source text or abstract syntax tree formats, and then it is converted into Mars IR. All of the subsequent analysis and code generation is done on Mars IR. All backends use the Mars IR format as input, except the pretty printer. Mars IR has the following properties: * Preserves Mars types: Functions in Mars IR have the same headers as in Mars itself. Therefore, all types are the same, and Mars IR is fully aware of type polymorphism. Similarly, it is aware of CGCs, algebraic data types and constructors. * Flattened execution: There are no statements or expressions, only instructions. Instructions are similar to statements, but they do not nest; they typically perform the role of a single expression, and assign the result to a new variable. Arguments to instructions are usually local variable names (for example, most instructions cannot refer to global variables; only special instructions designed for this purpose). * Control flow graph: Functions are stored internally as a set of basic blocks. Each basic block is a sequence of instructions without any branches, and the last instruction in the block (the "terminator") is a special instruction that can return from the function or branch to another block. This is similar to labels-and-jumps assembly code, except that jumps between blocks are explicit; it is not possible to "fall" from one block to the next (and as a corollary, it is not possible to jump into the middle of a block). * Explicit "return" variable: Each function has a specific local variable designated the "return" variable. The result of the function is the value of this variable when the function ends. * Explicit entry and exit block: Each function has a specific block designated the "entry" block, which is the block that is executed initially. Perhaps less conventionally, there is a specific block designated the "exit" block. There is no "return" instruction; the function execution ends when control falls out of the "exit" block. * Static single assignment: All local variables may be assigned at exactly one place (statically) in the code. If a variable needs to be "re-assigned," instead a new variable must be created with a qualified name (for example, if `x` is already assigned, assign `x:1`). A special "phi" instruction is provided to allow a variable to take on the value of one of many source variables, to deal with branches and loops which assign variables. An informal textual representation of Mars IR is "Mars assembly" (MAS). The :program:`marsc` program can output MAS representation using :option:`--backend=asm `. This is "informal" as it doesn't use the official instruction names (it tries to look more like Mars syntax), and there is no parser for MAS; it's just a debugging format. Internally, Mars IR is referred to as "CFG representation" (control-flow graph), due to the main data structure used. The ``ast_cfg`` module is responsible for converting parsed Mars source code (in AST or abstract syntax tree form) into Mars IR, as well as doing some semantic checks such as initialised variable checks. There is no specific binary representation for Mars IR. What follows is a description of each of the Mars IR instructions, which appear within basic blocks within functions and CGCs. They are separated into three categories, which determine the context in which the instructions are allowed. ``phi`` instruction ------------------- The ``phi`` instruction is a special instruction that may only appear at the beginning of a basic block, before any other instructions. Each block may have zero or more phi instructions. :samp:`phi {dest} {sources}` Copies the value of a variable in `sources` to `dest`, depending on which block the execution just came from. `sources` is a list of (`block` - `variable`) pairs, which MUST contain an entry for each predecessor block to the current block. At runtime, the block that was executing previously before jumping to the current block will be selected from the list, and the corresponding `variable` will be chosen, and its value copied into `dest`. This instruction is necessary due to the static single assignment form of Mars code. Since it is not possible to assign the same variable in two places (e.g., `a`), the code must assign two different variables (e.g., `a:1` and `a:2`). In order to use the value of this variable no matter where it was assigned, a ``phi`` should create another variable (e.g., `a:3`) from the original two variables. Basic instructions ------------------ The basic instructions form the body of a basic block. Each block may have zero or more basic instructions. Many instructions accept an *atom*, which refers to one of the following: * A variable name, * A number (double-precision floating point), or * A constant symbol (parameterless constructor). This MUST be the name of a constant symbol, not a constructor function. :samp:`nop` Does nothing. :samp:`mov {dest} {source}` Copies the value of atom `source` to variable `dest`. :samp:`ld_arraylit {dest} {sources}` Creates a new array, with the values of the variables listed in `sources` as its elements, and assigns it to `dest`. `sources` is a list of atoms. :samp:`ld_cgc {dest} {cgc}` Loads the value of a computable global constant `cgc` to `dest`. This may execute the body of the CGC, which may cause side-effects or fail to return. :samp:`ld_field {dest} {source} {ctor} {index}` Loads a field of an object, by index of a given constructor. Fields are indexed by parameters of the constructor used to create the object (0 is the first parameter, etc). Copies the value of field number `index` from variable `source` to `dest`. `source` MUST be a variable of an algebraic data type. It MUST be statically known to have been constructed with constructor `ctor`. This constructor MUST have more than `index` parameters. This is a very low-level operation compared to Mars' :ref:`field reference ` expression, because it assumes that the constructor is known. It is illegal to use this on an arbitrary variable of an algebraic data type with multiple constructors, as the constructor will not be known. To implement field reference, first, perform a ``switch`` on the constructor, then use ``ld_field`` within each branch of the switch. ``ld_field`` is also used to implement variable bindings in pattern matches, where the constructor is already known. (The Mars compiler creates special Mars IR functions for high-level field access, which internally perform the ``switch`` and ``ld_field`` described above.) :samp:`fieldset {source} {ctor} {index} {value}` Destructively modifies a field of an object, by index of a given constructor. Fields are indexed by parameters of the constructor used to create the object (0 is the first parameter, etc). Copies the value of atom `value` to field number `index` of variable `source`, mutating the object referenced by `source`. `source` MUST be a variable of an algebraic data type. It MUST be statically known to have been constructed with constructor `ctor`. This constructor MUST have more than `index` parameters. See the notes for ``ld_field``. :samp:`fieldreplace {dest} {source} {ctor} {index} {value}` Copies an object, changing a single field in the resulting object, by index of a given constructor. Fields are indexed by parameters of the constructor used to create the object (0 is the first parameter, etc). Creates a new object with constructor `ctor`. Initialises all fields of the new object with the values of the fields in `source`, except for field number `index`, which is initialised to the value of atom `value`. Assigns the new object to `dest`. `source` MUST be a variable of an algebraic data type. It MUST be statically known to have been constructed with constructor `ctor`. This constructor MUST have more than `index` parameters. See the notes for ``ld_field``. :samp:`new_closure {dest} {func} {args}` Creates a new closure object from a closure template function `func`. Curries arguments `args` into the template's closure vars, assigning the result to `dest`. Constructs a new function object with the parameters and result of the closure template. `func` MUST be the name of a closure template. `args` is a list of atoms containing the arguments. It MUST contain the exact number of closure variables expected by `func`, and any supplied arguments must be of correct types. Ordinary functions cannot be converted into first-class function objects; to accomplish this, a closure template with no closure variables must be created, and loaded with ``new_closure`` and no `args`. Therefore, backends SHOULD optimise for the special case where `args` is empty, to avoid constructing a new closure object where possible. :samp:`call {dest} {func} {args}` Calls a function from local variable `func`, supplying arguments `args` and assigning the result to `dest`. `func` MUST be a variable containing a function value. `args` is a list of atoms containing the arguments. It MUST contain the exact number of arguments expected by `func`, of correct types. The execution may cause side-effects or fail to return. :samp:`call_ctor {dest} {ctor} {args}` Calls constructor function `ctor`, supplying arguments `args` and assigning the result to `dest`. `ctor` MUST not be the name of a constant symbol. `args` is a list of atoms containing the arguments. It MUST contain the exact number of arguments expected by `ctor`, of correct types. :samp:`call_global {dest} {func} {args}` Calls global function `func`, supplying arguments `args` and assigning the result to `dest`. `func` MUST not be the name of a computable global constant or constructor. `args` is a list of atoms containing the arguments. It MUST contain the exact number of arguments expected by `func`, of correct types. The execution may cause side-effects or fail to return. .. Terminator instructions ----------------------- The terminator instructions may only appear at the end of a basic block, after any other instructions. These instructions are responsible for control flow decisions, about where to jump to after the current block. Each block must have precisely one terminator instruction at the end. :samp:`switch {control} {cases} [{default}]` Conditional branch based on the value of the `control` variable. Matches either numbers (if `control` has type :type:`Num`) or algebraic data type constructors. `cases` is a list of (`num-or-constructor-name` - `block`) pairs. If `control` matches one of the `cases`, branches to that basic block. Otherwise, branches to `default` (which is a block ID). `default` is optional, but MUST NOT be omitted unless the `control` is an algebraic type, and the `cases` cover all of the type's constructors (in particular, it MUST NOT be omitted for switches over numbers). Two cases MUST NOT have the same case value. This is a very low-level operation compared to Mars' :ref:`switch ` statement, because it cannot bind variables or match recursively. The Mars compiler handles variable bindings using the ``ld_field`` instruction. See :ref:`switch factoring ` for the rather complicated algorithm that performs this translation. :samp:`cond_branch {if} {then} {else}` Conditional branch based on the value of the `if` atom, which MUST be an :type:`Num`. If `if` is not equal to 0, branches to the `then` block. Otherwise, branches to the `else` block. :samp:`branch {target}` Unconditionally branches to the `target` block. :samp:`null_terminator` Does nothing. Note that this **does not** imply a fall-through to the subsequent block. This instruction MUST NOT appear in any block other than a) the function's designated "exit block", or b) a block which is statically unreachable. In the case of the exit block, this instruction signals the end of the function's execution.