Procedures

Procedure definitions are program-level constructs for defining functions and other global constants.

In common usage, the term “procedure” is synonymous with “function definition”, because most procedures define functions. Mars also lets you use procedures to specify other global constants.

All procedures create a global constant, which is just a non-modifiable variable with a type and a value. The nature of the global constant depends on the form of the procedure.

There are two forms of procedure:

  • A procedure with parameters is a function definition. These create global constants which are functions. The global constant’s value may be passed around without executing the procedure. When the value is applied, the procedure is executed.
  • A procedure without parameters is a computable global constant. These create global constants with arbitrary types. At runtime, the value of the global constant is computed by executing the procedure (it is the value returned by the procedure).

Note

A computable global constant is not necessarily not a function. It may have a function type, if the procedure itself returns a function value.

Procedure definitions have the following syntax:

proceduredef ::=  "def" proc_head ":" NEWLINE
                      INDENT var_decl* statement+ DEDENT
                  | "def" proc_head "=" expression
                  | "def" "native_import" [native_alias] proc_head

Procedure header

The procedure header provides the public interface to the procedure:

proc_head   ::=  proc_name [proc_params] "::" ["io"] type
proc_params ::=  "(" [proc_param ("," proc_param)*] ")"
proc_param  ::=  param_name "::" type
proc_name   ::=  lower_identifier
param_name  ::=  lower_identifier

The procedure header specifies the following information:

  • The name of the procedure.
  • Whether the procedure has parameters (ie. whether it is a function definition or a computable global constant definition).
  • The name and type of each formal parameter.
  • Whether the procedure is allowed to perform I/O side-effects.
  • The return type of the procedure.

Functions may optionally have zero or more parameters.

Computable global constants

A procedure without parameters (a computable global constant) has a head of the following form:

def f :: t

This defines a global constant of type t. The constant’s value is the result of evaluating the procedure. The time at which the procedure is evaluated is unspecified. It may be evaluated once at load time, once when it is first used, or upon each usage (depending on the implementation).

It is a compile-time error for a computable global constant to use the io declaration.

Note

Because of the fact that the evaluation time is undefined, computable global constant definitions should not perform side-effects or produce errors. If a zero-parameter procedure is to perform side-effects, it should be declared as a function with zero arguments (a nullary function), thereby making its evaluation time explicit – this is why io is forbidden in constants.

Function definitions

A procedure with parameters (a function definition) has a head of the following form:

def f(a1 :: t1, a2 :: t2, ..., an :: tn) :: t

where n >= 0.

This defines a global constant of type (t1, t2, ..., tn) -> t. The constant’s value is a function object which executes the body of the procedure when all arguments are applied.

Unlike computable global constants, the body of a function definition is not executed until it is explicitly applied.

A function with zero arguments is a nullary function.

Note

It is legal, and useful, to define a nullary function (with a head of the form def f() :: t). This is distinct from a computable global constant, because it will have type () -> t, not just t, and will require explicit application, of the form f().

The names a1, a2, ..., an are the names of the function’s formal parameters. These names are not part of the function’s interface (they make no difference to the caller). They are used in the function body, as the names of local-scoped variables of the specified type, and take the value of the corresponding actual parameter at runtime.

It is a compiler error for two parameters to have the same name.

A function may be declared with the io keyword, in the following form:

def f(a1 :: t1, a2 :: t2, ..., an :: tn) :: io t

where n >= 0.

This defines a global function of type (t1, t2, ..., tn) -> io t. The body of such a function will be permitted to perform I/O effects.

Procedure body

There are two ways to define a procedure – as a single expression, or with a full body.

The “full body” syntax is the most general, and allows for procedures with local variables, and an arbitrary number of statements. It is written in the form:

def f(args) :: [io] rettype:
    declaration 1
    ...
    declaration n

    statement 1
    ...
    statement n

The “single expression” syntax is a short-hand notation, only allowing procedures with no local variables, and a single expression to be evaluated. It is written in the form:

def f(args) :: [io] rettype = expression

This is syntactic sugar for:

def f(args) :: [io] rettype:
    return expression

We say that a procedure’s body is “in an I/O context” if and only if the procedure was declared with the io keyword. It follows that all of the statements and expressions within such a procedure are “in an I/O context” as well. This has an implication for the static semantics of function application expressions.

Local variables

Mars distinguishes between local and global variables. All local variables are scoped to the entire procedure body (there is no block scope). A name refers to a local variable if and only if it:

  • Is a formal parameter, or
  • Is explicitly declared with the var keyword, or
  • Appears on the left-hand side of an assignment statement, or
  • Appears in a pattern.

Local variables may be given the same name as a global constant or function. In this case, the local variable shadows the global constant of the same name.

Local variables may optionally be explicitly declared. For simplicity, all declarations must appear before any statements. Declarations have the following syntax:

var_decl ::=  "var" var_name "::" type NEWLINE
var_name ::=  lower_identifier

Note that formal parameters are considered local variables, and must not be re-declared. It is a compiler error if a variable is declared with the same name as a formal parameter, or if two local variables are declared with the same name.

It is currently an error to declare a local variable with a type which includes a type variable not found in the procedure header. This is generally not a useful thing (since type variables are monomorphic; see type unification), but if it is desired, the variable cannot be explicitly declared.

Local variables (other than parameters) are not initialised with any particular value. The compiler is required to guarantee that variables are not used uninitialised (see semantic errors).

Native imports

A native procedure is special type of procedure, the body of which was written in a language other than Mars (a so-called “foreign function”). Native procedures are typically written in C, but may be written in any language which can be compiled into executable native code in a shared library and conforms to the standard calling conventions of the platform (for other languages, C can typically used as a bridging language). Note that this tends to exclude C++, as it has a complex calling convention.

Native procedures must be explicitly imported before they can be used, using the native_import form of procedure declaration. This form of procedure declaration has no body (there is no ":" or "=" part), and allows an extra production:

native_alias ::=  string_literal

The proc_name of a native procedure serves two distinct purposes: as with all procedures, it defines the name of a global constant which contains the function, but it also denotes the name of the procedure to look up in the foreign library’s symbol table. For example:

def native_import puts(s :: CPtr) :: CInt

The above code defines a Mars function called puts. At link time, Mars will search for a function called “puts” in all of the linked libraries, which will serve as the implementation for this function.

If a native_alias is supplied, then that will serve as the name of the procedure to look up in the symbol table instead. This allows you to define a function with a different Mars name than it has in the symbol table, which can be necessary if the name is already used in the Mars program, or the name is an illegal Mars identifier (e.g., if it starts with an uppercase letter). For example:

def native_import "def" ghi(x :: CInt) :: CInt

The above code defines a Mars function called ghi. At link time, Mars will search for a function called “def” in all of the linked libraries.

It is a type error for a native procedure to have a non-native parameter or return type. Native procedures should be defined with the same number and types of arguments as the function’s original definition in the foreign language, but behaviour is undefined if the wrong number of arguments are listed (as Mars cannot check how many arguments are expected).

Native procedures which do not take arguments must be declared as zero-arity functions (with parentheses), not computable global constants. Of course, it is possible to write a normal computable global constant which calls the native function, so it is still possible to cache the result of a native function if desired.

Warning

Use of native procedures must be handled with care, because they are not type-safe at all. Even if a function is declared with the correct number of arguments, Mars cannot guarantee type safety. If the function is expecting a pointer in any argument position, you must pass a valid pointer of the correct type, or behaviour is undefined.

It is currently not possible to declare a native function with a different number or type of parameters to an earlier declaration, including by the Mars runtime itself! A key example is the C standard printf function, which can theoretically be declared with any number of parameters, but which the Mars runtime already declares with one parameters. Therefore, it is not possible to declare printf with more than one parameters.

It is recommended that all native procedures be named starting with two underscores (__), and that a wrapping procedure be supplied which takes proper Mars types as arguments, converts them to CInt as required, calls the native procedure, converts the result to a safe Mars type, and returns it, to avoid exposing low-level error-prone code to the rest of the program.

Note

It is a linker error if the native function cannot be found in any linked object file. In mars, this error will be shown at load time. However, marsc does not check for the presence of functions; the linker error will surface when you link the program into an executable.

Procedure evaluation

The evaluation of procedures is the heart of the Mars execution model. Mars procedures are evaluated eagerly, and arguments are passed by-value.

Global constant references

If a global constant is named in a variable reference expression, the expression must evaluate to a first-class value.

If the constant is a computable global constant, that value must be the result of evaluating the procedure (which should be a constant [1]). The time at which the procedure is evaluated, and the number of times it is evaluated, is unspecified, and left up to the implementation. Implementations may (but are not limited to) choose one or more of the following evaluation strategies:

  • Evaluate some computable global constants at compile time [2].
  • Evaluate all computable global constants at program load time.
  • Evaluate each computable global constant when it is first referenced, and memoize the result.
  • Evaluate each computable global constant every time it is referenced.

If the constant was defined with a function definition, the expression’s value is a function object which will call the specified function when applied with all parameters.

Function objects and currying

A function object is an opaque first-class value of type (t1, t2, ..., tm) -> t, where m >= 0. Internally, function objects have n formal parameters, the first n - m of which are bound to actual values, and the remaining m of which are unbound (where 0 <= m <= n).

Since Mars features partial function application, a mechanism known as currying takes place. This mechanism is as follows:

When a function is partially applied (using the ... notation), it is applied to i parameters, where i <= m. No evaluation takes place. Instead, the result is of the application is the given function object, but with its first i unbound formal parameters bound to the given actual parameters. The result has type (ti+1, ti+2, ..., tm) -> t.

When a function is fully applied, it must be applied to exactly m parameters. The result of the application is the evaluation of the function with the final m formal parameter bound to the given actual parameters. The result has type t.

Parameter binding

Functions have n formal parameters. A formal parameter is a special local variable, which is bound to a value before the execution of the function’s body. The value which a formal parameter is bound to is called the actual parameter.

Formal parameters may be bound well in advance of the actual evaluation of the function, due to the currying mechanism outlined above. The bound values must be stored in an implementation-defined manner.

As Mars is a call-by-value language, actual parameters are evaluated before being bound to formal parameters. The binding process is to first evaluate the actual parameter expression, and then bind its value to the formal parameter name. The binding is the same as for an assignment statement – it is shallow, and the formal parameter has the same identity as the actual parameter.

The call-by-value mechanism guarantees that direct modification of the value will not affect the caller. Specifically, if the callee re-assigns one of its formal parameters with a new value, that change shall not affect the original value from which the parameter was derived. Nor shall it affect the binding in the callee’s function object (so subsequent uses of the same function object will keep the original binding).

Note

This does not hold up for indirect modification of a value. Mars allows impure constructs such as array element and field update. Since the formal parameter has the same identity as the original actual parameter, making destructive updates to a record or array in the formal parameter will also update the actual parameter, along with any other aliases.

Body execution

The evaluation of a procedure is a matter of executing each statement in the procedure’s body sequentially, according to the execution rules for that kind of statement, until a return statement is encountered.

The return statement’s expression is evaluated, and its value is used as the result of the expression which caused the procedure to be evaluated.

Semantic errors

There are several cases which cause a procedure to be rejected at compilation time:

  • If there is any control-flow path which results in a local variable being referenced before being assigned, the compiler MUST reject the procedure and the program. This doesn’t apply to formal parameters, which are always bound.
  • If there is any control-flow path which does not result in a return statement being encountered, the compiler MUST reject the procedure and the program.
  • If a non-io procedure calls a function with an io type.

Warning

The second rule is perhaps over-zealous, as it forbids control-flow paths which raise errors but do not return values. A more detailed analysis should be proposed, including noting which procedures always produce errors.

Footnotes

[1]The result of a computable global constant should be a constant, but might not be, if the procedure performs side-effects such as user input. The behaviour of computable global constants with side-effects is undefined, so for all intents and purposes, “computable global constants” may indeed simply be treated as constants.
[2]Implementors are to keep in mind that compilation must terminate in a finite amount of time, while computable global constants may not terminate. Therefore, this evaluation strategy must only be employed conservatively, if evaluation is guaranteed to terminate.