.. Mars Documentation
   Copyright (C) 2008 Matt Giuca
   11/12/2008

.. This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, either version 3 of the License, or
   (at your option) any later version.

.. This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

.. You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.

.. index:: keyword: native_import
.. _ref-procedures:
.. _native_import:

**********
Procedures
**********

.. sectionauthor:: Matt Giuca <matt.giuca@gmail.com>

Procedure definitions are program-level constructs for defining functions and
other global constants.

In common usage, the term "procedure" is synonymous with "function
definition", because most procedures define functions. Mars also lets you
use procedures to specify other global constants.

All procedures create a global constant, which is just a non-modifiable
variable with a type and a value. The nature of the global constant depends on
the form of the procedure.

There are two forms of procedure:

* A procedure with parameters is a **function definition**. These create
  global constants which are functions. The global constant's value may be
  passed around without executing the procedure. When the value is applied,
  the procedure is executed.
* A procedure without parameters is a **computable global constant**. These
  create global constants with arbitrary types. At runtime, the value of the
  global constant is computed by executing the procedure (it is the value
  returned by the procedure).

.. note::

   A computable global constant is not necessarily *not a function*. It may
   have a function type, if the procedure itself returns a function value.

Procedure definitions have the following syntax:

.. productionlist::
   proceduredef: "def" `proc_head` ":" NEWLINE
               :     INDENT `var_decl`* `statement`+ DEDENT
               : | "def" `proc_head` "=" `expression`
               : | "def" "native_import" [`native_alias`] `proc_head`

.. index:: keyword: def
.. _ref-procedure-header:
.. _def:
.. _io:

Procedure header
================

The procedure header provides the public interface to the procedure:

.. productionlist::
   proc_head: `proc_name` [`proc_params`] "::" ["io"] `type`
   proc_params: "(" [`proc_param` ("," `proc_param`)*] ")"
   proc_param: `param_name` "::" `type`
   proc_name: `lower_identifier`
   param_name: `lower_identifier`

The procedure header specifies the following information:

* The name of the procedure.
* Whether the procedure has parameters (ie. whether it is a function
  definition or a computable global constant definition).
* The name and type of each formal parameter.
* Whether the procedure is allowed to perform I/O side-effects.
* The return type of the procedure.

Functions may optionally have zero or more parameters.

Computable global constants
---------------------------

A procedure without parameters (a **computable global constant**) has a head
of the following form::

    def f :: t

This defines a global constant of type :type:`t`. The constant's value is the
result of evaluating the procedure. The time at which the procedure is
evaluated is unspecified. It may be evaluated once at load time, once when it
is first used, or upon each usage (depending on the implementation).

It is a compile-time error for a computable global constant to use the
:keyword:`io` declaration.

.. note::

   Because of the fact that the evaluation time is undefined, computable
   global constant definitions should not perform side-effects or produce
   errors. If a zero-parameter procedure is to perform side-effects, it should
   be declared as a function with zero arguments (a nullary function), thereby
   making its evaluation time explicit -- this is why :keyword:`io` is
   forbidden in constants.

Function definitions
--------------------

A procedure with parameters (a **function definition**) has a head of the
following form::

    def f(a1 :: t1, a2 :: t2, ..., an :: tn) :: t

where *n* >= 0.

This defines a global constant of type :type:`(t1, t2, ..., tn) -> t`. The
constant's value is a function object which executes the body of the procedure
when all arguments are applied.

Unlike computable global constants, the body of a function definition is not
executed until it is explicitly applied.

A function with zero arguments is a **nullary function**.

.. note::

   It is legal, and useful, to define a nullary function (with a head of the
   form ``def f() :: t``). This is distinct from a computable global constant,
   because it will have type :type:`() -> t`, not just :type:`t`, and will
   require explicit application, of the form ``f()``.

The names *a1*, *a2*, ..., *an* are the names of the function's **formal
parameters**. These names are not part of the function's interface (they make
no difference to the caller). They are used in the function body, as the names
of local-scoped variables of the specified type, and take the value of the
corresponding actual parameter at runtime.

It is a compiler error for two parameters to have the same name.

A function may be declared with the :keyword:`io` keyword, in the following
form::

    def f(a1 :: t1, a2 :: t2, ..., an :: tn) :: io t

where *n* >= 0.

This defines a global function of type :type:`(t1, t2, ..., tn) -> io t`. The
body of such a function will be permitted to perform I/O effects.

.. _ref-io-context:

Procedure body
==============

There are two ways to define a procedure -- as a single expression, or with a
full body.

The "full body" syntax is the most general, and allows for procedures with
local variables, and an arbitrary number of statements. It is written in the
form::

    def f(args) :: [io] rettype:
        declaration 1
        ...
        declaration n

        statement 1
        ...
        statement n

The "single expression" syntax is a short-hand notation, only allowing
procedures with no local variables, and a single expression to be evaluated.
It is written in the form::

    def f(args) :: [io] rettype = expression

This is syntactic sugar for::

    def f(args) :: [io] rettype:
        return expression

We say that a procedure's body is "in an I/O context" if and only if the
procedure was declared with the :keyword:`io` keyword. It follows that all of
the statements and expressions within such a procedure are "in an I/O context"
as well. This has an implication for the static semantics of :ref:`function
application expressions <ref-function-application-expr>`.

.. index:: keyword: var
.. _var:

Local variables
---------------

Mars distinguishes between local and global variables. All local variables are
scoped to the entire procedure body (there is no block scope). A name refers
to a local variable if and only if it:

* Is a formal parameter, or
* Is explicitly declared with the :keyword:`var` keyword, or
* Appears on the left-hand side of an assignment statement, or
* Appears in a pattern.

Local variables may be given the same name as a global constant or function.
In this case, the local variable shadows the global constant of the same name.

Local variables may optionally be explicitly declared. For simplicity, all
declarations must appear before any statements. Declarations have the
following syntax:

.. productionlist::
   var_decl: "var" `var_name` "::" `type` NEWLINE
   var_name: `lower_identifier`

Note that formal parameters are considered local variables, and *must not* be
re-declared. It is a compiler error if a variable is declared with the same
name as a formal parameter, or if two local variables are declared with the
same name.

.. Launchpad bug #482947:

It is currently an error to declare a local variable with a type which
includes a type variable not found in the procedure header. This is generally
not a useful thing (since type variables are monomorphic; see :ref:`type
unification <ref-type-unification>`), but if it is desired, the variable
cannot be explicitly declared.

Local variables (other than parameters) are not initialised with any
particular value. The compiler is required to guarantee that variables are not
used uninitialised (see :ref:`semantic errors <ref-semantic-errors>`).

.. _ref-native-imports:

Native imports
==============

A *native procedure* is special type of procedure, the body of which was
written in a language other than Mars (a so-called "foreign function"). Native
procedures are typically written in C, but may be written in any language
which can be compiled into executable native code in a shared library and
conforms to the standard calling conventions of the platform (for other
languages, C can typically used as a bridging language). Note that this tends
to exclude C++, as it has a complex calling convention.

Native procedures must be explicitly imported before they can be used, using
the :keyword:`native_import` form of procedure declaration. This form of
procedure declaration has no body (there is no ``":"`` or ``"="`` part), and
allows an extra production:

.. productionlist::
   native_alias: `string_literal`

The :token:`proc_name` of a native procedure serves two distinct purposes: as
with all procedures, it defines the name of a global constant which contains
the function, but it also denotes the name of the procedure to look up in the
foreign library's symbol table. For example::

    def native_import puts(s :: CPtr) :: CInt

The above code defines a Mars function called :func:`puts`. At link time, Mars
will search for a function called "``puts``" in all of the linked libraries,
which will serve as the implementation for this function.

If a :token:`native_alias` is supplied, then that will serve as the name of
the procedure to look up in the symbol table instead. This allows you to
define a function with a different Mars name than it has in the symbol table,
which can be necessary if the name is already used in the Mars program, or the
name is an illegal Mars identifier (e.g., if it starts with an uppercase
letter). For example::

    def native_import "def" ghi(x :: CInt) :: CInt

The above code defines a Mars function called :func:`ghi`. At link time,
Mars will search for a function called "``def``" in all of the linked
libraries.

It is a type error for a native procedure to have a non-native parameter or
return type. Native procedures *should* be defined with the same number and
types of arguments as the function's original definition in the foreign
language, but behaviour is undefined if the wrong number of arguments are
listed (as Mars cannot check how many arguments are expected).

Native procedures which do not take arguments **must** be declared as
zero-arity functions (with parentheses), **not** computable global constants.
Of course, it is possible to write a normal computable global constant which
calls the native function, so it is still possible to cache the result of a
native function if desired.

   .. warning::

      Use of native procedures must be handled with care, because they are not
      type-safe at all. Even if a function is declared with the correct number
      of arguments, Mars cannot guarantee type safety. If the function is
      expecting a pointer in any argument position, you must pass a valid
      pointer of the correct type, or behaviour is undefined.

      It is currently not possible to declare a native function with a
      different number or type of parameters to an earlier declaration,
      including by the Mars runtime itself! A key example is the C standard
      ``printf`` function, which can theoretically be declared with any number
      of parameters, but which the Mars runtime already declares with one
      parameters. Therefore, it is not possible to declare ``printf`` with
      more than one parameters.

      It is recommended that all native procedures be named starting with two
      underscores (``__``), and that a wrapping procedure be supplied which
      takes proper Mars types as arguments, converts them to :type:`CInt` as
      required, calls the native procedure, converts the result to a safe Mars
      type, and returns it, to avoid exposing low-level error-prone code to
      the rest of the program.

   .. note::

      It is a linker error if the native function cannot be found in any
      linked object file. In :program:`mars`, this error will be shown at load
      time. However, :program:`marsc` does not check for the presence of
      functions; the linker error will surface when you link the program into
      an executable.

.. _ref-procedure-evaluation:

Procedure evaluation
====================

The evaluation of procedures is the heart of the Mars execution model. Mars
procedures are evaluated eagerly, and arguments are passed by-value.

Global constant references
--------------------------

If a global constant is named in a :ref:`variable reference
<ref-variable-references>` expression, the expression must evaluate to a
first-class value.

If the constant is a computable global constant, that value must be the result
of evaluating the procedure (which should be a constant [#e1]_). The time at
which the procedure is evaluated, and the number of times it is evaluated, is
unspecified, and left up to the implementation. Implementations may (but are
not limited to) choose one or more of the following evaluation strategies:

* Evaluate some computable global constants at compile time [#e2]_.
* Evaluate all computable global constants at program load time.
* Evaluate each computable global constant when it is first referenced, and
  memoize the result.
* Evaluate each computable global constant every time it is referenced.

If the constant was defined with a function definition, the expression's value
is a function object which will call the specified function when applied with
all parameters.

Function objects and currying
-----------------------------

A function object is an opaque first-class value of type
:type:`(t1, t2, ..., tm) -> t`, where *m* >= 0.
Internally, function objects have *n* formal parameters, the first *n* - *m*
of which are bound to actual values, and the remaining *m* of which are
unbound (where 0 <= *m* <= *n*).

Since Mars features partial function application, a mechanism known as
**currying** takes place. This mechanism is as follows:

When a function is :ref:`partially applied <ref-function-application-expr>`
(using the ``...`` notation), it is applied to *i* parameters, where *i* <=
*m*. No evaluation takes place. Instead, the result is of the
application is the given function object, but with its first *i* unbound
formal parameters bound to the given actual parameters. The result has type
:type:`(ti+1, ti+2, ..., tm) -> t`.

When a function is :ref:`fully applied <ref-function-application-expr>`,
it must be applied to exactly *m* parameters.  The result of the application
is the evaluation of the function with the final *m* formal parameter bound to
the given actual parameters. The result has type :type:`t`.

Parameter binding
-----------------

Functions have *n* formal parameters. A **formal parameter** is a
special local variable, which is bound to a value before the execution of the
function's body. The value which a formal parameter is bound to is called the
**actual parameter**.

Formal parameters may be bound well in advance of the actual evaluation of the
function, due to the currying mechanism outlined above. The bound values must
be stored in an implementation-defined manner.

As Mars is a **call-by-value** language, actual parameters are evaluated
before being bound to formal parameters. The binding process is to first
evaluate the actual parameter expression, and then bind its value to the
formal parameter name. The binding is the same as for an assignment statement
-- it is shallow, and the formal parameter has the same identity as the actual
parameter.

The call-by-value mechanism guarantees that direct modification of the value
will not affect the caller. Specifically, if the callee re-assigns one of its
formal parameters with a new value, that change shall not affect the original
value from which the parameter was derived. Nor shall it affect the binding in
the callee's function object (so subsequent uses of the same function object
will keep the original binding).

.. note::

   This does *not* hold up for indirect modification of a value. Mars
   allows impure constructs such as array element and field update. Since the
   formal parameter has the same identity as the original actual parameter,
   making destructive updates to a record or array in the formal parameter
   will also update the actual parameter, along with any other aliases.

Body execution
--------------

The evaluation of a procedure is a matter of executing each statement in the
procedure's body sequentially, according to the execution rules for that kind
of statement, until a :keyword:`return` statement is encountered.

The :keyword:`return` statement's expression is evaluated, and its value is
used as the result of the expression which caused the procedure to be
evaluated.

.. _ref-semantic-errors:

Semantic errors
---------------

There are several cases which cause a procedure to be rejected at compilation
time:

* If there is any control-flow path which results in a local variable being
  referenced before being assigned, the compiler MUST reject the procedure and
  the program. This doesn't apply to formal parameters, which are always
  bound.
* If there is any control-flow path which does not result in a
  :keyword:`return` statement being encountered, the compiler MUST reject the
  procedure and the program.
* If a non-:keyword:`io` procedure calls a function with an :keyword:`io`
  type.

.. warning::

   The second rule is perhaps over-zealous, as it forbids control-flow paths
   which raise errors but do not return values. A more detailed analysis
   should be proposed, including noting which procedures *always* produce
   errors.

.. rubric:: Footnotes

.. [#e1] The result of a computable global constant *should* be a constant,
   but might not be, if the procedure performs side-effects such as user
   input. The behaviour of computable global constants with side-effects is
   undefined, so for all intents and purposes, "computable global constants"
   may indeed simply be treated as constants.
.. [#e2] Implementors are to keep in mind that compilation must terminate in a
   finite amount of time, while computable global constants may not terminate.
   Therefore, this evaluation strategy must only be employed conservatively,
   if evaluation is guaranteed to terminate.