.. Mars Documentation
   Copyright (C) 2008 Matt Giuca
   11/12/2008

.. This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, either version 3 of the License, or
   (at your option) any later version.

.. This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

.. You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.

.. _ref-types:

Types
=====

.. sectionauthor:: Matt Giuca <matt.giuca@gmail.com>

Mars is a strict statically typed language. Its type system is based on the
parametric Hindley-Milner system of Haskell (and before it, ML). All variables
and expressions have a known type at compile time.

In the traditional sense, a *data type* is a set of possible values. Data
types in Mars are exclusive -- every value belongs to exactly one type.
:type:`Num` and :type:`Array(Num)` are examples of data types. The values that
may be stored in variables of a given type are said to "inhabit" that type.

Mars also features *type constructors*, which are types with parameters.
:type:`Array` is an example of a type constructor. A *type constructor* is
still a type -- it is said to be "uninhabited".

Kinds
-----

To distinguish between different types of types, Mars has what is known as a
"kind system". This is similar to Haskell's kind system, with two important
differences.

Firstly, type constructors are *n*-ary, not unary. This means types take all
of their arguments at the same time, rather than implicitly currying as they
do in Haskell (so in Mars, ``T(a,b)`` is **not** equivalent to ``T(a)(b)``,
whereas in Haskell, it is).

Secondly, type constructors in Mars may be *variadic*. This means that a type
may take an arbitrary number of arguments, similar to a "var-args" function in
many programming languages.

For a general overview of kinds, see
`Kind (type theory) <http://en.wikipedia.org/wiki/Kind_(type_theory)>`_ on
Wikipedia.

Kind values take one of three forms:

1. :type:`*` is the kind of all data types.
2. :type:`(k1,k2,...,kn) -> k` is the kind of a type constructor, which takes
   exactly *n* types of kind :type:`k1`, :type:`k2`, ..., :type:`kn`,
   respectively, and produces a type of kind :type:`k`.
3. :type:`(k1,k2,...,kn,**) -> k` is the kind of a type constructor, which
   takes *n* or more types, and produces a type of kind :type:`k`. The first
   *n* types must be of kind :type:`k1`, :type:`k2`, ..., :type:`kn`,
   respectively, and any additional arguments must be of kind :type:`*`.

The kind of a type is used to distinguish between type constructors and data
types, and (for type constructors) the number and kind of their arguments.

Kinds are not directly visible in Mars. They are not specified in source code,
hence there is no syntax for them.
They are merely a behind-the-scenes mechanic, which is automatically computed
for each type. Note that it is possible to display the kind of a type [#k1]_.

.. note::

   While this general definition of the kind system allows type constructors
   with non-star argument and return kinds, the Mars language
   provides no way to create such types, nor are there any such built-in
   types. Hence, an implementation may simply represent type constructors in
   the less general form of an (*arity* :: Num, *isvariadic* :: Bool) pair.

Type Values
-----------

Type values have the following syntax:

.. productionlist::
   type: `function_type` | `primary_type`
   function_type: "(" [`type_list`] ")" "->" ["io"] `type`
                : | `primary_type` "->" `type`
   primary_type: `atom_type`
               :  | `primary_type` "(" [`type_list`] ["," "..."] ")"
   atom_type: `type_variable` | `type_name` | "->" ["io"]
   type_list: `type` ("," `type`)*
   type_variable: `lower_identifier`
   type_name: `upper_identifier`

Type values take one of the following forms:

1. A :token:`type name <type_name>` is an identifier beginning with an
   uppercase letter, and references a user-defined or built-in type
   constructor. Its kind is the kind of the referenced type constructor.
2. A :token:`type variable <type_variable>` is an identifier beginning with a
   lowercase letter, and represents an unknown type. Its kind is inferred from
   the context.
3. The token :token:`->` is the special builtin type :type:`->` (function
   type constructor). Its kind is :type:`(*, **) -> *`. Note that this form
   is rarely used.
4. Type application, of the form ``t(t1,t2,...,tn)``, or partial type
   application, of the form ``t(t1,t2,...,tn,"...")`` (where the final ``...``
   is a literal ":token:`...`" token). See :ref:`type-application`.
5. A function type, of the form ``(t1,t2,...,tn) -> t`` is syntactic sugar
   for the type ``->(t1,t2,...,tn,t)``. It represents a function which takes
   *n* arguments, ``t1`` through to ``tn``, as input, and returns a ``t``.
   Its kind is always :type:`*`. The :token:`->` operator is
   right-associative. Parentheses around the argument type are optional for
   functions of exactly one argument.
6. An I/O function type, of the form ``(t1,t2,...,tn) -> io t`` is syntactic
   sugar for the type ``->io(t1,t2,...,tn,t)``.

.. warning::

   The current Mars implementation does not support the ``...`` notation. It
   provides no way to perform partial type application. In light of the fact
   that type constructors cannot accept or produce non-star types, it is
   a useless feature.

.. _type-application:

Type application
++++++++++++++++

Type application is the application of zero or more type arguments to a type
constructor, producing a new type. Typically, type constructors have kind
:type:`(*,*,...,*) -> *`, and each type argument has kind :type:`*`,
resulting in a new type of kind :type:`*`. However, this is not the most
general case.

In general, type application takes the form ``t(t1, t2, ..., tn [, "..."])``.
If the application ends with "``...``", it is said to be **curried**.
Type ``t`` must have kind :type:`(k1, k2, ..., km [, **]) -> k`, or the type
application is a kind error, and the compiler must reject the program. If the
argument kinds of ``t`` end with "``**``", ``t`` is said to be **variadic**.

It must hold that *n* == *m*, with the following exceptions:

* If the application is curried, it must hold that *n* <= *m*.
* If ``t`` is variadic, it must hold that *n* >= *m*.
* If both the application is curried, and ``t`` is variadic, *n* may be any
  size.

For each ``ti``, where 1 <= *i* <= ``min(n, m)``, ``ti`` must have kind
:type:`ki`. Any additional arguments must have kind :type:`*`.

If the application is not curried, it is considered complete. The resulting
type has kind :type:`k`.

If the application is curried, additional arguments are required. The
resulting type has kind :type:`(kn+1, ..., km [, **]) -> k`. (That is, it is
a new type constructor, accepting all remaining arguments after the first
*n*).

Examples of Type Values
+++++++++++++++++++++++

* :type:`Num`, a built-in type of kind :type:`*`.
* :type:`List`, a type defined in the :mod:`prelude`, of kind
  :type:`* -> *` (that is, requiring one type argument).
* :type:`List(Num)`, the type :type:`Num` applied to type :type:`List`,
  producing a type of kind :type:`*`, which is a data type.
* :type:`->`, a special built-in type of kind :type:`(*, **) -> *`.
* :type:`->(Num, Num)`, two arguments, both :type:`Num`, applied to type
  :type:`->`, producing a type of kind :type:`*`.

  * A function which accepts an :type:`Num`, and produces an :type:`Num`.
* :type:`Num -> Num`, the same as above, with a more natural syntax.
* :type:`->io(Num, Num)`, a function which accepts an :type:`Num`, performs
  some I/O effect, and produces an :type:`Num`.
* :type:`Num -> io Num`, the same as above, with a more natural syntax.
* :type:`->(Num, ...)`, one argument of type :type:`Num`, applied to type
  :type:`->`, and curried, producing a type of kind :type:`(**) -> *`.
* :type:`a`, a type variable, of unknown kind (requires context).
* :type:`a -> Num`, a function type which accepts an argument of any type,
  and produces an :type:`Num`.
* :type:`->(a, b, c)`, three arguments applied to type :type:`->`, producing
  a type of kind :type:`*`.

  * A function which accepts two arguments, of types :type:`a` and
    :type:`b`, and produces a :type:`c`.
* :type:`(a, b) -> c`, the same as above, with a more natural syntax.
* :type:`a -> b -> c`, a function which accepts an :type:`a`, and
  produces a function which accepts a :type:`b`, and produces a :type:`c`.

  * This is a classic "curried style" function, as may be found in Haskell.
  * This type may also be written as :type:`->(a, ->(b, c))`.

.. _ref-type-unification:

Type unification
----------------

Unification is the main algorithm Mars uses for type checking and type
inference. For a general overview of unification, see
`Unification <http://en.wikipedia.org/wiki/Unification>`_ on Wikipedia.

When two or more values are expected to have the same type, their types are
unified. The specification explicitly states when types should be unified. If
the two types are the same, they successfully unify. Otherwise, they fail to
unify, and the program must be rejected (due to a type error).

For example, the type :type:`Num` will unify with the type :type:`Num`, but
fail to unify with the type :type:`Array(Num)`.

This is complicated by type variables. Each type variable is either **free**,
**bound** or **rigid**.

* A **free** type variable *a* will successfully unify with any type *t*, but
  once unified, *a* is bound to *t* for the entirety of the function.
* A **bound** type variable *a*, bound to type *t*, will unify with *s* if and
  only if *t* unifies with *s*.
* A **rigid** type variable *a* is never bound, and unifies only with *a*.

For example, the type :type:`a` (if *a* is a free type variable) will unify
with the type :type:`Num`, but result in *a* being bound to :type:`Num`. In
future unifications within the same function, :type:`a` will unify only with
:type:`Num`.

Type variables explicitly named in the header or body of a procedure are
**rigid**, and will not unify with any type other than themselves. For
example, this code is invalid::

 def to_int(x :: a) :: Num:
     return x                # Type error

It is invalid because the header specifies that the caller may pass an
argument of non-:type:`Num` type, but in that case, it won't be able to
return an :type:`Num`. Thus the type variable *a* is rigid; the procedure is
only valid if *a* is not unified with any other type.

Non-rigid type variables are introduced by implicitly-typed variables (for
type inference) or by expressions with polymorphic types (such as the empty
array literal, or global variables and functions with polymorphic types). For
example::

 def singleton(x :: Num) :: Array(Num):
     v = []
     return array_add(v, x)

In this example, the variable `v` is given the type :type:`Array(a)`, with
free variable *a*, when it is first assigned. During the call to
:func:`array_add`, *a* is unified with :type:`Num`, so `v` has type
:type:`Array(Num)`. It would be a type error to treat `v` as a different type
elsewhere, even though it hasn't got any data in it (once its type is bound,
it is bound for the entire body of the function; type variables in Mars are
*monomorphic*), as in the following example::

 def monomorphic(x :: Num) :: Array(Num):
     v = []
     r = array_add(v, x)
     w = array_add(v, [1])   # Type error
     return r

The unification rules are as follows (note all rules are symmetric):

* Type variable *a* unifies with type *t* with the rules as above (regardless
  of whether *t* is a type name, a type variable or a type application). If
  successful, this results in variable *a* becoming bound.
* Type name *x* unifies with type name *x*, and no other type name.
* Type names do not unify with type applications.
* Type application *t* ( *t1*, ..., *tn* ) unifies with type application *s*
  ( *s1*, ..., *sn* ) if and only if *t* unifies with *s* and *ti* unifies
  with *si* for all *i* <= *n*. Any bindings made in the recursive
  unifications apply.

Polymorphism in global constants
++++++++++++++++++++++++++++++++

The example above shows that type variables in Mars are monomorphic. This is
true only for local variables. As a special rule, global constants (including
functions and data constructors) in Mars are polymorphic, meaning they can be
given a different binding upon each use.

The type of each global constant is "generalised" by taking each type variable
in its original type, and universally quantifying it. A variable of type *t*
containing type variables *a1*, ..., *an* is generalised as "∀ *a1*, ...,
*an*. *t*"

A successful unification of a type with a universal quantification will *not*
cause the quantified variables to become bound, so they may be unified again
as a free variable.

The example above can be "fixed" by making `v` a global constant::

 def v :: Array(a) = []

 def monomorphic(x :: Num) :: Array(Num):
     r = array_add(v, x)
     w = array_add(v, [1])
     return r

Now `v` has type :type:`∀a. Array(a)`. The first unification between
:type:`∀a. Array(a)` and :type:`Array(Num)` succeeds without binding *a*.
Thus, the second unification between :type:`∀a. Array(a)` and
:type:`Array(Array(Num))` also succeeds.

The polymorphic / monomorphic distinction is important when dealing with
polymorphic functions. Consider::

 def twomaps(f :: a -> b, g :: a -> c, x :: Array(a)) :: Pair(Array(b), Array(c)):
     y = array_map(f, x)
     z = array_map(g, x)
     return Pair(y, z)

Contrast with::

 def twomaps(f :: a -> b, g :: a -> c, x :: Array(a)) :: Pair(Array(b), Array(c)):
     mymap = array_map
     y = mymap(f, x)
     z = mymap(g, x)     # Type error
     return Pair(y, z)

In the former example, :func:`array_map` is called twice. Its type is
:type:`∀t. ∀u. (t -> u, Array(t)) -> Array(u)`. On the first call, its first
argument is unified with :type:`a -> b`; hence it is treated as though it has
type :type:`(a -> b, Array(a)) -> Array(b)`, binding `y`'s type to
:type:`Array(b)`. However, this unification *does not* change the type of
:func:`array_map`. On the second call, its first argument is unified with
:type:`a -> c`; hence it is treated as though it has type :type:`(a -> c,
Array(a)) -> Array(c)`, binding `z`'s type to :type:`Array(c)`. The two uses
of :func:`array_map` treat it as though it has two different, incompatible
types, which is valid because the function itself is polymorphic.

In the latter example, :func:`array_map` is used only once, to assign to the
local (monomorphic) variable `mymap`. This gives `mymap` the type
:type:`(t -> u, Array(t)) -> Array(u)` (note the lack of universal
quantification -- the free type variables *t* and *u* are scoped to the whole
function). On the first call to `mymap`, its first argument is unified with
:type:`a -> b`; hence free variable *t* is bound to rigid variable *a* and
free variable *u* is bound to rigid variable *b*. This permanently changes the
type of `mymap` to :type:`(a -> b, Array(a)) -> Array(b)`. It is as if the
function contained the declaration::

 var mymap :: (a -> b, Array(a)) -> Array(b)

Therefore, on the second call to `mymap`, its first argument of type :type:`a
-> b` is unified with :type:`a -> c`, and the unification of rigid variables
*b* and *c* fails, producing a type error.

The bottom line is that once a function (or any other value) is assigned to a
local variable, it has a fixed type, even if it is not explicitly declared.

.. _ref-builtin-types:

Built-In Types
--------------

Mars defines several built-in types, which could not be written in the
language itself. The specific values that are classified by each type are
discussed in :ref:`ref-values`.

.. Figure out a way to not print the word "class" here.

.. type:: Num

Has kind :type:`*`. Values of this type are :ref:`floating-point numbers
<ref-value-num>`, represented as binary values of at least 64-bits (the
standard "double" format).

.. type:: Array(a)

Has kind :type:`* -> *`. Values of this type family are :ref:`arrays
<ref-value-array>` with elements of type *a*.

.. type:: ->

Has kind :type:`(*, **) -> *`. Values of this type family are pure
:ref:`functions <ref-value-function>` (not permitted to perform I/O effects).

This type constructor is variadic; it accepts one or more type arguments. Note
the special syntax: ``a -> b`` is syntactic sugar for ``->(a, b)``, ``(a, b)
-> c`` is syntactic sugar for ``->(a, b, c)``, and so on.

.. type:: ->io

Has kind :type:`(*, **) -> *`. Values of this type family are I/O
:ref:`functions <ref-value-function>` (permitted to perform I/O effects).

Note the special syntax: ``a ->io b`` is syntactic sugar for ``->io(a, b)``,
``(a, b) ->io c`` is syntactic sugar for ``->io(a, b, c)``, and so on.

.. index:: keyword: type
.. _ref-type-declarations:
.. _type:

Native types
------------

There are several additional built-in types which are called the *native
types*. These types are all prefixed with "C", and are compatible with the
equivalent C data type. These types are designed to help marshall data between
Mars code and code written in another language, such as C. It is meant to be
used only by foreign function interface code, not normal Mars code.

.. type:: CChar

Has kind :type:`*`. Values of this type are equivalent to C ``signed char``
values, represented as 8-bit signed integers on all supported platforms. Can
also represent ``unsigned char`` values.

.. type:: CShort

Has kind :type:`*`. Values of this type are equivalent to C ``short`` values,
represented as 16-bit signed integers on all supported platforms. Can also
represent ``unsigned short`` values.

.. type:: CInt

Has kind :type:`*`. Values of this type are equivalent to C ``int`` values,
and are typically 32-bit signed integers. Can also represent ``unsigned int``
values.

.. type:: CFloat

Has kind :type:`*`. Values of this type are equivalent to C ``float`` values,
represented as 32-bit IEEE-754 single-precision binary floating-point numbers
on all supported platforms.

.. type:: CDouble

Has kind :type:`*`. Values of this type are equivalent to C ``double`` values,
represented as 64-bit IEEE-754 double-precision binary floating-point numbers
on all supported platforms. Note that this is generally equivalent to the Mars
:type:`Num` type, but an explicit conversion is required anyway to ensure
compatibility with C code.

.. type:: CPtr

Has kind :type:`*`. Values of this type are equivalent to C ``void*`` values,
represented as machine-word-sized unsigned integers (typically 32 or 64 bits,
depending on the platform). This type can be used to represent pointers to any
native type, and functions are supplied for dereferencing pointers of each
type.

These types cannot be used for arithmetic, or in any other context where
a :type:`Num` is required (they are distinct types). However, a pair of
functions is provided to convert back and forth from :type:`Num` to each
native type, so it is possible to perform arithmetic that way.
As an exception, the functions :func:`cptr_add` and :func:`cptr_diff` are
provided to do direct arithmetic on pointers (since it would be unsafe to
convert pointers to floating-point numbers and perform arithmetic).

The integral types can represent either a signed or unsigned number; separate
conversion functions are provided for the signed and unsigned variants. Care
must be taken to use the correct variant when passing the value to or from a
native function (for example, if a C function returns an ``unsigned short``,
the :func:`cushort_num` function should be used to convert from the resulting
:type:`CShort` to a Mars :type:`Num`).

These have the special distinction of being the only types which may be passed
to and from native functions (see `native-imports`).

There is no literal notation for any of these types; they may only be created
via the function for converting a :type:`Num` to that type, or by calling a
native function. Additionally, :type:`CPtr` may be created by calling
:func:`malloc`.

Calling :func:`show` on these values produces a string of the form "<*type*
*value*>". The value is the signed decimal representation of the number,
except for :type:`CPtr`, which shows the value in hexadecimal, prefixed by
"``0x``".

Type Declarations
-----------------

Type declarations have the following syntax:

.. productionlist::
   typedef: "type" `type_name` [`type_params`] ":" NEWLINE
          :     INDENT `constructor`+ DEDENT
   type_params: "(" [`type_variable` ("," `type_variable`)*] ")"
   constructor: `ctor_name` [`ctor_params`]
   ctor_params: "(" [`ctor_param` ("," `ctor_param`)*] ")"
   ctor_param: [`param_name` "::"] `type`
   ctor_name: `upper_identifier`
   param_name: `lower_identifier`

Use of the typedef production will declare a new globally-available type name,
as well as a number of globally-available constructor function names. Types
must have a name (an identifier beginning with an uppercase letter), and at
least one constructor.

A type may optionally have zero or more type parameters. If a type does not
have parameters, its kind is :type:`*`. If a type has *n* parameters, its
kind is :type:`(*1, *2, ..., *n) -> *`. Any parameter variables may be used
in the types of the constructor parameters.

This statement only creates type constructors of the form
:type:`(*,*,...,*) -> *`. It is not possible to create a user-defined type
constructor which accepts or returns higher-kinded types, or variadic type
constructors.

.. note::

   Declaring a type with empty parentheses (eg. ``type Foo()``) is possible in
   this grammar, but **not** something you should ever want to do.

   It creates a type *constructor*, of kind :type:`() -> *`, which accepts
   *zero* arguments. This type cannot be used as a data type without
   accompanying parentheses wherever it is referred to.

   This is allowed in Mars, for more transparency in the kind system, but in
   practice, all nullary types should be declared without parentheses (eg.
   ``type Foo``), giving them the kind :type:`*`.

Types have one or more constructors. Each constructor has a globally-unique
name, and zero or more parameters. The constructor declares a
globally-available function of the same name, which takes the given arguments
as inputs, and returns a value of the type being declared. Each constructor
parameter may optionally have a name. No two parameters of a constructor may
have the same name. The same parameter name *may* appear in multiple
constructors, but it MUST have the same type in all constructors.

For example, the following type is illegal, as a constructor has two
parameters with the same name::

 type Foo:
     X(v :: Num, v :: Num)               # Error: Duplicate field name

The following type is valid, as the name `v` appears in multiple constructors,
but has the same type in all instances::

 type Foo:
     X(u :: Num, v :: Num)
     Y(v :: Num)

The following type is illegal, as the name `v` appears in multiple
constructors with differing types::

 type Foo:
     A(u :: Num, v :: Num)
     B(v :: Array(Num))                  # Error: Duplicate field name

.. note::

   A constructor without parameters is, in effect, declaring a global
   constant of this type. A type is an "enumeration type" if all of its
   constructors are nullary, and as such, it simply declares a fixed-size set
   of named constants.

   Note that, just as with the type itself, a constructor without parameters
   is distinct from a constructor with 0 parameters. Constructors with 0
   parameters require an empty function application, and are typically not
   useful.

The values that inhabit a user-defined type are specified in
:ref:`ref-value-userdef`.

.. rubric:: Footnotes

.. [#k1] The Mars interpreter will print out the kind of any type, using the
   :command:`:k` command.