.. Mars Documentation
Copyright (C) 2011 Matt Giuca
.. This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
.. This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
.. You should have received a copy of the GNU General Public License
along with this program. If not, see .
.. _ref-typedicts:
*****************
Type dictionaries
*****************
.. sectionauthor:: Matt Giuca
Mars has no type classes per se, but the built-in functions :func:`show` and
:func:`eq` are defined to work on any type. This means that polymorphic
functions need to know something about the types of their arguments in order
to perform the correct string conversion or equality check. There are three
main ways to deal with this problem:
1. Every object implicitly knows enough about its type to perform the
operation,
2. Polymorphic functions are told about the types which are unknown at compile
time when they are called,
3. All polymorphism is specialised at compile-time, and each specialised
version intrinsically knows how to handle the concrete types.
Approach #1 is used by object-oriented languages to perform dynamic method
lookup; since at best, containers can have heterogeneous subtypes, and at
worst, all objects are dynamically typed, each object must individually track
its own type (usually with a vtable). This approach is used by the built-in
Mars interpreter.
Approach #2 is used by the implementation of Haskell, and the Mars LLVM
backend. It is far more efficient than the first approach, but only works for
typeclasses with parametric polymorphism (note that object-oriented languages
use approach #1 to solve a different problem, dynamic method dispatch).
Approach #3 is used by C++ for templates (its equivalent of parametric
polymorphism), but can generate a lot of excess code and has significant
problems for modularity (requires code generation at call time).
The LLVM backend uses approach 2, passing type dictionaries (or in C++
terminology, vtables) to keep track of the implicit type classes :type:`Show`
and :type:`Eq` (in Haskell parlance).
Code generation
===============
To simplify backends which require type dictionaries, this entire system is
implemented as a source-to-source translation. Any Mars program can be
translated into another valid and equivalent Mars program which does not call
the :type:`show` or :type:`eq` built-ins, passing type dictionaries instead.
In other words, type dictionaries require no special features, except for
special functions for showing and comparing built-in types.
Hence, the type dictionary transformation produces normal Mars code, with one
exception: all new identifiers generated by the transformation contain a
colon, to avoid collisions with existing identifiers. (Mars source-to-source
transformations are allowed to introduce identifiers with colons, whereas user
code cannot.)
There are three parts to the code generation:
* Boilerplate built-in code. This code is (almost) the same for every program.
It includes the basic definition of the type dictionary, as well as type
dictionaries for the built-in types. It only differs from program to program
in its selection of which function arities to support (based on which
function arities the program uses).
* User type dictionary code. The compiler generates some unique code for each
user-defined type, comprising a type dictionary for that type.
* Function augmentation. All polymorphic functions and calls to polymorphic
functions are augmented (modified) to support type dictionary passing.
The type dictionary
===================
A *type dictionary* is a runtime object which provides information about a
particular type. In the current implementation, the only "information" we need
is an implementation of the :func:`show` and :func:`eq` functions for the
type. Therefore, a type "dictionary" is just a pair of :type:`a ->
Array(Num)` and :type:`(a, a) -> Num`. This is available to the generated
code as the special user-defined type::
type Dict(a):
Dict(show :: a -> Array(Num), eq :: (a, a) -> Num)
Note that type dictionaries correspond to *data types* only (see
:ref:`ref-types`). Hence, there is a type dictionary for the :type:`Num` data
type, and the :type:`Array(Num)` data type (and all other instantiations of
the :type:`Array` type), but there is no single type dictionary for the
:type:`Array` type constructor.
At compile-time, a *show* and an *eq* function, and a type-dictionary
constructor function is generated for each type *constructor*. For
non-parameterised types, this means two simple functions, and a computable
global constant for the dictionary.
For example, a user-defined type :type:`NumBox`::
type NumBox:
NumBox(Num)
would have the following type dictionary generated::
def show:NumBox(x :: NumBox) :: Array(Num):
var res :: Array(Num)
switch x:
case NumBox(f_0):
res = "NumBox("
res = array_concat(res, dict:Num.show(f_0))
res = array_concat(res, ")")
return res
def eq:NumBox(x :: NumBox, y :: NumBox) :: Num:
switch x:
case Pair(x_f_0):
switch y:
case Pair(y_f_0):
if dict:Num.eq(x_f_0):
pass
else:
return 0
return 1
def dict:NumBox :: Dict(NumBox) = Dict(show:NumBox, eq:NumBox)
The *show* and *eq* functions for types with parameters take
type dictionaries as arguments, which are used to recurse on the members of
the type. There is no constant type dictionary for such types; instead, there
is a type dictionary *constructor function* which takes a type dictionary for
each parameter, and curries it into the *show* and *eq* functions, producing a
dictionary.
For example, the prelude type :type:`Pair`::
type Pair(a, b):
Pair(a, b)
would have the following type dictionary generated::
def show:Pair(dict:a :: Dict(a), dict:b :: Dict(b), x :: Pair(a, b)) :: Array(Num):
var res :: Array(Num)
switch x:
case Pair(f_0, f_1):
res = "Pair("
res = array_concat(res, dict:a.show(f_0))
res = array_concat(res, ", ")
res = array_concat(res, dict:b.show(f_1))
res = array_concat(res, ")")
return res
def eq:Pair(dict:a :: Dict(a), dict:b :: Dict(b), x :: Pair(a, b), y :: Pair(a, b)) :: Num:
switch x:
case Pair(x_f_0, x_f_1):
switch y:
case Pair(y_f_0, y_f_1):
if dict:a.eq(x_f_0, y_f_1):
pass
else:
return 0
if dict:b.eq(x_f_1, y_f_1):
pass
else:
return 0
return 1
def dict:Pair(dict:a :: Dict(a), dict:b :: Dict(b)) :: Dict(Pair(a, b)):
return Dict(show:Pair(dict:a, dict:b, ...), eq:Pair(dict:a, dict:b, ...))
Notice that the *show* and *eq* functions recurse by looking up the *show* and
*eq* functions in the appropriate type dictionaries. The :func:`dict:Pair`
function can be used to create a type dictionary for a Pair, given the type
dictionaries of the Pair's type parameters.
Dictionaries of built-in types
------------------------------
The built-in type :type:`Num` has special built-in functions :func:`show:Num`
and :func:`eq:Num` which convert an number into a string, and compare two
numbers, respectively. Its type dictionary is defined as::
def dict:Num :: Dict(Num) = Dict(show:Num, eq:Num)
The built-in type :type:`Array` has non-primitive *show* and *eq* functions::
def show:Array(dict:a :: Dict(a), x :: Array(a)) :: Array(Num):
var res :: Array(Num)
var i :: Num
res = "["
if 0 < array_length(x):
res = array_concat(res, dict:a.show(array_ref(x, 0)))
i = 1
while i < array_length(x):
res = array_concat(array_concat(res, ", "), dict:a.show(array_ref(x, i)))
i = i + 1
res = array_concat(res, "]")
return res
def eq:Array(dict:a :: Dict(a), x :: Array(a), y :: Array(a)) :: Num:
var i :: Num
i = 0
if eq:Num(array_length(x), array_length(y)):
pass
else:
return 0
while i < array_length(x):
if dict:a.eq(array_ref(x, i), array_ref(y, i)):
pass
else:
return 0
i = i + 1
return 1
def dict:Array(dict:a :: Dict(a)) :: Dict(Array(a)):
return Dict(show:Array(dict:a, ...), eq:Array(dict:a, ...))
There is a special built-in function :func:`show:func` of type :type:`a ->
Array(Num)`, which displays a function in an implementation-defined manner.
Functions have a non-primitive *eq* function::
def eq:func(x :: a, y :: a) :: Num:
return error("Functions cannot be compared for equality")
The type dictionary for functions is complicated by the fact that they can
have any arity. A separate *show*, *eq* and dictionary needs to be generated
for each arity (these are generated as needed, for functions of any arity).
For example, if there is any function of arity 2, the following code is
generated::
def dict:func:2(dict:a :: Dict(a), dict:b :: Dict(b), dict:c :: Dict(c)) :: Dict((a, b) -> c):
return Dict(show:func, eq:func)
Augmenting polymorphic functions
================================
Any polymorphic function in the program may be augmented with type
dictionaries (one type dictionary per type variable). In practice, we avoid
unnecessarily requesting a type dictionary for any type variable that does not
require one using static analysis (see :ref:`ref-typedict-demand`).
.. note::
Data constructor functions are *never* augmented (as they never need
to reference these methods). It doesn't complicate the
transformation to make this exception (in fact it would complicate
it to *not* make this exception, since data constructors are
special-cased).
Every user-defined function (not including the special functions generated for
the type dictionaries themselves) is augmented with an extra argument at the
start, for each type variable in the function's head which will statically
have :func:`show` or :func:`eq` called on it by this function or its
descendants. For example, the function::
def remove_all(a :: Array(a), e :: a) :: Array(a):
var i :: Num
i = array_length(a) - 1
while i >= 0:
if eq(array_ref(a, i), e):
a = array_remove(a, i)
i = i - 1
return a
would be augmented as follows::
def remove_all(dict:a :: Dict(a), a :: Array(a), e :: a) :: Array(a)
This means that any Mars code can produce the type dictionary for any type,
whether it is known or unknown.
Calls to polymorphic functions are augmented with type dictionaries for each
type parameter. This is trivial for known types. For example, the call::
remove_all([1, 2, 3, 2], 2)
is augmented as follows::
remove_all(dict:Num, [1, 2, 3, 2], 2)
All references to the :func:`show` and :func:`eq` functions are replaced with
a field access to the type dictionary of the known type. For example, the full
augmentation for :func:`remove_all` is::
def remove_all(dict:a :: Dict(a), a :: Array(a), e :: a) :: Array(a):
var i :: Num
i = array_length(a) - 1
while i >= 0:
if dict:a.eq(array_ref(a, i), e):
a = array_remove(a, i)
i = i - 1
return a
Augmenting computable global constants
--------------------------------------
Computable global constants with a polymorphic type will be turned into
functions in order to augment them with type dictionaries. This has a
practical implication that caching (which implementations are allowed, but not
required to perform on CGCs) will be disabled. This is similar to what happens
in Haskell (but only if the type annotation is given, and if a type class is
used). For example::
def my_show :: a -> Array(Num):
print_string("my_show\n")
return show
This is a CGC which returns the :func:`show` function (the
:func:`print_string` simply makes it easy to see when the function is being
called; it could represent an expensive computation). An implementation which
dynamically looks up the type of its argument, such as the built-in
interpreter, can have a single :func:`show` implementation which is returned
by :func:`my_show`. The body of :func:`my_show` therefore only has to be
called once.
An implementation using type dictionaries, however, must return a
type-specific implementation of :func:`show` during the body of
:func:`my_show` (since all function values are monomorphic). Therefore,
:func:`my_show` is augmented as follows::
def my_show(dict:a :: Dict(a)) :: a -> Array(Num):
print_string("my_show\n")
return dict:a.show
Now every reference to :func:`my_show` will execute its body; as it is not a
CGC it will not be cached. Consider a function which calls it::
print_string(my_show(4))
put_char('\n')
print_string(my_show([1,2,3]))
put_char('\n')
.. highlight:: none
Using a caching implementation without type dictionaries, this would print::
my_show
4
[1, 2, 3]
.. highlight:: mars
It would be augmented as follows::
print_string(my_show(dict_Int)(4))
put_char('\n')
print_string(my_show(dict_Array(dict_Int))([1,2,3]))
put_char('\n')
.. highlight:: none
The augmented code prints::
my_show
4
my_show
[1, 2, 3]
.. highlight:: mars
Lifting polymorphic functions
=============================
Recall that Mars functions are only polymorphic if they are global; local
functions are always monomorphic (see :ref:`ref-type-unification`). This
neatly simplifies the type dictionary transformation, because only global
functions need to be augmented. First-class function objects never take type
dictionaries, because they will have already had their types resolved. A
polymorphic function which is "lifted" into a first-class function object will
have all of its type dictionaries curried into it at the time of lifting; this
applies to partial application.
For this reason, it is never necessary to curry some but not all type
dictionaries during partial application -- all types are resolved during
partial application, regardless of whether any arguments of that type have
been supplied. This also applies to function arguments; we can assume that the
functions have already had their type dictionaries curried and therefore do
not need to supply type dictionaries when calling these functions.
Consider the example :func:`twomaps`, discussed in
:ref:`ref-type-unification`. The type error can be corrected by introducing
two separate local variables, one for each usage of :func:`array_map`.
Therefore, this version has no type error::
def twomaps(f :: a -> b, g :: a -> c, x :: Array(a)) :: Pair(Array(b), Array(c)):
var mymap1 :: (a -> b, Array(a)) -> Array(b)
var mymap2 :: (a -> c, Array(a)) -> Array(c)
var y :: Array(b)
var z :: Array(c)
mymap1 = array_map
mymap2 = array_map
y = mymap1(f, x)
z = mymap2(g, x)
return Pair(y, z)
This would be augmented as follows::
def twomaps(dict:a :: Dict(a), dict:b :: Dict(b), dict:c :: Dict(c), f :: a -> b, g :: a -> c, x :: Array(a)) :: Pair(Array(b), Array(c)):
var mymap1 :: (a -> b, Array(a)) -> Array(b)
var mymap2 :: (a -> c, Array(a)) -> Array(c)
var y :: Array(b)
var z :: Array(c)
mymap1 = array_map(dict:a, dict:b, ...)
mymap2 = array_map(dict:a, dict:c, ...)
y = mymap1(f, x)
z = mymap2(g, x)
return Pair(y, z)
.. note::
This isn't a very good example, as in practice, the type variables
'a' and 'b' don't need type dictionaries, so the analysis would not
augment this function at all.
Note that the only augmentation happens during the *lifting* of
:func:`array_map` from a polymorphic global function to a monomorphic local
function. The local functions `mymap1` and `mymap2` have unaugmented types;
those functions have the type dictionaries encoded into them. Calls to these
functions are also unaugmented. Similarly, the function arguments *f* and *g*
are unaugmented and do not need to be given type dictionaries; the caller of
:func:`twomaps` must supply functions which have already had type dictionaries
curried in, if necessary.
Dictionaries for free variables
===============================
As discussed in :ref:`ref-types`, a type variable is either **free**,
**bound** or **rigid**. Rigid type variables are those which have type
dictionary arguments supplied. We can look up the type dictionary of a bound
type variable by dereferencing it (resulting in either a rigid type variable
or a concrete type). Now **free** variables typically become bound by the end
of a function's type analysis, so very rarely show up in code analysis (after
type checking), but there are valid situations in which a variable remains
free, including:
* When a constructor is used which does not include one of the typedef's type
parameters (e.g., :func:`Nil`, the empty list, has type :type:`List(a)`,
and may be manipulated without ever resolving the type :type:`a`).
* The empty array, ``[]``. Like the empty list, this has type
:type:`Array(a)`.
* The result of :func:`error`, of type :type:`a`. Obviously, such situations
cause the program to terminate, but they still need to type check and
compile properly, even if the result type is not unified with anything (for
example, if the result is passed to :func:`show`, a type dictionary will be
required).
Let us take the example of the empty array::
def empty_array :: Array(a) = []
def print_empty() :: Num:
r = empty_array
return print_value(r)
.. note::
:func:`print_value` is defined as ``print_string(show(val))``.
The local variable *r* has type :type:`a`, where *a* is a free type variable.
In augmenting this function, it is unclear which type dictionary to pass for
*a*. Clearly, calling :func:`show` on a value with an unknown type cannot
possibly resolve to a concrete implementation of *show*. Fortunately, it
doesn't have to -- there will never be a concrete value at runtime whose type
is that of a free variable (in this case, the array has no elements, so
:func:`show:Array` will never call its ``dict:a.show`` method).
.. note::
This would be untrue if Mars supported type classes, or provided any
built-ins which produce a value that depends on the type (such as
:func:`read`). In that case, unresolved free variables would be a
legitimate compiler error in certain cases (for example,
``show(read(S))`` has no reasonable semantics as there is no type
for the result of :func:`read`).
Having said that, Mercury's :func:`io.read` predicate can be used in
this truly-ambiguous way without error (only a warning). It is
handled the same way as Mars does -- by unifying with a dummy type
:type:`void`, resulting in no valid parses.
Since we can deduce that any call to :func:`show` or :func:`eq` with a
variable of a free type will either be unreachable (in the :func:`error`
case), or will never access a value of the free type (in the empty list/array
cases), the type dictionary doesn't need to resolve to any particular concrete
implementation.
Statically, however, it needs to resolve to something, and it can't impose any
additional type constraints (so simply passing :func:`dict:Num` as a fallback
isn't valid, as it will cause the free variable to become bound to
:type:`Num`). Therefore, we provide a special type dictionary for free-typed
variables which simply raises an error if its *show* or *eq* methods are
called (which they should never be)::
def show:free(x :: a) :: Array(Num):
return error("show:free")
def eq:free(x :: a, y :: a) :: Num:
return error("eq:free")
def dict:free :: Dict(a) = Dict(show:free, eq:free)
Therefore, we can augment the above code as follows::
def empty_array(dict:a :: Dict(a)) :: Array(a) = []
def print_empty() :: Num:
r = empty_array(dict:free)
return print_value(dict:Array(dict:free), r)
The call to :func:`print_value` will call its dictionary's `show` method,
``show:Array(show:free)``, but that method will never call *its* dictionary's
`show` method, :func:`show:free`, because the array is empty. If the array was
not empty, we wouldn't have been able to keep the type variable free.
.. note::
Again, this isn't a very good example, as :func:`empty_array` would
not have a type dictionary augmented.
.. warning::
There is actually a problem here: it is possible for
:func:`show:free` and :func:`eq:free` to be called by the user by
abusing a problem with the interactive prompt. Since interactive
local variables may have monomorphic types with free variables,
it is possible to assign, say, :func:`show` to a local. Since the
type is not yet resolved, it will be augmented as
:func:`show:free`, but could then be called with, say, a
:type:`Num` argument, causing :func:`show:free` to be executed.
This is discussed in the :ref:`FAQ `.
.. highlight:: haskell
.. note::
In Haskell, this situation is simply an error. The equivalent::
empty_list :: Show a => [a]
empty_list = []
print_empty :: IO ()
print_empty =
let r = empty_list
in putStrLn $ show r
The expression ``show r`` has an ambiguous type :type:`∀a. Show a
=> String`, and is therefore a type error. Note that this is rarer
in Haskell than in Mars, because it only arises due to the type
class constraint. It is perfectly acceptable to have
ambiguously-typed values without type class constraints; in this
case, the question of which type dictionary to use never comes up.
In Mars, it is much more common because every type variable
implicitly has a type class constraint.
If we *had* type classes in Mars, we would need to make it a
compiler error, as we could no longer make the guarantee that
values of unresolved types would never be generated at runtime.
.. highlight:: prolog
In Mercury (with a closer type system to Mars), this situation is
handled the same way as Mars, only the compiler produces a warning::
:- func empty_list = list(T).
empty_list = [].
:- pred print_empty(io::di, io::uo) is det.
print_empty(!IO) :-
R = empty_list,
io.write_string(string(R), !IO).
.. highlight:: none
This produces the warning::
warning: unresolved polymorphism.
The variable with an unbound type was:
R: list.list(T)
The unbound type variable will be implicitly bound to the
builtin type `void'.
.. highlight:: mars
.. _ref-typedict-demand:
Type dictionaries on demand
===========================
There is now an analysis which determines precisely which type variables in a
function require a type dictionary. This is precisely those type variables
which are referenced during the body of a function (i.e., to look up a method,
or to pass to another function). In other words, there are no type dictionary
parameters which are unused in the body of the function.
This analysis is necessarily a fixpoint analysis, because the analysis of one
function is dependent upon all of its callees, which can include mutual or
self recursion.
The approach taken is simply to start off by assigning *no* type dictionary
parameters to any function (with a special case for :func:`show` and
:func:`eq` for obvious reasons). Then, determine the strongly connected
components of the call graph and augment them in topological order (from the
bottom up). Therefore, when each function is being augmented, all of its
callees have already been finalised, with the exception of itself and
mutually-recursive functions (those functions in the same SCC).
The function is augmented, and every time an attempt is made to use a typedict
parameter which does not exist, the missing parameter is logged. Once the
whole SCC is augmented, a check is made to see whether any missing parameters
were logged.
If not, the whole SCC is completely augmented, and any type variables which
did not receive a type dictionary parameter simply did not need one. If there
were missing parameters, the SCC is reset back to the unaugmented version. The
missing parameters are augmented into the functions' headers, and the SCC is
re-augmented, hopefully this time not missing any. It is possible that on the
second iteration, some more parameters will be requested that weren't the
previous time, and more errors will be logged. The process continues until
there are no more problems.