Type dictionaries

Mars has no type classes per se, but the built-in functions show and eq are defined to work on any type. This means that polymorphic functions need to know something about the types of their arguments in order to perform the correct string conversion or equality check. There are three main ways to deal with this problem:

  1. Every object implicitly knows enough about its type to perform the operation,
  2. Polymorphic functions are told about the types which are unknown at compile time when they are called,
  3. All polymorphism is specialised at compile-time, and each specialised version intrinsically knows how to handle the concrete types.

Approach #1 is used by object-oriented languages to perform dynamic method lookup; since at best, containers can have heterogeneous subtypes, and at worst, all objects are dynamically typed, each object must individually track its own type (usually with a vtable). This approach is used by the built-in Mars interpreter.

Approach #2 is used by the implementation of Haskell, and the Mars LLVM backend. It is far more efficient than the first approach, but only works for typeclasses with parametric polymorphism (note that object-oriented languages use approach #1 to solve a different problem, dynamic method dispatch).

Approach #3 is used by C++ for templates (its equivalent of parametric polymorphism), but can generate a lot of excess code and has significant problems for modularity (requires code generation at call time).

The LLVM backend uses approach 2, passing type dictionaries (or in C++ terminology, vtables) to keep track of the implicit type classes Show and Eq (in Haskell parlance).

Code generation

To simplify backends which require type dictionaries, this entire system is implemented as a source-to-source translation. Any Mars program can be translated into another valid and equivalent Mars program which does not call the show or eq built-ins, passing type dictionaries instead. In other words, type dictionaries require no special features, except for special functions for showing and comparing built-in types.

Hence, the type dictionary transformation produces normal Mars code, with one exception: all new identifiers generated by the transformation contain a colon, to avoid collisions with existing identifiers. (Mars source-to-source transformations are allowed to introduce identifiers with colons, whereas user code cannot.)

There are three parts to the code generation:

  • Boilerplate built-in code. This code is (almost) the same for every program. It includes the basic definition of the type dictionary, as well as type dictionaries for the built-in types. It only differs from program to program in its selection of which function arities to support (based on which function arities the program uses).
  • User type dictionary code. The compiler generates some unique code for each user-defined type, comprising a type dictionary for that type.
  • Function augmentation. All polymorphic functions and calls to polymorphic functions are augmented (modified) to support type dictionary passing.

The type dictionary

A type dictionary is a runtime object which provides information about a particular type. In the current implementation, the only “information” we need is an implementation of the show and eq functions for the type. Therefore, a type “dictionary” is just a pair of a -> Array(Num) and (a, a) -> Num. This is available to the generated code as the special user-defined type:

type Dict(a):
    Dict(show :: a -> Array(Num), eq :: (a, a) -> Num)

Note that type dictionaries correspond to data types only (see Types). Hence, there is a type dictionary for the Num data type, and the Array(Num) data type (and all other instantiations of the Array type), but there is no single type dictionary for the Array type constructor.

At compile-time, a show and an eq function, and a type-dictionary constructor function is generated for each type constructor. For non-parameterised types, this means two simple functions, and a computable global constant for the dictionary.

For example, a user-defined type NumBox:

type NumBox:
    NumBox(Num)

would have the following type dictionary generated:

def show:NumBox(x :: NumBox) :: Array(Num):
    var res :: Array(Num)
    switch x:
        case NumBox(f_0):
            res = "NumBox("
            res = array_concat(res, dict:Num.show(f_0))
            res = array_concat(res, ")")
            return res

def eq:NumBox(x :: NumBox, y :: NumBox) :: Num:
    switch x:
        case Pair(x_f_0):
            switch y:
                case Pair(y_f_0):
                    if dict:Num.eq(x_f_0):
                        pass
                    else:
                        return 0
    return 1

def dict:NumBox :: Dict(NumBox) = Dict(show:NumBox, eq:NumBox)

The show and eq functions for types with parameters take type dictionaries as arguments, which are used to recurse on the members of the type. There is no constant type dictionary for such types; instead, there is a type dictionary constructor function which takes a type dictionary for each parameter, and curries it into the show and eq functions, producing a dictionary.

For example, the prelude type Pair:

type Pair(a, b):
    Pair(a, b)

would have the following type dictionary generated:

def show:Pair(dict:a :: Dict(a), dict:b :: Dict(b), x :: Pair(a, b)) :: Array(Num):
    var res :: Array(Num)
    switch x:
        case Pair(f_0, f_1):
            res = "Pair("
            res = array_concat(res, dict:a.show(f_0))
            res = array_concat(res, ", ")
            res = array_concat(res, dict:b.show(f_1))
            res = array_concat(res, ")")
            return res

def eq:Pair(dict:a :: Dict(a), dict:b :: Dict(b), x :: Pair(a, b), y :: Pair(a, b)) :: Num:
    switch x:
        case Pair(x_f_0, x_f_1):
            switch y:
                case Pair(y_f_0, y_f_1):
                    if dict:a.eq(x_f_0, y_f_1):
                        pass
                    else:
                        return 0
                    if dict:b.eq(x_f_1, y_f_1):
                        pass
                    else:
                        return 0
    return 1

def dict:Pair(dict:a :: Dict(a), dict:b :: Dict(b)) :: Dict(Pair(a, b)):
    return Dict(show:Pair(dict:a, dict:b, ...), eq:Pair(dict:a, dict:b, ...))

Notice that the show and eq functions recurse by looking up the show and eq functions in the appropriate type dictionaries. The dict:Pair function can be used to create a type dictionary for a Pair, given the type dictionaries of the Pair’s type parameters.

Dictionaries of built-in types

The built-in type Num has special built-in functions show:Num and eq:Num which convert an number into a string, and compare two numbers, respectively. Its type dictionary is defined as:

def dict:Num :: Dict(Num) = Dict(show:Num, eq:Num)

The built-in type Array has non-primitive show and eq functions:

def show:Array(dict:a :: Dict(a), x :: Array(a)) :: Array(Num):
    var res :: Array(Num)
    var i :: Num
    res = "["
    if 0 < array_length(x):
        res = array_concat(res, dict:a.show(array_ref(x, 0)))
    i = 1
    while i < array_length(x):
        res = array_concat(array_concat(res, ", "), dict:a.show(array_ref(x, i)))
        i = i + 1
    res = array_concat(res, "]")
    return res

def eq:Array(dict:a :: Dict(a), x :: Array(a), y :: Array(a)) :: Num:
    var i :: Num
    i = 0
    if eq:Num(array_length(x), array_length(y)):
        pass
    else:
        return 0
    while i < array_length(x):
        if dict:a.eq(array_ref(x, i), array_ref(y, i)):
            pass
        else:
            return 0
        i = i + 1
    return 1

def dict:Array(dict:a :: Dict(a)) :: Dict(Array(a)):
   return Dict(show:Array(dict:a, ...), eq:Array(dict:a, ...))

There is a special built-in function show:func of type a -> Array(Num), which displays a function in an implementation-defined manner. Functions have a non-primitive eq function:

def eq:func(x :: a, y :: a) :: Num:
    return error("Functions cannot be compared for equality")

The type dictionary for functions is complicated by the fact that they can have any arity. A separate show, eq and dictionary needs to be generated for each arity (these are generated as needed, for functions of any arity). For example, if there is any function of arity 2, the following code is generated:

def dict:func:2(dict:a :: Dict(a), dict:b :: Dict(b), dict:c :: Dict(c)) :: Dict((a, b) -> c):
    return Dict(show:func, eq:func)

Augmenting polymorphic functions

Any polymorphic function in the program may be augmented with type dictionaries (one type dictionary per type variable). In practice, we avoid unnecessarily requesting a type dictionary for any type variable that does not require one using static analysis (see Type dictionaries on demand).

Note

Data constructor functions are never augmented (as they never need to reference these methods). It doesn’t complicate the transformation to make this exception (in fact it would complicate it to not make this exception, since data constructors are special-cased).

Every user-defined function (not including the special functions generated for the type dictionaries themselves) is augmented with an extra argument at the start, for each type variable in the function’s head which will statically have show or eq called on it by this function or its descendants. For example, the function:

def remove_all(a :: Array(a), e :: a) :: Array(a):
    var i :: Num
    i = array_length(a) - 1
    while i >= 0:
        if eq(array_ref(a, i), e):
            a = array_remove(a, i)
        i = i - 1
    return a

would be augmented as follows:

def remove_all(dict:a :: Dict(a), a :: Array(a), e :: a) :: Array(a)

This means that any Mars code can produce the type dictionary for any type, whether it is known or unknown.

Calls to polymorphic functions are augmented with type dictionaries for each type parameter. This is trivial for known types. For example, the call:

remove_all([1, 2, 3, 2], 2)

is augmented as follows:

remove_all(dict:Num, [1, 2, 3, 2], 2)

All references to the show and eq functions are replaced with a field access to the type dictionary of the known type. For example, the full augmentation for remove_all is:

def remove_all(dict:a :: Dict(a), a :: Array(a), e :: a) :: Array(a):
    var i :: Num
    i = array_length(a) - 1
    while i >= 0:
        if dict:a.eq(array_ref(a, i), e):
            a = array_remove(a, i)
        i = i - 1
    return a

Augmenting computable global constants

Computable global constants with a polymorphic type will be turned into functions in order to augment them with type dictionaries. This has a practical implication that caching (which implementations are allowed, but not required to perform on CGCs) will be disabled. This is similar to what happens in Haskell (but only if the type annotation is given, and if a type class is used). For example:

def my_show :: a -> Array(Num):
    print_string("my_show\n")
    return show

This is a CGC which returns the show function (the print_string simply makes it easy to see when the function is being called; it could represent an expensive computation). An implementation which dynamically looks up the type of its argument, such as the built-in interpreter, can have a single show implementation which is returned by my_show. The body of my_show therefore only has to be called once.

An implementation using type dictionaries, however, must return a type-specific implementation of show during the body of my_show (since all function values are monomorphic). Therefore, my_show is augmented as follows:

def my_show(dict:a :: Dict(a)) :: a -> Array(Num):
    print_string("my_show\n")
    return dict:a.show

Now every reference to my_show will execute its body; as it is not a CGC it will not be cached. Consider a function which calls it:

print_string(my_show(4))
put_char('\n')
print_string(my_show([1,2,3]))
put_char('\n')

Using a caching implementation without type dictionaries, this would print:

my_show
4
[1, 2, 3]

It would be augmented as follows:

print_string(my_show(dict_Int)(4))
put_char('\n')
print_string(my_show(dict_Array(dict_Int))([1,2,3]))
put_char('\n')

The augmented code prints:

my_show
4
my_show
[1, 2, 3]

Lifting polymorphic functions

Recall that Mars functions are only polymorphic if they are global; local functions are always monomorphic (see Type unification). This neatly simplifies the type dictionary transformation, because only global functions need to be augmented. First-class function objects never take type dictionaries, because they will have already had their types resolved. A polymorphic function which is “lifted” into a first-class function object will have all of its type dictionaries curried into it at the time of lifting; this applies to partial application.

For this reason, it is never necessary to curry some but not all type dictionaries during partial application – all types are resolved during partial application, regardless of whether any arguments of that type have been supplied. This also applies to function arguments; we can assume that the functions have already had their type dictionaries curried and therefore do not need to supply type dictionaries when calling these functions.

Consider the example twomaps, discussed in Type unification. The type error can be corrected by introducing two separate local variables, one for each usage of array_map. Therefore, this version has no type error:

def twomaps(f :: a -> b, g :: a -> c, x :: Array(a)) :: Pair(Array(b), Array(c)):
    var mymap1 :: (a -> b, Array(a)) -> Array(b)
    var mymap2 :: (a -> c, Array(a)) -> Array(c)
    var y :: Array(b)
    var z :: Array(c)
    mymap1 = array_map
    mymap2 = array_map
    y = mymap1(f, x)
    z = mymap2(g, x)
    return Pair(y, z)

This would be augmented as follows:

def twomaps(dict:a :: Dict(a), dict:b :: Dict(b), dict:c :: Dict(c), f :: a -> b, g :: a -> c, x :: Array(a)) :: Pair(Array(b), Array(c)):
    var mymap1 :: (a -> b, Array(a)) -> Array(b)
    var mymap2 :: (a -> c, Array(a)) -> Array(c)
    var y :: Array(b)
    var z :: Array(c)
    mymap1 = array_map(dict:a, dict:b, ...)
    mymap2 = array_map(dict:a, dict:c, ...)
    y = mymap1(f, x)
    z = mymap2(g, x)
    return Pair(y, z)

Note

This isn’t a very good example, as in practice, the type variables ‘a’ and ‘b’ don’t need type dictionaries, so the analysis would not augment this function at all.

Note that the only augmentation happens during the lifting of array_map from a polymorphic global function to a monomorphic local function. The local functions mymap1 and mymap2 have unaugmented types; those functions have the type dictionaries encoded into them. Calls to these functions are also unaugmented. Similarly, the function arguments f and g are unaugmented and do not need to be given type dictionaries; the caller of twomaps must supply functions which have already had type dictionaries curried in, if necessary.

Dictionaries for free variables

As discussed in Types, a type variable is either free, bound or rigid. Rigid type variables are those which have type dictionary arguments supplied. We can look up the type dictionary of a bound type variable by dereferencing it (resulting in either a rigid type variable or a concrete type). Now free variables typically become bound by the end of a function’s type analysis, so very rarely show up in code analysis (after type checking), but there are valid situations in which a variable remains free, including:

  • When a constructor is used which does not include one of the typedef’s type parameters (e.g., Nil, the empty list, has type List(a), and may be manipulated without ever resolving the type a).
  • The empty array, []. Like the empty list, this has type Array(a).
  • The result of error, of type a. Obviously, such situations cause the program to terminate, but they still need to type check and compile properly, even if the result type is not unified with anything (for example, if the result is passed to show, a type dictionary will be required).

Let us take the example of the empty array:

def empty_array :: Array(a) = []

def print_empty() :: Num:
    r = empty_array
    return print_value(r)

Note

print_value is defined as print_string(show(val)).

The local variable r has type a, where a is a free type variable. In augmenting this function, it is unclear which type dictionary to pass for a. Clearly, calling show on a value with an unknown type cannot possibly resolve to a concrete implementation of show. Fortunately, it doesn’t have to – there will never be a concrete value at runtime whose type is that of a free variable (in this case, the array has no elements, so show:Array will never call its dict:a.show method).

Note

This would be untrue if Mars supported type classes, or provided any built-ins which produce a value that depends on the type (such as read). In that case, unresolved free variables would be a legitimate compiler error in certain cases (for example, show(read(S)) has no reasonable semantics as there is no type for the result of read).

Having said that, Mercury’s io.read predicate can be used in this truly-ambiguous way without error (only a warning). It is handled the same way as Mars does – by unifying with a dummy type void, resulting in no valid parses.

Since we can deduce that any call to show or eq with a variable of a free type will either be unreachable (in the error case), or will never access a value of the free type (in the empty list/array cases), the type dictionary doesn’t need to resolve to any particular concrete implementation.

Statically, however, it needs to resolve to something, and it can’t impose any additional type constraints (so simply passing dict:Num as a fallback isn’t valid, as it will cause the free variable to become bound to Num). Therefore, we provide a special type dictionary for free-typed variables which simply raises an error if its show or eq methods are called (which they should never be):

def show:free(x :: a) :: Array(Num):
    return error("show:free")

def eq:free(x :: a, y :: a) :: Num:
    return error("eq:free")

def dict:free :: Dict(a) = Dict(show:free, eq:free)

Therefore, we can augment the above code as follows:

def empty_array(dict:a :: Dict(a)) :: Array(a) = []

def print_empty() :: Num:
    r = empty_array(dict:free)
    return print_value(dict:Array(dict:free), r)

The call to print_value will call its dictionary’s show method, show:Array(show:free), but that method will never call its dictionary’s show method, show:free, because the array is empty. If the array was not empty, we wouldn’t have been able to keep the type variable free.

Note

Again, this isn’t a very good example, as empty_array would not have a type dictionary augmented.

Warning

There is actually a problem here: it is possible for show:free and eq:free to be called by the user by abusing a problem with the interactive prompt. Since interactive local variables may have monomorphic types with free variables, it is possible to assign, say, show to a local. Since the type is not yet resolved, it will be augmented as show:free, but could then be called with, say, a Num argument, causing show:free to be executed. This is discussed in the FAQ.

Note

In Haskell, this situation is simply an error. The equivalent:

empty_list :: Show a => [a]
empty_list = []

print_empty :: IO ()
print_empty =
    let r = empty_list
    in  putStrLn $ show r

The expression show r has an ambiguous type ∀a. Show a => String, and is therefore a type error. Note that this is rarer in Haskell than in Mars, because it only arises due to the type class constraint. It is perfectly acceptable to have ambiguously-typed values without type class constraints; in this case, the question of which type dictionary to use never comes up. In Mars, it is much more common because every type variable implicitly has a type class constraint.

If we had type classes in Mars, we would need to make it a compiler error, as we could no longer make the guarantee that values of unresolved types would never be generated at runtime.

In Mercury (with a closer type system to Mars), this situation is handled the same way as Mars, only the compiler produces a warning:

:- func empty_list = list(T).
empty_list = [].

:- pred print_empty(io::di, io::uo) is det.
print_empty(!IO) :-
    R = empty_list,
    io.write_string(string(R), !IO).

This produces the warning:

warning: unresolved polymorphism.
The variable with an unbound type was:
  R: list.list(T)
The unbound type variable will be implicitly bound to the
builtin type `void'.

Type dictionaries on demand

There is now an analysis which determines precisely which type variables in a function require a type dictionary. This is precisely those type variables which are referenced during the body of a function (i.e., to look up a method, or to pass to another function). In other words, there are no type dictionary parameters which are unused in the body of the function.

This analysis is necessarily a fixpoint analysis, because the analysis of one function is dependent upon all of its callees, which can include mutual or self recursion.

The approach taken is simply to start off by assigning no type dictionary parameters to any function (with a special case for show and eq for obvious reasons). Then, determine the strongly connected components of the call graph and augment them in topological order (from the bottom up). Therefore, when each function is being augmented, all of its callees have already been finalised, with the exception of itself and mutually-recursive functions (those functions in the same SCC).

The function is augmented, and every time an attempt is made to use a typedict parameter which does not exist, the missing parameter is logged. Once the whole SCC is augmented, a check is made to see whether any missing parameters were logged.

If not, the whole SCC is completely augmented, and any type variables which did not receive a type dictionary parameter simply did not need one. If there were missing parameters, the SCC is reset back to the unaugmented version. The missing parameters are augmented into the functions’ headers, and the SCC is re-augmented, hopefully this time not missing any. It is possible that on the second iteration, some more parameters will be requested that weren’t the previous time, and more errors will be logged. The process continues until there are no more problems.