.. Mars Documentation Copyright (C) 2011 Matt Giuca .. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. .. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. .. You should have received a copy of the GNU General Public License along with this program. If not, see . .. _ref-typedicts: ***************** Type dictionaries ***************** .. sectionauthor:: Matt Giuca Mars has no type classes per se, but the built-in functions :func:`show` and :func:`eq` are defined to work on any type. This means that polymorphic functions need to know something about the types of their arguments in order to perform the correct string conversion or equality check. There are three main ways to deal with this problem: 1. Every object implicitly knows enough about its type to perform the operation, 2. Polymorphic functions are told about the types which are unknown at compile time when they are called, 3. All polymorphism is specialised at compile-time, and each specialised version intrinsically knows how to handle the concrete types. Approach #1 is used by object-oriented languages to perform dynamic method lookup; since at best, containers can have heterogeneous subtypes, and at worst, all objects are dynamically typed, each object must individually track its own type (usually with a vtable). This approach is used by the built-in Mars interpreter. Approach #2 is used by the implementation of Haskell, and the Mars LLVM backend. It is far more efficient than the first approach, but only works for typeclasses with parametric polymorphism (note that object-oriented languages use approach #1 to solve a different problem, dynamic method dispatch). Approach #3 is used by C++ for templates (its equivalent of parametric polymorphism), but can generate a lot of excess code and has significant problems for modularity (requires code generation at call time). The LLVM backend uses approach 2, passing type dictionaries (or in C++ terminology, vtables) to keep track of the implicit type classes :type:`Show` and :type:`Eq` (in Haskell parlance). Code generation =============== To simplify backends which require type dictionaries, this entire system is implemented as a source-to-source translation. Any Mars program can be translated into another valid and equivalent Mars program which does not call the :type:`show` or :type:`eq` built-ins, passing type dictionaries instead. In other words, type dictionaries require no special features, except for special functions for showing and comparing built-in types. Hence, the type dictionary transformation produces normal Mars code, with one exception: all new identifiers generated by the transformation contain a colon, to avoid collisions with existing identifiers. (Mars source-to-source transformations are allowed to introduce identifiers with colons, whereas user code cannot.) There are three parts to the code generation: * Boilerplate built-in code. This code is (almost) the same for every program. It includes the basic definition of the type dictionary, as well as type dictionaries for the built-in types. It only differs from program to program in its selection of which function arities to support (based on which function arities the program uses). * User type dictionary code. The compiler generates some unique code for each user-defined type, comprising a type dictionary for that type. * Function augmentation. All polymorphic functions and calls to polymorphic functions are augmented (modified) to support type dictionary passing. The type dictionary =================== A *type dictionary* is a runtime object which provides information about a particular type. In the current implementation, the only "information" we need is an implementation of the :func:`show` and :func:`eq` functions for the type. Therefore, a type "dictionary" is just a pair of :type:`a -> Array(Num)` and :type:`(a, a) -> Num`. This is available to the generated code as the special user-defined type:: type Dict(a): Dict(show :: a -> Array(Num), eq :: (a, a) -> Num) Note that type dictionaries correspond to *data types* only (see :ref:`ref-types`). Hence, there is a type dictionary for the :type:`Num` data type, and the :type:`Array(Num)` data type (and all other instantiations of the :type:`Array` type), but there is no single type dictionary for the :type:`Array` type constructor. At compile-time, a *show* and an *eq* function, and a type-dictionary constructor function is generated for each type *constructor*. For non-parameterised types, this means two simple functions, and a computable global constant for the dictionary. For example, a user-defined type :type:`NumBox`:: type NumBox: NumBox(Num) would have the following type dictionary generated:: def show:NumBox(x :: NumBox) :: Array(Num): var res :: Array(Num) switch x: case NumBox(f_0): res = "NumBox(" res = array_concat(res, dict:Num.show(f_0)) res = array_concat(res, ")") return res def eq:NumBox(x :: NumBox, y :: NumBox) :: Num: switch x: case Pair(x_f_0): switch y: case Pair(y_f_0): if dict:Num.eq(x_f_0): pass else: return 0 return 1 def dict:NumBox :: Dict(NumBox) = Dict(show:NumBox, eq:NumBox) The *show* and *eq* functions for types with parameters take type dictionaries as arguments, which are used to recurse on the members of the type. There is no constant type dictionary for such types; instead, there is a type dictionary *constructor function* which takes a type dictionary for each parameter, and curries it into the *show* and *eq* functions, producing a dictionary. For example, the prelude type :type:`Pair`:: type Pair(a, b): Pair(a, b) would have the following type dictionary generated:: def show:Pair(dict:a :: Dict(a), dict:b :: Dict(b), x :: Pair(a, b)) :: Array(Num): var res :: Array(Num) switch x: case Pair(f_0, f_1): res = "Pair(" res = array_concat(res, dict:a.show(f_0)) res = array_concat(res, ", ") res = array_concat(res, dict:b.show(f_1)) res = array_concat(res, ")") return res def eq:Pair(dict:a :: Dict(a), dict:b :: Dict(b), x :: Pair(a, b), y :: Pair(a, b)) :: Num: switch x: case Pair(x_f_0, x_f_1): switch y: case Pair(y_f_0, y_f_1): if dict:a.eq(x_f_0, y_f_1): pass else: return 0 if dict:b.eq(x_f_1, y_f_1): pass else: return 0 return 1 def dict:Pair(dict:a :: Dict(a), dict:b :: Dict(b)) :: Dict(Pair(a, b)): return Dict(show:Pair(dict:a, dict:b, ...), eq:Pair(dict:a, dict:b, ...)) Notice that the *show* and *eq* functions recurse by looking up the *show* and *eq* functions in the appropriate type dictionaries. The :func:`dict:Pair` function can be used to create a type dictionary for a Pair, given the type dictionaries of the Pair's type parameters. Dictionaries of built-in types ------------------------------ The built-in type :type:`Num` has special built-in functions :func:`show:Num` and :func:`eq:Num` which convert an number into a string, and compare two numbers, respectively. Its type dictionary is defined as:: def dict:Num :: Dict(Num) = Dict(show:Num, eq:Num) The built-in type :type:`Array` has non-primitive *show* and *eq* functions:: def show:Array(dict:a :: Dict(a), x :: Array(a)) :: Array(Num): var res :: Array(Num) var i :: Num res = "[" if 0 < array_length(x): res = array_concat(res, dict:a.show(array_ref(x, 0))) i = 1 while i < array_length(x): res = array_concat(array_concat(res, ", "), dict:a.show(array_ref(x, i))) i = i + 1 res = array_concat(res, "]") return res def eq:Array(dict:a :: Dict(a), x :: Array(a), y :: Array(a)) :: Num: var i :: Num i = 0 if eq:Num(array_length(x), array_length(y)): pass else: return 0 while i < array_length(x): if dict:a.eq(array_ref(x, i), array_ref(y, i)): pass else: return 0 i = i + 1 return 1 def dict:Array(dict:a :: Dict(a)) :: Dict(Array(a)): return Dict(show:Array(dict:a, ...), eq:Array(dict:a, ...)) There is a special built-in function :func:`show:func` of type :type:`a -> Array(Num)`, which displays a function in an implementation-defined manner. Functions have a non-primitive *eq* function:: def eq:func(x :: a, y :: a) :: Num: return error("Functions cannot be compared for equality") The type dictionary for functions is complicated by the fact that they can have any arity. A separate *show*, *eq* and dictionary needs to be generated for each arity (these are generated as needed, for functions of any arity). For example, if there is any function of arity 2, the following code is generated:: def dict:func:2(dict:a :: Dict(a), dict:b :: Dict(b), dict:c :: Dict(c)) :: Dict((a, b) -> c): return Dict(show:func, eq:func) Augmenting polymorphic functions ================================ Any polymorphic function in the program may be augmented with type dictionaries (one type dictionary per type variable). In practice, we avoid unnecessarily requesting a type dictionary for any type variable that does not require one using static analysis (see :ref:`ref-typedict-demand`). .. note:: Data constructor functions are *never* augmented (as they never need to reference these methods). It doesn't complicate the transformation to make this exception (in fact it would complicate it to *not* make this exception, since data constructors are special-cased). Every user-defined function (not including the special functions generated for the type dictionaries themselves) is augmented with an extra argument at the start, for each type variable in the function's head which will statically have :func:`show` or :func:`eq` called on it by this function or its descendants. For example, the function:: def remove_all(a :: Array(a), e :: a) :: Array(a): var i :: Num i = array_length(a) - 1 while i >= 0: if eq(array_ref(a, i), e): a = array_remove(a, i) i = i - 1 return a would be augmented as follows:: def remove_all(dict:a :: Dict(a), a :: Array(a), e :: a) :: Array(a) This means that any Mars code can produce the type dictionary for any type, whether it is known or unknown. Calls to polymorphic functions are augmented with type dictionaries for each type parameter. This is trivial for known types. For example, the call:: remove_all([1, 2, 3, 2], 2) is augmented as follows:: remove_all(dict:Num, [1, 2, 3, 2], 2) All references to the :func:`show` and :func:`eq` functions are replaced with a field access to the type dictionary of the known type. For example, the full augmentation for :func:`remove_all` is:: def remove_all(dict:a :: Dict(a), a :: Array(a), e :: a) :: Array(a): var i :: Num i = array_length(a) - 1 while i >= 0: if dict:a.eq(array_ref(a, i), e): a = array_remove(a, i) i = i - 1 return a Augmenting computable global constants -------------------------------------- Computable global constants with a polymorphic type will be turned into functions in order to augment them with type dictionaries. This has a practical implication that caching (which implementations are allowed, but not required to perform on CGCs) will be disabled. This is similar to what happens in Haskell (but only if the type annotation is given, and if a type class is used). For example:: def my_show :: a -> Array(Num): print_string("my_show\n") return show This is a CGC which returns the :func:`show` function (the :func:`print_string` simply makes it easy to see when the function is being called; it could represent an expensive computation). An implementation which dynamically looks up the type of its argument, such as the built-in interpreter, can have a single :func:`show` implementation which is returned by :func:`my_show`. The body of :func:`my_show` therefore only has to be called once. An implementation using type dictionaries, however, must return a type-specific implementation of :func:`show` during the body of :func:`my_show` (since all function values are monomorphic). Therefore, :func:`my_show` is augmented as follows:: def my_show(dict:a :: Dict(a)) :: a -> Array(Num): print_string("my_show\n") return dict:a.show Now every reference to :func:`my_show` will execute its body; as it is not a CGC it will not be cached. Consider a function which calls it:: print_string(my_show(4)) put_char('\n') print_string(my_show([1,2,3])) put_char('\n') .. highlight:: none Using a caching implementation without type dictionaries, this would print:: my_show 4 [1, 2, 3] .. highlight:: mars It would be augmented as follows:: print_string(my_show(dict_Int)(4)) put_char('\n') print_string(my_show(dict_Array(dict_Int))([1,2,3])) put_char('\n') .. highlight:: none The augmented code prints:: my_show 4 my_show [1, 2, 3] .. highlight:: mars Lifting polymorphic functions ============================= Recall that Mars functions are only polymorphic if they are global; local functions are always monomorphic (see :ref:`ref-type-unification`). This neatly simplifies the type dictionary transformation, because only global functions need to be augmented. First-class function objects never take type dictionaries, because they will have already had their types resolved. A polymorphic function which is "lifted" into a first-class function object will have all of its type dictionaries curried into it at the time of lifting; this applies to partial application. For this reason, it is never necessary to curry some but not all type dictionaries during partial application -- all types are resolved during partial application, regardless of whether any arguments of that type have been supplied. This also applies to function arguments; we can assume that the functions have already had their type dictionaries curried and therefore do not need to supply type dictionaries when calling these functions. Consider the example :func:`twomaps`, discussed in :ref:`ref-type-unification`. The type error can be corrected by introducing two separate local variables, one for each usage of :func:`array_map`. Therefore, this version has no type error:: def twomaps(f :: a -> b, g :: a -> c, x :: Array(a)) :: Pair(Array(b), Array(c)): var mymap1 :: (a -> b, Array(a)) -> Array(b) var mymap2 :: (a -> c, Array(a)) -> Array(c) var y :: Array(b) var z :: Array(c) mymap1 = array_map mymap2 = array_map y = mymap1(f, x) z = mymap2(g, x) return Pair(y, z) This would be augmented as follows:: def twomaps(dict:a :: Dict(a), dict:b :: Dict(b), dict:c :: Dict(c), f :: a -> b, g :: a -> c, x :: Array(a)) :: Pair(Array(b), Array(c)): var mymap1 :: (a -> b, Array(a)) -> Array(b) var mymap2 :: (a -> c, Array(a)) -> Array(c) var y :: Array(b) var z :: Array(c) mymap1 = array_map(dict:a, dict:b, ...) mymap2 = array_map(dict:a, dict:c, ...) y = mymap1(f, x) z = mymap2(g, x) return Pair(y, z) .. note:: This isn't a very good example, as in practice, the type variables 'a' and 'b' don't need type dictionaries, so the analysis would not augment this function at all. Note that the only augmentation happens during the *lifting* of :func:`array_map` from a polymorphic global function to a monomorphic local function. The local functions `mymap1` and `mymap2` have unaugmented types; those functions have the type dictionaries encoded into them. Calls to these functions are also unaugmented. Similarly, the function arguments *f* and *g* are unaugmented and do not need to be given type dictionaries; the caller of :func:`twomaps` must supply functions which have already had type dictionaries curried in, if necessary. Dictionaries for free variables =============================== As discussed in :ref:`ref-types`, a type variable is either **free**, **bound** or **rigid**. Rigid type variables are those which have type dictionary arguments supplied. We can look up the type dictionary of a bound type variable by dereferencing it (resulting in either a rigid type variable or a concrete type). Now **free** variables typically become bound by the end of a function's type analysis, so very rarely show up in code analysis (after type checking), but there are valid situations in which a variable remains free, including: * When a constructor is used which does not include one of the typedef's type parameters (e.g., :func:`Nil`, the empty list, has type :type:`List(a)`, and may be manipulated without ever resolving the type :type:`a`). * The empty array, ``[]``. Like the empty list, this has type :type:`Array(a)`. * The result of :func:`error`, of type :type:`a`. Obviously, such situations cause the program to terminate, but they still need to type check and compile properly, even if the result type is not unified with anything (for example, if the result is passed to :func:`show`, a type dictionary will be required). Let us take the example of the empty array:: def empty_array :: Array(a) = [] def print_empty() :: Num: r = empty_array return print_value(r) .. note:: :func:`print_value` is defined as ``print_string(show(val))``. The local variable *r* has type :type:`a`, where *a* is a free type variable. In augmenting this function, it is unclear which type dictionary to pass for *a*. Clearly, calling :func:`show` on a value with an unknown type cannot possibly resolve to a concrete implementation of *show*. Fortunately, it doesn't have to -- there will never be a concrete value at runtime whose type is that of a free variable (in this case, the array has no elements, so :func:`show:Array` will never call its ``dict:a.show`` method). .. note:: This would be untrue if Mars supported type classes, or provided any built-ins which produce a value that depends on the type (such as :func:`read`). In that case, unresolved free variables would be a legitimate compiler error in certain cases (for example, ``show(read(S))`` has no reasonable semantics as there is no type for the result of :func:`read`). Having said that, Mercury's :func:`io.read` predicate can be used in this truly-ambiguous way without error (only a warning). It is handled the same way as Mars does -- by unifying with a dummy type :type:`void`, resulting in no valid parses. Since we can deduce that any call to :func:`show` or :func:`eq` with a variable of a free type will either be unreachable (in the :func:`error` case), or will never access a value of the free type (in the empty list/array cases), the type dictionary doesn't need to resolve to any particular concrete implementation. Statically, however, it needs to resolve to something, and it can't impose any additional type constraints (so simply passing :func:`dict:Num` as a fallback isn't valid, as it will cause the free variable to become bound to :type:`Num`). Therefore, we provide a special type dictionary for free-typed variables which simply raises an error if its *show* or *eq* methods are called (which they should never be):: def show:free(x :: a) :: Array(Num): return error("show:free") def eq:free(x :: a, y :: a) :: Num: return error("eq:free") def dict:free :: Dict(a) = Dict(show:free, eq:free) Therefore, we can augment the above code as follows:: def empty_array(dict:a :: Dict(a)) :: Array(a) = [] def print_empty() :: Num: r = empty_array(dict:free) return print_value(dict:Array(dict:free), r) The call to :func:`print_value` will call its dictionary's `show` method, ``show:Array(show:free)``, but that method will never call *its* dictionary's `show` method, :func:`show:free`, because the array is empty. If the array was not empty, we wouldn't have been able to keep the type variable free. .. note:: Again, this isn't a very good example, as :func:`empty_array` would not have a type dictionary augmented. .. warning:: There is actually a problem here: it is possible for :func:`show:free` and :func:`eq:free` to be called by the user by abusing a problem with the interactive prompt. Since interactive local variables may have monomorphic types with free variables, it is possible to assign, say, :func:`show` to a local. Since the type is not yet resolved, it will be augmented as :func:`show:free`, but could then be called with, say, a :type:`Num` argument, causing :func:`show:free` to be executed. This is discussed in the :ref:`FAQ `. .. highlight:: haskell .. note:: In Haskell, this situation is simply an error. The equivalent:: empty_list :: Show a => [a] empty_list = [] print_empty :: IO () print_empty = let r = empty_list in putStrLn $ show r The expression ``show r`` has an ambiguous type :type:`∀a. Show a => String`, and is therefore a type error. Note that this is rarer in Haskell than in Mars, because it only arises due to the type class constraint. It is perfectly acceptable to have ambiguously-typed values without type class constraints; in this case, the question of which type dictionary to use never comes up. In Mars, it is much more common because every type variable implicitly has a type class constraint. If we *had* type classes in Mars, we would need to make it a compiler error, as we could no longer make the guarantee that values of unresolved types would never be generated at runtime. .. highlight:: prolog In Mercury (with a closer type system to Mars), this situation is handled the same way as Mars, only the compiler produces a warning:: :- func empty_list = list(T). empty_list = []. :- pred print_empty(io::di, io::uo) is det. print_empty(!IO) :- R = empty_list, io.write_string(string(R), !IO). .. highlight:: none This produces the warning:: warning: unresolved polymorphism. The variable with an unbound type was: R: list.list(T) The unbound type variable will be implicitly bound to the builtin type `void'. .. highlight:: mars .. _ref-typedict-demand: Type dictionaries on demand =========================== There is now an analysis which determines precisely which type variables in a function require a type dictionary. This is precisely those type variables which are referenced during the body of a function (i.e., to look up a method, or to pass to another function). In other words, there are no type dictionary parameters which are unused in the body of the function. This analysis is necessarily a fixpoint analysis, because the analysis of one function is dependent upon all of its callees, which can include mutual or self recursion. The approach taken is simply to start off by assigning *no* type dictionary parameters to any function (with a special case for :func:`show` and :func:`eq` for obvious reasons). Then, determine the strongly connected components of the call graph and augment them in topological order (from the bottom up). Therefore, when each function is being augmented, all of its callees have already been finalised, with the exception of itself and mutually-recursive functions (those functions in the same SCC). The function is augmented, and every time an attempt is made to use a typedict parameter which does not exist, the missing parameter is logged. Once the whole SCC is augmented, a check is made to see whether any missing parameters were logged. If not, the whole SCC is completely augmented, and any type variables which did not receive a type dictionary parameter simply did not need one. If there were missing parameters, the SCC is reset back to the unaugmented version. The missing parameters are augmented into the functions' headers, and the SCC is re-augmented, hopefully this time not missing any. It is possible that on the second iteration, some more parameters will be requested that weren't the previous time, and more errors will be logged. The process continues until there are no more problems.