.. Mars Documentation
Copyright (C) 2014 Matt Giuca
2014-07-05
.. This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
.. This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
.. You should have received a copy of the GNU General Public License
along with this program. If not, see .
.. Note the careful use of ..highlight for switching between the 'mars' and
'marscon' languages. 'mars' should be used for code examples, and 'marscon'
for Bash and Mars console session examples.
.. _destructive:
Automatic destructive update
============================
.. sectionauthor:: Matt Giuca
One of the exciting features of Mars is that it supports *automatic
destructive update* for array operations. The compiler can, in some
circumstances, turn inefficient copy-and-update operations into in-place
destructive operations. Mars will do this behind the scenes, without you
needing to worry about it, but knowing some of the details of this feature can
help you make sure you get the best performance out of your array operations.
Because Mars is a purely declarative language, functions cannot normally
modify their arguments, which means that updating data structures ---
particularly arrays --- can be inefficient. Recall this code from the previous
chapter::
def array_increment(a :: Array(Num)) :: Array(Num):
i = 0
while i < array_length(a):
a = array_replace(a, i, array_ref(a, i) + 1)
i = i + 1
return a
In a naïve implementation, this algorithm has quadratic running time, because
every call to :func:`array_replace` makes a full copy of the array. But what
if the compiler was smart enough to realise that the array did not always need
to be copied, because the old value of the array would never be used again?
In *certain cases*, the Mars compiler is clever enough to optimize away the
copy operations. Before we discuss this example, let's look at some simpler
cases.
Direct calls to array operations
--------------------------------
Consider a simple Mars program that performs two updates to an array::
import prelude
def main() :: io Num:
a = [1, 2, 3]
print_value(array_replace(a, 0, 10))
put_char('\n')
print_value(array_replace(a, 1, 20))
put_char('\n')
return 0
.. highlight:: marscon
The output of this program is::
[10, 2, 3]
[1, 20, 3]
.. highlight:: mars
It is important to note that :func:`array_replace` never changes *a*, so the
second line has the original value of the first array element (1, not 10). You
might realise, however, that *a* is never used again after that second call to
:func:`array_replace`. The Mars compiler realises this too, and takes the
liberty of automatically telling :func:`array_replace` to just mutate the
array it is given, and not bother copying it (effectively transforming it into
a call to :func:`array_set`).
You won't be able to tell the difference in the output, but if you do a lot of
array updates, you might notice the performance difference. Mars automatically
transforms calls to the following array functions:
* :func:`array_replace`
* :func:`array_add`
* :func:`array_concat` (destructive on the first argument only)
* :func:`array_remove`
In principle, we could apply the same techniques to updating other data
structures as well, but for the time being, Mars only applies this
optimization to arrays.
.. highlight:: marscon
But, aside from profiling, how do you know when Mars has transformed an update
operation? In the Mars interactive mode, you can use the ":uc" command to see
which calls to a particular function are destructive. For example::
$ mars -i test.mar
?> :uc main:array_replace
[N]
[D]
Here, we are asking the compiler: show the call mode of every call to
:func:`array_replace` from :func:`main`. As there are two calls to
:func:`array_replace`, it prints two lines. The first is "[N]", meaning
non-destructive or normal --- we cannot update *a* because it will be used
later on. The second is "[D]", meaning destructive. This shows that the
compiler is indeed destructively updating the array *a* on the second call to
:func:`array_replace`.
.. highlight:: mars
Calls to functions that update arrays
-------------------------------------
Mars is clever enough to track usage information across function calls, so the
analysis still works if a function tries to update an array it was given as an
argument.
Consider this program, which generates an array of numbers from 0 to 99::
import prelude
# Appends the length of array as a new element to the end.
def push_length(array :: Array(Num)) :: Array(Num):
len = array_length(array)
return array_add(array, len)
def main() :: io Num:
a = []
while array_length(a) < 100:
a = push_length(a)
print_value(a)
put_char('\n')
return 0
.. highlight:: marscon
Can Mars optimize the call to :func:`array_add` in :func:`push_length`?
The answer is "it depends". The function is modifying an array it was given as
an argument, so it cannot perform destructive update unless it is sure that
the *caller* is not going to need it again. Let's see what ":uc" says::
?> :uc push_length:array_add
[{array}]
The answer is neither N nor D, but "{array}", which means "the call to
:func:`array_add` will be destructive if the caller does not need *array* any
more." Essentially, :func:`push_length` has two modes, just like the primitive
array operations.
So, to find out whether it will be destructive, we must ask :func:`main`::
?> :uc main:push_length
[D]
.. highlight:: mars
So even though :func:`main` is going to call :func:`push_length` one hundred
times, it only has a single static call to the function, and the "D" tells us
that it will call the destructive mode of :func:`push_length`. This means that
our program will be highly efficient --- it will create only a single array
and modify it one hundred times.
But what about another program that calls :func:`push_length`::
def main() :: io Num:
a = [1, 2, 3]
b = push_length(a)
print_string("Length was ")
print_value(array_length(a))
print_string("; now it is ")
print_value(array_length(b))
put_char('\n')
return 0
.. highlight:: marscon
Notice that this program reads from *a* after calling :func:`push_length`.
Unfortunately, this disqualifies it from using the destructive version of
:func:`push_length` (or else ``array_length(a)`` would return the wrong
result). As we can see, the program is now calling the non-destructive
version::
?> :uc main:push_length
[N]
.. highlight:: mars
To fix this problem, we can make sure there are no usages of *a* after
:func:`push_length` is called, by pre-caching its original length::
def main() :: io Num:
a = [1, 2, 3]
a_len = array_length(a) # Must be before push_length(a).
b = push_length(a)
print_string("Length was ")
print_value(a_len)
print_string("; now it is ")
print_value(array_length(b))
put_char('\n')
return 0
.. highlight:: marscon
Now the array is destructively updated::
?> :uc main:push_length
[D]
.. highlight:: mars
Iteration inside a function
---------------------------
Now let's return to our original :func:`array_increment` example::
def array_increment(a :: Array(Num)) :: Array(Num):
i = 0
while i < array_length(a):
a = array_replace(a, i, array_ref(a, i) + 1)
i = i + 1
return a
.. highlight:: marscon
Can this quadratic-time algorithm be automatically made into a linear-time
one? Unfortunately, the answer, for the moment, is "it depends"::
?> :uc array_increment:array_replace
[{a}]
.. highlight:: mars
As with the :func:`push_length` example, :func:`array_increment` is
destructive only if the array is never used again by the caller. That means we
have perfect linear behaviour for a simple caller like this::
def main() :: io Num:
a = [1, 2, 3]
print_value(array_increment(a)) # [D]
put_char('\n')
But if a caller needs to use the original *a* again after the call to
:func:`array_increment`, there will be no optimization::
def main() :: io Num:
a = [1, 2, 3]
print_value(array_increment(a)) # [N]
put_char('\n')
print_value(array_add(a, 4)) # [D]
put_char('\n')
This is not ideal: the fact that the caller needs *a* again means that
:func:`array_increment` needs to copy it once, but it doesn't need to copy it
every time around the loop! There is an inherent limitation in the compiler
that each call site has a fixed mode: it can't behave one way on the first
iteration of the loop, and another way for each subsequent iteration.
Armed with the knowledge from ":uc", we can manually work around this
limitation to always get linear behaviour. We simply need to manually unroll
the loop, so the first iteration is separate from the main loop::
def array_increment(a :: Array(Num)) :: Array(Num):
if array_length(a) == 0:
return a
# Manually unrolled for destructive update.
a = array_replace(a, 0, array_ref(a, 0) + 1)
i = 1
while i < array_length(a):
a = array_replace(a, i, array_ref(a, i) + 1)
i = i + 1
return a
.. highlight:: marscon
Now we can ask the compiler how it has optimized the function::
?> :uc array_increment:array_replace
[{a}]
[D]
.. highlight:: mars
Success! The *first* update will be destructive if the caller allows it, or
make a copy if necessary. Either way, the array now belongs to us, and we can
update it however we wish. All calls to :func:`array_replace` in the body of
the loop are always destructive, regardless of the caller's instructions.
Either way, :func:`array_increment` is now a linear-time operation.
.. note::
Future versions of the Mars compiler may perform this unrolling
automatically, but for now, such functions must be hand-optimized.
Aliases and advanced troubleshooting
------------------------------------
So far, we have used the rather hand-wavy "if it will never be used again" to
describe when it is safe to destructively update. In reality, deciding whether
a variable will never be used again is the most complex part of the process,
due to *aliases* --- other variables that point to the same object as the
variable under consideration.
.. note::
Aliases are not normally something you must worry about in Mars. It is more
or less the whole point of a purely declarative language that you should
not need to care whether a two variables are aliased or just contain the
same value.
Consider the following example::
import prelude
def main() :: io Num:
a = [1, 2, 3]
b = a
print_value(array_replace(a, 0, 10)) # [N]
put_char('\n')
print_value(array_replace(b, 1, 20)) # [D]
put_char('\n')
return 0
In this example, the first call to :func:`array_replace` is non-destructive.
Why? It is the last time *a* is mentioned, but that is not the only
consideration: *a* and *b* are pointers to the same array object, so mutating
*a* would inadvertently cause *b* to change as well. Luckily, the Mars
compiler is clever enough to detect that *a* and *b* are aliased, so it
prevents the destructive update in this case.
In general, there are four conditions for optimizing the update to a variable
when calling a function:
* The variable in question must not be used again in the remainder of the
function.
* The variable must not be aliased to another variable that will be used in
the remainder of the function.
* If the variable is an argument to this function, or is aliased to an
argument, the caller must have authorized its update.
* The variable must not appear as two arguments to this function call.
If you are finding that a function call is not being optimized as it should
be, look for other variables that could potentially be aliased to the variable
you are trying to update.
The Mars aliasing inference system is fairly sophisticated; it can precisely
detect aliases across function calls and inside complex data structures.
However, the compiler is not perfect, and might think variables could be
aliased when in fact they can't be. You might have to try rearranging the code
to remove any doubt.
The best way to get a feel for what can be optimized is to experiment with
programs and the ":uc" command. If you think that the Mars compiler is not
optimizing some code that it could optimize, make your example as simple as
possible and `file a bug `_. If the
Mars compiler is doing destructive update on a variable that is still alive
(giving incorrect behaviour), then that is definitely a bug and should be
reported. If you need to disable automatic destructive update for some reason,
you can pass the ``-d-`` command-line argument to either :command:`mars` or
:command:`marsc`.