.. Mars Documentation
   Copyright (C) 2010 Matt Giuca
   19/1/2010

.. This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, either version 3 of the License, or
   (at your option) any later version.

.. This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

.. You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.

.. highlight:: none
.. _ref-switchfactor:

**************************
Switch factoring algorithm
**************************

.. sectionauthor:: Matt Giuca <matt.giuca@gmail.com>

This is the algorithm in Mars for transforming an AST-level switch statement
to a CFG-level one. This flattens patterns out into multi-level switch
statements, so it is a non-recursive data structure (like the rest of CFG),
and represents the actual execution paths taken by the machine. It also
generates errors and warnings for malformed switch statements in the source.

First, we transform each case into a ":token:`matcher`", a temporary data
structure representing a single case in an equivalent, but flattened, form.
For each case, recurse through the pattern structure, and generate a flat
"matcher".  This results in a list of matchers, corresponding to the original
list of cases.

.. productionlist::
   matcher: `pair`(`list`(`match`), `stmt_block`)

A matcher list is an assoc list associating sequences of ":token:`match`"es
with blocks, just like a switch is an assoc list associating patterns with
blocks. A ":token:`match`" is a pattern without arguments (ctors have arities
rather than arguments) -- unlike patterns, matches don't recurse.

.. productionlist::
   match: `match_any` | `match_var` | `match_ctor`(Name, Arity) | `match_intlit`(Val)

A key operation in this algorithm is **generate_matcher_code**, which
generates low-level code from a matcher list. This recursive operation
performs most of the work, once the matcher list is generated. It is called
as::

 generate_matcher_code(arg_list, matcher_list)

Where `arg_list` is a list of atoms (variable names), denoting the variables
holding the switch values for the match list. The `arg_list` is always shorter
or equal in length to the `matcher_list`, as arg_list only has the arguments
for the current level, while `matcher_list` is fully expanded out.

.. highlight:: mars

Example::

 switch x:
     case Pair(A, 1)
     case Pair(B, 2)
     case Pair(A, 3)
     case Pair(q, _)
     case Pair(A, 5)

We want the final output to be::

 switch x:
     case Pair:
         a = x.Pair(0)
         n = x.Pair(1)
         switch a:
             case A:
                 switch n:
                     case 1
                     case 3
                     default: q = a
             case B:
                 switch n:
                     case 2
                     default: q = a
             default: q = a

.. highlight:: none

The arguments to pair are patterns, so this data structure is recursive. We
want to flatten it out. Therefore, the matchers we generate are:

    [(Pair/2,A,1)-block1, (Pair/2,B,2)-block2, (Pair/2,A,3)-block3, (Pair/2,q,_)-block4, (Pair/2,A,5)-block5]

(All the match arguments would have an arity, such as A/0 and 1/0, but we
don't show /0 in the list above. But not the default arguments, such as q.)

Now the constructors have no arguments -- we have a flat list of patterns. Note
that we recursively flatten this data, so if A had two arguments, we could see
a matcher like:

    (Pair/2,A/2,J,K,1)-block1

This reads operationally, "match a Pair, match the Pair's first argument with
A, match the A's first argument with J and its second with K, then match the
Pair's second argument with 1." So matchers are just a flat data structure for
representing the same information as the case statement, but with an
additional operational reading. Not all matchers have the same number of
steps, as different ctors have different arities.

Now before we generate code for the matchers, we add a layer of abstraction
over the blocks, to avoid duplicating blocks when we duplicate matchers.
Hence, our matcher list becomes:

    [(Pair/2,A,1)-"block1", (Pair/2,B,2)-"block2", (Pair/2,A,3)-"block3", (Pair/2,q,_)-"block4", (Pair/2,A,5)-"block5"]

    "block1": block1

    "block2": block2

    "block3": block3

    "block4": block4

    "block5": block5

(The unquoted block names represent the full block code; the quoted ones
represent a thin reference to the block name).

The overall algorithm, outside of generating matcher code, goes, for
switch(`switchvar`, `cases`):

1. Construct the matcher list, `matcher_list`, from cases, by flattening out
   the switch statement.

   * Generate the code for each target block, and store the Block Ids in the
     matcher list, not the blocks themselves.
2. Call :func:`generate_matcher_code` ([`switchvar`], `matcher_list`).

The result is a CFG-level switch statement. CFG-level switch statements are
very simple jump tables. They switch over an atom, and map ints and ctor names
to block IDs. They have an optional default clause, which is required to be
last. It is up to the backend to decide what to do if there is no default
clause, but the backend is free to exhibit undefined behaviour if a switch
doesn't match and there is no default clause, as the code generator will never
generate such an instruction due to exhaustive check, described below.

In this example, we call::

 generate_matcher_code(["x"], [(Pair/2,A,1)-"block1", (Pair/2,B,2)-"block2",
                               (Pair/2,A,3)-"block3", (Pair/2,q,_)-"block4",
                               (Pair/2,A,5)-"block5"])

======================================
Generating code for a list of matchers
======================================

.. py:function:: generate_matcher_code(arg_list, matcher_list)

Summary of the algorithm: Focus only on the first argument for each matcher --
the rest will be handled by recursion. Convert variable-binding matchers into
wildcard/default matchers which bind a variable in their code. For every
relevant head value, group together every possible matcher (those with an
exact match, and wildcard/default matchers). Make these groupings into switch
statements, and recurse to generate the body.

Note that we assume we will switch on arguments from left to right. We could
also pick a more intelligent order, such as choosing the argument with the
most distinct cases first (which means earlier switches eliminate the most
cases). In this case, the example below would choose the 2nd argument first,
but we'll assume left-to-right for now. Note that the flattening means we have
to make this decision *before* generating the matcher list.

Each call to :func:`generate_matcher_code` is given a block to write to (this
block will be completed by the end of the call). When a choice is required,
the block will be finished with a switch statement, and new blocks will be
created for each case (and finished off by recursive calls). When no choice is
required, some code may be appended to the block, and then the (single)
recursive call will complete the same block.

Calls to :func:`generate_matcher_code` may also append phi instructions to the
target blocks (which is the mechanism used to make explicit variable
bindings).

The function also needs to track two tables, for keeping variable bindings.
Ultimately, variable bindings are stored in phi instructions in the target
blocks, but during the code generation, we track them separately, because we
can't build phi nodes piece-by-piece. The **binding table** only moves down
through the stack, so any changes made to it are undone on the way up. This
maps :type:`blockid` to :type:`varname` to :type:`varname`, such that if
BB-BoundVar maps to SourceVar, it says "if BB wants BoundVar, it is currently
available in SourceVar." The **phi table** moves down and up through the
stack, so changes made to it are permanent throughout the algorithm. This is a
multimap mapping :type:`blockid` to :type:`varname` to
:type:`pair(blockid,varname)`, such that if TargetBlock-BoundVar maps to
SourceBlock-SourceVar, it says "TargetBlock should have a phi node for
BoundVar, where one entry maps SourceBlock to SourceVar." Both tables start
out empty.

Base case
---------

When `arg_list` is empty. It should always be the case that
`matcher_list` contains exactly one element (but assume it has multiple and
pick the first), whose match list is empty. In this case, generate a branch to
the block named by the first element of `matcher_list`.

Also add entries to the phi table for the target block. Get the target block
out of the binding table. For each entry in this table (BoundVar-SourceVar),
create a phi table entry (TargetBlock-BoundVar) = (CurrentBlock-SourceVar).

Step 1 -- Update binding table with explicit variable bindings
--------------------------------------------------------------

The first step of the operation (which in the current example will do nothing,
but we'll see it working later) is to update the binding table for explicit
variable bindings (as distinct from the implicit temporary variables, which
are created during switch generation). For each matcher whose first argument
is a variable name:

1. Add an entry in the binding table mapping (that matcher's block - the first
   variable in `arg_list`) to the given variable name. For example, if
   `arg_list` [1] is "a" and the first entry in a matcher is "q", then the
   variable "q" must be bound to the first switch value. Therefore, add to the
   binding table (``<matcher's block>`` - `q`) -> `a`.
2. Replace the first argument of the matcher with match_any ("_"). This is
   sound, as variables match anything, and we no longer need to worry about
   the variable binding, as it will be taken care of as soon as we branch to
   the code.

Step 2 -- Create association list
---------------------------------

The next step is to create an assoc list for every head value of interest, for
the first argument. The assoc list maps "match"es onto match lists. "Head
value" means the ctor without its arguments, or the number. "Of interest"
means the head values referred to by any matcher. Head values which are never
referred to (e.g., all the numbers which aren't explicitly mentioned) are all
the same as far as we're concerned.

Now we have to be *very careful* to preserve order, because default matchers
which are before more specific matchers need to supersede the specific ones.
(This assumes a language like Haskell, where the first match is chosen, not
Prolog/Mercury, where the order technically doesn't matter.) The grouping of
matchers with the same head will conceptually change the order, but this
*never* matters, because when we make a grouping, it is a committed choice. In
other words, once a match is made, we cover all possible subsequent cases
inside the match, and never fall back out to a default case. So we don't
re-order within a group, but once we have selected the groupings, their order
doesn't matter (though we keep them in order for the sake of predictable code
generation).

Let's return to our example, with the matcher list:

    [(Pair/2,A,1)-"block1", (Pair/2,B,2)-"block2", (Pair/2,A,3)-"block3", (Pair/2,q,_)-"block4", (Pair/2,A,5)-"block5"]

The resulting assoc list is very basic at this level, since all of the
matchers have the same head, Pair/2:

    Pair/2: [(A,1)-"block1", (B,2)-"block2", (A,3)-"block3", (q,_)-"block4", (A,5)-"block5"]

It's difficult to see at this stage, but the operation is to take each head
value of interest, and give it its own entry in the assoc list. Then, place in
the assoc list any matcher which begins with either the head value, or a "_".

* For matchers which begin with the head value, remove the first column of the
  matcher (so the matched head value no longer appears in the match list).
* For matchers which begin with a "_", replace the first column of the
  matcher (the "_") with one "_" for each argument of the head value (0 or
  more). For example, if placing a default matcher in the Cons/2 row, replace
  its front "_" with "_", "_" -- one for each argument. If placing a default
  matcher in the Nil row, remove its front "_".

If there are any default cases, also create an assoc list entry for "_", and
only place in the assoc list any matcher which begins with a "_". Remove the
first column of such matchers, as if matching a 0-arity head value.

**Horizontal rule**: For any matcher (looking at the remaining matches), if it
is *covered* by all of the matchers already on a row, remove it, as it is
redundant. By "covered", I mean for all of the remaining matches, if it is an
exact match, it already appears in a previous matcher, or if it is a default,
there is already a default in the previous matchers or between them they
include all constructors of that type. When removing a matcher, if the first
argument is not a default match, give a warning, as it is a useless clause.

**Vertical rule**: For any assoc list entry, if its head is *covered* by all
of the entries already in the table, remove it, as it is redundant. By
"covered", I mean if it is an exact match, that head already appeared in the
assoc list. If it is a default, there is already a default in the assoc list
or between them, all entries include all constructors of that type. When
removing an assoc list entry, give a warning for each matcher, as they are all
useless. (Perhaps do not give a warning for a default matcher covered by all
constructors, as that probably represents defensive programming. For example,
the programmer might just be using the default rule as an assert because he
doesn't trust the compiler to generate "not exhaustive" errors -- then again,
maybe it should be a warning because he should trust the compiler!) When a
default entry is placed in the assoc list, it will always be the last (this is
a structural requirement of the CFG-level switch statement).

Once this grouping is done (at each level), test that there is *either* a
default case, or that all ctors of the switch type have a case. If not, it is
a compiler error ("switch not exhaustive"). The error message should include
all of the missing cases ("requires default case or cases for A, B, C"), if it
is an ADT. If it is an number type, it must have a default case, so the error
message should just read "switch not exhaustive: default case required for
switch on number".

Step 3 -- Generate switch statement and implicit unpacking instructions
-----------------------------------------------------------------------

This enables us to generate the top-level switch statement. Switch over the
first variable of `arg_list`, and generate a case statement for each assoc
list entry. If the assoc list entry is default, generate the "default" clause
of the switch statement (which must be last). Within each case statement, we
unpack all arguments, giving them a temporary name.

**Unnecessary optimisation**: If the variable's original pattern is _ in all
matchers, don't unpack it -- note that to know this, you actually shouldn't
erase the variable name in step 1, or you won't know which are needed and
which aren't. Also, if their pattern is a variable name in all matchers, we
can use that variable name and avoid the mov instruction on a later recursion
of step 1, but for an optimising backend like LLVM, that won't save anything.

::

 <continue current block>
     switch x:
         case Pair: goto casepair

 casepair:
     a = x.Pair(0)
     n = x.Pair(1)
     <generate_matcher_code([a,n], [(A,1)-"block1", (B,2)-"block2",
                                    (A,3)-"block3", (q,_)-"block4",
                                    (A,5)-"block5"])>

**Optimisation**: If `arg_list` has a single element, and there were no
variables unpacked, then all recursive calls will be base cases with a simple
goto. Avoid generating a bunch of new goto-only blocks by making the case
statements directly branch to the first target block in each case, as the base
case would. Must also update the phi table as the base case does.

**Optimisation**: I naively generated a single-case switch statement. Instead,
we should avoid generating such a thing. If there is exactly one entry in the
assoc list, just generate the unpacking code directly::

 <continue current block>
     a = x.Pair(0)
     n = x.Pair(1)
     <generate_matcher_code([a,n], [(A,1)-"block1", (B,2)-"block2",
                                    (A,3)-"block3", (q,_)-"block4",
                                   (A,5)-"block5"])>

Step 4 -- Recurse
-----------------

Now we recurse on :func:`generate_matcher_code`. The recursive calls are shown
in the example above. Note that we recurse once for each assoc list entry (so
we iterate, then recurse).

The `arg_list` on the recursive calls is the current `arg_list` with the first
element removed, and the unpacked temporary variable names prepended. These
new arguments correspond to the first entries in the new matcher lists. The
`matcher_list` on the recursive calls is simply the corresponding value in the
assoc list, for this head.

================================================================
Afterwards -- Generating phi code for explicit variable bindings
================================================================

After all calls to :func:`generate_matcher_code` have ended, the phi table has
been built. This describes exactly where each target block should read the
variable bindings from, conditionally dependent on the block it came from.

These should be converted into phi instructions for each block, in the obvious
way. **Unnecessary optimisation**: Any entries in the phi table which have the
same source variable name for every source block can be converted into mov
instructions instead.

======================
Completing the example
======================

I now present the rest of the worked example, stepping through the recursive
calls. ::

 generate_matcher_code([a,n], [(A,1)-"block1", (B,2)-"block2", (A,3)-"block3",
                               (q,_)-"block4", (A,5)-"block5"])

First, we apply step 1, binding q for the matcher for (q,_).

1. prepend to the actual code of block 4, the instruction "q = a" (where a is
   the first variable we are matching against).
2. Replace q with _ in the matchers list. Hence:

   [(A,1)-"block1", (B,2)-"block2", (A,3)-"block3", (_,_)-"block4", (A,5)-"block5"]

Now we again group together the matchers by their first argument -- this time
we have multiple ctors and default matchers in the first argument. For every
matched ctor, we duplicate all of the default matchers. Thus, the groups are:

 A: [(1)-"block1", (3)-"block3", (_)-"block4", (5)-"block5"]

 B: [(2)-"block2", (_)-"block4"]

 _: [(_)-"block4"]

Note that block4 was duplicated for the A group, even though it did not
explicitly match A. This preserves the semantics of the switch statement -- if
we had not duplicated it, then matching A would discount that default case. As
explained above, when duplicating a default value, first test to see if it is
*covered* by all of the matchers before it, and if so, do not duplicate.

This allows us to generate the next switch statement::

 switch a:
     case A:
         <generate_matcher_code([n], [(1)-"block1", (3)-"block3",
                                      (_)-"block4", (5)-"block5"])>
     case B:
         <generate_matcher_code([n], [(2)-"block2", (_)-"block4"])>
     default:
         <generate_matcher_code([n], [(_)-"block4"])>

Now that we have three cases, we iterate over all of them, and recurse three
times.

The assoc list for case A:

    1: [()-"block1"]

    3: [()-"block3"]

    _: [()-"block4"]

(Applying the vertical rule to remove the entry "5: [()-"block5"]").

The assoc list for case B:

    2: [()-"block2"]

    _: [()-"block4"]

The assoc list for the default case:

    _: [()-"block4"]

This brings us to seven base cases, of <generate_matcher_code([],
[()-"blockn"])>, which just generates the goto statement to the given block.

.. highlight:: mars

We end up with this code::

 switch a:
     case A:
         switch n:
             case 1: goto "block1"
             case 3: goto "block3"
             default: goto "block4"
     case B:
         switch n:
             case 2: goto "block2"
             default: goto "block4"
     default: goto "block4"

Then we generate all of the code for each block. Note that each block only has
code generated once, but may be targeted by multiple case statements, such as
block4.

================
Further examples
================

Above example, continued
------------------------

Now, let us imagine a slightly different scenario, where we had these
matchers:

    A: [(1,_)-"block1", (3,7)-"block3", (_,_)-"block4", (5,_)-"block5"]

(Same as above for case A, but with an extra argument.) This means, by the
rule above, we need to copy the default match for each concrete match. Hence,
we generate the following grouping for case A:

    1: [(_)-"block1", (_)-"block4"]

    3: [(7)-"block3", (_)-"block4"]

    _: [(_)-"block4"]

    5: [(_)-"block5", (_)-"block4"]

This is inefficient, but perfectly correct. Operationally, it reads: "If you
match 1, match all and go to block1, and if that fails, match all and go to
block4." (Obviously, we will pick block1). Then, "If you match 3, match 7 and
go to block3, or match all and go to block4". Then it says "Match all and go
to block4." Finally, "If you match 5, match all and go to block5, and if that
fails, match all and go to block4." There are two forms of redundancy here.

First, once a default case occurs, all subsequent cases can be removed
(preferably with a warning):

    1: [(_)-"block1", (_)-"block4"]

    3: [(7)-"block3", (_)-"block4"]

    _: [(_)-"block4"]

Second, once a matcher which *cannot fail* is seen, all subsequent matchers
can be removed:

    1: [(_)-"block1"]

    3: [(7)-"block3", (_)-"block4"]

    _: [(_)-"block4"]

This still perfectly preserves the semantics of the above matcher list --
there is no possibility of going to block5, or block4 from a match of 1.

However, we would like to generate a warning in this last case as well, which
would be spurious if we duplicated any defaults then eliminated them again. So
we need to avoid duplicating in the first place.

Therefore: When duplicating a default value, first test to see if any of the
matchers before it will always succeed, and if so, do not duplicate.

Now, we can be even more judicious in preventing duplication. First, don't
just assume a matcher set cannot fail if it has a default case -- also check
for exhaustive ctor coverage. Secondly, it's not necessary for the matchers to
be unable to fail, it is merely sufficient for them to cover the duplicated
patterns. So: When duplicating a default value, first test to see if it is
covered by all of the matchers before it, and if so, do not duplicate.

Also: Do not think of it as "duplicating" the default values. Simpler to think
of it as "projecting" away all values which don't match. So, include all exact
matches and also default values. Then, eliminate all later matches which are
covered by earlier matches, giving a warning iff the eliminated matches are
exact (and therefore are useless). If they are default matches, then they
aren't useless unless covered by all constructors on the outside -- they
should have already been checked by the vertical rule.

An example of this:

    [(1,A)-"block1", (1,B)-"block2", (2,A)-"block3", (_,B)-"block4", (_,_)-"block5"]

Naively transformed:

    1: [(A)-"block1", (B)-"block2", (B)-"block4", (_)-"block5"]

    2: [(A)-"block3", (B)-"block4", (_)-"block5"]

    _: [(B)-"block4", (_)-"block5"]

However, we didn't need to duplicate the (_,B) case to case 1, as it is
covered by the (1,B) case. So:

    1: [(A)-"block1", (B)-"block2", (_)-"block5"]

    2: [(A)-"block3", (B)-"block4", (_)-"block5"]

    _: [(B)-"block4", (_)-"block5"]

If A and B are the only ctors for the switch type, then it would not be
necessary to copy the (_,_) case either if 1 or 2 is matched, as it is covered
in both cases:

    1: [(A)-"block1", (B)-"block2"]

    2: [(A)-"block3", (B)-"block4"]

    _: [(B)-"block4", (_)-"block5"]

Tricky variable binding example
-------------------------------

Consider this code::

 switch x:
     case Pair(49, Box(3))
     case Pair(m, Box(n))

This example is important because in this case, the binding for `n` depends on
the path taken to the second block. This explains the need for Phi
instructions when making variable bindings. In this example, the second case's
`n` variable needs to be copied from a temporary unpack variable -- which
variable will depend on whether the first number was 49.

Args: [x]

Binds: []

.. highlight:: none

Matchers::

 Pair, 49, Box, 3 - block1
 Pair, m, Box, n - block2

Assoc::

 Pair: [49, Box, 3]-block1, [m, Box, n]-block2

Codegen::

 entry:
     a = x.Pair(0)
     b = x.Pair(1)
     <continue>

----

Args: [a,b]

Binds: []

Matchers::

 49, Box, 3 - block1
 m, Box, n - block2

Bind m=a in block2's binding table.

::

 49, Box, 3 - block1
 _, Box, n - block2

Assoc list::

 49: [Box, 3]-block1, [Box, n]-block2
 _: [Box, n]-block2

Codegen::

     <continue entry>

     switch a:
         case 49: goto case49
         default: goto default
 case49:
     <continue>
 default:
     <continue>

----

**(case 49)**

Args: [b]

Binds: [(block2-m)=a]

Matchers::

 Box, 3 - block1
 Box, n - block2

Assoc list::

 Box: [3]-block1, [n]-block2

Codegen::

     <continue case49>
     c = b.Box(0)
     <continue>

----

**(case 49, Box)**

Args: [c]

Binds: [(block2-m)=a]

Matchers::

 3 - block1
 n - block2

Bind n=c in block2's binding table.

.. note::

          A phi is required, as the binding is different to that below.

::

 3 - block1
 _ - block2

Codegen::

     <continue case49>
     switch c:
         case 3: goto block1
         default: goto block2

.. note::

          This isn't quite the base case. We could recurse once more,
          generating two new blocks, and then each block would consist
          entirely of a goto. However, this pre-base-case allows more
          efficient code generation.

Write to phi table: (block2-m)=(case49-a)

Write to phi table: (block2-n)=(case49-c)

----

**(case default)**

Args: [b]

Binds: [(block2-m)=a]

Matchers::

 Box, n - block2

Assoc list::

 Box: [n]-block2

Codegen::

     <continue default>
     d = b.Box(0)
     <continue>

----

**(case default, Box)**

Args: [d]

Binds: [(block2-m)=a]

Matchers::

 n - block2

Bind n=d in block2's binding table.

.. note::

          A phi is required, as the binding is different to that above.

::

 _ - block2

Assoc list::

 _: []-block2

Codegen::

     <continue default>
     goto block2

Write to phi table: (block2-m)=(default-a)

Write to phi table: (block2-n)=(default-d)

----

**(Final generated code)**

Final phi table::

 (block2-m)=[(case49-a), (default-a)]
 (block2-n)=[(case49-c), (default-d)]

.. highlight:: mars

::

 entry:
     a = x.Pair(0)
     b = x.Pair(1)
     switch a:
         case 49: goto case49
         default: goto default

 case49:
     c = b.Box(0)
     switch c:
         case 3: goto block1
         default: goto block2

 default:
     d = b.Box(0)
     goto block2

 block1:
     ...

 block2:
     n = phi(case49 = c, default = d)
     m = phi(case49 = a, default = a)
     ...

.. note::

          The phi node for m in block2 may be converted into a mov
          instruction, because the binding is the same for all predecessors.