Expressions

Expressions are a description of the computation of a value, given some environment. Expressions appear inside source code, and are evaluated at runtime. The environment in which an expression is evaluated is the environment of its surrounding scope, which is computed at runtime.

This section outlines the syntax and semantics of expressions in Mars.

Atoms

Atoms are the most basic expressions. They are comprised of single tokens, and do not recurse.

atom ::=  num_literal_expr | variable | constructor

Number literals

Number literal expressions are usually just formed from number literal tokens. However, a character literal token also forms a number literal expression:

num_literal_expr ::=  num_literal | char_literal

Because Mars has no “character” data type, a char_literal is simply interpreted as a number, with the byte value of the character. Thus, 'a' is equivalent to 97, and '\x8c' to 140.

The type of a number literal expression is always Num. It evaluates to the constant value of the number literal (rounded towards the nearest representable number).

Variable references

An identifier refers to a variable or constructor.

variable    ::=  lower_identifier
constructor ::=  upper_identifier

Statically, the variable must exist in the expression’s scope. If the identifier matches the name of a local variable or argument of the current procedure, then it refers to that variable, regardless of whether there is a global variable of the same name (hence, local variables shadow globals), and the expression has the type of that variable.

If the identifier matches the name of a global variable or constructor, then it refers to that variable, and the expression has the type of that global.

If the identifier does not match a local or global variable, it is a compiler error, and the program MUST be rejected.

The dynamic semantics of a variable reference expression are to evaluate to the current value of the variable, at the point at which the expression is evaluated. Since Mars is a strict language, that point is the exact point in the code at which the expression appears, so even though local variables may change value, the evaluation of such expressions is quite deterministic.

Note

A constructor reference is just a special type of variable reference. Constructors are variables too, only syntactically distinguished by having an uppercase letter. They are semantically equivalent.

Complex expressions

We now present the grammar and semantics for all of the complex (non-atom) expressions in Mars, in order of precedence from highest to lowest. We begin with “primary expressions” – expressions that never need to be surrounded in parentheses because they have such high precedence. Parenthesised expressions are primary because they may be freely placed wherever an expression is expected. Note that we do not yet give a production for expression: as that rule encompasses all possible expressions, it is given last (as the production with the lowest precedence).

primary         ::=  atom
                     | array_literal_expr
                     | apply_expr
                     | field_reference
                     | "(" expression ")"
expression_list ::=  expression ("," expression)*

Array literals

An array literal expression allows an array value to be created and initialised with a single expression. The value of an array literal does not need to be constant, as it may contain nested expressions which are evaluated at runtime.

Just as a character literal forms a numeric expression, so does a string literal form an array of numbers. Hence, a string literal is just special syntax for an array literal.

array_literal_expr ::=  array_literal | string_literal
array_literal      ::=  "[" [expression_list] "]"

An array literal consists of a list of zero or more sub-expressions. The sub-expressions may be of any type, but the types of all sub-expressions must unify, or the compiler MUST reject the program (with a type error). The type of the array literal expression is Array(a), where a is the unified type of all of the sub-expressions.

The dynamic semantics of an array literal expression are to construct an array value with as many elements as there are sub-expressions. The value of each element is the evaluation of its corresponding sub-expression.

A string literal consists of zero or more byte values. The type of a string literal expression is Array(Num). The evaluation of a string literal is an array value, with one Num element per character, with each element evaluating to the corresponding character’s byte value. Thus, "hello" is equivalent to ['h', 'e', 'l', 'l', 'o'], which is equivalent to [104, 101, 108, 108, 111].

Function application

Function application expressions allow arguments to be passed to functions, and functions to be evaluated:

apply_expr ::=  primary "(" [expression_list] ["," "..."] ")"

Primarily, function application has the form f(v1, v2, ..., vn). This is known as full function application (due to the lack of a trailing “...” token). f must have type (t1, t2, ..., tn) -> [io] t, or the expression is a type error, and the compiler MUST reject the program. Each value vi must have type ti, where 1 <= i <= n. This expression has type t.

The evaluation of this expression first causes f and each vi to be evaluated. This expression evaluates to the result of the full application of function f to all actual parameters v1 ... vn (see procedure evaluation).

If f has type (t1, t2, ..., tn) -> io t (that is, f is an I/O function), an additional static constraint applies: if the function application expression is not in an I/O context, the compiler MUST reject the program. In other words, it is illegal to call an I/O function from a non-I/O function. If an expression performs multiple I/O effects, it may have unspecified behaviour (see Evaluation order).

An extended form f(v1, v2, ..., vm, "...") is known as partial function application (it has a trailing “...” token). f must have type (t1, t2, ..., tn) -> t, where m <= n, or the expression is a type error, and the compiler MUST reject the program. Each value vi must have type ti, where 1 <= i <= m. This expression has type (tm+1, tm+2, ..., tn) -> t.

The evaluation of this expression first causes f and each vi to be evaluated. This expression evaluates to the result of the partial application of function f to all actual parameters v1 ... vm (see procedure evaluation).

Note

Partial function application is not subject to the I/O context constraint: it is legal to partially-apply an io function from a non-I/O context, as long as the resulting function is not applied within a non-I/O context.

Field references

A field reference retrieves the value of a named field of a user-defined type:

field_reference ::=  primary "." lower_identifier

Recall that user-defined types consist of one or more constructors, each with zero or more parameters, each with an optional name. The same name may appear in multiple constructors, but must have the same type in each.

For a field reference of the form obj.field, obj MUST have a user-defined type, and MUST have at least one constructor with a parameter named field, or the compiler must reject the program. The type of the field reference is the type of the parameter named field in any constructor of the type of obj.

The evaluation of the expression causes obj to be evaluated first. If the constructor used to construct obj has a parameter named field, then the value of the field reference is the value of the parameter field. If the constructor used to construct obj does not have a parameter named field, then a runtime error occurs, terminating the program.

For example, the List type in the prelude has two constructors: Cons with parameters head and tail, and Nil, with no parameters. Hence:

?> x = Cons(1, Nil)
?> x.head
1
?> x.tail
Nil
?> Nil.head
Runtime Error: List instance has no field 'head'

Due to the possibility of runtime errors, it is recommended that field references be used only on types with a single constructor, or fields which appear in all constructors of a type. For other types, the more powerful switch statement should be used. The switch statement can also be used to access fields without names.

Unary negation operator

The unary negation operator negates its numeric argument; it is equivalent to neg (but note that use of the - operator does not require importing the prelude).

unary_expr ::=  primary
                | "-" unary_expr

This is the tightest-binding arithmetic operator, so a negated complex expression must be surrounded by parentheses (-x+y negates only x; -(x+y) negates the sum of x and y).

Binary arithmetic operators

The binary arithmetic operators perform the usual mathematical operations, with the usual precedence and associativity.

multiplicative_expr ::=  unary_expr
                         | multiplicative_expr "*" unary_expr
                         | multiplicative_expr "/" unary_expr
                         | multiplicative_expr "//" unary_expr
                         | multiplicative_expr "%" unary_expr
additive_expr       ::=  multiplicative_expr
                         | additive_expr "+" multiplicative_expr
                         | additive_expr "-" multiplicative_expr

The *, /, //, %, + and - operators are equivalent to mul, fdiv, div, mod, add and sub, respectively. The *, /, // and % operators all have the same precedence, and bind more tightly than the + and - operators. The + and - operators have the same precedence as each other.

Note that there are two division operators: /, which may return a fractional result, and //, which always returns an integer (rounding towards negative infinity). The // operator should be used when dealing with integer arithmetic.

All of the operators are left-associative (so, for example, 2 - 3 - 4 is equivalent to (2 - 3) - 4).

Comparison operators

The comparison operators compare two operands and return a truth value (0 for false and 1 for true). They bind less tightly than the arithmetic operators.

relational_expr ::=  additive_expr
                     | additive_expr "<" additive_expr
                     | additive_expr "<=" additive_expr
                     | additive_expr ">" additive_expr
                     | additive_expr ">=" additive_expr
equality_expr   ::=  relational_expr
                     | relational_expr "==" relational_expr
                     | relational_expr "!=" relational_expr

The relational operators (<, <=, > and >=) operate on numeric arguments and are equivalent to lt, le, gt and ge, respectively. They bind more tightly than the equality operators. The equality operators (== and !=) operate on values of any type and are equivalent to eq and ne, respectively.

Note

The logic of == and != for non-numeric values is complicated; please see the documentation for eq.

Note

All of these operators are non-associative; it is an error to mix relational operators or equality operators within the same expression level, although it is permitted to mix relational operators with equality operators.

For example, the expression x < y < z is invalid, as is x == y == z. If you really want to say “the truth value of x < y is less than z,” use parentheses: (x < y) < z. However, it is permitted to mix equality and relational operators: a < b == x > y means “the truth value of a < b is the same as the truth value of x > y.”

Field replace expressions

A field replace expression is used to non-destructively modify a named field of a user-defined type:

expression ::=  equality_expr
                | primary "." lower_identifier ":=" expression

Note

The := operator is right-associative, and has lower precedence than field references, function application or binary operators. Hence, x.foo := y.bar := z is equivalent to x.foo := (y.bar := z). Updating multiple fields simultaneously therefore requires parentheses, for example, (x.foo := y).bar := z.

A field replace expression of the form obj.field := e produces a new value which is the same as obj but with field field replaced with the value of e. The original obj is not modified. The same compile-time and runtime rules as field reference expressions apply. The expression e MUST have the same type as the type of the parameter named field in any constructor of the type of obj.

The type of the field replace expression is the type of obj. The evaluation of the expression causes obj to be evaluated first. If the constructor used to construct obj has a parameter named field, then the value of the field replace is a value with the same constructor as obj, and all parameters the same as obj, except the parameter named field having the value of e. If the constructor used to construct obj does not have a parameter named field, then a runtime error occurs, terminating the program.

Note that obj is evaluated before e. If obj does not have an appropriate constructor, it is unspecified whether e is evaluated or not.

For example:

?> x = Cons(1, Nil)
?> x.head := 2
Cons(2, Nil)
?> x.tail := Cons(2, Nil)
Cons(1, Cons(2, Nil))
?> x
Cons(1, Nil)
?> Nil.head := 1
Runtime Error: List instance has no field 'head'

As with any other expression, field replace expressions do not update the variable obj. Therefore, it is common to perform an “update” as follows:

?> x = x.head := 2
?> x
Cons(2, Nil)

Note

Unlike conventional languages, this will not modify any aliases of x (it is literally an assignment of a new value to the variable named x). This can help avoid confusing bugs, and make your software more reliable overall (indeed, it is the reason behind Mars being a declarative language), but it can be confusing if you are used to conventional languages.

See also: Field update statements.

Precedence table

The precedence and associativity of all of the Mars expressions is summarised in the following table. Higher-numbered precedence levels indicate tighter binding.

Precedence Type Operator(s) Associativity
6 Primary primary expressions N/A
5 Unary unary - Right
4 Multiplicative *, /, //, % Left
3 Additive +, - Left
2 Relational <, <=, >, >= None
1 Equality ==, != None
0 Field replace := Right

Evaluation order

Mars imposes a strict evaluation order between statements: for a statement sequence s1; s2, all computation involved in s1 will be completed before the execution of s2 begins. However, within an expression, the evaluation order of sub-expressions is unspecified.

This is usually only relevant in an I/O context. For example, the following I/O code fragment has unspecified behaviour:

put_char('x') + put_char('y')       # Unspecified evaluation order

However, note that Mars has several “hidden” features which may perform side-effects in pure code (such as field update statements, impure functions and debug functions) – all of these features operate outside of the Mars semantics and should not be used in normal code. Such features can be affected by the unspecified evaluation order.

Pure code can also have unspecified termination behaviour. Consider the following pure function:

def weird() :: Num:
    return error("A") + error("B") + weird()    # Unspecified termination

Here it is unspecified whether the program will a) print the error message “A”, b) print the error message “B”, or c) loop forever or until the program runs out of stack space. However, in any case, the program will not continue executing (in the semantics, all of these cases have the denotation ⊥, pronounced “bottom”), so they can be considered equivalent.