Liskell

A 15 minute tour of Liskell

The following section gives a short tour of the Liskell syntax forms. You will learn how to define functions, define algebraic data types (ADTs), and use pattern matching do destructure ADTs.

Canonical hello world

(defmodule FactHello (main) ())

(define (fact n))
  (if (== n 0)
      1
      (* n (fact (- n 1)))))

(define main
  (print (fact 12)))

First, let's start with improving our intuition of the syntax forms. In this factorial Hello World, we see three top-level declarations:

  • (defmodule ..)
  • (define (fact n) ..)
  • (define main ..)

It also contains two expressions that are not part of any other expression:

  • (if ..)
  • (print (fact 12))

The latter (print ..) contains only one sub-expression (fact 12)which in turn contains the sub-expressions fact and 12

The Parse Tree

The Liskell parser can produce six kinds of parse tree elements. A parse tree element might have different meanings in different contexts. For instance, the uppercase symbol Shape might be a data constructor, the name of a type or the name of a module. Its meaning is determined by the surrounding context. The most important contexts are expressions, patterns, types and top-level declarations. But before we talk about them, we have a look of what parse tree elements their syntax forms might be built of:

  • Symbols are the workhorse of Liskell. A string of characters (without double quotes) is parsed as a symbol. Examples: define, defmodule, FactHello, fact, if, print, n, ==, -, *, main
  • Strings of characters enclosed into double quotes are interpreted as the parse tree element string. Examples: foobar
  • Integers and RationalsNothing unusually here. Integers are written as 1, 2, 3, .. while rationals are written as 2.7301, 3.1291..
  • Lists might contain an arbitrary number of other parser tree elements enclosed into parenthesis. Examples are: (this list contains five symbols), ("this list contains" "two strings"), (#a "mixed" list), (a list with (a nested sub-list))

Expressions

Every parse tree element from the list above can be part of an expression. Everything except symbols and lists are self-evaluating. ''self-evaluating'' means an integer parse tree element evaluates to its integer value, a string parse tree element evaluates to a list of characters representing that string and so on. Verify that we are using three integers in the example, all returning themselves when evaluated. This is different — for example — for n, the argument to the function fact. n is a symbol, and all symbols evaluate to the value bound by the variable in the lexical scope. Within the functions body of fact, n is bound to the functions first argument. The symbol fact -- interpreted as variable -- returns the function object for fact which we readily use in a function application to the expression (- n 1).

Lists in expressions represent function application. The following is a list of symbols (- n 1) and when evaluated as expression it is read as a function application of the subtraction function bound to the - symbol to the expressions n and 1.

In a few special cases, lists do not represent function application. All these special list forms are differentiated from function applications by having a special symbol as its first element. The conditional statement (if ..) is an example for that. It is a list and therefore it would be a function application to a function called if. But it is not because if is in the list of specials for expressions. In this particular case, it clearly is a language construct for conditionals. Notice that if is not colored blue in the parse tree, as the symbol if is not evaluated as an expression. Other special are lambda, case, let and a few more and they can be seen as language primitives for expressions. The Liskell paper specifies them and their function.

Top-level forms

Every well formed Liskell file starts with a defmodule top level declaration. It declares the modules name, its exported symbols and its imported symbol. Our example

(defmodule FactHello (main) ())
declares a module named FactHello that exports its main symbol and imports nothing. Instead of the export list (main), we could use the wildcard symbol _. In this particular position, the wildcard is interpreted as ''export-everything''.

The other two declarations

  • (define (fact n)
      ...)
  • (define main
      ...)
are top-level binders. Both of them are written in the E-Binder syntax. In general, this looks like Scheme binding syntax, with the addition that an E-binder also able to do destructuring.

At the top-level, only lists with a special list head are allowed, such as define, defmodule, defdata, defclass, .... The Liskell paper lists all of them.

Algebraic Data Types

As example for another top-level declaration, we define a new algebraic data type. Theoretically speaking, algebraic data types are sum types of product types.

If you are coming from Lisp, the data type ''list'' is a good example. To delay the discussion of type constructors, we are focusing on a specific type of list, namely the list of integers, named IList. INil should stand for an empty integer list, and ICons for a cons cell in an integer list.

(defdata IList
  (INil)
  (ICons Integer IList))
This top-level declaration defines the IList type and its two data constructors. Data constructors can be used within expressions to construct objects of the associated type. To actually construct an IList, we can either use INil without any arguments, or we can call ICons with two arguments of the types Integer and IList, as for instance in the expression:
(ICons 1 (ICons 2 INil))

To simplify things a bit, we define another type but this time, with only one constructor. It represents a point in three dimensional space.

(defdata Point3DType
  (Point3D Rational Rational Rational))
Once an ADT is created, it is an opaque object. It is not like a C struct, where one is free to inspect and modify its values from the outside. To access the components used when constructing the data object, we must first disassemble it via pattern matching. Pattern matching can be seen as the reverse process to data constructors. But before we explain pattern matching, we have to explain patterns.

Patterns

Before we talk about patterns, let us talk about matching or unification in general. A unification algorithm answer the questions, whether two expressions can be made equal by substituting specific parts of these expressions. The parts that can be substituted are meta-variables and in the following examples they are typeset in italic:

Expression 1 Expression2 Substitution set Comment
"Test1""Test1"EmptySuccess, no unification necessary as the expression are equal already
"Test1""Test2"FailureUnification fails, expression can not be made equal
(a "Test2")("Test1" "Test2")a = "Test1"Success
(a b)("Test1" "Test2")a = "Test1",
b = "Test2"
Success
(a "Test2")("Test1" b)a = "Test1",
b = "Test2"
Success
Matching is a restricted version of unification, where only one expression is allowed to contain meta-variables. Up until example 4 only Expression 1 contains meta-variables. The last example is only valid for unification.

Pattern matching in Liskell is the process of matching a pattern containing meta-variables against a given object. Patterns matching is even more restricted than matching, as you are only allowed to use what constitutes the pattern syntax. Before we describe the valid forms of pattern syntax, we again give an example to get a brief idea.

Assume you have an object that is constructed by the Liskell expression (Point3D 1 2 3) and you match it against the pattern (Point3D x y z), then it is natural to get x=1, y=2, z=3 as a result of the matching process. In Liskell patterns, all lowercase symbols are automatically interpreted as meta-variables -- parts available for substitution. The result of the a matching process is a substitution set, as in the third column in the example table. But Liskell takes a more convenient approach instead, and turns the meta-variables into real variables, with their content being their respective matching part. So after the matching of the given example, the lexical environment is extended by the variables x, y, z bound to 1, 2, 3 respectively. As before matching can also fail.

Function arguments are patterns. You see that by looking at the coloring of the arguments of functions so far. The function (fact n) contained the pattern n, which is pretty unexciting as it neither does any destructuring, nor can fail to match, because it is a primitive variable that can be substituted for every arbitrary object. To provide a more complicated example assume we use the (Point3D a b c) pattern in the argument list. The object in the function application supplied in the position of that argument, is then matched against this pattern, and when matched successfully, the variables a, b, c are bound to the matched parts.

Let us write a useful function that takes a Point3D object representing a point in three dimension space and returns the distance from the origin.

(define (distance3D (Point3D x y z))
  (sqrt (+ (^ x 2)
           (+ (^ y 2) (^ z 2))))
)
An example invocation of this function could look like:
(distance3D (Point3D 10 20 10))
The symbol Point3D refers to a constructor in this case and is part of a function application.

After all of these rash examples, we step back and examine the parse tree elements with respect to their meaning in the context of a pattern:

  • Lowercase symbols are interpreted as meta-variables. Meta-variables can not fail to match. They are turned into variables that are bound to the respective matching counterpart. The only exception to this is the wildcard symbol '_' which will be treated as meta-variable but won't create any lexically scoped binding for matches.
  • Strings, Characters, Integers, Rationals are literal patterns, that might fail to match.
  • Lists are used for destructuring algebraic data types (ADT). ADTs are constructed using an expression with one constructor of the ADT. A constructor might include zero or more arguments. In the example above Point3D was a constructor carrying 3 arguments, namely the positions x, y, z. Destructuring is the complementary process to construction, turning an opaque object into its parts. The distance3D contained a prefix constructor pattern for Point3D with 3 sub-patterns x, y, z. which are all meta-variables.

Function and Variable Bindings

Binders are a minor syntax form that is used in top-level binders and in the lexically scoped binder let. Liskell knows two syntax forms for binders: E-binders, and ET/TE-funbinders. The latter is a subclass of E-binders, namely when the binding is a function binder. For function binders, type annotation is possible. But except for this remark we won't treat ET/TE-funbinders here.

An E-binder consists of two elements: the binding, describing the object(s) to be bound and an expression, evaluated to obtain the values for the object(s) to be bound. A simple example of that is the top-level binding of a constant.

(define main (print (fact 12)))

The binding part of this E-binder is main, a symbol, the expression main is bound to is (print (fact 12)). This is the most simple case, binding a primitive variable.

To bind functions, we could use another special we have not used so far, namely lambda. It is also a special in the syntax of expressions. The left side below demonstrates the way of introducing a binding for a function with the help of lambda while the right one is a function binder.

(define fact
  (lambda (n)
    (if (== n 0)
        1
        (* n (fact (- n 1)))))
(define (fact n)
  (if (== n 0)
      1
      (* n (fact (- n 1))))
Both sides will return an identical function. Notice that instead of fact the right side uses a list pairing the function's name with its arguments to create function binder.

Before we talk about pattern binders, we introduce a special form for expressions, namely let, that allows us to create lexically scoped bindings for the context of a single expression. As maintained, the E-binder contains two elements: a binding and an expression. For let these parts are explicitly wrapped in a two-element list: (binding expr). To be able to introduce more bindings at once, let takes a list of E-binders: ((binding1 expr1) (binding2 expr2) ... (bindingn exprn)). Here is a concrete example for illustration:

(let ((one 1)
      (two 2))
  (* (+ two two) (+ one one)))
We are also able to introduce lexically scoped functions. We just use the binding form for functions. Let us define the helper function (twice n) as (+ n n).
(let ((one 1)
      (two 2)
      ((twice n) (+ n n)))
  (* (twice two) (twice one)))

Pattern bindings

We can extend the E-Binder syntax to be able to do pattern matching. The reason is that a single symbol has the same meaning as a pattern as well as a function binder, namely they bind a primitive variable. For lists, we can syntactically tell apart function binders from pattern binders. Function binders are not allowed to contain an uppercase symbol as the first element of a list, because functions must have lower case names. Pattern binders are not allowed to contain lowercase names, because the must start with an uppercase constructor. So, we can easily separate these two cases, and for the case where the overlap, their implications are identical.

Let's define a small helper function that implements scalar multiplication of that point with a given factor s

(define (scalar-multiply s point)
  (let (((Point3D x y z) point)
        ((p x) (* s x)))
    (Point3D (p x) (p y) (p z))))
In this example let contains two bindings, both written as list, but the first is a pattern binder and the second is a function of one argument. The difference comes from the case of its first symbol.

Recommended Readings

The Liskell paper contains more information about Liskell. You might also want to inspect the Haskell 98 Language report, providing the basis for many of Liskell's semantics.

Meta-programming

To be written. It will most likely feature the implementation of backquoting, defmacro and Prolog. Please check in a few weeks. In the meantime, please enjoy this little preview: Flash Video, 99mb, mplayer likes it: direct, torrent (preferred)