Chapter 10. Datatype Refinement

Table of Contents
Dependent Datatypes
Example: Function Templates on Lists (Redux)
Example: Mergesort on Lists (Redux)
Sequentiality of Pattern Matching
Example: Functional Red-Black Trees

The datatype mechanism in ATS is adopted from ML directly, and it is really a signatory feature in functional programming. However, the datatypes we have seen so far are not very precise when employed to classify values. For instance, given a type T, the type list0(T) is for values representing both empty and non-empty lists consisting of elements of the type T. Therefore, empty and non-empty lists cannot be distinguished at the level of types. This limitation severely diminishes the effectiveness of datatypes of ML-style in capturing program invariants. In ATS, datatypes of ML-style can be refined into dependent datatypes of DML-style, where DML refers to the programming language Dependent ML, the immediate precursor of ATS. With such refinement, datatypes can classify values with greatly enhanced precision.

The code employed for illustration in this chapter plus some additional code for testing is available on-line.

Dependent Datatypes

The syntax for declaring dependent datatypes is mostly similar to the syntax for declaring non-dependent datatypes: For instance, the dependent datatype list in ATS is declared as follows:

datatype list(t@ype+, int) = | {a:t@ype} list_nil(a, 0) of () // [of ()] is optional | {a:t@ype}{n:nat} list_cons(a, n+1) of (a, list(a, n))

More precisely, list is declared as a type constructor of the sort (t@ype, int) -> type, which means that list takes an unboxed type and a static integer to form a boxed type. The keyword t@ype+ indicates that list is covariant at its first parameter (of the sort t@ype), that is, list(T1, I) is considered a subtype of list(T2, I) if T1 is a subtype of T2. There is also the keyword t@ype- for indicating the declared type constructor being contravariant at a parameter, but it is rarely used in practice. Also, keywords like type+ and type- are interpreted similarly.

There two data (or value) constructors list_nil and list_cons associated with list, which are declared to be of the following types:

list_nil : {a:t@ype} () -> list(a, 0)
list_cons : {a:t@ype}{n:nat} (a, list(a, n)) -> list(a, n+1)

Given a type T and a static integer I, the type list(T, I) is for values representing lists of length I in which each element is of the type T. Hence, the types of list_nil and list_cons mean that list_nil forms a list of length 0 and list_cons forms a list of length n+1 if applied to an element and a list of length n. Note that it is also possible to declare list as follows in a more concise style:

datatype list(a:t@ype+, int) = | list_nil(a, 0) of () // [of ()] is optional | {n:nat} list_cons(a, n+1) of (a, list(a, n))

The use of a:t@ype+ (instead of t@ype+) introduces implicitly a universal quantifier over a for the type assigned to each data constructor associated with the declared type constructor list.

As an example of programming with dependent datatypes, the following code implements a function template for computing the length of a given list:

fun{ a:t@ype } list_length {n:nat} .<n>. (xs: list(a, n)): int(n) = case+ xs of | list_nil() => 0 | list_cons(_, xs1) => 1 + list_length<a>(xs1) // end of [list_length]

Note that .<n>. is is a termination metric. The type assigned to the function list_length indicates that the function takes a list of length n for any natural number n and returns an integer of value n. In addition, the function is verified to be terminating. Therefore, list_length is guaranteed to implement the function that computes the length of a given list. I now briefly explain how typechecking can be performed on the definition of list_length. Let us first see that the the following clause typechecks:

  | list_cons(_, xs1) => 1 + list_length(xs1)

What we need to verify is that the expression on the righthand side of the symbol => can be assigned the type int(n) under the assumption that xs matches the pattern on the lefthand side of the symbol =>. Let us assume that xs1 is of the type list(a, n1) for some natural number n1, and this assumption implies that the pattern list_cons(_, xs1) is of the type list(a, n1+1). Clearly, matching xs against the pattern list_cons(_, xs1) generates a condition n=n1+1. It is also clear that list_length(xs1) is of the type int(n1) and thus 1 + list_length(xs1) is of the type int(1+n1). As the condition n=n1+1 implies n=1+n1, 1 + list_length(xs1) can be given the type int(n). So this clause typechecks. Note that matching xs against the pattern list_nil() generates the assumption n=0, which implies that 0 is of the type int(n). Therefore, the following clause typechecks:

  | list_nil((*void*)) => 0

Given that the two clauses typecheck properly, the case-expression containing them can be assigned the type int(n). Therefore, the definition of list_length typechecks.

As the recursive call in the body of the above defined function list_length is not a tail-call, the function may not be able to handle a long list (e.g., one that contains 1 million elements). The following code gives another implementation for computing the length of a given list:

fun {a:t@ype} list_length{n:nat} .<>. (xs: list(a, n)): int(n) = let // // loop: {i,j:nat} (list(a, i), int(j)) -> int(i+j) // fun loop{i,j:nat} .<i>. // .<i>. is a termination metric (xs: list(a, i), j: int(j)): int(i+j) = case+ xs of | list_cons(_, xs1) => loop(xs1, j+1) | list_nil() => j // end of [loop] // in loop(xs, 0) end // end of [list_length]

This time, list_length is based on a tail-recursive function loop and thus can handle lists of any length in constant stack space. Note that the type assigned to loop indicates that loop takes a list of length i and an integer of value j for some natural numbers i and j and returns an integer of value i+j. Also, loop is verified to be terminating.

There is also a dependent datatype option in ATS for forming optional values:

datatype option(a:t@ype+, bool) = | Some(a, true) of a | None(a, false) of () // end of [option]

As an example, the following function template list_last tries to find the last element in a given list:

fun{ a:t@ype } list_last {n:nat} .<>. ( xs: list(a, n) ) : option(a, n > 0) = let // fun loop {n:pos} .<n>. ( xs: list(a, n) ) : a = let val+ list_cons(_, xs1) = xs in case+ xs1 of | list_nil() => let val+list_cons(x, _) = xs in x end // end of [list_nil] | list_cons(_, _) => loop(xs1) end // end of [loop] // in case+ xs of | list_nil() => None() | list_cons _ => Some(loop(xs)) end // end of [list_last]

The inner function loop is evidently tail-recursive and it is verified to be terminating.

After a programmer becomes familar with list and option, there is little incentive for him or her to use list0 and option0 anymore. Internally, values of list and list0 have exactly the same representation and there are cast functions of zero run-time cost in ATS for translating between values of list and list0. The same applies to values of option and option0 as well.