Tutorial on ATS/Anairiats

ATS is a rich language with a highly expressive type system, and the syntax of ATS can be a daunting obstacle for beginners trying to understand ATS code. In order to alleviate this problem, we may employ colors to differentiate various syntatical components in a program written in ATS. The convention for coloring ATS syntax is summarized as follows:

The keywords in ATS are all colored black (and possibly in bold face).
The comments in ATS are all colored gray.
The code in the statics of ATS is colored blue.
The code in the dynamics of ATS is colored red unless it represents proofs, for which the dark green color is used.
The external code (in C) is colored deep brown.

Some Basics on ATS

ATS consists of a static component (statics), where types are formed and reasoned about, and a dynamic component (dynamics), where programs are constructed and evaluated.

Some Primitive Sorts and Constants

The statics of ATS is a simply typed language. The types for terms in the statics are called sorts (so as to avoid potential confusion with the types for terms in the dynamics) and the terms in it are called static terms. We use sigma (or $srt) for sorts and s (or $strm) for static terms. The primitive sorts in ATS include bool, char, int, prop, type, view and viewtype.

There are also some primitive constants c in the statics, each of which is assigned a constant sort (or c-sort, for short) of the following form:

(sigma_1,...,sigma_n) => sigma

Intuitively, given static terms s_1,...,s_n of sorts sigma_1,...,sigma_n, respectively, c(s_1,...,s_n) is a static term of sort sigma if c is assigned the c-sort (sigma_1,...,sigma_n) => sigma. For instance, each boolean value is given the sort () => bool and each integer is given the c-sort () => int. In the following table, we list some of the commonly used constants in the statics of ATS:


~	:	(int) => int
+	:	(int, int) => int
-	:	(int, int) => int
*	:	(int, int) => int
/	:	(int, int) => int
<	:	(int, int) => bool
<=	:	(int, int) => bool
>	:	(int, int) => bool
>=	:	(int, int) => bool
==	:	(int, int) => bool
<>	:	(int, int) => bool
~	:	(bool) => bool
&&	:	(bool, bool) => bool
\|\|	:	(bool, bool) => bool
bool	:	() => type
bool	:	(bool) => type
int	:	() => type
int	:	(int) => type

Note that many symbols are overloaded in ATS. For instance, ~ is for negation on integers as well as on booleans; bool and int are both sorts and static constants. In ATS, a type refers to a static term s of sort type. For instance, bool and bool(true) are types, and int and int(2+3) are types, too.

Subset Sorts

A subset sort is essentially a sort constrained by a predicate. For instance, we can define a subset sort nat as follows:

sortdef nat = {a:int | a >= 0}

It is important to note that a subset sort is not regarded as a (regular) sort. The sole purpose of introducing subset sorts is to provide a form of syntactic sugar (to be used together with quantifiers), which is to be explained shortly. Following are some commonly used subset sorts:

sortdef two = {a:nat | a < 2}
sortdef sgn = {a:int | -1 <= a && a <= 1}
sortdef sgn = {a:int | a == -1 || a == 0 || a == 1} // another definition
sortdef pos = {a:int | a > 0}
sortdef neg = {a:int | a < 0}

where && and || stand for conjunction and disjunction, respectively. Note that we may also use semicolon ; for conjunction. For instance, the subset sort sgn can be defined as follows:

sortdef sgn = {a:int | -1 <= a; a <= 1}

Some Primitive Types and Values

We use the name dynamic term for a term in the dynamics of ATS and a value is a dynamic term in a special form (which is to be made clear later). We have primitive types bool, char, int and string for booleans, characters, integers and strings, respectively. Also, we have types float and double for floating point numbers with single and double precision, respectively.

The syntax for literal characters, literal integers, literal floats and literal strings is the same as that specified in the ANSI C.

The use of dependent types in ATS is ubiquitous. Given a boolean b, we can form a type bool(b) and the only value that can be assigned this type is the boolean value b. In other words, bool(b) is a singleton type. Similarly, we can form a type int(i) for each integer i and the only value that can be assigned this type is the integer value i. The dependent types Bool and Int, which are for boolean values and integer values, respectively, can be defined as follows in ATS:

typedef Bool = [a:bool] bool(a)
typedef Int = [a:int] int(a)

where we use [...] for existential quantification. Given an integer i, we can form a type string(i) for strings of length i. The type String for strings is defined as follows:

typedef String = [a:nat] string (a)

Guarded and Asserting Types

Given a proposition B, that is, a static term of sort bool and a type T, we use the name guarded type for a type of the form B =) T and the name asserting type for a type of the form B /\ T. Intuitively, in order for a value of type B =) T to be used, which is most likely a function, the guard B needs to be discharged first. On the other hand, if an expression of type B /\ T evaluates to a value v, then we know that the assertion B holds (at the point where v is obtained) and v is of type T.

Elements of Programming

We use $exp and $typ as meta-variables ranging over expressions (dynamic terms) and types in ATS.

Naming and the Environment

A critical aspect in programming is to be able to bind names to (complex) computational objects. For instance, we can bind names to values through the following syntax:

val radius = 1.0
val pi = 3.1415926
val area = pi * radius * radius

We can also bind a name to a function value:

val square = lam (x: double): double => x * x
val area = pi * square (radius)

Tuples

Given expressions $exp_1,..., $exp_n, we can form a tuple:

'($exp_1,...,$exp_n)

of length n such that the ith component of the tuple is $exp_i for 1 <= i <= n. The use of the quote symbol ' is to indicate that the tuple is boxed. For instance, a pair zero_and_one is formed through the following syntax:

val zero_and_one: '(int, int) = '(0, 1)

We can extract the components of a tuple by pattern matching. For instance, the following syntax binds x and y to 0 and 1:

val '(x, y) = zero_and_one

If values $exp_1,...,$exp_n are assigned types $typ_1,...,$typ_n, respectively, then the tuple

'($exp_1,...,$exp_n)

can be assigned the type

'($typ_1,...,$typ_n)

There is another kind of tuple in ATS, which is referred to as a flat or an unboxed tuple. A tuple of this kind is like a struct in C. For instance, a flat tuple two_and_three is constructed as follows:

val two_and_three: @(int, int) = @(2, 3)

and its components can be selected through pattern matching as follows:

val @(x, y) = two_and_three

If values $exp_1,...,$exp_n are assigned types $typ_1,...,$typ_n, respectively, then the tuple

@($exp_1,...,$exp_n)

can be assigned the type

@($typ_1,...,$typ_n).

Note that the use of the symbol @ is optional.

Records

A label $lab is an alphanumeric identifier or an integer. Given n labels $lab_1, ..., $lab_n and n expressions $exp_1, ..., $exp_n, we can form a boxed record:

'{ $lab_1=$exp_1,...,$lab_n=$exp_n }

If the types of $exp_1, ..., $exp_n are $typ_1, ..., $typ_n, then this record can be assgined the following type:

'{ $lab_1=$typ_1,...,$lab_n=$typ_n }

The use of the quote symbol ' is to indicate that the record is boxed.

The symbol ' needs to be changed to @ (or simply dropped) in order to form flat or unboxed records.

Conditionals

The syntax for forming a conditional is either

if $exp_1 then $exp_2 else $exp3

if $exp_1 then $exp_2

In either case, $exp_1 needs to be a boolean. In the former case, the conditional is of type $typ if both $exp_2 and $exp_3 are of type $typ. In the latter case, the conditional is of type void if $exp_2 is of type void.

Type Annotations

The syntax for supplying a type annotation is

$exp : $typ

For instance, we may write (1+1: Int) to indicate that the expression 1+1 can be assigned the type Int. The need for type annotations is indispensable when advanced programming features in ATS are involved.

Function Values

We use $var a meta-variable ranging over (dynamic) variables in ATS. A function argument $funarg may be of the form $var (unannotated) or $var: $typ (annotated), and a function argument sequence $funargseq is of the form:

$funarg_1,...,$funarg_n

where n may equal 0.

A non-recursive function can be constructed based on the following syntactic form:

lam ($funargseq_1) ... ($funargseq_n) {: $typ} => $exp

where we use {...} to indicate optional syntax. For instance, the following function adds two floating numbers of double precision:

lam (x: double, y: double): double => x + y

The type of this function is (double, double) -> double, that is, it takes two doubles and returns a double. The curried version of this function is written as follows:

lam (x: double) (y: double): double => x + y

The type of this function is double -> (double -> double), that is, it takes a double and returns a function that takes a double and returns a double.

A recursive function can be constructed based on the following syntactic form:

fix $var ($funargseq_n) ... ($funargseq_n) {: $typ} => $exp

As an example, the following function, which is of the type Nat -> Nat, computes the Fibonacci numbers:

fix fib (x: Nat): Nat => if x >= 2 then fib (x-1) + fib (x-2) else x

Function Declarations

The following form of syntax declares a function named "foo":

fun foo ($funargseq_1) ... ($funargseq_n) {: $typ} = $exp

In the case where the defined function is not recursive, the keyword fun may be replaced with the keyword fn. For instance, we can declare a function as follows to compute Fibonacci numbers:

fun fib (x: Nat): Nat = if x >= 2 then fib (x-1) + fib (x-2) else x

Alternatively, we may write equivalent code as follows:

val fib = fix fib (x: Nat): Nat => if x >= 2 then fib (x-1) + fib (x-2) else x

or as follows:

val rec fib: Nat -> Nat = lam x => if x >= 2 then fib (x-1) + fib (x-2) else x

If we need to declare two functions "foo" and "bar" mutually recursively, we can use the following form of syntax:

fun foo ($funargseq_1_1) ... ($funargseq_1_n1) {: $typ_1} = $exp_1
and bar ($funargseq_2_1) ... ($funargseq_2_n2) {: $typ_2} = $exp_2

This form can be further generalized to declare n mutual recursive functions for n >= 2 . As an example, the follow code implements two mutually recursive functions:

fun is_even (x: int): bool = if x > 0 then is_odd  (x-1) else true
and is_odd  (x: int): bool = if x > 0 then is_even (x-1) else false

Local Bindings

A common way to introduce local bindings is through the use of the keyword let. As an example, the following code defines a function that computes the roots of a quadratic equation in terms of the efficients of the equation:

// computing roots for [Axx + Bx + C]
fn foo (A: double, B: double, C: double): @(double, double) = let
  val Delta = B * B - 4.0 * A * C
  val () = if Delta < 0.0 then (prerr "no real roots!\n"; exit {void} 1)
  val Delta_sqrt = sqrt (Delta)
  val root1 = (~B + Delta_sqrt) / (2.0 * A)
  val root2 = (~B - Delta_sqrt) / (2.0 * A)
in
  @(root1, root2) // this a flat tuple
end // end of [foo]

Note that the variables Delta, Delta_sqrt, root1 and root2 are introduced for use only inside the let-expression. Another form of local binding is introduced through the keyword where as is shown below:

val fact10 = fact 10 where {
  fun fact (x: int): int = if x > 0 then x * fact (x-1) else 1
} // end of [fact10]

This code is just an equivalent form of the following code:

val fact10 = let
  fun fact (x: int): int = if x > 0 then x * fact (x-1) else 1
in
  fact 10
end // end of [fact10]

Sequencing

Given expressions $exp_1, ..., $exp_n, $exp_{n+1}, where n is some natural number, we can sequence them to form the expression ($exp_1; ...; $exp_n; $exp_{n+1}); this expression is essentially equivalent to the following one in terms of both static and dynamic semantics:

let val () = $exp_1 ... val () = $exp_n in $exp_{n+1} end

So it is clear that the expressions $exp_1, ..., $exp_n are required to be of the void type; the type of the sequence expression is the type of its last component $exp_{n+1}. If $exp_{n+1} is omitted, the default is the expression (), which represents the void value, that is, the only value of the void type.

Comments

There are currently three forms of comments in ATS/Anairiats.

A line comment starts with a double slash "//" and ends with a newline character.
A rest-of-file comment starts with a quadruple slash "////" and ends at the end of the file in which this comment occurs. This style is often useful for debugging or testing.
An enclosed comment starts with the symbol "(*" and ends with "*)". Such comments can be embedded in one another.

As in C, it is also possible to comment out a piece of code in ATS as follows:

#if 0 #then // [#then] is optional

(all the code here is commented out)

#endif

The only requirement is that the code that is commented out must represent a list of syntactically correct declarations in ATS.

The code used for illustration is available here.

Filename Extensions

In ATS, we use the filename extensions ".sats" and ".dats" to indicate static and dynamic files, respectively. These two extensions have some special meaning attached to them and thus should not be replaced arbitrarily.

A static file may contain sort definitions, datasort declarations, static definitions, abstract type declarations, exception declarations, datatype declarations, macro definition, interfaces for dynamic values and functions, etc. In terms of functionality, a static file in ATS is similar to a header file (with the extension ".h") in C or an interface file (with the extension ".mli") in Objective Caml.

A dynamic file may contain everything in a static file. In addition, it may also contain defintions for dynamic values and functions.

In general, the syntax for constructing code in a static file can also be used for constructing code in a dynamic file. The only exception involves declaring interfaces for dynamic values and functions. For instance, in a static file, we can use the following syntax to declare interfaces (or types) for a value named pi and a function named area_of_circle.

val pi : double
fun area_of_circle (radius: double): double

When doing the same thing in a dynamic file, we need to use the following slightly different syntax:

extern val pi : double
extern fun area_of_circle (radius: double): double

where extern is a keyword in ATS.

As a convention, we often use the filename extension ".cats" to indicate that a file contains some C code that is supposed to be combined with ATS code in certain manner. However, the use of this filename extension is not mandatory.

File Inclusion

As is in C, file inclusion in ATS can be done by using the directive #include. For instance, the following line indicates that a file named foobar is included, that is, this line is to be replaced with the content of the file foobar:

#include "foobar.hats"

Note that the included file is parsed according to the syntax for statics or dynamics depending whether the file is included in a static or dynamic file. As a convention, the name of an included file often ends with the extension ".hats".

A common use of file inclusion is to keep some constants, flags or parameters being defined consistently across a set of files. For instance, the file prelude/params.hats serves such a purpose.

File inclusion can also be used to emulate (in a limited but rather useful manner) the feature of functors (a module-level language construct) as is supported in languages such as SML and Objective Caml. Examples illustrating such a use of file inclusion can be found elsewhere.

Compilation

The command for compiling ATS programs is atscc, which is stored in the directory "$ATSHOME/bin/" after ATS/Anairiats is installed. The ATS compiler translates ATS programs into C code, which is then compiled by the GNU C compiler gcc.

atscc accepts all the flags gcc recognizes and passes them to gcc directly. The following is a list of flags that are special to atscc:

The flag --atstc or -tc indicates typechecking only.
The flag --atscc or -cc indicates compiling to C code only.
The flag -D_ATS_GCATS indicates the need for garbage collection.

If a C compiler other than gcc is to be used, please set the environment variable ATSCCOMP to the command name of this C compiler.

As an example, the following command only typechecks the ATS code in "foo.dats" and "bar.sats":

atscc -tc foo.dats bar.sats

while the following command compiles "foo.dats" and "bar.sats" into "foo_dats.c" and "bar_sats.c", respectively, if typechecking "foo.dats" and bar.sats" succeeds:

atscc -cc foo.dats bar.sats

Typically, atscc is used as follows:

atscc -o foobar -O3 foo.dats bar.sats

When this command is issued, atscc first generates "foo_dats.c" and "bar_sats.c", and then invokes gcc to compile "foo_dats.c" and "bar_sats.c" to generate an executable file named "foobar". The flag "-O3" requires that gcc perform all optimizations classified at level 3 or below. We may also issue the following command to produce some debugging information for tools like gdb and valgrind:

atscc -o foobar -g foo.dats bar.sats

ATS programs can run with or without (systematic) garbage collection. If garbage collection is needed, then the following command can be issued to generate the executable "foobar":

atscc -o foobar -O3 foo.dats bar.sats -D_ATS_GCATS

For your information, the command atscc is not implemented in a scripting language. It is one of the very first programs implemented in ATS/Geizella (and then ported to ATS/Anairiats). The code for atscc (and several other commands) can be found at utils/scripts/.

The Main Function in ATS

The main function in ATS is declared as follows:

fun main {n:int | n > 0} (argc: int n, argv: &(@[string][n])): void
  = "mainats"

So this function takes an integer argc greater than 0 and a string array argv of size argc and returns no value. The syntax argv: &(@[string][n]) indicates that argv is a call-by-reference argument. If we followed C++ kind of syntax, then this would be written as something like &argv: @[string][n].

The name "mainats", which is global, can be used in C code to refer to this function. When a program in ATS is compiled that implements the main function in ATS, the following implementation of the main function in C is automatically included in the C code generated by the ATS compiler:

main (int argc, char *argv[]) {

// some initialization code is included here

mainats (argc, argv) ;
return 0 ;

}

As an example, the following ATS program prints out the command line on the standard output:

implement main (argc, argv) = let
  fun loop {n,i:nat | i <= n} // [loop] is tail-recursive
    (i: int i, argc: int n, argv: &(@[string][n])): void =
    if i < argc then begin
      if i > 0 then print (' '); print argv.[i]; loop (i+1, argc, argv)
    end // end of [if]
  // end of [loop]
in
  loop (0, argc, argv); print_newline ()
end // end of [main]

There are also situations where the function mainats may need to be implemented in C. If this happens, the function main_dummy needs to be implemented as follows:

implement main_dummy () = ()

This allows the compiler to generate proper code for the main function in C.

As an example, we present as follows a typical scenario in GTK+ programming, where the function gtk_init needs to be called to modify the arguments passed from a command line:

// some function implemented in ATS
extern fun main_work {n:pos} (argc: int n, argv: &(@[string][n])): void
  = "main_work"

implement main_work (argc, argv) = ( (* some ATS code that does the main work *) )

implement main_dummy () = () // indicating [mainats] being implemented in C

%{$

ats_void_type
mainats (ats_int_type argc, ats_ptr_type argv) {
  gtk_init ((int*)&argc, (char ***)&argv) ;
  main_work (argc, argv) ;
  return ;
} /* end of [mainats] */

%}

The code used for illustration is available here.

Fixity Declaration

Given a function f of arity n, the standard syntax for applying f to n arguments v_1, ..., v_n is f(v_1, ..., v_n). It is, however, allowed in ATS to use infix notation for a binary function application, and prefix or postifix notation for a unary function application.

In ATS, each identifier can be assigned one of the following fixities: prefix, infix and postfix. The fixity declarations for many commonly used identifiers can be found in prelude/fixity.ats. We often use the name operator to refer to an identifier that is assigned a fixity.

For instance, we use the following syntax to declare that + and - are infix operators of precedence value 50:

infixl 50 + -

After this declaration, we can write an expression like 1 + 2 - 3, which is parsed into -(+(1, 2), 3) in terms of the standard syntax for function application. The keyword infixl indicates that the declared infix operators are left-associative. For right-associative and non-associative infix operators, please use the keywords infixr and infix, respectively. If the precedence value is omitted in a fixity declaration, it is assumed to equal 0.

We can also use the following syntax to declare that iadd, fadd, padd and uadd are left-associative infix operators with a precedence value equal to that of the operator +:

infixl (+) iadd fadd padd uadd

This is useful as it is difficult in practice to remember the precedence values of (a large collection of) declared operators. Sometimes, we may need to specify that the precedence value of one operator in relation to that of another one. For instance, the following syntax declares that opr2 is a left-associative infix operator and its precedence value is that of opr1 plus 10:

infixl (opr1 + 10) opr2

If the plus sign (+) is changed to the minus sign (-), then the precedence value of opr2 is that of opr1 minus 10.

We can also turn an identifier opr into a non-associative infix operator (of precedence value 0) by putting the backslash symbol \ in front of it. For instance, the expression $exp_1 \opr $exp_2 stands for opr ($exp_1, $exp_2), where $exp_1 and $exp_2 refer to some expressions, either static or dynamic.

The syntax for declaring (unary) prefix and postfix operators are similar. For instance, the following syntax declares that ~ and ? are prefix and postfix operators of precedence values 61 and 69, respectively:

prefix 61 ~
postfix 69 ?

Please find an example involving a postfix operator here.

For a given occurrence of an operator, we can deprive it of its assigned fixity by simply putting the keyword op in front of it. For instance 1 + 2 - 3 can be writen as op- (op+ (1, 2), 3). It is also possible to (permanently) deprive an operator of its assigned fixity. For instance, the following syntax does so to the operators iadd, fadd, padd and uadd:

nonfix iadd fadd padd uadd

Lastly, please note that fixity declarations are lexically scoped, and each fixity declaration is only effective within its legal scope.

Overloading

A symbol in ATS can be overloaded with multiple functions. The following syntax introduces a symbol with the name foo for overloading.

symintr foo // symbol introduction for overloading

Suppose that foo1, foo2 and foo3 are names of three functions in ATS. Then we can overload foo with these three functions as follows:

overload foo with foo1
overload foo with foo2
overload foo with foo3

An overloaded function symbol is resolved according to the number of arguments it takes and, if needed, the types of these arguments.

The code used for illustration is available here.

Macros

There are two kinds of macros in ATS. One kind is C-like and the other kind is LISP-like, though they are much simpler as well as weaker than their counterparts in C and LISP, respectively.

C-like Macros

We use some examples to illustrate certain typical uses of C-like macros in ATS.

The following two declarations bind the identifiers N1 and N2 to the abstract syntax trees (not strings) representing 1024 and N1 + N1, respectively:

#define N1 1024
#define N2 N1 + N1

Suppose we have the following value declaration appearing in the scope of the above macro delarations:

val x = N1 * N2

Then N1 * N2 first expands into 1024 * (N1 + N1), which further expands into 1024 * (1024 + 1024). Note that if this example is done in C, then N1 * N2 expands into 1024 * 1024 + 1024, which is different from what we have here. Also note that it makes no difference if we reverse the order of the previous macro definitions:

#define N2 N1 + N1
#define N1 1024

If we declare a marco as follows:

#define LOOP (LOOP + 1)

then an infinite loop is entered (or more precisely, some macro expansion depth is to be reached) when the identifier LOOP is expanded.

LISP-like Macros

There are two forms of LISP-like macros in ATS: short form and long form. These (untyped) macros are highly flexible and expressive, and they can certainly be used in convoluted manners that should probably be avoided in the first place. Some commonly used macro definitions can be found in prelude/macrodef.sats. In order to use LISP-like macros in ATS effectively, the programmer may want to find some examples in LISP involving backquote-comma-notation.

Macros in Long Form As a macro in short form can simply be considered a special kind of macro in long form, we first give some explanantion on the latter. A macro definition in long form is introduced via the use of the keyword macrodef. For instance, the following syntax introduces a macro name one that refers to some code, that is, abstract syntax tree (AST) representing the integer number 1.

macrodef one = `(1)

The special syntax `(...), where no space is allowed between the backquote "`" and the left parenthsis "(", means to form an abstract syntax tree representing what is written inside the parentheses. This is often referred to as backquote-notation. Intuitively, one may think that a backquote-notation exerts an effect that "freezes" everything inside it. Let us now define another macro as follows:

macrodef one_plus_one = `(1 + 1)

The defined macro name one_plus_one refers to some code (i.e., AST) representing 1 + 1. At this point, it is important to stress that the code representing 1 + 1 is different from the code representing 2. The macro name one_plus_one can also be defined as follows:

macrodef one_plus_one = `(,(one) + ,(one))

The syntax ,(...), where no space is allowed between the comma "," and the left parenthesis "(", indicates the need to expand (or evaluate) whatever is written inside the parentheses. This is often referred to as comma-notation. A comma-notation is only allowed inside a backquote-notation. Intuitively, a comma-notation cancels out the "freezing" effect of the enclosing backquote-notation.

In addition to macro names, we can also define macro functions. For instance, the following syntax introduces a macro function square_mac:

macrodef square_mac (x) = `(,(x) * ,(x)) // [x] should refer to some code

Here are some examples that make use of square_mac:

fun square_fun (i: int): int = ,(square_mac `(i))
fun area_of_circle_fun (r: double): doubld = 3.1416 * ,(square_mac `(r))

Macros in Short Form The previous macro function square_mac can also be defined as follows:

macdef square_mac (x) = ,(x) * ,(x) // [x] should refer to some code

The keyword macdef introduces a macro definition in short form. The previous examples that make use of square_mac can now be written as follows:

fun square_fun (i: int): int = square_mac (i)
fun area_of_circle_fun (r: double): doubld = 3.1416 * square_mac (r)

In terms of syntax, a macro function in short form is just like an ordinary function. In general, if a unary macro function fmac in short form is defined as as follows:

macdef fmac (x) = $exp_def

then one may essentially think that a macro definition in long form is defined as follows:

macrodef fmac_long (x) = `($exp_def) // please note the backquote

and each occurrence of fmac($exp_arg) is automatically rewritten into ,(fmac_long(`($exp_arg))). Note that macro functions in short form with multiple arguments are handled in precisely the same fashion. The primary purpose for introducing macros in short form is to provide a form of syntax that seems more accessible. While macros in long form can be defined recursively (as is to be explained later), macros in short form cannot.

Recursive Macro Definitions (to be written later)

The code used for illustration is available here.

Function or Closure?

It is possible in ATS to differentiate at the level of types functions without environment from functions with environment. This is an indispensable feature for interfacing functions in C directly inside ATS as these functions all have no environment. A function with an environment is often referred to as a closure.

A problem with closures

In the programming language C, each function is at the toplevel, and it is naturally represented as a pointer to the code heap where the code for the function is stored. However, inner functions are supported in ATS, and they may appear in the return value of a function call. As an example, the following code in ATS involves an inner function add_x appearing in the return value of an outer function add:

fn add (x: int):<cloref> int -<cloref> int = begin
  let fn add_x (y: int):<cloref> int = x + y in add_x end
end // end of [add]

The syntax <cloref> indicates that the defined functions add and add_x are both (persistant) closure references. Applying add to a given integer i, we obtain a unary function that adds i to its argument. This function is represented as a pair (add_x_env, [x -> i]), where we use [x->i] for something often referred to as an environment that binds x to i, and add_x_env for the (toplevel) function defined in the following (pseudo) code:

fun add_x_env (env, y) = env.x + y

Note that the (pseudo) syntax env.x stands for the selection of the value to which x is bound in the environment env.

We use the name closure to refer to a pair like (add_x_env, [x -> i]). Given that functions may occur as arguments (of other functions) in ATS, it is necessary that all functions be represented uniformly as closures (if functions without environments cannot be differentiated from functions with environments at compile-time). For instance, the previously defined (toplevel) function add needs to be represented as a pair (add_env, []), where [] stands for the empty environment and add_env is the function defined in the following (pseudo) code:

fun add_env (env, x) = (add_x_env, [x -> x])

In functional languages like ML and Haskell, all functions are represented as closures. Unfortunately, this requirement for representing all functions as closures, can cause a serious difficulty when we try to use in ATS a higher-order function implemented in C. Let us see a concrete example.

The function qsort is declared in <stdlib.h> with the following type:

void qsort(void *base, size_t nmemb, size_t size, int(*compar)(const void *, const void *));

Clearly, qsort demands that its fourth argument be a function (not a closure). In order to use qsort in ATS directly, we need a way to construct functions represented as code pointers (instead of closures).

Types for functions with environment

In ATS, a type of the form (T_1, ..., T_n) -<cloref> T_0 is for a closure reference, that is, a reference to a function paired with an environment that takes n arguments of types T_1, ..., T_n and returns a value of type T_0. The following declaration states that foo is a closure reference of the type (T_1, ..., T_n) -<cloref> T_0:

val foo : (T_1, ...., T_n) -<cloref> T_0

Types for functions without environment

In ATS, a type of the form (T_1, ..., T_n) -<fun> T_0, where fun can be omitted, is for a function without environment that takes n arguments of types T_1, ..., T_n and returns a value of type T_0. The following two equivalent declarations both state that bar is a function of the type (T_1, ..., T_n) -<fun> T_0:

val bar : (T_1, ...., T_n) -<fun> T_0
fun bar (x_1: T_1, ...., x_n: T_n): T_0

The previously mentioned function qsort can be given the following type in ATS:

fun qsort {a:viewt@ype} {n:nat}
  (base: & @[a][n], nmemb: size_t n, size: sizeof_t a, compar: (&a, &a) -<fun> int): void

This type indicates that qsort itself is a function without environment and its fourth argument is also a function without environment. As an example, the following code implements a simple test on qsort:

fn test_qsort () = let
  fun pr_loop {n,i:nat | i <= n} .<n-i>.
    (A: &(@[int][n]), n: size_t n, i: size_t i): void =
    if i < n then begin
      if i > 0 then print ", "; print A.[i]; pr_loop (A, n, i+1)
    end // end of [if]
  // end of [pr_loop]

  // creating a linear array of size 10
  val (pf_gc, pf_arr | p_arr, asz) = $arrsz {int} (1, 9, 2, 8, 3, 7, 4, 6, 5, 0)

  val () = (print "before quicksort:\n")
  val () = pr_loop (!p_arr, asz, 0)
  val () = print_newline ()

  val () = begin
    qsort {int} (!p_arr, asz, sizeof<int>, lam (x, y) => compare (x, y))
  end // end of [val]

  val () = (print "after quicksort:
")
  val () = pr_loop (!p_arr, asz, 0)
  val () = print_newline ()
in
  array_ptr_free {int} (pf_gc, pf_arr | p_arr)
end // end of [test_qsort]

Implementing a function without enviroment

The following code implements add as a function without enviroment:

fn add0 (x: int):<fun> int -<cloref> int = begin
  let fn add0_x (y: int):<cloref> int = x + y in add0_x end
end // end of [add0]

The syntax :<fun> is an annotation indicating that the type ascribed to add0 is int -<fun> (int -<cloref> int). If we change int -<cloref> int into int -<fun> int, then an error is reported at compile-time as the function add0_x does require a nonempty environment that binds the variable x to some value. If add0 is needed in a place where a closure is expected, we can simply write lam x => add0 (x) instead.

When a function is declared via the keyword fun or fn, it is assumed by default that the function is without environment. For instance, the following code is equivalent to the previous implementation of add0:

fn add0 (x: int): int -<cloref> int = begin
  let fn add0_x (y: int):<cloref> int = x + y in add0_x end
end // end of [add0]

In contrast, the following code, which is used at the beginning of this tutorial, implements add as a closure:

fn add (x: int):<cloref> int -<cloref> int = begin
  let fn add_x (y: int):<cloref> int = x + y in add_x end
end // end of [add]

Linear Closures

As a closure is an aggregated value, memory allocation is required in order to form closures. In ATS, linear closures are supported. As memory allocated for linear closures can be freed explicitly by the programmer, such closures are particularly useful in a situation where automatic garbage collection is not allowed or its use needs to be significantly reduced. We are to present detailed explanation on linear closures elsewhere.

The code used for illustration is available here.

Variadic Functions

As in C, a funciton in ATS may also take an indefinite number of arguments. Let us use the function printf in ATS, which corresponds to the function of the same name in C, as an example to explain this feature.

The type of printf is given as follows:

fun printf : {ts:types} (printf_c ts, ts) -> void

We use printf_c for a type constructor that forms types for format strings (in C) when applied to lists of types. For instance, printf_c(char, double, int) is a type for format strings that require a character, a double, and an integer to be supplied. Given a character c, a double d and an integer i, @(c, d, i) is an argument of types (char, double, int), and the following expression is well-typed in ATS:

printf ("c = %c and d = %f and i = %i", @(c, d, i))

The type of the format string "c = %c and d = %f and i = %i" is computed to be printf_c (char, double, int) and then @(c, d, i) is checked to be of the type (char, double, int). Note that a format string must be a constant in order for its type to be computed during typechecking.

As an example, we present as follows a program that prints out a multiplication table for single digits:

#define N 9

implement main (argc, argv) = loop1 (0) where {

  // [loop1] and [loop2] are verified to be terminating based on the supplied metrics

  // [.< N-i, 0 >.] is a termination metric
  // Ignore it if you have not learned this feature yet
  fun loop1 {i:nat | i <= N} .< N-i, 0 >. (i: int i): void =
    if i < N then loop2 (i+1, 0) else ()
  // end of [loop1]

  // [.< N-i, N+1-j >.] is a termination metric
  // Ignore it if you have notlearned this feature yet
  and loop2 {i,j:nat | i <= N; j <= i} .< N-i, i-j+1 >. (i: int i, j: int j): void =
    if j < i then begin
      if (j > 0) then print '\t';
      printf ("%1d*%1d=%2.2d", @(j+1, i, (j+1) * i));
      loop2 (i, j+1)
    end else begin
      print_newline (); loop1 (i)
    end // end of [if]
  // end of [loop2]

} // end of [main]

The following text is the output of the program:

1*1=01
1*2=02	2*2=04
1*3=03	2*3=06	3*3=09
1*4=04	2*4=08	3*4=12	4*4=16
1*5=05	2*5=10	3*5=15	4*5=20	5*5=25
1*6=06	2*6=12	3*6=18	4*6=24	5*6=30	6*6=36
1*7=07	2*7=14	3*7=21	4*7=28	5*7=35	6*7=42	7*7=49
1*8=08	2*8=16	3*8=24	4*8=32	5*8=40	6*8=48	7*8=56	8*8=64
1*9=09	2*9=18	3*9=27	4*9=36	5*9=45	6*9=54	7*9=63	8*9=72	9*9=81

The code used for illustration is available here.

Tail-Recursive Functions

Probably the single most important optimization performed by the ATS/Anairiats compiler is the translation of tail-recursive function calls into direct (local) jumps.

Tail-Recursion

When applied to an integer n, the following defined function sum1 sums up integers from 1 to n.

// [sum1] is recursive but not tail-recursive
fun sum1 (n: int): int = if n > 0 then n + sum1 (n-1) else 0

This function is recursive but not tail-recursive. The stack space it consumes is proportional to the value of its argument. Essentially, the ATS compiler translates the definition of sum1 into the following C code:

int sum1 (int n) {
  if (n > 1) return n + sum1 (n-1) ; else return 1 ;
}

When applied to an integer n, the following defined function sum2 also sums up integers from 1 to n.

fn sum2 (n: int): int = let // sum2 is non-recursive
  // [loop] is tail-recursive
  fun loop (n: int, res: int): int =
    if n > 0 then loop (n-1, res+n) else res
  // end of [loop]
in
  loop (n, 0)
end // end of [sum2]

The inner function loop in the definition of sum2 is tail-recursive. The stack space consumed by loop is a constant independent of th value of the argument of sum2. Essentially, the ATS compiler translates the definition of sum2 into the following C code:

int sum2_loop (int n, int res) {
  loop:
  if (n > 1) {
    res = res + n ; n = n - 1; goto loop; 
  } else {
    // do nothing
  }
  return res ;
}

int sum2 (int n) { return sum2_loop (n, 0) ; }

Mutual Tail-Recursion

Sometimes, mutually tail-recursive functions are encountered. For instance, in the following example, the functions even and odd are mutually tail-recursive.

// [fn*] indicates the need to combine two or more functions
// so as to translate tail-recursive calls into direct jumps
fn* even (n: int): bool = if n > 0 then odd (n-1) else true
and odd (n: int): bool = if n > 0 then even (n-1) else false

The keyword fn* is used to indicate to the ATS compiler that the functions even and odd need to be combined together so as to turn (mutually) tail-recursive function calls into direct jumps. Essentially, the ATS compiler emits the following C code after compiling this example:

bool even_odd (int tag, int n) {

bool res ;

switch (tag) {
  0: goto even ;
  1: goto odd ;
  default : exit (1) ;
}

even: if (n > 0) { n = n - 1; goto odd; } else { res = true; goto done; }

odd: if (n > 0) { n = n - 1; goto even; } else { res = false; goto done; }

done: return res ;

} /* end of [even_odd] */

bool even (int n) { return even_odd (0, n) ; }
bool odd (int n) { return even_odd (1, n) ; }

Note that mutually recursive functions can be combined in such a manner only if they all have the same return type. In the above case, both even and odd have the same return type bool.

When translating C code involving embedded loops, we often encounter mutual tail-recursion. For instance, the following C code prints out ordered pairs of digits:

int main (int argc, char *argv[]) {
  int i, j ;

  for (i = 0; i <= 9; i += 1) {
    for (j = i; j <= 9; j += 1) {
      if (i < j) printf (", ") ; printf ("(%i, %i)", i, j) ;
    } /* for */
    printf ("\n") ;
  } /* for */

  return 0 ;
}

A straightforward translation of the C code into ATS (in functional style) is given as follows:

implement main (argc, argv) = let
  fn* loop1 {i:nat} (i: int i): void =
    if i <= 9 then loop2 (i, i) else ()
  // end of [loop1]

  and loop2 {i,j:nat} (i: int i, j: int j): void =
    if j <= 9 then begin
      if i < j then begin
        print ", "; printf ("(%i, %i)", @(i, j)); loop2 (i, j+1)
      end // end of [if]
    end else begin
      print_newline (); loop1 (i+1)
    end // end of [if]
  // end of [loop2]
in
  loop1 0
end // end of [main]

where the mutually tail-recursive funtions loop1 and loop2 correspond to the outer and inner loops in the C code, respectively.

The code used for illustration is available here.

Termination Metrics

In ATS, the programmer is allowed to supply termination metrics for verifing the termination of recursively defined functions. This is really an indispensable feature for supporting programming with theorem proving as proof functions, namely, functions representing proofs, must be proven to be pure and terminating.

A termination metric is a tuple (M1, ..., Mn) of natural numbers, where n >= 0 . We use the standard well-founded lexicographical ordering on natural numbers to order such tuples.

A Primitive Recursive Function

The kind of recursion in the following implementation of the factorial function is primitive recursion:

// [fact] implements the factorial function
fun fact {n:nat} .< n >. (n: int n): Int = if n > 0 then n * fact (n-1) else 1

The syntax .< n >. indicates that the metric supplied for verifying the termination of the defined function is a singleton tuple (n). In the definition of fact, the metric for the recursive call to fact is (n-1), which is strictly less than (n). So the function fact is terminating.

Some General Recursive Functions

We implement as follows a function gcd that computes the greatest common division of two given positive integers:

// [gcd] computes the greates common divisors of two positive integers
fun gcd {m,n:int | m > 0; n > 0} .< m+n >.
  (m: int m, n: int n): [r:nat | 1 <= r; r <= min(m, n)] int r =
  if m > n then gcd (m - n, n)
  else if m < n then gcd (m, n - m)
  else m

The syntax .< m+n >. indicates that the termination metric (m+n) should be used to verify that the defined function gcd is terminating. In the definition of gcd, the termination metric for the first recursive call to gcd is (m-n)+n=m, which is strictly less than the original termination metri m+n (as n is positive); the termination metric for the second recursive call to gcd is m+(n-m)=n, which is also strictly less than the original termination metric m+n (as m is positive). Thus, gcd is a terminating function.

As another example, we implement as follows the Ackermann's function:

// [ack] implements the Ackermann's function
fun ack {m,n:nat} .< m, n >.
  (m: int m, n: int n): Nat =
  if m > 0 then
    if n > 0 then ack (m-1, ack (m, n-1)) else ack (m-1, 1)
  else n+1

The syntax .< m, n >. indicates that the termination metric is a pair of natural numbers: (m, n). We use the lexicographical ordering on natural numbers to compare such metrics. To verify that ack is terminating, we need to solve the following constraints:

(m-1, k) is less than (m, n) under the assumption m > 0, where k is an arbitrary natural number.
(m, n-1) is less than (m, n) under the assumption m > 0 and n > 0.
(m-1, 1) is less than (m, n) under the assumption m > 0.

As all of these constraints can be readily solved, we conclude that ack is a terminating funciton.

Mutually Recursive Functions

When mutually recursive functions are to be verified, the termination metrics for these functions, which are tuples of natural numbers, must be of the same tuple length. We given a simple example as follows:

// mutually recursive functions
fun isEven {n:nat} .< 2*n+2 >. (n: int n): bool =
  if n > 0 then ~(isOdd n) else true
and isOdd {n:nat} .< 2*n+1 >. (n: int n): bool =
  if n > 0 then isEven (n-1) else false

Clearly, we may also verify the termination of these two functions by using the metrics .< n, 1 >. and .< n, 0 >. for isEven and isOdd, respectively.

Termination Checking at Run-time

Suppose that foo and bar are declared as follows:

fun foo ():<> void and bar ():<> void

Moreover, suppose that the following implementation of foo is given in a file named foo.dats:

implement foo () = $Bar.bar ()

while the following implementation of bar is given in another file named bar.dats:

implement bar () = $Foo.foo ()

Clearly, neither foo nor bar is terminating. In practice, it is difficult to resolve this issue of calling cycles among terminating functions by solely relying on termination metrics. Instead, atscc can generate run-time code for detecting calling cycles among terminating functions if the flag -D_ATS_TERMINATION_CHECK is present. For instance, if foo.dats and bar.dats are compiled as follows:

atscc -D_ATS_TERMINATION_CHECK foo.dats and bar.dats

then a run-time error is to be reported to indicate a calling cycle when either foo.dats or bar.dats is loaded dynamically.

The code used for illustration is available here.

Types with Effects

The type system of ATS can currently track the following list of effects that a program may incur during its execution:

The exn (exception) effect is incurred if a program raises an exception during its execution. This covers the case where the execution of a program is terminated abnormally, for instance, by calling exit.
The ntm (nontermination) effect refers to potential nonterminating excecution of a program.
The ref (reference) effect essentially means that a program may potentially update the content of some shared memory.

The code used for illustration is available here.

Parametric Polymorphism and Templates

Parametric polymorphism (or polymorphism for short) offers a flexible and effective approach to supporting code reuse. For instance, given a pair (v1, v2) where v1 is a a boolean and v2 a character, the function swap_bool_char defined below returns a pair (v2, v1):

fun swap_bool_char (xy: @(bool, char)): @(char, bool) = (xy.1, xy.0)

Now suppose that a pair of integers need to be swapped, and this results in the implementation of the following function swap_int_int:

fun swap_int_int (xy: @(int, int)): @(int, int) = (xy.1, xy.0)

The code duplication between swap_bool_char and swap_int_int is obvious, and it can be easily avoided by implementing a function template as follows:

fun{a,b:t@ype} swap (xy: @(a, b)): @(b, a) = (xy.1, xy.0)

Now the functions swap_bool_char and swap_int_int can simply be replaced with swap<bool,char> and swap<int,int>, respectively. The function template swap cannot be compiled into executable binary code directly as the sizes of type variables a and b are unknown: The special sort t@ype is for classifying types whose sizes are unspecified. If swap<T1,T2> is used for some types T1 and T2 of known sizes, then an instantiation of swap is created where type variables a and b are replaced with T1 and T2, respectively, and then compiled into executable binary code. For those who know the feature of templates in C++, this should sound rather familiar. In contrast to swap, swap_type_type is defined below as a polymorphic function (rather than a function template):

fun swap_type_type {a,b:type} (xy: @(a, b)): @(b, a) = (xy.1, xy.0)

This function can be compiled into executable binary code as the sizes of type variables a and b are known: The special sort type is for classifying types whose sizes equal exactly one word, that is, the size of a pointer. For example, the size of a string is one word, and the size of any declared datatype is also one word. Given strings s1 and s2, an application of swap_type_type to @(s1, s2) can be written as follows:

swap_type_type {string,string} @(s1, s2)

where the expression {string,string} is often referred to as a static argument. As in this case, most static arguments do not have to be provided explicitly since they can be automatically inferred. However, such static arguments, if provided, can often enhance the quality and precision of the error messages reported in case of typechecking failure. This is a topic to be explored elsewhere in great depth.

Template Declaration and Implementation

Often, the interface for a template may need to be declared alone. For instance, the interface for the above swap function template can be declared as follows:

extern fun{a,b:t@ype} swap (xy: @(a, b)): @(b, a)

Just like a declared function interface, a declared template interface can be implemented. For instance, the following code implements the interface declared for the swap function template:

implement{a,b} swap (xy) = (xy.1, xy.0)

This form of template implementation is often referred to as generic template implementation in contrast to specialized template implementation presented as follows. It is also allowed to implement specialized templates in ATS. For instance, the following code implements the above swap function template that is specialized with the type variables a and b being set to int and int, respectively:

implement swap<int,int> (xy) = let
  val x = xy.0 and y = xy.1; val s = x + y in (s - x, s - y)
end // end of [swap<int,int>]

The code used for illustration is available here.

Persistent Lists

Lists are by far the most commonly used data structure in functional programming. We say that a data structure is persistent if it is heap-allocated and can only be freed by the GC. In contrast, a data structure is said to be linear if it is either stack-allocated or heap-allocated and can be freed by the user as well as by the GC.

The datatype for persistent lists in ATS is declared as follows:

datatype list (a:t@ype+, int) = // t@ype+: covariant
  | list_nil (a, 0)
  | {n:int | n >= 0} list_cons (a, n+1) of (a, list (a, n))

Given a type T and an integer I, the type list(T, I) is for lists of length I in which each element is of type T.

Let us first introduce some abbreviations for the list constructors:

#define nil list_nil
#define cons list_cons
#define :: list_cons

The following syntax creates a list consisting of 1, 2 and 3:

cons (1, cons (2, cons (3, nil ()))) // [nil ()] can be replaced with [nil]

This kind of syntax is a bit unwieldy if longer lists need to be handled, and some alternatives are given as follows:

'[1, 2, 3] // the first character is quote (')
$lst (1, 2, 3) // this is equivalent to '[1, 2, 3]
$lst {Nat} (1, 2, 3) // [Nat] is given as the type for the list elements

The interfaces for various functions on lists can be found in the file prelude/SATS/list.sats.

We now present some simple programs involving lists. The following code implements a function template that computes the length of a given list:

// This implementation is not tail-recursive
fun{a:t@ype} length {n:nat} (xs: list (a, n)): int n =
  case+ xs of _ :: xs => 1 + length xs | nil () => 0

This is a rather poor implementation as it is not tail-recursive. A better one, which is tail-recursive, is given as follows:

// This implementation is tail-recursive
fn{a:t@ype} length {n:nat} (xs: list (a, n)): int n = let
  fun loop {i,j:int} (xs: list (a, i), j: int j): int (i+j) =
    case+ xs of _ :: xs => loop (xs, j+1) | nil () => j
in
  loop (xs, 0)
end // end of [length]

The code used for illustration is available here.

Val(ue) and Var(iable) Declarations

The keywords for introducing value identifiers and variable identifiers are val and var, respectiveily. The essential difference between a value identifier and a variable identifier is that the value referred to by the former cannot be changed during the course of evaluation while the value referred to by the latter can. This difference is clearly reflected in the following two styles of implementation of the factorial function:

// functional style
fn fact_val (x: int): int = loop (x, 1) where {
  fun loop (x: int, res: int): int =
    if x > 0 then loop (x-1, x * res) else res
  // end of [loop]
} // end of [fact_val]

// imperative style
fn fact_var (x: int): int = let
  var x: int = x; var res: int = 1
  val () = while (x > 0) (res := x * res; x := x - 1)
in
  res  
end // end of [fact_var]

In a functional language such as ML, where variable identifiers are not available, an imperative style of implementation of the factorial function may have to be written as follows:


// imperative style based on persistent references, which looks
// awkward and runs inefficiently (in terms of both time and memory)
fn fact_ref (x: int): int = let
  val x = ref<int> (x); val res = ref<int> (1)
  val () = while (!x > 0) (!res := !x * !res; !x := !x - 1)
in
  !res
end // end of [fact_ref]

An implementation as such is often written by a beginner in functional programming who has previously programmed in imperative languages (e.g., C, C++, Java). The function fact_ref is clearly inefficient as each call to it needs to allocate two references on heap (corresponding to x and res), which can only be reclaimed by GC later.

No Local Variable Escapes

In a language like C that supports local variables, many problems are caused by a local variable escaping its legal scope. This, however, is not an issue in ATS as the type system of ATS guarantees that local variables cannot be accessed out of its legal scope (while allowing the addresses of local variables to be passed as function parameters).

We give another implementation of the factorial function as follows that involves passing the addresses of local variables as function parameters:

fun loop {x,res:addr}
  (pf_x: !int @ x, pf_res: !int @ res | p_x: ptr x, p_res: ptr res): void =
  if !p_x > 0 then begin
    !p_res := !p_x * !p_res; !p_x := !p_x - 1; loop (pf_x, pf_res | p_x, p_res)
  end // end of [loop]
// end of [loop]

fn fact_var2 (x: int): int = let
  var x: int = x; var res: int = 1
in
  loop (view@ x, view@ res | &x, &res); res
end // end of [fact_var2]

Each variable identifier is assoicated with two pieces of properties: its address L, which is referred to by the same identifier, and a proof of VT@L, where VT is the viewtype of the content stored at L. For instance, in the implementation of fact_var2, we use view@ x for the proof associated with the variable x and &x for the address of x.

For each variable that is declared of viewtype VT, it is required that a proof of VT?@L is available at the end of the legal scope of the variable, where L is the address of the variable. This requirement guarantees that a local variable, while its address can be taken out of its scope, can never be accessed out of its scope due to the unavailability of a proof needed for accessing the address.

The code used for illustration is available here.

Call-By-Reference

The feature of call-by-reference in ATS is similar to the corresponding one in C++. What is special in ATS is the way in which this feature is handled by the type system. In general, if f is given a type of the following form for some viewtypes VT1 and VT2:

(..., &VT1 >> VT2, ...) -> ...

then a function call f(..., x, ...) on some variable x of the viewtype VT1 is to change the viewtype of x into VT2 upon its return. In the case where VT1 and VT2 are the same, &VT1 >> VT2 can simply be written as &VT1. Note that the variable x may be replaced with other forms of left-values.

As an example, an implementation of the factorial function is given as follows that makes use of call-by-reference:


fun fact (x: int): int = let
  fun loop {l:addr} (x: int, res: &int): void =
    if x > 0 then (res := res * x; loop (x-1, res)) else ()
  var res: int = 1
in
  loop (x, res); res
end // end of [fact]

Note that if the line for introducing the variable res in the implementation is replaced with the following one:

  val res: int = 1 // [res] is now a value, not a variable!

then a type error should occur as res is no longer a left-value when it is passed as an argument to loop.

The code used for illustration is available here.

Pointers

A significant achievement of ATS lies in its support for safe and flexible use of pointers. This is done in a programming paradigm that is often referred to as programming with theorem proving.

Generally speaking, stateful views are linear propositions for describing memory layouts. For instance, given a type T and a memory location L, T@L is a primitive (stateful) view stating that a value of type T is stored at L. We can also form compound views in terms of primitive views. For instance, given types T_1 and T_2 and an address L, we can form a view (T_1@L, T_2@L+sizeof(T_1)) to mean that a value of type T_1 and another value of type T_2 are stored at addresses L and L+sizeof(T_1), respectively, where sizeof(T_1) is the size of a value of type T. Given a term of some view V, we often say that the term proves the view V and thus refer to the term as a proof (of V).

There are two built-in functions ptr_get_t and ptr_set_t that are given the following types:


fun{a:t@ype} ptr_get_t : {l:addr} (a @ l >> a @ l | ptr l) -<> a
fun{a:t@ype} ptr_set_t : {l:addr} (a? @ l >> a @ l | ptr l, a) -<> void

These two functions are used to read from and write to a given pointer. Clearly, the type of ptr_get_t indicates that ptr_get_t requires a proof of view T@L for some type T when reading from a pointer L. This requirement disallows reading from a dangling pointer as such a proof cannot be found for any dangling pointers. Similarly, the type of ptr_set_t means that writing to a dangling pointer is also disallowed. When reading from a pointer L with a proof of view T@L, ptr_get_t consumes the proof and then generates a proof of the same view (and stores it in the same variable where the orginal proof was stored). On the other hand, when writing a value of type T to a pointer with a proof of view T?@L, where the type T? is for possibly uninitialized values of type T, ptr_set_t consumes the proof and then generates a proof of the view T@L, which simply attests to the fact that a value of type T is stored at L after the writing is done.

As an example, we implement a (template) function swap1 as follows that swaps the contents stored at two memory locations:

fn{a:viewt@ype} swap1 {l1,l2:addr}
  (pf1: !a @ l1, pf2: !a @ l2 | p1: ptr l1, p2: ptr l2): void =
  let
    val tmp = ptr_get_vt<a> (pf1 | p1)
  in
    ptr_set_vt<a> (pf1 | p1, ptr_get_vt (pf2 | p2));
    ptr_set_vt<a> (pf2 | p2, tmp)
  end

Note that the (linear) proofs are manipulated explicitly in the implementation of swap1. This can be burdensome in practice. In ATS/Anairiats, (certain) proof manipulation can also be done implicitly by the typechecker. For instance, the following code implements a (template) function swap2 that is equivalent to swap1:

fn{a:viewt@ype} swap2 {l1,l2:addr}
  (pf1: !a @ l1, pf2: !a @ l2 | p1: ptr l1, p2: ptr l2): void =
  let
    val tmp = !p1
  in
    !p1 := !p2; !p2 := tmp
  end

There is no explicit manipulation of (linear) proofs in the implementation of swap2.

The code used for illustration is available here.

References

A reference is essentially a persistent array of size 1. Given a viewtype VT, the type for references to values of viewtype VT is ref(VT). The type constructor ref is abstract in ATS, though it can be defined as follows:

typedef ref (a: viewt@ype) = [l:addr] (vbox (a@l) | ptr l)

The interfaces for various functions on references can be found in the file prelude/SATS/reference.sats.

Reference Creation There is a function template ref_make_elt of the following type:

fun{a:viewt@ype} ref_make_elt : a -<> ref a

Given a value v of some viewtype VT, ref_make_elt<VT> (v) creates a reference of type ref(VT). For instance, the following code creates some references:

val r_int = ref_make_elt<int> (0)
val r_double = ref_make_elt<double> (0.0)
val r_fun = ref_make_elt<int->int> (lam x => x+1)

There is also a function ref_make_view_ptr of the following type for turning pointers into references:

fun ref_make_view_ptr : {a:viewt@ype} {l:addr} (vbox(a @ l) | ptr l) -> ref a

Note that ref_make_view_ptr is a polymorphic function; it is not a function template.

Reference Read and Write Given a reference r, the syntax for reading from r is !r and the syntax for writing (a value v) to r is !r := v. Note that in both SML and Objective Caml, the syntax for writing (a value v) to a reference r is r := v (instead of !r := v).

There are also two function templates ref_get_elt and ref_set_elt of the following types:

fun{a:t@type} ref_get_elt : ref a -<!ref> a
fun{a:t@type} ref_set_elt : (ref a, a) -<!ref> void

which can used to read from and write to a given reference, respectively. As an example, we give a simple implementation of counters as follows:

// [int64] is the type for 64-bit integers in ATS
typedef counter = '{ // the type for counter objects
  get= () -<cloref1> int64 // get the value of the counter
, set= int64 -<cloref1> void // set the value of the counter
, inc= () -<cloref1> void // increase the value of the counter
, dec= () -<cloref1> void // decrease the value of the counter
}

fn counter_make (): counter = let
  // [int64_of_int] coerce an integer into a 64-bit integer
  val r = ref_make_elt<int64> (int64_of_int 0)
in '{ // record creation
  get= lam () => !r // read from [r]
, set= lam (x) => !r := x // write to [r]
, inc= lam () => !r := succ !r
, dec= lam () => !r := pred !r
} end // end of [counter_make]

The code used for illustration is available here.

Persistent Arrays and Matrices

Though arrays are widely used in practice, there are many difficult issues involving arrays that seem to have not been adequately addressed in type theory. In particular, programming can become rather tricky when arrays are used to store resources (e.g., pointers to allocated memory).

In ATS, there are two forms of arrays: linear arrays and persistent arrays. On one hand, a linear array cannot be shared but a persistent array can. On the other hand, a linear array can be explicitly freed but a persistent one cannot. However, a persistent array may be freed by GC. Although persistent arrays are built in terms of linear arrays, it seems easier to explain persistent arrays as such arrays are available in probably any programming language that may be considered practical.

Persistent Arrays

The interfaces for various functions on persistent arrays can be found in the file prelude/SATS/array.sats. Given a viewtype VT and an integer I, the type array(VT, I) is for persistent arrays of size I in which each element is of viewtype VT. Internally, a value of the type array(VT, I) is just a pointer to a piece of consecutive memory holding I elements of the type VT. Note that there is no size information directly attached to such an array, that is, no function of the following type is available:

fun{a:viewt@ype} array_size {n:nat} (A: array (a, n)): int n

If the size of such an array is needed, it may be stored somewhere explicitly by the programmer.

Array Creation There are several approaches to creating persistent arrays in ATS. For instance, the following code in ATS creates an array of integers:

val digits = array $arrsz{int}(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

The type assigned to the variable digits is array(int, 10). We can certainly be more precise by stating that digits is indeed an array of digits:

typedef digit = [i:nat | i < 10] int (i)

// digits: array (digit, 10)
val digits = array $arrsz{digit}(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

Array Subscripting The following two function templates array_get_elt_at and array_set_elt_at are for accessing and updating a persistent array, respectively:

fun{a:t@ype} array_get_elt_at {n:nat} (A: array (a, n), i: sizeLt n):<!ref> a
fun{a:t@ype} array_set_elt_at {n:nat} (A: array (a, n), i: sizeLt n, x: a):<!ref> void

The symbol [] is overloaded with both array_get_elt_at and array_set_elt_at:

overload [] with array_get_elt_at
overload [] with array_set_elt_at

As an example, we implement as follows a function that squares each element in a given array of doubles:

fn array_square {n:nat}
  (A: array (double, n), sz: int n): void = loop (0) where {
  fun loop {i:nat | i <= n} .<n-i>. (i: int i):<cloptr1> void =
    if i < sz then
      let val x = A[i] in A[i] := x * x; loop (i+1) end
    else ()
} // end of [array_square]

In this code, A[i] and A[i] := x * x are shorthands for array_get_elt_at(A, i) and array_set_elt_at(A, i, x * x), respectively.

Persistent Matrices

A matrix is an array of 2 dimensions. Note that the 2 dimensions of a matrix are not necessarily the same. If they are the same, then the matrix is often referred to as a square matrix.

The interfaces for various functions on matrices can be found in the file prelude/SATS/matrix.sats . Given a viewtype VT and an integer I and another integer J, the type matrix(VT, I, J) is for persistent matrices of dimensions I (row) and J (column) in which each element is of type VT. Internally, a value of the type matrix(VT, I, J) is just a pointer to a piece of consecutive memory holding I*J elements of the type VT, where the elements are placed in the row-major format.

Matrix Creation The approaches to creating matrices resemble those to creating arrays. For instance, the following code creates a square matrix of dimension 10 by 10:

val mat_10_10 =
  matrix (10, 10) $arrsz {Int} (
   0,  1,  2,  3,  4,  5,  6,  7,  8,  9
, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29
, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39
, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49
, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59
, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69
, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79
, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89
, 90, 91, 92, 93, 94, 99, 96, 97, 98, 99
) // end of [val]

Matrix Subscripting The following two function templates matrix_get_elt_at and matrix_set_elt_at are for accessing and updating a persistent matrix, respectively:

fun{a:t@ype} matrix_get_elt_at {m,n:nat} (A: matrix (a,m,n), i: sizeLt m, n: int n, j: sizeLt n): <!ref> a
fun{a:t@ype} matrix_set_elt_at {m,n:nat} (A: matrix (a,m,n), i: sizeLt m, n: int n, j: sizeLt n, x: a): <!ref> void

Note that the number of columns in a matrix as well as the row and column indices are needed in order to perform matrix subscripting. As an example, the following code implements a function template that transposes a given square matrix in-situ:

fn{a:t@ype} matrix_transpose {n:nat}
  (A: matrix (a, n, n), n: int n): void = let
  fn get {i,j:nat | i < n; j < n}
   (A: matrix (a, n, n), i: int i, j: int j):<cloref1> a =
   matrix_get_elt_at__intsz (A, i, n, j)

  fn set {i,j:nat | i < n; j < n}
   (A: matrix (a, n, n), i: int i, j: int j, x: a):<cloref1> void =
   matrix_set_elt_at__intsz (A, i, n, j, x)

  overload [] with get; overload [] with set

  // [fn*] indicates a request for tail-call optimization
  fn* loop1 {i:nat | i <= n} .< n-i+1, 0 >.
    (i: int i):<cloptr1> void = begin
    if i < n then loop2 (i, 0) else ()
  end

  and loop2 {i,j:nat | i < n; j <= i} .< n-i, i-j+1 >.
    (i: int i, j: int j):<cloptr1> void = begin
    if j < i then
      let val x = A[i,j] in A[i,j] := A[j,i]; A[j,i] := x; loop2 (i, j+1) end
    else begin
      loop1 (i+1)
    end
  end // end of [loop2]
in
  loop1 0
end // end of [matrix_transpose]

fn{a:t@ype} matrix_transpose {n:nat}
  (M: matrix (a, n, n), n: size_t n): void = let
  prval pf = unit_v ()
  val () = matrix_iforeach_clo<a> {unit_v}
    (pf | M, !p_f, n, n) where { 
    var !p_f = @lam
      (pf: !unit_v | i: sizeLt n, j: sizeLt n, _: a): void =<clo,!ref>
      if i > j then let
        val x = M[i, n, j] in M[i, n, j] := M[j, n, i]; M[j, n, i] := x
      end // end of [if]
    // end of [var]
  } // end of [val]
  prval unit_v () = pf
in
  // empty
end // end of [matrix_transpose]

The code used for illustration is available here.

Linear Arrays

Given a viewtype VT and integer I, the special static term @[VT][I] is a viewtype for an array of elements of the type VT (and thus the size of @[VT][I] is I times the size of VT). If VT happens to be a type (instead of viewtype), then @[VT][I] is also a type. If a linear proof of the view (@[VT][I]) @ L is available, then we say that a linear array of size I is stored at the location L in which each element is of the type VT. A view of the form @[VT][I] @ L is often referred to as an array view.

The interfaces for various functions on linear arrays can be found in the file prelude/SATS/array.sats. In particular, we have the following functions for manipulating array views:

viewdef array_v (a:viewt@ype, n:int, l: addr) = @[a][n] @ l

praxi array_v_nil :
  {a:viewt@ype} {l:addr} () -<prf> array_v (a, 0, l)

praxi array_v_unnil :
  {a:viewt@ype} {l:addr} array_v (a, 0, l) -<prf> void

praxi array_v_cons : {a:viewt@ype} {n:nat} {l:addr}
  (a @ l, array_v (a, n, l+sizeof a)) -<prf> array_v (a, n+1, l)

praxi array_v_uncons : {a:viewt@ype} {n:int | n > 0} {l:addr}
  array_v (a, n, l) -<prf> (a @ l, array_v (a, n-1, l+sizeof a))

Linear Array Creation

The following function template array_ptr_alloc can be called to allocate memory for storing an array of elements:

fun{a:viewt@ype}
  array_ptr_alloc {n:nat} (asz: int n):<>
    [l:addr | l <> null] (free_gc_v (a, n, l), array_v (a?, n, l) | ptr l)

The view constructor free_gc_v is abstract, and a proof of the view free_gc_v (a, n, l) can be thought of as a certificate that must be provided when the allocated array is to be freed. The array view array_v (a?, n, l) is for an uninitialized array of size n in which each element should be of the type a if initialized.

Linear Array Initialization

After an uninitialized array is created, it needs to be initialized for use. There are various functions for initializing linear arrays in prelude/SATS/array.sats. As an example, we implement as follows a function that initializes an array with a list of elements:

fn{a:t@ype}
  array_ptr_initialize_list {n:nat}
  (A: &(@[a?][n]) >> @[a][n], xs: list (a, n))
  :<> void = loop (view@ A | &A, xs) where {
  fun loop {n: nat} {l:addr} .<n>. ( // [loop] is tail-recursive!
      pf_arr: !array_v (a?, n, l) >> array_v (a, n, l) | p_arr: ptr l, xs: list (a, n)
    ) :<> void = case+ xs of
    | list_cons (x, xs) => let
        prval (pf1_at, pf2_arr) = array_v_uncons {a?} (pf_arr)
        val () = !p_arr := x
        val () = loop (pf2_arr | p_arr + sizeof<a>, xs)
      in
        pf_arr := array_v_cons {a} (pf1_at, pf2_arr)
      end // end of [list_cons]
    | list_nil () => let
        prval () = array_v_unnil {a?} (pf_arr) in pf_arr := array_v_nil {a} ()
      end // end of [list_nil]
  // end of [loop]
} // end of [array_ptr_initialize_list]

Note that the first argument of the function array_ptr_initialize_list is passed as a reference (call-by-reference). We use the expression view@ A for some particular left-value in which a proof of the view associated with the variable A is stored, and the expression &A for the address of the variable A.

Linear Array Subscription

Let A be an array of the type @[T][I] for some type (but not viewtype) T and integer I. For each natural number less than I, the expression A.[i], which can be used as a left-value, refers to the ith elemement of the array (where counting starts from 0). A function template array_ptr_swap is implemented as follows for swapping two elements in a given array:

fn{a:t@ype} array_ptr_swap
  {n:nat} {i,j:nat | i < n; j < n}
  (A: &(@[a][n]), i: int i, j: int j):<> void = begin
  let val tmp = A.[i] in A.[i] := A.[j]; A.[j] := tmp end
end // endof [array_ptr_swap]

The following two function templates are also available:

fun{a:t@ype} array_ptr_get_elt_at
  {n:nat} (A: &(@[a][n]), i: natLt n):<> a

fun{a:t@ype} array_ptr_set_elt_at
  {n:nat} (A: &(@[a][n]), i: natLt n, x:a):<> void

overload [] with array_ptr_get_elt_at
overload [] with array_ptr_set_elt_at

Because of the overloading of [] with both array_ptr_get_elt_at and array_ptr_set_elt_at, the function array_ptr_swap can also be implemented as follows:

fn{a:t@ype} array_ptr_swap
  {n:nat} {i,j:nat | i < n; j < n}
  (A: &(@[a][n]), i: int i, j: int j):<> void = begin
  let val tmp = A[i] in A[i] := A[j]; A[j] := tmp end
end // endof [array_ptr_swap]

The use of [i] for .[i] possibly makes the code look more conventional.

For accessing an array of linear elements, the following function template array_ptr_takeout can be used:

fun{a:viewt@ype}
  array_ptr_takeout {n,i:nat | i < n} {l0:addr} (
    pf: array_v (a, n, l0) | base: ptr l0, offset: int i
  ) :<> [l:addr] (
      a @ l
    , a @ l -<lin,prf> array_v (a, n, l0)
    | ptr l
    )
// end of [array_ptr_takeout]

Linear Array Destruction

The following function is for freeing the memory occupied by a linear array:

fun array_ptr_free {a:viewt@ype} {n:int} {l:addr}
  (pf_gc: free_gc_v (a, n, l), pf_arr: array_v (a?, n, l) | _: ptr l):<> void

Note that in order to free an array of I elements allocated at an address L, a proof of the view free_gc_v (VT, I, L) must be provided, where the viewtype VT is employed for specifying the size of each element in the array.

An Example

Linear arrays are used extensively in this implementation of FFT: fft.dats

The code used for illustration is available here.

Strings and String Buffers

The representation of a string in ATS is exactly the same as in C, allowing most functions declared in the header file <string.h> to be imported into ATS directly. A variety of string functions are declared in prelude/SATS/string.sats.

Given an integer I, a string of length I is a pointer to some memory location where a seqence of bytes are stored: the first I bytes in this sequence are non-null bytes, and the next one is the null byte, and then the rest of bytes are unspecified. We use string both as a type and a type constructor of the sort int -> type. The type string is for strings of unspecified length while the type string (I) is for strings whose length equals I, and the following casting function is relating the former to the latter:

castfn string1_of_string (s: string): [n:nat] string n

Common Functions for Processing Strings

The following function string_is_at_end is of great use for processing strings sequentially:

fun string_is_at_end
  {n,i:nat | i <= n} (s: string n, i: int i):<> bool (i == n)
  = "atspre_string_is_at_end"

Given a string of length n and a natural number i bounded by n, the function string_is_at_end tests whether i equals n. Essentially, it checks whether the ith character in the string, where counting starts from 0, is the null byte.

As an example, the following implementation of the length function for strings makes use of the function string_is_at_end:

fn string_length {n:nat}
  (s: string n):<> size_t n = loop (s, 0) where {
  fun loop {i:nat | i <= n} .<n-i>.
    (s: string n, i: size_t i):<> size_t n =
    if string_is_at_end (s, i) then i else loop (s, i+1)
  // end of [loop]
} // end of [string_length]

The following two functions string_get_char_at and string_set_char_at are for accessing and updating charaters stored in a string:

#define NUL '\000'

fun string_get_char_at {n:nat}
  (s: string n, i: natLt n):<> [c:char | c <> NUL] char c

fun string_set_char_at {n:nat} {c:char | c <> NUL}
  (s: string n, i: natLt n, c: char c):<!ref> void

overload [] with string_get_char_at
overload [] with string_set_char_at

Note that only non-null characters can be stored in a string.

String Buffers

A string buffer in ATS is just a linear string. Given two integers M and N satisfying M <= N, the type strbuf (M, N) is for a sequence of M bytes such that the first N bytes in this sequence are not null and the next one (following the first N bytes) is null. The following declared functions for string buffers are just the variant of the previously introduced functions for strings:

fun strbuf_is_at_end
  {m,n,i:nat | i <= n} (sb: &strbuf (m, n), i: int i):<> bool (i == n)

fun strbuf_get_char_at {m,n:nat}
  (sb: &strbuf (m, n), i: natLt n):<> [c:char | c <> NUL] char c

fun strbuf_set_char_at {m,n:nat} {c: char | c <> NUL}
  (sb: &strbuf (m, n), i: natLt n, c: char c):<> void

As an example, the following program implements a function that turns each lowercase letter in a string buffer into the corresponding uppercase one:

fn strbuf_toupper {m,n:nat}
  (s: &strbuf (m, n)):<> void = loop (s, 0) where {
  extern fun toupper (c: c1har):<> c1har = "atspre_char_toupper"
  fun loop {i:nat | i <= n} .<n-i>. (s: &strbuf (m, n), i: size_t i):<> void =
    if strbuf_is_at_end (s, i) then ()
    else let
      val () = s[i] := toupper (s[i]) in loop (s, i+1)
    end // end of [if]
} // end of [strbuf_toupper]

Note that c1har is just a shorthand for [c:char | c <> NUL] char c, which is a type for all non-null characters. Let us see another example. In C, the length of a string can be computed as follows:

int strlen (char *s) {
  char *p = s ;
  while (*p != '\000') ++p ;
  return (p - s) ;
}

We give as follows an implementation for computing the length of a string buffer that corresponds to the above C code:

extern fun strlen {m,n:nat} (s: &strbuf (m, n)):<> size_t n

implement strlen {m,n} (s) = let
  stadef bsz = sizeof(byte)
  macdef bsz = sizeof<byte>
  fun loop {m,n:nat} {l:addr} {ofs:int} .<m>. (
      pf: !strbuf (m, n) @ l
    , pf_mul: MUL (n, bsz, ofs)
    | p: ptr l
    ) :<> ptr (l + ofs) = let
    prval (pf1, pf2) = strbuf_v_uncons (pf)
    val c = !p
  in
    if (c = NUL) then let
      prval strbufopt_v_none pf2 = pf2
      prval MULbas () = pf_mul
    in
      pf := strbuf_v_null (pf1, pf2); p
    end else let
      prval () = eqsize_byte_char ()
      prval strbufopt_v_some pf2 = pf2
      prval pf1_mul = mul_add_const {~1} (pf_mul)
      val p_end = loop (pf2, pf1_mul | p+bsz)
    in
      pf := strbuf_v_cons (pf1, pf2); p_end
    end // end of [if]
  end // end of [loop]
  val p_beg = &s
  prval pf_mul = mul_istot {n,bsz} ()
  val p_end = loop (view@ s, pf_mul | p_beg)
  prval () = eqsize_byte_one () where {
    extern praxi eqsize_byte_one (): [bsz == 1] void
  } // end of [val]
  prval () = mul_elim {n,1} (pf_mul)
in
  size1_of_ptrdiff1 (p_end - p_beg)
end // end of [strlen]

Please see prelude/SATS/string.sats for various dataview constructors and proof functions in this implementation. In general, it can be quite involved in ATS to handle strings in a style that is often employed in C code.

String Creation

There are various function for creating strings that are delcared in prelude/SATS/string.sats. For instance, the following function string_make_substring creates a string by copying a segment of another string:

fun string_make_substring
  {n:int} {st,ln:nat | st + ln <= n}
  (s: string n, st: int st, ln: int ln):<> string ln

In order to implement a string creation function, we often need a function such as the following one for initializing a string buffer:

fun strbuf_initialize_cloptr {m,n:nat | m > n} (
    pf_buf: !b0ytes m @ l >> strbuf (m, n) @ l
  | p: ptr l, n: int n, f: !natLt n - c1har
  ) :<> void

An implementation of strbuf_initialize_cloptr is given below:

implement strbuf_initialize_cloptr {m,n}
  (pf_buf | p_buf, n, f) = loop (pf_buf | p_buf, f, 0) where {
  fun loop {i:nat | i <= n} {l:addr} .<n-i>. (
      pf: !b0ytes (m-i) @ l >> strbuf (m-i, n-i) @ l
    | p: ptr l, f: !sizeLt n -<cloptr> c1har, i: size_t i)
    :<cloref> void = let
    prval () = eqsize_byte_char ()
    prval (pf1, pf2) = array_v_uncons {byte?} (pf)
    prval pf1 = char_v_of_b0yte_v (pf1)
  in
    if i < n then let
      val () = !p := f (i)
      val () = loop (pf2 | p + sizeof<byte>, f, i + 1)
      prval () = pf := strbuf_v_cons (pf1, pf2)
    in
      // empty
    end else let
      val () = !p := NUL
      prval () = pf := strbuf_v_null (pf1, pf2)
    in
      // empty
    end // end of [if]
  end // end of [loop]
} // end of [val]

With strbuf_initialize_cloptr, the string creation function string_make_substring can be readily implemented as follows:

implement string_make_substring (str, st, ln) = let
  val (pf_gc, pf_buf | p_buf) = malloc_gc (ln + 1)
  prval () = free_gc_elim (pf_gc) // give the certificate to GC
  val () = strbuf_initialize_cloptr (pf_buf | p_buf, ln, lam (i) => str[st + i])
in
  string1_of_strbuf (pf_buf | p_buf)
end // end of [string_make_substring]

Note that a linear closure (cloptr) representing lam (i) => str[st + i] is formed for each call to strbuf_initialize_cloptr and this closure is freed once the call returns. The string created by a call to string_make_substring is persistent, and it can only be freed by GC.

The code used for illustration is available here.

Datatypes

The feature of datatypes in ATS is directly adopted from ML. In addition to what is available in ML, we can also form in ATS dependent datatypes and guarded recursive datatypes (GRDTs), also known as generalized algebraic datatypes (GADTs). There is also a form of linear datatypes (dataviewtypes) available in ATS, which is to be explained elsewhere.

A Simple Datatype

In the first example given below, a datatype type intlst0 is declared to represent lists of integers and a function length_intlst0 for computing the length of a given list is implemented:

datatype intlst0 = // simple datatype
  | INTLST0nil // the first bar (|) is optional
  | INTLST0cons of (int, intlst0)

fun length_intlst0 (xs: intlst0): int =
  case+ xs of // [case+] demands exhaustive pattern matching
  // the bar (|) in the first clause is optional
  | INTLST0cons (_, xs) => 1 + length_intlst0 (xs)
  | INTLST0nil () => 0
// end of [length_intlst0]

Please note that any valid identifiers for variables can also be used as identifiers for constructors. The pattern for a constructor C with no arguments is C( ). If we write C instead of C( ), then C is assumed to be a variable pattern, which matches any value.

A Dependent Datatype

In ATS, we can form a dependent datatype intlst1 as follows to represent integer lists:

datatype intlst1 (int) = // dependent datatype
  | INTLST1nil (0) // the first bar (|) is optional
  | {n:nat} INTLST1cons (n+1) of (int, intlst1 n)
// end of [intlst1]

We may see intlst1 as a refinement of intlst0. Given an integer n, intlst1(n) is a type for integer lists of length n. The syntax for declaring intlst1 introduces two constructors of the following types:

INTLST1nil : intlst1 0
INTLST1cons : {n:nat} (int, intlst1 n) -> intlst (n+1)

So INTLST1nil is a list of length 0 and INTLST1cons constructs a list of length n+1 if given an integer and an integer list of length n.

The function for computing the length of a given integer list can now be implemented as follows:

fun length_intlst1 {n:nat} (xs: intlst1 n): int n =
  case+ xs of // the bar (|) in the first clause is optional
  // the bar (|) in the first clause is optional
  | INTLST1cons (_, xs) => 1 + length_intlst1 (xs)
  | INTLST1nil () => 0
// end of [length_intlst1]

Note that length_intlst1 is assigned the following type:

{n:nat} intlst1 n -> int n

which clearly indicates that the returned value of a call to length_intlst1 is the length of the argument of the call.

As another example, the following code implements the list subscripting function:

// the index is a natural number less than the size of the indexed length
fun nth_intlst1 {n,i:int | 0 <= i; i < n} (xs: intlst1 n, i: int i): int =
  // [val+] demands exhaustive pattern matching
  let val+ INTLST1cons (x, xs) = xs in
    if i > 0 then nth_intlst1 (xs, i-1) else x
  end // end of [let]
// end of [nth_intlst1]

Note that the pattern matching involved in the value declaration val+ INTLST1cons (x, xs) = xs is guaranteed to be exhaustive as n is positive. As a consequence, there is no tag checking in the C code generated from the compilation of nth_intlst. This simple example demonstrates that safety can actually enhance efficiency.

A Polymorphic Datatype

We can declare a datatype list as follows for representing polymorphic lists:

// [t@ype] is a sort for types of unrestricted size
datatype list (a:t@ype+, int) = // polymorphic datatype
  | nil (a, 0)
  | {n:nat} cons (a, n+1) of (a, list (a, n))

The syntax used for declaring list introduces two constructors of the following types:

nil : {a:t@ype} list (a, 0)
cons : {a:t@ype} {n:nat} (a, list (a, n)) -> list (a, n+1)

We use a:t@ype+ (instead of a:t@ype) to indicate that the type constructor list is covariant at this argument, that is, list (T1, n) is considered to be a subtype of list (T2, n) whenever T1 is a subtype of T2.

The function for appending two polymorphic lists can be implemented as follows:

fun{a:t@ype}
append_list {m,n:nat}
  (xs: list (a, m), ys: list (a, n)): list (a, m+n) =
  case+ xs of
  | cons (x, xs) => cons (x, append_list (xs, ys))
  | nil () => ys
// end of [append_list]

This function definition is a template. It can be typechecked but cannot be compiled until the type variable a is instantiated with a type of known size. The feature of templates in ATS is to be explained elsewhere.

As another example, the following code implements a (template) function that zips together two given lists of the same length:

fun{a1,a2:t@ype}
zip_list {n:nat}
  (xs1: list (a1, n), xs2: list (a2, n)): list ('(a1, a2), n) =
  case+ (xs1, xs2) of
  | (cons (x1, xs1), cons (x2, xs2)) => cons ('(x1, x2), zip_list (xs1, xs2))
  | (nil (), nil ()) => nil ()
  | (_, _) =/=>> () // none of these cases can occur
// end of [zip_list]

Note that the last clause in the definition of zip_list is presented only for the purpose of illustration, and it can simply be omitted.

The code used for illustration is available here.

Dataprops

A prop is similar to a type. If a prop is assigned to a dynamic term, then the term is guaranteed to be pure and total, that is, the evaluation of the term generates no effects and always terminates. We refer to dynamic terms classified by props as proof terms, or simply proofs. Dataprops are often declared for encoding recursively defined relations. For instance, multiplication on integers can be defined as follows in terms of integer addition:


0 n*	=	0
(m+1) n*	=	m n + n*	; if m >= 0
m n*	=	-(-m n)*	; if m < 0

Let MUL(m, n, p) be a relation on integers such that MUL(m, n, p) holds if and only if m * n = p. Then this relation can be encoded by the followng dataprop:

dataprop MUL (int, int, int) =
  | {n:int} MULbas (0, n, 0)
  | {m,n,p:int | m >= 0} MULind (m+1, n, p+n) of MUL (m, n, p)
  | {m,n,p:int | m > 0} MULneg (~m, n, ~p) of MUL (m, n, p)
// end of [MUL]

Proposition (Totality) For each pair of integers m and n, there exists an integer p such that m * n = p holds.

This proposition can be encoded as the following prop in ATS:

{m,n:int} () -< prf > [p:int] MUL (m, n, p)

which is assigned to the function MULprop_total defined below:

prfun MULprop_total
  {m,n:int}
  .< max(2*m,~2*m+1) >.
  (): [p:int] MUL (m, n, p) =
  sif m > 0 then MULind (MULprop_total {m-1,n} ())
  else sif m < 0 then MULneg (MULprop_total {~m,n} ())
  else MULbas ()
// end of [MULprop_total]

Note that we use sif for constructing a static conditional in which the condition is a static proposition, that is, a static term of sort bool. The conditions for the two occurrences of sif are m > 0 and m < 0, which are both static propositions.

Given that the prop {m,n:int} () -< prf > [p:int] MUL (m, n, p) is inhabited, we know that the proposition it encodes must hold.

Proposition (Uniqueness) Given integers m,n,p1,p2 such that m * n = p1 and m * n = p2, then p1 = p2 holds.

This proposition can be encoded as the following prop in ATS:

{m,n,p1,p2:int} (MUL (m, n, p1), MUL (m, n, p2)) -< prf > [p1 == p2] void

which is assigned to the function MULprop_unique defined below:

prfun MULprop_unique
  {m,n,p1,p2:int} .< max(2*m, ~2*m+1) >.
  (pf1: MUL (m, n, p1), pf2: MUL (m, n, p2)): [p1 == p2] void =
  case+ (pf1, pf2) of
    | (MULbas (), MULbas ()) => ()
    | (MULind pf1, MULind pf2) => MULprop_unique (pf1, pf2)
    | (MULneg pf1, MULneg pf2) => MULprop_unique (pf1, pf2)
// end of [MULprop_unique]

Therefore, the prop is inhabited and thus the proposition it encodes must hold.

The distributivity (over addition), commutativity and associativity of multiplication can all be esstablished similarly.

The code used for illustration is available here.

Dataviews

A view is a linear prop, where the meaning of the word linear is in the sense of linear logic. Dataviews are often declared to encode recursively defined relations on linear resources. As an example, we declare a dataview as follows to describe and then reason about the probably most common data structure in use: arrays.

dataview array_v (a: viewt@ype+, int, addr) =
  | {l:addr} array_v_none (a, 0, l)
  | {n:nat} {l:addr} array_v_some (a, n+1, l) of (a @ l, array_v (a, n, l+sizeof a))

The syntax introduces a view constructor array_v and two proof constructors array_v_none and array_v_some associated with array_v, which are assigned the following props:

array_v_none : {a:viewt@ype} {l:addr} () -> array_v (a, 0, l)
array_v_some : {a:viewt@ype} {n:nat} {l:addr} (a @ l, array_v (a, n, l+sizeof a)) -> array_v_some (a, n+1, l)

Given a viewtype VT, an integer I and an address L, the view array_v(VT, I, L) indicates that an array of size I is located at the address L and each element in the array is of viewtype VT.

The prop assigned to array_v_none indicates that we can assume an array of size 0 located at any given address, while the prop assigned to array_v_some states that an array of I+1 elements of viewtype VT is located at some address L if an element of viewtype VT is located at L and also an array of I elements of viewtype VT is located at L+sizeof(VT), where sizeof(VT) is the size of VT, or more precisely, the size of an element of viewtype VT.

Let us establish some propoerties on arrays. The following proof function array_v_split shows that an array of size N can be decomposed into two adjacently located arrays of sizes I and N - I, respectively, where I can be any natural number satisfying I <= N.

(*
 * array_v_split
 *   {a:viewt@ype} {n,i:nat | i <= n} {l:addr} {off: int} .<i>.
 *   (pf_arr: array_v (a, n, l), pf_mul: MUL (i, sizeof a, off))
 *   : (array_v (a, i, l), array_v (a, n-i, l+off))
 *)

prfun array_v_split
  {a:viewt@ype} {n,i:nat | i <= n} {l:addr} {off: int} .<i>.
  (pf_arr: array_v (a, n, l), pf_mul: MUL (i, sizeof a, off))
  : (array_v (a, i, l), array_v (a, n-i, l+off)) =
  sif i > 0 then let
    prval array_v_some (pf_at, pf_arr) = pf_arr
    prval MULind pf_mul = pf_mul
    prval (pf_arr1, pf_arr2) = array_v_split (pf_arr, pf_mul)
  in
    (array_v_some (pf_at, pf_arr1), pf_arr2)
  end else let
    prval MULbas () = pf_mul
  in
    (array_v_none (), pf_arr)
  end

The next proof function array_v_unsplit is the inverse of array_v_split: It states that two adjacently located arrays of sizes N_1 and N_2 can be combined into a single array of sizes N_1+N_2.

(*
 * array_v_unsplit
 *   {a:viewt@ype} {n1,n2:nat} {l:addr} {off: int} .<n1>.
 *   (pf_arr1: array_v (a, n1, l),
 *    pf_arr2: array_v (a, n2, l+off),
 *    pf_mul: MUL (n1, sizeof a, off))
 *   : array_v (a, n1+n2, l)
 *)

prfun array_v_unsplit
  {a:viewt@ype} {n1,n2:nat} {l:addr} {off: int} .<n1>.
  (pf_arr1: array_v (a, n1, l), pf_arr2: array_v (a, n2, l+off), pf_mul: MUL (n1, sizeof a, off))
  : array_v (a, n1+n2, l) =
  sif n1 > 0 then let
    prval array_v_some (pf_at, pf_arr1) = pf_arr1
    prval MULind pf_mul = pf_mul
    prval pf_arr = array_v_unsplit (pf_arr1, pf_arr2, pf_mul)
  in
    array_v_some (pf_at, pf_arr)
  end else let
    prval array_v_none () = pf_arr1
    prval MULbas () = pf_mul
  in
    pf_arr2
  end

The following proof function states that given an array of size N and a natural number I satisfying I < N, the element indexed by I can be taken out of the array, leaving the array with a hole inside.

(*
 * array_v_takeout
 *   {n,i:nat | i < n} {l:addr} {off: int}
 *   (pf_arr: array_v (a, n, l), pf_mul: MUL (i, sizeof a, off))
 *   : (a @ (l+off), a @ (l+off) -<lin> array_v (a, n, l))
 *)

prfun array_v_takeout
  {a:viewt@ype} {n,i:nat | i < n} {l:addr} {off: int} .<i>.
  (pf_arr: array_v (a, n, l), pf_mul: MUL (i, sizeof a, off))
  : (a @ (l+off), a @ (l+off) -<lin> array_v (a, n, l)) =
  sif i > 0 then let
    prval array_v_some (pf_at, pf_arr) = pf_arr
    prval MULind pf_mul = pf_mul
    prval (pf_out, pf_rst) = array_v_takeout (pf_arr, pf_mul)
  in
    (pf_out, llam pf_out => array_v_some (pf_at, pf_rst pf_out))
  end else let
    prval MULbas () = pf_mul
    prval array_v_some (pf_at, pf_arr)  = pf_arr
  in
    (pf_at, llam pf_at => array_v_some (pf_at, pf_arr))
  end

The following code shows how the proof function array_v_takeout can be used to implement functions for accessing and updating arrays.

extern // a template function for read through a pointer
fun{a:t@ype} ptr_get {l:addr} (pf: a @ l | p: ptr l): (a @ l | a)

extern // a template function for write through a pointer
fun{a:t@ype} ptr_set {l:addr} (pf: a @ l | p: ptr l, x: a): (a @ l | void)

fn{a:t@ype} // a template function for array read
  array_ptr_get_at {n,i:nat | i < n} {l:addr}
  (pf_arr: array_v (a, n, l) | p: ptr l, i: size_t i)
  : (array_v (a, n, l) | a) =
  let
    val (pf_mul | off) = mul2_size1_size1 (i, sizeof<a>)
    prval (pf_elt, pf_rst) = array_v_takeout {a} (pf_arr, pf_mul)
    val (pf_elt | x) = ptr_get<a> (pf_elt | p + off)
  in
    (pf_rst pf_elt | x)
  end

fn{a:t@ype} // a template function for array write
  array_ptr_set_at {n,i:nat | i < n} {l:addr}
  (pf_arr: array_v (a, n, l) | p: ptr l, i: size_t i, x: a)
  : (array_v (a, n, l) | void) =
  let
    val (pf_mul | off) = mul2_size1_size1 (i, sizeof<a>)
    prval (pf_elt, pf_rst) = array_v_takeout {a} (pf_arr, pf_mul)
    val (pf_elt | ()) = ptr_set<a> (pf_elt | p + off, x)
  in
    (pf_rst pf_elt | ())
  end

Note that the code here is primarily for the purpose of illustration. In practice, array access and update functions in ATS/Anairiats are implemented as primtives for the sake of efficiency.

The code used for illustration is available here.

Dataviewtypes

The feature of dataviewtype in ATS is an advanced one.

A viewtype is just a linear type. The name viewtype is chosen as a viewtype often consists of a view and a type. A dataviewtype is similar to a datatype, but it is linear. With a dataviewtype, the programmer is allowed to explicitly free (or deallocate) in a safe manner the memory used for storing constructors associated with the dataviewtype. This is particularly important in a situation where garbage collection needs to be significantly reduced or even completely avoided.

An an example, the following dataviewtype list_vt is declared to model singly-linked lists in ATS:

// [list_vt] is declared to model singly-linked lists
dataviewtype list_vt (a:viewt@ype+, int) =
  | list_vt_nil (a, 0)
  | {n:nat} list_vt_cons (a, n+1) of (a, list_vt (a, n))

Given a viewtype VT and an integer I, the viewtype list_vt(VT, I) is for singly-linked lists of length I in which each element is of viewtype VT. Let us define List_vt as follows:

viewtypedef List_vt (a:viewt@ype) = [n:nat] list_vt (a, n)

Then roughly speaking, the viewtype List_vt(VT) corresponds to the following struct type in C:

typedef struct sllist_struct {
  VT head ;
  sllist_struct *tail ;
} *sllist ;

where VT refers to some type in C.

We now assume that nil and cons have been introduced as shorthands for list_vt_nil and list_vt_cons, respectively.

Computing the length of a singly-linked list

The following code implements a function that computes the length of a given linked-list in ATS:

// An implementation of the length function on singly-linked lists
fn{a:viewt@ype}
  list_vt_length {n:nat} (xs0: !list_vt (a, n)): int n = let
  fun loop {i,j:nat} .< i >.
    (xs: !list_vt (a, i), j: int j): int (i+j) =
    case+ xs of
    | cons (_, !p_xs1) => begin
        let val n = loop (!p_xs1, j+1) in fold@ xs; n end
      end
    | nil () => (fold@ xs; j)
in
  loop (xs0, 0)
end // end of [list_vt_length]

The function list_vt_length approximately corresponds to the following function sllist_length implemented in C:

int sllist_length (sllist xs) {
  int i = 0 ;
  while (xs != NULL) { i += 1 ; xs = xs->tail ; }
  return i ;
}

Appending two singly-linked lists

// An implementation of the append function on singly-linked lists
fun{a:viewt@ype} list_vt_append {m,n:nat}
  (xs0: &list_vt (a, m) >> list_vt (a, m+n), ys: list_vt (a, n)): void =
  case+ : (xs0: list_vt (a, m+n)) => xs0 of
  | cons (_, !p_xs) => (list_vt_append (!p_xs, ys); fold@ xs0)
  | ~nil () => (xs0 := ys)
// end of [list_vt_append]

The function list_vt_append approximately corresponds to the following function sllist_append implemented in C:

void sllist_append (sllist *p_xs, sllist ys) {
  sllist xs ;
  xs = *p_xs ;
  while (xs != NULL) { p_xs = &xs->tail ; xs = *p_xs ; }
  *p_xs = ys ;
  return ;
} /* end of [sllist_append] */

Reversing a singly-linked lists

// An implementation of the reverse function on singly-linked lists
fn{a:viewt@ype} list_vt_reverse {n:nat} (xs: &list_vt (a, n)): void = let
  fun revapp {m,n:nat} .< m >.
    (xs: list_vt (a, m), ys: list_vt (a, n)): list_vt (a, m+n) =
    case+ xs of
    | cons (_, !p_xs1) => begin
        let val xs1 = !p_xs1 in !p_xs1 := ys; fold@ xs; revapp (xs1, xs) end
      end
    | ~nil () => ys
in
  xs := revapp (xs, nil ())    
end // end of [list_vt_reverse]

The function list_vt_reverse approximately corresponds to the following function sllist_reverse implemented in C:

void sllist_reverse (sllist *p_xs) {
  sllist xs, ys, *p_xs1, xs1 ;
  xs = !p_xs ; ys = NULL ;
  while (xs != NULL) {
    p_xs1 = &xs->tail ; xs1 = *p_xs1 ; *p_xs1 = ys ; ys = xs ; xs = xs1
  }
  *p_xs = xs ;
  return ;
} /* end of [sllist_reverse] */

Sorting a singly-linked list

The following code implements quicksort on a singly-linked list in ATS. Note that there is no memory allocation/deallocation, either implicit or explict, involved in this code.

// An implementation of quicksort on singly-linked lists
fun{a:viewt@ype}
  list_vt_qsort {n:nat} .< n, 0 >.
  (lte: (!a, !a) -> bool, xs: list_vt (a, n)): list_vt (a, n) =
  case+ xs of
  | cons (!p_x1, !p_xs1) =>
    let val xs1 = !p_xs1 in
      list_vt_par<a> (
        view@ (!p_x1), view@ (!p_xs1) | lte, xs, p_xs1, p_x1, xs1, nil (), nil ()
      )
    end
  | nil () => (fold@ xs; xs)
// end of [list_vt_qsort]

and // [list_vt_par] for partition
  list_vt_par {l0,l1:addr} {p,q,r:nat} .< p+q+r, p+1 >.
  (pf0: a @ l0, pf1: List_vt a? @ l1 |
   lte: (!a, !a) -> bool,
   node: list_vt_cons_unfold (l0, l1), node1: ptr l1,
   p_x0: ptr l0, xs: list_vt (a, p), l: list_vt (a, q), r: list_vt (a, r))
  : list_vt (a, p+q+r+1) = case+ xs of
  | cons (!p_x1, !p_xs1) => let
      val xs1 = !p_xs1
    in
      if lte (!p_x1, !p_x0) then begin
        !p_xs1 := l; fold@ xs;
        list_vt_par<a> (pf0, pf1 | lte, node, node1, p_x0, xs1, xs, r)
      end else begin
        !p_xs1 := r; fold@ xs;
        list_vt_par<a> (pf0, pf1 | lte, node, node1, p_x0, xs1, l, xs)
      end // end of [if]
    end // end of [cons]
  | ~nil () => let
      var l = list_vt_qsort<a> (lte, l) and r = list_vt_qsort<a> (lte, r)
    in
      !node1 := r; fold@ node; r := node; list_vt_append<a> (l, r); l
    end // end of [nil]
// end of [list_vt_par]

The code used for illustration is available here.

Pattern Matching

The feature of pattern matching in ATS is adopted from ML. However, this feature becomes much more elaborated in the presence of dependent types as well as linear types. We present as follows several examples to illustrate some typical uses of pattern matching in ATS.

Let us first declare a polymorphic datatype list0 as follows for representing lists:

datatype list0 (a:t@ype) = nil0 (a) | cons0 (a) of (a, list0 a)

Notice that this is not a dependent datatype. Suppose we need to implement a function zip0 that should only zip together two lists of the same length. This may be done as follows:

fun{a1,a2:t@ype} // [zip0] is a template
zip0 (xs: list0 a1, ys: list0 a2): list0 '(a1, a2) =
  case+ (xs, ys) of // [case+] indicates the exhaustiveness of pattern matching
  | (cons0 (x, xs), cons0 (y, ys)) => cons0 ('(x, y), zip0 (xs, ys))
  | (nil0 (), nil0 ()) => nil0 ()

The problem with this implementation is that a pattern matching failure occurs if the function zip0 is mistakenly applied to two lists of unequal length, resulting in a run-time exception. In practice, a special exception is often declared as follows to handle the case of pattern matching failure, enabling more accurate run-time information to be reported.

exception ZipException

fun{a1,a2:t@ype} // [zip0] is a template
zip0 (xs: list0 a1, ys: list0 a2): list0 '(a1, a2) =
  case+ (xs, ys) of // [case+] indicates the exhaustiveness of pattern matching
  | (cons0 (x, xs), cons0 (y, ys)) => cons0 ('(x, y), zip0 (xs, ys))
  | (nil0 (), nil0 ()) => nil0 ()
  | (_, _) => $raise ZipException

We can also declare a polymorphic dependent datatype list1 as follows for representing lists:

datatype list1 (a:t@ype, int) =
  | nil1 (a, 0)
  | {n:nat} cons1 (a, n+1) of (a, list1 (a, n))

Given a type T and an integer I, the type list1(T, I) is for lists of length I in which each element is of type T. The function for zipping two lists of the same length can now be implemented as follows:

fun{a1,a2:t@ype} // [zip1] is a template
zip1 {n:nat} (xs: list1 (a1, n), ys: list1 (a2, n)): list1 ('(a1, a2), n) =
  case+ (xs, ys) of // [case+] indicates the exhaustiveness of pattern matching
  | (cons1 (x, xs), cons1 (y, ys)) => cons1 ('(x, y), zip1 (xs, ys))
  | (nil1 (), nil1 ()) => nil1 ()

Note that the typechecker can verify that pattern matching in this implementation is exhaustive.

Given that pattern matching in ATS is done sequentially, that is, from left to right in a given row and from top to bottom for all the clauses in a given case-expression, it is natural to expect that the following code gives an equivalent implementation of zip1:

fun{a1,a2:t@ype}
zip1 {n:nat} (xs: list1 (a1, n), ys: list1 (a2, n)): list1 ('(a1, a2), n) =
  case+ (xs, ys) of // [case+] indicates the exhaustiveness of pattern matching
  | (cons1 (x, xs), cons1 (y, ys)) => cons1 ('(x, y), zip1 (xs, ys))
  | (_, _) => nil1 ()

This, however, is not the case as typechecking in ATS does not assume the sequentiality of pattern matching. In this example, the second pattern matching clause is typechecked indepenent of the first one, resulting in a type error. In order to demand that the second clause be typechecked under the assumption that the first clause is not chosen at run-time, the following special syntax is needed:

    | (_, _) =>> nil1 ()

The special arrow =>> indicates that a clause formed with this arrow needs to be typechecked under the assumption that all of the previous clauses (in the same case-expression) are not chosen at run-time.

In the C code generated from compiling the function zip1, there are two tag checkes: they check whether the first and the second arguments of zip1 are empty. Clearly, only one tag check is necessary as both arguments are of the same length. We can modify the implementation of zip1 as follows so as to use only one check:


fun{a1,a2:t@ype} // only one tag check
zip1 {n:nat} (xs: list1 (a1, n), ys: list1 (a2, n))
  : list1 ('(a1, a2), n) = begin case+ xs of
  | cons1 (x, xs) => begin
      let val+ cons1 (y, ys) = ys in cons1 ('(x, y), zip1 (xs, ys)) end
    end
  | _ =>> nil1 ()
end // end of [zip1]

Note that the keyword val+ indicates that the patthern matching following it is exhaustive and thus needs no tag check. This simple example demonstrates in concrete terms that safety can enhance efficiency.

The code used for illustration is available here.

Exceptions

Exceptions can provide a convenient approach to handling abnormal or erroneous cases. This convenience, however, is not without a cost as programs making use of exceptions can often become rather difficult to reason about.

In ATS, the type for exceptions is denoted by exn, which actually is a viewtype of the sort viewtype. So the size of exn equals the size of a pointer. The type exn may be thought of as a dataviewtype (i.e., linear datatype) with which an extensible set of constructors are associated. The syntax for declaring an exception is rather similar to that for declaring a constructor (associated with a datatype). For instance, three exceptions (or more precisely, exception constructors) are declared as follows:

exception Fail // Fail: exn
exception Fail_msg of string // Fail_msg: string -> exn

// Fail_msgs : {n:nat} (int n, list (string n)) -> exn
exception {n:nat} Fail_msgs of (int n, list (string, n))

All exceptions in ATS are static and there is no issue of exceptions being generative as is in Standard ML. It is allowed that an exception be declared in a closed scope so that the declared exception becomes inaccessible outside the closed scope. This is analogous to declaring a static variable inside the body of a function in C.

We present as follows an example that involves an exception being raised and then captured. A binary tree is a Braun tree if it is empty or it satisfies the property that its left and right children are Braun trees and the size of the left child minus the size of the right child equals 0 or 1. The following code implements a function that checks whether a binary tree is a Braun tree:

datatype bt = E | B of (bt, bt) // datatype for binary trees

fn isBraunTree (bt0: bt): bool = let
  exception NotBraunTree
  fun aux (bt: bt): int =
    case+ bt of
    | B (bt1, bt2) => let
        val ls = aux bt1 and rs = aux bt2
      in
        if (ls >= rs && rs+1 >= ls) then ls+rs+1 else $raise NotBraunTree()
      end
    | E () => 0
in
  try let val s = aux bt0 in true end with ~NotBraunTree() => false
end

The type of an exception constructor cannot contain free static variables. In other words, each exception constructor can be lifted to the toplevel. For instance, the following code is illegal: the exception constructor Foo cannot be lifted to the toplevel as its type contains a free type variable.

fun{a:t@ype} f (x: a) = let
  exception Foo of a // Foo: a -> exn
in
  // ...
end

Sometimes, we may want to raise an exception carrying a value of some variable type. This is shown, for instance, in the following illegal code that tries to implement the list subscripting function:

extern fun{a:t@ype}
  list_iforeach (xs: List a, f: (Nat, a) -> void): void

fn{a:t@ype} nth (xs: List a, n: Nat): a = let
  exception Found of a
  fn f (i: Nat, x: a): void = if i = n then $raise (Found x)
in
  try let
    val () = list_iforeach (xs, f) in $raise ListSubscriptException ()
  end with
    ~Found x => x
  // end of [try]
end // end of [nth]

A method to work around the issue is shown in the following code:

fn{a:t@ype} nth (xs: List a, n: Nat): a = let
  exception Found of ()
  val ans = ref_make_elt<Option a> (None)
  fn f (i: Nat, x: a): void =
    if i = n then (!ans := Some x; $raise Found ())
  val () = (try let
    val () = list_iforeach (xs, f) in $raise ListSubscriptException ()
  end with
    ~Found () => ()
  ) : void // end of [try]
in
  case !ans of
  | Some x => x | None () => $raise ListSubscriptException ()
end // end of [nth]

The code used for illustration is available here.

Higher-Order Functions

The core of ATS is a functional language in which functions are first-class values. A higher-order function is a function whose arguments also include functions.

Let us use BT to range over base types such as char, double, int, string, etc. A simple type T is formed according to the following inductive definition:

BT is a simple type.
(T_1, ..., T_n) -> T_0 is a simple type if T_0, T_1, ..., T_n are simple types.

Let order be a function from simply types to natural numbers defined as follows:

order(BT) = 0
order((T_1, ..., T_n) -> T_0) = max (order(T_0), 1 + order (T_1), ..., 1 + order(T_n))

Given a function f of simple type T, we say that f is a nth-order function if order(T) = n. For instance, a function of the type (int, int) -> int is 1st-order, and a function of the type int -> (int -> int) is also 1st-order, and a function of the type ((int -> int), int) -> int is 2nd-order. In practice, most higher-order functions are 2nd-order.

As an example, we implement as follows a 2nd-order function find_root that takes a function f from integers to integers as its argument and searches for a root of f by enumeration:


fn find_root (f: int -<cloref1> int): int = let
  fun aux (f: int -<cloref1> int, n: int): int =
    if f (n) = 0 then n else begin
      if n <= 0 then aux (f, ~n + 1) else aux (f, ~n)
    end
in
  aux (f, 0)
end // end of [fint_root]

The function find_root computes f(0), f(1), f(-1), f(2), f(-2), ... until it finds the first integer i satisfying f(i) = 0.

As another example, we implement the Newton-Raphson's method for finding roots of functions on reals:


val epsilon = 1E-6 (* precision *)

// Newton-Raphson's method for finding roots
// [f1] is a derivative of [f]
fn newton_raphson (
    f: double -<cloref1> double
  , f1: double -<cloref1> double
  , x0: double
  ) : double = let
  fun loop (
      f: double -<cloref1> double
    , f1: double -<cloref1> double
    , x0: double
    ): double = let
    val y0 = f x0
  in
    if abs (y0 / x0) < epsilon then x0 else begin
      let val y1 = f1 x0 in loop (f, f1, x0 - y0 / y1) end
    end
  end // end of [loop]
in
  loop (f, f1, x0)
end // end of [newton_raphson]

// square root function
fn sqrt (c: double): double =
  newton_raphson (lam x => x * x - c, lam x => 2.0 * x, 1.0)

// cubic root function
fn cbrt (c: double): double =
  newton_raphson (lam x => x * x * x - c, lam x => 3.0 * x * x, 1.0)

The code used for illustration is available here.

Lazy Evaluation

Though ATS is a language based on eager call-by-value evaluation, it also allows the programmer to perform lazy call-by-need evaluation. In ATS, there is a special language construct $delay that can be used to delay or suspend the evaluation of an expression and a special function lazy_force that can be called to resume a suspended computation.

There is a special type constructor lazy of the sort (t@ype) => type in ATS, which forms a (boxed) type when applied to a type. On one hand, given an expression exp of type T, $delay (exp) forms a value of type lazy(T) that represents the suspended evaluation of exp. On the other hand, given a value v of type lazy(T) for some type T, lazy_force(v) resumes the suspended evaluation represented by v and returns a result of type T (if the resumed evaluation terminates). The interface for the (template) function lazy_force is:

fun{a:t@ype} lazy_force : lazy a -> a

The special operatior ! in ATS is overloaded with the function lazy_force.

In the file prelude/SATS/lazy.sats, the following datatype stream_con and type stream are declared for representing (lazy) streams:

datatype stream_con (a:t@ype+) =
  | stream_nil (a) | stream_cons (a) of (a, stream a)

where stream (a:t@ype) = lazy (stream_con a)

Also, a number of common functions on streams are implemented in prelude/DATS/lazy.dats.

To simplify the presentation, we use the following syntax to create shorthands for the stream constructors stream_nil and stream_cons:

#define nil stream_nil
#define :: stream_cons

The following code implements a stream representing the sequence of natural numbers starting from 2:

typedef N2 = [n:int | n >= 2] int n
val N2s: stream N2 = from 2 where {
  fun from (n: N2): stream N2 = $delay (n :: from (n+1))
}

More interestingly, the stream of prime numbers can be readily computed as follows:

fun sieve (ns: stream N2):<1,~ref> stream N2 =
  // [val-] means no warning message from the compiler
  let val- n :: ns = !ns in
     $delay (n :: sieve (stream_filter<N2> (ns, lam x => x nmod n > 0)))
  end

val primes: stream N2 = sieve N2s

// find the nth prime
fn nprime {n: pos} (n: int n): N2 = stream_nth (primes, n-1)

Note that the keyword val- indicates to the typechecker that no message warning of nonexhaustive pattern matching should be reported.

As another example, the following code based on lazy evaluation computes Fibonacci numbers:

val // the following values are defined mutually recursively
rec fibs_1: stream int64 = $delay (one :: fibs_2) // fib1, fib2, ...
and fibs_2: stream int64 = $delay (one :: fibs_3) // fib2, fib3, ...
and fibs_3: stream int64 = // fib3, fib4, ...
  stream_map2<int64,int64,int64> (fibs_1, fibs_2, lam (x, y) => x + y)

// find the nth Fibonacci number (counting starts from 1)
fn nfib {n:pos} (n: int n): int64 = stream_nth (fibs_1, n-1)

Please find extensive use of lazy evaluation in the implementation of a package for parsing combinators: contrib/parcomb.

The code used for illustration is available here.

Linear Lazy Evaluation

The feature of linear lazy evaluation, which is considerably advanced, addresses the issue of freeing up resources held by a lazy value (that is, a thunk representing a delayed computation). Let us first see a concrete example involving the issue. In prelude/SATS/file.sats the following function char_stream_make_file is declared:

// making a lazy char stream out of a file handle
fun char_stream_make_file (fil: FILEref):<1,~ref> stream (char)

Note that the syntax <1,~ref> indicates that the function may have all kinds of effects except ref. As suggested by its type, the function char_stream_make_file turns a file handle into a (lazy) stream of characters. We give an implementation of this function as follows:

implement char_stream_make_file (fil) = let
  val c = fgetc0_err (fil)
in
  if c <> EOF then let
    val c = char_of_int (c)
  in
    $delay (stream_cons (c, char_stream_make_file fil))
  end else begin
    let val () = fclose0_exn (fil) in $delay (stream_nil ()) end
  end // end of [if]
end // end of [char_stream_make_file]

Clearly, there is an opened file handle inside each character stream formed by calling char_stream_make_file until all the characters in the file are put into the stream. If such a character stream is discarded during evaluation, then the file handle inside it may be left unclosed indefinitely. This can cause a serious problem in a situation where a large number of character streams need to be formed by calling char_stream_make_file. In general, the inability (or the lack of the ability) to directly free the resources held by lazy values often makes it rather difficult or even infeasible to employ lazy evaluation in a setting that requires great precision in resource management.

A Direct Approach to Freeing Resources inside Lazy Values

The following function char_stream_vt_make_file is also declared in prelude/SATS/file.sats:

// making a linear lazy char stream out of a file handle
fun char_stream_vt_make_file {m:file_mode} {l:addr}
  (pf_mod: file_mode_lte (m, r), pf_fil: FILE m @ l | p_fil: ptr l)
  :<1,~ref> stream_vt (char)

As suggested by its type, the function char_stream_vt_make_file turns a file handle into a linear stream of characters. An implementation of this function is given as follows:

implement char_stream_vt_make_file (pf_mod, pf_fil | p_fil) = let
  val c = fgetc1_err (pf_mod | !p_fil)
in
  if c >= 0 then let // c <> EOF
    val c = char_of_int (c)
  in
    $ldelay (
      stream_vt_cons (c, char_stream_vt_make_file (pf_mod, pf_fil | p_fil))
    , fclose1_exn (pf_fil | p_fil)
    ) // end of [$ldelay]
  end else let
    val () = fclose1_exn (pf_fil | p_fil) in $ldelay (stream_vt_nil ())
  end // end of [if]
end // end of [char_stream_vt_make_file]

The keyword $ldelay is used to form a linear lazy value. Given two dynamic expressions exp1 and exp2, the linear lazy value $ldelay (exp1, exp2) essentially contains two thunks whose bodies are exp1 and exp2, repectively. The first thunk represents a suspended computation while the second thunk represents a finalizer that can be called to free up the resources held in the first thunk. In the case where exp2 is missing, exp1 is assumed to hold no resources.

The code used for illustration is available here.

Input and Output

The I/O is done in ATS/Anairiats by providing interfaces for various I/O functions in C. In particular, most functions declared in <stdio.h> have their counterparts declared in libc/SATS/stdio.sats.

There is a datasort file_mode and an abstract type constructor FILE in ATS that are declared as follows:

datasort file_mode =
  | file_mode_r | file_mode_w | file_mode_rw

absviewt@ype FILE (file_mode) = $extype "ats_FILE_type"

where the external type ats_FILE_type is an alias of the type FILE in C. We now use r, w and rw for file_mode_r, file_mode_w and file_mode_rw in the following presentation. If a file is assigned the type FILE(r), then it is read-only; if it is assigned the type FILE(w), then it is write-only; it is assigned the type FILE(rw), then it can be read from and also written to.

There is also an abstract type constructorfile_mode declared as follows:

abst@ype file_mode (file_mode)

Note that file_mode is overloaded: it refers to a type constructor as well as a datasort. The following values are available that represent various file modes:

val file_mode_r : file_mode (r) // = "r"
val file_mode_rr : file_mode (rw) // = "r+"
val file_mode_w : file_mode (w) // = "w"
val file_mode_ww : file_mode (rw) // = "w+"
val file_mode_a : file_mode (w) // = "a"
val file_mode_aa : file_mode (rw) // = "a+"

Opening a File

The function fopen in C is for opening a file of a given name. In ATS, there is a function fopen_exn that is assigned the following type:

fun fopen_exn {m:file_mode}
  (path: string, m: file_mode m): [l:addr] (FILE m @ l | ptr l)

Note that we use _exn in the name of fopen_exn as a convention to indicate that an exception is raised if a call to this function results in an error. There is another function fopen_err in ATS that is assigned the following type:

fun fopen_err {m:file_mode}
  (path: string, m: file_mode m): [l:addr] (FILE_opt_v (m, l) | ptr l)

where FILE_opt_v (m, l) is defined as follows:

viewdef FILE_opt_v (m:file_mode, l:addr) = option_v (FILE m @ l, l <> null)

The function fopen_err exactly corresponds to fopen in C: It returns a null pointer in case of error.

Closing a File

The function fclose in C is for closing an opened file. In ATS, there is a function fclose_exn that is assigned the following type:

fun fclose_exn {m:file_mode} {l:addr} (pf: FILE m @ l | p: ptr l): void

The function fclose_exn raises an exception if a call to it results in an error. There is another function fclose_err that is assigned the following type:

fun fclose_exn {m:file_mode} {l:addr} (pf: FILE m @ l | p: ptr l): int

The function fclose_err exactly corresponds to fclose in C: It returns 0 in case of success and EOF in case of error.

Reading from a File

The function fgetc in C is for reading a character from an opened file. In ATS, there is a function fget_err that is assigned the following type:

fun fgetc_err {m:file_mode}
  (pf: file_mode_lte (m, r) | f: &FILE m): [i:int | i <= UCHAR_MAX] int i

where UCHAR_MAX is defined to be 255. This function exactly corresponds to fgetc in C: It returns the obtained character (cast into unsigned char) if successful, or it returns EOF in case of failure. The file argument of the function fgets_err, i.e., its first non-proof argument, is passed as a reference (call-by-reference).

Note that file_mode_lte is a prop constructor. Given two file modes m1 and m2, if a proof of the prop file_mode_lte (m1, m2) can be formed, then m1 is less than or equal to m2. The following proofs are available:

prval file_mode_lte_r_r : file_mode_lte (r, r)
prval file_mode_lte_rw_r : file_mode_lte (rw, r)
prval file_mode_lte_w_w : file_mode_lte (w, w)
prval file_mode_lte_rw_w : file_mode_lte (rw, w)
prval file_mode_lte_rw_rw : file_mode_lte (rw, rw)

In order to read from a file of the type FILE (m), it is necessary to prove that m is less than or equal to r.

There is currently no function fget_exn, which would be given the following type if defined:

fun fgetc_exn {m:file_mode}
  (pf: file_mode_lte (m, r) | f: &FILE m): [i:nat | i <= UCHAR_MAX] int i

The problem with fgetc_exn is that it is not a function of much use. If implemented, this function should probably be called on a file after it is made certain (e.g., by calling feof) that the file has not reached its end.

Writing to a File

The function fputc in C is for writing a character into a file. In ATS, there is a function fputc_err that is assigned the following type:

fun fputc_err {m:file_mode}
  (pf: file_mode_lte (m, w) | c: char, f: &FILE m): [i:int | i <= UCHAR_MAX] int i

This function exactly corresponds to fputc in C: It returns the charater being written in case of success, or EOF in case of failure. There is another function fputc_exn that is assigned the following type:

fun fputc_exn {m:file_mode} (pf: file_mode_lte (m, w) | c: char, f: &FILE m): void

In case of failure, fputs_exn raises an exception.

An Example: Concatenating Files

The following function filecopy_exn copies a file into another one:

#define i2c char_of_int1

fn filecopy_exn {m1,m2:file_mode} (
    pf1: file_mode_lte (m1, r)
  , pf2: file_mode_lte (m2, w)
  | fil_s: &FILE m1, fil_d: &FILE m2
  ) : void = let
  fun loop
    (fil_s: &FILE m1, fil_d: &FILE m2): void = let
    val c = fgetc_err (pf1 | fil_s)
  in
    if c >= 0 then begin
      fputc_exn (pf2 | i2c c, fil_d); loop (fil_s, fil_d)
    end // end of [if]
  end // end of [loop]
in
  loop (fil_s, fil_d)
end // end of [filecopy_exn]

In case of error, filecopy_exn raises an exception. The following function filecopy_err is a variant of filecopy_exn. Instead of raising an exception in case of error, filecopy_err returns EOF.

fn filecopy_err {m1,m2:file_mode} (
    pf1: file_mode_lte (m1, r)
  , pf2: file_mode_lte (m2, w)
  | fil_s: &FILE m1, fil_d: &FILE m2
  ) : int = let
  fun loop
    (fil_s: &FILE m1, fil_d: &FILE m2): void = let
    val c = fgetc_err (pf1 | fil_s)
  in
    if c >= 0 then let
      val err = fputc_err (pf2 | i2c c, fil_d)
    in
      loop (fil_s, fil_d)
    end // end of [if]
  end // end of [loop]
  val () = loop (fil_s, fil_d)
in
  if ferror (fil_d) = 0 then ~1 else 0
end // end of [filecopy_err]

We now use filecopy_err to implement as follows the functionality of concatenating a list of files into a single one:

// concatenation of a list of files
implement main {n} (argc, argv) = let
  val () = case+ argc of
  | 1 => let
      val (pf_stdin | p_stdin) = stdin_get ()
      val (pf_stdout | p_stdout) = stdout_get ()
      val _(*err*) = filecopy_err
        (file_mode_lte_r_r, file_mode_lte_w_w | !p_stdin, !p_stdout)
      val () = stdout_view_set (pf_stdout | (*none*))
      val () = stdin_view_set (pf_stdin | (*none*))
    in
      // empty
    end // end of [1]
  | _ (*argc >= 2*) => loop (argc, argv, 1) where {
      fun loop {i:nat | i <= n}
        (argc: int n, argv: &(@[string][n]), i: int i): void =
        if i < argc then let
          val name = argv.[i]
          val (pfopt | p_ifp) = fopen_err (name, file_mode_r)
        in
          if p_ifp <> null then let
            prval Some_v (pf) = pfopt
            val (pf_stdout | p_stdout) = stdout_get ()
            val _(*err*) = filecopy_err
              (file_mode_lte_r_r, file_mode_lte_w_w | !p_ifp, !p_stdout)
            val () = stdout_view_set (pf_stdout | (*none*))
            val () = fclose_exn (pf | p_ifp)
          in
            loop (argc, argv, i+1)
          end else let
            prval None_v () = pfopt
            val () = prerrf ("%s: can't open [%s]\n", @(argv.[0], name))
          in
            exit {void} (1)
          end // end of [if]
        end // end of [if]
    } // end of [_]
  // end of [val]
  val (pf_stdout | p_stdout) = stdout_get ()
  val err = ferror (!p_stdout)
  val () = stdout_view_set (pf_stdout | (*none*))
in
  if (err <> 0) then begin
    prerrf ("%s: error writing stdout\n", @(argv.[0])); exit {void} (2)
  end else begin
    exit {void} (0) // exit normally
  end // end of [if]
end // end of [main]

The code used for illustration is available here.

Combining ATS and C

When programming in ATS, a programmer may need to write C code occassionally. This is particularly common when the programmer does systems programming. Given that both ATS and C use the same data representation, it is largely straightforward for ATS and C to interact with each other. In particular, there is no need to write "wrappers" that are used in languages like Objective Caml or Haskell. In ATS, the syntax for enclosing C code is

%{ (some C code) %}

The symbol %{ can be replaced with either %{^ or %{$. The former and the latter mean that the enclosed C code needs to be put at the top and the bottom of the generated C code, respectively.

For a function to be called in C, we need to assign a global name to this function. For instance, the following syntax declares a function foo that takes a pair of integers and returns a boolean. This function is given a global name "ats_foo", which can be used in C code to refer to foo.

fun foo : (int, int) -> bool = "ats_foo"

We may implement foo either in ATS as follows:

implement foo (x, y) = ...

or in C as follows:

%{

ats_bool_type ats_foo (ats_int_type x, ats_int_type y) { ... }

%}

Note that the C types ats_int_type and ats_bool_type are defined in the file ccomp/runtime/ats_types.h

The following example involves a call in ATS to a function that is implemented in C:

// This function computes Fibonacci numbers
extern fun fibonacci (n: Nat): Nat = "fibonacci"

%{

ats_int_type fibonacci (ats_int_type n) {
  int res1, res2, tmp ;

  if (n < 1) return 0 ;
 
  res1 = 0 ; res2 = 1 ;
  while (n > 1) {
    --n ; tmp = res2 ; res2 += res1 ; res1 = tmp ;
  }

  return res2 ;
}

%}

fn fibonacci_usage (cmd: string): void =
  prerrf ("Usage: %s [integer]\n", @(cmd)) // print an error message

implement main (argc, argv) =
  if argc >= 2 then let
    val n = int1_of argv.[1] // turning string into integer
    val () = assert_errmsg
      (n >= 0, "The integer argument needs to be nonnegative.\n")
    val res = fibonacci n
  in
    printf ("fibonacci (%i) = %i\n", @(n, res))
  end else begin
    fibonacci_usage argv.[0]; exit {void} (1)
  end

It is also possible to assign names to types and values in ATS, and such names can be used in extenal C code. As an example, the type Tree in the following code is given a name exTree, which can be used in C code to refer to the type Tree.


dataviewtype tree (int) =
  Nil(0) | {n1,n2:two} Node(1) of (tree n1, int, tree n2)

viewtypedef Tree = [n:two] tree n

// assigning an external name to [Tree]
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
extern typedef "exTree" = Tree
extern fun ref_tree : Tree -> ref Tree = "ref_tree"

%{

ats_ref_type ref_tree (ats_ptr_type t) {
  exTree* r ;
  r = ats_malloc_gc (sizeof(exTree)) ;
  *r = (exTree)t ;
  return r ;
}

%}

The syntax for assigning a name to a value in ATS is given as follow:

extern val $name = $exp

where $name is a string literal (representing a valid identifier in C) and $exp ranges over dynamic expressions in ATS. When used in external C code, the string $name refers to the value of the expression $exp.

In the other direction, we can use $extype "SomeType" in ATS to refer to a type of the name SomeType in C. Similarly, we can use $extval ($typ, "SomeValue") for a value of the name SomeValue in C, where $typ is the supposed type of this value in ATS. For instance, stdout in ATS is defined as follows:

#define stdout $exval ($extype "ats_ptr_type", "stdout")

where ats_ptr_type is an alias for the type void* in C.

The code used for illustration is available here.

Programming with Theorem Proving

The paradigm of programming with theorem proving is rich and broad, and it is probably the most innovative feature in ATS. We give an introduction to this paradigm by presenting a few examples, explaining some motivations behind programming with theorem proving as well as demonstrating a means to achieve it in ATS.

The following code implements a function that computes factorials:

// non-tail-recursive
fun fact {n:nat} (n: int n): Nat =
  if n > 0 then n nmul fact (n-1) else 1

The type assigned to fact is

{n:nat} int (n) -> Nat

That is, the function fact takes a nature number as its argument and, if it terminates, returns a natural number as its result.

Now suppose we want to be more accurate. We want to verify that the code indeed implements the factorial functions. One possibility is to specify the factorial function in the statics of ATS, and then try to assign the following type to fact:

{n:nat} int (n) -> int (fact(n))

Note that the name fact is overloaded here to also refer to the factorial function in the statics of ATS. There is, however, a serious problem with this approach: If functions as complex as the factorial function could be specified in the statics of ATS, then it would no longer be possible to support fully automatic constraint-solving. We take a different approach instead.

With standard mathematical notation, the factorial function can be specified as follows:

fact (0) = 1 ;
fact (n) = n * fact (n-1) ; if n > 0

In ATS, we can declare a dataprop to encode this specification:

// [FACT (n, x)] is inhabited if and only if [fact (n) = x]
dataprop FACT (int, int) =
  | FACTbas (0, 1)
  | {n:int | n > 0} {r,r1:int}
    FACTind (n, r) of (FACT (n-1, r1), MUL (n, r1, r))

The prop constructor MUL is already defined: Given integers p, q, and pq, MUL(p, q, pq) is inhabited if and only if the product of p and q is pq. The constructors FACTbas (base case) and FACTind (inductive case) correspond to the first and the second clauses in the above specification, respectively. Given integers n and r, the prop FACT(n, r) is inhabited if and only if the factorial of n equals r. Clearly, if a terminating function can be assgined the following type:

{n:nat} int n -> [r:int] (FACT (n, r) | int r)

then this function must compute factorials. The code below implements such a function fact1:

// [fact1] implements the factorial function in a non-tail-recursive manner
fun fact1 {n:nat} .< n >.
  (n: int n): [r:int] (FACT (n, r) | int r) =
  if n > 0 then
    let
       val (pf_fac_n1_r1 | r1) = fact1 (n-1)
       val (pf_mul_n_r1_r | r) = n imul2 r1
    in
       (FACTind (pf_fac_n1_r1, pf_mul_n_r1_r) | r)
    end
  else (FACTbas () | 1)

When applied to a natural number n, fact1 is guaranteed to return a pair (pf, r), where pf is a proof stating that the factorial of n is r. A crucial property of ATS is the irrelevance of proofs to program evaluation: The dynamic semantics of a program cannot be altered by erasing the proofs in it. So proofs are all erased after compilation and thus are not constructed at run-time.

This example clearly gives some impression of code duplication as the specification (FACT) and the implementation (fact1) are of great similarity to each other. We now present another example, where the specification and the implementation are far different from each other.

The following code gives another implementation of the factorial function, where only tail-recursion is involved:

// tail-recursive
fn fact {n:nat} (n: int n): Nat =
  let
    fun loop {n:nat} (n: int n, res: Nat): Nat =
      if n > 0 then loop (n-1, n nmul res) else res
  in
    loop (n, 1)
  end

Base on this tail-recursive implementation, we implement as follows a function fact2 that is verified to compute factorials.

// Some lemmas on multiplication are stated (without proofs)
extern prval MULlem00 : {x,y:int} () -<prf> [xy:int] MUL (x, y, xy)

// 1 * x = x
extern prval MULlem10 : {x,y:int} MUL (1, x, y) -<prf> [x==y] void
// x * 1 = x
extern prval MULlem11 : {x,y:int} MUL (x, 1, y) -<prf> [x==y] void

// multiplication is associative: (xy)z = x(yz)
extern prval MULlem20 : {x,y,z,xy,yz,xyz:int}
  (MUL (x, y, xy), MUL (y, z, yz), MUL (xy, z, xyz)) -<prf> MUL (x, yz, xyz)

// [fact2] implements the factorial function in a tail-recursive style
fn fact2 {n:nat} (n: int n): [r0: int] (FACT (n, r0) | int r0) =
  let
    // [loop] is tail-recusive
    fun loop {n:nat; x:int} .< n >. (n: int n, x: int x)
      : [r,r0:int] (FACT (n, r), MUL (x, r, r0) | int r0) =
      if n > 0 then let
        val (pf_mul_x_n_xn | xn) = x imul2 n
        val (pf_fac_n1_r1, pf_mul_xn_r1_r0 | res) = loop (n-1, xn)
        prval pf_mul_n_r1_nr1 = MULlem00 ()
        prval pf_mul_x_nr1_r0 = MULlem20 (pf_mul_x_n_xn, pf_mul_n_r1_nr1, pf_mul_xn_r1_r0)
        prval pf_fac_n_nr1 = FACTind (pf_fac_n1_r1, pf_mul_n_r1_nr1)
      in
        (pf_fac_n_nr1, pf_mul_x_nr1_r0 | res)
      end else let
        prval pf_mul_x_1_y = MULlem00 () // x * 1 = y
        prval () = MULlem11 (pf_mul_x_1_y) // x = y
      in
        (FACTbas (), pf_mul_x_1_y | x)
      end

    val (pf_fac, pf_mul | res) = loop (n, 1)
    prval () = MULlem10 (pf_mul)
  in
    (pf_fac | res)
  end

In practice, programming with theorem proving is commonly employed to capture program invariants that would otherwise not be possible in ATS. For instance, aided by programming with theorem proving, we can program a complex data structure like doubly-linked list or doubly-linked tree while having no fear of misusing pointers. This is considered a great strength of ATS.

The code used for illustration is available here.

Casting Functions

A casting function in ATS is equivalent to the identify function in terms of dynamic semantics. A call to such a function is evaluated at compile-time, and its argument is returned. For instance, we have the following commonly used casting functions:

castfn int1_of_int (x: int):<> [n:nat] int n
castfn string1_of_string (x: string):<> [n:nat] string n

The keyword castfn is for introducing casting functions.

We now present a typical use of casting functions. The following declared function list_concat is intended for concatenating a list of lists:

fun{a:t@ype} list_concat (xss: List (List a)): List a

Let us say that we would like to verify that the concatenation of a list of lists yields a list whose length equals the sum of the lengths of the lists in the given list of lists. This, for instance, can be done as follows by introducting a datatype constructor lstlst.

datatype lstlst (a:t@ype+, int, int) =
  | {m,t:nat} {n:nat}
    lstlst_cons (a, m+1, t+n) of (list (a, n), lstlst (a, m, t))
  | lstlst_nil (a, 0, 0) of ()

fun{a:t@ype} _concat {m,t:nat} .<m>.
  (xss: lstlst (a, m, t)): list (a, t) = case+ xss of
  | lstlst_cons (xs, xss) => list_append (xs, _concat<a> xss)
  | lstlst_nil () => list_nil ()
// end of [_concat]

Given a type T and integers I and J, the type lstlst (T, I, J) is for a list of lists such that the length of the list is I and each element in the list is a list of values of the type T and the sum of the lengths of these elements equals J.

The function list_concat is the same as the function _concat in terms of dynamic semantics, and it can be implemented as follows:

implement{a} list_concat (xss) =
  _concat<a> (lstlst_of_listlist xss) where {
  castfn lstlst_of_listlist
    {m:nat} .<m>. (xss: list (List a, m))
    :<> [t:nat] lstlst (a, m, t) = case+ xss of
    | list_cons (xs, xss) => lstlst_cons (xs, lstlst_of_listlist xss)
    | list_nil () => lstlst_nil ()
} // end of [list_concat]

Given that lstlst_of_listlist being implemented as a casting function, the implementation of list_concat is equivalent to the following one in terms of dynamic semantics:

implement{a} list_concat (xss) = _concat (xss) // this one does not typecheck

The code used for illustration is available here.

Allocation in Stack Frames

ATS supports memory allocation in the stack frame of a calling function, and it is guaranteed by the type system of ATS that memory thus allocated cannot be accessed once the calling function returns.

Storing Arrays in Stack Frames

In the following contrived example, the implemented function name_of_month_1 allocates in its stack frame a string array of size 12 that is initialized with the names of 12 months, and then returns the name of the ith month.

fn name_of_month_1 {i:int | 1 <= i; i <= 12} (i: int i): string = let
  var !p_arr with pf_arr = @[string](
    "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
  )
in
  p_arr->[i-1]
end // end of [name_of_month_1]

The following syntax indicates that the starting address of the allocated array is stored in p_arr while the view of the array is stored in pf_arr:

  var !p_arr with pf_arr = @[string](
    "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
  )

This allocated array is initialized with the strings representing the names of the 12 months: "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec".

A variant of the function name_of_month_1 is implemeneted as follows:

fn name_of_month_2 {i:int | 1 <= i; i < 12} (i: int i): string = let
  var !p_arr with pf_arr = @[string][12]("")
  val () = p_arr->[0] := "Jan"
  val () = p_arr->[1] := "Feb"
  val () = p_arr->[2] := "Mar"
  val () = p_arr->[3] := "Apr"
  val () = p_arr->[4] := "May"
  val () = p_arr->[5] := "Jun"
  val () = p_arr->[6] := "Jul"
  val () = p_arr->[7] := "Aug"
  val () = p_arr->[8] := "Sep"
  val () = p_arr->[9] := "Oct"
  val () = p_arr->[10] := "Nov"
  val () = p_arr->[11] := "Dec"
in
  p_arr->[i-1]
end // end of [name_of_month_2]

The following syntax means that the function name_of_month_2 allocates a string array of size 12 in its stack frame and initializes the array with the empty string:

var !p_arr with pf_arr = @[string][12]("")

The starting address and the view of the allocated array are stored in p_arr and pf_arr, respectively. If the following syntax is used:

var !p_arr with pf_arr = @[string][12]()

then the allocated array is uninitialized, that is, the view of the proof pf_arr is [string?][12] @ p_arr (instead of [string][12] @ p_arr).

Storing Closures in Stack Frames

When higher-order functions are employed in systems programming, it is often desirable to form closures in the stack frame of the calling function so as to avoid the need for memory allocation on heap.

In the following example, the implemented function print_month_name forms a closure in its stack frame, which is then passed to a higher-order function iforeach_array_ptr_clo:

fn print_month_names () = let
  var !p_arr with pf_arr = @[string](
    "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
  )
  var !p_clo with pf_clo = @lam // this closure is allocated in the stack frame
    (pf: !unit_v | i: sizeLt 12, x: &string): void =<clo1> (if i > 0 then print ", "; print x)
  // end of [var]
  prval pf = unit_v ()
  val () = iforeach_array_ptr_clo_tsz {string} {unit_v} (pf | !p_clo, !p_arr, 12, sizeof<string>)
  prval unit_v () = pf
  val () = print_newline ()
in
  // empty
end // end of [print_month_names]

Note that the keyword @lam (instead of lam) is used here to indicate that the closure is constructed in the stack frame of the function print_month_names.

The code used for illustration is available here.

State Types

A state type is a mapping that associates variables with viewtypes. Intuitively, if a program point can be assigned a state type ST, then each variable in the domain of ST must have the type with which ST associates the variable when program execution reaches that point. In ATS, a state type is represented as follows:

(x_1 : VT_1, ..., x_n : VT_n)

where for 1 <= i <= n, each variable x_i is assiciated with the viewtype VT_i.

The code used for illustration is available here.

Loop Constructs

In ATS, there are also constructs for forming for-loops and while-loops directly. While it is tempting for someone familiar with imperative programming to implement loops with these constructs (instead of using tail-recursion), it should be emphasized that the amount of effort involved in constructing loop invariants (if needed) often makes this style of programming difficult for beginners and, sometimes, for experts as well.

Simple Loops

The following code gives an implementation of the factorial function, where a while-loop is constructed:

fn fact (n: int): int = let
  var x: int = n
  var res: int = 1 // initialized
  val () = while (x > 0) (res := res * x; x := x - 1)
in
  res // res = n!
end // end of [fact]

The syntax for a while-expression is given as follows:

while $test $body

where $test is an expression of the type bool and $body is an expression of the type void. Usually, $body consists of a sequence of expressions of the type void.

In the following, another implementation of the factorial function is given, where a for-loop is constructed:

// a simple for-loop 
fn fact (n: int): int = let
  var x: int
  var res: int = 1 // initialized
  val () = for (x := 1 ; x <= n ; x := x+1) res := res * x
in
  res // res = n!
end // end of [fact]

The syntax for a for-expression is given as follows:

for ($init ; $test ; $post) $body

where $init is an expression of the type void, $test is an expression of the type bool, $post is an expression of the type void, and $body is an expression of the type void. Usually, $body consists of a sequence of expressions of the type void. Note that each of $init, $test, and $post can be omitted. The meaning of each omission should be evident. The following code gives yet another implemenation of the factorial function:

// a simple for-loop with omissions involving [break]
fn fact (n: int): int = let
  var x: int = 1 // initialized
  var res: int = 1 // initialized
  val () = for ( ; ; ) // infinite loop
    if x <= n then (res := res * x; x := x+1) else break
  // end of [val]
in
  res // res = n!
end // end of [fact]

Note that break in ATS is equivalent to its counterpart in C. Also, continue is supported in ATS as shown in the following implementation of the factorial function:

// a simple for-loop with omissions involving [break] and [continue]
fn fact (n: int): int = let
  var x: int = 1 // initialized
  var res: int = 1 // initialized
  val () = for ( ; ; x := x+1) // no loop test
(*
** note that [continue] means to loop again *after* post increment!
*)
    if x <= n then (res := res * x; continue) else break
  // end of [val]
in
  res // res = n!
end // end of [fact]

Loops with Annotated Invariants

It is also possible to annotate loops with invariants expressed in terms of state types. For such loops, the keywords while* and for* are used in place of while and for, respectively. As an example, the following code implements the standard binary search on an array of doubles:

fn bsearch {n:nat} (
    A: &(@[double][n]), n: int n, key: double
  ) :<> int = let
  var l: int = 0 and u: int = n-1; var res: int = ~1
  val () = while*
    {i,j:int | 0 <= i; i <= j+1; j < n} .<j+1-i>. (l: int i, u: int j) =>
    (l <= u) let
      val m = l + (u-l) / 2
      val sgn = compare (key, A.[m])
    in
      case+ 0 of
      | _ when sgn < 0 => (u := m-1; continue)
      | _ when sgn > 0 => (l := m+1; continue)
      | _ => (res := m; break)
    end // end of [val]
in
  res // 0 <= res < n if [key] is found; or res = ~1 if not
end // end of [bsearch]

The annotated invariant for the loop follows the keyword while*, and it is separated from the rest of the loop by the symbol =>. The invariant states that there are integers i and j satisfying 0 <= i, i <= j+1 and j < n such that the variables l and u are of the types int(i) and int(j), respectively, at the entry point of the loop, that is, the point immediately before the loop test. A termination metric .<j+1-i>. is provided to verify that the loop terminates: the metric must decrease whenever the loop goes back to its entry point.

As another example, an implementation of the factorial function is presented as follows that involves the use of an annotated for-loop:

fn fact {n:nat} (n: int n): int = let
  var x: int
  var res: int = 1
(*
  // the loop invariant indicates that
  // the value of [x] is [n+1] at the point where the loop exits
*)
  val () = for* {i:nat | i <= n+1} .<n+1-i>.
    (x: int i): (x: int (n+1)) => (x := 0; x <= n ; x := x+1) res := res * x
  // end of [val]
in
  res
end // end of [fact]

The annotated invariant for the for-loop follows the keyword for*, and it is separated from the rest of the loop by the symbol =>. This invariant consists of two parts, separated by the colon symbol (:). The part before the colon symbol essentially means that there is a natural number i satisfying i <= n+1 such that the variable x is of the type int(i) at the entry point of the for-loop, that is, at the point immediately after the loop initialization and before the loop test. There is also a termination metric in this part whose meaning should be evident. The part after the colon symbol essentially means that the variable x is of the type int(n+1) at the exit point of the loop.

Annotating loops with invariants correctly can be difficult, sometimes. If complex loop invariants are needed, we recommend that the programmer avoid constructing loops directly. Instead, the programmer can implement tail-recursive functions in place of loops. Note that each tail-recursive function in ATS is guaranteed to be compiled into a loop.

The code used for illustration is available here.