Effective ATS:
Linear Streams for Memory-Clean Programs

I find that linear streams are a true gem in ATS. It seems a bit unfortunate that this gem is largely hidden from most programmers. I would like to present a few short examples in this artice that can demonstrate a typical use of linear streams in the construction of memory-clean programs. Note that a program being memory-clean means that the program must free every byte of memory allocated during its execution at the point when it terminates. It should be soon clear that the presented memory-clean programs also have a very small memory footprint, making them suitable to run on devices where memory is extremely limited (e.g., Arduino Uno).

A stream of Fibonacci numbers

A (functional) stream of Fibonacci numbers can be constructed by calling the following function fibseq:

//
fn fibseq() =
  (fix f(n0:int, n1:int): stream(int) => $delay(stream_cons(n0, f(n1, n0+n1))))(0, 1)
//
For someone who may not be familiar with streams in ATS, please find a chapter on stream-based lazy evaluation in this tutorial on programming features in ATS.

Given a stream xs and a natural number n, the function stream_nth_exn returns element n in xs (or it raises an exception if there is no such an element in xs). As an example, the value of the 10th Fibonacci number can be computed as follows:

val fib10 = stream_nth_exn(fibseq(), 10)
After stream_nth_exn returns, the stream returned by fibseq() is in an expanded form that contains at least element 0, element 1, ..., element 10. In order to compute element n in the stream, we need memory of size proportional to n.

A linear stream of Fibonacci numbers

A linear stream of Fibonacci numbers can be constructed by calling the following function fibseq_vt:

//
fn fibseq_vt()  =
  (fix f(n0:int, n1: int): stream_vt(int) => $ldelay(stream_vt_cons(n0, f(n1, n0+n1))))(0, 1)
//
For someone who may not be familiar with linear streams in ATS, please find a chapter on linear stream-based lazy evaluation in this tutorial on programming features in ATS.

Given a linear stream xs and a natural number n, the function stream_vt_nth_exn returns element n in xs (or it raises an exception if there is no such an element in xs). As an example, the value of the 10th Fibonacci number can be computed as follows:

val fib10 = stream_vt_nth_exn(fibseq_vt(), 10)
After stream_vt_nth_exn returns, the linear stream returned by fibseq_vt() is completely freed. In order to compute element n in the linear stream, we only need memory of size O(1). In general, a node in a linear stream is freed before the next node is generated, implying that only one node is present during the processing of the linear stream. For instance, the function stream_vt_nth_exn can be implemented as follows:
fun
{a:t@ype}
stream_vt_nth_exn
(
  xs: stream_vt(a), n: intGte(0)
) = loop(xs, n) where
{
//
fun
loop{n:nat} .<n>.
(
  xs: stream_vt(a), n: int(n)
) : a =
(
  case+ !xs of
  | ~stream_vt_nil() =>
      $raise StreamSubscriptExn()
  | ~stream_vt_cons(x, xs) =>
      if n = 0 then (~xs; x) else loop(xs, pred(n))
) (* end of [loop] *)
//
} (* end of [stream_vt_nth_exn] *)
A linear stream is represented as a linear thunk (that is, a nullary closure-function); the thunk evaluates to either a null node (indicating the end of the stream) or a non-null node containing both an element and another thunk (for subsequent use), and it then frees itself. Therefore, processing the linear stream constructed by a call to fibseq_vt only needs memory for storing one node and two thunks; each node contains one value of the type int and one pointer; each thunk contains two values of the type int and one pointer. In order to test that this is indeed the case, we implement the following two functions in C for freeing and allocating memory:
%{^
//
#define NM 3
//
int used[NM];
//
typedef
struct{ void* _[2]; } block_t;
block_t smem[NM];
//
void
atsruntime_mfree_user(void *p) {
  void *p0 = &smem[0];
  used[((char*)p - (char*)p0)/sizeof(block_t)] = 0;
}
//
void*
atsruntime_malloc_user(size_t bsz) {
  int i;
  for (i = 0; i < NM; i += 1)
  {
    if (used[i] == 0) { used[i] = 1; return &smem[i]; }
  }
  return 0;
}
//
%} // end of [%{^]
By passing the compilation flag -DATS_MEMALLOC_USER to atscc, one can generate C code from ATS source that makes use of the functions atsruntime_mfree_user and atsruntime_malloc_user for freeing and allocating memory, respectively. A simple experiment can confirm that only 3 blocks of memory is needed for processing the linear stream returned by fibseq_vt(), where each block consists of two consecutive words.

A stream-based solution to the eight-queen puzzle

The eight-queen puzzle asks a player to find one way to put 8 queen pieces on a chess board such that there exists no piece that can capture another one.

Let us use an integer list of length n for a (partial) chess board consisting of the first n rows such that there is a queen piece on each row; each integer in the list represents the column number of the queen piece on the corresponding row. The following declaration introduces a function qsolve that takes a given natural number n and returns a stream of lists containing all the possible ways to put n queen pieces on the first n rows of a chess board such that no piece can capture another one:

//
fun qsolve{n:nat}(int(n)): stream(list(int, n))
//
Clearly, calling qsolve on 8 returns all of the solutions to the eight-queen puzzle.

An implementation of qsolve is given a follows:

//
#define N 8
//
implement
qsolve{n}(n) =
(
if
n = 0
then
(
  stream_make_sing(list_nil)
) (* end of [then] *)
else let
//
fun
test
{ i:int
| 0 < i; i <= n
} .<n-i>.
(
  x: int
, i: int(i), xs: list(int, n-i)
) :<> bool =
(
case+ xs of
| list_nil() => true
| list_cons(x1, xs) =>
    if (x != x1 && abs(x-x1) != i)
      then test(x, i+1, xs) else false
  // end of [list_cons]
)
//
fun
extend
{x:nat | x <= N} .<N-x>.
(
  x: int(x), xs: list(int, n-1)
) :<> stream(list(int, n)) = $delay
(
//
if x < N then (
  if test(x, 1, xs)
    then (
      stream_cons(list_cons(x, xs), extend(x+1, xs))
    ) else !(extend(x+1, xs))
  // end of [if]
) else stream_nil(*void*)
//
) (* end of [extend] *)
//
in
//
stream_concat
(
  stream_map_fun(qsolve(n-1), lam(xs) => extend(0, xs))
) (* end of [stream_concat] *)
//
end // end of [else]
//
) (* end of [qsolve] *)
//
The function test checks whether a given row can be added to a (partial) solution (to form another solution containing one more row). Given a (partial) solution, the function extend returns a stream of lists containing all of the valid solutions that extend the given one with one more row. Note that calling the function stream_map_fun returns a stream of elements, where each element is a stream of lists such that each list represents a (partial) solution to the eight-queen puzzle, and calling the function stream_concat on the stream of streams returned by stream_map_fun yields a stream of lists containing all the possible ways to put n queen pieces on the first n rows such that no piece can capture another one.

A linear stream-based solution to the eight-queen puzzle

We declare the interface for a linear version of qsolve as follows:

//
fun qsolve_vt{n:nat}(int(n)): stream_vt(list_vt(int, n))
//
Given a natural number n, qsolve_vt returns a linear stream of linear lists containing all the (partial) solutions to putting n queen pieces on the first n rows (of a chess board) such that no piece can capture another one.

The following implementation of qsolve_vt is largely parallel to the above one for qsolve:

//
#define N 8
//
implement
qsolve{n}(n) =
(
if
n = 0
then
$ldelay
stream_vt_cons
(
  list_vt_nil
, stream_vt_make_nil()
) (* end of [then] *)
else let
//
fun
test
{ i:int
| 0 < i; i <= n
} .<n-i>.
(
  x: int
, i: int(i), xs: !list_vt(int, n-i)
) :<> bool =
(
case+ xs of
| list_vt_nil() => true
| list_vt_cons(x1, xs) =>
    if (x != x1 && abs(x-x1) != i)
      then test(x, i+1, xs) else false
  // end of [list_cons]
)
//
fun
extend
{x:nat | x <= N} .<N-x>.
(
  x: int(x), xs: list_vt(int, n-1)
) :<!wrt> stream_vt(list_vt(int, n)) = $ldelay
(
//
(
if x < N then (
  if test(x, 1, xs)
    then
    stream_vt_cons
    (
      list_vt_cons(x, list_vt_copy(xs))
    , extend(x+1, xs)
    )
    else !(extend(x+1, xs))
  // end of [if]
) else (list_vt_free(xs); stream_vt_nil(*void*))
) : stream_vt_con(list_vt(int, n))
//
,
//
list_vt_free(xs) // it is called when the stream is freed
//
)  (* end of [extend] *)
//
in
//
stream_vt_concat
(
  stream_vt_map_fun(qsolve(n-1), lam(xs) => extend(0, xs))
) (* end of [stream_vt_concat] *)
//
end // end of [else]
//
) (* end of [qsolve_vt] *)
//
While it is difficult to analyze statically how many bytes of memory are needed in order to process all the elements in the stream returned by qsolve_vt(8), we can readily obtain an estimate of this number experimentally. For instance, some data gathered at run-time indicate that 97 blocks of memory is adequate, where each block consists of 3 consecutive words.

If we try to call qsolve_vt(N) for a large N (e.g., choosing N to be 100), then we can see that the run-time execution of the call immediately reaches a steady state of memory consumption. While the call itself can never return (due to the algorithm for qsolve_vt being exponential in terms of N), its execution goes on smoothly without causing any dreadful page faults. This is really a thing of beauty to observe!

Compiling and Testing

Please find in the following files the entirety of the code presented in this article:

Fibonacci.dats
Fibonacci_vt.dats
QueensPuzzle.dats
QueensPuzzle_vt.dats
In addition, there is an accompanying Makefile for compiling and testing the code.


This article is written by Hongwei Xi.