Example: String Processing

A string in ATS is represented in the same manner as in C: It is sequence of adjacently stored non-null characters followed by the null character, and its length is the number of non-null characters in the sequence. Conventionally, such strings are often referred to as C-style strings, which are notoriously difficult to be processed safely (as indicated by so many bugs and breaches due to misusing such strings). As a matter of fact, ATS is the first practical programming language that I know can fully support safe processing of C-style strings. In ATS, string is a type constructor of the sort (int) -> type . Given a static integer n, string(n) is the type for strings of the length n. Note that string also refers to a non-dependent type for strings of unspecified length, which is basically equivalent to the type String defined as follows:

typedef String = [n:nat] string (n)

The following two functions are commonly used for traversing a given string:

fun string_is_at_end
  {n:int} {i:nat | i <= n}
  (str: string n, i: size_t i): bool (i == n)
// end of [string_is_at_end]

fun string_isnot_at_end
  {n:int} {i:nat | i <= n}
  (str: string n, i: size_t i): bool (i < n)
// end of [string_isnot_at_end]

Obviously, either one of them can be implemented based on the other. As an example, the following code implements a function that computes the length of a string:

fun string_length {n:nat}
  (str: string n): size_t n = let
  fun loop {i:nat | i <= n}
    (str: string n, i: size_t i): size_t (n) =
    if string_isnot_at_end (str, i) then loop (str, i+1) else i
  // end of [loop]
in
  loop (str, 0)
end // end of [string_length]

Note that the function loop in the body of string_length is defined tail-recursively. Although this implementation of string_length looks fairly plain now, it was actually an exciting achievement in the pursuit of supporting practical programming with dependent types.

The following two functions are for accessing and updating characters stored in strings:

typedef c1har = [c:char | c <> '\000'] char (c)

fun string_get_char_at
  {n:int} {i:nat | i < n} (str: string n, i: size_t i): c1har
overload [] with string_set_char_at

fun string_set_char_at {n:int}
  {i:nat | i < n} (str: string n, i: size_t i, c: c1har): void
overload [] with string_set_char_at

The type constructor char is of the sort (char) -> t@ype, which takes a static character c to form a singleton type char(c) for the only character equal to c. Thus, the type c1har is for all non-null characters. The following defined function string_find traverses a string from left to right to see if a given character occurs in the string:

typedef sizeLt (n:int) = [i:nat | i < n] size_t (i)

fun string_find {n:nat}
  (str: string n, c0: char): option0 (sizeLt n) = let
  fun loop {i:nat | i <= n}
    (str: string n, c0: char, i: size_t i): option0 (sizeLt n) =
    if string_isnot_at_end (str, i) then
      if (c0 = str[i]) then option0_some (i) else loop (str, c0, i+1)
    else option0_none () // end of [if]
  // end of [loop]
in
  loop (str, c0, 0)
end // end of [string_find]

If the character c0 occurs in the string str, then a value of the form option0_some(i) is returned, when i refers to the position of the first occurrence of c0 (counting from left to right). Otherwise, the value option0_none() is returned.

There is some inherent inefficiency in the implementation of string_find: A given position i is first checked to see if it is strictly less than the length of the string str by calling string_isnot_at_end, and, if it is, the character stored at the position in the string is fetched by calling string_get_char_at. These two function calls are merged into one function call in the following implementation:

//
// This implementation does the same as [string_find]
// but should be more efficient.
//
fun string_find2 {n:nat}
  (str: string n, c0: char): option0 (sizeLt n) = let
  fun loop {i:nat | i <= n} (
    str: string n, c0: char, i: size_t i
  ) : option0 (sizeLt n) = let
    val c = string_test_char_at (str, i)
  in
    if c != '\000' then
      if (c0 = c) then option0_some (i) else loop (str, c0, i+1)
    else option0_none () // end of [if]
  end // end of [loop]
in
  loop (str, c0, 0)
end // end of [string_find2]

The interface for the function string_test_char_at is given as follows:

fun string_test_char_at {n:int}
  {i:nat | i <= n} (str: string n, i: size_t i)
  : [c:char | (c <> NUL && i < n) || (c == NUL && i >= n)] char c
// end of [string_test_char_at]

By checking the return value of a call to string_test_char_at, we can readily tell whether the position i is at the end of the string str.

Handling strings safely and efficiently is a complicated matter in programming language design, and a great deal of information about a programming language can often be revealed by simply studying its treatment of strings. In ATS, properly processing C-style strings also makes essential use of linear types, which I will cover in another part of this book.