Example: String Processing

A string in ATS is represented in the same manner as in C: It is a sequence of adjacently stored non-null characters followed by the null character, and its length is the number of non-null characters in the sequence. Conventionally, such strings are often referred to as C-style strings, which are notoriously difficult to be processed safely (as is clearly indicated by so many bugs and breaches due to misusing such strings). As a matter of fact, ATS is the first practical programming language that I know can fully support safe processing of C-style strings. In ATS, string is a type constructor of the sort (int) -> type. Given a static integer n, string(n) is the type for strings of length n. Note that string also refers to a non-dependent type for strings of unspecified length, which is basically equivalent to the type String defined as follows:

typedef String = [n:nat] string (n)

The following two functions are commonly used for traversing a given string:

// fun string_is_atend {n:int}{i:nat | i <= n} (str: string (n), i: size_t (i)): bool (i==n) // end of [string_is_atend] // fun string_isnot_atend {n:int}{i:nat | i <= n} (str: string (n), i: size_t (i)): bool (i < n) // end of [string_isnot_atend] //

Obviously, either one of them can be implemented based on the other. As an example, the following code implements a function that computes the length of a given string:

fun string_length {n:nat} ( str: string (n) ) : size_t (n) = let fun loop {i:nat | i <= n} .<n-i>. (str: string n, i: size_t i): size_t (n) = if string_isnot_atend (str, i) then loop (str, succ(i)) else i // end of [loop] in loop (str, i2sz(0)) end // end of [string_length]

Note that the function loop in the body of string_length is defined tail-recursively, which can then be translated into a genuine loop in the generated C code. Although this implementation of string_length looks fairly plain right now, it was actually an exciting achievement in the pursuit of practical programming with dependent types.

The following two functions are for accessing and updating characters stored in strings:

// typedef charNZ = [c:int | c != 0] char(c) // fun string_get_at {n:int}{i:nat | i < n}(str: string n, i: size_t i): charNZ overload [] with string_get_at // fun string_set_char_at {n:int}{i:nat | i < n}(str: string n, i: size_t i, c: charNZ): void overload [] with string_set_char_at //

The type constructor char is of the sort (char) -> t@ype, which takes a static character c to form a singleton type char(c) for the only character equal to c. Thus, the type charNZ is for all non-null characters. The following defined function string_find traverses a string from left to right to check whether a given character occurs in the string:

// typedef sizeLt (n:int) = [i:nat | i < n] size_t (i) // fun string_find{n:nat} ( str: string n, c0: char ) : Option (sizeLt n) = let typedef res = sizeLt (n) fun loop{i:nat | i <= n} ( str: string n, c0: char, i: size_t i ) : Option (res) = let val isnot = string_isnot_atend (str, i) in if isnot then if (c0 = str[i]) then Some{res}(i) else loop (str, c0, succ(i)) else None () // end of [if] end (* end of [loop] *) in loop (str, c0, i2sz(0)) end // end of [string_find] //

If the character c0 occurs in the string str, then a value of the form Some(i) is returned, when i refers to the position of the first occurrence of c0 (counting from left to right). Otherwise, the value None() is returned.

There is some inherent inefficiency in the implementation of string_find: A given position i is first checked to see if it is strictly less than the length of the string str by calling string_isnot_atend, and, if it is, the character stored at the position in the string is fetched by calling string_get_at. These two function calls are merged into one in the following implementation:

// // This implementation does the same as [string_find] // but should be more efficient. // fun string_find2{n:nat} ( str: string n, c0: char ) : Option (sizeLt n) = let // fun loop{i:nat | i <= n} ( str: string n , c0: char, i: size_t i ) : Option (sizeLt n) = let typedef res = sizeLt (n) val c = string_test_at (str, i) in if c != '000' then ( if (c0 = c) then Some{res}(i) else loop(str, c0, succ(i)) // end of [if] ) else None((*void*)) // end of [if] end // end of [loop] // in loop (str, c0, i2sz(0)) end // end of [string_find2]

The interface for the function string_test_at is given as follows:

fun string_test_at {n:int}{i:nat | i <= n} ( str: string (n), i: size_t (i) ) : [c:char | (c != NUL && i < n) || (c == NUL && i >= n)] char(c) // end of [string_test_at]

By checking the return value of a call to string_test_at, we can readily tell whether the position i is at the end of the string str.

Handling strings safely and efficiently is a complicated matter in programming language design, and a great deal of information about a programming language can often be revealed by simply studying its treatment of strings. In ATS, properly processing C-style strings also makes essential use of linear types, which I will cover in another part of this book.