Strings and String Buffers


The representation of a string in ATS is exactly the same as in C, allowing most functions declared in the header file <string.h> to be imported into ATS directly. A variety of string functions are declared in prelude/SATS/string.sats.

Given an integer I, a string of length I is a pointer to some memory location where a seqence of bytes are stored: the first I bytes in this sequence are non-null bytes, and the next one is the null byte, and then the rest of bytes are unspecified. We use string both as a type and a type constructor of the sort int -> type. The type string is for strings of unspecified length while the type string (I) is for strings whose length equals I, and the following casting function is relating the former to the latter:

castfn string1_of_string (s: string): [n:nat] string n

Common Functions for Processing Strings

The following function string_is_at_end is of great use for processing strings sequentially:
fun string_is_at_end
  {n,i:nat | i <= n} (s: string n, i: int i):<> bool (i == n)
  = "atspre_string_is_at_end"
Given a string of length n and a natural number i bounded by n, the function string_is_at_end tests whether i equals n. Essentially, it checks whether the ith character in the string, where counting starts from 0, is the null byte.

As an example, the following implementation of the length function for strings makes use of the function string_is_at_end:

fn string_length {n:nat}
  (s: string n):<> size_t n = loop (s, 0) where {
  fun loop {i:nat | i <= n} .<n-i>.
    (s: string n, i: size_t i):<> size_t n =
    if string_is_at_end (s, i) then i else loop (s, i+1)
  // end of [loop]
} // end of [string_length]
The following two functions string_get_char_at and string_set_char_at are for accessing and updating charaters stored in a string:
#define NUL '\000'

fun string_get_char_at {n:nat}
  (s: string n, i: natLt n):<> [c:char | c <> NUL] char c

fun string_set_char_at {n:nat} {c:char | c <> NUL}
  (s: string n, i: natLt n, c: char c):<!ref> void

overload [] with string_get_char_at
overload [] with string_set_char_at
Note that only non-null characters can be stored in a string.

String Buffers

A string buffer in ATS is just a linear string. Given two integers M and N satisfying M <= N, the type strbuf (M, N) is for a sequence of M bytes such that the first N bytes in this sequence are not null and the next one (following the first N bytes) is null. The following declared functions for string buffers are just the variant of the previously introduced functions for strings:
fun strbuf_is_at_end
  {m,n,i:nat | i <= n} (sb: &strbuf (m, n), i: int i):<> bool (i == n)

fun strbuf_get_char_at {m,n:nat}
  (sb: &strbuf (m, n), i: natLt n):<> [c:char | c <> NUL] char c

fun strbuf_set_char_at {m,n:nat} {c: char | c <> NUL}
  (sb: &strbuf (m, n), i: natLt n, c: char c):<> void
As an example, the following program implements a function that turns each lowercase letter in a string buffer into the corresponding uppercase one:
fn strbuf_toupper {m,n:nat}
  (s: &strbuf (m, n)):<> void = loop (s, 0) where {
  extern fun toupper (c: c1har):<> c1har = "atspre_char_toupper"
  fun loop {i:nat | i <= n} .<n-i>. (s: &strbuf (m, n), i: size_t i):<> void =
    if strbuf_is_at_end (s, i) then ()
    else let
      val () = s[i] := toupper (s[i]) in loop (s, i+1)
    end // end of [if]
} // end of [strbuf_toupper]
Note that c1har is just a shorthand for [c:char | c <> NUL] char c, which is a type for all non-null characters. Let us see another example. In C, the length of a string can be computed as follows:
int strlen (char *s) {
  char *p = s ;
  while (*p != '\000') ++p ;
  return (p - s) ;
}
We give as follows an implementation for computing the length of a string buffer that corresponds to the above C code:
extern fun strlen {m,n:nat} (s: &strbuf (m, n)):<> size_t n

implement strlen {m,n} (s) = let
  stadef bsz = sizeof(byte)
  macdef bsz = sizeof<byte>
  fun loop {m,n:nat} {l:addr} {ofs:int} .<m>. (
      pf: !strbuf (m, n) @ l
    , pf_mul: MUL (n, bsz, ofs)
    | p: ptr l
    ) :<> ptr (l + ofs) = let
    prval (pf1, pf2) = strbuf_v_uncons (pf)
    val c = !p
  in
    if (c = NUL) then let
      prval strbufopt_v_none pf2 = pf2
      prval MULbas () = pf_mul
    in
      pf := strbuf_v_null (pf1, pf2); p
    end else let
      prval () = eqsize_byte_char ()
      prval strbufopt_v_some pf2 = pf2
      prval pf1_mul = mul_add_const {~1} (pf_mul)
      val p_end = loop (pf2, pf1_mul | p+bsz)
    in
      pf := strbuf_v_cons (pf1, pf2); p_end
    end // end of [if]
  end // end of [loop]
  val p_beg = &s
  prval pf_mul = mul_istot {n,bsz} ()
  val p_end = loop (view@ s, pf_mul | p_beg)
  prval () = eqsize_byte_one () where {
    extern praxi eqsize_byte_one (): [bsz == 1] void
  } // end of [val]
  prval () = mul_elim {n,1} (pf_mul)
in
  size1_of_ptrdiff1 (p_end - p_beg)
end // end of [strlen]
Please see prelude/SATS/string.sats for various dataview constructors and proof functions in this implementation. In general, it can be quite involved in ATS to handle strings in a style that is often employed in C code.

String Creation

There are various function for creating strings that are delcared in prelude/SATS/string.sats. For instance, the following function string_make_substring creates a string by copying a segment of another string:
fun string_make_substring
  {n:int} {st,ln:nat | st + ln <= n}
  (s: string n, st: int st, ln: int ln):<> string ln
In order to implement a string creation function, we often need a function such as the following one for initializing a string buffer:
fun strbuf_initialize_cloptr {m,n:nat | m > n} (
    pf_buf: !b0ytes m @ l >> strbuf (m, n) @ l
  | p: ptr l, n: int n, f: !natLt n - c1har
  ) :<> void
An implementation of strbuf_initialize_cloptr is given below:
implement strbuf_initialize_cloptr {m,n}
  (pf_buf | p_buf, n, f) = loop (pf_buf | p_buf, f, 0) where {
  fun loop {i:nat | i <= n} {l:addr} .<n-i>. (
      pf: !b0ytes (m-i) @ l >> strbuf (m-i, n-i) @ l
    | p: ptr l, f: !sizeLt n -<cloptr> c1har, i: size_t i)
    :<cloref> void = let
    prval () = eqsize_byte_char ()
    prval (pf1, pf2) = array_v_uncons {byte?} (pf)
    prval pf1 = char_v_of_b0yte_v (pf1)
  in
    if i < n then let
      val () = !p := f (i)
      val () = loop (pf2 | p + sizeof<byte>, f, i + 1)
      prval () = pf := strbuf_v_cons (pf1, pf2)
    in
      // empty
    end else let
      val () = !p := NUL
      prval () = pf := strbuf_v_null (pf1, pf2)
    in
      // empty
    end // end of [if]
  end // end of [loop]
} // end of [val]
With strbuf_initialize_cloptr, the string creation function string_make_substring can be readily implemented as follows:
implement string_make_substring (str, st, ln) = let
  val (pf_gc, pf_buf | p_buf) = malloc_gc (ln + 1)
  prval () = free_gc_elim (pf_gc) // give the certificate to GC
  val () = strbuf_initialize_cloptr (pf_buf | p_buf, ln, lam (i) => str[st + i])
in
  string1_of_strbuf (pf_buf | p_buf)
end // end of [string_make_substring]
Note that a linear closure (cloptr) representing lam (i) => str[st + i] is formed for each call to strbuf_initialize_cloptr and this closure is freed once the call returns. The string created by a call to string_make_substring is persistent, and it can only be freed by GC.

The code used for illustration is available here.