Unsafe C-style Programming in ATS

ATS is probably not an easy programming language to write code in. While ATS provides many features to support safe (low-level) programming, it may take a programmer a long time and considerable effort to learn and master these features before being able to make effective use of them. In this section, I present some ATS code written in C-style that makes typical use of certain unsafe programming features in ATS. This programming style should be familiar to any programmer who can write C code competently.

There are always occasions where it is sensible to program in unsafe C-style. Sometimes one just wants to get a running implementation and then rely on testing to detect and fix bugs. Sometimes one simply does not know enough ATS to implement a function in a safe manner. The list of such occasions can readily be extended. I myself do unsafe C-style programming in ATS frequently, and I see it as a necessary skill for anyone who wants not only to be able to write code in ATS but also to do so productively. Let us now see a concrete example of unsafe C-style programming in ATS.

Suppose that we want to implement a function for comparing two given strings according to the standard lexicographic ordering. Let us name the function strcmp and give it the following interface:

fun strcmp (str1: string, str2: string): int

Given two strings str1 and str2, strcmp(str1, str2) is expected to return 1, -1, or 0 if str1 is greater than, less than, or equal to str2, respectively. An implementation of strcmp is given as follows:

staload UN = "prelude/SATS/unsafe.sats"

(* ****** ****** *)

implement
strcmp (str1, str2) = let
//
fun loop
(
  p1: ptr, p2: ptr
) : int = let
//
val c1 = $UN.ptr0_get<uchar>(p1)
val c2 = $UN.ptr0_get<uchar>(p2)
//
in
  case+ 0 of
  | _ when c1 > c2 => 1
  | _ when c1 < c2 => ~1
  | _ (* c1 = c2 *) =>
    (
      if $UN.cast{int}(c1) = 0
        then 0 else loop (ptr0_succ<uchar>(p1), ptr0_succ<uchar>(p2))
      // end of [if]
    )
end (* end of [loop] *)
//
in
  loop (string2ptr(str1), string2ptr(str2))
end (* end of [strcmp] *)

For a programmer familiar with C, the above implementation of strcmp should be easily accessible. There are a variety of unsafe functions declared in unsafe.sats. Given a type T and a pointer p, ptr0_get<T>(p) fetches the value of the type T stored at the location to which p points. Note that ptr0_get is inherently unsafe, as there is simply no guarantee that p actually points to a valid memory location where a value of the type T is stored. The function cast, which is also inherently unsafe, casts the type of a given value into any chosen type. The function template ptr0_succ, which is declared in pointer.sats, is type-safe. Given a type T, ptr0_succ<T>(p) returns the pointer that is n bytes after p, where n equals the size of T.
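
In C terms, these operations correspond roughly to reading through an untyped pointer and to byte-wise pointer arithmetic. The sketch below is only an analogy for T = int, not how ATS implements them; the names ptr_get_int, ptr_succ_int, and sum_ints are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical analogue of ptr0_get<int>(p): read an int through an
   untyped pointer. Nothing checks that p really points to an int. */
int ptr_get_int(const void *p)
{
  int x;
  memcpy(&x, p, sizeof x);
  return x;
}

/* Hypothetical analogue of ptr0_succ<int>(p): the pointer that is
   sizeof(int) bytes after p. */
const void *ptr_succ_int(const void *p)
{
  return (const char *)p + sizeof(int);
}

/* Walk n consecutive ints starting at p, using only the two helpers. */
int sum_ints(const void *p, size_t n)
{
  int sum = 0;
  for (size_t i = 0; i < n; i++) {
    sum += ptr_get_int(p);
    p = ptr_succ_int(p);
  }
  return sum;
}
```

As with ptr0_get, the caller alone is responsible for making sure that the pointer handed to sum_ints refers to at least n valid values.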

Please find the entire code for this example on-line.

For a function like strcmp, one can readily implement it in C directly. For instance, an implementation of strcmp in C, which is essentially a translation of the above implementation of strcmp in ATS, is given as follows:

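int strcmp (const char *p1, const char *p2)
{
  int res ;
  unsigned char c1, c2;
  while (1)
  {
    /* read the current bytes as unsigned, matching uchar in the ATS code */
    c1 = (unsigned char)*p1; c2 = (unsigned char)*p2;
    if (c1 > c2) { res =  1; break; }
    if (c1 < c2) { res = -1; break; }
    /* c1 equals c2 here; a zero byte means both strings have ended */
    if ((int)c1==0) { res = 0 ; break ; } else { p1++; p2++; }
  }
  return res ;
}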
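
The three outcomes, and the reason the characters are read as unsigned, can be checked with a small sketch like the following. It restates the loop under the hypothetical name str_compare so that it does not clash with the standard strcmp.

```c
/* Hypothetical name; same logic as the C version above, written compactly. */
int str_compare(const char *p1, const char *p2)
{
  for (;; p1++, p2++) {
    /* Read both bytes as unsigned so that bytes >= 0x80 sort after ASCII. */
    unsigned char c1 = (unsigned char)*p1;
    unsigned char c2 = (unsigned char)*p2;
    if (c1 > c2) return 1;
    if (c1 < c2) return -1;
    if (c1 == 0) return 0; /* c1 == c2 == 0: both strings ended */
  }
}
```

For example, str_compare("abc", "abd") yields -1, str_compare("abc", "ab") yields 1, and str_compare("\x80", "a") yields 1. If the bytes were compared through a signed char type, the last result could be reversed.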

However, writing ATS code in C-style can often have advantages over writing C code directly. For instance, there is direct support in ATS, but not in C, for implementing function templates. In C, one is essentially forced to rely on rather involved use of macros to implement function templates, which makes the code not only difficult to follow but also notoriously error-prone. Let us now see a function template implementation in ATS that is partly type-unsafe.
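
To make the point about macros concrete, here is a minimal sketch of one common way to imitate a function template in C. The macro DEFINE_ARRAY_MAX and the generated function names are hypothetical and are not taken from any library.

```c
#include <stddef.h>

/* Hypothetical macro that stamps out one array-maximum function per
   element type T. The generated function assumes n >= 1. */
#define DEFINE_ARRAY_MAX(T, NAME)          \
  T NAME(const T *xs, size_t n)            \
  {                                        \
    T best = xs[0];                        \
    for (size_t i = 1; i < n; i++) {       \
      if (xs[i] > best) best = xs[i];      \
    }                                      \
    return best;                           \
  }

/* Each "instantiation" must be requested and named by hand. */
DEFINE_ARRAY_MAX(int, array_max_int)
DEFINE_ARRAY_MAX(double, array_max_double)
```

Every element type needs its own explicit instantiation and its own function name, and a mistake inside the macro body is reported at the point of expansion rather than at the definition, which is part of what makes this style hard to follow.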

Suppose we want a function for copying into a given array the elements stored in a list. Let us name the function array_copy_from_list and give it the following interface:

fun{a:t@ype} array_copy_from_list (A: array0(a), xs: list0(a)): void

Given a type T, array0(T) is the type for an array0-value containing a pointer p and a size n such that p points to a C-style array storing n elements of the type T.

For the moment, let us require that the size of the array A equal the length of the list xs when array_copy_from_list(A, xs) is called. The following is an implementation of array_copy_from_list in ATS that makes use of an unsafe function (ptr0_set) declared in unsafe.sats:

staload UN = "prelude/SATS/unsafe.sats"

(* ****** ****** *)

implement
{a}(*tmp*)
array_copy_from_list
  (A, xs) = let
//
fun loop
(
  p: ptr, xs: list0(a)
) : void = (
  case+ xs of
  | list0_nil() => ()
  | list0_cons(x, xs) => let
      val () = $UN.ptr0_set<a>(p, x)
    in
      loop (ptr0_succ<a>(p), xs)
    end // end of [list0_cons]
) (* end of [loop] *)
//
in
  loop (array0_get_ref(A), xs)
end // end of [array_copy_from_list]

Given a type T, a pointer p, and a value x of the type T, ptr0_set<T>(p, x) stores the value x at the location to which p points. Like ptr0_get, ptr0_set is inherently unsafe as there is simply no guarantee that p actually points to a valid memory location where a value of the type T can be stored. The function array0_get_ref, which is declared in array0.sats, returns the pointer to the C-style array associated with a given array0-value.
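
For comparison, a rough C counterpart that avoids macros has to give up the element type and pass the element size at run time. The sketch below assumes a hypothetical list representation (struct node) and is not a translation produced by the ATS compiler.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical singly linked list node; elem points to the stored element. */
struct node {
  const void *elem;
  const struct node *next;
};

/* Copy the list elements into the buffer dst, elem_size bytes each.
   As in the ATS version, nothing checks that dst has enough room. */
void array_copy_from_list(void *dst, size_t elem_size, const struct node *xs)
{
  char *p = dst;
  for (; xs != NULL; xs = xs->next) {
    memcpy(p, xs->elem, elem_size); /* plays the role of ptr0_set<a>(p, x) */
    p += elem_size;                 /* plays the role of ptr0_succ<a>(p) */
  }
}
```

Here the compiler cannot tell whether elem_size matches the elements actually stored in the list, whereas in the ATS template the type a ties the array, the list, and the pointer step together even though the write itself is unchecked.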

Please find the entire code for this example on-line.