Effective ATS: Copying Files

While the task of copying files is conceptually simple, it can still be quite interesting to implement it in ATS.

Attempt One

In order to copy the content of one file into another, we need a means to refer to the involved files. In Linux, the notion of file descriptor serves precisely this purpose. Although we know that a file descriptor is represented as an integer, it seems appropriate to make it abstract as is done in the following declaration:
abst@ype fildes = int
Often it is a good practice to give a name to an abstract type that is less likely to cause collision and then introduce a short alias for the name. For instance, the following declarations demonstrate such a practice:
abst@ype fildes_t0ype = int
stadef fildes: t@ype = fildes_t0ype
My naming convention uses the special identifier [t0ype] to indicate a type of the sort t@ype, that is, a type of unknown size. Note that the stadef-declaration can also be replaced with the following one:
typedef fildes = fildes_t0ype
Now let us name the file-copying function [fcopy1] and give it the following interface:
fun fcopy1 (src: fildes, dst: fildes): void
How should we implement [fcopy1]? For the moment, let us try to answer this question in a somewhat abstract manner.

Clearly, we should be able to read chars from [src] and also write chars into [dst]. So let us assume that the following two functions are available for use:

fun readch (src: fildes): char
fun writech (src: fildes, c: char): void
There is yet one more thing: We should be able to tell whether we have finished reading all the chars from a given file. One simple way to do this is to require that [readch] return a special value to indicate the end of a file being reached. For this purpose, we modify the interface of [readch] as follows:
fun readch (src: fildes): int
We use natural numbers, that is, non-negative integers for valid chars and a negative integer (e.g., -1) for the special value (indicating that the end of [src] is reached). We can now readily implement [fcopy1] as follows:
implement
fcopy1 (src, dst) = let
  val c = readch (src)
in
//
if c >= 0 then
  (writech (dst, c); fcopy1 (src, dst))
// end of [if]
//
end (* end of [fcopy1] *)
At this point, the obviously question is: How can functions [readch] and [writech] be implemented? They can be implemented based on the system calls [read] and [write]. Please find a completed running implementation of file-copying based on [fcopy1] in fcopy1.dats, where [readch] and [writech] are implemented in C directly.

Of course, there is a lot of criticism that can be laid upon the above implementation of file-copying. For instance, it is terribly inefficient; it does not support any error-handling at all; etc. I will attempt to address these issues in the following presentation. However, one thing that is extremely positive about this implementation is the introduction of functions [readch] and [writech], which adds a layer to shield system calls [read] and [write] from being used directly. This is a programming style I would recommend highly and repeatedly. What seems really unfortunate to me is that popular books on systems programming (e.g. APUE) often do very little, if at all, to advocate this very useful programming style!

Attempt Two

Obviously, [fcopy1] is very inefficient for copying files as it calls [read] and [write] each time to read and write only one char, respectively. If multiple chars are to be read at once, then a buffer (that is, some memory) must be made available to store them. As it is largely straightforward to handle only cases where such a buffer is statically allocated, I will focus on a solution that can also cope with dynamically allocated buffers. Let us first introduce an abstract type for buffers:
absvtype buffer_vtype = ptr
vtypedef buffer = buffer_vtype
Actually, [buffer_vtype] is introduced as a viewtype, that is, a linear type, and the following functions are for creating and destroying a buffer, respectively:
fun buffer_create (m: size_t): buffer
fun buffer_destroy (buf: buffer): void
As we also need to test whether a buffer contains any data or not, let us introduce the following function for this purpose:
fun buffer_isnot_empty (buf: !buffer): bool
In addition, let us use the names [readbuf] and [writebuf] for functions reading and writing multiple chars, respectively, and assign to them the following types:
fun readbuf (src: fildes, buf: !buffer): void
fun writebuf (dst: fildes, buf: !buffer): void
Let [fcopy2] be given the same interface as [fcopy1]. The following code gives a straightforward implementation of [fcopy2] based on [readbuf] and [writebuf]:
implement
fcopy2 (src, dst) = let
//
fun loop
(
  src: fildes, dst: fildes, buf: !buffer
) : void = let
  val () = readbuf (src, buf)
  val isnot = buffer_isnot_empty (buf)
in
  if isnot then let
    val () = writebuf (dst, buf) in loop (src, dst, buf)
  end else () // end of [if]
//
end // end of [loop]
//
val buf =
  buffer_create (i2sz(BUFSZ))
val () = loop (src, dst, buf)
val () = buffer_destroy (buf)
//
in
  // nothing
end (* end of [fcopy2] *)
Note that [BUFSZ] is a compile-time integer constant and [i2sz] is a cast-function for casting an integer of the type [int] to one of the type [size_t]. Please find the code of a completed running implementation of file-copying based on [fcopy2] in fcopy2.dats, where [readbuf] and [writebuf] are implemented in C directly.

Attempt Three

While the inefficiency of [fcopy1] is addressed in the implementation of [fcopy2], the absence of error-handling is not. I now give another implementation of file-copying and address the issue of error-handling in this implementation.

Clearly, a call to [read] or [write] can fail due to a variety of reasons. If such a failure occurs, we should probably stop file-copying and report an error. Let us introduce another file-copying function [fcopy3] as follows:

fun fcopy3 (src: fildes, dst: fildes, nerr: &int): void
The third argument [nerr] of [fcopy3] is a call-by-reference integer. In other words, what is passed as the third argument of [fcopy3] is the address of a left-value. If an error caused by [read] or [write] occurs during file-copying, then the value of the integer stored in [nerr] should be increased. To achieve this, we can modify the types of [readbuf] and [writebuf] as follows:
fun readbuf (src: fildes, buf: !buffer, nerr: &int): void
fun writebuf (dst: fildes, buf: !buffer, nerr: &int): void
The function [readbuf] calls [read]; if this call reports an error, then [readbuf] should increase the value of the integer stored in its third argument. The function [writebuf] does the same with [write]. The following code gives an implementation of [fcopy3] similar to that of [fcopy2]:
implement
fcopy3 (src, dst, nerr) = let
//
fun loop
(
  src: fildes, dst: fildes, buf: !buffer, nerr: &int
) : void = let
  val nerr0 = nerr
  val () = readbuf (src, buf, nerr)
  val isnot = buffer_isnot_empty (buf)
in
  if isnot then let
    val () = writebuf (dst, buf, nerr)
  in
    if nerr = nerr0 then loop (src, dst, buf, nerr) else ((*error*))
  end else () // end of [if]
//
end // end of [loop]
//
val buf = buffer_create (i2sz(BUFSZ))
val ((*void*)) = loop (src, dst, buf, nerr)
val ((*void*)) = buffer_destroy (buf)
//
in
  // nothing
end (* end of [fcopy3] *)
Note that the loop function exits whenever an error due to [read] or [write] is reported. Please find the code of a completed running implementation of file-copying based on [fcopy3] in fcopy3.dats, where [readbuf] and [writebuf] are implemented in C directly.

This article is written by Hongwei Xi.