ATS is a feature-rich language, and dependent types, linear types and embeddable templates can be seen as three of its most prominent features. In this article, I intend to present a template-based approach to facilitating parallel computation. More specifically, I would like to demonstrate how a template-based package (of the name atscntrb-bucs320-divideconquerpar) can be readily employed to parallelize a program that one constructs based on the so-called divide-and-conquer problem-solving strategy.
In problem-solving, divide-and-conquer (D&C) is a strategy that divides a given problem of certain size into a collection of subproblems of lesser size and then solves the subproblems (recursively) and then combines the obtained solutions to the subproblems to form a solution to the original given problem. Please find a template-based approach to (D&C) on-line.
For parallelizing D&C, we are to make use of a workshop, which can be thought of as a collection of workers (where each worker is just a running thread). For example, we may create a workshop of 4 workers on a machine of 4 cores so that each worker can run as a thread on one distinct core.
As D&C is inherently recursive, we must address the potential issue of a worker becoming blocked when waiting for a recursive call to return if we implement D&C in the direct style (in contrast to the continuation-passing style (CPS)). In addition, using the direct style means that we have to insert code to perform synchronization explicitly (for instance, to notify a blocked worker to continue once all of the recursive calls made by it have returned). On the other hand, the issue of a worker becoming blocked can be completely avoided if we are to implement D&C in the CPS-style. Without blocking, the need for explicit synchronization is also gone.
Please find the entirety of the code for the template-based package atscntrb-bucs320-divideconquerpar plus some examples on-line, where an implementation of D&C in parallel is given in the CPS-style. And further explanation on parallelization via CPS can be found in this article. The rest of this article focuses on demonstrating a typical use of this package in the construction of certain parallel recursive programs.
The following simple recursive function Fibo computes Fibonacci numbers:
fun
Fibo(n: int): int =
if (n >= 2) then Fibo(n-1)+Fibo(n-2) else n
Suppose that we want to parallelize Fibo so as to
allow multiple cores to be utilized together to perform computation.
One simple way to proceed is to first install the following npm packages:
atscntrb-hx-fworkshop atscntrb-hx-threadkit atscntrb-bucs320-divideconquer atscntrb-bucs320-divideconquerparPlease visit www.npmjs.com for authoritative information on npm packages if needed.
The following lines are typically included in a setting for gaining access to functions declared in the packages atscntrb-bucs320-divideconquer and atscntrb-bucs320-divideconquerpar:
(* ****** ****** *) // #include "$PATSHOMELOCS/atscntrb-bucs320-divideconquerpar/mydepies.hats" #include "$PATSHOMELOCS/atscntrb-bucs320-divideconquerpar/mylibies.hats" // (* ****** ****** *) // #staload $DivideConquer // opening namespace #staload $DivideConquerPar // opening namespace // (* ****** ****** *)Conventionally, the file mydepies.hats in a package is for statically loading namespaces in various packages the current package depends on, which may not be present, and the file mylibies.hats for namespaces the current package provides, which should always be present.
Let us choose the name ParFibo for the parallelized version of Fibo, which is given the following interface:
fun ParFibo(fws: fworkshop, n: int): int
The type fworkshop is for a workshop of workers (of which
each is just a running thread). For implementation, please study the
package atscntrb-hx-fworkshop. An implementation of
ParFibo is given as follows:
implement ParFibo (fws, n) = DivideConquer$solve<>(n) where { // // $tempenver indicates // to the compiler that [fws] // needs to be inserted into the // environment of each closure-function // generated during the compilation of // the following body of [ParFibo] // val () = $tempenver(fws) // implement DivideConquer$base_test<>(n) = (n <= CUTOFF) implement DivideConquer$base_solve<>(n) = Fibo( n ) // implement DivideConquer$divide<>(n) = g0ofg1($list{int}(n-1, n-2)) implement DivideConquer$conquer$combine<>(_(*n*), rs) = rs[0] + rs[1] // implement DivideConquerPar$fworkshop<>((*void*)) = FWORKSHOP_chanlst(fws) // } (* end of [ParFibo] *)Note that some detailed explanation of this style of template-based programming can be found on-line. The given implementation of the function template DivideConquerPar$fworkshop indicates that a list-based channel (chanlst) is used for storing the subproblems created during the divide phase in D&C: Each worker fetches from the channel the next subproblem to solve and also inserts into the channel the subproblems it creates when solving one. It is also possible for one to choose an array-based channel (channel) for storing subproblems, and more relevant details can be found in some examples available on-line.
In the following code, a workshop fws is first created and then two workers are added into the workshop:
implement main0(argc, argv) = { // #define N 2 // val fws = $FWS.fworkshop_create_exn() // val added = $FWS.fworkshop_add_nworker(fws, N) val () = prerrln!("the number of workers = ", added) // val n0 = ( if argc >= 2 then g0string2int(argv[1]) else 45 ) : int // end of [root] // val () = println!("ParFibo(", n0, ") = ", ParFibo(fws, n0)) // } (* end of [main0] *)Calling ParFibo on the workshop fws and a chosen integer n0 makes use of the two workers in a parallel computation that yields the value of Fibo(n0). For instance, the following times were reported when I called ParFibo to compute Fibo(45) (which equals 102334155):
real: 0m0.417s user: 0m0.828sThe real time being about half of the user time is due to two workers running on two cores in parallel.
Computing Fibonacci numbers is a bit boring. As another example of D&C in parallel, let us implement a higher-order function DirWalk for recursively traversing a given directory and processing the files encountered during the traversal. The interface of DirWalk is given as follows:
fun
DirWalk
( fws: fworkshop
, fname: string, fopr: cfun(string, int)): int
The argument fopr
is supposed to process a non-directory
file to which a path (represented as a string) is given. That
fopr returns an int-value is just an arbitrary
choice made to simplify the presentation.
An implementation of DirWalk is given as follows:
implement DirWalk (fws, fname, fopr) = let // val () = $tempenver(fws) val () = $tempenver(fopr) // implement DivideConquer$base_test<> (fname) = ( test_file_isdir(fname) <= 0 ) // DivideConquer$base_test<> // implement DivideConquer$base_solve<> (fname) = fopr(fname) // // implement DivideConquer$divide<> (dir) = ( // let // val files = streamize_dirname_fname(dir) val files = stream_vt_filter_cloptr<string> (files, lam(x) => ~dir_skipped(x)) val files = stream_vt_map_cloptr<string><string> (files, lam(file) => string_append3(dir, "/", file)) // in list0_of_list_vt(stream2list_vt(files)) end // end of [let] // ) (* end of [DivideConquer$divide<>] *) // implement DivideConquer$conquer$combine<> (_, rs) = ( list0_foldleft<int><int>(rs, 0, lam(res, r) => res + r) ) // implement DivideConquerPar$fworkshop<>((*void*)) = FWORKSHOP_chanlst(fws) // in DivideConquer$solve<>( fname ) end // end of [DirWalk]
Let us focus on the implementation of DivideConquer$divide; given a path to a directory, the function streamize_dirname_fname returns a (linear) stream of names of the files residing inside the directory; the following call to stream_vt_filter_cloptr removes the special names "." and ".." from the stream; the following call to stream_vt_map_cloptr maps each name in the stream to a path formed by joining the name of the directory, the special separator "/" and the name; and finally the stream is turned into a list to be returned.
The implementation of DivideConquer$conquer$combine simply returns the tally of all of the int-values obtained from solving subproblems (by recursively calling DirWalk on them).
The testing code for DirWalk returns the number of (non-directory) files in a given directory while printing (onto the stdout) the path to each file followed by the number of lines in the file.
While DirWalk can indeed traverse a given directory recursively in parallel, there is no guarantee that parallel traversal is more efficient than sequential traversal. A reasonable expectation would be that the time saved by calling fopr in parallel can compensate the overhead incurred due to parallelism. Unlike the previous example of ParFibo, there can be a lot of I/O operations during the evaluation of a call to DirWalk, adding another layer of uncertainty.
Please find in the following files the entirety of the code presented in this article:
ParFibo.dats DirWalk.datsThere is also an accompanying Makefile that can be used for compiling and testing the code (via the make utility).
This article is written by Hongwei Xi.