Abstract:
Parallel independent disks can
enhance the performance of external memory (EM) algorithms, but the programming
task is often difficult. Each disk can service only one read or write request at
a time; the challenge is to keep the disks as busy as possible. In this
article, we develop a randomized allocation discipline for parallel independent
disks, called randomized cycling.We show how it can be
used as
the basis for an efficient distribution sort algorithm, which we call randomized
cycling distribution sort (RCD).We prove that the expected I/O complexity
of RCD is optimal. The analysis uses a novel reduction to a scenario
with significantly fewer probabilistic interdependencies.We demonstrate RCD’s
practicality by experimental simulations. Using the randomized cycling
discipline, algorithms
developed for the unrealistic multihead disk model can be simulated on the realistic parallel disk model for the class of multipass algorithms, which make a complete pass through their data before accessing any element a second time. In particular, algorithms based upon the well-known distribution and merge paradigms of EM computation can be optimally extended from a single disk to parallel disks.