Abstract:
W. Dittrich, D. Hutchinson and A.
Maheshwari, “Blocking in Parallel Multisearch Problems”,
Theory of Computing Systems, 34(2):
145-189, 2001 (invited papers from the 1998 ACM-SPAA conference).
External memory (EM) algorithms are designed for
computational problems in which the size of the internal memory of the computer
is only a small fraction of the problem size. Block-wise access to data
is a central theme in the design of
efficient EM algorithms. A similar requirement arises in the transmission of
data between processors in high performance parallel computation systems, for
which block-wise communication is a crucial issue.
We consider {\em multisearch} problems, where a large number
of queries are to be simultaneously processed and satisfied by navigating through
large data structures on parallel computers. Our examples originate as
algorithms for parallel machines, and we adapt them to the EM situation where
the queries and data structure are considered to be much larger than the size
of the available internal memory.
This paper presents techniques to achieve blocking for I/O
as well as for communication in multisearch on the BSP and EM-BSP models. We describe improvements to the \emph{1-optimal} BSP* multisearch
algorithm of \cite{BDM97} which permit larger search trees to be handled.
In the area of EM algorithms new algorithms for multisearch
in balanced trees are described. For search trees of size $O(n\log
n)$ where $n$ is the number of queries, we obtain a work-optimal, parallel,
randomized EM multisearch algorithm whose I/O and communication time are
smaller, asymptotically, than the computation time. We obtain a deterministic
version via a similar technique. These algorithms are obtained via the
simulation techniques of \cite{DDH97, dittrich:thesis,
DDHM99, DDHM99a, hutchinson:thesis}.
For larger trees we describe a parallel, EM algorithm which is
\emph{1-optimal} considering computation, communication, and I/O (for suitable
parameter constellations) plus I/O-optimal.
We give a lower bound to the number of I/O operations required for
filtering $n$ queries through a binary or multi-way search tree of size $m$
when $m\geq n^{2+\epsilon}$, for a constant
$\epsilon>0$.