Summary: | The discipline of remote sensing is concerned with observing the Earth's surface using different portions of the electromagnetic spectrum. Earth-orbiting satellites will soon collect terabytes of data per day with increased accuracy. Automated parallel algorithms are essential to process this large amount of data quickly. Data parallel languages have been used effectively for the diverse algorithms found in such systems. With improved network technology, it is now feasible to build data parallel supercomputers using traditional RISC-based workstations connected by a high-speed network. This dissertation presents Cluster-C*, an architecture that implements the data parallel language C* on a cluster of workstations. A specialized language run-time system and network protocols effectively integrate the cluster components to form a dedicated, efficient multiprocessor environment. A series of analytic, empirical, and simulation techniques quantify the cluster's performance. A nine-program test suite, derived from remote sensing and image understanding algorithms, provides a basis for cluster evaluation. An in-depth look at the communication behavior of the test suite supports prediction of algorithm performance on the cluster and yields important architectural design insights. The test suite is executed on a cluster of 8 HP 720 workstations and a 32-node (128 vector unit) CM-5 to establish a concrete performance baseline. The result is that, under some conditions, the cluster is faster on an absolute scale, and that on a relative, per-node scale, the cluster delivers superior performance in all cases. Finally, a trace-driven simulator, based on these empirical measurements, supports predictions of the cluster's scalability and performance when equipped with next-generation workstation and network technologies. Simulations show that gigabit networks have the necessary bandwidth to build clusters with hundreds of nodes.
Furthermore, even a modestly enhanced cluster, consisting of 16 high-end workstations connected by a 600 Mbps token ring, will outperform a 32-node CM-5 in all but a few cases.
|