Parallel simulations using recurrence relations and relaxation


Bibliographic Details
Main Author: McGough, Andrew Stephen
Published: University of Newcastle Upon Tyne 2000
Subjects: 005
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.340720
Description
Summary: This thesis develops and evaluates a number of efficient algorithms for performing parallel simulations. These algorithms achieve approximately linear speed-up, in the sense that their run times are of order O(n/p), where n is the size of the problem and p is the number of processors employed. The systems being simulated are related to ATM switches and sliding window communication protocols. The algorithms presented first are concerned with the parallel generation and merging of bursty arrival sources, the marking and deletion of cells lost through buffer overflows, and the computation of departure instants. They work well on shared memory multiprocessors. However, different techniques need to be employed in order to achieve similar speed-ups on a distributed cluster of workstations. The main obstacle is the inter-process communication overhead. To overcome it, new algorithms are developed that considerably reduce the amount of information transferred between processors. They are applied both to the ATM switch and to the sliding window protocol with feedback. In all cases, the methodology relies on reducing the simulation task to a set of recurrence relations. The latter are solved using the techniques of parallel prefix computation, parallel merging and relaxation. The effectiveness of these algorithms is evaluated by comparing their run times with those of an optimized sequential algorithm. A number of experiments are carried out on a 12-processor shared memory system, and also on a distributed cluster of 12 processors connected by fast Ethernet.
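
To illustrate the kind of reduction described above, the following is a minimal sketch, under stated assumptions, of how a departure-instant recurrence of the form D[i] = max(A[i], D[i-1]) + S[i] can be solved by parallel prefix computation. The variable names A (arrival instants) and S (service times), the block-scan layout, and the use of Python threads are illustrative assumptions, not the thesis implementation; the point is only that each step is an affine-max map whose composition is associative, so the recurrence becomes a scan that runs in O(n/p) time per processor.

# A minimal sketch (not the thesis code) of solving the recurrence
# D[i] = max(A[i], D[i-1]) + S[i] by parallel prefix computation.
# Step i is the map x -> max(b, x + s) with b = A[i] + S[i] and s = S[i];
# composing two such maps yields another map of the same form, so the
# operator is associative and the recurrence reduces to a two-phase block scan.

from concurrent.futures import ThreadPoolExecutor


def combine(left, right):
    # Compose affine-max maps: apply `left` first, then `right`.
    bl, sl = left
    br, sr = right
    return (max(br, bl + sr), sl + sr)


def block_scan(elems):
    # Sequential inclusive scan of one block; returns all prefixes and the block total.
    out, acc = [], None
    for e in elems:
        acc = e if acc is None else combine(acc, e)
        out.append(acc)
    return out, acc


def departures(A, S, d0=0.0, p=4):
    # Split the n steps into p blocks of roughly n/p elements each.
    n = len(A)
    elems = [(a + s, s) for a, s in zip(A, S)]
    blocks = [elems[k * n // p:(k + 1) * n // p] for k in range(p)]
    blocks = [b for b in blocks if b]

    # Phase 1: each worker scans its own block independently -- O(n/p) work each.
    with ThreadPoolExecutor(max_workers=p) as pool:
        scanned = list(pool.map(block_scan, blocks))

    # Phase 2: exclusive prefix over the block totals -- only O(p) sequential work.
    carries, acc = [None], None
    for _prefixes, total in scanned[:-1]:
        acc = total if acc is None else combine(acc, total)
        carries.append(acc)

    # Phase 3: each worker combines its carry with its local prefixes and
    # evaluates the resulting map at D[0] = d0 -- again O(n/p) per worker.
    def fix_up(args):
        (prefixes, _total), carry = args
        out = []
        for pre in prefixes:
            b, s = pre if carry is None else combine(carry, pre)
            out.append(max(b, d0 + s))
        return out

    with ThreadPoolExecutor(max_workers=p) as pool:
        fixed = pool.map(fix_up, zip(scanned, carries))
    return [d for block in fixed for d in block]


if __name__ == "__main__":
    # Cross-check the block-scan result against the obvious sequential loop.
    import random
    random.seed(1)
    A = sorted(random.uniform(0, 50) for _ in range(1000))
    S = [random.uniform(0, 5) for _ in range(1000)]
    seq, d = [], 0.0
    for a, s in zip(A, S):
        d = max(a, d) + s
        seq.append(d)
    assert all(abs(x - y) < 1e-6 for x, y in zip(seq, departures(A, S)))

Python threads here only illustrate the data layout (CPython's interpreter lock prevents genuine speed-up); on a shared memory multiprocessor or a distributed cluster the same three-phase structure applies, with the exchange of the p block totals as the only step requiring inter-processor communication.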