Summary: | Master's thesis === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 95 === Almost every computation job in a cluster or grid system requires input data in order to find its solution, and the computation cannot proceed until the required data become available. As a result, a proper interleaving of data transfers and job executions has a significant impact on overall efficiency. In this paper we analyze the computational complexity of the shared-data job scheduling problem, with and without a storage capacity constraint. We show that if there is an upper bound on the server's storage capacity, the problem is NP-complete, even when each job depends on at most three data items. On the other hand, if there is no upper bound on the server's storage capacity, we show that there is an efficient algorithm that produces an optimal job schedule when each job depends on at most two data items. We also give an effective heuristic algorithm that produces good schedules when there is no limit on the number of data items a job may access. Experimental results indicate that this heuristic algorithm performs very well and gives near-optimal solutions.
|
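To make the problem setting concrete, the following is a minimal sketch of one simple greedy strategy for the shared-data setting: pick next the job that needs the fewest data items not already on the server, and evict data the current job does not use when the storage bound would be exceeded. This is only an illustration of the problem model; it is not the heuristic proposed in the thesis, and the job names, data items, eviction policy, and transfer-count objective are all illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

def greedy_schedule(jobs: Dict[str, Set[str]],
                    capacity: int) -> Tuple[List[str], int]:
    """Greedily order jobs so each next job reuses as much already-loaded
    data as possible, evicting data the current job does not need when the
    storage bound would be exceeded.  Assumes every job's own data set fits
    within `capacity`.  Returns (job order, number of data transfers)."""
    loaded: Set[str] = set()   # data items currently stored on the server
    order: List[str] = []
    remaining = dict(jobs)
    transfers = 0
    while remaining:
        # Choose the job that needs the fewest data items not yet loaded.
        job = min(remaining, key=lambda j: len(remaining[j] - loaded))
        needed = remaining.pop(job)
        missing = needed - loaded
        transfers += len(missing)      # each missing item costs one transfer
        loaded |= missing
        # Evict items the current job does not use until within capacity.
        for item in list(loaded - needed):
            if len(loaded) <= capacity:
                break
            loaded.discard(item)
        order.append(job)
    return order, transfers

# Hypothetical example: three jobs sharing data items d1..d4,
# server storage limited to 3 items.
jobs = {"j1": {"d1", "d2"}, "j2": {"d2", "d3"}, "j3": {"d3", "d4"}}
print(greedy_schedule(jobs, capacity=3))   # e.g. (['j1', 'j2', 'j3'], 4)
```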