Summary: | Master's === Providence University === Master's Program, Department of Computer Science and Information Engineering === 98 === Current parallel hardware architectures commonly used to execute parallel code can be broadly categorized into shared-memory (SM), distributed-memory (DM), and distributed shared-memory (DSM) systems. DM systems and clusters have been popular because of their price/performance ratio and scalability. Even today's high-performance computing machines are clusters of multi-core nodes interconnected by a high-speed network. To write parallel FFT code for such machines, in this thesis we implemented a hybrid MPI/OpenMP parallel program for the FFT. The parallel MPI FFT code mainly uses the SPMD and master-worker programming styles at the first level of parallelism. OpenMP is used to exploit the second level of parallelism within a node. At this level, most other implementations use naïve loop-level parallelization by adding the omp for directive of OpenMP; we instead implemented this level of parallelism within a node in an OpenMP SPMD style. Hence, our implementation style is SPMD within SPMD. We present our implementation and discuss the experimental results, comparing them with the MPI version of FFTW. Our hybrid model shows promise, and its performance should improve further once our pure MPI FFT code is improved.
|