A Study of the Limits of Parallelism Available in SIMD Processors Through Register Packing
Master's === National Sun Yat-sen University === Department of Computer Science and Engineering === 102 === This thesis designs an instruction-level-parallel processor for embedded systems that perform general-purpose computation. The hardware of such an embedded system is smaller in scale than currently popular CPUs or GPUs. We exploit several techniques to enhance the instruct...
Main Authors: | Rou-Jia Chen 陳柔佳 |
---|---|
Other Authors: | Steve W. Haga |
Format: | Others |
Language: | en_US |
Published: | 2014 |
Online Access: | http://ndltd.ncl.edu.tw/handle/61446172694694401702 |
id |
ndltd-TW-102NSYS5392024 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-102NSYS5392024 2017-04-23T04:27:01Z http://ndltd.ncl.edu.tw/handle/61446172694694401702 A Study of the Limits of Parallelism Available in SIMD Processors Through Register Packing 在SIMD處理器上透過包裝暫存器的方法研究可平行化特性上的限制 Rou-Jia Chen 陳柔佳 Master's National Sun Yat-sen University Department of Computer Science and Engineering 102 This thesis designs an instruction-level-parallel processor for embedded systems that perform general-purpose computation. The hardware of such an embedded system is smaller in scale than currently popular CPUs or GPUs. We apply several techniques to improve the instruction-scheduling time for our SIMD processor. The branch-and-bound techniques, which modify the algorithm while maintaining optimality, include PRSR (pseudo-random shift register), memoization, and register grouping. We also support heuristic techniques, shortcuts that let the exhaustive search finish quickly and efficiently, such as unrolling optimization, instruction distribution, and sign constraint. Using register packing and loop unrolling, we evaluated our SIMD processor on MiBench and obtained performance comparable to that of a VLIW processor; moreover, our register packing allows a vector-wide load from the SRAM. Such a load is a natural fit for a SIMD processor and achieves significant speedups when our allocator is used. Steve W. Haga 希家史提夫 2014 thesis 92 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Sun Yat-sen University === Department of Computer Science and Engineering === 102 === This thesis designs an instruction-level-parallel processor for embedded systems that perform general-purpose computation. The hardware of such an embedded system is smaller in scale than currently popular CPUs or GPUs. We apply several techniques to improve the instruction-scheduling time for our SIMD processor.
The branch-and-bound techniques, which modify the algorithm while maintaining optimality, include PRSR (pseudo-random shift register), memoization, and register grouping. We also support heuristic techniques, shortcuts that let the exhaustive search finish quickly and efficiently, such as unrolling optimization, instruction distribution, and sign constraint.
Using register packing and loop unrolling, we evaluated our SIMD processor on MiBench and obtained performance comparable to that of a VLIW processor; moreover, our register packing allows a vector-wide load from the SRAM. Such a load is a natural fit for a SIMD processor and achieves significant speedups when our allocator is used.
|
author2 |
Steve W. Haga |
author_facet |
Steve W. Haga Rou-Jia Chen 陳柔佳 |
author |
Rou-Jia Chen 陳柔佳 |
spellingShingle |
Rou-Jia Chen 陳柔佳 A Study of the Limits of Parallelism Available in SIMD Processors Through Register Packing |
author_sort |
Rou-Jia Chen |
title |
A Study of the Limits of Parallelism Available in SIMD Processors Through Register Packing |
title_short |
A Study of the Limits of Parallelism Available in SIMD Processors Through Register Packing |
title_full |
A Study of the Limits of Parallelism Available in SIMD Processors Through Register Packing |
title_fullStr |
A Study of the Limits of Parallelism Available in SIMD Processors Through Register Packing |
title_full_unstemmed |
A Study of the Limits of Parallelism Available in SIMD Processors Through Register Packing |
title_sort |
study of the limits of parallelism available in simd processors through register packing |
publishDate |
2014 |
url |
http://ndltd.ncl.edu.tw/handle/61446172694694401702 |
work_keys_str_mv |
AT roujiachen astudyofthelimitsofparallelismavailableinsimdprocessorsthroughregisterpacking AT chénróujiā astudyofthelimitsofparallelismavailableinsimdprocessorsthroughregisterpacking AT roujiachen zàisimdchùlǐqìshàngtòuguòbāozhuāngzàncúnqìdefāngfǎyánjiūkěpíngxínghuàtèxìngshàngdexiànzhì AT chénróujiā zàisimdchùlǐqìshàngtòuguòbāozhuāngzàncúnqìdefāngfǎyánjiūkěpíngxínghuàtèxìngshàngdexiànzhì AT roujiachen studyofthelimitsofparallelismavailableinsimdprocessorsthroughregisterpacking AT chénróujiā studyofthelimitsofparallelismavailableinsimdprocessorsthroughregisterpacking |
_version_ |
1718443173954453504 |