LEADER 02815 am a22003253u 4500
001    85937
042 __ |a dc
100 10 |a Ansel, Jason Andrew |e author
710 2_ |a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |e contributor
700 10 |a Ansel, Jason Andrew |e contributor
700 10 |a Pacula, Maciej |e contributor
700 10 |a Wong, Yee Lok |e contributor
700 10 |a Chan, Cy |e contributor
700 10 |a Olszewski, Marek Krystyn |e contributor
700 10 |a O'Reilly, Una-May |e contributor
700 10 |a Amarasinghe, Saman P. |e contributor
700 10 |a Pacula, Maciej |e author
700 10 |a Wong, Yee Lok |e author
700 10 |a Chan, Cy |e author
700 10 |a Olszewski, Marek Krystyn |e author
700 10 |a O'Reilly, Una-May |e author
700 10 |a Amarasinghe, Saman P. |e author
245 00 |a SiblingRivalry: Online Autotuning Through Local Competitions
260 __ |b Association for Computing Machinery, |c 2014-03-27T20:28:48Z.
856 __ |z Get fulltext |u http://hdl.handle.net/1721.1/85937
520 __ |a Modern high performance libraries, such as ATLAS and FFTW, and programming languages, such as PetaBricks, have shown that autotuning computer programs can lead to significant speedups. However, autotuning can be burdensome to the deployment of a program, since the tuning process can take a long time and should be re-run whenever the program, microarchitecture, execution environment, or tool chain changes. Failure to re-autotune programs often leads to widespread use of sub-optimal algorithms. With the growth of cloud computing, where computations can run in environments with unknown load and migrate between different (possibly unknown) microarchitectures, the need for online autotuning has become increasingly important. We present SiblingRivalry, a new model for always-on online autotuning that allows parallel programs to continuously adapt and optimize themselves to their environment. In our system, requests are processed by dividing the available cores in half, and processing two identical requests in parallel on each half. Half of the cores are devoted to a known safe program configuration, while the other half are used for an experimental program configuration chosen by our self-adapting evolutionary algorithm. When the faster configuration completes, its results are returned, and the slower configuration is terminated. Over time, this constant experimentation allows programs to adapt to changing dynamic environments and often outperform the original algorithm that uses the entire system.
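
The scheme described in the abstract above can be sketched in a few lines: the same request is submitted under both a safe and an experimental configuration, and whichever finishes first wins. The sketch below is a minimal Python illustration, not the authors' PetaBricks implementation; run_request, safe_config, and candidate_config are hypothetical placeholders for the tuned program and its two configurations.

    # Minimal sketch (assumed names: run_request, safe_config, candidate_config);
    # not the authors' PetaBricks implementation.
    from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait

    def race(request, run_request, safe_config, candidate_config):
        """Run the same request under two configurations in parallel and
        return (label, result) for whichever configuration finishes first."""
        with ProcessPoolExecutor(max_workers=2) as pool:
            futures = {
                pool.submit(run_request, request, safe_config): "safe",
                pool.submit(run_request, request, candidate_config): "experimental",
            }
            done, _ = wait(futures, return_when=FIRST_COMPLETED)
            winner = next(iter(done))
            # SiblingRivalry terminates the slower sibling at this point; a plain
            # ProcessPoolExecutor cannot kill a running task, so this sketch simply
            # takes the first result and lets the pool wind down on exit.
            return futures[winner], winner.result()
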
536 __ |a United States. Dept. of Energy (DOE Award DE-SC0005288)
546 __ |a en_US
655 7_ |a Article
773 __ |t Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems - CASES '12