Summary: | As DRAM is facing the scaling difficulty in terms of energy cost and reliability, some nonvolatile storage materials were proposed to be the substitute or supplement of main memory. Phase Change Memory (PCM) is one of the most promising nonvolatile memory that could be put into use in the near future. However, before becoming a qualified main memory technology, PCM should be designed reliably so that it can ensure the computer system's stable running even when errors occur. The typical wear-out errors in PCM have been well studied, but the transient errors, that caused by high-energy particles striking on the complementary metal-oxide semiconductor (CMOS) circuit of PCM chips or by resistance drifting in multi-level cell PCM, have attracted little focus. In this paper, we propose an innovative mechanism, Local-ECC-Global-ECPs (LEGE), which addresses both soft errors and hard errors (wear-out errors) in PCM memory systems. Our idea is to deploy a local error correction code (ECC) section to every data line, which can detect and correct one-bit errors immediately, and a global error correction pointers (ECPs) buffer for the whole memory chip, which can be reloaded to correct more hard error bits. The local ECC is used to detect and correct the unknown one-bit errors, and the global ECPs buffer is used to store the corrected value of hard errors. In comparison to ECP-6, our method provides almost identical lifetimes, but reduces approximately 50% storage overhead. Moreover, our structure reduces approximately 3.55% access latency overhead by increasing 1.61% storage overhead compared to PAYG, a hard error only solution.
|