The Reliability of dynamic random access memory chips
Date
1993
Authors
Pelton, Timothy W.
Abstract
Advances in dynamic random access memory (DRAM) chip capacity have required designers to develop and incorporate many new techniques to ensure viable yields and to maintain adequate reliability. One such technique is the incorporation of error correction code circuitry, which increases the yield of chips that are functionally fault-free and reduces the probability of failure by hiding some of the errors caused by new faults.
The majority of DRAM chip failures that occur during normal operations are attributed to transient faults (primarily caused by α-particles). We develop a method for analyzing the reliability of DRAM chips during normal operation and suggest modifications to the chip circuitry to reduce the effect of transient faults.
We propose two enhancements that make more extensive use of the built-in error correction ability of current DRAM chips. The first scheme is parallel scrubbing: data contained in the chip is checked and corrected (if necessary and possible) in parallel with normal access cycles. The second scheme, systematic parallel scrubbing, improves on the first but has a higher overhead cost. It uses counters to provide addresses to idle areas of memory, thereby ensuring that the data in the chip is scrubbed uniformly.
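The difference between the two schemes can be illustrated with a toy sketch. It is not the circuitry proposed in the thesis: the word-level granularity, the single wrap-around counter, and the names `scrubbed_addresses`, `accesses`, and `n_words` are all assumptions made for illustration. It only shows why a counter-driven sweep covers memory uniformly when the access pattern does not.

```python
def scrubbed_addresses(accesses, n_words, systematic):
    """Return the set of word addresses that get scrubbed during a run.

    Hypothetical model: every access scrubs the accessed word (parallel
    scrubbing). If `systematic` is True, each access cycle also scrubs
    the word named by a wrap-around counter, standing in for the
    counter-supplied addresses of idle areas.
    """
    scrubbed = set()
    counter = 0
    for addr in accesses:
        # Parallel scrubbing: the accessed word is checked and corrected.
        scrubbed.add(addr)
        if systematic:
            # Systematic scrubbing: the counter sweeps every address in turn.
            scrubbed.add(counter)
            counter = (counter + 1) % n_words
    return scrubbed


# A skewed access pattern that only ever touches two of eight words.
accesses = [0, 0, 1, 0, 1, 0, 0, 1]
parallel_only = scrubbed_addresses(accesses, n_words=8, systematic=False)
with_counter = scrubbed_addresses(accesses, n_words=8, systematic=True)
```

Under this assumed model, access-driven scrubbing alone never reaches words that are never accessed, whereas the counter visits every word once per sweep regardless of the access distribution.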
To assess the efficacy of our two schemes, we have developed a model for estimating the reliability of DRAM chips using Markov and series models. Our model determines the probability that an uncorrectable error exists on the chip over time and accounts for the soft error rate, the distribution of accesses, the access rate, and the number of permanent faults present on the chip. Our model can be adapted to analyze the reliability of any standard DRAM chip.
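As a rough illustration of how a Markov model and a series model can be combined for this kind of estimate, the sketch below tracks one error-correcting word through three states (clean, one correctable error, uncorrectable) and then treats the chip as a series system of identical, independent words. This is a simplified stand-in, not the model developed in the thesis: the discrete time steps, the single-error-correcting code, the omission of permanent faults and access distribution, and the parameters `p_hit` and `p_scrub` are all assumptions.

```python
def word_failure_probability(p_hit, p_scrub, steps):
    """Probability that one word holds an uncorrectable error after `steps`.

    Assumed three-state discrete-time Markov chain:
      clean  -> single with probability p_hit (a soft error strikes)
      single -> clean  with probability p_scrub (the word is scrubbed)
      single -> double with probability (1 - p_scrub) * p_hit
      double is absorbing (the error can no longer be corrected)
    """
    clean, single, double = 1.0, 0.0, 0.0
    for _ in range(steps):
        new_clean = clean * (1 - p_hit) + single * p_scrub
        new_single = clean * p_hit + single * (1 - p_scrub) * (1 - p_hit)
        new_double = double + single * (1 - p_scrub) * p_hit
        clean, single, double = new_clean, new_single, new_double
    return double


def chip_failure_probability(p_word, n_words):
    """Series model: the chip fails if any one of n independent words fails."""
    return 1.0 - (1.0 - p_word) ** n_words


# Illustrative numbers only; they are not taken from the thesis.
no_scrub = word_failure_probability(p_hit=1e-6, p_scrub=0.0, steps=10_000)
scrubbed = word_failure_probability(p_hit=1e-6, p_scrub=0.01, steps=10_000)
chip_no_scrub = chip_failure_probability(no_scrub, n_words=4096)
chip_scrubbed = chip_failure_probability(scrubbed, n_words=4096)
```

In this simplified chain, a higher scrub probability returns single-error words to the clean state sooner, so fewer of them survive long enough to take a second hit.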
In our analysis, we find that our model is helpful in identifying how these probabilities change over time and the trends they follow. We find that a DRAM chip using the first scheme, parallel scrubbing, is almost 31 times less likely to develop an uncorrectable error than the standard DRAM chip, and that a chip using the second scheme, systematic scrubbing, is more than 240,000 times less likely to develop an uncorrectable error. This leads us to conclude that our schemes offer significant potential for improving the reliability of DRAM chips.
Keywords
UN SDG 9: Industry, Innovation, and Infrastructure