Intel® E7520 Memory Controller Hub (MCH) Datasheet
Functional Description
5.4.2.6 DIMM Error Rate Threshold Counters
The error rate threshold counters of the memory subsystem use a simple leaky bucket counter to
flag excessive error rates rather than a total error count. The operational model is
straightforward: program a non-zero fail-over threshold in the THRESH_DED or THRESH_SEC0-3
registers to enable the feature. If the count of errors on any DIMM exceeds the threshold after a
programmable time period, an error is flagged and sparing fail-over commences. The counter
registers themselves are implemented as “leaky buckets”: they do not contain an absolute
cumulative count of all errors since power-on, but an aggregate count of the errors received over
a running time period. The “drip rate” of the bucket is selectable by software via the SPARECTL
register, so the threshold can be set to a value that will never be reached by a “healthy” memory
subsystem experiencing the rate of errors expected for the size and type of memory devices in use.
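The behavior described above can be sketched as a small software model. This is an illustration only, not the MCH register interface: the class name, method names, and the use of floating-point counts are assumptions, and the halving leak is taken from Section 5.4.2.6.1. The length of a time period, which the hardware derives from SPARECTL, is left to the caller.

```python
class LeakyBucketCounter:
    """Hypothetical model of one per-DIMM error rate counter."""

    def __init__(self, threshold):
        # A threshold of zero models the feature being disabled.
        self.threshold = threshold
        self.count = 0.0

    def record_error(self, n=1):
        """Accumulate n errors observed during the current time period."""
        self.count += n

    def end_of_period(self):
        """Apply the leak (halve the count), then compare with the threshold.

        Returns True when the model would flag an error and start
        sparing fail-over.
        """
        self.count /= 2
        return self.threshold != 0 and self.count > self.threshold


counter = LeakyBucketCounter(threshold=4)

counter.record_error(2)            # a quiet period
quiet = counter.end_of_period()    # count becomes (2 + 0) / 2 = 1.0

counter.record_error(20)           # a burst of errors
burst = counter.end_of_period()    # count becomes (20 + 1.0) / 2 = 10.5

print(quiet, burst)  # False True
```

Because the count decays every period, an isolated burst fades over the following periods instead of staying in the total indefinitely.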
5.4.2.6.1 Error Threshold Methodology
Figure 5-1 illustrates the Error Threshold methodology under the assumption that errors
accumulate at the Expected Average Rate (EAR), defined as the rate at which a properly
functioning DIMM accumulates errors. At the end of the first time period, the sum of the current
errors and the residue from past errors (zero in this case) is divided by 2 to give the resultant
value used for the comparison. That resultant value becomes the residue for the next time period.
At the end of the second time period, the counter contains the sum of the new errors and the
non-zero residue; this sum is again divided by 2 to give the new resultant value for comparison,
and so on.
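The halving step can be traced numerically. The following is a minimal sketch, assuming errors are expressed in units of EAR (so 1.0 means one period's worth of expected errors); the function name `resultant_values` is hypothetical and does not come from the datasheet.

```python
def resultant_values(errors_per_period):
    """Return the resultant value after each time period.

    Each period: resultant = (new errors + residue) / 2, and that
    resultant becomes the residue for the next period.
    """
    residue = 0.0  # no past errors before the first period
    results = []
    for errors in errors_per_period:
        residue = (errors + residue) / 2
        results.append(residue)
    return results


# Six time periods with errors arriving at exactly the EAR.
print(resultant_values([1.0] * 6))
# [0.5, 0.75, 0.875, 0.9375, 0.96875, 0.984375]
```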
Figure 5-2 shows the comparison error count at the end of each time period. Assuming a counter
accumulates errors at the average rate, the counter will reach a steady state after approximately five
time periods. Given this, setting the threshold value at or just above the EAR will result in a
threshold event if a significant surge in errors occurs.
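The steady-state claim follows from the halving recurrence, as the short derivation below shows. The symbols are introduced here for illustration and are not datasheet notation: E is the number of errors per time period at the EAR, R_n is the resultant value after period n, T is the programmed threshold, and k is the size of a surge as a multiple of E.

```latex
\begin{aligned}
R_n &= \frac{E + R_{n-1}}{2}, \qquad R_0 = 0 \\
R_n &= E\left(1 - 2^{-n}\right) \quad\Rightarrow\quad R_5 = 0.96875\,E, \qquad \lim_{n\to\infty} R_n = E \\
R_{\text{surge}} &\approx \frac{kE + E}{2} = \frac{(k+1)\,E}{2} > T
\quad\Longleftrightarrow\quad k > \frac{2T}{E} - 1
\end{aligned}
```

Under these assumptions the resultant value approaches E from below and is within about 3% of it after five periods. With T set at E, a single period with noticeably more than E errors (k > 1) is enough to push the resultant value over the threshold.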
Figure 5-1. Error Ramp Rate
Figure 5-2. Error Count For Comparison
[Plots not reproduced. Recoverable labels: horizontal axis TP#1 through TP#6; vertical axis in
multiples of EAR, from 0 to 2.00, with EAR and 2X EAR marked; annotations "Average" and
"Personality Repetitive after 5 Time Units."]

Values recovered from the figures, in units of EAR:

Time period          TP#1      TP#2        TP#3         TP#4          TP#5           TP#6
(Errors+residue)/2   (1+0)/2   (1+0.5)/2   (1+0.75)/2   (1+0.875)/2   (1+0.9375)/2   (1+0.96875)/2
Resultant value      0.5       0.75        0.875        0.9375        0.96875        0.984375