Intel® E7520 Memory Controller Hub (MCH) Datasheet
Functional Description
5.11.3.7 Locking DRAM Address and Syndrome on Errors
The two error logging registers for correctable errors, DRAM_SEC_ADD and
DRAM_SEC_SYNDROME, are locked when bit 0 or bit 8 of either the DRAM_FERR or
DRAM_NERR register is set. If both of these bits are '0' in both registers, the two logging
registers may be updated. If either bit is set to '1', the two registers retain their values even if
new correctable errors are found. This captures and holds the first error rather than the most
recent one. Corrected data errors that result from scrubber-initiated traffic are reflected in
these error registers.
The logging register for uncorrectable errors, DRAM_DED_ADD, is locked when bit 1 or bit 9 of
either the DRAM_FERR or DRAM_NERR register is set. This register holds the address of
uncorrectable errors on data reads not initiated by the scrubber.
The logging register for scrub-detected errors, DRAM_SCRUB_ADD, should be locked when bit 2
or bit 10 of either the DRAM_FERR or DRAM_NERR register is set. This register holds the address
for scrubber-initiated transactions for either periodic memory scrubbing or sparing.
The logging register for retry-detected errors, DRAM_RETRY_ADD, should be locked when bit 5
or bit 13 of either the DRAM_FERR or DRAM_NERR register is set. This register is locked when
the determination is made that a retry to this address must be performed.
When the FERR/NERR registers are cleared, the logging registers are free to update their contents
until either of the FERR/NERR registers locks them again.
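
The locking rules in this section can be summarized in a short Python sketch. It is only a software model for illustration: the bit pairs come from the text above, while the table layout and the function names `is_locked` and `locked_registers` are hypothetical and not part of the MCH programming model.

```python
# Lock-bit pairs as stated in the text; the dictionary itself is an
# illustrative construct, not a hardware structure.
LOCK_BITS = {
    "DRAM_SEC_ADD": (0, 8),
    "DRAM_SEC_SYNDROME": (0, 8),
    "DRAM_DED_ADD": (1, 9),
    "DRAM_SCRUB_ADD": (2, 10),
    "DRAM_RETRY_ADD": (5, 13),
}


def is_locked(log_reg, dram_ferr, dram_nerr):
    """Return True if either lock bit for log_reg is set in DRAM_FERR or DRAM_NERR."""
    mask = 0
    for bit in LOCK_BITS[log_reg]:
        mask |= 1 << bit
    # A set bit in either register is enough to lock the logging register.
    return bool((dram_ferr | dram_nerr) & mask)


def locked_registers(dram_ferr, dram_nerr):
    """List the logging registers currently held (hypothetical helper)."""
    return [name for name in LOCK_BITS if is_locked(name, dram_ferr, dram_nerr)]
```

For example, `locked_registers(0x0001, 0)` reports only the two correctable-error logging registers as locked, and clearing both FERR and NERR to zero releases all of them.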
5.11.3.8 Memory Error Counters
The MCH provides error counters for each DIMM (single-channel mode) or DIMM pair
(dual-channel mode). There is a correctable error counter and an uncorrectable error counter for
each DIMM or DIMM pair. These counters primarily support the DIMM sparing feature: their
values are compared against configurable threshold values to determine when sparing is invoked.
The values in these counters decay over time, because the quantity of interest is the number of
errors within a window of time, not the cumulative total.
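
The idea of a decaying counter compared against a threshold can be illustrated with a toy Python model. This section does not specify the decay rule or the exact comparison, so the per-tick subtraction, the `>=` test, and the names `DecayingErrorCounter`, `record_error`, and `tick` are all assumptions made for the sketch.

```python
class DecayingErrorCounter:
    """Toy model of a per-DIMM error counter whose value decays over time.

    Assumptions (not taken from the datasheet): the counter loses
    `decay_step` counts on every tick, and sparing would be requested
    once the count reaches `threshold`.
    """

    def __init__(self, threshold, decay_step=1):
        self.threshold = threshold
        self.decay_step = decay_step
        self.count = 0

    def record_error(self):
        """Count one error; return True if the threshold has been reached."""
        self.count += 1
        return self.count >= self.threshold

    def tick(self):
        """Apply one decay interval, never dropping below zero."""
        self.count = max(0, self.count - self.decay_step)
```

With a threshold of 3, three errors spread across many decay intervals never reach the threshold, while three errors arriving between two ticks do. That is the windowed behavior the text describes.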
5.11.3.9 PCI Express Errors and Errors on Behalf of PCI Express
MCH-specific error detection, masking, and escalation mechanisms operate on a parallel path to
their standardized counterparts included in the PCI Express Interface Specification, Rev 1.0a. PCI
Express errors are classified as either correctable or uncorrectable. Uncorrectable errors are further
broken down as fatal or non-fatal.
PCI Express-specified correctable errors are logged in the Correctable Error Status register (Device
2-7, Function 0, Offset 110-113h), unless they are masked by a corresponding bit in the
Correctable Error Detect Mask register (Device 2-7, Function 0, Offset 150-153h).
PCI Express-specified uncorrectable errors are logged in the Uncorrectable Error Status register
(Device 2-7, Function 0, Offset 104-107h), unless they are masked by a corresponding bit in the
Uncorrectable Error Detect Mask register (Device 2-7, Function 0, Offset 14C-14Fh). The
Uncorrectable Error Severity register (Device 2-7, Function 0, Offset 10C-10Fh) determines
whether bits in the Uncorrectable Error Status register are treated as uncorrectable fatal or
uncorrectable non-fatal errors. A Device Status register (6Eh) bit is set when a bit of the
corresponding category is set in the uncorrectable or correctable status registers.
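
The logging and severity rules above can be modeled in a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the registers are represented as plain integers rather than read from configuration space, and the encoding "severity bit = 1 means fatal" follows the usual PCI Express Advanced Error Reporting convention rather than anything stated in this section.

```python
def log_error(status, detect_mask, error_bit):
    """Return the status register value after an error on error_bit.

    A set bit in the detect mask suppresses logging of that error.
    """
    if detect_mask & (1 << error_bit):
        return status
    return status | (1 << error_bit)


def classify_uncorrectable(status, severity):
    """Split logged uncorrectable bits into (fatal, non_fatal) bit masks.

    Assumption: a severity bit of 1 marks the matching status bit as
    fatal, and 0 marks it as non-fatal.
    """
    fatal = status & severity
    non_fatal = status & ~severity
    return fatal, non_fatal
```

For instance, logging errors on bits 2 and 4 with no detect mask and a severity value of `0b10000` yields bit 4 as fatal and bit 2 as non-fatal.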
Reporting of non-masked error bits to the root complex hierarchy of PCI Express error registers is
controlled on three different levels. Individual errors are masked for reporting by the Uncorrectable
Error Mask (Device 2-7, Function 0, Offset 108-10Bh) and the Correctable Error Mask (Device
2-7, Function 0, Offset 114-117h) registers. Individual error category (fatal, non-fatal, correctable,