MCE can't trigger dimm error when error occurs on the purley platform #67
Comments
More information, please, e.g. a log?
When I trigger a corrected memory error, mcelog produces a log entry that includes the socket and channel of the failing memory. But the dimm-error-trigger script receives a socket number of 0, a channel number of 1, and a DIMM number of -1, which means it could not find the right DIMM.
Another question: how does mcelog get the DIMM number? Thank you!
In skylake_xeon.c, skylake_s_decode_model decodes the memory error from MC13, but Intel replied that this has been changed to a CSR and that MC13 is now reserved.
On Skylake, machine check banks 13-18 are used to report errors from the memory controllers (2 memory controllers on each socket, 3 channels on each memory controller, so 6 banks are needed in total). mcelog hasn't been able to convert addresses to DIMMs for a few generations of CPUs because the interleaving of addresses between memory controllers and channels isn't reported in the machine check bank. Load the skx_edac.ko Linux EDAC driver if you need errors decoded to a specific DIMM.
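For anyone who wants to check this on their own machine, here is a rough Python sketch, not part of mcelog. It does two things: it tests whether a bank number falls in the 13-18 range described above, and it reads per-DIMM error counters from the EDAC sysfs tree once an EDAC driver such as skx_edac is loaded. The function names are hypothetical. The sysfs root and the file names (`dimm_label`, `dimm_location`, `dimm_ce_count`, `dimm_ue_count`) are assumptions based on the usual kernel EDAC layout and may differ on your kernel.

```python
from pathlib import Path

# Bank range quoted in the comment above:
# 2 memory controllers x 3 channels = 6 banks, starting at bank 13.
IMC_BANKS = range(13, 13 + 2 * 3)  # banks 13..18


def is_imc_bank(bank):
    """Return True if the bank number is in the Skylake memory controller range."""
    return bank in IMC_BANKS


def _read(directory, name, default=""):
    """Read one sysfs attribute, returning a default if the file is missing."""
    path = directory / name
    return path.read_text().strip() if path.is_file() else default


def read_dimm_counts(edac_root="/sys/devices/system/edac/mc"):
    """Collect per-DIMM counters exposed by an EDAC driver.

    Assumes the mcN/dimmM directory layout; returns an empty list
    when no EDAC driver is loaded or the path does not exist.
    """
    root = Path(edac_root)
    if not root.is_dir():
        return []
    results = []
    for dimm_dir in sorted(root.glob("mc*/dimm*")):
        if not dimm_dir.is_dir():
            continue
        results.append({
            "mc": dimm_dir.parent.name,
            "dimm": dimm_dir.name,
            "label": _read(dimm_dir, "dimm_label"),
            "location": _read(dimm_dir, "dimm_location"),
            "ce_count": int(_read(dimm_dir, "dimm_ce_count", "0") or 0),
            "ue_count": int(_read(dimm_dir, "dimm_ue_count", "0") or 0),
        })
    return results


if __name__ == "__main__":
    for entry in read_dimm_counts():
        print(entry["mc"], entry["dimm"], entry["label"],
              "CE:", entry["ce_count"], "UE:", entry["ue_count"])
```

If the script prints nothing, the EDAC driver is probably not loaded. A DIMM whose corrected-error count rises after an injected error is the one the driver decoded the address to.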