
MCE can't trigger DIMM error when an error occurs on the Purley platform #67

Open
jian265 opened this issue May 2, 2018 · 5 comments

@jian265

jian265 commented May 2, 2018

No description provided.

@andikleen
Owner

More information, please. For example, a log?

@jian265
Author

jian265 commented May 7, 2018

When I trigger a corrected memory error, mcelog produces a log entry that includes the socket and channel of the failing memory. But the dimm-error-trigger receives a result with socket number 0, channel number 1, and DIMM number -1, which means it can't find the right DIMM number.

@jian265
Author

jian265 commented May 7, 2018

Another question: how does MCE get the DIMM number? Thank you!

@jian265
Author

jian265 commented May 8, 2018

In the file skylake_xeon.c, skylake_s_decode_model decodes memory errors using mc13, but Intel replied that this has been changed to CSR and that mc13 is reserved:
```c
case 16: case 17: case 18:
    Wprintf("MemCtrl: ");
    /* bit 27 of the bank's status selects which bit-field table to decode with */
    if (EXTRACT(status, 27, 27))
        decode_bitfield(status, memctrl_mc13);
    else
        decode_bitfield(status, mc_bits);
    break;
```
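The branch quoted above can be sketched in isolation as follows. This is a minimal illustration, assuming the standard MCi_STATUS layout in which bits 31:16 hold the model-specific error code (MSCOD), so bit 27 falls inside that field. `extract_bits` and `mc_table_for` are hypothetical helpers written for this example; they are not mcelog's `EXTRACT` macro or decoder, and the table names are returned as strings only to show which one would be chosen.

```c
#include <stddef.h>
#include <stdint.h>

/* Return bits lo..hi (inclusive) of v. Illustrative helper that plays
 * the role of EXTRACT() in the quoted snippet; not copied from mcelog. */
static uint64_t extract_bits(uint64_t v, unsigned lo, unsigned hi)
{
    unsigned width = hi - lo + 1;
    uint64_t mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);
    return (v >> lo) & mask;
}

/* Mirror the quoted branch: for banks 16-18, bit 27 of MCi_STATUS
 * (inside MSCOD, bits 31:16) picks the bit-field table.
 * Returns the table's name, or NULL for any other bank. */
static const char *mc_table_for(int bank, uint64_t status)
{
    switch (bank) {
    case 16: case 17: case 18:
        return extract_bits(status, 27, 27) ? "memctrl_mc13" : "mc_bits";
    default:
        return NULL;
    }
}
```

Note that this branch only chooses how to print the status bits; nothing in it identifies a DIMM.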

@aegl
Collaborator

aegl commented May 10, 2018

On Skylake, machine check banks 13-18 are used to report errors from the memory controllers (2 memory controllers on each socket, 3 channels on each memory controller, so 6 banks are needed in total).
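The bank arithmetic in this comment (2 controllers × 3 channels = 6 banks) can be sketched as below. The sequential order used here (bank 13 is controller 0 channel 0, bank 18 is controller 1 channel 2) is an assumption made only for illustration, since the comment does not say how banks are assigned; `bank_to_location` and `struct mc_location` are hypothetical names.

```c
#include <stdbool.h>

/* 2 memory controllers per socket x 3 channels per controller
 * = 6 machine check banks, starting at bank 13. */
enum { SKX_FIRST_MC_BANK = 13, SKX_IMCS = 2, SKX_CHANNELS_PER_IMC = 3 };

struct mc_location { int imc; int channel; };

/* HYPOTHETICAL mapping: assumes banks are assigned in order
 * (13 -> IMC0 ch0, 14 -> IMC0 ch1, ..., 18 -> IMC1 ch2).
 * The real hardware assignment may differ. */
static bool bank_to_location(int bank, struct mc_location *loc)
{
    int idx = bank - SKX_FIRST_MC_BANK;

    if (idx < 0 || idx >= SKX_IMCS * SKX_CHANNELS_PER_IMC)
        return false;  /* not a memory controller bank */
    loc->imc = idx / SKX_CHANNELS_PER_IMC;
    loc->channel = idx % SKX_CHANNELS_PER_IMC;
    return true;
}
```

Even under this assumption, the bank identifies at most a controller and channel, not a DIMM slot, which is consistent with the trigger seeing a DIMM number of -1.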

mcelog hasn't been able to convert addresses to DIMMs for a few generations of CPUs because interleaving of addresses between memory controllers and channels isn't reported in the machine check bank.
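To see why the bank contents alone are not enough, consider a toy model of channel interleaving. This is a deliberate simplification: the Linux skx_edac driver decodes addresses by walking the platform's SAD, TAD and RIR decode registers, which is far more involved. `toy_channel_for`, `granularity` and `ways` are hypothetical; the point is only that the same physical address lands on different channels depending on configuration that the machine check bank does not report.

```c
#include <stdint.h>

/* Toy channel interleave, for illustration only.
 * granularity: interleave chunk size in bytes (hypothetical parameter)
 * ways:        number of channels interleaved across (hypothetical)
 * Both would have to be read from the platform, not from the MC bank. */
static unsigned toy_channel_for(uint64_t addr, uint64_t granularity, unsigned ways)
{
    if (granularity == 0 || ways == 0)
        return 0;  /* invalid configuration; avoid dividing by zero */
    return (unsigned)((addr / granularity) % ways);
}
```

For example, address 0x1000 maps to channel 1 with 4 KiB granularity across 3 ways, but to channel 0 with 256-byte granularity across 2 ways.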

Load the skx_edac.ko Linux EDAC driver if you need errors decoded to a specific DIMM.
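Once skx_edac is loaded, the EDAC core exposes per-DIMM error counters in sysfs. The sketch below sums them, assuming a kernel that provides per-DIMM directories such as `/sys/devices/system/edac/mc/mc0/dimm0/dimm_ce_count`; the function name `sum_dimm_ce_counts` is hypothetical, and the glob pattern is passed in so it can be pointed elsewhere for testing.

```c
#define _POSIX_C_SOURCE 200809L
#include <glob.h>
#include <stddef.h>
#include <stdio.h>

/* Sum the EDAC corrected-error counters of every file matching
 * `pattern`, printing each one. Returns the number of counters read
 * (0 if nothing matched) or -1 on a glob error; the sum goes to *total. */
static int sum_dimm_ce_counts(const char *pattern, unsigned long *total)
{
    glob_t g;
    int nread = 0;
    int rc = glob(pattern, 0, NULL, &g);

    *total = 0;
    if (rc != 0) {
        globfree(&g);
        /* GLOB_NOMATCH: driver not loaded or no per-DIMM directories */
        return rc == GLOB_NOMATCH ? 0 : -1;
    }
    for (size_t i = 0; i < g.gl_pathc; i++) {
        unsigned long n;
        FILE *f = fopen(g.gl_pathv[i], "r");

        if (f == NULL)
            continue;  /* skip unreadable entries */
        if (fscanf(f, "%lu", &n) == 1) {
            printf("%s: %lu\n", g.gl_pathv[i], n);
            *total += n;
            nread++;
        }
        fclose(f);
    }
    globfree(&g);
    return nread;
}
```

Call it with `"/sys/devices/system/edac/mc/mc*/dimm*/dimm_ce_count"`; a return value of 0 usually means the EDAC driver is not loaded or exposes no per-DIMM directories. The neighboring `dimm_label` and `dimm_location` files identify which slot each counter belongs to.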


3 participants