
/mce/tests/page test failure #90

Open
lumanyu180 opened this issue May 18, 2021 · 6 comments

Comments

@lumanyu180

Hi,
I am testing mcelog, but I got this error:
page-soft-then-hard.conf: triggers did not trigger as expected: 4 != 6
My mcelog version is: mcelog mcelog-144-10.94d853b2ea81.el7
I see that the test injects 6 errors, and page-soft-then-hard.conf sets memory-ce-threshold = 1 / 1h, so I think mcelog should run the trigger 6 times. Why does page-soft-then-hard.conf say "trigger: 4"?

@andikleen I hope for your reply. Thanks very much!
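The arithmetic behind expecting 6 trigger runs can be modeled as a per-page threshold counter. This is a minimal Python sketch under stated assumptions, not mcelog's code (mcelog has its own leaky-bucket accounting, which may behave differently). The names `count_triggers`, `THRESHOLD`, `WINDOW_SECONDS` and `PAGE_SHIFT` are hypothetical. It assumes that "1 / 1h" means one corrected error on a page within one hour is enough to fire the trigger, and that errors are grouped by 4 KiB page. The six (TIME, ADDR) pairs come from the MCE records in the page-soft-then-hard.log posted in this thread.

```python
from collections import defaultdict

# Hypothetical model of "memory-ce-threshold = 1 / 1h"; NOT mcelog's implementation.
THRESHOLD = 1          # errors needed to fire the trigger (assumed meaning of "1")
WINDOW_SECONDS = 3600  # "1h"
PAGE_SHIFT = 12        # assumption: errors are grouped per 4 KiB page


def count_triggers(events, threshold=THRESHOLD, window=WINDOW_SECONDS):
    """Count trigger firings for (timestamp, physical_address) error events.

    The trigger fires when `threshold` errors on the same page fall within
    `window` seconds; that page's count is then reset (an assumption of
    this sketch).
    """
    recent = defaultdict(list)  # page -> timestamps of errors still in the window
    fired = 0
    for ts, addr in events:
        page = addr >> PAGE_SHIFT
        # Drop errors that have aged out of the window, then record this one.
        window_hits = [t for t in recent[page] if ts - t < window]
        window_hits.append(ts)
        if len(window_hits) >= threshold:
            fired += 1
            window_hits = []
        recent[page] = window_hits
    return fired


# TIME and ADDR values of MCE 0..5 from page-soft-then-hard.log in this thread.
EVENTS = [
    (946951762, 0x1e9f000),
    (946951762, 0x1e9f000),
    (946951762, 0x1e9f000),
    (946951762, 0x760f000),
    (946951762, 0x7d7d000),
    (946951762, 0x5381000),
]

if __name__ == "__main__":
    print(count_triggers(EVENTS))               # 6: with threshold 1, every error fires
    print(count_triggers(EVENTS, threshold=2))  # 1: only page 0x1e9f reaches 2 errors
```

Under this model, a threshold of 1 makes every error fire on its own, which gives 6. Whether the test's expected count also depends on other settings in page-soft-then-hard.conf is not something this sketch tries to capture.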

@andikleen
Owner

andikleen commented May 19, 2021 via email

@lumanyu180
Author

Hi,

Sorry, I don't understand what you mean. I just want to say that the test suite failed, and I don't know why. Other test suites, for example cache, succeed: when I run the cache test suite, the result is "cache.conf: triggers trigger as expected !". But when I run the page test suite, the result is "page-soft-then-hard.conf: triggers did not trigger as expected: 4 != 6". Why?

I hope for your reply.
Thanks very much!
@andikleen

@andikleen
Owner

I don't know. It could be either the kernel or mcelog. Do you use a standard kernel? What does the system log say?

@lumanyu180
Author

Hi,
I use a standard 4.9.29 kernel. This is the system log:
[root@H3C page]# cat page-soft-then-hard.log
mcelog: failed to prefill DIMM database from DMI data
Kernel does not support page offline interface
Running trigger ../trigger'
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 2
MISC 0 ADDR 1e9f000
TIME 946951762 Tue Jan 4 02:09:22 2000
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER AC_CHANNEL0_ERR
Transaction: Address/Command error
STATUS 8c000000000000b0 MCGSTATUS 0
MCGCAP 1000c16 APICID 0 SOCKETID 0
PPIN 8c000140000000b1
CPUID Vendor Intel Family 6 Model 86
Running trigger ../trigger'
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 2
MISC 50000 ADDR 1e9f000
TIME 946951762 Tue Jan 4 02:09:22 2000
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER AC_CHANNEL1_ERR
Transaction: Address/Command error
STATUS 8c000140000000b1 MCGSTATUS 0
MCGCAP 1000c16 APICID 0 SOCKETID 0
PPIN 8c000000000000b0
CPUID Vendor Intel Family 6 Model 86
Running trigger ../trigger'
Hardware event. This is not a software error.
MCE 2
CPU 0 BANK 2
MISC 0 ADDR 1e9f000
TIME 946951762 Tue Jan 4 02:09:22 2000
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER AC_CHANNEL0_ERR
Transaction: Address/Command error
STATUS 8c000000000000b0 MCGSTATUS 0
MCGCAP 1000c16 APICID 0 SOCKETID 0
PPIN 8c000000000000b0
CPUID Vendor Intel Family 6 Model 86
Running trigger ../trigger'
Hardware event. This is not a software error.
MCE 3
CPU 0 BANK 2
MISC 0 ADDR 760f000
TIME 946951762 Tue Jan 4 02:09:22 2000
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER AC_CHANNEL0_ERR
Transaction: Address/Command error
STATUS 8c000000000000b0 MCGSTATUS 0
MCGCAP 1000c16 APICID 0 SOCKETID 0
PPIN 8c000000000000b0
CPUID Vendor Intel Family 6 Model 86
Running trigger ../trigger'
mcelog: Too many trigger children running already
Hardware event. This is not a software error.
MCE 4
CPU 0 BANK 2
MISC 0 ADDR 7d7d000
TIME 946951762 Tue Jan 4 02:09:22 2000
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER AC_CHANNEL0_ERR
Transaction: Address/Command error
STATUS 8c000000000000b0 MCGSTATUS 0
MCGCAP 1000c16 APICID 0 SOCKETID 0
PPIN 8c000000000000b0
CPUID Vendor Intel Family 6 Model 86
Running trigger ../trigger'
mcelog: Too many trigger children running already
Hardware event. This is not a software error.
MCE 5
CPU 0 BANK 2
MISC 0 ADDR 5381000
TIME 946951762 Tue Jan 4 02:09:22 2000
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER AC_CHANNEL0_ERR
Transaction: Address/Command error
STATUS 8c000000000000b0 MCGSTATUS 0
MCGCAP 1000c16 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 86
[root@H3C page]#

Can you tell me why page-soft-then-hard.conf specifies "# trigger: 4"? I think the test injects 6 errors, so why is the trigger count 4? The page-soft-then-hard.log shows there are 6 errors.
I hope for your reply.
Thanks very much!
@andikleen

@QiuxuZhuo

QiuxuZhuo commented Jun 17, 2021

@lumanyu180
From your log ("cat page-soft-then-hard.log"), I can only see that 3 errors were injected and the script was triggered 3 times. The number of injected errors matches the number of trigger runs, so I don't see the mismatch. Did you paste the full log?
-Qiuxu

@lumanyu180
Author


Hi,
Thanks for your reply. I pasted the full log. I count 6 occurrences of "Running trigger ../trigger" and 6 MCE records (MCE 0, MCE 1, MCE 2, MCE 3, MCE 4, MCE 5). So I think it injects 6 errors, and the trigger also runs 6 times.
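Counts like these can be checked mechanically with a small script over the log. The sketch below is a hypothetical helper (`count_log_events` is not part of mcelog or its test suite). It assumes one message per line, as in the log above, and tallies the "mcelog: Too many trigger children running already" messages, which appear twice in that log, separately from the "Running trigger" lines rather than assuming what each of them means.

```python
import re

# Hypothetical helper, not part of mcelog: tally the messages discussed in this thread.
MCE_RECORD = re.compile(r"MCE \d+")


def count_log_events(log_text):
    """Return counts of MCE records, "Running trigger" lines and
    "Too many trigger children" messages, assuming one message per line."""
    counts = {"mce_records": 0, "running_trigger": 0, "too_many_children": 0}
    for line in log_text.splitlines():
        line = line.strip()
        if MCE_RECORD.fullmatch(line):
            counts["mce_records"] += 1
        elif line.startswith("Running trigger"):
            counts["running_trigger"] += 1
        elif "Too many trigger children running already" in line:
            counts["too_many_children"] += 1
    return counts


# Abridged excerpt of page-soft-then-hard.log (most register lines dropped).
SAMPLE_LOG = """\
Running trigger ../trigger'
Hardware event. This is not a software error.
MCE 3
CPU 0 BANK 2
Running trigger ../trigger'
mcelog: Too many trigger children running already
Hardware event. This is not a software error.
MCE 4
"""

if __name__ == "__main__":
    # prints {'mce_records': 2, 'running_trigger': 2, 'too_many_children': 1}
    print(count_log_events(SAMPLE_LOG))
```

To run it on the full log, read the file with `open("page-soft-then-hard.log").read()` and pass the text to `count_log_events`.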

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants