[Bug/Help] Visiting the dashboard page returns a 500 error and crashes the slurm adapter #1427

Open
1 task done
link89 opened this issue Sep 13, 2024 · 0 comments

link89 commented Sep 13, 2024

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

What happened

Visiting the dashboard returns a 500 error, and scow-slurm-adapter is observed to crash at the same time. The adapter-side details are reported in a separate issue:
PKUHPC/scow-slurm-adapter#20

What did you expect to happen

The dashboard loads normally.

Did this work before?

Unknown

Steps To Reproduce

Open the dashboard page.

Environment

- OS: CentOS 7.4
- Scheduler:
- Docker:
- Docker-compose:
- SCOW cli:
- SCOW: 1.6.3
- Adapter: 1.6.0

Anything else?

The complete backend log at the time of the error is as follows:

auth-1           | {"level":30,"time":"2024-09-13T02:43:56.868Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jm","req":{"method":"GET","url":"/public/validateToken?token=1fcf1865-30b1-481f-a3fa-d8c0550bf3f3","hostname":"auth:5000","remoteAddress":"192.168.6.1","remotePort":46992},"msg":"incoming request"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:56.869Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jm","res":{"statusCode":200},"responseTime":1.267364501953125,"msg":"request completed"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:56.872Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jn","req":{"method":"GET","url":"/user?identityId=whxu","hostname":"auth:5000","remoteAddress":"192.168.6.1","remotePort":46994},"msg":"incoming request"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:57.039Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jn","msg":"Command execCommand getent passwd whxu, options %o"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:57.040Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jn","res":{"statusCode":200},"responseTime":167.78284358978271,"msg":"request completed"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:57.049Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jo","req":{"method":"GET","url":"/public/validateToken?token=1fcf1865-30b1-481f-a3fa-d8c0550bf3f3","hostname":"auth:5000","remoteAddress":"192.168.6.1","remotePort":46992},"msg":"incoming request"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:57.050Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jo","res":{"statusCode":200},"responseTime":1.0001049041748047,"msg":"request completed"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:57.051Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jp","req":{"method":"GET","url":"/user?identityId=whxu","hostname":"auth:5000","remoteAddress":"192.168.6.1","remotePort":46994},"msg":"incoming request"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:57.208Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jp","msg":"Command execCommand getent passwd whxu, options %o"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:57.208Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jp","res":{"statusCode":200},"responseTime":156.89488220214844,"msg":"request completed"}
portal-server-1  | {"level":30,"time":"2024-09-13T02:43:57.213Z","pid":18,"hostname":"8a5ee9512306","req":"2u","path":"/scow.common.ConfigService/GetClusterConfigFiles","msg":"Starting request"}
portal-server-1  | {"level":30,"time":"2024-09-13T02:43:57.283Z","pid":18,"hostname":"8a5ee9512306","req":"2u","path":"/scow.common.ConfigService/GetClusterConfigFiles","msg":"Request completed."}
auth-1           | {"level":30,"time":"2024-09-13T02:43:57.292Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jq","req":{"method":"GET","url":"/public/validateToken?token=1fcf1865-30b1-481f-a3fa-d8c0550bf3f3","hostname":"auth:5000","remoteAddress":"192.168.6.1","remotePort":46992},"msg":"incoming request"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:57.294Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jq","res":{"statusCode":200},"responseTime":1.0684814453125,"msg":"request completed"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:57.295Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jr","req":{"method":"GET","url":"/user?identityId=whxu","hostname":"auth:5000","remoteAddress":"192.168.6.1","remotePort":46994},"msg":"incoming request"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:57.453Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jr","msg":"Command execCommand getent passwd whxu, options %o"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:57.453Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jr","res":{"statusCode":200},"responseTime":158.34103775024414,"msg":"request completed"}
mis-server-1     | {"level":30,"time":"2024-09-13T02:43:57.458Z","pid":19,"hostname":"98fd1a8e6509","req":"8z","path":"/scow.server.ConfigService/GetClustersRuntimeInfo","msg":"Starting request"}
mis-server-1     | {"level":30,"time":"2024-09-13T02:43:57.462Z","pid":19,"hostname":"98fd1a8e6509","req":"8z","path":"/scow.server.ConfigService/GetClustersRuntimeInfo","msg":"Current clusters list: Cluster ID: chenglab, Current Status: ACTIVATED"}
mis-server-1     | {"level":30,"time":"2024-09-13T02:43:57.462Z","pid":19,"hostname":"98fd1a8e6509","req":"8z","path":"/scow.server.ConfigService/GetClustersRuntimeInfo","msg":"Request completed."}
gateway-1        | 192.168.6.1 - - [13/Sep/2024:02:43:57 +0000] "GET /dashboard HTTP/1.1" 200 39689 "http://localhost:8081/auth/public/auth?callbackUrl=http%3A%2F%2Flocalhost%3A8081%2Fapi%2Fauth%2Fcallback" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0"
auth-1           | {"level":30,"time":"2024-09-13T02:43:58.064Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-js","req":{"method":"GET","url":"/public/validateToken?token=1fcf1865-30b1-481f-a3fa-d8c0550bf3f3","hostname":"auth:5000","remoteAddress":"192.168.6.1","remotePort":46992},"msg":"incoming request"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:58.065Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-js","res":{"statusCode":200},"responseTime":1.2494373321533203,"msg":"request completed"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:58.067Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jt","req":{"method":"GET","url":"/user?identityId=whxu","hostname":"auth:5000","remoteAddress":"192.168.6.1","remotePort":46994},"msg":"incoming request"}
portal-server-1  | {"level":30,"time":"2024-09-13T02:43:58.093Z","pid":18,"hostname":"8a5ee9512306","req":"2v","path":"/scow.portal.AppService/ListAvailableApps","msg":"Starting request"}
mis-server-1     | {"level":30,"time":"2024-09-13T02:43:58.097Z","pid":19,"hostname":"98fd1a8e6509","req":"90","path":"/scow.server.ConfigService/GetClustersRuntimeInfo","msg":"Starting request"}
mis-server-1     | {"level":30,"time":"2024-09-13T02:43:58.099Z","pid":19,"hostname":"98fd1a8e6509","req":"90","path":"/scow.server.ConfigService/GetClustersRuntimeInfo","msg":"Current clusters list: Cluster ID: chenglab, Current Status: ACTIVATED"}
mis-server-1     | {"level":30,"time":"2024-09-13T02:43:58.100Z","pid":19,"hostname":"98fd1a8e6509","req":"90","path":"/scow.server.ConfigService/GetClustersRuntimeInfo","msg":"Request completed."}
portal-server-1  | {"level":30,"time":"2024-09-13T02:43:58.101Z","pid":18,"hostname":"8a5ee9512306","msg":"Checking activation status of clusters with ids ([\"chenglab\"]) "}
portal-server-1  | {"level":30,"time":"2024-09-13T02:43:58.101Z","pid":18,"hostname":"8a5ee9512306","req":"2v","path":"/scow.portal.AppService/ListAvailableApps","msg":"Request completed."}
gateway-1        | 192.168.6.1 - - [13/Sep/2024:02:43:58 +0000] "GET /api/app/listAvailableApps?cluster=chenglab HTTP/1.1" 304 0 "http://localhost:8081/dashboard" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0"
auth-1           | {"level":30,"time":"2024-09-13T02:43:58.136Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-ju","req":{"method":"GET","url":"/public/validateToken?token=1fcf1865-30b1-481f-a3fa-d8c0550bf3f3","hostname":"auth:5000","remoteAddress":"192.168.6.1","remotePort":46992},"msg":"incoming request"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:58.142Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-ju","res":{"statusCode":200},"responseTime":6.131214141845703,"msg":"request completed"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:58.145Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jv","req":{"method":"GET","url":"/user?identityId=whxu","hostname":"auth:5000","remoteAddress":"192.168.6.1","remotePort":47024},"msg":"incoming request"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:58.236Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jt","msg":"Command execCommand getent passwd whxu, options %o"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:58.236Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jt","res":{"statusCode":200},"responseTime":168.90215969085693,"msg":"request completed"}
portal-server-1  | {"level":30,"time":"2024-09-13T02:43:58.241Z","pid":18,"hostname":"8a5ee9512306","req":"2w","path":"/scow.portal.DashboardService/GetQuickEntries","msg":"Starting request"}
portal-server-1  | {"level":30,"time":"2024-09-13T02:43:58.241Z","pid":18,"hostname":"8a5ee9512306","req":"2w","path":"/scow.portal.DashboardService/GetQuickEntries","msg":"Request completed."}
gateway-1        | 192.168.6.1 - - [13/Sep/2024:02:43:58 +0000] "GET /api/dashboard/getQuickEntries HTTP/1.1" 304 0 "http://localhost:8081/dashboard" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0"
gateway-1        | 192.168.6.1 - - [13/Sep/2024:02:43:58 +0000] "GET /manifest.json HTTP/1.1" 304 0 "http://localhost:8081/dashboard" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0"
auth-1           | {"level":30,"time":"2024-09-13T02:43:58.302Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jv","msg":"Command execCommand getent passwd whxu, options %o"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:58.302Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jv","res":{"statusCode":200},"responseTime":157.04853534698486,"msg":"request completed"}
portal-server-1  | {"level":30,"time":"2024-09-13T02:43:58.306Z","pid":18,"hostname":"8a5ee9512306","req":"2x","path":"/scow.portal.ConfigService/GetClusterInfo","msg":"Starting request"}
mis-server-1     | {"level":30,"time":"2024-09-13T02:43:58.309Z","pid":19,"hostname":"98fd1a8e6509","req":"91","path":"/scow.server.ConfigService/GetClustersRuntimeInfo","msg":"Starting request"}
mis-server-1     | {"level":30,"time":"2024-09-13T02:43:58.311Z","pid":19,"hostname":"98fd1a8e6509","req":"91","path":"/scow.server.ConfigService/GetClustersRuntimeInfo","msg":"Current clusters list: Cluster ID: chenglab, Current Status: ACTIVATED"}
mis-server-1     | {"level":30,"time":"2024-09-13T02:43:58.311Z","pid":19,"hostname":"98fd1a8e6509","req":"91","path":"/scow.server.ConfigService/GetClustersRuntimeInfo","msg":"Request completed."}
portal-server-1  | {"level":30,"time":"2024-09-13T02:43:58.313Z","pid":18,"hostname":"8a5ee9512306","msg":"Checking activation status of clusters with ids ([\"chenglab\"]) "}
portal-server-1  | {"level":30,"time":"2024-09-13T02:43:58.313Z","pid":18,"hostname":"8a5ee9512306","req":"2x","path":"/scow.portal.ConfigService/GetClusterInfo","msg":"Calling actions on cluster chenglab"}
portal-server-1  | {"level":50,"time":"2024-09-13T02:43:58.453Z","pid":18,"hostname":"8a5ee9512306","req":"2x","path":"/scow.portal.ConfigService/GetClusterInfo","msg":"Cluster ops fails at {\"code\":1,\"details\":\"Call cancelled\",\"metadata\":{}}"}
portal-server-1  | {"level":50,"time":"2024-09-13T02:43:58.453Z","pid":18,"hostname":"8a5ee9512306","req":"2x","path":"/scow.portal.ConfigService/GetClusterInfo","err":{"type":"ServiceError","message":"","stack":"Error\n    at /app/apps/portal-server/build/utils/clusters.js:52:19\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async callOnOne (/app/apps/portal-server/build/utils/clusters.js:42:12)\n    at async Object.getClusterInfo (/app/apps/portal-server/build/services/config.js:75:27)\n    at async augmentedImplementations.<computed> [as getClusterInfo] (/app/node_modules/.pnpm/@[email protected]_@[email protected]/node_modules/@ddadaal/tsgrpc-server/lib/server.js:76:33)","code":13,"details":"Cluster ID : chenglab, Details : Error: 1 CANCELLED: Call cancelled","metadata":{"is_scow_error":["1"],"scow_error_code":["ADAPTER_CALL_ON_ONE_ERROR"],"clustererrors":["[{\"clusterId\":\"chenglab\",\"details\":{\"code\":1,\"details\":\"Call cancelled\",\"metadata\":{}}}]"]}},"msg":"Error occurred. Return the error."}
gateway-1        | 192.168.6.1 - - [13/Sep/2024:02:43:58 +0000] "GET /api/dashboard/getClusterInfo?clusterId=chenglab HTTP/1.1" 500 226 "http://localhost:8081/dashboard" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0"
auth-1           | {"level":30,"time":"2024-09-13T02:43:58.497Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jw","req":{"method":"GET","url":"/public/validateToken?token=1fcf1865-30b1-481f-a3fa-d8c0550bf3f3","hostname":"auth:5000","remoteAddress":"192.168.6.1","remotePort":46992},"msg":"incoming request"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:58.498Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jw","res":{"statusCode":200},"responseTime":1.2353410720825195,"msg":"request completed"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:58.500Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jx","req":{"method":"GET","url":"/user?identityId=whxu","hostname":"auth:5000","remoteAddress":"192.168.6.1","remotePort":46994},"msg":"incoming request"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:58.649Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jx","msg":"Command execCommand getent passwd whxu, options %o"}
auth-1           | {"level":30,"time":"2024-09-13T02:43:58.650Z","pid":18,"hostname":"dfa1a4e21d46","reqId":"req-jx","res":{"statusCode":200},"responseTime":149.41473960876465,"msg":"request completed"}
portal-server-1  | {"level":30,"time":"2024-09-13T02:43:58.654Z","pid":18,"hostname":"8a5ee9512306","req":"2y","path":"/scow.portal.ConfigService/GetClusterNodesInfo","msg":"Starting request"}
mis-server-1     | {"level":30,"time":"2024-09-13T02:43:58.657Z","pid":19,"hostname":"98fd1a8e6509","req":"92","path":"/scow.server.ConfigService/GetClustersRuntimeInfo","msg":"Starting request"}
mis-server-1     | {"level":30,"time":"2024-09-13T02:43:58.660Z","pid":19,"hostname":"98fd1a8e6509","req":"92","path":"/scow.server.ConfigService/GetClustersRuntimeInfo","msg":"Current clusters list: Cluster ID: chenglab, Current Status: ACTIVATED"}
mis-server-1     | {"level":30,"time":"2024-09-13T02:43:58.660Z","pid":19,"hostname":"98fd1a8e6509","req":"92","path":"/scow.server.ConfigService/GetClustersRuntimeInfo","msg":"Request completed."}
portal-server-1  | {"level":30,"time":"2024-09-13T02:43:58.661Z","pid":18,"hostname":"8a5ee9512306","msg":"Checking activation status of clusters with ids ([\"chenglab\"]) "}
portal-server-1  | {"level":30,"time":"2024-09-13T02:43:58.661Z","pid":18,"hostname":"8a5ee9512306","req":"2y","path":"/scow.portal.ConfigService/GetClusterNodesInfo","msg":"Calling actions on cluster chenglab"}
portal-server-1  | {"level":50,"time":"2024-09-13T02:43:58.664Z","pid":18,"hostname":"8a5ee9512306","req":"2y","path":"/scow.portal.ConfigService/GetClusterNodesInfo","msg":"Cluster ops fails at {\"code\":12,\"message\":\"unimplemented\",\"details\":\"The scheduler API version can not be confirmed.To use this method, the scheduler adapter must be upgraded to the version 1.6.0 or higher.\"}"}
portal-server-1  | {"level":50,"time":"2024-09-13T02:43:58.664Z","pid":18,"hostname":"8a5ee9512306","req":"2y","path":"/scow.portal.ConfigService/GetClusterNodesInfo","code":12,"message":"unimplemented","details":"The scheduler API version can not be confirmed.To use this method, the scheduler adapter must be upgraded to the version 1.6.0 or higher.","msg":"Error occurred. Return the error."}
portal-web-1     | Error: 12 UNIMPLEMENTED: The scheduler API version can not be confirmed.To use this method, the scheduler adapter must be upgraded to the version 1.6.0 or higher.
portal-web-1     |     at callErrorFromStatus (/app/node_modules/.pnpm/@[email protected]/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
portal-web-1     |     at Object.onReceiveStatus (/app/node_modules/.pnpm/@[email protected]/node_modules/@grpc/grpc-js/build/src/client.js:193:76)
portal-web-1     |     at Object.onReceiveStatus (/app/node_modules/.pnpm/@[email protected]/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
portal-web-1     |     at Object.onReceiveStatus (/app/node_modules/.pnpm/@[email protected]/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
portal-web-1     |     at /app/node_modules/.pnpm/@[email protected]/node_modules/@grpc/grpc-js/build/src/resolving-call.js:129:78
portal-web-1     |     at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
portal-web-1     | for call at
portal-web-1     |     at ServiceClientImpl.makeUnaryRequest (/app/node_modules/.pnpm/@[email protected]/node_modules/@grpc/grpc-js/build/src/client.js:161:32)
portal-web-1     |     at ServiceClientImpl.getClusterNodesInfo (/app/node_modules/.pnpm/@[email protected]/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
portal-web-1     |     at /app/node_modules/.pnpm/@[email protected]_@[email protected]/node_modules/@ddadaal/tsgrpc-client/lib/unary.js:18:13
portal-web-1     |     at new Promise (<anonymous>)
portal-web-1     |     at asyncClientCall (/app/node_modules/.pnpm/@[email protected]_@[email protected]/node_modules/@ddadaal/tsgrpc-client/lib/unary.js:15:12)
portal-web-1     |     at /app/apps/portal-web/.next/server/pages/api/dashboard/getClusterNodesInfo.js:1:1935
portal-web-1     |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
portal-web-1     |   code: 12,
portal-web-1     |   details: 'The scheduler API version can not be confirmed.To use this method, the scheduler adapter must be upgraded to the version 1.6.0 or higher.',
portal-web-1     |   metadata: Metadata {
portal-web-1     |     internalRepr: Map(2) { 'content-type' => [Array], 'date' => [Array] },
portal-web-1     |     options: {}
portal-web-1     |   }
portal-web-1     | }
gateway-1        | 192.168.6.1 - - [13/Sep/2024:02:43:58 +0000] "GET /api/dashboard/getClusterNodesInfo?cluster=chenglab HTTP/1.1" 500 32 "http://localhost:8081/dashboard" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0"
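
To narrow a log like the one above down to the failing calls, a small filter can help. This is only a sketch: it assumes the `service | {json}` layout shown in the log, and it assumes the numeric `level` field follows the common convention where 30 is info and 50 is error. The helper name `error_entries` is hypothetical and is not part of SCOW.

```python
import json


def error_entries(lines, min_level=50):
    """Yield (service, record) for JSON log lines at or above min_level.

    Assumes each line looks like "<service> | <payload>", as in the
    docker-compose output above. Non-JSON payloads (gateway access logs,
    stack traces) are skipped.
    """
    for line in lines:
        service, sep, payload = line.partition("|")
        if not sep:
            continue  # no "service | payload" separator on this line
        payload = payload.strip()
        if not payload.startswith("{"):
            continue  # not a JSON log record
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # e.g. the tail of a multi-line stack trace
        if not isinstance(record, dict):
            continue
        level = record.get("level")
        if isinstance(level, int) and level >= min_level:
            yield service.strip(), record


# Shortened sample lines modelled on the log above.
sample = [
    'portal-server-1  | {"level":30,"path":"/scow.portal.ConfigService/GetClusterInfo","msg":"Starting request"}',
    'portal-server-1  | {"level":50,"path":"/scow.portal.ConfigService/GetClusterInfo","msg":"Error occurred. Return the error."}',
    'gateway-1        | 192.168.6.1 - - [13/Sep/2024:02:43:58 +0000] "GET /manifest.json HTTP/1.1" 304 0',
]

found = list(error_entries(sample))
for service, record in found:
    print(service, record.get("path"), record["msg"])
```

Applied to the full log, a filter like this would surface the two `portal-server-1` failures (`GetClusterInfo` and `GetClusterNodesInfo`) without the surrounding `auth-1` request noise.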