[Bug/Help] Accessing mis/init keeps reporting a 500 error #1439

Open
1 task done
xaserver opened this issue Sep 23, 2024 · 2 comments

Comments

@xaserver

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

What happened

The mis-server and portal-server containers keep restarting:
[root@master HMS]# ./cli compose ps
INFO: Loaded plugins: []
WARN[0000] /opt/HPCStack/HMS/docker-compose-1727059507151.yml: the attribute version is obsolete, it will be ignored, please remove it to avoid potential confusion
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
hms-audit-db-1 mysql:8 "docker-entrypoint.s…" audit-db About a minute ago Up About a minute 3306/tcp, 33060/tcp
hms-audit-server-1 mirrors.pku.edu.cn/pkuhpc-icode/scow:master "./entrypoint.sh" audit-server About a minute ago Up 50 seconds 80/tcp, 3000/tcp, 5000/tcp
hms-auth-1 mirrors.pku.edu.cn/pkuhpc-icode/scow:master "./entrypoint.sh" auth About a minute ago Up About a minute 80/tcp, 3000/tcp, 5000/tcp
hms-db-1 mysql:8 "docker-entrypoint.s…" db About a minute ago Up About a minute 3306/tcp, 33060/tcp
hms-gateway-1 mirrors.pku.edu.cn/pkuhpc-icode/scow:master "./entrypoint.sh" gateway About a minute ago Up About a minute 3000/tcp, 0.0.0.0:80->80/tcp, :::80->80/tcp, 5000/tcp
hms-log-1 fluentd:v1.14.0-1.0 "tini -- /bin/entryp…" log About a minute ago Up About a minute 5140/tcp, 0.0.0.0:24224->24224/tcp, 0.0.0.0:24224->24224/udp, :::24224->24224/tcp, :::24224->24224/udp
hms-mis-server-1 mirrors.pku.edu.cn/pkuhpc-icode/scow:master "./entrypoint.sh" mis-server About a minute ago Restarting (1) 16 seconds ago
hms-mis-web-1 mirrors.pku.edu.cn/pkuhpc-icode/scow:master "./entrypoint.sh" mis-web About a minute ago Up About a minute 80/tcp, 3000/tcp, 5000/tcp
hms-novnc-1 ghcr.io/pkuhpc/novnc-client-docker:master "/docker-entrypoint.…" novnc About a minute ago Up About a minute 80/tcp
hms-portal-server-1 mirrors.pku.edu.cn/pkuhpc-icode/scow:master "./entrypoint.sh" portal-server About a minute ago Restarting (1) 24 seconds ago
hms-portal-web-1 mirrors.pku.edu.cn/pkuhpc-icode/scow:master "./entrypoint.sh" portal-web About a minute ago Up About a minute 80/tcp, 3000/tcp, 5000/tcp
hms-redis-1 redis:alpine "docker-entrypoint.s…" redis About a minute ago Up About a minute 6379/tcp

mis-server-1 logs:
mis-server-1 | {"level":30,"time":"2024-09-23T02:47:12.834Z","pid":18,"hostname":"b3c5b2169cb6","msg":"Hook is not configured."}
mis-server-1 | {"level":30,"time":"2024-09-23T02:47:13.175Z","pid":18,"hostname":"b3c5b2169cb6","version":{"commit":"9e67d1efe1735d53212d27aff99217f9bb203af7"},"msg":"@scow/mis-server: "}
mis-server-1 | {"level":30,"time":"2024-09-23T02:47:13.175Z","pid":18,"hostname":"b3c5b2169cb6","config":{"HOST":"0.0.0.0","PORT":5000,"LOG_LEVEL":"info","LOG_PRETTY":false,"SSH_PRIVATE_KEY_PATH":"/root/.ssh/id_rsa","SSH_PUBLIC_KEY_PATH":"/root/.ssh/id_rsa.pub","AUTH_URL":"","DB_PASSWORD":"must!chang3this"},"msg":"Loaded env config"}
mis-server-1 | {"level":30,"time":"2024-09-23T02:47:13.430Z","pid":18,"hostname":"b3c5b2169cb6","msg":"Update cluster entity started."}
mis-server-1 | {"level":30,"time":"2024-09-23T02:47:13.437Z","pid":18,"hostname":"b3c5b2169cb6","msg":"Current clusters list: Cluster ID: hpc01, Current Status: ACTIVATED"}
mis-server-1 | {"level":30,"time":"2024-09-23T02:47:13.438Z","pid":18,"hostname":"b3c5b2169cb6","msg":"Checking if root can login to HPC by login node 10.10.8.170"}
mis-server-1 | {"level":30,"time":"2024-09-23T02:47:13.530Z","pid":18,"hostname":"b3c5b2169cb6","msg":"Login to 10.10.8.170 as root failed."}
mis-server-1 | {"level":30,"time":"2024-09-23T02:47:13.530Z","pid":18,"hostname":"b3c5b2169cb6","msg":"Root cannot login to HPC by login node 10.10.8.170. err: {"level":"client-authentication"}"}
mis-server-1 | /app/node_modules/.pnpm/[email protected]/node_modules/ssh2/lib/client.js:865
mis-server-1 | const err = new Error('All configured authentication methods failed');
mis-server-1 | ^
mis-server-1 |
mis-server-1 | Error: All configured authentication methods failed
mis-server-1 | at doNextAuth (/app/node_modules/.pnpm/[email protected]/node_modules/ssh2/lib/client.js:865:21)
mis-server-1 | at tryNextAuth (/app/node_modules/.pnpm/[email protected]/node_modules/ssh2/lib/client.js:1082:7)
mis-server-1 | at USERAUTH_FAILURE (/app/node_modules/.pnpm/[email protected]/node_modules/ssh2/lib/client.js:430:11)
mis-server-1 | at 51 (/app/node_modules/.pnpm/[email protected]/node_modules/ssh2/lib/protocol/handlers.misc.js:408:16)
mis-server-1 | at Protocol.onPayload (/app/node_modules/.pnpm/[email protected]/node_modules/ssh2/lib/protocol/Protocol.js:2059:10)
mis-server-1 | at AESGCMDecipherBinding.decrypt (/app/node_modules/.pnpm/[email protected]/node_modules/ssh2/lib/protocol/crypto.js:1086:26)
mis-server-1 | at Protocol.parsePacket [as _parse] (/app/node_modules/.pnpm/[email protected]/node_modules/ssh2/lib/protocol/Protocol.js:2028:25)
mis-server-1 | at Protocol.parse (/app/node_modules/.pnpm/[email protected]/node_modules/ssh2/lib/protocol/Protocol.js:313:16)
mis-server-1 | at Socket.<anonymous> (/app/node_modules/.pnpm/[email protected]/node_modules/ssh2/lib/client.js:775:21)
mis-server-1 | at Socket.emit (node:events:519:28) {
mis-server-1 | level: 'client-authentication'
mis-server-1 | }
mis-server-1 |
mis-server-1 | Node.js v20.17.0
mis-server-1 exited with code 1

portal-server-1 logs:
portal-server-1 | {"level":30,"time":"2024-09-23T02:50:08.270Z","pid":18,"hostname":"853f5aa43e3b","version":{"commit":"9e67d1efe1735d53212d27aff99217f9bb203af7"},"msg":"Running @scow/portal-server"}
portal-server-1 | {"level":30,"time":"2024-09-23T02:50:08.271Z","pid":18,"hostname":"853f5aa43e3b","config":{"HOST":"0.0.0.0","PORT":5000,"LOG_LEVEL":"info","LOG_PRETTY":false,"PORTAL_BASE_PATH":"/","MIS_DEPLOYED":true,"MIS_SERVER_URL":"mis-server:5000","SSH_PRIVATE_KEY_PATH":"/root/.ssh/id_rsa","SSH_PUBLIC_KEY_PATH":"/root/.ssh/id_rsa.pub","DOWNLOAD_CHUNK_SIZE":3145728,"SCOWD_SSL_ENABLED":false,"SCOWD_SSL_CA_CERT_PATH":"","SCOWD_SSL_SCOW_CERT_PATH":"","SCOWD_SSL_SCOW_PRIVATE_KEY_PATH":""},"msg":"Loaded env config"}
portal-server-1 | node:internal/process/promises:391
portal-server-1 | triggerUncaughtException(err, true /* fromPromise */);
portal-server-1 | ^
portal-server-1 |
portal-server-1 | Error: 14 UNAVAILABLE: Name resolution failed for target dns:mis-server:5000
portal-server-1 | at callErrorFromStatus (/app/node_modules/.pnpm/@grpc[email protected]/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
portal-server-1 | at Object.onReceiveStatus (/app/node_modules/.pnpm/@grpc[email protected]/node_modules/@grpc/grpc-js/build/src/client.js:193:76)
portal-server-1 | at Object.onReceiveStatus (/app/node_modules/.pnpm/@grpc[email protected]/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
portal-server-1 | at Object.onReceiveStatus (/app/node_modules/.pnpm/@grpc[email protected]/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
portal-server-1 | at /app/node_modules/.pnpm/@grpc[email protected]/node_modules/@grpc/grpc-js/build/src/resolving-call.js:129:78
portal-server-1 | at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
portal-server-1 | for call at
portal-server-1 | at ServiceClientImpl.makeUnaryRequest (/app/node_modules/.pnpm/@grpc[email protected]/node_modules/@grpc/grpc-js/build/src/client.js:161:32)
portal-server-1 | at ServiceClientImpl.getClustersRuntimeInfo (/app/node_modules/.pnpm/@grpc[email protected]/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
portal-server-1 | at /app/node_modules/.pnpm/@ddadaal+tsgrpc-client@0.17.7_@grpc[email protected]/node_modules/@ddadaal/tsgrpc-client/lib/unary.js:18:13
portal-server-1 | at new Promise (<anonymous>)
portal-server-1 | at asyncClientCall (/app/node_modules/.pnpm/@ddadaal+tsgrpc-client@0.17.7_@grpc[email protected]/node_modules/@ddadaal/tsgrpc-client/lib/unary.js:15:12)
portal-server-1 | at libGetCurrentActivatedClusters (/app/libs/server/build/misCommon/clustersActivation.js:37:61)
portal-server-1 | at createServer (/app/apps/portal-server/build/app.js:54:89)
portal-server-1 | at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
portal-server-1 | at async main (/app/apps/portal-server/build/index.js:16:20) {
portal-server-1 | code: 14,
portal-server-1 | details: 'Name resolution failed for target dns:mis-server:5000',
portal-server-1 | metadata: Metadata { internalRepr: Map(0) {}, options: {} }
portal-server-1 | }
portal-server-1 |
portal-server-1 | Node.js v20.17.0

What did you expect to happen

Could you tell me which step is going wrong?

Did this work before?

No, this is the first deployment.

Steps To Reproduce

Deployed by following the tutorial on the official website.

Environment

- OS: Rocky Linux 9.4
- Scheduler: Slurm slurm-24.05.3/slurm-23.11.10/slurm-23.02.8
- Docker: 27.3.1
- Docker-compose: v2.29.5
- SCOW cli: 1.6.2
- SCOW: 1.6.2
- Adapter: 1.6.0

Anything else?

No response

@link89
Contributor

link89 commented Sep 24, 2024

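I have run into a similar problem: if mis cannot connect to the adapter, it exits automatically. So you can:

  1. First check whether the adapter is still running normally
  2. Check whether mis can connect to the adapter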
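
For the second check, a plain TCP connection attempt is a quick first test. The following is only a minimal sketch: the function name `can_connect` is made up for illustration, and the host and port in the usage comment are placeholders, since this issue does not say where the adapter listens. Take the real address from your own cluster configuration. A successful connection only shows that the port is open, not that the adapter itself is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only checks that something accepts connections on the port;
    it does not speak the adapter's protocol.
    """
    try:
        # create_connection resolves the host name and connects with a timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False


# Example (placeholder address, replace with your adapter's host and port):
# print(can_connect("adapter.example.internal", 8972))
```

To reproduce what mis-server sees, run the check from the same host or container network that mis-server uses.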

@link89
Contributor

link89 commented Sep 24, 2024

mis-server-1 | {"level":30,"time":"2024-09-23T02:47:13.530Z","pid":18,"hostname":"b3c5b2169cb6","msg":"Root cannot login to HPC by login node 10.10.8.170. err: {"level":"client-authentication"}"}

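This log line shows that mis-server cannot log in to the login node as root.

You need to generate an SSH key pair on the machine where mis-server runs, then add the public key to the root user on the login node.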
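
A key pair is typically created with `ssh-keygen` and installed on the remote side with `ssh-copy-id`. To confirm afterwards that the key really landed on the login node, one option is to compare the public key file (the logs above show `SSH_PUBLIC_KEY_PATH` as `/root/.ssh/id_rsa.pub`) with the contents of root's `authorized_keys` on 10.10.8.170. The sketch below is an illustration rather than part of SCOW: the helper names `key_fields` and `is_authorized` are made up, and the option handling is a simple heuristic.

```python
from typing import Optional, Tuple


def key_fields(line: str) -> Optional[Tuple[str, str]]:
    """Extract (key_type, base64_blob) from an OpenSSH public key line.

    Returns None for blank lines, comments, and lines without a key.
    """
    parts = line.split()
    if not parts or parts[0].startswith("#"):
        return None
    # authorized_keys entries may begin with options, so scan for the
    # first token that looks like a key type (heuristic, not a full parser).
    for i, token in enumerate(parts[:-1]):
        if token.startswith(("ssh-", "ecdsa-", "sk-")):
            return token, parts[i + 1]
    return None


def is_authorized(public_key_line: str, authorized_keys_text: str) -> bool:
    """Return True if the public key appears in the authorized_keys text.

    Only key type and key data are compared; the trailing comment is ignored.
    """
    wanted = key_fields(public_key_line)
    if wanted is None:
        return False
    return any(key_fields(entry) == wanted
               for entry in authorized_keys_text.splitlines())


# Example, assuming both files have been read into strings beforehand:
# is_authorized(id_rsa_pub_text, login_node_authorized_keys_text)
```

If the function returns False, the public key has not been added for root on the login node, which would match the `client-authentication` failure in the log.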
