[BUG] BroadcastJob status abnormal #1704

Open
ying2025 opened this issue Aug 15, 2024 · 1 comment
Labels
kind/bug Something isn't working

ying2025 commented Aug 15, 2024

What happened:
A BroadcastJob is running and its pod is already in the Running state. Once activeDeadlineSeconds elapses, the BroadcastJob status becomes failed, even though the pod has been running successfully.
apiVersion: apps.kruise.io/v1alpha1
kind: BroadcastJob
metadata:
  creationTimestamp: "2024-08-15T07:03:30Z"
  generation: 1
  name: 1723705410-preheat-image
  resourceVersion: "180784897"
  uid: e6aacebb-2630-4dc8-8292-0a3897384a5b
spec:
  completionPolicy:
    activeDeadlineSeconds: 1800
    ttlSecondsAfterFinished: 30
    type: Always
  failurePolicy:
    restartLimit: 3
    type: FailFast
  parallelism: 20
  template:
    metadata:
      creationTimestamp: "2024-08-15T07:03:30Z"
    spec:
      containers:
      - image: nginx:latest
        imagePullPolicy: Always
        name: preheat-image
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      nodeSelector:
        node-role.kubernetes.io/worker: "true"
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  active: 1
  desired: 1
  failed: 0
  phase: running
  startTime: "2024-08-15T07:03:30Z"
  succeeded: 0

What you expected to happen:
The BroadcastJob status should be success. Why does this code not count a running pod as succeeded?

activePods, failedPods, succeededPods := filterPods(job.Spec.FailurePolicy.RestartLimit, pods)
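
For context, here is a minimal sketch of how a controller might sort pods into active, failed, and succeeded by pod phase. This is not Kruise's filterPods: the Pod type and the classifyPods helper are hypothetical stand-ins, and the restartLimit handling is left out. It only illustrates that a pod in the Running phase would normally land in the active bucket, because Kubernetes reports Succeeded only after all containers have exited successfully, and a long-running container such as nginx does not exit on its own.

```go
package main

import "fmt"

// PodPhase mirrors the Kubernetes pod phase strings. It is defined locally
// so the sketch runs without any k8s.io dependencies.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// Pod is a hypothetical, trimmed-down stand-in for corev1.Pod.
type Pod struct {
	Name  string
	Phase PodPhase
}

// classifyPods is a hypothetical helper (not Kruise's filterPods) that
// buckets pods by phase. Restart limits are deliberately ignored here.
func classifyPods(pods []Pod) (active, failed, succeeded []Pod) {
	for _, p := range pods {
		switch p.Phase {
		case PodSucceeded:
			succeeded = append(succeeded, p)
		case PodFailed:
			failed = append(failed, p)
		default:
			// Any other phase (Pending, Running, ...) means the pod has
			// not terminated yet, so it is counted as active.
			active = append(active, p)
		}
	}
	return active, failed, succeeded
}

func main() {
	pods := []Pod{{Name: "1723705410-preheat-image-8xqpn", Phase: PodRunning}}
	active, failed, succeeded := classifyPods(pods)
	// Prints: active=1 failed=0 succeeded=0
	fmt.Printf("active=%d failed=%d succeeded=%d\n", len(active), len(failed), len(succeeded))
}
```

Under this assumption, the counts match the status reported above (active: 1, succeeded: 0) for as long as the container keeps running.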

How to reproduce it (as minimally and precisely as possible):
Create a BroadcastJob like the one above. The expectation is that the BroadcastJob counts as successful once its pod is running.
Anything else we need to know?:
logs:
I0815 07:03:30.362776 7 broadcastjob_controller.go:200] Job 1723705410-preheat-image has ActiveDeadlineSeconds, will resync after 1800 seconds
I0815 07:03:30.363118 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 1/1 nodes remaining to schedule pods
I0815 07:03:30.363143 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=0, failed=0
I0815 07:03:30.392223 7 pod_readiness_controller.go:101] Starting to process Pod prdsafe/1723705410-preheat-image-8xqpn
I0815 07:03:30.392911 7 broadcastjob_controller.go:727] Controller 1723705410-preheat-image created pod 1723705410-preheat-image-8xqpn
I0815 07:03:30.392969 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:30.392982 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 362757824, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
I0815 07:03:30.393421 7 recorder.go:103] events "msg"="Normal" "message"="Created pod: 1723705410-preheat-image-8xqpn" "object"={"kind":"BroadcastJob","namespace":"prdsafe","name":"1723705410-preheat-image","uid":"e6aacebb-2630-4dc8-8292-0a3897384a5b","apiVersion":"apps.kruise.io/v1alpha1","resourceVersion":"180784893"} "reason"="SuccessfulCreate"
I0815 07:03:30.405421 7 broadcastjob_controller.go:200] Job 1723705410-preheat-image has ActiveDeadlineSeconds, will resync after 1800 seconds
I0815 07:03:30.405819 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 0/1 nodes remaining to schedule pods
I0815 07:03:30.405845 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:30.405880 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:30.405893 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 405404075, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
I0815 07:03:30.411097 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 0/1 nodes remaining to schedule pods
I0815 07:03:30.411118 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:30.411150 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:30.411188 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 0, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
I0815 07:03:30.420871 7 pod_readiness_controller.go:106] Finish to process Pod prdsafe/1723705410-preheat-image-8xqpn, elapsedTime 28.65228ms
I0815 07:03:30.420947 7 pod_readiness_controller.go:101] Starting to process Pod prdsafe/1723705410-preheat-image-8xqpn
I0815 07:03:30.420991 7 pod_readiness_controller.go:106] Finish to process Pod prdsafe/1723705410-preheat-image-8xqpn, elapsedTime 44.625µs
I0815 07:03:30.421275 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 0/1 nodes remaining to schedule pods
I0815 07:03:30.421300 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:30.421332 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:30.421344 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 0, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
I0815 07:03:33.902923 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 0/1 nodes remaining to schedule pods
I0815 07:03:33.902950 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:33.902980 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:33.902995 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 0, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
I0815 07:03:34.935621 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 0/1 nodes remaining to schedule pods
I0815 07:03:34.935649 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:34.935678 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:34.935693 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 0, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}

Environment:

  • Kruise version: kruise/kruise-manager:v1.5.2
  • Kubernetes version (use kubectl version): 1.19
  • Install details (e.g. helm install args):
  • Others:

ying2025 added the kind/bug label Aug 15, 2024

zmberg (Member) commented Aug 30, 2024

@ying2025 Can you share the complete BroadcastJob YAML? I tried the kruise demo and couldn't reproduce the issue above.
