
Remove Kubernetes 1.23 jobs #29387

Merged: 2 commits merged into kubernetes:master on Apr 28, 2023

Conversation

xmudrii
Member

@xmudrii xmudrii commented Apr 28, 2023

Experimenting with removing Kubernetes 1.23 jobs (ref #29230, #29379, #29327)

This PR removes Kubernetes 1.23 jobs.

Disclaimer: this is done in a hacky way and I'll try my best to explain what's going on and what issues we have.

Version markers

For context, version markers are explained here: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-release/kubernetes-versions.md

We have the following version markers in play:

  • k8s-master
    • Current and expected usage: points to the master branch in k/k
  • k8s-beta
    • Current usage: points to the latest stable release, in the time frame from cutting the release branch to cutting the next release branch
    • Expected usage: points to the latest stable release, in the time frame from cutting the release branch to cutting the .0 release
  • k8s-stable1 to k8s-stable4
    • Current usage and expected usage: points to the appropriate stable release
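
To visualize what these markers pointed to at the time of this PR (just before the rotation described below), here's an illustrative mapping. This is a sketch for orientation only, not the actual releng/test_config.yaml schema:

# Illustrative only; not the real test_config.yaml layout.
k8s-master:  master  # k/k master branch
k8s-beta:    "1.27"  # still in use even though v1.27.0 has shipped
k8s-stable1: "1.26"
k8s-stable2: "1.25"
k8s-stable3: "1.24"
k8s-stable4: "1.23"  # EOL; this PR removes its jobs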

The biggest issue we have with k8s-beta is that we use it for way too long. It should be used only in the "transition period" between cutting the release branch and publishing the .0 release. However, we don't retire k8s-beta after the .0 release; instead, we continue using it until the next branch cut.

This can be quite confusing for anyone depending on those markers. We discussed it previously on Slack, and the takeaway was that we should fix this: https://kubernetes.slack.com/archives/CJH2GBF7Y/p1680813414229639

Fixing this is going to be important, especially for the next part.

Removing jobs for the EOL release branch

Removing jobs for the EOL release branch is usually done by generating jobs for the new release branch plus some manual cleanups. In other words, we would basically run: make -C releng prepare-release-branch
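
Roughly, that target chains the releng tools together; this is a paraphrase of the behavior described below, and the real Makefile recipe may differ:

# make -C releng prepare-release-branch, approximately:
#   1. config-forker:  fork master jobs into a new release-branch file (e.g. 1.28.yaml)
#   2. config-rotator: shift every release-branch-jobs file down one version marker
#   3. generate_tests: regenerate generated.yaml (and testgrid config) from test_config.yaml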

There are two problems with that:

  • it will also create jobs for 1.28, which we don't want at this point in the 1.28 release cycle
  • the logic behind this Make target is built on the premise that k8s-beta is in use all the time, not only during the transition period

When we run that command, the result will be:

  • 1.28 jobs are generated (these can be removed manually very easily)
  • 1.27 - 1.24 jobs are correctly pushed/rotated to stable1 - stable4
  • jobs in generated.yaml are updated
  • testgrid dashboards for 1.28 are generated

The last two steps are problematic:

  • generated.yaml is auto-generated and shouldn't be touched by hand
  • even if we edit generated.yaml and remove the 1.28 jobs, tests will fail because we'd have 1.28 testgrid dashboards that aren't used at all

Mitigations

What we actually need is just config-rotator plus regenerating generated.yaml. In other words, we don't need config-forker or any of the logic that generates 1.28 jobs.
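
For intuition, config-rotator rewrites the version marker embedded in each job entry of a release-branch-jobs file. A hypothetical before/after for a single 1.27 job when rotating -old beta -new stable1 (illustrative only; the tool also updates other occurrences of the marker, such as testgrid annotations):

# before (1.27.yaml):
- name: ci-kubernetes-e2e-gce-cos-k8sbeta-default
# after:
- name: ci-kubernetes-e2e-gce-cos-k8sstable1-default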

It's possible to run both manually; however, before we can do that, we have to figure out how to approach the k8s-beta jobs. The best idea I was able to come up with is to simply comment out k8s-beta in releng/test_config.yaml. That way, regenerating generated.yaml removes all k8sbeta tests, which we don't need at this stage anyway because that marker shouldn't be in use until we cut the 1.28 release branch.

With that done, the following commands can be used to do the magic:

go run ./releng/config-rotator -config-file $(pwd)/config/jobs/kubernetes/sig-release/release-branch-jobs/1.24.yaml -new stable4 -old stable3
go run ./releng/config-rotator -config-file $(pwd)/config/jobs/kubernetes/sig-release/release-branch-jobs/1.25.yaml -new stable3 -old stable2
go run ./releng/config-rotator -config-file $(pwd)/config/jobs/kubernetes/sig-release/release-branch-jobs/1.26.yaml -new stable2 -old stable1
go run ./releng/config-rotator -config-file $(pwd)/config/jobs/kubernetes/sig-release/release-branch-jobs/1.27.yaml -new stable1 -old beta

releng/generate_tests.py --yaml-config-path=$(pwd)/releng/test_config.yaml
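
As a quick sanity check after running the above (my suggestion, not an official step), the regenerated file should contain no k8sbeta entries anymore:

# expect 0 once k8s-beta is commented out and generated.yaml is regenerated
grep -c k8sbeta config/jobs/kubernetes/generated/generated.yaml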

At this stage, we're almost ready. Concretely:

  • there are some testgrid dashboards that are no longer in use after removing the 1.23 jobs, and those dashboards must be removed
  • because we don't have k8s-beta jobs anymore, we have to comment out some dashboards
  • we need to clean up old jobs, i.e. remove 1.23.yaml and some Windows jobs (see the sketch below)
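
For the last bullet, the cleanup amounts to something like this (the 1.23.yaml path matches the rotator commands above; the Windows paths are assumed to live under config/jobs/kubernetes/sig-windows/ and should be double-checked):

git rm config/jobs/kubernetes/sig-release/release-branch-jobs/1.23.yaml
git rm config/jobs/kubernetes/sig-windows/release-1.23-windows.yaml
git rm config/jobs/kubernetes/sig-windows/release-1.23-windows-presubmits.yaml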

After that, we should be in a good state. 🎉

Generating 1.28 jobs

Generating 1.28 jobs is not going to be trivial. We can't use the Make target (make -C releng prepare-release-branch) because we already used config-rotator, and using it again in the same release cycle would skip one version marker. For example, 1.28 would be added as k8s-beta, but 1.27 would jump from stable1 to stable2. Instead, 1.27 should stay at stable1 until we rotate 1.28 into stable1 the next time we remove jobs.
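
To illustrate the skip with the marker mapping sketched earlier (hypothetical values):

# after this PR:          stable1: 1.27, stable2: 1.26, stable3: 1.25, stable4: 1.24
# rerun the full target:  beta: 1.28, stable1: (empty), stable2: 1.27  <- 1.27 skips a marker
# what we actually want:  beta: 1.28, stable1: 1.27 (unchanged until the next removal round)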

Instead of using the Make target, we have to (see the sketch after this list):

  • uncomment the k8s-beta marker and the relevant jobs and dashboards
  • run config-forker manually to create the 1.28 jobs
  • regenerate generated.yaml so the k8sbeta jobs are added back
  • add testgrid dashboards for 1.28
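
A sketch of that manual flow follows; the config-forker flag names here are from memory and may differ, so check the tool's help output before running:

# 1. uncomment k8s-beta in releng/test_config.yaml, then:
go run ./releng/config-forker -job-config $(pwd)/config/jobs -version 1.28 -output $(pwd)/config/jobs/kubernetes/sig-release/release-branch-jobs/1.28.yaml
# 2. regenerate generated.yaml so the k8sbeta jobs come back:
releng/generate_tests.py --yaml-config-path=$(pwd)/releng/test_config.yaml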

In fact, most of those tasks should be pretty easy. The real challenge is making sure we actually follow this, i.e. that we don't forget about this issue. This is especially important because the handbook instructs contributors to use the Make target.

Follow ups

We have an issue with some fork annotations: some of them were lost in 1.23.yaml, and we need to look into that. It's most likely unrelated to this PR; most likely we forgot to update those annotations when introducing the stable4 marker a while ago.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 28, 2023
@k8s-ci-robot k8s-ci-robot added area/config Issues or PRs related to code in /config area/jobs area/release-eng Issues or PRs related to the Release Engineering subproject area/testgrid sig/release Categorizes an issue or PR as relevant to SIG Release. sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/windows Categorizes an issue or PR as relevant to SIG Windows. labels Apr 28, 2023
@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Apr 28, 2023
@xmudrii
Member Author

xmudrii commented Apr 28, 2023

/assign @cici37 @ramrodo @palnabarun
cc @kubernetes/release-engineering

Signed-off-by: Marko Mudrinić <[email protected]>
@xmudrii xmudrii changed the title [WIP] Remove Kubernetes 1.23 jobs Remove Kubernetes 1.23 jobs Apr 28, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 28, 2023
@xmudrii
Member Author

xmudrii commented Apr 28, 2023

/hold
for discussion

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 28, 2023
Member Author

@xmudrii xmudrii left a comment


Added some notes to guide the review process. I need to add some TODOs and I'll do that later today.

Member Author

I removed this file because it seems related to presubmits for k/k 1.23. We should double check with Windows folks to make sure this is okay.

Comment on lines -4 to -45
- name: ci-kubernetes-e2e-gce-cos-k8sbeta-alphafeatures
  gcs_prefix: kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-cos-k8sbeta-alphafeatures
  column_header:
  - configuration_value: node_os_image
  - configuration_value: master_os_image
  - configuration_value: Commit
  - configuration_value: infra-commit
- name: ci-kubernetes-e2e-gce-cos-k8sbeta-default
  gcs_prefix: kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-cos-k8sbeta-default
  column_header:
  - configuration_value: node_os_image
  - configuration_value: master_os_image
  - configuration_value: Commit
  - configuration_value: infra-commit
- name: ci-kubernetes-e2e-gce-cos-k8sbeta-ingress
  gcs_prefix: kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-cos-k8sbeta-ingress
  column_header:
  - configuration_value: node_os_image
  - configuration_value: master_os_image
  - configuration_value: Commit
  - configuration_value: infra-commit
- name: ci-kubernetes-e2e-gce-cos-k8sbeta-reboot
  gcs_prefix: kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-cos-k8sbeta-reboot
  column_header:
  - configuration_value: node_os_image
  - configuration_value: master_os_image
  - configuration_value: Commit
  - configuration_value: infra-commit
- name: ci-kubernetes-e2e-gce-cos-k8sbeta-serial
  gcs_prefix: kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-cos-k8sbeta-serial
  column_header:
  - configuration_value: node_os_image
  - configuration_value: master_os_image
  - configuration_value: Commit
  - configuration_value: infra-commit
- name: ci-kubernetes-e2e-gce-cos-k8sbeta-slow
  gcs_prefix: kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-cos-k8sbeta-slow
  column_header:
  - configuration_value: node_os_image
  - configuration_value: master_os_image
  - configuration_value: Commit
  - configuration_value: infra-commit
Member Author

This is auto-generated and will be back in place after uncommenting k8s-beta.

Comment on lines -81 to -83
- name: pull-kubernetes-e2e-gce-ubuntu
  test_group_name: pull-kubernetes-e2e-gce-ubuntu
  base_options: width=10
Member Author

This job is removed in 1.23 so we don't need this tab at all.

Comment on lines -32 to -34
- name: pull-e2e-podutil
  test_group_name: pull-kubernetes-node-e2e-podutil
  base_options: width=10
Member Author

This job is removed in 1.23 so we don't need this tab at all.

Resolved review threads: releng/README.md (outdated), releng/test_config.yaml (3 threads)
@ramrodo
Member

ramrodo commented Apr 28, 2023

@xmudrii Thanks for the great explanation and links for reference! I understand the version markers much better now.

I think adding the TODO comments so we're aware of this when we release 1.28 will be a great guide.

Maybe we could add this issue/guide to the handbook so it stays visible and gets followed in the upcoming release.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 28, 2023
@dims
Member

dims commented Apr 28, 2023

/approve

Thanks for the extensive notes. I do shudder at the thought of when we have to create the 1.28 jobs, for sure!

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 28, 2023
Signed-off-by: Marko Mudrinić <[email protected]>
@xmudrii
Member Author

xmudrii commented Apr 28, 2023

I added TODO(1.28) comments where needed and created follow-up issues to track this.

I also added this to the agenda for the SIG Release meeting planned for May 2nd.

@cici37 @ramrodo @dims PTAL again, I pushed a commit to add TODOs. When ready, please unhold the PR.

@dims
Member

dims commented Apr 28, 2023

/lgtm

will leave it to release folks to unhold.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 28, 2023
@ramrodo
Member

ramrodo commented Apr 28, 2023

Great! Everything looks good to me
/unhold
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 28, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, ramrodo, xmudrii

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 8a25ca6 into kubernetes:master Apr 28, 2023
@k8s-ci-robot
Contributor

@xmudrii: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

  • key release-1.23-windows-presubmits.yaml using file ``
  • key release-1.23-windows.yaml using file ``
  • key generated.yaml using file config/jobs/kubernetes/generated/generated.yaml
  • key 1.23.yaml using file ``
  • key 1.24.yaml using file config/jobs/kubernetes/sig-release/release-branch-jobs/1.24.yaml
  • key 1.25.yaml using file config/jobs/kubernetes/sig-release/release-branch-jobs/1.25.yaml
  • key 1.26.yaml using file config/jobs/kubernetes/sig-release/release-branch-jobs/1.26.yaml
  • key 1.27.yaml using file config/jobs/kubernetes/sig-release/release-branch-jobs/1.27.yaml

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
