stable: new release on 2023-09-05 (38.20230819.3.0) #756

Closed
42 of 43 tasks
marmijo opened this issue Aug 22, 2023 · 10 comments

marmijo commented Aug 22, 2023

First, verify that you meet all the prerequisites

Edit the issue title to include today's date. Once the pipeline spits out the new version ID, you can append it to the title e.g. (31.20191117.3.0).
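
The version ID appended to the title can be sanity-checked before editing. The sketch below splits an ID such as 38.20230819.3.0 into its four fields and composes a title in the form this issue uses. The stream codes (1 = next, 2 = testing, 3 = stable) are an assumption taken from the Fedora CoreOS versioning scheme, not from this checklist, and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Assumption: third field of the version ID encodes the stream
# (per the Fedora CoreOS versioning scheme; not stated in this checklist).
STREAM_CODES = {1: "next", 2: "testing", 3: "stable"}


@dataclass(frozen=True)
class VersionId:
    major: int       # Fedora major release, e.g. 38
    snapshot: date   # date of the package snapshot, e.g. 2023-08-19
    stream: str      # stream name derived from the stream code
    revision: int    # counter for rebuilds of the same snapshot


def parse_version_id(text: str) -> VersionId:
    """Split 'X.YYYYMMDD.Z.A' into its four fields and validate them."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields: {text!r}")
    major, snapshot, code, revision = parts
    if int(code) not in STREAM_CODES:
        raise ValueError(f"unknown stream code {code} in {text!r}")
    return VersionId(
        major=int(major),
        snapshot=datetime.strptime(snapshot, "%Y%m%d").date(),
        stream=STREAM_CODES[int(code)],
        revision=int(revision),
    )


def issue_title(stream: str, release_day: date, version: str) -> str:
    """Compose a title in the form used by this issue."""
    parsed = parse_version_id(version)
    if parsed.stream != stream:
        raise ValueError(f"{version} belongs to {parsed.stream}, not {stream}")
    return f"{stream}: new release on {release_day.isoformat()} ({version})"
```

Calling issue_title("stable", date(2023, 9, 5), "38.20230819.3.0") reproduces this issue's title, and a version ID from another stream is rejected before it reaches the title.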

Pre-release

Promote testing changes to stable

Manual alternative

Sometimes you need to run the process manually, for example when you need to add an extra commit to change something in manifest.yaml. The steps are:

  • git fetch upstream
  • git checkout stable
  • git reset --hard upstream/stable
  • /path/to/fedora-coreos-releng-automation/scripts/promote-config.sh testing
  • Open PR against the stable branch on https://github.com/coreos/fedora-coreos-config

Build

Sanity-check the build

Using the build browser for the stable stream:

  • Verify that the parent commit and version match the previous stable release (in the future, we'll want to integrate this check in the release job)
    • x86_64
    • aarch64
    • ppc64le
    • s390x
  • Check kola extended upgrade runs to make sure they didn't fail
    • x86_64
    • aarch64
    • ppc64le
    • s390x
  • Check kola AWS runs to make sure they didn't fail
    • x86_64
    • aarch64
  • Check kola OpenStack runs to make sure they didn't fail
    • x86_64
    • aarch64
  • Check kola Azure run to make sure it didn't fail
    • x86_64
  • Check kola GCP runs to make sure they didn't fail
    • x86_64
    • aarch64

⚠️ Release ⚠️

IMPORTANT: this is the point of no return. Once the OSTree commit is
imported into the unified repo, any machine that manually runs rpm-ostree upgrade will receive the new update.

Run the release job

  • Run the release job, filling in the parameters with the stream (stable) and the new version ID
  • Post a link to the job as a comment to this issue
  • Wait for job to finish

At this point, Cincinnati will see the new release on its next refresh and create a corresponding node in the graph without edges pointing to it yet.

Refresh metadata (stream and updates)

  • Wait for all releases that will be released simultaneously to reach this step in the process
  • Go to the rollout workflow, click "Run workflow", and fill out the form

Rollout general guidelines

Risk    Day of the week  Rollout start time  Time allocation
risky   Tuesday          2 PM UTC            72 hours
common  Tuesday          2 PM UTC            48 hours
rapid   Tuesday          2 PM UTC            24 hours

When setting a rollout start time, ask "when would be the best time to react to
any errors or regressions from updates?". We commonly select 2 PM UTC so that
rollouts start at 10 AM EST (±1 hour for daylight saving time), but the start time
is flexible and can be adjusted after discussion in the fedora-coreos IRC channel.
Note that the choice is affected by the day of the week and by holidays.

The later in the week a release is held up by unforeseen issues, the more
likely it is that the rollout time allocation will need to shrink or the release
will need to be deferred.
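
To make the guideline table concrete, here is a minimal sketch that turns a release day and a risk level into a rollout window. The time allocations come from the table above; the function name and the default 14:00 UTC start are illustrative assumptions, and this is not how rollout.py itself computes anything.

```python
from datetime import date, datetime, time, timedelta, timezone

# Time allocations in hours, copied from the guideline table.
ROLLOUT_HOURS = {"risky": 72, "common": 48, "rapid": 24}


def rollout_window(release_day: date, risk: str, start_hour_utc: int = 14):
    """Return (start, end) of the rollout as timezone-aware UTC datetimes."""
    if risk not in ROLLOUT_HOURS:
        raise ValueError(f"unknown risk level: {risk!r}")
    start = datetime.combine(release_day, time(start_hour_utc), tzinfo=timezone.utc)
    return start, start + timedelta(hours=ROLLOUT_HOURS[risk])
```

For a Tuesday start, a risky rollout computed this way ends on Friday at 2 PM UTC. Starting any later in the week pushes the end of the window into the weekend, which is why a delayed release gets a shorter allocation or is deferred.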

Manual alternative
  • Make sure your fedora-coreos-stream-generator binary is up-to-date.

From a checkout of this repo:

  • Update stream metadata by running:
    fedora-coreos-stream-generator -releases=https://fcos-builds.s3.amazonaws.com/prod/streams/stable/releases.json -output-file=streams/stable.json -pretty-print
  • Add a rollout. For example, for a 48-hour rollout starting at 10 AM ET the same day, run:
    ./rollout.py add stable <version> "10 am ET today" 48
  • Commit the changes and open a PR against the repo

Update graph manual check
curl -H 'Accept: application/json' 'https://updates.coreos.fedoraproject.org/v1/graph?basearch=x86_64&stream=stable&rollout_wariness=0'
curl -H 'Accept: application/json' 'https://updates.coreos.fedoraproject.org/v1/graph?basearch=aarch64&stream=stable&rollout_wariness=0'
curl -H 'Accept: application/json' 'https://updates.coreos.fedoraproject.org/v1/graph?basearch=ppc64le&stream=stable&rollout_wariness=0'
curl -H 'Accept: application/json' 'https://updates.coreos.fedoraproject.org/v1/graph?basearch=s390x&stream=stable&rollout_wariness=0'
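
The same check can be scripted. The sketch below builds the graph URL for each architecture and lists which versions have an edge pointing at the new release. It assumes the response follows the Cincinnati layout, where "nodes" is a list of objects with a "version" key and "edges" is a list of [from_index, to_index] pairs; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

GRAPH_ENDPOINT = "https://updates.coreos.fedoraproject.org/v1/graph"
ARCHES = ("x86_64", "aarch64", "ppc64le", "s390x")


def graph_url(basearch: str, stream: str = "stable", wariness: int = 0) -> str:
    """Build the same URL the curl commands use."""
    query = urlencode(
        {"basearch": basearch, "stream": stream, "rollout_wariness": wariness}
    )
    return f"{GRAPH_ENDPOINT}?{query}"


def fetch_graph(basearch: str, stream: str = "stable") -> dict:
    """Download and decode the update graph for one architecture."""
    request = Request(graph_url(basearch, stream), headers={"Accept": "application/json"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def inbound_versions(graph: dict, version: str) -> list:
    """Return the versions that have an edge pointing at `version`.

    Assumes the Cincinnati layout described in the lead-in.
    """
    versions = [node["version"] for node in graph["nodes"]]
    if version not in versions:
        raise LookupError(f"{version} is not a node in the graph")
    target = versions.index(version)
    return [versions[src] for src, dst in graph["edges"] if dst == target]
```

An empty list from inbound_versions(fetch_graph(arch), "<version>") means the node exists but no rollout edge points at it yet, which is the state expected right after the release job finishes.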

NOTE: In the future, most of these steps will be automated.

Housekeeping

  • If one doesn't already exist, open an issue in this repo for the next release in this stream. Use the approximate date of the release in the title.
  • Issues opened via the previous link will automatically create a linked Jira card. Assign the GitHub issue and Jira card to the next person in the rotation.

dustymabe commented Aug 30, 2023

For this round we are going to do a normal promotion but then pin the kernel to the versions currently shipped in stable (38.20230806.3.0).

i.e. when we ship this update the kernel should not have changed from last time and the values should be:

  • ppc64le: kernel-6.3.12-200.fc38
  • all other architectures: kernel-6.4.7-200.fc38

This is due to coreos/fedora-coreos-tracker#1555
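
A minimal sketch of how the pin could be verified once the builds exist. The expected values are the ones listed above. The input mapping of architecture to kernel package string is hypothetical, and extracting it from the build metadata is left out because this issue does not describe that format.

```python
# Expected kernel per architecture, as listed in the comment above.
PINNED_KERNELS = {"ppc64le": "kernel-6.3.12-200.fc38"}
DEFAULT_KERNEL = "kernel-6.4.7-200.fc38"


def expected_kernel(arch: str) -> str:
    """Kernel the pinned release should ship for `arch`."""
    return PINNED_KERNELS.get(arch, DEFAULT_KERNEL)


def kernel_pin_mismatches(built: dict) -> dict:
    """Map arch -> (expected, actual) for every arch whose kernel differs.

    `built` is a hypothetical mapping such as {"x86_64": "kernel-6.4.7-200.fc38"}.
    """
    return {
        arch: (expected_kernel(arch), actual)
        for arch, actual in built.items()
        if actual != expected_kernel(arch)
    }
```

An empty result means every architecture still carries the kernel from the previous stable release.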


c4rt0 commented Sep 5, 2023

Arch     Latest link  Result
x86_64   Build        ✔️
aarch64  Build        ✔️
ppc64le  Build        ✔️
s390x    Build        ✔️

c4rt0 changed the title from "stable: new release on 2023-09-04" to "stable: new release on 2023-09-05 (38.20230819.3.0)" on Sep 5, 2023

c4rt0 commented Sep 5, 2023

Check kola Azure run to make sure it didn't fail

Result: FAIL. Known issue, see: coreos/fedora-coreos-tracker#1553


c4rt0 commented Sep 5, 2023

Release job


c4rt0 commented Sep 5, 2023

The release job reports a warning and marks the release as unstable.
The publish stage returns an error:

[2023-09-05T14:03:06.331Z] 2023-09-05T14:03:06Z plume: 
couldn't publish image in ap-northeast-3: couldn't grant launch permission on ami-020bcd0304c8ef6bb: 
ResourceLimitExceeded: You have reached your quota of 430 for the number of public images allowed in this Region. 
Deregister unused public images or make them private, or request an increase in your public AMIs quota.

[2023-09-05T14:03:06.331Z] 	status code: 400, request id: a4298eeb-4c87-4e00-9c8d-cf8d1ca8ac13

Out of all regions, only ap-northeast-3 returns ResourceLimitExceeded for all three releases (stable, next & testing).
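
When the same failure has to be compared across several job logs, the affected region, AMI, and quota can be pulled out mechanically. This is a sketch that matches only the message wording quoted above; the pattern and function name are illustrative and not part of plume.

```python
import re

# Matches the plume message quoted above; other wordings are not covered.
QUOTA_ERROR = re.compile(
    r"couldn't publish image in (?P<region>[a-z0-9-]+): "
    r"couldn't grant launch permission on (?P<ami>ami-[0-9a-f]+): "
    r"ResourceLimitExceeded: You have reached your quota of (?P<quota>\d+)"
)


def find_quota_errors(log_text: str) -> list:
    """Return (region, ami, quota) for each public-image quota failure."""
    # Collapse whitespace so a message wrapped across lines still matches.
    flat = " ".join(log_text.split())
    return [
        (match["region"], match["ami"], int(match["quota"]))
        for match in QUOTA_ERROR.finditer(flat)
    ]
```

Run over the log excerpt above, this yields a single entry for ap-northeast-3.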


c4rt0 commented Sep 5, 2023

After discussing the above with other team members, I am proceeding with the rollout.

dustymabe commented, quoting the report above:

Out of all regions, only ap-northeast-3 returns ResourceLimitExceeded for all three releases (stable, next & testing).

I fixed this by removing launch permissions from some old images and then marking the image public:

aws ec2 modify-image-attribute --launch-permission 'Add=[{Group=all}]' --image-id ami-020bcd0304c8ef6bb
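
Both halves of that fix use the same AWS CLI call with Add or Remove in the launch-permission argument. The sketch below only builds the argument lists so they can be reviewed before anything runs. The helper names are hypothetical, and which old images had their permissions removed is not recorded in this issue.

```python
import shlex


def launch_permission_command(image_id: str, region: str, public: bool) -> list:
    """Build an `aws ec2 modify-image-attribute` call that makes an AMI
    public (Add) or private again (Remove). Nothing is executed here."""
    operation = "Add" if public else "Remove"
    return [
        "aws", "ec2", "modify-image-attribute",
        "--region", region,
        "--image-id", image_id,
        "--launch-permission", f"{operation}=[{{Group=all}}]",
    ]


def as_shell(command: list) -> str:
    """Render the argument list as a copy-pasteable shell line."""
    return shlex.join(command)
```

as_shell(launch_permission_command("ami-020bcd0304c8ef6bb", "ap-northeast-3", public=True)) renders the command shown above with an explicit --region added. Passing public=False with the ID of an old image gives the removal step.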


c4rt0 commented Sep 5, 2023

Rollout was failing due to an error at the 'Open pull request' stage:

  Attempting creation of pull request
  Error: invalid json response body at https://api.github.com/repos/coreos/fedora-coreos-streams/pulls reason: Unexpected end of JSON input

PR with solution


c4rt0 commented Sep 5, 2023

Rollout PR

c4rt0 closed this as completed on Sep 6, 2023