Make ceph device persistent #773

Open
wants to merge 1 commit into
base: main
from

Conversation

bogdando
Contributor

Persist /dev/vg2/data-lv2 with a systemd service, so that the Ceph deployment matches what ci-framework does for Ceph on standalone TripleO.

Add a standalone_revert.sh script that ensures the time is synchronized and the /dev/vg2/data-lv2 device is recreated after the VM is restored from the clean snapshot.

Add env vars to keep ssh commands functional after the revert is done (for the Makefile targets standalone_deploy and standalone_revert).
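For readers who have not seen this pattern, here is a minimal sketch of what a unit that persists a loop-backed volume could look like. It is an illustration only: the unit body, the backing file path and the loop device name are assumptions, since the unit file shipped in this PR is not quoted in the conversation. The helper name render_losetup_unit is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): print a oneshot unit that re-attaches
# a loop device and re-activates its volume group at boot.
# Arguments are assumptions: backing file, loop device, volume group name.
render_losetup_unit() {
    local backing_file="$1" loop_dev="$2" vg="$3"
    cat <<EOF
[Unit]
Description=Attach ${loop_dev} and activate ${vg} for the Ceph OSD
After=local-fs.target

[Service]
Type=oneshot
RemainAfterExit=yes
# Attach the backing file only if the loop device is not already set up,
# then activate the volume group so /dev/${vg}/<lv> reappears.
ExecStart=/bin/bash -c '/usr/sbin/losetup ${loop_dev} || /usr/sbin/losetup ${loop_dev} ${backing_file}; /usr/sbin/vgchange -ay ${vg}'
ExecStop=/usr/sbin/vgchange -an ${vg}
ExecStop=/usr/sbin/losetup -d ${loop_dev}

[Install]
WantedBy=multi-user.target
EOF
}
```

Assuming a backing file at /var/lib/ceph-osd.img, something like `render_losetup_unit /var/lib/ceph-osd.img /dev/loop3 vg2 > /tmp/ceph-osd-losetup.service` would produce a file that the install steps in the diff could then copy into /etc/systemd/system.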

@openshift-ci openshift-ci bot requested review from fao89 and karelyatin March 13, 2024 16:39
@bogdando bogdando requested review from jistr and fultonj and removed request for fao89 and karelyatin March 13, 2024 16:40
@fao89
Contributor

fao89 commented Mar 13, 2024

/approve

Contributor

openshift-ci bot commented Mar 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bogdando, fao89

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/e0d81657cbbb4bb58bb401a70ffd2cb1

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 30m 07s
✔️ install-yamls-crc-podified-edpm-baremetal SUCCESS in 1h 10m 59s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 05m 36s
cifmw-data-plane-adoption-osp-17-to-extracted-crc FAILURE in 43m 58s

@bogdando
Contributor Author

recheck rdoproject.org/github-check no logs


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/cfab01e44e094bf894303b55d0bf4181

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 40m 05s
✔️ install-yamls-crc-podified-edpm-baremetal SUCCESS in 1h 18m 56s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 05m 55s
cifmw-data-plane-adoption-osp-17-to-extracted-crc FAILURE in 42m 16s

@bogdando
Contributor Author

recheck rdoproject.org/github-check no logs


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/962de64242fb4cb49414dadddd259a2d

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 34m 15s
✔️ install-yamls-crc-podified-edpm-baremetal SUCCESS in 1h 13m 06s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 04m 50s
cifmw-data-plane-adoption-osp-17-to-extracted-crc FAILURE in 43m 18s

Persist /dev/vg2/data-lv2 with a systemd service, so that the Ceph
deployment matches what ci-framework does for Ceph on standalone TripleO.

Add a standalone_revert.sh script that ensures the time is synchronized
and the /dev/vg2/data-lv2 device is recreated after the VM is restored
from the clean snapshot.

Add env vars to keep ssh commands functional after the revert is done
(for the Makefile targets standalone_deploy and standalone_revert).

Signed-off-by: Bohdan Dobrelia <[email protected]>

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/366fb829535a4b09908840fe0f7ac8f3

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 32m 15s
✔️ install-yamls-crc-podified-edpm-baremetal SUCCESS in 1h 09m 48s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 06m 45s
cifmw-data-plane-adoption-osp-17-to-extracted-crc FAILURE in 46m 45s

@fao89
Contributor

fao89 commented Mar 20, 2024

recheck


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/b48e21eadf2d437ba33b404f0e24709c

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 52m 17s
✔️ install-yamls-crc-podified-edpm-baremetal SUCCESS in 1h 20m 31s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 13m 51s
cifmw-data-plane-adoption-osp-17-to-extracted-crc FAILURE in 42m 40s

cat /tmp/ceph-osd-losetup.service | sudo tee /etc/systemd/system/ceph-osd-losetup.service
sudo chmod 0644 /etc/systemd/system/ceph-osd-losetup.service
sudo systemctl daemon-reload
sudo systemctl enable --now ceph-osd-losetup.service
Contributor

Why is this needed? In my env I can snapshot and restore using the make targets, and the loopback device is still present. Are you snapshotting some other way?

Contributor Author

@bogdando bogdando Mar 22, 2024

Testing shows that ensuring the unit is started after reboot is sufficient (and having it stopped results in a missing loop device). Reverting doesn't cause problems here, so I will rework this.
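One way to confirm the behaviour described in this thread is a small check run on the VM after a reboot or a snapshot revert. This is a hypothetical sketch, not part of the PR: the function name check_ceph_device and its return codes are made up for illustration, and it relies only on `systemctl is-enabled`, `systemctl is-active` and a block-device test.

```shell
#!/usr/bin/env bash
# Hypothetical check (not from the PR): verify that the loop-device unit will
# start at boot, is running now, and that the LV device node exists.
# Return codes: 0 ok, 1 unit not enabled, 2 unit not active, 3 device missing.
check_ceph_device() {
    local unit="$1" device="$2"

    if ! systemctl is-enabled --quiet "$unit"; then
        echo "$unit is not enabled, so it will not start after a reboot" >&2
        return 1
    fi
    if ! systemctl is-active --quiet "$unit"; then
        echo "$unit is not active, so the loop device is likely missing" >&2
        return 2
    fi
    # -b follows the /dev/<vg>/<lv> symlink to the underlying block device.
    if [ ! -b "$device" ]; then
        echo "$device is not a block device" >&2
        return 3
    fi
    echo "$unit is enabled and active, $device is present"
}
```

On the standalone VM this could be called as `check_ceph_device ceph-osd-losetup.service /dev/vg2/data-lv2`, both right after a reboot and right after a snapshot revert, to compare the two cases.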

SSH_OPT="-o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i $SSH_KEY_FILE"

virsh snapshot-revert --domain edpm-compute-${EDPM_COMPUTE_SUFFIX} --snapshotname clean
ssh $SSH_OPT root@$IP systemctl stop chronyd ';' chronyd -q \'pool pool.ntp.org iburst\' ';' systemctl start chronyd
Contributor

+1 on setting the clock, which doesn't seem to be done automatically after reverting from the snapshot.

But why the manual setting with pool.ntp.org? That will get blocked inside our network. For me just systemctl restart chronyd seems to bring the clock up to date (and it's using the configured NTP server).

Contributor Author

Good point. I think we can split this off into a different patch.
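A minimal sketch of the reviewer's suggestion, for whoever picks up the follow-up patch: restart chronyd so it resyncs against the NTP servers already configured on the VM, rather than pointing at pool.ntp.org. The function name resync_clock is hypothetical, and the trailing `chronyc waitsync 12 0.1` (wait for up to 12 checks until the remaining correction is below 0.1 s) is an addition not discussed in the thread; drop it if a plain restart is enough.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): resync the clock on a reverted VM
# using the NTP servers already configured for chronyd on that VM.
# Usage: resync_clock <user@host> [extra ssh options...]
resync_clock() {
    local target="$1"
    shift
    # Remaining arguments are passed to ssh as options, before the target.
    ssh "$@" "$target" 'systemctl restart chronyd && chronyc waitsync 12 0.1'
}
```

With the variables from the revert script this would be roughly `resync_clock "root@$IP" $SSH_OPT`, where $SSH_OPT is left unquoted on purpose so that it splits into separate ssh options.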

@openshift-merge-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
