Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require the latest testcloud with the race fix #3244

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

psss
Copy link
Collaborator

@psss psss commented Sep 26, 2024

Let's try if it's better now!

Copy link

Failed to load packit config file:

Please correct data and retry.

For more info, please check out the documentation or contact the Packit team. You can also use our CLI command validate-config or our pre-commit hooks for validation of the configuration.

2 similar comments
Copy link

Failed to load packit config file:

Please correct data and retry.

For more info, please check out the documentation or contact the Packit team. You can also use our CLI command validate-config or our pre-commit hooks for validation of the configuration.

Copy link

Failed to load packit config file:

Please correct data and retry.

For more info, please check out the documentation or contact the Packit team. You can also use our CLI command validate-config or our pre-commit hooks for validation of the configuration.

@psss psss added the ci | full test Pull request is ready for the full test execution label Sep 26, 2024
@psss psss force-pushed the testcloud-race-condition-fix branch from 3a0c785 to cf841b7 Compare September 26, 2024 15:26
Copy link

Failed to load packit config file:

Cannot parse package config. ValidationError({'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}}), 'jobs': {0: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}})}, 2: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}})}, 7: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}})}, 8: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}})}, 9: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."], 'dist_git_branches': ['Expected \'list[str]\' or \'dict[str, dict[str, list]]\', got [{\'fedora-40\': {\'additional_repos\': [\'https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/\']}}] (type <class \'list\'>).\nExample -> "dist_git_branches": {"fedora-rawhide": {"fast_forward_merge_into": ["f40"]}, epel9: {}}}']}}})}, 10: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."], 'dist_git_branches': ['Expected \'list[str]\' or \'dict[str, dict[str, list]]\', got [{\'fedora-40\': {\'additional_repos\': [\'https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/\']}}] (type <class \'list\'>).\nExample -> "dist_git_branches": {"fedora-rawhide": {"fast_forward_merge_into": ["f40"]}, epel9: {}}}']}}})}, 11: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}})}}})

For more info, please check out the documentation or contact the Packit team. You can also use our CLI command validate-config or our pre-commit hooks for validation of the configuration.

2 similar comments
Copy link

Failed to load packit config file:

Cannot parse package config. ValidationError({'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}}), 'jobs': {0: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}})}, 2: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}})}, 7: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}})}, 8: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}})}, 9: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."], 'dist_git_branches': ['Expected \'list[str]\' or \'dict[str, dict[str, list]]\', got [{\'fedora-40\': {\'additional_repos\': [\'https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/\']}}] (type <class \'list\'>).\nExample -> "dist_git_branches": {"fedora-rawhide": {"fast_forward_merge_into": ["f40"]}, epel9: {}}}']}}})}, 10: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."], 'dist_git_branches': ['Expected \'list[str]\' or \'dict[str, dict[str, list]]\', got [{\'fedora-40\': {\'additional_repos\': [\'https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/\']}}] (type <class \'list\'>).\nExample -> "dist_git_branches": {"fedora-rawhide": {"fast_forward_merge_into": ["f40"]}, epel9: {}}}']}}})}, 11: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}})}}})

For more info, please check out the documentation or contact the Packit team. You can also use our CLI command validate-config or our pre-commit hooks for validation of the configuration.

Copy link

Failed to load packit config file:

Cannot parse package config. ValidationError({'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}}), 'jobs': {0: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}})}, 2: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}})}, 7: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}})}, 8: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}})}, 9: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."], 'dist_git_branches': ['Expected \'list[str]\' or \'dict[str, dict[str, list]]\', got [{\'fedora-40\': {\'additional_repos\': [\'https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/\']}}] (type <class \'list\'>).\nExample -> "dist_git_branches": {"fedora-rawhide": {"fast_forward_merge_into": ["f40"]}, epel9: {}}}']}}})}, 10: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."], 'dist_git_branches': ['Expected \'list[str]\' or \'dict[str, dict[str, list]]\', got [{\'fedora-40\': {\'additional_repos\': [\'https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/\']}}] (type <class \'list\'>).\nExample -> "dist_git_branches": {"fedora-rawhide": {"fast_forward_merge_into": ["f40"]}, epel9: {}}}']}}})}, 11: {'packages': defaultdict(<class 'dict'>, {'tmt': {'value': {'targets': ["Expected 'list[str]' or 'dict[str,dict]', got [{'fedora-40': {'additional_repos': ['https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/']}}] (type <class 'list'>)."]}}})}}})

For more info, please check out the documentation or contact the Packit team. You can also use our CLI command validate-config or our pre-commit hooks for validation of the configuration.

@psss psss force-pushed the testcloud-race-condition-fix branch from cf841b7 to 87573ee Compare September 26, 2024 15:30
@psss psss added the plugin | testcloud The testcloud virtual provision plugin label Sep 26, 2024
@psss psss linked an issue Sep 26, 2024 that may be closed by this pull request
@psss
Copy link
Collaborator Author

psss commented Sep 30, 2024

@thrix, according to the pipeline.log the artifacts seems to be set up correctly:

"artifacts": [
    {
        "id": "8083948:fedora-40-x86_64",
        "install": true,
        "order": 30,
        "packages": [
            "tmt+report-junit-1.37.dev888-1.20240927082453070315.pr3244.36.g4ee23632.fc40.noarch",
            "tmt+provision-beaker-1.37.dev888-1.20240927082453070315.pr3244.36.g4ee23632.fc40.noarch",
            "tmt+test-convert-1.37.dev888-1.20240927082453070315.pr3244.36.g4ee23632.fc40.noarch",
            "tmt-1.37.dev888-1.20240927082453070315.pr3244.36.g4ee23632.fc40.noarch",
            "tmt+all-1.37.dev888-1.20240927082453070315.pr3244.36.g4ee23632.fc40.noarch",
            "tmt+provision-virtual-1.37.dev888-1.20240927082453070315.pr3244.36.g4ee23632.fc40.noarch",
            "tmt+export-polarion-1.37.dev888-1.20240927082453070315.pr3244.36.g4ee23632.fc40.noarch",
            "tmt+provision-container-1.37.dev888-1.20240927082453070315.pr3244.36.g4ee23632.fc40.noarch",
            "tmt+report-polarion-1.37.dev888-1.20240927082453070315.pr3244.36.g4ee23632.fc40.noarch"
        ],
        "type": "fedora-copr-build"
    },
    {
        "id": "https://download.copr.fedorainfracloud.org/results/frantisekz/testcloud-wip/fedora-40-x86_64/",
        "install": true,
        "order": 20,
        "packages": null,
        "type": "repository"
    }
],

Any idea why the installation still fails? And picks some old build from the testcloud copr repo?

@psss psss added this to the 1.38 milestone Oct 1, 2024
@psss
Copy link
Collaborator Author

psss commented Oct 1, 2024

@frantisekz, seems we finally succeeded in executing the tests! But quite many of them are red. Most of them related to reboot. But I can see one Failed to connect in 300s. error in /tests/prepare/multihost as well.

@frantisekz
Copy link
Collaborator

frantisekz commented Oct 3, 2024

I am digging and experimenting through, will pick my brain here:

What (I may understand wrong) seems weird... (I'll use my specific paths for the VM that actually failed and ssh attempt would reject ssh auth)

if I compare public ssh keys given to the instance's cloud-init via user-data (/var/tmp/tmt/testcloud/instances/tmt-065-tshRMCFO/meta/user-data) and stored locally in /var/tmp/tmt/run-065/tests/example/plan/provision/default-2 they sometimes differ, eg.

$ cd /var/tmp/tmt/testcloud/instances/tmt-065-tshRMCFO/meta
$ cat user-data
<snip>
ssh_authorized_keys:
  - ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBAdmJ8+8ZqReCTQnRLSzriNrvNlP8Nmj7TgLJ9m5hbl+A70gOF5JO9i+St37wMXyUg3k/RFMHnjSC1jgonKwKM0= fanys@fanys-laptop
<snip>
$ cd /var/tmp/tmt/run-065/tests/example/plan/provision/default-2
$ cat id_ecdsa.pub
ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBOFLzOoR+i+kahmFTyT/GjeUZMCE3Spgce8r5jTkp2p6HY2vm5Wmx24yzsesPHOp/wvVei0wBz7gqwGqTNo7B9I= fanys@fanys-laptop

I've double checked this should be the same instance in the multihost run:

(Pdb) self.key
[Path('/var/tmp/tmt/run-065/tests/example/plan/provision/default-2/id_ecdsa')]
(Pdb) self.instance_name
'tmt-065-tshRMCFO'

It's possible that this sn't the culprit, that I am missing something and it's expected to behave this way... but confirmation/denial would be welcome ☺️

@psss
Copy link
Collaborator Author

psss commented Oct 4, 2024

Hmmm, sounds weird. @happz, do you think it could be possible that the ssh keys are somehow exchanged/misplaced between the guest instances?

@happz
Copy link
Collaborator

happz commented Oct 4, 2024

Hmmm, sounds weird. @happz, do you think it could be possible that the ssh keys are somehow exchanged/misplaced between the guest instances?

That's definitely one possible cause, but I don't see how it could happen. E.g. generated SSH key pair should be in guest's own workdir, e.g. /var/tmp/tmt/run-017/plan/provision/curl-client vs /var/tmp/tmt/run-017/plan/provision/wget-client.

I'd try to add more debugging lines, printing more paths and contents, trying to pinpoint on which side of the border the swap happens. We can also add guest name into the pubkey by adding "-C", self.safe_name to keygen command.

But: SSH key is generated, stored in files, then self.config.USER_DATA = Template(USER_DATA).safe_substitute(user_name=self.user, public_key=public_key) is called. self.config is created in GuestTestCloud.prepare_config() by calling self.config = testcloud.config.get_config(). testcloud.config.get_config() does this:

def get_config():
    '''Retrieve a config instance. If a config instance has already been parsed,
    reuse that parsed instance.

    :return: :class:`.ConfigData` containing configuration values
    '''

    global _config
    if not _config:
        _config = _parse_config()
    return _config

Isn't it returning a shared, mutable object to all calls once it's created? Aren't all subsequent calls to get_config() getting the very same instance, and therefore overwriting the very same USER_DATA attribute?

@happz
Copy link
Collaborator

happz commented Oct 4, 2024

Yep, I added a guest name to SSH key file, plus a barrier after setting self.config.USER_DATA - once all threads get there, they wait for the rest of the flock before moving on, therefore we should see all three threads eventually see the very same user data, those set by the last thread setting its self.config.USER_DATA.

httpd-server - before the barrier:
#cloud-config
chpasswd:
  list: |
    root:${password}
  expire: false
users:
  - default
  - name: root
ssh_authorized_keys:
  - ecdsa-sha2-nistp256 ...8QUENe70= httpd-server

ssh_pwauth: true
disable_root: false
runcmd:
${runcommands}

wget-client - before the barrier:
#cloud-config
chpasswd:
  list: |
    root:${password}
  expire: false
users:
  - default
  - name: root
ssh_authorized_keys:
  - ecdsa-sha2-nistp256 ...HZLZfZvE= wget-client

ssh_pwauth: true
disable_root: false
runcmd:
${runcommands}

curl-client - before the barrier:
#cloud-config
chpasswd:
  list: |
    root:${password}
  expire: false
users:
  - default
  - name: root
ssh_authorized_keys:
  - ecdsa-sha2-nistp256 ...alrH5Zi4= curl-client

ssh_pwauth: true
disable_root: false
runcmd:
${runcommands}

curl-client - after the barrier:
#cloud-config
chpasswd:
  list: |
    root:${password}
  expire: false
users:
  - default
  - name: root
ssh_authorized_keys:
  - ecdsa-sha2-nistp256 ...alrH5Zi4= curl-client

ssh_pwauth: true
disable_root: false
runcmd:
${runcommands}

wget-client - after the barrier:
#cloud-config
chpasswd:
  list: |
    root:${password}
  expire: false
users:
  - default
  - name: root
ssh_authorized_keys:
  - ecdsa-sha2-nistp256 ...alrH5Zi4= curl-client

ssh_pwauth: true
disable_root: false
runcmd:
${runcommands}

httpd-server - after the barrier:
#cloud-config
chpasswd:
  list: |
    root:${password}
  expire: false
users:
  - default
  - name: root
ssh_authorized_keys:
  - ecdsa-sha2-nistp256 ...alrH5Zi4= curl-client

ssh_pwauth: true
disable_root: false
runcmd:
${runcommands}

As you can see, curl-client won, and forced the other two guests to use incorrect SSH keys.

diff --git a/tmt/steps/provision/testcloud.py b/tmt/steps/provision/testcloud.py
index 0e12dfaa..cdcc5ff9 100644
--- a/tmt/steps/provision/testcloud.py
+++ b/tmt/steps/provision/testcloud.py
@@ -12,6 +12,7 @@ import types
 from collections.abc import Iterator
 from string import Template
 from typing import TYPE_CHECKING, Any, Optional, Union, cast
+import threading
 
 import click
 import pint
@@ -39,6 +40,9 @@ if TYPE_CHECKING:
     from tmt.hardware import Size
 
 
+BARRIER = threading.Barrier(3)
+
+
 libvirt: Optional[types.ModuleType] = None
 testcloud: Optional[types.ModuleType] = None
 
@@ -759,7 +763,7 @@ class GuestTestcloud(tmt.GuestSsh):
             self.debug('Generating an ssh key.')
             key_name = f"id_{key_type if key_type is not None else 'rsa'}"
             self.key = [self.workdir / key_name]
-            command = Command("ssh-keygen", "-f", self.key[0], "-N", "")
+            command = Command("ssh-keygen", "-f", self.key[0], "-N", "", '-C', self.safe_name)
             if key_type is not None:
                 command += Command("-t", key_type)
             self._run_guest_command(command, silent=True)
@@ -773,6 +777,14 @@ class GuestTestcloud(tmt.GuestSsh):
         self.config.COREOS_DATA = Template(COREOS_DATA).safe_substitute(
             user_name=self.user, public_key=public_key)
 
+        print(f'{id(self.config)}')
+
+        print(f'{self.name} - before the barrier:' + '\n' + self.config.USER_DATA)
+
+        BARRIER.wait()
+
+        print(f'{self.name} - after the barrier:' + '\n' + self.config.USER_DATA)
+
     def prepare_config(self) -> None:
         """ Prepare common configuration """
         import_testcloud()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci | full test Pull request is ready for the full test execution plugin | testcloud The testcloud virtual provision plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race condition in the testcloud plugin
3 participants