non-leader unit stuck with hook failed: "certificates-relation-changed" for self-signed-certificates:certificates" #268

Open
jeffreychang911 opened this issue Jul 22, 2024 · 3 comments
Labels: bug (Something isn't working)

Comments


jeffreychang911 commented Jul 22, 2024

Steps to reproduce

  1. SolQA deploys Charmed Kubernetes 1.28 on AWS, then mongodb-k8s.
  2. Two of the three mongodb-k8s units end up blocked with `hook failed: "certificates-relation-changed" for self-signed-certificates:certificates`, and they do not settle before the 1-hour timeout.

Expected behavior

Actual behavior

Versions

Operating system:

Juju CLI: 3.5.2

Juju agent: 3.5.2

Charm revision: rev 43 on 6/edge

Charmed Kubernetes: 1.28

Log output

Juju debug log:

unit-self-signed-certificates-0: 2024-07-20 02:47:01 INFO juju.worker.uniter.operation ran "certificates-relation-changed" hook (via hook dispatching script: dispatch)
unit-mongodb-k8s-0: 2024-07-20 02:47:02 INFO unit.mongodb-k8s/0.juju-log certificates:1: Restarting mongod with TLS enabled.
unit-mongodb-k8s-0: 2024-07-20 02:47:02 INFO unit.mongodb-k8s/0.juju-log certificates:1: Deleting TLS certificate from workload container
unit-mongodb-k8s-0: 2024-07-20 02:47:02 ERROR unit.mongodb-k8s/0.juju-log certificates:1: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/./src/charm.py", line 1245, in <module>
    main(MongoDBCharm)
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/venv/ops/main.py", line 548, in main
    manager.run()
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/venv/ops/main.py", line 527, in run
    self._emit()
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/venv/ops/main.py", line 516, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/venv/ops/main.py", line 147, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/lib/charms/tls_certificates_interface/v3/tls_certificates.py", line 1900, in _on_relation_changed
    self.on.certificate_available.emit(
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/lib/charms/mongodb/v0/mongodb_tls.py", line 225, in _on_certificate_available
    self.charm.delete_tls_certificate_from_workload()
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/./src/charm.py", line 1058, in delete_tls_certificate_from_workload
    container.remove_path(f"{Config.CONF_DIR}/{file}")
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/venv/ops/model.py", line 2785, in remove_path
    self._pebble.remove_path(str(path), recursive=recursive)
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/venv/ops/pebble.py", line 2529, in remove_path
    resp = self._request('POST', '/v1/files', None, body)
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/venv/ops/pebble.py", line 1859, in _request
    response = self._request_raw(method, path, query, headers, data)
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/venv/ops/pebble.py", line 1912, in _request_raw
    raise ConnectionError(
ops.pebble.ConnectionError: Could not connect to Pebble: socket not found at '/charm/containers/mongod/pebble.socket' (container restarted?)

Additional context

SolQA testrun - https://solutions.qa.canonical.com/testruns/5cac57f9-8c93-43e5-bc0e-bbbbf1098c82
Juju crashdump - https://oil-jenkins.canonical.com/artifacts/5cac57f9-8c93-43e5-bc0e-bbbbf1098c82/generated/generated/mongodb-k8s/crashdump-2024-07-20-03.49.02.tar.gz
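
The traceback above ends in `ops.pebble.ConnectionError` raised from `container.remove_path(...)` while the `mongod` container's Pebble socket was missing. A common way to make such a handler tolerant of a restarting container is to check `can_connect()` first and defer the event if Pebble is unreachable. The sketch below illustrates only that pattern; it is not the fix merged in the charm. `FakeContainer`, `FakeEvent`, `PebbleConnectionError`, `CONF_DIR`, `TLS_FILES`, and `delete_tls_files` are hypothetical stand-ins so the example runs without `ops` installed. In a real charm the container would be an `ops.Container`, the event an `ops.EventBase`, and the exception `ops.pebble.ConnectionError`.

```python
class PebbleConnectionError(Exception):
    """Stand-in for ops.pebble.ConnectionError (hypothetical, for this sketch only)."""


class FakeContainer:
    """Hypothetical stand-in exposing the two ops.Container methods used below."""

    def __init__(self, reachable: bool):
        self.reachable = reachable
        self.removed = []

    def can_connect(self) -> bool:
        return self.reachable

    def remove_path(self, path: str) -> None:
        if not self.reachable:
            raise PebbleConnectionError("socket not found (container restarted?)")
        self.removed.append(path)


class FakeEvent:
    """Hypothetical stand-in for an ops event; only defer() is modelled."""

    def __init__(self):
        self.deferred = False

    def defer(self) -> None:
        self.deferred = True


CONF_DIR = "/hypothetical/conf"        # assumed; the real Config.CONF_DIR is not shown in this issue
TLS_FILES = ["ca.crt", "server.pem"]   # assumed file names, for illustration only


def delete_tls_files(container, event) -> bool:
    """Remove TLS files; defer the event instead of crashing if Pebble is unreachable."""
    if not container.can_connect():
        event.defer()
        return False
    try:
        for name in TLS_FILES:
            container.remove_path(f"{CONF_DIR}/{name}")
    except PebbleConnectionError:
        # The container can still disappear between the check and the call.
        event.defer()
        return False
    return True


down, up = FakeContainer(reachable=False), FakeContainer(reachable=True)
event_down, event_up = FakeEvent(), FakeEvent()
result_down = delete_tls_files(down, event_down)
result_up = delete_tls_files(up, event_up)
```

With the unreachable container the handler returns `False` and the event is deferred for a later retry, rather than leaving the unit in a `hook failed` state.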

jeffreychang911 added the bug label Jul 22, 2024

jeffreychang911 changed the title from `Uncaught exception ops.pebble.ConnectionError` to `non-leader unit stuck with hook failed: "certificates-relation-changed" for self-signed-certificates:certificates"` Jul 22, 2024
Contributor

Gu1nness commented Aug 1, 2024

I took the time to investigate and gathered some information.
We just merged a PR that should fix it: #288
Once this is released and a new revision is ready for deployment, we can retry the test run, @jeffreychang911.

@jeffreychang911
Author

Tested with revision 50 in this run; a crashdump was collected.

I still see the same error as in the original description, plus a new error, shown below:

unit-mongodb-k8s-2: 2024-08-20 19:40:51 ERROR unit.mongodb-k8s/2.juju-log certificates:1: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/./src/charm.py", line 1555, in <module>
    main(MongoDBCharm)
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/venv/ops/main.py", line 551, in main
    manager.run()
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/venv/ops/main.py", line 530, in run
    self._emit()
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/venv/ops/main.py", line 519, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/venv/ops/main.py", line 147, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/lib/charms/tls_certificates_interface/v3/tls_certificates.py", line 1911, in _on_relation_changed
    self.on.certificate_available.emit(
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/lib/charms/mongodb/v1/mongodb_tls.py", line 228, in _on_certificate_available
    self.charm.restart_charm_services()
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/./src/charm.py", line 1212, in restart_charm_services
    container.replan()
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/venv/ops/model.py", line 2259, in replan
    self._pebble.replan_services()
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/venv/ops/pebble.py", line 2129, in replan_services
    return self._services_action('replan', [], timeout, delay)
  File "/var/lib/juju/agents/unit-mongodb-k8s-2/charm/venv/ops/pebble.py", line 2226, in _services_action
    raise ChangeError(change.err, change)
ops.pebble.ChangeError: cannot perform the following tasks:

  • Start service "mongod" (cannot start service: exited quickly with code 1)
    ----- Logs from task 0 -----
    2024-08-20T19:40:51Z INFO Most recent service output:
    {"t":{"$date":"2024-08-20T19:40:51.261Z"},"s":"I", "c":"CONTROL", "id":5760901, "ctx":"-","msg":"Applied --setParameter options","attr":{"serverParameters":{"processUmask":{"default":63,"value":31}}}}
    {"t":{"$date":"2024-08-20T19:40:51.261Z"},"s":"F", "c":"CONTROL", "id":20574, "ctx":"-","msg":"Error during global initialization","attr":{"error":{"code":38,"codeName":"FileNotOpen","errmsg":"Can't initialize rotatable log file :: caused by :: Failed to open /var/log/mongodb/mongodb.log"}}}
    2024-08-20T19:40:51Z ERROR cannot start service: exited quickly with code 1
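
The service output embedded in the `ChangeError` above is mongod's structured (JSON) logging, where the `"s"` field carries the severity and `"F"` marks a fatal entry. When triaging a crashdump it can help to pull out only the fatal entries. The following is a minimal sketch of that, using the lines quoted above as input; `fatal_errors` is a hypothetical helper name, and the sketch only reports what mongod logged. It does not establish why `/var/log/mongodb/mongodb.log` could not be opened.

```python
import json

# Service output copied from the ChangeError above (one entry per line).
SERVICE_OUTPUT = """
{"t":{"$date":"2024-08-20T19:40:51.261Z"},"s":"I", "c":"CONTROL", "id":5760901, "ctx":"-","msg":"Applied --setParameter options","attr":{"serverParameters":{"processUmask":{"default":63,"value":31}}}}
{"t":{"$date":"2024-08-20T19:40:51.261Z"},"s":"F", "c":"CONTROL", "id":20574, "ctx":"-","msg":"Error during global initialization","attr":{"error":{"code":38,"codeName":"FileNotOpen","errmsg":"Can't initialize rotatable log file :: caused by :: Failed to open /var/log/mongodb/mongodb.log"}}}
2024-08-20T19:40:51Z ERROR cannot start service: exited quickly with code 1
"""


def fatal_errors(text):
    """Return (codeName, errmsg) for every fatal ("s": "F") JSON log entry in text."""
    found = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # Pebble's own plain-text lines are not JSON; skip them
        if not isinstance(entry, dict) or entry.get("s") != "F":
            continue
        error = entry.get("attr", {}).get("error", {})
        found.append((error.get("codeName"), error.get("errmsg")))
    return found


errors = fatal_errors(SERVICE_OUTPUT)
for code_name, message in errors:
    print(f"{code_name}: {message}")
```

For the output quoted in this comment, the only fatal entry is the `FileNotOpen` error for `/var/log/mongodb/mongodb.log`.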
