Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DPE-5307][DPE-5236] update tls certs on node port change + fix flakey TLS tests #35

Open
wants to merge 21 commits into
base: 6/edge
Choose a base branch
from

Conversation

MiaAltieri
Copy link
Collaborator

@MiaAltieri MiaAltieri commented Sep 9, 2024

Issue(s)

DPE-5236 update tls certs on node port change

  1. TLS does not support updating certs on the addition of external connections
  2. TLS does not support updating certs on the removal of external connections
  3. TLS does not support updating certs on the change of k8s IP addresses

DPE-5307 fix flakey TLS tests

  1. TLS tests have a flakey performance

Solution

  1. update tls certs on node port change
    Generate new CSRs when 1/2/3 happens

  2. fix flakey TLS tests

  • update tests to work on K8s
  • update tests to work on routers

@MiaAltieri MiaAltieri changed the title Dpe 5236 update tls certs on node port change WIP - please do not reivew Dpe 5236 update tls certs on node port change Sep 9, 2024
@MiaAltieri MiaAltieri marked this pull request as draft September 9, 2024 14:24
@MiaAltieri MiaAltieri changed the title WIP - please do not reivew Dpe 5236 update tls certs on node port change [DEP-5236] update tls certs on node port change Sep 12, 2024
@MiaAltieri MiaAltieri changed the title [DEP-5236] update tls certs on node port change [DPE-5236] update tls certs on node port change Sep 12, 2024
@MiaAltieri MiaAltieri force-pushed the DPE-5236-update-tls-certs-on-node-port-change branch from 25f4489 to 86b7310 Compare September 12, 2024 12:42
@MiaAltieri MiaAltieri changed the title [DPE-5236] update tls certs on node port change [DPE-5307][DPE-5236] update tls certs on node port change + fix flakey TLS tests Sep 12, 2024
src/node_port.py Outdated
@@ -4,13 +4,14 @@

"""Manager for handling mongos Kubernetes resources for a single mongos pod."""

from typing import Optional
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have reviewed node port for external clients then these changes are the same

@@ -97,45 +99,55 @@ async def toggle_tls_mongos(
f"{certs_app_name}:{CERT_REL_NAME}",
)

await ops_test.model.wait_for_idle(apps=[MONGOS_APP_NAME], idle_period=30)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

necessary to fix flakey TLS tests


return (
f"{MONGO_SHELL} '{client_uri}' --eval 'sh.status()'"
f"{MONGO_SHELL} '{client_uri}' --eval 'db.getUsers()'"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

necessary to fix flakey TLS tests. sh.status is a privileged command

@@ -58,9 +62,62 @@ async def test_mongos_tls_enabled(ops_test: OpsTest) -> None:
)

await integrate_cluster_with_tls(ops_test)
await ops_test.model.wait_for_idle(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

necessary to fix flakey TLS tests

@@ -81,12 +137,15 @@ async def test_mongos_tls_disabled(ops_test: OpsTest) -> None:
timeout=300,
)

await check_mongos_tls_disabled(ops_test)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

necessary to fix flakey TLS tests - shouldn't check if disabled until model is stable

@MiaAltieri MiaAltieri force-pushed the DPE-5236-update-tls-certs-on-node-port-change branch from 55d3fdd to f3a0228 Compare September 12, 2024 12:59
@MiaAltieri MiaAltieri marked this pull request as ready for review September 12, 2024 13:03
@MiaAltieri MiaAltieri force-pushed the DPE-5236-update-tls-certs-on-node-port-change branch from 722bf01 to 8938c34 Compare September 12, 2024 14:06
lib/charms/mongodb/v1/mongodb_tls.py Outdated Show resolved Hide resolved
src/charm.py Show resolved Hide resolved
src/charm.py Outdated Show resolved Hide resolved
src/charm.py Outdated
Comment on lines 462 to 494
def get_k8s_mongos_hosts(self) -> Set:
"""Returns the K8s hosts for mongos"""
hosts = set()
for unit in self.get_units():
hosts.add(self.get_k8s_mongos_host(unit))

return hosts

def get_ext_mongos_host(self, unit: Unit, incl_port=True) -> str | None:
"""Returns the ext hosts for mongos on the provided unit."""
if not self.is_external_client:
return None

try:
unit_ip = self.node_port_manager.get_node_ip(unit.name)
if not incl_port:
return unit_ip

unit_port = self.node_port_manager.get_node_port(
port_to_match=Config.MONGOS_PORT, unit_name=unit.name
)
if unit_ip and unit_port:
return "{unit_ip}:{unit_port}"
else:
raise NoExternalHostError(f"No external host for unit {unit.name}")
except (
NoExternalHostError,
FailedToFindNodePortError,
FailedToFindServiceError,
) as e:
raise FailedToGetHostsError(
"Failed to retrieve external hosts due to %s", e
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: I strongly recommend you sort() your hosts.

Off the top of my head, I see that you use these hosts in the mongos_config. Not sure what you do with this, but let's assume you use them in some env-var during service runtime:

HOSTS=host1,host2,host3 start-mongos.sh

Pebble replan looks for new runtime args, so if you now get:

HOSTS=host3,host1,host2 start-mongos.sh

you're going to restart without needing to.

Copy link
Collaborator Author

@MiaAltieri MiaAltieri Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fortunately we do not provide hosts to pebble plan! But I suppose we do supply them to clients, I suppose we could lead to unnecessary relation-changed events. However, we use Sets for hosts since we do many set operations on our hosts in different handlers and python sets make no gaurentee on ordering so I am not sure this suggestion is feasible?

Copy link

@marcoppenheimer marcoppenheimer Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look deeply, but if you do:

hosts = set(["1", "2", "3"])
relation.data[some.app].update({"hosts": ",".join(hosts)})

You'll get a random order, and have extraneous relation-changed events.
Well, strictly speaking you won't have random ordering WITHIN an event, but you will for a different event:

hosts = set(["1", "2", "3"])
print(",".join(hosts))
hosts = set(["3", "1", "2"])
print(",".join(hosts))

Because the sets in the above example are assigned to the same memory address, they 'coincidentally' end up evaluating the same due to a quirk of python's collision resolution (essentially it's the same stuff in the set, so just returns the original set).

But if you re-run it in a different Python execution (aka charm event), the order will now be different.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see , thank you very much @marcoppenheimer ! I think we ought to handle this in where we update hosts (the provider lib) which is on the mongodb-vm operator GitHub, I added an issue for it

src/charm.py Show resolved Hide resolved
src/charm.py Outdated Show resolved Hide resolved
src/node_port.py Outdated Show resolved Hide resolved
src/node_port.py Outdated Show resolved Hide resolved
src/charm.py Show resolved Hide resolved
@MiaAltieri MiaAltieri force-pushed the DPE-5236-update-tls-certs-on-node-port-change branch from 86d8690 to 4d31669 Compare September 17, 2024 12:27
@MiaAltieri MiaAltieri force-pushed the DPE-5236-update-tls-certs-on-node-port-change branch from 4727f5c to 4d31669 Compare September 26, 2024 15:52
@MiaAltieri MiaAltieri force-pushed the DPE-5236-update-tls-certs-on-node-port-change branch 3 times, most recently from e4975cd to 938704c Compare September 27, 2024 16:48
@MiaAltieri MiaAltieri force-pushed the DPE-5236-update-tls-certs-on-node-port-change branch from 938704c to 1f63b30 Compare September 27, 2024 17:01
Copy link

@marcoppenheimer marcoppenheimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as it all looks kosher. Left a minor comment about the pebble restart part, but great job!

Comment on lines 391 to +392
container.replan()
container.restart(Config.SERVICE_NAME)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: This will cause a double restart, is that what you want?

If the new layer is not the same as the original, pebble replan --> pebble stop; pebble start. Then you do pebble restart again. It's probably not the end of the world, but you might want to just have container.restart here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants