Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

o/hookstate/ctlcmd: queue service commands if run from default-configure hook #13960

Merged

Conversation

ZeyadYasser
Copy link
Contributor

This PR queues snapctl services' commands to be run after the default-configure similar to what is done for configure hook #4070. This is to avoid a problem where the service doesn't see a new value if it uses snapctl get ..., because it's still not committed by default-configure hook similar to what was described in LP#2047949 .

Fixes: https://bugs.launchpad.net/snapd/+bug/2047949

@github-actions github-actions bot added the Needs Documentation -auto- Label automatically added which indicates the change needs documentation label May 10, 2024
@ZeyadYasser ZeyadYasser force-pushed the default-configure-hook-queue-services branch from 153e4fd to 757129b Compare May 10, 2024 12:22
zyga
zyga previously approved these changes May 22, 2024
Copy link
Contributor

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Please make sure to link updated documentation on the snapcraft forum.

@ZeyadYasser
Copy link
Contributor Author

Please make sure to link updated documentation on the snapcraft forum.

This fix just makes the implementation consistent with the documented behavior https://snapcraft.io/docs/supported-snap-hooks#heading--default-configure and the snapctl docs don't mention this behavior at all.

Do you have a specific doc page in mind that would need updating?

@ZeyadYasser ZeyadYasser marked this pull request as draft May 27, 2024 14:06
@ernestl
Copy link
Collaborator

ernestl commented May 27, 2024

It's not clear why the user still wants to start services from within the default configure hook. My understanding is the the reason for this was specific to the order of configure-hook VS starting services as part of the do-install flow which should not be required anymore unless I am missing something.

A secondary more subtle point, it does not seem like a good idea to start/interfere with service commands before do-install start service is done. Longer term we might want explicitly limit snapctl powers depending on how far the install sequence progressed to avoid creative permutations that cause problems.

@ernestl ernestl marked this pull request as ready for review May 27, 2024 15:07
@ernestl
Copy link
Collaborator

ernestl commented May 27, 2024

It's not clear why the user still wants to start services from within the default configure hook. My understanding is the the reason for this was specific to the order of configure-hook VS starting services as part of the do-install flow which should not be required anymore unless I am missing something.

A secondary more subtle point, it does not seem like a good idea to start/interfere with service commands before do-install start service is done. Longer term we might want explicitly limit snapctl powers depending on how far the install sequence progressed to avoid creative permutations that cause problems.

The reason why the user have service disabled still: They use the snap for core and for classic, and in classic there is no concept of seeding configuration with gadget. They rely on the user manually modifying the configuration and starting the service, so it is required to have it disabled by default.

@pedronis pedronis added the Needs Samuele review Needs a review from Samuele before it can land label May 27, 2024
@ernestl ernestl requested a review from pedronis May 27, 2024 18:29
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some question/comments

overlord/hookstate/ctlcmd/services_test.go Outdated Show resolved Hide resolved
@@ -149,7 +149,7 @@ func runServiceCommand(context *hookstate.Context, inst *servicestate.Instructio
return err
}

if !context.IsEphemeral() && context.HookName() == "configure" {
if !context.IsEphemeral() && (context.HookName() == "configure" || context.HookName() == "default-configure") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where do the tasks queued after default-configure exactly end up? it's not very transparent from tests

Copy link
Contributor Author

@ZeyadYasser ZeyadYasser May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added clarifying comments in the test

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean after the configure hook? does this solve the original issue people were having in ways they expect?

Copy link
Contributor Author

@ZeyadYasser ZeyadYasser May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, after both the configure hook and the health-check hook.

For illustration this is the expected order of tasks for installing a snap which is used in the unit tests:

"prerequisites",
"download-snap",
"validate-snap",
"mount-snap",
"copy-snap-data",
"setup-profiles",
"link-snap",
"auto-connect",
"set-auto-aliases",
"setup-aliases",
"run-hook[install]",
"run-hook[default-configure]",
"start-snap-services",
"run-hook[configure]",
"run-hook[check-health]",
"exec-command",                      <----  task for snapctl command
"service-control",                   <----  task for service operation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, we probably want to ask the reporter on the bugs if this is too late for them or ok?

Good point, I will check.

Copy link
Member

@farshidtz farshidtz May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ZeyadYasser for the fix and your explanations.

The use case is from a snap that has a one-shot service doing several firewall and network configuration to set up a Thread bridge (linked in the bug report). As mentioned in #13960 (comment), the services are disabled by default, so that the user (on classic but also a pre-built Core image) gets a chance to configure for e.g. the networking interface before starting the services. However, the same snap needs to start automatically on Core when the config is seeded from a gadget. (A workaround was suggested by @zyga to keep services enabled by default but not run them when the needed config isn't set; this could work well for a limited config set.)

In our case, with the workflow proposed in this PR, it could work as follows:

  1. default-configure hook calls snapctl start if the autostart config option is set to true.
  2. configure hook does some validation (not done in default-configure to avoid redundant checks).
  3. services start.

But I suppose this isn't valid in every use case, as some applications expect the configure hook to execute while the services are running.

In summary, while the proposed order isn't an issue for our use case, I believe starting the services right after the hook succeeds makes this more inline with the snap best practices.

Side note: I wonder why run-hook[check-health] is ordered between configure and exec-command/service-control tasks.

Another solution I can think of is to add the ability to just enable/disable services via snapctl without starting or stopping them. Then, the default-configure hook could simply call for e.g. the snapctl enable --service <service> (snapctl enable does something else) command (without needing to queue) and the following task (start-snap-services) should just take care of the startup. I've already discussed this with @ernestl, and I'm not sure if this is going to work. I have experimented by calling snapctl stop --disable <service> from install and default-configure hooks for an enabled service, and it works as expected: the service gets disabled and never starts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the earliest we can start those services is after snap-start-services, @ZeyadYasser could you look what would be the complexity of queuing commands from default-configure there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have a variant of queueCommand or some special handling code that does the following:

  • if the service commands are coming from default-configure hook
  • simply inject the commands' tasksets after start-snap-services using snapstate.InjectTasks
    • snapstate.InjectTasks takes care of the complexity of adjusting the tasks' dependencies

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is now updated to inject service commands after start-snap-services for the default-configure hook.

@ZeyadYasser ZeyadYasser requested a review from pedronis May 29, 2024 11:58
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine with me assuming it solves the original problems as is

@pedronis pedronis requested review from pedronis and removed request for pedronis May 30, 2024 09:33
@ZeyadYasser ZeyadYasser force-pushed the default-configure-hook-queue-services branch from 91078ff to 42355fc Compare June 11, 2024 09:11
@ZeyadYasser
Copy link
Contributor Author

I rebased the PR to include test changes from #14032.

bboozzoo
bboozzoo previously approved these changes Jun 11, 2024
Copy link
Contributor

@bboozzoo bboozzoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are issues related to where to inject if lanes are shared like if transaction=all-snaps is used

overlord/hookstate/ctlcmd/helpers.go Show resolved Hide resolved
@pedronis pedronis dismissed stale reviews from bboozzoo and zyga June 12, 2024 15:25

this will need re-reviews

Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, test question


func (s *servicectlSuite) TestQueuedCommandsOrderingConfigureHook(c *C) {
const hook = "configure"
const singleTransaction = false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need the variant when this is true? or is the check above not working for that case? if we want not to test we should at least leave a comment somewhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, It doesn't work for configure hook. It is tricky to get snap name for all task kinds in the case of configure hook so I didn't add the single transaction fix for configure hook since it is a non-issue due how commands are queued at the very end .

I added the test (but commented out) with a comment explaining why it is left untested.

@ZeyadYasser ZeyadYasser requested a review from pedronis July 1, 2024 13:38
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

Copy link
Contributor

@bboozzoo bboozzoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

…re hook

Queue "snapctl restart ..." and "snapctl start ..." commands to be run after
default-configure similar to configure hook. This is to avoid a problem where
the service doesn't see a new value if it uses "snapctl get ...", because it's
still not commited by default-configure hook.

Fixes: https://bugs.launchpad.net/snapd/+bug/2047949

Signed-off-by: Zeyad Gouda <[email protected]>
And add comments explaining tasks relative order.

Signed-off-by: Zeyad Gouda <[email protected]>
Multiple snaps could be installed in a single transaction
where all snap tasksets are in a single lane.

The old simplistic approach of looking up the first
"start-snap-services" fails when we have multiple
tasks for multiple snaps in the same lane.

A test is added to trigger this corner case, and the fix
just checks the snap name associated with the "start-snap-services"
tasks.

Signed-off-by: Zeyad Gouda <[email protected]>
@ZeyadYasser ZeyadYasser force-pushed the default-configure-hook-queue-services branch from 56a6bae to 3bcbee9 Compare July 3, 2024 09:54
@ZeyadYasser
Copy link
Contributor Author

Rebased to fix failing arch tests

@ZeyadYasser ZeyadYasser added this to the 2.64 milestone Jul 3, 2024
Copy link
Collaborator

@ernestl ernestl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@ernestl ernestl merged commit 4669c7c into canonical:master Jul 4, 2024
45 of 51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Documentation -auto- Label automatically added which indicates the change needs documentation Needs Samuele review Needs a review from Samuele before it can land
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants