Scrub will report success (exit 0) when the scrub is cancelled #128

Open
LordMike opened this issue Oct 4, 2024 · 3 comments · May be fixed by #129

Comments

LordMike commented Oct 4, 2024

I've edited the scrub service to ping a healthcheck site when it starts and when it stops. Just now, I stopped a running scrub with btrfs scrub cancel <mount> and noticed that the healthcheck site recorded the run as successful. This must mean the service returned an exit code of 0.

I report like this:

[Service]
# type=simple doesn't seem to fill out $EXIT_STATUS, so we use 'show' instead.
ExecStartPre=-/bin/bash -c "curl -fsS -m 10 --retry 5 -o /dev/null https://hc-ping.com/MY-ID/start"
ExecStopPost=-/bin/bash -c 'curl -fsS -m 10 --retry 5 -o /dev/null https://hc-ping.com/MY-ID/$(systemctl show -p ExecMainStatus btrfs-scrub.service | cut -d= -f2)'
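
The ExecStopPost line above builds the ping URL by appending the value from a line of the form ExecMainStatus=<code>. A minimal sketch of that parsing step in isolation follows. The function names status_from_show_line and ping_url are hypothetical, and a hard-coded sample line stands in for the real systemctl show call:

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of btrfsmaintenance or systemd.

# Extract the value from a "Key=Value" line, as `cut -d= -f2` does in the unit.
status_from_show_line() {
    printf '%s\n' "$1" | cut -d= -f2
}

# Build the ping URL; MY-ID is the same placeholder used in the unit above.
ping_url() {
    printf 'https://hc-ping.com/MY-ID/%s\n' "$(status_from_show_line "$1")"
}

# Sample input in the format shown by `systemctl show -p ExecMainStatus`.
ping_url "ExecMainStatus=1"
```

With this construction, a service that exits 0 always pings the URL ending in /0, which is why a masked failure shows up as a successful run.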

This seems rather unfortunate, so I'm reporting it so that others know about it and so that it can be fixed. I believe the script should carry on (running on all filesystems in the loop, as it does) and then exit with a non-zero status at the end if any of the filesystems had an issue.
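
The behaviour proposed above could look roughly like the following. This is only a sketch under assumptions: scrub_one and scrub_all are hypothetical names, and scrub_one is a stub that pretends the scrub of /mnt/b was cancelled, standing in for whatever the real script runs per filesystem.

```bash
#!/usr/bin/env bash
# Hypothetical stub: stands in for the real per-filesystem scrub command.
# It "fails" for /mnt/b to simulate a cancelled scrub.
scrub_one() {
    echo "scrubbing $1"
    [ "$1" != "/mnt/b" ]
}

# Scrub every mount point given as an argument, remember any failure,
# and report it only after the whole loop has finished.
scrub_all() {
    local rc=0 mnt
    for mnt in "$@"; do
        if ! scrub_one "$mnt"; then
            echo "Scrub cancelled at $mnt" >&2
            rc=1    # record the failure but keep going
        fi
    done
    return "$rc"
}

scrub_all /mnt/a /mnt/b /mnt/c || echo "overall status: $?"
```

All three mount points are visited, and the overall status is non-zero because one of them failed.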

LordMike commented Oct 4, 2024

I looked at the scrub script, and it does contain this check:

	if [ "$?" != "0" ]; then
		echo "Scrub cancelled at $MNT"
		exit 1
	fi

I do see "Scrub cancelled ..." in my journald log, so the script should have exited with status 1. But if I make my own service like this:

[Service]
Type=simple
ExecStart=bash -c 'sleep 10; exit 1'

systemd correctly reports exit code 1 when that service stops:

# systemctl status test
× test.service
     Loaded: loaded (/etc/systemd/system/test.service; static)
     Active: failed (Result: exit-code) since Fri 2024-10-04 14:26:07 CEST; 5min ago
   Duration: 10.008s
   Main PID: 1788283 (code=exited, status=1/FAILURE)
        CPU: 4ms

Oct 04 14:25:57 victoria.home systemd[1]: Started test.service.
Oct 04 14:26:07 victoria.home systemd[1]: test.service: Main process exited, code=exited, status=1/FAILURE
Oct 04 14:26:07 victoria.home systemd[1]: test.service: Failed with result 'exit-code'.

# systemctl show -p ExecMainStatus test.service
ExecMainStatus=1

The btrfs-scrub status:

# systemctl status btrfs-scrub
○ btrfs-scrub.service - Scrub btrfs filesystem, verify block checksums
     Loaded: loaded (/usr/lib/systemd/system/btrfs-scrub.service; static)
    Drop-In: /etc/systemd/system/btrfs-scrub.service.d
             └─override.conf
     Active: inactive (dead) since Fri 2024-10-04 13:52:25 CEST; 39min ago
   Duration: 2min 15.493s
TriggeredBy: ● btrfs-scrub.timer
       Docs: man:fstrim
   Main PID: 1753792 (code=exited, status=0/SUCCESS)
        CPU: 1.712s

Oct 04 13:52:24 victoria.home btrfs-scrub.sh[1753799]: Status:           aborted
Oct 04 13:52:24 victoria.home btrfs-scrub.sh[1753799]: Duration:         0:02:15
Oct 04 13:52:24 victoria.home btrfs-scrub.sh[1753799]: Total to scrub:   4.60GiB
Oct 04 13:52:24 victoria.home btrfs-scrub.sh[1753799]: Rate:             34.88MiB/s
Oct 04 13:52:24 victoria.home btrfs-scrub.sh[1753799]: Error summary:    no errors found
Oct 04 13:52:24 victoria.home btrfs-scrub.sh[1753799]: flock: getting lock took 0.000003 seconds
Oct 04 13:52:24 victoria.home btrfs-scrub.sh[1753799]: flock: executing btrfs
Oct 04 13:52:24 victoria.home btrfs-scrub.sh[1753799]: Scrub cancelled at /mnt/bcached
Oct 04 13:52:25 victoria.home systemd[1]: btrfs-scrub.service: Deactivated successfully.
Oct 04 13:52:25 victoria.home systemd[1]: btrfs-scrub.service: Consumed 1.712s CPU time, 9.5M memory peak, 1.0M memory swap peak.

Note how the main PID reports (code=exited, status=0/SUCCESS), even though the log contains "Scrub cancelled at /mnt/bcached".

LordMike commented Oct 4, 2024

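Oh! The entire script body is wrapped in a { ... } block whose output is piped to the logging command. In bash, each part of a pipeline runs in its own subshell, so the exit 1 inside the block only ends that subshell, and the script carries on to the exit 0 at the very end.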
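
The effect can be reproduced in isolation. The sketch below is not the real script: masked_exit is a hypothetical function, cat stands in for the logging command, and the function body runs in a subshell so that its final exit does not end the calling shell.

```bash
#!/usr/bin/env bash
# Hypothetical reproduction of the layout: a { ... } group piped into a
# logger, followed by an unconditional `exit 0`.
masked_exit() (
    {
        echo "Scrub cancelled at /mnt/example"
        exit 1      # ends only the subshell that runs this group
    } | cat         # `cat` stands in for the real logging command
    exit 0          # this is the status the caller actually sees
)

masked_exit
echo "status seen by caller: $?"
```

The message is printed, yet the caller sees status 0, which matches the status=0/SUCCESS that systemd reported.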

I've changed the exit 0 at the very end of the script to:

# Exit status of the { ... } block, the first command in the pipeline
EXIT_STATUS=${PIPESTATUS[0]}
exit "$EXIT_STATUS"

And this works. When I cancel my scrub now, I get an exit code 1 in systemd.
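
A self-contained sketch of the same idea, for anyone who wants to try it outside the real script. propagated_exit is a hypothetical function and cat again stands in for the logging command. PIPESTATUS is a bash array, so this needs bash rather than a plain POSIX sh.

```bash
#!/usr/bin/env bash
# Hypothetical reproduction with the fix applied.
propagated_exit() (
    {
        echo "Scrub cancelled at /mnt/example"
        exit 1
    } | cat
    # PIPESTATUS must be read immediately after the pipeline,
    # because the next command overwrites it.
    EXIT_STATUS=${PIPESTATUS[0]}
    exit "$EXIT_STATUS"
)

propagated_exit || echo "status seen by caller: $?"
```

Index 0 selects the { ... } group because it is the first command in the pipeline; the logging command's own status would be at index 1.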

LordMike added a commit to LordMike/btrfsmaintenance that referenced this issue Oct 4, 2024

LordMike commented Oct 4, 2024

I've prepared a PR that implements this for the four scripts I found that use this piping-logging method.
