A Python script for moving files from a cache disk to a backing mergerFS disk pool.
More information in this blog post: https://blog.muffn.io/posts/part-4-100tb-mini-nas/ (if that link doesn't work, the post isn't released yet).
The script operates by checking the disk usage of the cache directory. If usage is above the threshold percentage defined in the configuration file (config.yml), it moves the oldest files out to the backing storage location until usage drops below a defined target percentage. Empty directories are also cleaned up after files are moved.
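To make the selection step concrete, here is a minimal Python sketch. It assumes "oldest" means oldest modification time (st_mtime) and uses st_size as an estimate of the space each move frees; all function names are hypothetical and not taken from cache-mover.py.

```python
import os
import shutil


def scan_cache(cache_path):
    """Return (path, size_in_bytes, mtime) for every file under cache_path."""
    files = []
    for dirpath, _dirnames, filenames in os.walk(cache_path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full, follow_symlinks=False)
            except FileNotFoundError:
                continue  # file disappeared between listing and stat
            files.append((full, st.st_size, st.st_mtime))
    return files


def select_files_to_move(files, total, used, threshold_pct, target_pct):
    """Pick the oldest files whose removal drops usage below target_pct.

    Nothing is selected unless current usage is above threshold_pct.
    """
    if used / total * 100 <= threshold_pct:
        return []
    target_used = total * target_pct / 100
    selected = []
    for path, size, _mtime in sorted(files, key=lambda f: f[2]):  # oldest first
        if used < target_used:
            break
        selected.append(path)
        used -= size
    return selected


def remove_empty_dirs(root):
    """Delete empty directories below root, deepest first; root itself is kept."""
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)


def plan_moves(cache_path, threshold_pct, target_pct):
    """Combine a real disk-usage reading with the selection step above."""
    usage = shutil.disk_usage(cache_path)
    return select_files_to_move(
        scan_cache(cache_path), usage.total, usage.used, threshold_pct, target_pct
    )
```

In a real run you would pass THRESHOLD_PERCENTAGE and TARGET_PERCENTAGE from config.yml to plan_moves(), move each returned path, then call remove_empty_dirs() on the cache path.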
The script uses a configuration file to manage settings such as paths, thresholds, and system parameters. It also checks for other running instances of itself to prevent concurrent operations, in case a move from a previous run is still in progress because you are using slow storage, running the script too frequently, or both.
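One common way to stop overlapping runs is an exclusive lock file. Note that this swaps in a different technique: per the changelog, the script itself detects other running instances by inspecting processes, whereas this sketch uses fcntl.flock (Linux/Unix only). The function name and lock path are hypothetical.

```python
import fcntl
import os


def acquire_single_instance_lock(lock_path):
    """Return an open, locked file if no other instance holds the lock, else None.

    flock() locks are released by the kernel when the holder exits or crashes,
    so a leftover lock file never blocks later runs.
    """
    handle = open(lock_path, "a+")  # "a+" so a running instance's PID isn't wiped
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None  # another run is still moving files
    handle.seek(0)
    handle.truncate()  # we own the lock now, so replace any stale PID
    handle.write(str(os.getpid()))
    handle.flush()
    return handle  # keep a reference for the whole run to hold the lock
```

A run would call acquire_single_instance_lock() with something like /run/cache-mover.lock (a hypothetical path) and exit early if it returns None.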
The script logs its operations, including moved files, errors, and warnings. Logs are rotated based on the file size and backup count defined in config.yml.
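Size-based rotation maps directly onto the standard library's logging.handlers.RotatingFileHandler. A sketch, assuming MAX_LOG_SIZE_MB and BACKUP_COUNT are passed straight through from config.yml; the function and logger names are hypothetical, and the script's actual log format may differ.

```python
import logging
from logging.handlers import RotatingFileHandler


def setup_logging(log_path, max_log_size_mb, backup_count, console=False):
    """Return a logger that rotates log_path once it reaches max_log_size_mb."""
    logger = logging.getLogger("cache_mover")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    for old in list(logger.handlers):  # avoid duplicate handlers on repeat calls
        logger.removeHandler(old)
        old.close()
    formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")

    file_handler = RotatingFileHandler(
        log_path,
        maxBytes=max_log_size_mb * 1024 * 1024,  # MAX_LOG_SIZE_MB converted to bytes
        backupCount=backup_count,  # BACKUP_COUNT: keeps log_path.1 ... log_path.N
    )
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    if console:  # e.g. when --console-log is passed
        console_handler = logging.StreamHandler()
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)
    return logger
```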
- Python 3.6 or higher
- PyYAML (installed from requirements.txt)
- To get started, clone the repository to your local machine using the following command:
git clone https://github.com/MonsterMuffin/mergerfs-cache-mover.git
- Install the required Python packages using pip:
pip install -r requirements.txt
Copy config.example.yml to config.yml and set up your config.yml with the appropriate values:
- CACHE_PATH: The path to your cache directory. !!THIS IS YOUR CACHE DISK ROOT, NOT THE MERGERFS CACHE MOUNT!!
- BACKING_PATH: The path to the backing storage where files will be moved.
- LOG_PATH: The path for the log file generated by the script.
- AUTO_UPDATE: Allows the script to update itself from GitHub on every run.
- THRESHOLD_PERCENTAGE: The usage percentage of the cache directory that triggers the file-moving process.
- TARGET_PERCENTAGE: The target usage percentage to reach after moving files.
- MAX_WORKERS: The maximum number of parallel file-moving operations.
- MAX_LOG_SIZE_MB: The maximum size of the log file, in MB, before it is rotated.
- BACKUP_COUNT: The number of rotated backup log files to keep.
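Since a typo in these values can make the mover do nothing (or move far too much), a quick sanity check can help. This sketch assumes the settings have already been loaded into a single flat dict (for example via PyYAML's yaml.safe_load, merging sections together if config.example.yml groups its keys); validate_settings() is a hypothetical helper, not part of the script.

```python
REQUIRED_KEYS = (
    "CACHE_PATH", "BACKING_PATH", "LOG_PATH", "AUTO_UPDATE",
    "THRESHOLD_PERCENTAGE", "TARGET_PERCENTAGE", "MAX_WORKERS",
    "MAX_LOG_SIZE_MB", "BACKUP_COUNT",
)


def validate_settings(settings):
    """Return a list of problems found in a flat dict of config.yml settings."""
    problems = ["missing " + key for key in REQUIRED_KEYS if key not in settings]
    if problems:
        return problems  # later checks need every key present
    threshold = settings["THRESHOLD_PERCENTAGE"]
    target = settings["TARGET_PERCENTAGE"]
    # The target must sit below the trigger point, or every run would move nothing.
    if not 0 <= target < threshold <= 100:
        problems.append("expected 0 <= TARGET_PERCENTAGE < THRESHOLD_PERCENTAGE <= 100")
    if settings["MAX_WORKERS"] < 1:
        problems.append("MAX_WORKERS must be at least 1")
    if settings["CACHE_PATH"] == settings["BACKING_PATH"]:
        problems.append("CACHE_PATH and BACKING_PATH must differ")
    return problems
```

The values in any example dict you test this with are your own; the sketch does not assume particular defaults.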
This script now uses Python's built-in file operations instead of rsync:
- shutil.copy2(): Copies files while preserving metadata.
- os.chmod(): Explicitly sets file permissions to match the source.
- os.chown(): Attempts to set file ownership to match the source (may require root privileges).
- os.remove(): Removes the source file after a successful copy.
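The four steps above can be sketched as a single function. This is an illustrative version, not the script's actual move_file(); the dry_run parameter mirrors the --dry-run flag, and directories created by os.makedirs() here get default permissions rather than copies of the source directories' permissions.

```python
import os
import shutil
import stat


def move_file(src, cache_path, backing_path, dry_run=False):
    """Move src to the same relative path under backing_path; return the destination."""
    dest = os.path.join(backing_path, os.path.relpath(src, cache_path))
    if dry_run:
        return dest  # report the planned destination without touching anything

    os.makedirs(os.path.dirname(dest), exist_ok=True)  # mirror the directory tree
    shutil.copy2(src, dest)  # copy data plus timestamps and permission bits
    st = os.stat(src)
    os.chmod(dest, stat.S_IMODE(st.st_mode))  # explicitly re-apply permissions
    try:
        os.chown(dest, st.st_uid, st.st_gid)  # ownership: normally needs root
    except PermissionError:
        pass  # not root: the copy is kept, but its owner may differ
    os.remove(src)  # reached only if every step above succeeded
    return dest
```

Because os.remove() runs last, any exception during the copy leaves the original file on the cache disk.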
This script must be run as root (using sudo) for the following reasons:
- File Permissions: Running as root ensures the script can read from and write to all directories, preserving original file permissions and ownership.
- Directory Creation: Root access is required to create directories with the correct permissions in the destination path.
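A script that needs root can fail fast with a clear message rather than partway through a move. A minimal sketch using os.geteuid() (POSIX only); both helper names are hypothetical.

```python
import os
import sys


def is_root(euid=None):
    """True if the given effective user ID, or the current one, is root (0)."""
    if euid is None:
        euid = os.geteuid()  # POSIX only
    return euid == 0


def require_root():
    """Exit with status 1 and a message on stderr unless running as root."""
    if not is_root():
        sys.exit("cache-mover must be run as root, e.g. with sudo.")
```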
To run the script manually, use the following command from your terminal:
sudo python3 cache-mover.py --console-log
You can also specify --dry-run to test the script without moving any files:
sudo python3 cache-mover.py --dry-run --console-log
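For reference, the two documented flags could be parsed with argparse as below. This is a sketch; the real script's argument handling, help text and any other options may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse --console-log and --dry-run; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Move files from a cache disk to a mergerFS backing pool."
    )
    parser.add_argument(
        "--console-log",
        action="store_true",
        help="also print log messages to the terminal",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="log what would be moved without moving anything",
    )
    return parser.parse_args(argv)
```

argparse turns the dashes into underscores, so the flags are read back as args.console_log and args.dry_run.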
Of course, this is meant to be run automatically.
Use either a systemd timer or a crontab entry. I have been moving from crontab to systemd timers myself, but you live your life how you see fit.
- Create a systemd service file at /etc/systemd/system/cache_mover.service. Change /path/to/cache-mover.py to wherever you downloaded the script, obviously.
[Unit]
Description="Cache Mover Script."
[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /path/to/cache-mover.py
- Create a systemd timer file at /etc/systemd/system/cache_mover.timer. The OnCalendar format is not the usual crontab format; see the systemd.time(7) man page if you need help.
[Unit]
Description="Runs Cache Mover Script Daily at 3AM."
[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true
[Install]
WantedBy=timers.target
- Enable and start the timer:
systemctl enable cache_mover.timer
systemctl start cache_mover.timer
- Check timer status:
systemctl list-timers
- Open the root crontab for editing:
sudo crontab -e
- Add a line to run the script. The following example runs the script daily at 3AM; you can adjust the schedule using a site such as crontab.guru. Change /path/to/cache-mover.py to wherever you downloaded the script, obviously.
0 3 * * * /usr/bin/python3 /path/to/cache-mover.py
I have now included an auto-update feature. At runtime, the script checks for updates from the GitHub repository and automatically updates itself if a new version is available.
Note: The auto-update feature is only available in versions after commit b140b0c. Any version before this commit will not have this feature.
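The changelog mentions get_script_dir(), get_current_commit_hash() and auto_update(). The bodies below are a hypothetical sketch of how such a Git-based update check could work (the git() helper is also hypothetical), not the script's actual code. It assumes the script lives in a git clone with an upstream branch configured.

```python
import os
import subprocess


def get_script_dir():
    """Directory containing this script, regardless of the caller's cwd."""
    return os.path.dirname(os.path.abspath(__file__))


def git(repo_dir, *args):
    """Run a git command in repo_dir; return its stdout, or None on failure."""
    try:
        result = subprocess.run(
            ["git", *args],
            cwd=repo_dir,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # text output; works on Python 3.6
            check=True,
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None  # not a repo, git missing, or the command failed
    return result.stdout.strip()


def get_current_commit_hash(repo_dir):
    return git(repo_dir, "rev-parse", "HEAD")


def auto_update(repo_dir):
    """Fast-forward repo_dir to its upstream branch; return True if HEAD changed."""
    if git(repo_dir, "fetch", "--quiet") is None:
        return False
    behind = git(repo_dir, "rev-list", "--count", "HEAD..@{u}")
    if behind is None or behind == "0":
        return False  # no upstream configured, or already up to date
    before = get_current_commit_hash(repo_dir)
    if git(repo_dir, "pull", "--ff-only", "--quiet") is None:
        return False  # local changes or diverged history: leave it alone
    return get_current_commit_hash(repo_dir) != before
```

Resolving the repository from get_script_dir() rather than the working directory matters under cron or systemd, where the working directory is usually not the script's folder. After a successful update the running process is still executing the old code, so a real updater would also need to restart itself (for example with os.execv).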
- Fixed child process detection
- Fixed existing process detection
- Fixed accidental directory collapse in backend pool upon directory manipulation
- Replaced rsync with Python's built-in file operations for better control and compatibility
- Added explicit permission and ownership preservation
- Added --dry-run option for testing without file movement
- "Improved" empty directory removal process
- Enhanced logging
- Enhanced rsync command in move_file() function:
  - Added --preallocate option to improve performance and reduce fragmentation
  - Added --hard-links option to preserve hard links during file transfers
- Updated README to reflect new rsync options
- Simplified permission handling in the move_file() function
- Updated rsync command to use --perms option for explicit permission preservation
- Now using --mkpath to resolve issues with base path not existing on destination
- Deprecated USER, GROUP, FILE_CHMOD, and DIR_CHMOD settings from config.yml
- Updated README
- Fixed auto-update functionality
- Resolved issues when run via cron/systemd or outside script directory
- Added AUTO_UPDATE configuration option to enable/disable auto-updates
- Improved script reliability
- Added get_script_dir() function for consistent script directory detection
- Modified get_current_commit_hash() to use the script's directory
- Updated auto_update() function to use the script's directory for Git operations
- Added auto-update feature
- The script now checks for updates from the GitHub repository
- Automatically updates itself if a new version is available
- Improved logging
- Added more detailed logging for the update process
- Code refactoring and optimization
- Much tidier now that my Python is slightly less shit; hopefully I didn't break anything
- Initial release of the mergerfs-cache-mover script
- Basic functionality for moving files from cache to backing storage
- Configurable settings via config.yml
- Logging with rotation
- Support for both Systemd timer and Crontab scheduling
This has been working well for me, but always take care.