mergerfs-cache-mover

Python script for moving files on a cached disk to a backing mergerFS disk pool.

More information is available in this blog post: https://blog.muffn.io/posts/part-4-100tb-mini-nas/ (if that link doesn't work, it's not released yet).

How It Works

The script operates by checking the disk usage of the cache directory. If the usage is above the threshold percentage defined in the configuration file (config.yml), it will move the oldest files out to the backing storage location until the usage is below a defined target percentage. Empty directories are also cleaned up after files are moved.
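
A minimal sketch of that loop, with hypothetical helper names (gather_files_by_age, move_file, remove_empty_dirs) standing in for the script's actual internals:

import shutil

def usage_percent(path):
    # Percentage of the filesystem backing `path` that is currently used.
    total, used, _free = shutil.disk_usage(path)
    return used / total * 100

def run_mover(cache_path, backing_path, threshold, target):
    # Do nothing until the cache crosses the threshold...
    if usage_percent(cache_path) < threshold:
        return
    # ...then move the oldest files first until usage drops to the target.
    for path in gather_files_by_age(cache_path):   # hypothetical: oldest first
        if usage_percent(cache_path) <= target:
            break
        move_file(path, cache_path, backing_path)  # hypothetical helper
    remove_empty_dirs(cache_path)                  # hypothetical helper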

The script uses a configuration file to manage settings such as paths, thresholds, and system parameters. It also checks for other running instances of itself to prevent concurrent operations, in case a move from a previous run is still in progress, whether because you are using slow storage, running the script too frequently, or both.
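
The script itself looks for other running copies of the process; one generic, standard-library way to get the same single-instance guarantee is an exclusive lock file (a sketch with an assumed lock path, not the script's actual mechanism):

import fcntl
import sys

def acquire_single_instance_lock(lock_path="/tmp/cache-mover.lock"):
    # Hold an exclusive, non-blocking lock for the lifetime of the process.
    handle = open(lock_path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        print("Another instance is already running, exiting.")
        sys.exit(0)
    return handle  # keep the handle referenced so the lock stays held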

Logging

The script logs its operations, including information on moved files, errors, and other warnings. The logs are rotated based on the file size and backup count defined in config.yml.
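
Size-based rotation like this is what logging.handlers.RotatingFileHandler provides; a sketch wired to the config values described below (not necessarily the script's exact logging setup):

import logging
from logging.handlers import RotatingFileHandler

def setup_logging(log_path, max_log_size_mb, backup_count):
    # Rotate the log once it exceeds MAX_LOG_SIZE_MB, keeping BACKUP_COUNT old files.
    handler = RotatingFileHandler(
        log_path,
        maxBytes=max_log_size_mb * 1024 * 1024,
        backupCount=backup_count,
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger("cache-mover")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger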

Requirements

  • Python 3.6 or higher
  • PyYAML (to be installed from requirements.txt)

Setup

  1. To get started, clone the repository to your local machine using the following command:
git clone https://github.com/MonsterMuffin/mergerfs-cache-mover.git
  2. Install the required Python packages using pip:
pip install -r requirements.txt

Configuration Setup

Copy config.example.yml to config.yml and set up your config.yml with the appropriate values (a short loading sketch follows the list):

  • CACHE_PATH: The path to your cache directory. !!THIS IS YOUR CACHE DISK ROOT, NOT THE MERGERFS CACHE MOUNT!!
  • BACKING_PATH: The path to the backing storage where files will be moved.
  • LOG_PATH: The path for the log file generated by the script.
  • AUTO_UPDATE: Allows the script to update itself from GitHub on every run.
  • THRESHOLD_PERCENTAGE: The usage percentage of the cache directory that triggers the file-moving process.
  • TARGET_PERCENTAGE: The target usage percentage to achieve after moving files.
  • MAX_WORKERS: The maximum number of parallel file-moving operations.
  • MAX_LOG_SIZE_MB: The maximum size (in MB) for the log file before it's rotated.
  • BACKUP_COUNT: The number of backup log files to maintain.
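
For reference, this is roughly how those values are read with PyYAML; the key names are the ones listed above, though the script's own loading code may differ:

import yaml  # PyYAML, from requirements.txt

with open("config.yml") as f:
    config = yaml.safe_load(f)

cache_path = config["CACHE_PATH"]           # cache disk root, not the mergerfs mount
backing_path = config["BACKING_PATH"]
threshold = config["THRESHOLD_PERCENTAGE"]  # start moving above this usage
target = config["TARGET_PERCENTAGE"]        # stop once usage drops to this
max_workers = config["MAX_WORKERS"]         # parallel move operations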

File Moving Process

This script now uses Python's built-in file operations instead of rsync (a simplified sketch of the full sequence follows the list):

  • shutil.copy2(): Copies files while preserving metadata.
  • os.chmod(): Explicitly sets file permissions to match the source.
  • os.chown(): Attempts to set file ownership to match the source (may require root privileges).
  • os.remove(): Removes the source file after successful copy.
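
Put together, a single move looks roughly like this (a simplified sketch of the sequence above, not the script's exact move_file()):

import os
import shutil

def move_one(src, dst):
    # Copy contents plus metadata (timestamps, mode) to the destination.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)
    # Explicitly mirror the source's permissions and ownership.
    st = os.stat(src)
    os.chmod(dst, st.st_mode)
    os.chown(dst, st.st_uid, st.st_gid)  # usually requires root
    # Only remove the source once the copy has fully succeeded.
    os.remove(src)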

This script must be run as root (using sudo) for the following reasons:

  • File Permissions: Running as root ensures the script can read from and write to all directories, preserving original file permissions and ownership.
  • Directory Creation: Root access is required to create directories with the correct permissions in the destination path.

Usage

To run the script manually, use the following command from your terminal:

sudo python3 cache-mover.py --console-log

You can also specify --dry-run to preview what would be moved without actually moving anything:

sudo python3 cache-mover.py --dry-run --console-log

Of course, this is meant to be run automatically....

Automated Execution

Use either a systemd timer or a crontab entry. I have been moving from crontab to systemd timers myself, but you live your life how you see fit.

Option 1: Systemd Timer

  1. Create a systemd service file /etc/systemd/system/cache_mover.service. Change /path/to/cache-mover.py to where you downloaded the script, obviously.
[Unit]
Description="Cache Mover Script."

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /path/to/cache-mover.py
  2. Create a systemd timer file /etc/systemd/system/cache_mover.timer. The timer format is not the usual crontab format; see systemd.time(7) if you need help.
[Unit]
Description="Runs Cache Mover Script Daily at 3AM."

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
  3. Enable and start the timer:
systemctl enable cache_mover.timer
systemctl start cache_mover.timer
  4. Check timer status:
systemctl list-timers

Option 2: Crontab

  1. Open the root crontab for editing:
sudo crontab -e
  2. Add a line to run the script. The following example will run the script daily at 3AM. You can adjust this by using a site such as crontab.guru. Change /path/to/cache-mover.py to where you downloaded the script, obviously.
0 3 * * * /usr/bin/python3 /path/to/cache-mover.py

Auto-Update Feature

I have now included an auto-update feature. At runtime, the script checks for updates from the GitHub repository and automatically updates itself if a new version is available. This behaviour is controlled by the AUTO_UPDATE setting in config.yml.

Note: The auto-update feature is only available in versions after commit b140b0c. Any version before this commit will not have this feature.
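
Conceptually, the check boils down to comparing the local commit against the upstream branch with git. A sketch of that idea, reusing the get_script_dir()/auto_update() names from the changelog but not their verbatim implementation:

import os
import subprocess

def get_script_dir():
    # Resolve the script's own directory so git commands work from cron/systemd too.
    return os.path.dirname(os.path.abspath(__file__))

def auto_update():
    repo = get_script_dir()
    subprocess.check_call(["git", "-C", repo, "fetch"])
    local = subprocess.check_output(["git", "-C", repo, "rev-parse", "HEAD"]).decode().strip()
    remote = subprocess.check_output(["git", "-C", repo, "rev-parse", "@{u}"]).decode().strip()
    if local != remote:
        # A newer commit exists upstream: pull it so the next run uses the new code.
        subprocess.check_call(["git", "-C", repo, "pull"])
        return True
    return False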

Changelog

v0.96.5

  • Fixed child process detection

v0.96

  • Fixed existing process detection

v0.95

  • Fixed accidental directory collapse in backend pool upon directory manipulation
  • Replaced rsync with Python's built-in file operations for better control and compatibility
  • Added explicit permission and ownership preservation
  • Added --dry-run option for testing without file movement
  • "Improved" empty directory removal process
  • Enhanced logging

v0.92

  • Enhanced rsync command in move_file() function:
    • Added --preallocate option to improve performance and reduce fragmentation
    • Added --hard-links option to preserve hard links during file transfers
  • Updated README to reflect new rsync options

v0.91

  • Simplified permission handling in the move_file() function
  • Updated rsync command to use --perms option for explicit permission preservation
    • Now using --mkpath to resolve issues with base path not existing on destination
  • Deprecated USER, GROUP, FILE_CHMOD, and DIR_CHMOD settings from config.yml
  • Updated README

v0.88

  • Fixed auto-update functionality
    • Resolved issues when run via cron/systemd or outside script directory
    • Added AUTO_UPDATE configuration option to enable/disable auto-updates
  • Improved script reliability
    • Added get_script_dir() function for consistent script directory detection
    • Modified get_current_commit_hash() to use the script's directory
    • Updated auto_update() function to use the script's directory for Git operations

v0.83

  • Added auto-update feature
    • The script now checks for updates from the GitHub repository
    • Automatically updates itself if a new version is available
  • Improved logging
    • Added more detailed logging for the update process
  • Code refactoring and optimization
    • Much tidier now my Python is slightly less shit, hopefully didn't break anything

v0.7

  • Initial release of the mergerfs-cache-mover script
  • Basic functionality for moving files from cache to backing storage
  • Configurable settings via config.yml
  • Logging with rotation
  • Support for both Systemd timer and Crontab scheduling

Fin.

This has been working well for me, but always take care.
