Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hole punching in tests #125

Open
bigerl opened this issue Nov 15, 2023 · 5 comments
Open

Hole punching in tests #125

bigerl opened this issue Nov 15, 2023 · 5 comments

Comments

@bigerl
Copy link

bigerl commented Nov 15, 2023

Going through the tests I didn't find any tests for copying files with holes.
The only thing I found for for file copying was this:

static inline void TestCloneOrCopyFileWhole()

Is the code for copying with holes being tested elsewhere? Or is there a reason why it doesn't need to be tested?

@ned14
Copy link
Owner

ned14 commented Nov 15, 2023

The test you identified is for testing that function.

Contribution of better tests would be welcome.

@bigerl
Copy link
Author

bigerl commented Nov 15, 2023

Many thanks for the quick reply. In our team we do not have capacity to work on that ATM. We still need to evaluate if using llfio makes sense for our use case. If so, we would probably make a PR with extended testing of file copying.

@ned14
Copy link
Owner

ned14 commented Nov 15, 2023

I can tell you that function copies Terabytes of data every day in 99.999% uptime cloud systems.

It doesn't mean you wouldn't find a bug when you use it, but for how that system uses that function, it has been extremely reliable.

@bigerl
Copy link
Author

bigerl commented Nov 17, 2023

That is good to know and I read about it in the readme. I really like the library and that it safes you from a lot of error prone fiddling with OS dependent sys calls for file handling.
It's just because we make heavy use of hole punching so we want to make sure that the implementation we use is correct with regards to copying files with holes and also covers all corner cases that could happen.

@ned14
Copy link
Owner

ned14 commented Nov 17, 2023

To my best knowledge the hole punching extents cloning implementation is correct. Implementation of something correct in all corner cases whilst retaining performance and is semantically equivalent across all platforms is surprisingly hard. In fact, in all my career, I'd rank it in the top five hardest things I've ever implemented. It really is quite challenging, and it took me some months of outside-of-work time to get it debugged.

You should be aware that holes don't clone exactly, and there is absolutely nothing anybody can do about it. The code will enumerate the holes in the source, and exactly replicate that in the destination. However afterwards the holes won't match!

I am absolutely certain it isn't my code, so therefore it must be some internal property of the filing system. For large holes, a few Kb one side or another makes no difference, however I had a test file which consisted of a 4Kb allocated region with about 100 Kb of randomly varying hole between them. Most of the holes copied over exactly, some however became bigger, others become smaller, and in a few cases they disappeared.

The test program checked that the contents of the files were identical, and they were. I had the code write out exactly which extents were copied to a log. I examined and compared, and what my code tells the kernel is simply not always honoured.

What my code doesn't do but other hole copying file copying tools do is scan either side of an allocated extent for non zero bytes to "trim" the allocated region down to minimum. My code doesn't do this to be fast, it adds up to a good chunk of performance on fast SSDs.

So consider this a "reasonable best effort" hole cloning implementation. You can if you want always poke in zero byte region trimming yourself if you needed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants