Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setting parameters on big endian systems #12

Open
oconnor663 opened this issue Nov 1, 2017 · 13 comments
Open

setting parameters on big endian systems #12

oconnor663 opened this issue Nov 1, 2017 · 13 comments

Comments

@oconnor663
Copy link

It looks like functions like blake2b_param_set_node_offset convert their argument to little-endian on the way in. But those functions aren't exposed in the blake2.h API. Is the expectation that callers (who plan on supporting big endian platforms) will convert their parameters to little endian on their own? Or maybe the plan was to expose the setter functions in blake2.h eventually?

@sneves
Copy link
Member

sneves commented Nov 1, 2017

I don't quite recall, but I do believe the plan was at some point to expose those functions to the outside, for users that would want to make their own portable parameter blocks.

But since no such users ever appeared, those functions ended up never getting exposed. Even internally, for blake2xp, we end up making the parameter block manually anyway.

@oconnor663
Copy link
Author

I wonder if that means CPython's blake2 implementation, for example, has a bug on big endian platforms? They seem to be setting the leaf_length parameter from a native int. I was about to do the same thing myself, as part of working on a Rust wrapper.

@sneves
Copy link
Member

sneves commented Nov 1, 2017

That's an excellent question. I think there's no bug, because the reference implementation explicitly loads the parameter block in little endian order. The optimized x86 implementations do not do this, but everything's assumed to be little-endian in those.

But in all honesty I did not check if there is a bug or not; I don't have big-endian hardware to verify this right now.

@oconnor663
Copy link
Author

oconnor663 commented Nov 1, 2017

I was staring at that earlier, and here was how I read it: The load64 function takes a pointer to little-endian bytes, and loads a native integer from them. However, if you're on PowerPC or whatever, and you execute the line self->param.leaf_length = (unsigned int)leaf_size as Python does, you've written big-endian bytes to the parameter block. The load64 call doesn't expect this, and the result is a flipped value for S->h[i].

I haven't owned PowerPC hardware in...some time...but I might try to fire up QEMU tonight and see if I can test this.

Edit: These look useful for testing
https://stackoverflow.com/questions/3337896/imitate-emulate-a-big-endian-behavior-in-c
https://people.debian.org/~aurel32/qemu/

@sneves
Copy link
Member

sneves commented Nov 1, 2017

Yeah, I think you're right.

In hindsight, the parameter block ought to be an opaque structure, with the functions used to set the appropriate bytes of it.

@oconnor663
Copy link
Author

Confirmed!

image

I don't know of any official test vectors that set leaf_length or node_offset, but when I run the same two hashes on my regular Intel machine, the results are swapped:

Python 3.6.3 (default, Oct 24 2017, 14:48:20) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, hashlib
>>> sys.byteorder
'little'
>>> hashlib.blake2b(b"foo", leaf_size=1).hexdigest()
'8878fcd2202a81a2a26adeb5cc6e240528c80e2b4ec80b940d76686febe91b368b4a98ebb39b21befcc02ffaf7516e27d2894fc1c096b57c0e3dde69e01c5889'
>>> hashlib.blake2b(b"foo", leaf_size=(1<<24)).hexdigest()
'bfd0c39502d381eb4e45189f051fd67fd69fd972ad471550a879347d846c1700927d19065a08bf3acdd8fc209cbd21a9bb8619c56fc79207a1ece990b21ddf2b'

It should be simple to patch CPython, and I'll see if I can add a test for this. On the libb2 side of things, probably we should at least stick some clarifying comments inline in blake2.h? I worry that changing blake2*_init_param to assume native endianness in struct fields would break existing callers who've correctly worked around this. (If there are any?)

Can you think of any other languages or libraries that expose these fields, that we'd want to eyeball for the same bug?

@sneves
Copy link
Member

sneves commented Nov 2, 2017

I've pushed a candidate solution to a separate branch, under which existing correct code should be able to continue functioning properly, but a new API exists to explicit build correct parameter blocks.

To be able to do this, I'm using unnamed structs in unions, which appears to be something only standardized in C11, but recent GCC, Clang, and MSVC all appear to support this.

I don't know the extent of the damage. To my knowledge, most bindings/implementations are limited to the variable length and key parameters, for which endianness does not matter.

@oconnor663
Copy link
Author

Those changes look good to me, though I don't have much experience with this code. Maybe we could stick a comment on the anonymous struct like:

// These struct fields are deprecated, and kept around only for backwards
// compatibility. All new code should use the param_set helper functions
// instead. (In particular, setting the leaf_length or node_offset fields here
// gives incorrect results on big endian platforms, if the caller doesn't
// explicitly swap endianness first. See https://github.com/BLAKE2/libb2/issues/12.)

@oconnor663
Copy link
Author

Also will https://github.com/BLAKE2/BLAKE2 want to make similar changes?

@sneves
Copy link
Member

sneves commented Nov 3, 2017

Yes, the plan is to merge the two repositories at some point. Hopefully soon.

@oconnor663
Copy link
Author

oconnor663 commented Nov 3, 2017

Oh, I think on lines 178-179 of blake2.h you're using BLAKE2S_SALTBYTES and BLAKE2S_PERSONALBYTES when you mean BLAKE2B_.

oconnor663 added a commit to oconnor663/cpython that referenced this issue Nov 3, 2017
All Blake2 params have to be encoded in little-endian byte order. For
the two multi-byte integer params, leaf_length and node_offset, that
means that assigning a native-endian integer to them appears to work on
little-endian platforms, but gives the wrong result on big-endian. The
current libb2 API doesn't make that very clear, and @sneves is working
on new API functions in the GH issue above. In the meantime, we can work
around the problem by explicitly assigning little-endian values to the
parameter block.

See BLAKE2/libb2#12.
tiran pushed a commit to python/cpython that referenced this issue Nov 3, 2017
All Blake2 params have to be encoded in little-endian byte order. For
the two multi-byte integer params, leaf_length and node_offset, that
means that assigning a native-endian integer to them appears to work on
little-endian platforms, but gives the wrong result on big-endian. The
current libb2 API doesn't make that very clear, and @sneves is working
on new API functions in the GH issue above. In the meantime, we can work
around the problem by explicitly assigning little-endian values to the
parameter block.

See BLAKE2/libb2#12.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 3, 2017
…onGH-4250)

All Blake2 params have to be encoded in little-endian byte order. For
the two multi-byte integer params, leaf_length and node_offset, that
means that assigning a native-endian integer to them appears to work on
little-endian platforms, but gives the wrong result on big-endian. The
current libb2 API doesn't make that very clear, and @sneves is working
on new API functions in the GH issue above. In the meantime, we can work
around the problem by explicitly assigning little-endian values to the
parameter block.

See BLAKE2/libb2#12.
(cherry picked from commit dcfb0e3)
tiran pushed a commit to python/cpython that referenced this issue Nov 3, 2017
) (#4262)

All Blake2 params have to be encoded in little-endian byte order. For
the two multi-byte integer params, leaf_length and node_offset, that
means that assigning a native-endian integer to them appears to work on
little-endian platforms, but gives the wrong result on big-endian. The
current libb2 API doesn't make that very clear, and @sneves is working
on new API functions in the GH issue above. In the meantime, we can work
around the problem by explicitly assigning little-endian values to the
parameter block.

See BLAKE2/libb2#12.
(cherry picked from commit dcfb0e3)
oconnor663 added a commit to oconnor663/blake2_c.rs that referenced this issue Nov 4, 2017
oconnor663 added a commit to oconnor663/bao that referenced this issue Nov 9, 2017
It almost certainly doesn't really matter which one we pick. I
originally went with big endian because that's the vaguely standardized
"network byte order". I'm switching to little endian because the BLAKE2
standard made the same switch relative to BLAKE
(https://blake2.net/blake2.pdf), and that seems like a very relevant
precedent to follow. Little endian is also what DJB chose for his NaCl
library.

The main downside of little endian in my mind, is that because it's also
the most common native endianness in practice, implementations are more
likely to write code that "happens to work" on their machine and fails
on big endian architectures. In fact this totally happened in the
official implementation of BLAKE2 (BLAKE2/libb2#12).
However, a contributing factor in that case was the bug only affected a
couple of obscure options, which might actually never have been
exercised on any big endian machines, ever. This isn't the case for bao;
if you flip the endianness of your header length, then almost all of
your hashes will be wrong.
embray pushed a commit to embray/cpython that referenced this issue Nov 9, 2017
…on#4250)

All Blake2 params have to be encoded in little-endian byte order. For
the two multi-byte integer params, leaf_length and node_offset, that
means that assigning a native-endian integer to them appears to work on
little-endian platforms, but gives the wrong result on big-endian. The
current libb2 API doesn't make that very clear, and @sneves is working
on new API functions in the GH issue above. In the meantime, we can work
around the problem by explicitly assigning little-endian values to the
parameter block.

See BLAKE2/libb2#12.
@oconnor663
Copy link
Author

oconnor663 commented Nov 10, 2017

Another design comment about https://github.com/BLAKE2/libb2/tree/fix-bigendian: I think blake2*_param_init should set the official defaults, rather than just zeroing the whole parameter block. That is, I think it should set fanout and depth to 1, and digest_length to OUTBYTES. Almost all callers are going to want to do that, but it's also something they're likely to forget, at least until they debug why their hashes aren't coming out right. That would also remove some duplicated code from blake2*_init and blake2*_init_key.

@sneves
Copy link
Member

sneves commented Nov 12, 2017

Excellent idea, I'll incorporate this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants