Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear away psbFromBytes #291

Open
kozross opened this issue Jul 20, 2022 · 6 comments
Open

Clear away psbFromBytes #291

kozross opened this issue Jul 20, 2022 · 6 comments

Comments

@kozross
Copy link
Contributor

kozross commented Jul 20, 2022

As observed here, this function has some severe issues with referential transparency, and is thus generally not safe. While this PR deprecates both psbFromBytes (and psbZero) using a DEPRECATED pragma, as well as removing its uses internal to cardano-base itself, this function is still used internally to cardano-crypto-class:

  • psbFromByteString
  • IsString (PinnedSizedBytes n)

While these should both be modified to not use psbFromBytes, the IsString instance is rather horrifying: much like the same instance for ByteString, is hugely problematic for mostly the same reasons. I would propose removing this IsString instance wholesale, and replacing it with a quasi-quoter.

@lehins
Copy link
Collaborator

lehins commented Jul 20, 2022

IsString (PinnedSizedBytes n)

Holy 💩

I would propose removing this IsString instance wholesale, and replacing it with a quasi-quoter.

This is a very good suggestion!

@kozross
Copy link
Contributor Author

kozross commented Jul 21, 2022

I have something of a personal usability stake in this, as a task I will have soon will be adding golden tests to ensure that the SECP256k1 serializations are behaving like we expect on a fixed set of examples. Having proper literal syntax (which checks my mistakes and doesn't do things behind my back) is definitely something I would want!

Furthermore, there's also a huge problem with the IsString instance: it produces _un_pinned data. Not only is this super counter-intuitive (the type is named _Pinned_ByteString after all), it's also dangerous, as it means I can't pass such literals to the FFI and expect them to behave themselves.

@tdammers
Copy link
Contributor

tdammers commented Aug 2, 2022

There are more functions in the PinnedSizedBytes module that have referential transparency issues:

  • psbFromByteString (implemented in terms of psbFromBytes)
  • psbFromByteStringCheck (implemented separately, for some reason, but also has the potential to create surprising sharing/non-sharing behavior the same way psbFromBytes does)
  • pinnedByteArrayFromListN (though this one is not exposed, only used inside the module)

In a nutshell, if we want PinnedSizedBytes to be referentially safe in the face of mutations, then all construction and copying must happen in IO (or ST). psbCreate and friends already observe this restriction, so it's actually a bit strange that the offending ones don't.

So IMO the sane solution would be to:

  • Change the offending functions to return IO (PinnedSizedBytes n) instead of PinnedSizedBytes, and eliminate unsafe{Dupable}PerformIO from their implementations
  • Remove the IsString instance, and provide an alternative (psbFromString) implemented in IO, in terms of one of the fixed psbFrom... functions
  • Hunt down all usages of these functions and change the call sites to deal with the IO.

If you then still feel the need for a quasiquoter, it can be implemented fairly easily in terms of the new psbFromString function. It would largely be a cosmetic thing, except that you could add compile-time length checks (which isn't possible with plain string literals).

@kozross
Copy link
Contributor Author

kozross commented Aug 2, 2022

@tdammers - I mention psbFromByteString as a problem in the initial post, so 100% agreed.

psbFromByteStringCheck is actually implemented by making a copy, which should avoid the sharing behaviour. Is there something I'm missing here? The same applies to pinnedByteArrayFromListN as far as I can tell.

I'm not sure changing all of these to IO is necessarily what we want: we can still be safe as the API currently stands as long as we copy each time. psbFromByteString isn't even hard to fix in this regard, and the others, as far as I can tell, are not an issue unless I'm missing something. I'm strongly in favour of a quasiquoter, because checks on literal form (including length) are very desirable for constants in my opinion.

@lehins - thoughts?

@lehins
Copy link
Collaborator

lehins commented Aug 7, 2022

I do agree with @kozross , there are no problems with psbFromByteStringCheck.

I do want to point out that the description of this ticket is a little misleading psbFromBytes, psbFromByteString and pinnedByteArrayFromListN have no problems with referential transparency. It is only when the produced frozen buffer is mutated, that is when the referential transparency is violated, which is exactly the expected behavior of frozen arrays. The problem comes from the fact that the produced PinnedSizedBytes were used incorrectly, i.e. unsafeThaw (or in our cases simply casted to as Ptr and mutated) is what the actual cause of the problem. As @kozross well pointed out, this is the correct solution:

we can still be safe as the API currently stands as long as we copy each time.

In fact the whole PinnedSizedBytes API is unsafe because it does not make the distinction between mutable and frozen state and allows for the buffer to be treated arbitrarily and it is left up to the user to be very careful about what is safe and unsafe. The only defense for such API I have is that all operations have to be done on Ptrs due to FFI, while preserving purity and referential transparency.

So, @tdammers switching all to IO will not help, in fact it will make things a lot worse, because all of the functions in the cardano-crypto-class library are used from pure non-IO code, which would lead for those unsafePerformIOs to travel up into use sites, thus making the codebase a lot more unstable. As far as I can tell current uses of unsafe{Dupable}PerformIO are safe, so we should leave them alone, but we could do a bit of work on making the API safer and I do have some ideas on that front.

@tdammers
Copy link
Contributor

Right, yes. I think what it boils down to is that we need to decide whether we want to present PinnedSizedBytes with value semantics (immutable) or with mutable-reference semantics (immutable reference to mutable value).

ByteArray already provides the reference semantics, so the right thing to do here is to maintain the pure API, but make it so that in-place mutations operate on copies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants