Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More consistent ownership #150

Open
rossberg opened this issue Aug 6, 2020 · 2 comments
Open

More consistent ownership #150

rossberg opened this issue Aug 6, 2020 · 2 comments

Comments

@rossberg
Copy link
Member

rossberg commented Aug 6, 2020

Original comments by @alexcrichton in #145 (comment) :

I think in terms of ownership the API right now is pretty inconsistent, and I think that @sunfishcode's suggestion could greatly improve the consistency and understandability of the API at-a-glance. For example I think it would be more understandable if vectors were only used as either return values or "here's ownership of an allocated chunk of memory". Currently vectors are sometimes used as ownership containers and sometimes used as "here's a temporary view into my data". Additionally the ownership itself is pretty:

  • wasm_functype_new - takes ownership of its parameter/results lists
  • wasm_functype_{params,results} - returns a vector, but not an owned vector
  • wasm_trap_trace - returns an owned vector
  • wasm_module_{new,validate} - takes a vector, but does not take ownership
  • wasm_module_{imports,exports} - returns an owned vector
  • wasm_instance_exports - returns an owned vector
  • wasm_trap_new - vector argument, does not take ownership
  • wasm_trap_message - returns a vector
  • wasm_{import,export}type_new - takes ownership of its module/name arguments
  • wasm_importtype_module - returns a vector, does not yield ownership
  • wasm_{import,export}type_name - returns a vector, does not yield ownership

Given this state the only real way to figure out the ownership of an API seems to be to look at the documentation, from the signature alone it's pretty unclear where ownership is being transferred. For example it seems like a weird gotcha that wasm_trap_new doesn't take ownership of its string argument while wasm_importtype_new does.

I think it could be more consistent to say "using a vector means transfer of ownership". That way if you ever see a *_vec_t (or alias with wasm_name_t or wasm_message_t) in the API you know that an ownership transfer is happening. Otherwise if you see a pointer/length then the standard C idiom of "don't manage the memory you don't own" kicks into play and it's clear you don't need to deallocate it, nor is it deallocated for you if you pass it to an API.

And #145 (comment) :

For example why does wasm_module_imports return an owned vector but wasm_functype_params returns a vector owned by the original type? Additionally why does wasm_importtype_new take ownership of its string arguments while wasm_trap_new not take ownership? (there's also a question of why wasm_trap_* works with nul-terminated strings but that's orthogonal to ownership)

I agree that APIs sometimes need to return two values, such as wasm_importtype_module. I think this can be done like in wasm_instance_new, however, where one return value is "blessed" to be returned from the function and another is returned as an out-parameter. My point is that idiomatically and conventionally I think it would make more sense for usage of *_vec_t to imply "ownership is being transferred", while if ownership isn't being transferred a plain old pointer/length is used (either as two parameters or one return and one out-parameter).

An example of this is the libgit2 API where git_buf is used similarly to wasm_byte_vec_t but only when a whole vector is returned. All other APIs that take a list of values take a plain pointer/length pair. This also shows an alternative to returning lists of values as returning iterators of values. For example in Wasmtime to implement wasm_module_exports we don't natively store exports in as a list of wasm_extern_t values, so an iterator would be much more natural there anyway.

@rossberg
Copy link
Member Author

rossberg commented Aug 6, 2020

For example why does wasm_module_imports return an owned vector but wasm_functype_params returns a vector owned by the original type?

Generally, the driving factors for returning ownership were whether the accessed information will still be needed by the callee, and whether it will typically be accessed repeatedly.

For type structures, the assumption is that they all are transparent, traversable, (usually) tree-shaped type descriptions with a canonical parent-owns-children policy. So constructors all take ownership and accessor to such a structure are meant for efficient reading/traversal. Passing back ownership would require costly copying subgraphs for every read node, where a vector is just a particular kind of a node. That would imply a quadratic amount copying when traversing a deeper tree. Or a complicated internal ownership regime. (Mind you, we don't have much type structure yet, but more will come and the design should be future-proof.)

Something like module_imports OTOH conceptually creates a type structure. Depending on the implementation details of an engine, this structure may be pre-computed and cached or not. So it seemed appropriate to treat it as generative and return ownership. But I'd curious to hear from engines whether they'd be fine with having to keep this information alive with the module.

It is true that we could use iterators to access vectors, but they require additional state and are more complex and costly than just returning a pointer, when the information is stored in array form anyway. Another option would be to have indexed accessors -- that's simpler but probably still more costly.

It also seems easier (and possibly requires fewer allocations) for higher-level bindings if input/output is handled symmetrically, i.e., uses a uniform abstraction for representing vectors of things and use both ways -- they can just wrap this abstraction. But that is just a conjecture.

@MarkMcCaskey
Copy link
Contributor

Following up on this to say that we're experiencing a similar issue on two fronts:

  • for implementation, some of the ownership semantics are confusing
  • for direct users of the Wasm C API that we've talked to, they're often struggling to use the API correctly in regards to ownership

I'd like to add that the comment in wasm.h about how own applies to vec is either misleading or incorrect with the current state of the world https://github.com/WebAssembly/wasm-c-api/blob/master/include/wasm.h#L51

// - `own wasm_xxx_vec_t` owns the vector as well as its elements(!)

wasm_xxx_vec_ts are not passed by value anywhere in wasm.h as far as I can see, so it's possible that this statement is true but irrelevant, but if we assume that it's applying to own wasm_xxx_vec_t * then it's not correct: it seems that everywhere own wasm_xxx_vec_t * is used as a parameter the ownership of the vec data (ptr & len) are not transferred. It's only when own wasm_xxx_vec_t * is returned does the vec data's ownership transfer.


Speaking for myself, I see the Wasm C API as a very generic API designed to work on all different types of Wasm VMs, so the right trade off in my mind is to always prioritize universality over performance. Though usability is also an important consideration. For example, interacting with either imports or exports by name is a pretty complicated process in the current Wasm C API that requires building up your own data structures and organizing the data yourself when the same thing could be done more efficiently and directly by most hosts by providing a higher level function in the C API. I think this is a fine tradeoff to make, having a universal API could be really valuable. But it does lead me to believe that a C library built on top of the Wasm C API that provides a higher level interface would be the right choice for most users. Those higher level features could always be added into wasm.h too, as is already done with things like wasm_name_new_from_string.

On the other hand, being too slow could lead to more fragmentation. But most use cases I've seen personally are completely dominated by the Wasm compilation and execution time. As long as the critical paths (i.e. calling functions, loading cached modules) aren't too slow, I think it's actually not important if the rest of the C API is slow or inefficient. But it does matter if it's easy to implement and use correctly (and also easy for end-users to build things with).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants