Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to collect into savvy's types #125

Closed
daniellga opened this issue Mar 27, 2024 · 17 comments
Closed

Allow to collect into savvy's types #125

daniellga opened this issue Mar 27, 2024 · 17 comments

Comments

@daniellga
Copy link
Contributor

daniellga commented Mar 27, 2024

I was reading your guide about creating R objects and I think the most efficient way, avoiding bound checks every time that set_elt is called, would probably be allowing to collect into those types. Would it be possible?

Another possibility would be to create an unsafe set_elt that lets the user control the bound checks... But I really like more the first idea.

@daniellga daniellga changed the title Allow to collect into savyy's types Allow to collect into savvy's types Mar 27, 2024
@yutannihilation
Copy link
Owner

Thanks, good point. I think you can use as_slice_mut() as well, but I'll implement FromIterator.

@daniellga
Copy link
Contributor Author

I am not sure I follow... How would you use as_slice_mut() here?

#[savvy]
fn times_two(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
    let out: Vec<i32> = x.iter().map(|v| v * 2).collect();
    out.try_into()
}

@yutannihilation
Copy link
Owner

I meant something like this. (Sorry, it was as_mut_slice(), not as_slice_mut()).

#[savvy]
fn times_two(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
    let mut out = unsafe { OwnedIntegerSexp::new_without_init(x.len())? };
    let out_s = out.as_mut_slice();
    for (i, v) in x.iter().enumerate() {
        out_s[i] = v * 2;
    }
    out.into()
}

@daniellga
Copy link
Contributor Author

oh I didn't know it was possible. Thanks.

@yutannihilation
Copy link
Owner

Current progress: I found implementing FromIterator<T> for Owned***Sexp is not possible. As the new() returns Result I want to make the signature to FromIterator<T> for savvy::Result<Owned***Sexp>, but it's not only inconvenient (.collect::<Result<Owned***Sexp>>() instead of .collect::<Owned***Sexp>()), but also it simply won't compile.

error[E0117]: only traits defined in the current crate can be implemented for types defined outside of the crate
   --> src\sexp\integer.rs:271:1
    |
271 | impl FromIterator<i32> for crate::error::Result<OwnedIntegerSexp> {
    | ^^^^^-----------------^^^^^--------------------------------------
    | |    |                     |
    | |    |                     `std::result::Result` is not defined in the current crate
    | |    `i32` is not defined in the current crate
    | impl doesn't use only types from inside the current crate
    |
    = note: define and implement a trait or new type instead

Ignoring an error can be an option, but I feel it's not good to encourage it to the users.

I'll implement from_iter() method individually, not via trait.

@yutannihilation
Copy link
Owner

yutannihilation commented Mar 30, 2024

I'm giving up this for now, at least until I find the right interface for this.

Thinking the comment above again, I found from_iter() without collect() is not very convenient and confusing.

Next, I tried to implement TryFrom, but it requires Sized so I cannot use dyn IntoIterator.

psuedo implementation
impl TryFrom<dyn IntoIterator<Item = i32>> for OwnedIntegerSexp {
    type Error = crate::error::Error;

    fn try_from(value: dyn IntoIterator<Item = i32>) -> crate::error::Result<Self> {
        let iter = iter.into_iter();
        match iter.size_hint() {
            (_, Some(upper)) => {
                let mut out = unsafe { OwnedIntegerSexp::new_without_init(upper)? };
                let mut actual_len = 0;
                for (i, v) in iter.enumerate() {
                    out[i] = v;
                    actual_len = i;
                }

                if actual_len != upper {
                    unsafe {
                        savvy_ffi::SETLENGTH(out.inner, actual_len as _);
                    }
                }

                Ok(out)
            }
            (_, None) => {
                let v: Vec<i32> = iter.collect();
                v.try_into()
            }
        }
    }
}

Btw, one correction to my example code above. I forgot I implemented it, but OwnedIntegerSexp (real and complex as well) has IndexMut so it can be a bit more simplified to:

#[savvy]
fn times_two(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
    let mut out = unsafe { OwnedIntegerSexp::new_without_init(x.len())? };
    for (i, v) in x.iter().enumerate() {
        out[i] = v * 2;
    }
    out.into()
}

@daniellga
Copy link
Contributor Author

daniellga commented Mar 30, 2024

Thanks. I managed to use your solution like this, avoiding the bound check from [i]:

#[savvy]
fn times_two(x: savvy::IntegerSexp) -> savvy::Result<savvy::Sexp> {
    let mut out = unsafe { savvy::OwnedIntegerSexp::new_without_init(x.len())? };
    for (x_, out_) in x.iter().zip(out.as_mut_slice().iter_mut()) {
        *out_ = *x_ * 2;
    }
    out.into()
}

StringSexp is the only one I couldn't manage to do the same until now, I will let you know if I find a solution that doesn't involve creating an unsafe set_elt.

@daniellga
Copy link
Contributor Author

I was taking a look at arrow2 crate and it seems that for some reason the author also chose to create methods instead of implementing FromIterator:
https://docs.rs/arrow2/latest/arrow2/array/struct.PrimitiveArray.html?search=mutableprimiti#method.from_trusted_len_iter

@yutannihilation
Copy link
Owner

StringSexp is the only one

Thanks for the attempt. StringSexp and LogicalSexp doesn't implement Index or IndexMut because the internal representation is different form the interface values. It's discussed here, in case you don't read yet.

https://yutannihilation.github.io/savvy/guide/08_atomic_types.html

I was taking a look at arrow2 crate and it seems that for some reason the author also chose to create methods instead of implementing FromIterator:

Oh, looking at how arrow does is a nice idea! Good to know, thanks. But, it seems they implement FromIterator in actual. (Just in case you don't notice yet, arrow2 crate is archived. arrow crate is the latest one.)

https://docs.rs/arrow/latest/arrow/array/struct.PrimitiveArray.html#method.from_iter
https://docs.rs/arrow/latest/arrow/array/struct.PrimitiveArray.html#example-from-an-iterator-of-values

@daniellga
Copy link
Contributor Author

daniellga commented Mar 30, 2024

I did notice that, but I liked how the author made its API. I liked it way more than arrow. I think arrow2 is being continued inside polars, while the original was archived.

@yutannihilation
Copy link
Owner

If you are curious, you can read the context related to polars and arrow2 here.

jorgecarleitao/arrow2#1429 (comment)

@yutannihilation
Copy link
Owner

Okay, I think try_from_iter() is a nice name for this.

https://docs.rs/arrow/latest/arrow/array/struct.FixedSizeBinaryArray.html#method.try_from_iter

@daniellga
Copy link
Contributor Author

daniellga commented Mar 30, 2024

my initial idea was something like from_trusted_len_iter, since savvy's constructors are all based on a fixed length. But TIL that TrustedLen is nightly only. When I have some time I am gonna check how arrow does it anyway or if they just keep the function unsafe and let the user decide if the input iterator implements TrustedLen.

@yutannihilation
Copy link
Owner

In my understanding, from_trusted_len_iter() is just a performat version for the case when the length is known. You probably want to convert iterators whose length is unknown (e.g. (0..10).filter(|x| x % 2 == 0)).

@daniellga
Copy link
Contributor Author

daniellga commented Mar 31, 2024

Actually if I just have a Vec and want to collect from that Vec into for example OwnedIntegerSexp (a simple vec.iter().collect()) that would be the case I am looking for.

I agree with you though that most users will look for those unknown length cases.

Probably your future try_from_slice will suit better for my case.

@yutannihilation
Copy link
Owner

Oh, the case in your mind has an already-allocated Vec? Then, that's a problem. In that case, what you should choose is try_into() (which will probably be available as try_from_slice()) because this is more performant when the underlying representations match between Rust and R (i.e. integer, real, and complex).

savvy/src/sexp/integer.rs

Lines 186 to 194 in bee3c73

impl TryFrom<&[i32]> for OwnedIntegerSexp {
type Error = crate::error::Error;
fn try_from(value: &[i32]) -> crate::error::Result<Self> {
let mut out = unsafe { Self::new_without_init(value.len())? };
out.as_mut_slice().copy_from_slice(value);
Ok(out)
}
}

@yutannihilation
Copy link
Owner

I think half of the problems are solved. Let's close this for now and revisit when TrustedLen gets stabilized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants