Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

object_id computed incorrectly when duration set to single value #384

Open
adelavega opened this issue Mar 4, 2020 · 2 comments
Open

object_id computed incorrectly when duration set to single value #384

adelavega opened this issue Mar 4, 2020 · 2 comments
Labels

Comments

@adelavega
Copy link
Member

ExtractorResult.duration can be set to a single value (e.g. 0.1) when this value represents durations for all data points. This should be broadcast to all values, and is when creating df.

However, when computing object_id this is not done, and it causes the index that is sued to groupby the data to be a single value, followed by all NAs.

https://github.com/tyarkoni/pliers/blob/7e92ad50b3996a0df1f3dfcc91c1f006597b5728/pliers/extractors/base.py#L154

This results in object_ids like so:

0, 0, 1, 2, 3, 4, 5, 6

When it should be all 0s.

Broadcasting prior to making the pd.Series would fix this issue.

@adelavega adelavega added the bug label Mar 4, 2020
@adelavega
Copy link
Member Author

@rbroc do you remember if we fixed this? I feel like we did but I'm not seeing a PR for it

@rbroc
Copy link
Collaborator

rbroc commented Mar 19, 2020

@rbroc do you remember if we fixed this? I feel like we did but I'm not seeing a PR for it

We didn't. I fixed it within the AudiosetLabelExtractor (which is where we observed the anomaly) by forcing durations to be a vector of length len(onsets).

I guess the question is whether we want to add explicit broadcasting at https://github.com/tyarkoni/pliers/blob/7e92ad50b3996a0df1f3dfcc91c1f006597b5728/pliers/extractors/base.py#L154

I feel like that might be a bit more convoluted than just sticking to the rule that ExtractorResults should have onset and duration as lists of the same length (with one value per row of the output). So far, the overwhelming majority of extractors do that already.

Rather than changing the way object_id is created, we might just want to add a line checking len(onsets) == len(durations) and triggering an Error if this is not the case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants