Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add survey criteria #970

Merged
merged 1 commit into from
Sep 12, 2024
Merged

Add survey criteria #970

merged 1 commit into from
Sep 12, 2024

Conversation

tjennison-work
Copy link
Collaborator

@tjennison-work tjennison-work commented Aug 28, 2024

  • Loads all survey data for real time filtering.
  • Improves question/answer name handling, though indexing changes will
    be needed to handle global search correctly.
  • Multi-select plus instance level data still to be dealt with.
  • Config changes will come separately since they'll break stored data.
  • Copies some filter generation code from entity group criteria but this
    allows the config and selection data to be distinct from entity group
    criteria. There is already some divergence and this allows them to
    evolve separately. This wasn't originally planned but handling
    backwards compatability while switching to a new
    plugin/config/selection data seemed pretty sketchy.
  • Fixes a bug where entity group criteria with modifiers but no
    selection weren't handled correctly.
  • Testing is weird because we don't currently have survey data. The
    basics are the same as entity group criteria so using the same data
    for now but TBD how we want to handle this.

Comment on lines 89 to 90
// "survey" plugins are designed to handle medium sized amount of survey data in
// an optimized fashion.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Any clarification possible on "medium sized" -- maybe < 1GB or <100k rows (from fetchAll limit below)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 114 to 116
// TODO(tjennison): There's no way to get the question information for
// answers added via global search. We'll need to get the question
// information at index time.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would creating a new attribute e.g. label that is the concatenation of [question] : [answer] address this for global search? Or maybe that's what you mean by "at index time".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the newly added isLeaf filter to not return answers per our discussion.

repeated EntityGroupConfig entity_groups = 2;

// Optional configuration of a categorical or numeric value associated with
// the selection (e.g. a measurement value). Applied to the entire selection
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "e.g. a survey numeric answer value"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 26 to 27
// Entity groups where the related entity is what is selected (e.g. condition
// when filtering condition_occurrences).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "e.g. surveyBasics when filtering surveyOccurrence"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 15 to 16
// The key of the selected value, which references a related entity (e.g.
// condition for a condition_occurrence).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "e.g. surveyBasics for a surveOccurrence"?

Comment on lines 36 to 37
// Data for an additional categorical or numeric value associated with the
// selection (e.g. a measurement value).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "e.g. a survey numeric answer value"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@SuppressFBWarnings(
value = "NP_UNWRITTEN_PUBLIC_OR_PROTECTED_FIELD",
justification = "The config and data objects are deserialized by Jackson.")
public class SurveyFilterBuilder extends FilterBuilder {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chunks of this class are pretty similar to the existing EntityGroupFilterBuilder. I think it would be better to move those chunks into a shared base class or a utility class. Perhaps using a generic type would be the least amount of up-front work (e.g. <T> = <DTEntityGroupEntityGroup> or <DTSurvey.Survey>).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned this up into a shared base class. The generic type didn't seem like the way to go because of some of the names not matching the complexity of the interface required to handle the builders.

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

public class SurveyFilterBuilderTest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these tests should use actual survey entities instead of the same ones the entity group filter builder tests use. The filter builder tests are just testing that the right filter object is created, I don't think they're running any queries (you can confirm by running them in the noCloudAccessRequiredTests mode). So you could use the AoU config files here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are removed now that the code is mostly shared. We can start adding survey specific tests when the functionality starts to diverge.


selectedIdsPerEntityGroup.forEach(
(entityGroup, selectedIds) -> {
Collections.sort(selectedIds);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could you add a comment here that the ids will always be in sorted order in the generated SQL, not in the order of the defined criteria?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if (!selectedIdsPerEntityGroup.isEmpty()) {
selectedEntityGroups = new ArrayList<>(selectedIdsPerEntityGroup.keySet());
} else {
selectedEntityGroups =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that if we add a criteria with no selected ids (e.g. condition multi-select and check no boxes), we'll get a different filter now than we would have before. Previously, we'd have returned a null filter, so the effect would be to show counts for the entire dataset. Now, we'll return a filter for any condition occurrence data, so the effect may be to show counts for a subset. Is that correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's actually not the case because buildForCohort and buildForDataFeature early exit for criteria with no selection or modifiers. In the modifiers only case this is a change as expected.

That being said, I think changing the behavior is actually correct. Adding an empty condition filter should filter on all people who have a condition rather than doing nothing. I'll probably follow up to remove the early exit but I didn't want to put anything else into this PR if I didn't need to.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I forgot about the early exit conditional at the top. Agreed that changing the behavior would make more sense.

return selectedIdsPerEntityGroup;
}

private List<EntityGroup> selectedEntityGroups(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could you add a comment here that this returns the list of entity groups that the user selected ids from, or the list of configured entity groups if the user selected no ids? I know that now from our discussions and reviewing this code, but seems like something I'm going to forget.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@marikomedlock
Copy link
Collaborator

RE survey tests: Now that we have the AoU CT test data indexed in verily-tanagra-dev, we could run tests against that, as long as the VUMC folks are okay with that. I imagine they will be, since we got the okay to run tests against the 2019 test data -- but that was several years ago now, so probably good to double check.

* Loads all survey data for real time filtering.
* Improves question/answer name handling, though indexing changes will
  be needed to handle global search correctly.
* Multi-select plus instance level data still to be dealt with.
* Config changes will come separately since they'll break stored data.
* Filter generation shares code with existing entity group criteria.
* Copies some filter generation code from entity group criteria but this
  allows the config and selection data to be distinct from entity group
  criteria.  There is already some divergence and this allows them to
  evolve separately. This wasn't originally planned but handling
  backwards compatability while switching to a new
  plugin/config/selection data seemed pretty sketchy.
* Fixes a bug where entity group criteria with modifiers but no
  selection weren't handled correctly.
* Testing is weird because we don't currently have survey data. The
  basics are the same as entity group criteria so using the same data
  for now but TBD how we want to handle this.
@tjennison-work
Copy link
Collaborator Author

RE survey tests: Now that we have the AoU CT test data indexed in verily-tanagra-dev, we could run tests against that, as long as the VUMC folks are okay with that. I imagine they will be, since we got the okay to run tests against the 2019 test data -- but that was several years ago now, so probably good to double check.

Sounds good. I'll check with them on Wednesday.

@tjennison-work tjennison-work merged commit 1c58d1e into main Sep 12, 2024
8 checks passed
@tjennison-work tjennison-work deleted the tj-newsurvey branch September 12, 2024 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants