
Ch295/crud programs #87

Merged: 33 commits into main, Jul 9, 2021

Conversation


@nmanduj1 commented on Jul 7, 2021

No description provided.

@shortcut-integration

This pull request has been linked to Clubhouse Story #295: [API] Implement Program resolvers.

@nmanduj1 requested a review from @jnu on July 7, 2021 at 22:12
@jnu (Contributor) left a comment:

this looks great, nice work!! i made some suggestions, in particular about:

  1. shifting to use @classmethods for querying instead of instance methods, to avoid unnecessary querying (eliminating n+1 queries and extra SELECTs).
  2. permissions on program updates -- i think we can just let admins handle this
  3. a mutation for deeply updating programs + related objects, to simplify the primary request logic from the frontend.

i don't think (3) should hold up this PR -- we can add a separate mutation that does this later and keep the existing logic for now. i think it'd be good to prioritize merging this so i can resolve conflicts in my PR and start using it.

so exciting to see this coming together!!! great work!!! <3

api/mutations.py Outdated
session.query(CategoryValue).filter(CategoryValue.id == id).update({'deleted':func.now()}, synchronize_session='fetch')
session.query(Entry).filter(Entry.category_value_id == id).update({'deleted':func.now()}, synchronize_session='fetch')
session.query(Target).filter(Target.category_value_id == id).update({'deleted':func.now()}, synchronize_session='fetch')
CategoryValue.get_not_deleted(session, id).soft_delete(session)
@jnu (Contributor):

mentioned this elsewhere, but by using a classmethod soft_delete written in terms of an ID you can avoid extra trips to the database (one to fetch the object, one to issue the update).

otherwise it'd be good to add a check that the result is not None, because get_not_deleted might not return anything.
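
for concreteness, a minimal (untested) sketch of that classmethod shape, reusing the queries from the snippet above -- soft_delete_by_id is just a placeholder name, and it assumes the models and deleted columns shown in the diff plus func imported from sqlalchemy:

class CategoryValue(Base):
    # ...existing columns/relationships...

    @classmethod
    def soft_delete_by_id(cls, session, id):
        """Soft-delete a CategoryValue and its Entries/Targets without fetching the row first."""
        now = func.now()
        session.query(cls).filter(cls.id == id).update(
            {'deleted': now}, synchronize_session='fetch')
        session.query(Entry).filter(Entry.category_value_id == id).update(
            {'deleted': now}, synchronize_session='fetch')
        session.query(Target).filter(Target.category_value_id == id).update(
            {'deleted': now}, synchronize_session='fetch')

since nothing is fetched up front, there's no None check needed either -- the UPDATEs simply match zero rows if the id doesn't exist.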

@nmanduj1 (Author):

The problem with that is that the soft_delete instance method is responsible not only for deleting the object itself but also for cascading the soft delete to its direct children. I will add the conditional check as you mentioned.

@jnu (Contributor):

agreed, but given the foreign-key relationship it shouldn't be a problem to do in a classmethod (e.g. with the soft_delete_by_category_id etc comments elsewhere).

tags = input.pop('tag_ids', [])
program = session.query(Program).get(input['id'])
if len(datasets) > 0:
    program.datasets = [session.merge(Dataset(id=uuid.UUID(dataset_id))) for dataset_id in datasets]
@jnu (Contributor):

What do you think of allowing datasets, targets, and other entities to be created on-the-fly when a program is created or updated? in the current UI design there are only pages for managing the User, Team, and Program entities -- everything else is subordinate to those entities.

A drawback of managing all these entities in separate mutations is that we lose atomicity -- the frontend would have to issue multiple requests and handle retry logic independently for each. for example:

scenario: updating a program and adding a new target, new tag, and new dataset:

FE requests:
1) Mutation to create Target and return ID
2) Mutation to create Tag and return ID
3) Mutation to create Dataset and return ID
4) Query to get existing state of program
5) Mutation to update Program and pass in existing IDs combined with the three new IDs.

if instead we updated the program update mutation to accept objects for these entities, we could handle the merge/create implicitly and simplify the error handling:

mutation {
  updateProgram(updates: {
    tags: [{id: "some-existing-tag"}, {name: "NewTagIWantToCreate"}],
    targets: [{id: "some-existing-target"}, {categoryValue: {...}, target: 0.2}],
    # and so on
  }) {
    id
  }
}

then if any part of that mutation fails, the whole thing is rolled back and no changes at all are applied, which makes the UI's job much easier.

I think it's also fairly simple to extend what you currently have -- instead of assuming datasets is just a list of IDs, treat it as an arbitrary list of fields to update:

  program.datasets = [session.merge(Dataset(**dataset)) for dataset in datasets]

if an id is passed, merge should detect that the record already exists; otherwise it'll create it.
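
as a rough, untested sketch (assuming the input field is renamed from dataset_ids to datasets and each item is a plain dict of Dataset fields, with id optional):

# `datasets` is assumed to look like:
#   [{"id": "<existing-dataset-uuid>"}, {"name": "New dataset", "description": "..."}]
datasets = input.pop('datasets', [])
program = session.query(Program).get(input['id'])
if datasets:
    # merge() acts as an upsert: items with a known id update the existing row,
    # items without an id are inserted -- all inside the same transaction as the
    # rest of the program update, so any failure rolls everything back together.
    program.datasets = [session.merge(Dataset(**dataset)) for dataset in datasets]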

@nmanduj1 (Author):

I think that pattern is valid. I actually used the nesting pattern for some of our other resolvers. I chose not to implement that pattern here because it felt like we would end up with unnecessarily bloated resolvers. It feels more natural that the resolver should be responsible for the resource it describes. And there’s no reason why the frontend should not be able to make multiple network calls to create the resources it needs on the fly.

That said, I am not passionate about keeping thin resolvers and can implement the nesting pattern for Programs if you and @seerocode think that’s best.

@jnu (Contributor):

there’s no reason why the frontend should not be able to make multiple network calls to create the resources it needs on the fly.

a full realistic example of creating a new program is:

new program:

{
  name: "foo",
  team: "abc123-whatever-uuid",
  targets: [
    // first category, e.g. gender
    {categoryValue: {id: "cv01", category: {id: "c1"}}, target: 0.2},
    {categoryValue: {id: "cv02", category: {id: "c1"}}, target: 0.2},
    {categoryValue: {id: "cv03", category: {id: "c1"}}, target: 0.2},
    {categoryValue: {id: "cv04", category: {id: "c1"}}, target: 0.2},
    {categoryValue: {id: "cv05, category: {id: "c1"}}, target: 0.2},
    // second category, e.g. race
    {categoryValue: {id: "cv11", category: {id: "c2"}}, target: 0.2},
    {categoryValue: {id: "cv12", category: {id: "c2"}}, target: 0.2},
    {categoryValue: {id: "cv13", category: {id: "c2"}}, target: 0.2},
    {categoryValue: {id: "cv14", category: {id: "c2"}}, target: 0.2},
    {categoryValue: {id: "cv15", category: {id: "c2"}}, target: 0.2},
  ],
}

this will be a pretty typical request, even without configuring datasets, tags, or personTypes, and assuming that all the required categories and categoryValues have been created already.

if the frontend has to issue separate queries to create all of these that's 11 separate requests. they also aren't happening inside a transaction, so it's possible for the database state to get inconsistent if any of the requests fail.

my reasoning is this:

  1. atomicity is good for UX and good for the DB and good for whoever's operating the app going forward
  2. a single request would cut out a few dozen lines of code to sort out requests and handle errors and retries on the FE
  3. it also doesn't really add any complexity or LoC to the backend since, unless i'm missing something, it just leverages the "upsert" logic of SQLAlchemy's merge
  4. it'd be strictly faster than multiple queries, likely significantly so on average, since most of the time is spent waiting for requests to return.
  5. it'd be easier to optimize later on (e.g. along the lines of batching DB queries as mentioned elsewhere).

that said i think let's punt on that for this PR -- i'd like to merge this as soon as we can, we can revise and iterate on mutations later. thanks!

api/queries.py Outdated
def resolve_program(obj, info, id):
    '''GraphQL query to find a Program based on Program ID.
    :param id: Id for the Program to be fetched
    :returns: Program dictionary
@jnu (Contributor):

nit: this isn't actually a dictionary, it's the SQLAlchemy model!

    :returns: Program dictionary
    '''
    session = info.context['dbsession']
    program = Program.get_not_deleted(session, id)
@jnu (Contributor):

fyi i also implemented this resolver, but chose to just use get(id) without the deleted check in order to support a UI feature of restoring deleted programs. (this is one of the use-cases for soft deletes -- to prevent accidental deletions and let users recover if necessary. We don't necessarily need to do this for every form of delete -- I've done it for Users, and I assume it'll be requested for programs at some point, if not immediately.)

@nmanduj1 (Author):

I was not aware we were allowing users to restore deleted programs. We should make sure our user stories are organized moving forward: the functionality should be specified upfront as much as possible to make sure dev resources are used wisely. More importantly, why would a single query return both deleted and non-deleted DB records? We should implement a separate query handling each, for clarity. At a minimum, I think it should be a parameter (such as with_deleted) that's passed to the resolver as an argument.
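
to illustrate, a sketch of the with_deleted idea (untested; the argument name and the GraphQL wiring are just assumptions):

def resolve_program(obj, info, id, with_deleted=False):
    '''GraphQL query to find a Program by ID.
    :param id: ID of the Program to fetch
    :param with_deleted: include soft-deleted Programs (e.g. for an admin restore flow)
    :returns: Program model, or None
    '''
    session = info.context['dbsession']
    if with_deleted:
        # skip the deleted filter so admins can see (and potentially restore) deleted programs
        return session.query(Program).get(id)
    return Program.get_not_deleted(session, id)

the schema would also need a corresponding optional argument on the query.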


Oy, wait -- on the UI we are allowing users to recover deleted Users? Sorry if I missed that! I didn't see it in the tests.

@jnu (Contributor):

the main use case for these indexes right now is to let admins manage all the objects in the application. if we don't return deleted things in the index, they can't restore deleted things (because they can't get to them in the UI). that's not the end of the world but it's a common feature request.

why would you have a single query return both deleted and non-deleted db records

could be separate queries, but for low cardinality (hundreds or even low thousands of programs, total) it's just simpler to return everything and filter on the frontend and there's not a perceptible performance penalty.

At a minimum, i think it should be a parameter (such as with_deleted)

agreed that'd be ideal, but at this phase it doesn't seem worth the time to implement a directive or a separate query just for this.

the functionality should be specified upfront as much as possible to make sure dev resources are used wisely

agreed, and to be clear this hasn't been explicitly requested. i'm just saying i wouldn't be surprised if it gets requested at some point -- it came to mind when i implemented this same resolver in my PR, and i'm just letting you know that that's what led me to implement it slightly differently!

We should make sure that our user stories are organized moving forward

as i've said before i'm happy to work with you to specify tickets in more detail. i'd love to get to a place where everyone's needs are getting addressed and people are fully supported to do their best work. i'm also juggling a few projects and i'm sorry if i dropped the ball here.

name: String
description: String
teamId: ID
datasetIds: [ID!]
@jnu (Contributor):

per my comment above, one way to update the schema would be to change these fields to something like datasets and then define an input like:

input DatasetCreateOrUpdateInput {
  id: ID
  name: String
  description: String
}

then if the user is creating a new dataset they can omit the ID and pass the other stuff, or if they're reusing an existing dataset they can just pass the ID.

(This is a good use case for the input unions / tagged type proposal, but it's not implemented yet: graphql/graphql-spec#733)

some more references for this approach to mutations:

what do you think about this?

@@ -99,6 +100,13 @@ def user_has_permission(permissions: Iterable[str], obj, info) -> bool:
    record = Record.get_not_deleted(session, id_)
    if record and record.user_is_team_member(current_user):
        return True

    if info.field_name == 'updateProgram':
@jnu (Contributor):

do we actually want to grant team members the ability to update a program? This would allow them to change targets, manage datasets, etc. My impression was this was best left as an admin control, so that the teams themselves weren't able to fiddle with things.

@nmanduj1 (Author):

Good point, although honestly I am still unclear on who an "admin" is, who a "team member" is vs. a "current user" vs. a "user", and where "logged in" fits into everything. (Wouldn't, by the very nature of a user being a user, a user always have to be logged in? How would we know the person is a user/admin/team member if they were not logged in?) Also, I was under the impression "admin" was Lara-level, which, given your comment, does not seem to be the case. However, if an "admin" is just the head of a particular team, wouldn't we need to add the concept of a "super admin" or comparable?

@jnu (Contributor):

who is an "admin",

the bbc5050 team -- lara and miranda, for example; they currently set up all the stuff for all the teams.

Also, I was under the impression "admin" was Lara-level which, given your comment, does not seem to be the case

I think "admin" is lara level -- what makes you think otherwise? "admin" shouldn't be granted to anyone on a team. (currently with their spreadsheet system, lara sets up all the spreadsheets and sends them to people on a team -- the team members themselves don't have control over configuring the spreadsheet. we're basically replicating that model.)

who is a "team member"

any of the data entry people or producers or whoever who should be able to see / record data on a particular program. this controls visibility so that people on one team can't see programs belonging to another team.

"current user"

just refers to the user account who made a request

"user"

could be anyone in the database

where "logged in" fits into everything

it's the least restrictive level of permission. when people visit the login page they are not logged in, so they should have permission to basically nothing. it's good to have a default permission like this as a safety precaution, e.g. to prevent people from fiddling with the /graphql endpoint.

wouldn't we need to add the concept of a "super admin" or comparable?

highly likely we'll need to expand permissions at some point in the future

def soft_delete(self, session):
    self.deleted = func.now()
    session.add(self)
    session.query(CategoryValue).filter(CategoryValue.category_id == self.id).update({'deleted':func.now()}, synchronize_session='fetch')
@jnu (Contributor):

i like the design you use elsewhere a bit better, where you would call a soft_delete function that's implemented on the CategoryValue class instead of implementing it here in a separate class (which violates the OOP philosophy of encapsulation).

That said, i also really like this line (L284) quite a lot, for two reasons:

  1. it avoids an unnecessary SELECT query to fetch the instance, and
  2. it issues only a single UPDATE statement based on the category_id.

maybe you can combine the two approaches by writing a @classmethod like CategoryValue.soft_delete_by_category_id and calling that here?

@nmanduj1 (Author):

CategoryValue's soft_delete also soft-deletes associated targets and entries -- is that the functionality we're expecting when a Category is deleted?
Implementing CategoryValue.soft_delete_by_category_id is an option; however, in a comment further below you suggest also creating Target.soft_delete_by_program_id and Dataset.soft_delete_by_program_id. It feels like this pattern will get unwieldy fairly quickly if we are creating these methods for every id we're hoping to target, but I see your point re: the unnecessary SELECT. I will create a ticket to add these functions in a future PR and tackle it next week.

@jnu (Contributor):

an option, instead of writing explicit methods (agreed, tedious), is to have a polymorphic signature

something like:

class Target(Base):
    @classmethod
    def soft_delete(cls, session, **kwargs):
        filters = [getattr(cls, k) == v for k, v in kwargs.items()]
        return session.query(cls).filter(*filters).update({"deleted": func.now()})

haven't tested it, and you'd want to add some safeguards (since that could be very destructive!), but then you could call it like:

Target.soft_delete(session, program_id="prog123")
Target.soft_delete(session, tag_id="tag123")

etc.

since that method uses cls instead of Target explicitly, it could also be defined in a mixin, e.g. a class SoftDeletableMixin that you could extend like Target(SoftDeletableMixin).
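
for example, a rough sketch of that mixin (untested; it assumes func is imported from sqlalchemy, and the empty-kwargs guard is one of the safeguards mentioned above):

class SoftDeletableMixin:
    """Adds a bulk soft_delete classmethod to any model with a `deleted` column."""

    @classmethod
    def soft_delete(cls, session, **kwargs):
        if not kwargs:
            # without at least one filter this would soft-delete the entire table
            raise ValueError("soft_delete requires at least one filter, e.g. program_id=...")
        filters = [getattr(cls, k) == v for k, v in kwargs.items()]
        return session.query(cls).filter(*filters).update(
            {"deleted": func.now()}, synchronize_session='fetch')


class Target(Base, SoftDeletableMixin):
    # ...existing columns...
    pass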

all sorts of options, but i think let's leave things where they are for now and prioritize merging.

@nmanduj1 (Author):

Interesting! I kinda like that -- thanks for bringing it up!

datasets = session.query(Dataset).filter(Dataset.program_id == self.id).all()
for dataset in datasets:
    dataset.soft_delete(session)
session.query(Target).filter(Target.program_id == self.id).update({'deleted':func.now()}, synchronize_session='fetch')
@jnu (Contributor):

like with the Category/CategoryValue class, i like the encapsulation of the soft_delete method like you have with Dataset, but i also like the functional style instead of the instance method -- so maybe write a classmethod like Target.soft_delete_by_program_id?

if you updated Dataset to have a classmethod like Dataset.soft_delete_by_program_id you could optimize this quite a bit -- currently you have to make n+1 queries, where you use one SELECT to get the datasets and then an UPDATE for each to delete them. really you just need one UPDATE query for the whole batch (and if you want to use synchronize_session it will still keep things in sync, but with less querying).
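
for illustration, with such classmethods (or the mixin sketched above) this call site could shrink to something like the following, setting aside any further cascading the Dataset instance method currently does:

# one UPDATE per table, no preliminary SELECT and no per-row UPDATEs
Dataset.soft_delete_by_program_id(session, self.id)
Target.soft_delete_by_program_id(session, self.id)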

@nmanduj1 merged commit 2c814fb into main on Jul 9, 2021
@nmanduj1 deleted the ch295/crud-programs branch on July 9, 2021 at 00:47