
Ch295/crud programs #87

Merged: 33 commits into main, Jul 9, 2021

Conversation


@nmanduj1 commented on Jul 7, 2021

No description provided.

@shortcut-integration

This pull request has been linked to Clubhouse Story #295: [API] Implement Program resolvers.

@nmanduj1 requested a review from @jnu on July 7, 2021 at 22:12
@jnu (Contributor) left a comment:

this looks great, nice work!! i made some suggestions, in particular about:

  1. shifting to use @classmethods for querying instead of instance methods, to avoid unnecessary querying (eliminating n+1 queries and extra SELECTs).
  2. permissions on program updates -- i think we can just let admins handle this
  3. a mutation for deeply updating programs + related objects, to simplify the primary request logic from the frontend.

i don't think (3) should hold up this PR -- we can add a separate mutation that does this later and keep the existing logic for now. i think it'd be good to prioritize merging this so i can resolve conflicts in my PR and start using it.

so exciting to see this coming together!!! great work!!! <3

api/mutations.py Outdated
session.query(CategoryValue).filter(CategoryValue.id == id).update({'deleted':func.now()}, synchronize_session='fetch')
session.query(Entry).filter(Entry.category_value_id == id).update({'deleted':func.now()}, synchronize_session='fetch')
session.query(Target).filter(Target.category_value_id == id).update({'deleted':func.now()}, synchronize_session='fetch')
CategoryValue.get_not_deleted(session, id).soft_delete(session)
@jnu (Contributor):

mentioned this elsewhere, but by using a classmethod soft_delete written in terms of an ID you can avoid extra trips to the database (one to fetch the object, one to issue the update).

otherwise it'd be good to add a check that the result is not None, because get_not_deleted might not return anything.
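
for concreteness, a minimal (untested) sketch of that classmethod shape, reusing the queries from the snippet above -- soft_delete_by_id is just a placeholder name, and it assumes the models and deleted columns shown in the diff plus func imported from sqlalchemy:

class CategoryValue(Base):
    # ...existing columns/relationships...

    @classmethod
    def soft_delete_by_id(cls, session, id):
        """Soft-delete a CategoryValue and its Entries/Targets without fetching the row first."""
        now = func.now()
        session.query(cls).filter(cls.id == id).update(
            {'deleted': now}, synchronize_session='fetch')
        session.query(Entry).filter(Entry.category_value_id == id).update(
            {'deleted': now}, synchronize_session='fetch')
        session.query(Target).filter(Target.category_value_id == id).update(
            {'deleted': now}, synchronize_session='fetch')

since nothing is fetched up front, there's no None check needed either -- the UPDATEs simply match zero rows if the id doesn't exist.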

@nmanduj1 (Author):

The problem with that is that the soft_delete instance method is responsible not only for deleting the object itself but also for cascading the soft delete to its direct children. I will add the conditional check as you mentioned.

@jnu (Contributor):

agreed, but given the foreign-key relationship it shouldn't be a problem to do in a classmethod (e.g. with the soft_delete_by_category_id etc comments elsewhere).

tags = input.pop('tag_ids', [])
program = session.query(Program).get(input['id'])
if len(datasets) > 0:
    program.datasets = [session.merge(Dataset(id=uuid.UUID(dataset_id))) for dataset_id in datasets]
@jnu (Contributor):

What do you think of allowing datasets, targets, and other entities to be created on-the-fly when a program is created or updated? in the current UI design there are only pages for managing the User, Team, and Program entities -- everything else is subordinate to those entities.

A drawback of managing all these entities in separate mutations is that we lose atomicity -- the frontend would have to issue multiple requests and handle retry logic independently for each. for example:

scenario: updating a program and adding a new target, new tag, and new dataset:

FE requests:
1) Mutation to create Target and return ID
2) Mutation to create Tag and return ID
3) Mutation to create Dataset and return ID
4) Query to get existing state of program
5) Mutation to update Program and pass in existing IDs combined with the three new IDs.

if instead we updated the program update mutation to accept objects for these entities, we could handle the merge/create implicitly and simplify the error handling:

mutation {
  updateProgram(updates: {
    tags: [{id: "some-existing-tag"}, {name: "NewTagIWantToCreate"}],
    targets: [{id: "some-existing-target"}, {categoryValue: {...}, target: 0.2}],
    # and so on
  }) {
    id
  }
}

then if any part of that mutation fails, the whole thing is rolled back and no changes at all are applied, which makes the UI's job much easier.

I think it's also fairly simple to extend what you currently have -- instead of assuming datasets is just a list of IDs, treat it as an arbitrary list of fields to update:

  program.datasets = [session.merge(Dataset(**dataset)) for dataset in datasets]

if an id is passed, merge should detect that the record already exists; otherwise it'll create it.
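
as a rough, untested sketch (assuming the input field is renamed from dataset_ids to datasets and each item is a plain dict of Dataset fields, with id optional):

# `datasets` is assumed to look like:
#   [{"id": "<existing-dataset-uuid>"}, {"name": "New dataset", "description": "..."}]
datasets = input.pop('datasets', [])
program = session.query(Program).get(input['id'])
if datasets:
    # merge() acts as an upsert: items with a known id update the existing row,
    # items without an id are inserted -- all inside the same transaction as the
    # rest of the program update, so any failure rolls everything back together.
    program.datasets = [session.merge(Dataset(**dataset)) for dataset in datasets]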

@nmanduj1 (Author):

I think that pattern is valid. I actually used the nesting pattern for some of our other resolvers. I chose not to implement that pattern here because it felt like we would end up with unnecessarily bloated resolvers. It feels more natural that the resolver should be responsible for the resource it describes. And there’s no reason why the frontend should not be able to make multiple network calls to create the resources it needs on the fly.

That said, I am not passionate about keeping thin resolvers and can implement the nesting pattern for Programs if you and @seerocode think that’s best.

@jnu (Contributor):

there’s no reason why the frontend should not be able to make multiple network calls to create the resources it needs on the fly.

a full realistic example of creating a new program is:

new program:

{
  name: "foo",
  team: "abc123-whatever-uuid",
  targets: [
    // first category, e.g. gender
    {categoryValue: {id: "cv01", category: {id: "c1"}}, target: 0.2},
    {categoryValue: {id: "cv02", category: {id: "c1"}}, target: 0.2},
    {categoryValue: {id: "cv03", category: {id: "c1"}}, target: 0.2},
    {categoryValue: {id: "cv04", category: {id: "c1"}}, target: 0.2},
    {categoryValue: {id: "cv05, category: {id: "c1"}}, target: 0.2},
    // second category, e.g. race
    {categoryValue: {id: "cv11", category: {id: "c2"}}, target: 0.2},
    {categoryValue: {id: "cv12", category: {id: "c2"}}, target: 0.2},
    {categoryValue: {id: "cv13", category: {id: "c2"}}, target: 0.2},
    {categoryValue: {id: "cv14", category: {id: "c2"}}, target: 0.2},
    {categoryValue: {id: "cv15", category: {id: "c2"}}, target: 0.2},
  ],
}

this will be a pretty typical request, even without configuring datasets, tags, or personTypes, and assuming that all the required categories and categoryValues have been created already.

if the frontend has to issue separate queries to create all of these that's 11 separate requests. they also aren't happening inside a transaction, so it's possible for the database state to get inconsistent if any of the requests fail.

my reasoning is this:

  1. atomicity is good for UX and good for the DB and good for whoever's operating the app going forward
  2. a single request would cut out a few dozen lines of code to sort out requests and handle errors and retries on the FE
  3. it also doesn't really add any complexity or LoC to the backend since, unless i'm missing something, it just leverages the "upsert" logic of SQLAlchemy's merge
  4. it'd be strictly faster than multiple queries, likely significantly so on average, since most of the time is spent waiting for requests to return.
  5. it'd be easier to optimize later on (e.g. along the lines of batching DB queries as mentioned elsewhere).

that said i think let's punt on that for this PR -- i'd like to merge this as soon as we can, we can revise and iterate on mutations later. thanks!

api/queries.py Outdated
def resolve_program(obj, info, id):
    '''GraphQL query to find a Program based on Program ID.
    :param id: Id for the Program to be fetched
    :returns: Program dictionary
@jnu (Contributor):

nit: this isn't actually a dictionary, it's the SQLAlchemy model!

    :returns: Program dictionary
    '''
    session = info.context['dbsession']
    program = Program.get_not_deleted(session, id)
@jnu (Contributor):

fyi i also implemented this resolver, but chose to just use get(id) without the deleted check in order to support a UI feature of restoring deleted programs. (this is one of the use-cases for soft deletes -- to prevent accidental deletions and let users recover if necessary. We don't necessarily need to do this for every form of delete -- I've done it for Users, and I assume it'll be requested for programs at some point, if not immediately.)

@nmanduj1 (Author):

I was not aware we were allowing users to restore deleted programs. We should make sure our user stories are organized moving forward: the functionality should be specified upfront as much as possible to make sure dev resources are used wisely. More importantly, why would a single query return both deleted and non-deleted DB records? We should implement a separate query handling each, for clarity. At a minimum, I think it should be a parameter (such as with_deleted) that's passed to the resolver as an argument.
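
to illustrate, a sketch of the with_deleted idea (untested; the argument name and the GraphQL wiring are just assumptions):

def resolve_program(obj, info, id, with_deleted=False):
    '''GraphQL query to find a Program by ID.
    :param id: ID of the Program to fetch
    :param with_deleted: include soft-deleted Programs (e.g. for an admin restore flow)
    :returns: Program model, or None
    '''
    session = info.context['dbsession']
    if with_deleted:
        # skip the deleted filter so admins can see (and potentially restore) deleted programs
        return session.query(Program).get(id)
    return Program.get_not_deleted(session, id)

the schema would also need a corresponding optional argument on the query.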


Oy, wait -- on the UI we are allowing users to recover deleted Users? Sorry if I missed that! I didn't see it in the tests.

@jnu (Contributor):

the main use case for these indexes right now is to let admins manage all the objects in the application. if we don't return deleted things in the index, they can't restore deleted things (because they can't get to them in the UI). that's not the end of the world but it's a common feature request.

why would you have a single query return both deleted and non-deleted db records

could be separate queries, but for low cardinality (hundreds or even low thousands of programs, total) it's just simpler to return everything and filter on the frontend and there's not a perceptible performance penalty.

At a minimum, i think it should be a parameter (such as with_deleted)

agreed that'd be ideal, but at this phase it doesn't seem worth the time to implement a directive or a separate query just for this.

the functionality should be specified upfront as much as possible to make sure dev resources are used wisely

agreed, and to be clear this hasn't been explicitly requested. i'm just saying i wouldn't be surprised if it gets requested at some point -- it came to mind when i implemented this same resolver in my PR, and i'm just letting you know that that's what led me to implement it slightly differently!

We should make sure that our user stories are organized moving forward

as i've said before i'm happy to work with you to specify tickets in more detail. i'd love to get to a place where everyone's needs are getting addressed and people are fully supported to do their best work. i'm also juggling a few projects and i'm sorry if i dropped the ball here.

name: String
description: String
teamId: ID
datasetIds: [ID!]
@jnu (Contributor):

per my comment above, one way to update the schema would be to change these fields to something like datasets and then define an input like:

input DatasetCreateOrUpdateInput {
  id: ID
  name: String
  description: String
}

then if the user is creating a new dataset they can omit the ID and pass the other stuff, or if they're reusing an existing dataset they can just pass the ID.

(This is a good use case for the input unions / tagged type proposal, but it's not implemented yet: graphql/graphql-spec#733)

some more references for this approach to mutations:

what do you think about this?

@@ -99,6 +100,13 @@ def user_has_permission(permissions: Iterable[str], obj, info) -> bool:
    record = Record.get_not_deleted(session, id_)
    if record and record.user_is_team_member(current_user):
        return True

    if info.field_name == 'updateProgram':
@jnu (Contributor):

do we actually want to grant team members the ability to update a program? This would allow them to change targets, manage datasets, etc. My impression was this was best left as an admin control, so that the teams themselves weren't able to fiddle with things.

@nmanduj1 (Author):

Good point, although honestly I am still unclear on who an "admin" is, who a "team member" is vs. a "current user" vs. a "user", and where "logged in" fits into everything. (Wouldn't, by the very nature of a user being a user, a user always have to be logged in? How would we know the person is a user/admin/team member if they were not logged in?) Also, I was under the impression "admin" was Lara-level, which, given your comment, does not seem to be the case. However, if an "admin" is just the head of a particular team, wouldn't we need to add the concept of a "super admin" or comparable?

@jnu (Contributor):

who is an "admin",

the bbc5050 team -- lara and miranda, for example; they currently set up all the stuff for all the teams.

Also, I was under the impression "admin" was Lara-level which, given your comment, does not seem to be the case

I think "admin" is lara level -- what makes you think otherwise? "admin" shouldn't be granted to anyone on a team. (currently with their spreadsheet system, lara sets up all the spreadsheets and sends them to people on a team -- the team members themselves don't have control over configuring the spreadsheet. we're basically replicating that model.)

who is a "team member"

any of the data entry people or producers or whoever who should be able to see / record data on a particular program. this controls visibility so that people on one team can't see programs belonging to another team.

"current user"

just refers to the user account who made a request

"user"

could be anyone in the database

where "logged in" fits into everything

it's the least restrictive level of permission. when people visit the login page they are not logged in, so they should have permission to basically nothing. it's good to have a default permission like this as a safety precaution, e.g. to prevent people from fiddling with the /graphql endpoint.

wouldn't we need to add the concept of a "super admin" or comparable?

highly likely we'll need to expand permissions at some point in the future

def soft_delete(self, session):
    self.deleted = func.now()
    session.add(self)
    session.query(CategoryValue).filter(CategoryValue.category_id == self.id).update({'deleted':func.now()}, synchronize_session='fetch')
@jnu (Contributor):

i like the design you use elsewhere a bit better, where you would call a soft_delete function that's implemented on the CategoryValue class instead of implementing it here in a separate class (which violates the OOP philosophy of encapsulation).

That said, i also really like this line (L284) quite a lot, for two reasons:

  1. it avoids an unnecessary SELECT query to fetch the instance, and
  2. it issues only a single UPDATE statement based on the category_id.

maybe you can combine the two approaches by writing a @classmethod like CategoryValue.soft_delete_by_category_id and calling that here?

@nmanduj1 (Author):

CategoryValue's soft_delete also soft-deletes associated targets and entries -- is that the functionality we're expecting when a Category is deleted?
Implementing CategoryValue.soft_delete_by_category_id is an option; however, in a comment further below you suggest also creating Target.soft_delete_by_program_id and Dataset.soft_delete_by_program_id. It feels like this pattern will get unwieldy fairly quickly if we are creating these methods for every id we're hoping to target, but I see your point re: the unnecessary SELECT. I will create a ticket to add these functions in a future PR and tackle it next week.

@jnu (Contributor):

an option, instead of writing explicit methods (agreed, tedious), is to have a polymorphic signature

something like:

class Target(Base):
    @classmethod
    def soft_delete(cls, session, **kwargs):
        filters = [getattr(cls, k) == v for k, v in kwargs.items()]
        return session.query(cls).filter(*filters).update({"deleted": func.now()})

haven't tested it, and you'd want to add some safeguards (since that could be very destructive!), but then you could call it like:

Target.soft_delete(session, program_id="prog123")
Target.soft_delete(session, tag_id="tag123")

etc.

since that method uses cls instead of Target explicitly, it could also be defined in a mixin, e.g. a class SoftDeletableMixin that you could extend like Target(SoftDeletableMixin).
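
for example, a rough sketch of that mixin (untested; it assumes func is imported from sqlalchemy, and the empty-kwargs guard is one of the safeguards mentioned above):

class SoftDeletableMixin:
    """Adds a bulk soft_delete classmethod to any model with a `deleted` column."""

    @classmethod
    def soft_delete(cls, session, **kwargs):
        if not kwargs:
            # without at least one filter this would soft-delete the entire table
            raise ValueError("soft_delete requires at least one filter, e.g. program_id=...")
        filters = [getattr(cls, k) == v for k, v in kwargs.items()]
        return session.query(cls).filter(*filters).update(
            {"deleted": func.now()}, synchronize_session='fetch')


class Target(Base, SoftDeletableMixin):
    # ...existing columns...
    pass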

all sorts of options, but i think let's leave things where they are for now and prioritize merging.

@nmanduj1 (Author):

Interesting! I kinda like that -- thanks for bringing it up!

datasets = session.query(Dataset).filter(Dataset.program_id == self.id).all()
for dataset in datasets:
    dataset.soft_delete(session)
session.query(Target).filter(Target.program_id == self.id).update({'deleted':func.now()}, synchronize_session='fetch')
@jnu (Contributor):

like with the Category/CategoryValue class, i like the encapsulation of the soft_delete method like you have with Dataset, but i also like the functional style instead of the instance method -- so maybe write a classmethod like Target.soft_delete_by_program_id?

if you updated Dataset to have a classmethod like Dataset.soft_delete_by_program_id you could optimize this quite a bit -- currently you have to make n+1 queries, where you use one SELECT to get the datasets and then an UPDATE for each to delete them. really you just need one UPDATE query for the whole batch (and if you want to use synchronize_session it will still keep things in sync, but with less querying).
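
for illustration, with such classmethods (or the mixin sketched above) this call site could shrink to something like the following, setting aside any further cascading the Dataset instance method currently does:

# one UPDATE per table, no preliminary SELECT and no per-row UPDATEs
Dataset.soft_delete_by_program_id(session, self.id)
Target.soft_delete_by_program_id(session, self.id)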

@nmanduj1 merged commit 2c814fb into main on Jul 9, 2021
@nmanduj1 deleted the ch295/crud-programs branch on July 9, 2021 at 00:47