Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to offline entity backlog processing #1171

Merged
merged 2 commits into from
Aug 21, 2024

Conversation

ktuite
Copy link
Member

@ktuite ktuite commented Aug 20, 2024

This is meant to be a precursor to another PR for issue getodk/central#682

The main changes here are

  • Migration to add more useful columns to submission backlog
  • Reordering and reorganizing code in entity query module
  • Changing flow of Entities._updateEntity to try to determine the base version of an entity before it looks up the existing server version, because this helps with the logic of holding submissions for later processing, and will eventually help with the force-processing case.
  • Removes branchId from tests about creating entities

What has been done to verify that this works as intended?

Why is this the best possible solution? Were any other approaches considered?

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

Before submitting this PR, please make sure you have:

  • run make test and confirmed all checks still pass OR confirm CircleCI build passes
  • verified that any code from external sources are properly credited in comments or that everything is internally sourced

Copy link
Member Author

@ktuite ktuite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add another test scenario?

@@ -0,0 +1,25 @@
// Copyright 2024 ODK Central Developers
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something here reminded us to double-check what happens with reprocessing submissions in soft-deleted forms.

Once we have entity deletion/purging, we will probably want to delete things from the backlog or not process them.

lib/model/query/entities.js Outdated Show resolved Hide resolved
@@ -415,13 +409,15 @@ const _processSubmissionEvent = (event, parentEvent) => async ({ Audits, Dataset
throw Problem.user.entityActionNotPermitted({ action, permitted: permittedActions });
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will probably need to change to prevent updates-applied-as-creates from happening on forms that dont allow creates. Double check when returning to force-processing work.

// Look up the next submission to process
const _getNextHeldSubmissionInBranch = (entityUuid, branchId, branchBaseVersion) => ({ maybeOne }) => (
(branchId == null)
? maybeOne(sql`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could there be multiple? could this query fail?

@ktuite ktuite merged commit b4a8e6c into master Aug 21, 2024
5 checks passed
@ktuite ktuite deleted the ktuite/improve_offline_entities branch August 21, 2024 18:12
Comment on lines +320 to +321
if (clientEntity.def.baseVersion === 1) // Special case
condition = { version: 1 };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned during code review that I thought that once we introduce force-processing, it won't always be the case that if an entity has a branch base version of 1, it will have a server base version of 1. Thinking about it more, I still sort of think that's the case, but I also don't think it'd be a big deal to keep it the way it is. In any case, let me throw out my example. I think there's only an issue if the entity was created offline, then immediately also updated offline at least twice, then the resulting submissions are processed out of order. For example:

  • An entity is created offline. Before being sent to the server, it is updated offline 3 times.
  • There's a problem sending the entity creation and the 1st update, but the 2nd and 3rd updates are sent successfully.
  • After some time, the 2nd and 3rd updates are force-processed. On the server, the 2nd update ends up with version=1, baseVersion=null, branchBaseVersion=2. The 3rd update ends up with version=2, baseVersion=1, branchBaseVersion=3.
  • The 1st update is eventually sent successfully and processed. The 1st update has branchBaseVersion=1, but I think ideally, it would get a server baseVersion=2, not baseVersion=1 as it does here. We've discussed how for now, force-processing on its own (without interleaving) should not cause a conflict. If the 1st update were given a server baseVersion of 1 instead of 2, that would be a conflict.

The logic that I think would be ideal would be something like: "If clientEntity.def.baseVersion === 1, and clientEntity.def.trunkVersion == null, then find the latest version from the branch that's been processed. Use that version as the base version."

The reason I think it's not a big deal is that in this case, the entity creation already has a high potential for causing a conflict. Unless the entity creation is processed at the very end, it will cause a conflict, because it will be applied as an update, and that update will cause the branch to stop being contiguous. In the example above, if the processing order were 2nd update / 3rd update / entity creation / 1st update, the entity creation would already cause the 1st update to be a conflict, unless we have extra logic that associates the entity creation with the branch. Without that extra logic, the entity creation would look like an interleaving update from outside the branch. I'm thinking that maybe we should just be prepared for conflicts in this case: if an entity is created offline, then updated offline at least twice, and the entity creation is processed late, then there is a significant chance of conflict.

Anyway, just wanted to share this example and thoughts in case they're helpful as you're thinking about force-processing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants