Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A note and another strategy for ID-based deduplication #3526

Merged
merged 3 commits into from
Sep 25, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,8 @@ Moreover, the ICP/ICRC ledgers use the `created_at_time` parameter to limit the

But even with this improvement used in the ledgers, the time window approach implicitly assumes that the client will be able to get a definite answer to their call within the time window. For example, after the 24 hours expire, the user cannot easily tell if their ledger transfer happened; their only option is to analyze the ledger blocks, which is somewhat tedious, and has to be done carefully to avoid asynchrony issues; see the section on [queryable call results](#queryable-call-results).

Relying solely on a time window for deduplication does not guarantee bounded memory usage. In theory, an unlimited number of updates could occur within the time window, though in practice, this is constrained by the scaling limits of the ICP. The ICP/ICRC ledgers thus also define a maximum capacity: a limit on the number of deduplicated transactions (i.e., deduplication IDs) that can be stored in their deduplication store. Once this capacity is reached, further transactions are rejected until older transactions expire from the deduplication store at the end of the time window. Yet another extension of the approach is to guaranteed deduplication for the stated time window as above, but keep storing deduplication IDs even beyond that window, as long as the capacity is not reached. This way, the clients obtain a hard deduplication guarantee for the time window, and a best-effort attempt to deduplicate transactions even past the window.

An alternative is to do away with the time window, and store the deduplication data forever. This requires multiple canisters to prevent exhausting the canister memory, similar to how the ICP/ICRC ledgers store the transaction data in the archive canister. This shifts the tedious part of querying the deduplication data (e.g., ledger blocks) from the user to the canister.

Summarizing, the advantages of this approach are:
Expand Down
Loading