feat: [MR-552] Implement callback expiration #1699

Open

alin-at-dfinity wants to merge 44 commits into master from alin/MR-552-callback-expiration

Contributor

alin-at-dfinity commented Sep 26, 2024 •

edited by jira bot


MR-552: Generate compact reject responses upon best-effort callback expiration. And inflate them into SYS_UNKNOWN reject responses when peeking/popping.

alin-at-dfinity added 19 commits

September 7, 2024 10:17


          feat: [MR-523] Prevent enqueueing multiple responses for the same cal…

75461e9

…lback


          feat: [MR-603] Keep track of shed inbound responses

9534ca4


          Define two separate CanisterInput variants of SysUnknown responses: D…

e65c087

…eadlineExpired and ResponseDropped.


          Merge branch 'master' into alin/MR-523-response-deduplication

c70e701


          Merge branch 'alin/MR-523-response-deduplication' into alin/MR-603-sh…

9fe3fa6

…ed-inbound-responses


          Make clippy happy.

fa0e182


          Address review comments.

5c90ada


          Merge branch 'alin/MR-523-response-deduplication' into alin/MR-603-sh…

e2a5852

…ed-inbound-responses


          Update queues_compatibility_test to use mainnet version. Describe cal…

9b8bbf5

…lbacks_with_enqueued_response invariant in CanisterQueues doc comment.


          Merge branch 'alin/MR-523-response-deduplication' into alin/MR-603-sh…

9d0e758

…ed-inbound-responses


          Merge branch 'master' into alin/MR-603-shed-inbound-responses

54f89c3


          Produce SysUnknown reject codes for shed responses.

679d618


          Merge branch 'master' into alin/MR-603-shed-inbound-responses

e0b1bc3


          Add test for shedding inbound responses.

e55ba00


          Have SystemState::pop_input() always succeed, by returning an arbitra…

d01c587

…ry reject response that will result in a critical error anyway (and the response being dropped).


          Merge branch 'master' into alin/MR-603-shed-inbound-responses

dc80ea3


          Address review comments: use the anonymous principal instead of IC-00…

195b5e8

… as a dummy canister ID.


          refactor: [MR-603] Typed canister queues and references

2b6fb3a

Assign type parameters to canister queues and references, to designate them as either input/inbound or output/outbound.

Ensures that input queues can only hold inbound references; and output queues can only hold outbound references. Implements separate logic (for inbound and outbound references) for determining staleness, lookup and removal.


          Make clippy happy.

efa16a2

alin-at-dfinity requested review from stiegerc, derlerd-dfinity and oggy-dfin

September 26, 2024 11:37

github-actions bot added the feat label

alin-at-dfinity added 7 commits

September 26, 2024 11:50


          Make clippy even happier.

4ede3ff


          Address review comments: deduplicate queue_front_not_stale() out of…

0e7b71c

… the two specific `MessageStore<_>` implementations into `MessageStoreImpl`.


          Merge branch 'master' into alin/MR-603-typed-queues

e0388bb


          Add tests for Reference<T> conversions. Improve documentations. MNino…

523a1ce

…t cleam-ups.


          Fix doc comment.

61a3f17


          Merge branch 'master' into alin/MR-603-typed-queues

46b514a


          Make clippy happy.

913a0af

alin-at-dfinity added 7 commits

September 30, 2024 15:06


          Rely on free functions instead of implementing from to convert from I…

94937cc

…d to InboundReference / OutboundReference, to make it clear that the functions may panic (even though the two types are essentially private to the module).


          Get rid of asserts about reference contexts altogether. Have the queu…

0e185a2

…e item types declare their own context (inbound vs outbound) and use that when constructing a new reference of the given type.


          feat: [MR-552] Implement callback expiration

0a5a38e

Generate compact reject responses upon best-effort callback expiration. And inflate them into SYS_UNKNOWN reject responses when peeking/popping.


          Make clippy ecstatic.

a4de1e3


          Minor cleanup.

d141beb


          CanisterQueues tests.

d03ccd2


          Add test for SystemState::time_out_callbacks().

41d6b69

alin-at-dfinity force-pushed the alin/MR-552-callback-expiration branch from 32c0d32 to 41d6b69 Compare

October 1, 2024 05:52

alin-at-dfinity marked this pull request as ready for review

October 1, 2024 05:53

alin-at-dfinity requested review from a team as code owners

October 1, 2024 05:53

Base automatically changed from alin/MR-603-typed-queues to master

October 1, 2024 08:01

stiegerc reviewed

Contributor

stiegerc left a comment

So far lgtm. Didn't spend much time on the tests though.

rs/replicated_state/src/canister_state/queues.rs Outdated (resolved)

rs/replicated_state/src/canister_state/queues.rs (resolved)

rs/replicated_state/src/canister_state/queues.rs Outdated (resolved)

rs/replicated_state/src/canister_state/queues.rs Outdated

Comment on lines 474 to 491

+                  fn get(&self, reference: InboundReference) -> CanisterInput {
+                      assert_eq!(Context::Inbound, reference.context());
+                      if let Some(msg) = self.pool.get(reference) {
+                          debug_assert!(!self.expired_callbacks.contains_key(&reference));
+                          debug_assert!(!self.shed_responses.contains_key(&reference));
+                          return msg.clone().into();
+                      } else if reference.class() == Class::BestEffort && reference.kind() == Kind::Response {
+                          if let Some(callback_id) = self.expired_callbacks.get(&reference) {
+                              debug_assert!(!self.shed_responses.contains_key(&reference));
+                              return CanisterInput::DeadlineExpired(*callback_id);
+                          } else if let Some(callback_id) = self.shed_responses.get(&reference) {
+                              return CanisterInput::ResponseDropped(*callback_id);
+                          }
+                      }
+                      panic!("stale reference at the front of input queue");
+                  }

Contributor

stiegerc Oct 1, 2024

This seems a bit odd. get() and take() are almost the same function. For one, you could probably refactor them with a helper function to remove the duplication. Why is get() returning a clone, though? It usually returns Option<&T>.

Contributor Author

alin-at-dfinity Oct 1, 2024

I suppose it's possible to either pass a function pointer or use generics, but the two functions aren't all that large. And trying to extract the common structure might simply add complexity and make it less readable. If you don't have a strong objection, I'd rather leave it as is.

MessageStore<CanisterInput>::get() returns a clone rather than a reference because we have to build a CanisterInput enum on the fly and Rust (with good reason) won't let you return a reference to a temporary value. MessageStore<RequestOrResponse>::get() OTOH, will return an Option<&RequestOrResponse> because it holds RequestOrResponse values within, so it can just return references to them.
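The borrow-versus-clone distinction described above can be illustrated with a minimal sketch. All names here (`Store`, `Input`, `Message`, `get_stored`, `get_input`) are hypothetical stand-ins rather than the actual types in queues.rs, and the sketch returns an `Option` instead of panicking on a stale reference:

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins, not the actual replicated-state types.
#[derive(Clone, Debug, PartialEq)]
struct Message(String);

#[derive(Clone, Debug, PartialEq)]
enum Input {
    Message(Message),
    DeadlineExpired(u64),
}

struct Store {
    pool: BTreeMap<u32, Message>,
    expired_callbacks: BTreeMap<u32, u64>,
}

impl Store {
    /// The pool owns `Message` values, so a borrow can be handed out directly.
    fn get_stored(&self, reference: u32) -> Option<&Message> {
        self.pool.get(&reference)
    }

    /// The `Input` enum is assembled inside the call. A reference to it would
    /// point at a temporary, so the value has to be returned by ownership,
    /// which requires cloning the pooled message.
    fn get_input(&self, reference: u32) -> Option<Input> {
        if let Some(msg) = self.pool.get(&reference) {
            return Some(Input::Message(msg.clone()));
        }
        self.expired_callbacks
            .get(&reference)
            .map(|callback_id| Input::DeadlineExpired(*callback_id))
    }
}

/// Builds a store with one pooled message and one expired callback.
fn sample_store() -> Store {
    let mut store = Store {
        pool: BTreeMap::new(),
        expired_callbacks: BTreeMap::new(),
    };
    store.pool.insert(1, Message("hello".to_string()));
    store.expired_callbacks.insert(2, 42);
    store
}

fn main() {
    let store = sample_store();
    println!("{:?}", store.get_stored(1));
    println!("{:?}", store.get_input(2));
}
```

Changing `get_input` to return `Option<&Input>` would not compile in this sketch, because the `Input` value only exists for the duration of the call.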

Contributor

stiegerc Oct 2, 2024

Fair enough. How about a function

fn is_best_effort_response(&self) -> bool {
   self.class() == Class::BestEffort && self.kind() == Kind::Response
}

or something like that? Maybe also something similar for expired_callbacks and shed_responses since both of these expressions also appear in the complicated logical expression @derlerd-dfinity mentioned below?

Contributor Author

alin-at-dfinity Oct 2, 2024

That's a good point, actually. At some point I was thinking of adding fancier methods to Id / Reference, so that you could say BestEffort && Request by applying a single mask and a single binary operation (as opposed to filtering for one bit and then separately for the other). But now that you brought this up, I took a quick look and the two combinations we care about are "inbound best-effort responses" (because these can be shed and result in an edge case) and "outbound guaranteed-response requests" (because these are the only non-best-effort messages that expire). So I added explicit methods for the two cases (which are just id & BITMASK == Inbound | BestEffort | Response, which compiles to id & x == y).
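A rough sketch of the single-mask check described above. The bit positions, the constant names, and the `Reference::new` constructor are assumptions made for illustration; the actual encoding in message_pool.rs may differ:

```rust
// Hypothetical bit layout: the three low bits carry the flags, the rest a counter.
const OUTBOUND: u64 = 1 << 0; // unset = inbound
const BEST_EFFORT: u64 = 1 << 1; // unset = guaranteed response
const RESPONSE: u64 = 1 << 2; // unset = request
const FLAGS_MASK: u64 = OUTBOUND | BEST_EFFORT | RESPONSE;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Reference(u64);

impl Reference {
    /// Packs a counter and the flag bits into one integer (illustrative only).
    fn new(counter: u64, flags: u64) -> Self {
        Reference((counter << 3) | (flags & FLAGS_MASK))
    }

    /// One mask and one comparison: inbound (bit unset), best-effort, response.
    fn is_inbound_best_effort_response(self) -> bool {
        (self.0 & FLAGS_MASK) == (BEST_EFFORT | RESPONSE)
    }

    /// One mask and one comparison: outbound, guaranteed (bit unset), request (bit unset).
    fn is_outbound_guaranteed_request(self) -> bool {
        (self.0 & FLAGS_MASK) == OUTBOUND
    }
}

fn main() {
    let shed_candidate = Reference::new(7, BEST_EFFORT | RESPONSE);
    let expiring_request = Reference::new(8, OUTBOUND);
    println!("{}", shed_candidate.is_inbound_best_effort_response());
    println!("{}", expiring_request.is_outbound_guaranteed_request());
}
```

Comparing the masked value against the full expected pattern checks all three properties at once, including the bits that must be unset.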

rs/replicated_state/src/canister_state/queues.rs Outdated (resolved)

rs/replicated_state/src/canister_state/queues/message_pool/tests.rs Outdated (resolved)

rs/replicated_state/src/canister_state/queues/message_pool/tests.rs Outdated (resolved)


          Merge branch 'master' into alin/MR-552-callback-expiration

a55c769

github-actions bot added @execution @ic-interface-owners labels


          Address review comments: avoid also allocating a Vec in InboundMessag…

f8e8ffc

…eStore::callbacks_with_enqueued_response(); fix typo; move inner function definition before use.

derlerd-dfinity reviewed

Contributor

derlerd-dfinity left a comment

Did a quick initial pass and left some comments.

rs/protobuf/def/state/queues/v1/queues.proto (resolved)

rs/replicated_state/src/canister_state/queues.rs Outdated (resolved)


          Address review comment: restructure a bit MessageStore<CanisterInput>…

f1d4040

…::is_stale() for better readability. Make all asserts for the right context type into debug_asserts().

stiegerc reviewed

rs/replicated_state/src/canister_state/queues/message_pool/tests.rs (resolved)

derlerd-dfinity reviewed

rs/replicated_state/src/canister_state/queues.rs Outdated (resolved)

rs/replicated_state/src/canister_state/system_state.rs Outdated (resolved)

oggy-dfin reviewed

Member

oggy-dfin left a comment

Thanks! LGTM overall, the blocker for me is understanding why the unwrap in system_state is safe (assuming that it is safe).

rs/replicated_state/src/canister_state/queues.rs Outdated (resolved)

rs/replicated_state/src/canister_state/queues.rs (resolved)

rs/replicated_state/src/canister_state/queues/message_pool.rs Outdated (resolved)

rs/replicated_state/src/canister_state/queues/tests.rs

@@ @@ -74,6 +74,20 @@ impl CanisterQueuesFixture { @@
                       )
                   }
+                  fn try_push_deadline_expired_input(&mut self) -> Result<(), String> {
+                      self.last_callback_id += 1;

Member

oggy-dfin Oct 2, 2024

Absolutely not for this PR, but conceptually there should be one callback ID for every successful push_output_request. If you're going in the direction of bundling the CallContextManager together with the queues, I do wonder if we have any practical chance of actually enforcing this, i.e., somehow have CallbackId be only generated through push_output_request. I realize you'd need to thread some canister-global state through to keep the numbers unique across the different queues. And I now also realize that there are two next_callback_id fields, one in SandboxSafeSystemState, and one in CallContextManager, and I have no idea what's going on there.

Contributor Author

alin-at-dfinity Oct 2, 2024

I also hope it can be done. My eventual aim is to turn SystemState (at least the parts of it that deal with CanisterQueues plus CallContextManager) into a state machine (in-between message executions; executing a request; executing a response; etc.). It should be possible to automatically create callbacks as requests are being enqueued, although it may require refactoring on the Execution end.

The reason why both CallContextManager and SandboxSafeSystemState have a next_callback_id is that the latter is trying to guess the callback IDs that the former will assign the new callbacks when they are registered (likely in order to populate them in the Request). So bundling callback creation and request enqueuing should also get rid of this weirdness.
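One way to picture the two counters described above is the sketch below. The struct and method names are hypothetical simplifications (the real `CallContextManager` and `SandboxSafeSystemState` carry far more state); the sketch only assumes that both sides hand out IDs sequentially from the same starting point:

```rust
/// Hypothetical stand-in for the authoritative side, which assigns callback IDs.
struct Manager {
    next_callback_id: u64,
}

impl Manager {
    /// Registers a callback and returns the ID assigned to it.
    fn register_callback(&mut self) -> u64 {
        let id = self.next_callback_id;
        self.next_callback_id += 1;
        id
    }
}

/// Hypothetical stand-in for the sandbox-side copy, which only predicts IDs.
struct SandboxView {
    next_callback_id: u64,
}

impl SandboxView {
    /// Starts from the manager's current counter.
    fn from_manager(manager: &Manager) -> Self {
        SandboxView {
            next_callback_id: manager.next_callback_id,
        }
    }

    /// Guesses the ID that the manager will assign to the next callback.
    fn predict_callback_id(&mut self) -> u64 {
        let id = self.next_callback_id;
        self.next_callback_id += 1;
        id
    }
}

fn main() {
    let mut manager = Manager { next_callback_id: 10 };
    let mut view = SandboxView::from_manager(&manager);

    // During execution, requests are populated with predicted IDs.
    let predicted = vec![view.predict_callback_id(), view.predict_callback_id()];
    // Afterwards, the callbacks are registered in the same order.
    let assigned = vec![manager.register_callback(), manager.register_callback()];

    println!("predicted = {:?}, assigned = {:?}", predicted, assigned);
}
```

The prediction only holds while both counters advance in lockstep, which is the coupling that bundling callback creation with request enqueuing would remove.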

rs/replicated_state/src/canister_state/system_state.rs Outdated (resolved)

rs/replicated_state/src/canister_state/system_state/call_context_manager.rs

@@ @@ -276,7 +276,7 @@ impl CallContextManagerStats { @@
                   /// Calculates the stats for the given call contexts and callbacks.
                   ///
-                  /// Time complexity: `O(|call_contexts| + |callbacks|)`.
+                  /// Time complexity: `O(n)`.

Member

oggy-dfin Oct 2, 2024

Nit, but I found the previous formulation more informative, it's not really clear to me what n is if I just see that.

Contributor Author

alin-at-dfinity Oct 2, 2024

As explained elsewhere to Christian, my goal with these notes is to give the caller an idea of whether it's OK to use this in a tight loop or not (and ideally not, since anything that isn't constant or logarithmic time has no business being called from the scheduler or Message Routing). If you check, virtually all the functions with this kind of note are invariant checks or "calculate X" sort of things, that should only be called from debug_asserts or while loading the state.

What the exact n is, is irrelevant, since it's more or less under the control of users or an attacker. Saying e.g. O(|shed_responses|) may lead someone to conclude "Oh, this is likely to be a small number, so it's fine to use it for X".

rs/replicated_state/tests/system_state.rs Outdated (resolved)

alin-at-dfinity added 3 commits

October 2, 2024 19:07


          Address review comments: implement Reference::is_inbound_best_effort…

a8330fd

…_response() and Reference::is_outbound_guaranteed_request() methods and use them to more concisely check for the respective message types.


          Address review comments.

5a9dd15


          Merge branch 'master' into alin/MR-552-callback-expiration

4c46593

derlerd-dfinity reviewed

Contributor

derlerd-dfinity left a comment

Just finished my pass over the prod code. Left some more questions and comments, but overall LGTM.

Will complete my pass by going over the tests; after that I think I should be ready to approve.

rs/replicated_state/src/canister_state/queues/message_pool.rs (resolved)

rs/replicated_state/src/canister_state/queues/message_pool.rs (resolved)

rs/replicated_state/src/canister_state/system_state.rs Outdated (resolved)

rs/replicated_state/src/canister_state/system_state.rs (resolved)


          Address review comments.

fceea1b

derlerd-dfinity approved these changes

Contributor

derlerd-dfinity left a comment

Thanks a lot. LGTM as soon as Ogi and Christian are happy as well.

rs/replicated_state/tests/system_state.rs (resolved)


          Add a bad weather test case for SystemState::time_out_callbacks().

c4fb4b5


Labels

@execution feat @ic-interface-owners