
storage: remove PersistMonotonicWriteWorker #29087

Merged
merged 3 commits into MaterializeInc:main from storage_worker on Aug 27, 2024

Conversation

@danhhz (Contributor) commented Aug 16, 2024

It was originally introduced to wrap persist WriteHandles in a task, but
everything that was using it to write data was already in a task. So at
this point, it's just an extra layer of indirection to reason about.

Motivation

  • This PR refactors existing code.

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@danhhz danhhz changed the title [DNM] storage: remove PersistMonotonicWriteWorker Aug 16, 2024
@danhhz (Contributor, Author) commented Aug 16, 2024

There are some test failures to shake out, but feels like this is a worthwhile simplification?

@benesch (Member) commented Aug 16, 2024

> feels like this is a worthwhile simplification?

1000%!

Ok if I defer the review on this one to @aljoscha? I'm not too familiar with this code these days. (@ParkMyCar may also be able to take a look? I feel like he was the one to add the monotonic write worker originally.)

@@ -1153,6 +1049,55 @@ where
(tx, handle.abort_on_drop(), shutdown_tx)
}

async fn monotonic_append<T: Timestamp + Lattice + Codec64 + TimestampManipulation>(
danhhz (Contributor, Author):

I'm not entirely sure I ported the intention here correctly; it was kinda hard to follow.

// aware of read-only mode and will not attempt to write before told
// to do so.
//
self.register_introspection_collection(id, *typ, write)
danhhz (Contributor, Author):

Is it safe to pull this call up here? It's hard to tell what in the method is a delicate dance of doing things in just the right order and what's historical cruft.

Contributor:

I think it should be alright. register_introspection_collection itself does a dance around first doing the "truncation" and then registering with the worker, but that's described in comments in there.
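
The ordering being described is roughly the following (a purely illustrative sketch; the helper name and the CollectionManager method are assumptions, not the real code):

    // Illustrative only: the "dance" inside register_introspection_collection
    // as described above. The "truncation" happens first, and only then is the
    // write handle registered with the worker, so the worker can never race
    // the truncation.
    async fn register_introspection_sketch<T: Timestamp + Lattice + Codec64>(
        collection_mgr: &CollectionManager<T>,
        id: GlobalId,
        typ: IntrospectionType,
        mut write: WriteHandle<SourceData, (), T, Diff>,
    ) {
        // Step 1: retract whatever the shard already holds ("truncation").
        // (Hypothetical helper, standing in for the logic described above.)
        truncate_introspection_shard(typ, &mut write).await;
        // Step 2: hand the write side to the worker, which owns writes from
        // here on. (Method name is an assumption.)
        collection_mgr.register_collection(id, write);
    }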

@ParkMyCar (Member) left a comment:

LGTM from the monotonic-writer perspective, though I'm not sure about the 0dt and read-only bits.

self.collections.insert(id, collection_state);
new_source_statistic_entries.insert(id);
// This collection of statistics is periodically aggregated into
// `source_statistics`.
new_webhook_statistic_entries.insert(id);
// Register the collection so our manager knows about it.
//
// NOTE: Maybe this shouldn't be in the collection manager,
Contributor:

I've thought the same thing in the past, and the current situation is very much for historic reasons, IMO.

edit: because I wrote this comment ... 😂 (only realized this when scrolling down)

danhhz (Contributor, Author):

😂

 .await
-.expect("sender hung up")?;
+.unwrap();
 let write_handle = self
Contributor:

Is it maybe a bit heavy-handed to open a write handle just for this? On the other hand, we only use this method on rare occasions.

danhhz (Contributor, Author):

Write handles are cheap as long as you never run a CaA with them! That's subtle, so probably worth a comment. I'll stick one in when I circle back to this.
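
The promised comment might read something like the following (a sketch of the wording only; the actual comment in the merged commit may differ):

    // NOTE: Opening a WriteHandle is cheap: it registers a writer and fetches
    // a bit of shard metadata. The expensive operation is compare_and_append,
    // which writes batches and contends on the shard's upper. We use this
    // handle for at most a single append, so opening it here is fine.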


let res = write_handle
    .compare_and_append(
        updates,
        Antichain::from_elem(lower),
Contributor:

This is where it differs from the original impl, which uses current_upper as the expected upper and only uses lower to determine a timestamp for the timestampless updates.

As written here, I would expect this to fail when at_least is higher than current_upper, because the compare_and_append's expected upper wouldn't line up with the shard's actual upper.
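
For reference, the original behavior described above is roughly the following (a hedged sketch assuming the persist WriteHandle API of the time; imports and the update plumbing are simplified):

    use differential_dataflow::lattice::Lattice;
    use mz_persist_client::write::WriteHandle;
    use mz_persist_types::Codec64;
    use mz_repr::{Diff, Row, TimestampManipulation};
    use mz_storage_types::sources::SourceData;
    use timely::progress::{Antichain, Timestamp};

    // Sketch of the original monotonic append: `at_least` only bounds the
    // chosen write timestamp, while the CaA's expected upper is the shard's
    // *current* upper, so a large `at_least` does not make the CaA fail.
    async fn monotonic_append_sketch<T>(
        write_handle: &mut WriteHandle<SourceData, (), T, Diff>,
        updates: Vec<(Row, Diff)>,
        at_least: T,
    ) where
        T: Timestamp + Lattice + Codec64 + TimestampManipulation,
    {
        let current_upper = write_handle.fetch_recent_upper().await.clone();
        // Write at the later of `at_least` and the current upper (their
        // lattice join), so the write never lands in the shard's past.
        let write_ts = current_upper
            .as_option()
            .map(|upper| at_least.join(upper))
            .expect("shard should not be closed");
        let new_upper = Antichain::from_elem(write_ts.step_forward());
        let updates = updates
            .into_iter()
            .map(|(row, diff)| ((SourceData(Ok(row)), ()), write_ts.clone(), diff));
        // Note: the expected upper is `current_upper`, not
        // `Antichain::from_elem(at_least)` as in the line commented on above.
        let _result = write_handle
            .compare_and_append(updates, current_upper, new_upper)
            .await
            .expect("invalid usage");
    }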

danhhz (Contributor, Author):

Interesting! I'll have to load this up again to feel like I totally understand that, but sounds like a pretty mechanical fix.

Something I don't understand is that previously, the MonotonicAppend call could return a StorageError that the upper didn't match. But why? Isn't the contract just that the thing gets written down at some timestamp that's at least the at_least? Do you know if this was just an artifact of implementation (assigning a timestamp and then afterward treating it like a normal Append) or is that necessary for some reason?

Contributor:

I think the intention was "we pick a timestamp that's guaranteed to increase, but we still do a compare_and_append to ensure that there's no other writer". I also believe that the reasoning around these "monotonic appends" is a bit hand-wave-y, especially when you consider potential concurrent writers, in a use-case isolation future, and just in general. It comes from a world where we didn't have the differential collections and from a world where we didn't think about 0dt/uci and relied a lot on "yeah yeah, envd fencing makes this all correct" (which I never fully trusted... 😅 ).

So it isn't what I would call a blind write, where the semantics really are: write down the thing no matter what; if there's a concurrent write and the upper doesn't match, you can retry.

It's part of the reason I introduced the differential collections, which have a sync-and-retry loop and generally feel much more robust. And now that the "partially truncate" logic also uses a proper compare_and_append (I changed that in [1]), I feel better about our consistency story. And it might be that all users of monotonic_append are now okay with them being blind writes.
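
The sync-and-retry shape described here is roughly the following (again a hedged sketch, not the actual differential-collection implementation; `pending` is an assumed Vec<(Row, Diff)> of staged updates):

    // Illustrative sync-and-retry CaA loop: on an upper mismatch, adopt the
    // shard's actual upper and retry, instead of surfacing an error.
    loop {
        let expected_upper = write_handle.fetch_recent_upper().await.clone();
        let write_ts = expected_upper
            .as_option()
            .expect("shard should not be closed")
            .clone();
        let new_upper = Antichain::from_elem(write_ts.step_forward());
        let updates = pending
            .iter()
            .map(|(row, diff)| ((SourceData(Ok(row.clone())), ()), write_ts.clone(), *diff));
        match write_handle
            .compare_and_append(updates, expected_upper, new_upper)
            .await
            .expect("invalid usage")
        {
            Ok(()) => break,
            // A concurrent writer advanced the upper; sync and try again.
            Err(_mismatch) => continue,
        }
    }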

@@ -869,9 +893,6 @@ where
}
}

self.append_shard_mappings(new_collections.into_iter(), 1)
danhhz (Contributor, Author):

AFAICT, this call was a no-op before because it would hit this short-circuit:

        let id = match self
            .introspection_ids
            .lock()
            .expect("poisoned")
            .get(&IntrospectionType::ShardMapping)
        {
            Some(id) => *id,
            _ => return,
        };

danhhz (Contributor, Author):

Oh no, this was incorrect. It was a no-op for the first call, but not on the later ones.

@danhhz (Contributor, Author) commented Aug 23, 2024

Okay @aljoscha, this seems to be passing tests now. Modulo a lint failure and the comment I promised, this should be ready for a review!

@danhhz danhhz marked this pull request as ready for review August 23, 2024 23:50
@danhhz danhhz requested a review from a team as a code owner August 23, 2024 23:50
@aljoscha (Contributor) left a comment:

Overall, this is excellent! 🙌

I had one inline concern about initialize_shard_mapping().


) -> Result<(), StorageError<T>> {
    tracing::info!(%id, ?introspection_type, "preparing introspection collection for writes");

    match introspection_type {
        IntrospectionType::ShardMapping => {
            self.initialize_shard_mapping().await;
            // Done by the `self.append_shard_mappings` call.
Contributor:

Don't we still need this? The append_shard_mappings call in create_collections only does it for new collections, and initialize_shard_mapping() made sure we write down the collections that were already known at the time at which we "create" the shard-mappings shard.

danhhz (Contributor, Author):

Yeah, not sure how we want this to work. I removed this because they were getting double-registered. It seems that before, during initial boot, we were hitting the following short-circuit when append_shard_mappings got called as part of create_collections_for_bootstrap. But this PR slightly shifts the order of things in create_collections_for_bootstrap and ends up putting ShardMapping in the introspection_ids map before append_shard_mappings gets called. This resulted in them getting added twice, once from append_shard_mappings and once from initialize_shard_mapping.

        let id = match self
            .introspection_ids
            .lock()
            .expect("poisoned")
            .get(&IntrospectionType::ShardMapping)
        {
            Some(id) => *id,
            _ => return,
        };

danhhz (Contributor, Author):

(We hashed this out in a huddle. Decision was to make this short-circuit a panic if that's possible, or add a comment if not.)

@danhhz (Contributor, Author) commented Aug 27, 2024

@aljoscha Nightlies seem happy with replacing the append_shard_mappings short-circuit with a panic. Turns out it was already documented to do that, anyway. Wanna stamp?
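
For illustration, turning the quoted short-circuit into a panic looks roughly like this (a sketch; the exact expect message in the merged commit may differ):

    let id = *self
        .introspection_ids
        .lock()
        .expect("poisoned")
        .get(&IntrospectionType::ShardMapping)
        // Previously `_ => return`, which silently made early calls no-ops.
        // The ShardMapping collection is documented to be registered before
        // this method is called, so enforce that.
        .expect("ShardMapping introspection collection must be registered first");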

@aljoscha (Contributor):

Sorry, should have stamped yesterday after we talked! 🙈

@aljoscha aljoscha self-requested a review August 27, 2024 09:52
@danhhz (Contributor, Author) commented Aug 27, 2024

\o/ TFTR!

@danhhz danhhz enabled auto-merge August 27, 2024 14:44
@danhhz danhhz merged commit 9c3a160 into MaterializeInc:main Aug 27, 2024
81 checks passed
@danhhz danhhz deleted the storage_worker branch August 27, 2024 16:24
@github-actions github-actions bot locked and limited conversation to collaborators Aug 27, 2024