Collation fetching fairness #4880
base: master
Conversation
Force-pushed from c7f24aa to 0f28aa8.
```diff
@@ -266,9 +264,6 @@ impl PeerData {
		let candidates =
			state.advertisements.entry(on_relay_parent).or_default();

		if candidates.len() > max_candidate_depth {
```
This error leads to reporting the peer with `COST_UNEXPECTED_MESSAGE`. I think we should relax it to just ignoring the advertisement.

Pros:
- with the new logic, submitting more elements than scheduled is not such a major offence
- old collators won't get punished for not respecting the claim queue

Cons:
- we don't punish spammy collators
---
> with the new logic submitting more elements than scheduled is not such a major offence

How come?
---
I didn't pick the best words here, but my point was: since we guarantee a fair share of core time for each parachain, we can close our eyes to collators pushing too many advertisements, as they don't cause any harm. Furthermore, at the moment we haven't got a proper collator using the claim queue to decide what advertisements to send (right?), so when this change goes live a lot of collators will suffer.
---
We should certainly not keep inserting blindly. You are right that it is less of an issue since we are now guaranteeing fairness, so we don't have to be super strict here. In particular, given how ineffective reputation changes are in practice, it might make sense to set the limit to something we are 100% sure no honest node will ever reach, but then make it an immediate disconnect (max punishment).

Is the concern about old collators valid? Why would they be spammy?
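The "generous limit plus max punishment" idea could be sketched roughly as follows. This is a hypothetical illustration only: `SPAM_LIMIT`, `Action` and `on_advertisement` are made-up names, not code from this PR.

```rust
// Hypothetical sketch of the "generous limit + immediate disconnect" idea.
// SPAM_LIMIT is assumed to be far above anything an honest collator sends.
const SPAM_LIMIT: usize = 128;

#[derive(Debug, PartialEq)]
enum Action {
    Accept,     // within the scheduled claim count
    Ignore,     // over the scheduled count, but harmless under fair fetching
    Disconnect, // unmistakably malicious: drop the peer instead of a reputation change
}

fn on_advertisement(advertised_so_far: usize, claims_for_para: usize) -> Action {
    if advertised_so_far < claims_for_para {
        Action::Accept
    } else if advertised_so_far < SPAM_LIMIT {
        // over-advertising is tolerated: fairness caps what actually gets fetched
        Action::Ignore
    } else {
        // no honest node should ever reach this point
        Action::Disconnect
    }
}

fn main() {
    assert_eq!(on_advertisement(0, 3), Action::Accept);
    assert_eq!(on_advertisement(3, 3), Action::Ignore);
    assert_eq!(on_advertisement(500, 3), Action::Disconnect);
}
```

The middle band is what makes old collators safe: they only get ignored, never punished, unless they cross a threshold no honest implementation would hit.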
---
We should still have some limit here, otherwise we record unlimited advertisements.
---
I agree that without proper spam protection a malicious collator could bombard the validator with advertisements. I've restored the limit, but instead of `max_candidate_depth` I've used the length of the current assignments, which is equal to the length of the claim queue for the assigned core.
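A minimal sketch of that restored check, with hypothetical names (this is not the PR's actual code):

```rust
// Hypothetical sketch: cap recorded advertisements per relay parent at the
// length of the current assignments, i.e. the claim queue of the assigned
// core, instead of `max_candidate_depth`.
fn should_ignore_advertisement(
    advertisements_for_relay_parent: usize,
    assignments_len: usize, // claim queue length for the assigned core
) -> bool {
    // anything beyond the number of claims could never be fetched anyway
    advertisements_for_relay_parent >= assignments_len
}

fn main() {
    // with a claim queue of length 3, a fourth advertisement is ignored
    assert!(!should_ignore_advertisement(2, 3));
    assert!(should_ignore_advertisement(3, 3));
}
```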
…he weird function params
…therwise the chain can't progress in time
…tic-scaling-doesnt-break-parachains.toml
---
Generally looks good, nice work!
It would be good to do some manual tests with a hacked spammy collator to verify that the other para is not DoSed.
```diff
@@ -8,6 +8,7 @@ bootnode = true
[relaychain.genesis.runtimeGenesis.patch.configuration.config.scheduler_params]
  max_validators_per_core = 2
  num_cores = 2
+  lookahead = 3
```
Why do we need to bump up the lookahead value? Is it so that it matches the allowed_ancestry_len (which is usually 2, plus the most recent relay parent)?
What happens if we don't make this change?
---
Advertisements got ignored because they were over the limit and the test failed.
Do you think it should also work with 2?
---
Not sure. I'm trying to understand whether this change will break parachains on polkadot unless we also bump lookahead there
```diff
@@ -327,6 +334,137 @@ async fn assert_persisted_validation_data(
	}
}

+// Combines dummy candidate creation, advertisement and fetching in a single call
```
I don't think there's any reason to keep the prospective_parachains tests module anymore (as you removed the code for dealing with the old runtime, and for good reason), but it would be good to do this in a follow-up PR so that it's easier to review for now.
---
I agree but as you pointed out reviewing it would have been harder so I'll do it in a separate PR.
…ool indicating the claim queue entry fulfillment is no longer needed
Related to #1797
When fetching collations in the collator protocol/validator side, we need to ensure that each parachain gets a fair share of core time based on its assignments in the claim queue. This means that the number of collations fetched per parachain should ideally be equal to (and definitely not bigger than) the number of claims for that parachain in the claim queue.
The current implementation doesn't guarantee such fairness. For each relay parent there is a `waiting_queue` (`PerRelayParent` -> `Collations` -> `waiting_queue`) which holds any unfetched collations advertised to the validator. The collations are fetched on a first in, first out principle, which means that if two parachains share a core and one of them is more aggressive, it might starve the other. How? At each relay parent, up to `max_candidate_depth` candidates are accepted (enforced in `fn is_seconded_limit_reached`), so if one of the parachains is quick enough to fill the queue with its advertisements, the validator will never fetch anything from the rest of the parachains despite them being scheduled. This doesn't mean that the aggressive parachain will occupy all the core time (the runtime guarantees it can't), but it will deny the rest of the parachains sharing the same core the chance to have collations backed.

The solution I am proposing is to limit fetches and advertisements based on the state of the claim queue. At each relay parent, the claim queue for the core assigned to the validator is fetched. For each parachain, a fetch limit is calculated (equal to the number of its entries in the claim queue). Advertisements are not fetched for a parachain which has exceeded its claims in the claim queue. This solves the problem of aggressive parachains advertising too many collations, and extends the checks in `is_seconded_limit_reached` with an additional check.

The second part is in the collation fetching logic. The validator will keep track of which collations it has fetched so far. When a new collation needs to be fetched, instead of popping the first entry from the `waiting_queue`, the validator examines the claim queue and looks for the earliest claim which hasn't got a corresponding fetch. This way the validator will always try to prioritise the most urgent entries.
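The fetch-ordering idea described above can be sketched as follows. This is an illustrative sketch with made-up names (`next_para_to_fetch` and its parameters are not the PR's actual implementation): walk the claim queue front to back and pick the first para whose fetched count is still below the number of its claims seen so far.

```rust
use std::collections::HashMap;

type ParaId = u32;

// Illustrative sketch: find the earliest claim in the claim queue that does
// not yet have a corresponding fetched collation.
fn next_para_to_fetch(
    claim_queue: &[ParaId],
    fetched: &HashMap<ParaId, usize>, // collations fetched so far, per para
) -> Option<ParaId> {
    let mut claims_seen: HashMap<ParaId, usize> = HashMap::new();
    for &para in claim_queue {
        let seen = claims_seen.entry(para).or_insert(0);
        *seen += 1;
        // the earliest claim without a matching fetch is the most urgent one
        if fetched.get(&para).copied().unwrap_or(0) < *seen {
            return Some(para);
        }
    }
    None // every claim already has a fetched collation
}

fn main() {
    // Claim queue for the shared core: [1000, 2000, 1000].
    // Para 1000's first claim is already fulfilled, so para 2000 goes next,
    // even if para 1000 advertised earlier and more aggressively.
    let fetched = HashMap::from([(1000, 1)]);
    assert_eq!(next_para_to_fetch(&[1000, 2000, 1000], &fetched), Some(2000));
}
```

Because the selection follows the claim queue rather than advertisement arrival order, an aggressive parachain cannot push the other core-sharing parachain's claims out of consideration.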