Add proposer call time to historical dataset #560

Open
MaxResnick opened this issue Dec 12, 2023 · 7 comments

Comments

@MaxResnick

Is your feature request related to a problem? Please describe.
There has been a lot of interest in timing games recently, and one of the core metrics is when the proposer calls the auction. Right now the best approximation we have is to match the winning bid to its submission time, but this is only a lower bound on the call time (see https://github.com/dataalways/mevboost-data/tree/main).
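
For illustration, the lower-bound approximation could be sketched roughly as below. This is only a sketch under assumptions: the record shape (`slot`, `block_hash`, `timestamp_ms` as string fields) is assumed rather than taken from this thread, the function name is made up, and the sample records are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def winning_bid_lower_bound_ms(delivered: dict, bids: list[dict]) -> Optional[int]:
    """Earliest time (ms) at which the delivered block's bid was received.

    The proposer cannot have requested this header before the relay had the
    bid, so the result is a lower bound on the call time, not the call time.
    """
    times = [
        int(bid["timestamp_ms"])
        for bid in bids
        if bid["slot"] == delivered["slot"]
        and bid["block_hash"] == delivered["block_hash"]
    ]
    return min(times) if times else None


# Hypothetical records, for illustration only.
delivered = {"slot": "100", "block_hash": "0xbbb"}
bids = [
    {"slot": "100", "block_hash": "0xaaa", "timestamp_ms": "1000"},
    {"slot": "100", "block_hash": "0xbbb", "timestamp_ms": "2500"},
    {"slot": "100", "block_hash": "0xbbb", "timestamp_ms": "2400"},
    {"slot": "101", "block_hash": "0xbbb", "timestamp_ms": "900"},
]
lower_bound = winning_bid_lower_bound_ms(delivered, bids)
```

If the same block hash was submitted more than once, the earliest receipt is used, since that is the first moment the header could have been served.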

Describe the solution you'd like
Add the time at which the proposer requested the block header during the slot to the historical data API.

Additional context

https://ethresear.ch/t/timing-games-implications-and-possible-mitigations/17612/2

https://arxiv.org/abs/2305.09032

@ralexstokes
Collaborator

this does seem like a nice feature to add

possible concerns:

  1. complexity/overhead for relays: don't see any substantive increase here as it just means a few extra bytes per record and this data is already being logged so simple enough to just route to the historical data store
  2. privacy leak for validators: knowing what time a validator calls a relay for the winning bid (along w/ their identity that we already have) could enhance efforts to geolocate validators -- given the degree of freedom proposers have over when they make this call it doesn't seem to leak any material information beyond the status quo

anyone see anything else?

@dataalways

It could also provide valuable empirical insights into relay processing speeds. Right now there's some ghost data that makes its way into a lot of explorers, i.e.: block 18,770,921 @ Dec-12-2023 03:12:47 PM with timestamp 1702393967.

If we check the bloXroute max profit raw bids for the corresponding winning block hash, we see a bid arrival time of 1702393970.577, but if we cross-reference against zeromev block arrival times, we see that it was first detected by their nodes at 15:12:50.519, which corresponds to a timestamp of 1702393970.519.

If the bid timestamp is after other nodes were already seeing the block then the routing doesn't make any sense to me.
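
The arithmetic behind this comparison can be checked with a few lines. This is a sketch, assuming mainnet's beacon chain genesis time (1606824023) and 12-second slots, neither of which is stated in this thread. The variable names are illustrative, and all times are treated as UTC.

```python
from datetime import datetime, timezone

GENESIS_TIME_S = 1606824023  # assumption: mainnet beacon chain genesis
SECONDS_PER_SLOT = 12        # assumption: mainnet slot duration

block_ts_s = 1702393967          # block timestamp quoted above
bid_arrival_ms = 1702393970577   # bloXroute bid arrival, in milliseconds

# zeromev first-seen time, 15:12:50.519 on Dec 12, 2023 (UTC)
first_seen = datetime(2023, 12, 12, 15, 12, 50, 519000, tzinfo=timezone.utc)
first_seen_ms = int(first_seen.timestamp()) * 1000 + first_seen.microsecond // 1000

# The block timestamp should sit exactly on a slot boundary.
slot, remainder = divmod(block_ts_s - GENESIS_TIME_S, SECONDS_PER_SLOT)
slot_start_ms = block_ts_s * 1000

bid_offset_ms = bid_arrival_ms - slot_start_ms        # 3577 ms into the slot
first_seen_offset_ms = first_seen_ms - slot_start_ms  # 3519 ms into the slot
gap_ms = bid_arrival_ms - first_seen_ms               # bid logged 58 ms after first sighting
```

A positive `gap_ms` means the bid's recorded arrival is later than the first time the block was seen on the network, which is the inconsistency described above.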

I would suggest adding both the proposer call time and the bid timestamp to the proposer_payload_delivered endpoint.

@austonst
Contributor

austonst commented Dec 12, 2023

Maybe I'm missing something, but I don't believe this data is being logged. It seems a nontrivial effort to do so without changes to the builder API specs. getHeader requests are unsigned: it's difficult to pick out the proposer's request from the other thousands of requests in a slot.

We can narrow down the range by seeing which header they eventually call getPayload for, as is already commonly done. A sophisticated relay could also retroactively attempt to match the IP address (or other identifying info) of the sender of a successful getPayload call to one of the getHeader requests received that slot. It's not a foolproof method, and for some relays will produce edge cases when the header they deliver to the proposer is different than the one they eventually deliver via getPayload (if a different relay provided a more valuable bid).

Vanilla mev-boost-relay is currently not doing this kind of matching, and storing ~10k getHeader request IPs just to be able to match them later starts to sound like overhead.

@ralexstokes
Collaborator

mev-boost currently includes a request ID in the header to correlate getHeader and getPayload calls, e.g. https://github.com/flashbots/mev-boost/blob/bdabd0e6181990f9c3e0fef9f750b40f71d50c6c/server/service.go#L566C31-L566C31

relays can go from successful call to payload -id-> successful call to header this way and I'd be surprised if timestamps are not currently logged this way (although I haven't verified); my original point was that if we already have this infra in place, it would not be much of an ask for relays to just write the timestamps to their data store

an asynchronous process can run to correlate the relevant data if processing in-line is a bottleneck
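
A rough sketch of that correlation step, purely illustrative: the class and method names are made up and are not part of mev-boost or mev-boost-relay, the request ID is treated as an opaque string, and how it is read from the HTTP request is left out.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HeaderCallCorrelator:
    """Hypothetical helper mapping request IDs to getHeader receive times."""

    # request ID -> receive times (ms) of getHeader calls carrying that ID
    _header_calls: dict[str, list[int]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def on_get_header(self, request_id: str | None, received_ms: int) -> None:
        # Requests without an ID cannot be correlated later, so skip them.
        if request_id:
            self._header_calls[request_id].append(received_ms)

    def on_get_payload(self, request_id: str | None) -> list[int]:
        """Return and forget the getHeader times recorded for this ID."""
        if not request_id:
            return []
        return sorted(self._header_calls.pop(request_id, []))


correlator = HeaderCallCorrelator()
correlator.on_get_header("id-a", 1000)
correlator.on_get_header("id-b", 1200)
correlator.on_get_header(None, 1300)    # no ID: ignored
correlator.on_get_header("id-a", 1500)  # same proposer asks again
proposer_calls = correlator.on_get_payload("id-a")
```

Only the ID seen on a successful getPayload is ever resolved, so the result can be written to the data store in-line or by a later batch job. Entries for IDs that never reach getPayload would need to be pruned once the slot has passed, which this sketch does not do.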

@austonst
Contributor

That does make things easier. mev-boost-relay doesn't currently look in the header for the UUID but having it available makes this technically simpler to implement.

I do think in general exposing this data would be beneficial. And if a relay relies on the client-provided ID (rather than e.g. tracking IPs), it gives validators a way to opt in or out.

@alextes
Contributor

alextes commented Dec 14, 2023

We're a bit overloaded as a team, but wanted to quickly add that the ultra sound relay is becoming aware of this data. Although we're unlikely to share raw data, we do want to help ETH stakers / researchers get a clear idea of which strategies are in play, and what ETH stakers effectively support by putting their ETH somewhere.

Toni is also pretty deep into this data, you may be able to tell a lot already from what he has. See: https://twitter.com/nero_eth/status/1733016369715675358

I'm also curious, what question are we trying to answer exactly? When the proposer calls the auction obviously, but then say you have a perfect list of every proposer and when they call. What next? Answering may help find practical ways to reach the end-goal.

@nerolation

Thanks @MaxResnick for putting up this feature request!

Currently, some folks misinterpret charts on mevboost.pics (e.g. https://x.com/P2Pvalidator/status/1734173667704332429?s=20), thinking that the timestamp from the bids_received endpoint is when the proposer "initiated the proposing process" (i.e. called getHeader), which is wrong.

I think some possible questions that the described feature would allow us to answer are:

  1. Independent of how long the signing takes, when did the proposer actively start the "proposal process" by calling getHeader?
  2. What is the latency between the getHeader call and the block propagation, i.e. how long does the signing take?
  3. How beneficial/bad is it for validators to be a long/short distance from the relay? (A long distance allows the proposer to grab later bids but puts more risk on the proposer.)

I guess, for all of those questions we'd need to have the getHeader timestamps, or at least some better approximation than the bid_received timestamp.

6 participants