Top level nameserver failures should result in retries in --iterative or --retries #93

paul-pearce · 2017-03-29T01:20:10Z

Right now if a root server times out in --iterative mode the query fails without trying other roots. This is because the root servers were bolted onto factory.RandomNameServer. This behavior should change, but will require a fairly large restructuring of how we handle name servers. However we fix it, we should also have --retries > 1 try other nameservers (if they exist).

zakird · 2017-04-06T00:32:41Z

Why should this be different than the normal number of retries?

paul-pearce · 2017-04-06T02:48:26Z

This isn't an issue with the number of retries. The issue is that we do not rotate through the roots.

For example, in iterative mode we first randomly select a root (.) server. If that query fails, the entire lookup fails. Conversely, if our . query succeeds and returns the .com authoritative servers, and the first .com authoritative server fails, we keep trying different .com authoritative servers until timeout.

--retries has no impact on this, as it will try the same . root over and over.

zakird · 2017-04-06T13:17:58Z

Out of curiosity, why do some root resolvers stop responding?

paul-pearce · 2017-04-06T14:01:50Z

Great question. I don't know, but I observed it. I encountered this during one of my test runs when working on the recursion branch. One of the runs had a failure rate about 7% higher than expected. Upon investigation, I discovered that one of the roots was timing out. I manually poked it and it was, indeed, not responding to that measurement machine. It may have been a rate-limiting reaction, but I doubt it. The failures were immediate during that run, and I've not observed it before or since.

phillip-stephens · 2024-09-11T20:59:20Z

It looks like the current code behaves similarly to the old code in this regard. --retries simply retries connecting to the same nameserver, I suppose on the assumption that there was a transitory network issue in reaching that nameserver.

@zakird and @paul-pearce, do you think we should make this change for all levels, not just the root nameservers?
Like if a.gtld-servers.net fails and we have retries left, should we choose another .com nameserver at random?

Additionally, I can imagine --retries being:

- per-NS connection (as it is now)
- per layer (--retries=3 means we can attempt to connect to 3 .com NS's before giving up)
- per domain (--retries=3 means we can re-attempt 3 times during a domain's entire iterative lookup)

I don't have strong feelings, but I think per domain is the most easily understandable as a user. LMK your thoughts.

zakird · 2024-09-11T21:11:06Z

Yeah, I definitely think trying others at every layer is the right call. I think retries could be a max total for a given thing that you are trying to look up. That seems easiest to understand and consistent?


phillip-stephens · 2024-09-12T15:18:28Z

Yeah I agree, definitely easiest for the user to understand!

paul-pearce added the bug label Mar 29, 2017

dadrian changed the title ~~Top level nameserver failures should result in retires in --iterative or --retires~~ Top level nameserver failures should result in retries in --iterative or --retries May 3, 2017

zakird assigned spencerdrak Jan 23, 2022

zakird unassigned spencerdrak Jun 1, 2024

zakird added this to the Version 2.1 milestone Jul 19, 2024

phillip-stephens linked a pull request Sep 19, 2024 that will close this issue

Make --retries global to a name and try with other name servers in a given layer #451

Open
