Top level nameserver failures should result in retries in --iterative or --retries #93

Open
paul-pearce opened this issue Mar 29, 2017 · 7 comments · May be fixed by #451

@paul-pearce
Contributor

Right now, if a root server times out in --iterative mode, the query fails without trying other roots. This is because the root servers were bolted onto factory.RandomNameServer. This behavior should change, but changing it will require a fairly large restructuring of how we handle name servers. However we fix it, we should also have --retries > 1 try other nameservers (if they exist).

@zakird
Member

zakird commented Apr 6, 2017

Why should this be different than the normal number of retries?

@paul-pearce
Contributor Author

This isn't an issue with the number of retries. The issue is that we do not rotate through the roots.

For example, in iterative mode we first randomly select a . root. If that query fails, the entire lookup fails. Conversely, if our . query succeeds and we receive the .com authoritative servers, and the first .com authoritative server fails, we will keep trying different .com authoritative servers until timeout.

--retries has no impact on this, as it will try the same . root over and over.
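
The rotation described above could look roughly like the following. This is a minimal sketch, not zdns code: `queryFunc`, `lookupWithRotation`, and `errTimeout` are hypothetical names invented for illustration, and the query itself is simulated rather than sent over the network.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// queryFunc is a hypothetical stand-in for sending one query to one nameserver.
type queryFunc func(nameServer, name string) (string, error)

// errTimeout simulates an unresponsive nameserver.
var errTimeout = errors.New("timeout")

// lookupWithRotation tries every nameserver of a layer in random order and
// returns the first successful answer, instead of retrying a single server.
func lookupWithRotation(servers []string, name string, query queryFunc) (string, error) {
	if len(servers) == 0 {
		return "", errors.New("no nameservers available")
	}
	var lastErr error
	for _, i := range rand.Perm(len(servers)) {
		answer, err := query(servers[i], name)
		if err == nil {
			return answer, nil
		}
		// Remember which server failed, then move on to the next one.
		lastErr = fmt.Errorf("%s: %w", servers[i], err)
	}
	return "", lastErr
}

func main() {
	roots := []string{"a.root-servers.net", "b.root-servers.net", "c.root-servers.net"}
	// Assumption for the demo: the first root never answers.
	query := func(ns, name string) (string, error) {
		if ns == roots[0] {
			return "", errTimeout
		}
		return "referral from " + ns, nil
	}
	answer, err := lookupWithRotation(roots, "example.com.", query)
	fmt.Println(answer, err)
}
```

With this shape, a single dead root only costs one timeout instead of failing the whole lookup, and the same helper would apply at the .com layer.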

@zakird
Member

zakird commented Apr 6, 2017

Out of curiosity, why do some root resolvers stop responding?

@paul-pearce
Contributor Author

paul-pearce commented Apr 6, 2017

Great question. I don't know, but I observed it. I encountered this during one of my test runs when working on the recursion branch. One of the runs had a failure rate about 7% higher than expected. Upon investigation, I discovered that one of the roots was timing out. I manually poked it and it was, indeed, not responding to that measurement machine. It may have been a rate-limiting reaction, but I doubt it. The failures were immediate during that run, and I've not observed it before or since.

@dadrian dadrian changed the title Top level nameserver failures should result in retires in --iterative or --retires Top level nameserver failures should result in retries in --iterative or --retries May 3, 2017
@zakird zakird added this to the Version 2.1 milestone Jul 19, 2024
@phillip-stephens
Contributor

It looks like the current code behaves much like the old code in this regard. --retries simply retries connecting to the same nameserver, I suppose on the assumption that there was a transient network issue in reaching that nameserver.

@zakird and @paul-pearce, do you think we should make this change for all levels, not just the root nameservers?
Like if a.gtld-servers.net fails and we have retries left, should we choose another .com nameserver at random?

Additionally, I can imagine --retries being:

  • per-NS connection (as it is now)
  • per layer (--retries=3 means we can attempt to connect to 3 .com NS's before giving up)
  • per domain (--retries=3 means we can re-attempt 3 times during a domain's entire iterative lookup)

I don't have strong feelings, but I think per domain is the most easily understandable as a user. LMK your thoughts.
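
To make the per-domain option concrete, here is a minimal sketch, assuming the nameservers for each layer are already known (a real iterative lookup discovers them through referrals). `resolveIteratively`, `queryFunc`, and `errTimeout` are hypothetical names, not part of zdns. The retry budget is shared across the whole lookup, and each failed query spends one retry to move on to the next nameserver in the same layer.

```go
package main

import (
	"errors"
	"fmt"
)

// queryFunc is a hypothetical stand-in for querying one nameserver.
type queryFunc func(nameServer, name string) error

// errTimeout simulates an unresponsive nameserver.
var errTimeout = errors.New("timeout")

// resolveIteratively walks the layers (root, TLD, ...) in order. The retries
// budget is per domain: it is shared by all layers of a single lookup.
func resolveIteratively(layers [][]string, name string, retries int, query queryFunc) error {
	budget := retries
	for depth, servers := range layers {
		resolved := false
		for _, ns := range servers {
			if err := query(ns, name); err == nil {
				resolved = true
				break
			}
			if budget == 0 {
				return fmt.Errorf("layer %d: retry budget exhausted at %s", depth, ns)
			}
			budget-- // spend one retry to try the next nameserver
		}
		if !resolved {
			return fmt.Errorf("layer %d: all nameservers failed", depth)
		}
	}
	return nil
}

func main() {
	layers := [][]string{
		{"a.root-servers.net", "b.root-servers.net"},
		{"a.gtld-servers.net", "b.gtld-servers.net"},
	}
	// Assumption for the demo: the first server of each layer is down.
	down := map[string]bool{"a.root-servers.net": true, "a.gtld-servers.net": true}
	query := func(ns, name string) error {
		if down[ns] {
			return errTimeout
		}
		return nil
	}
	fmt.Println(resolveIteratively(layers, "example.com.", 2, query))
	fmt.Println(resolveIteratively(layers, "example.com.", 1, query))
}
```

The per-layer variant would reset `budget` at the top of the outer loop, and the per-NS variant (the current behavior) would retry the same `ns` instead of advancing.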

@zakird
Member

zakird commented Sep 11, 2024 via email

@phillip-stephens
Contributor

Yeah I agree, definitely easiest for the user to understand!
