Create Availability Zone Standard #640
base: main
Signed-off-by: josephineSei <[email protected]>
A first round of comments, I'll need to get back to this later.
Note that there are still some spelling mistakes, which I haven't had the time to address one by one yet.
Thanks for all the effort put into this!
Within each Availability Zone:

- there MUST be redundancy in power supply, as in line into the deployment
IMHO we need to define more clearly what "redundancy" means here, that is, redundancy up to what level? Redundancy for every electrical component in the AZ would possibly entail:
- redundant PDUs and PSUs for each server
- redundant fuses and electric circuits from each PDU to the redundant online UPS system
- possibly connecting each redundant UPS (e.g. battery backed) to two independent local power generators for redundant fallback power generation (e.g. diesel generator)
- and finally redundant external energy providers over redundant external power connections.
We may also want to specify the level of redundancy (2, 3, ...).
This is an important topic, but I wonder whether it would fit in a standard about Availability Zones in that level of detail.
I did not want to discuss all possible physical features that need to be redundant. I think it would be better to refer to a document that does all this (at least within this standard). Maybe something from the BSI like this?
Or do you think that it should be stated here in detail, like:
- there MUST be at least two redundant power sources (power line or generator)
- each PDU SHOULD have at least one redundant twin
- each PDU MUST have at least two redundant electric circuits to the redundant power sources
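
To make the three candidate rules above easier to discuss, here is a minimal sketch of how they could be checked against a CSP's self-description of an AZ. Everything in it is an assumption for illustration: the `PDU` and `AvailabilityZone` data model, the `check_power_redundancy` helper, and the reading of "two redundant electric circuits" as circuits to at least two distinct power sources. None of it is part of the standard or of any existing tooling.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PDU:
    """Hypothetical description of a power distribution unit."""
    name: str
    twin: Optional[str] = None  # name of the redundant twin PDU, if any
    circuits: List[str] = field(default_factory=list)  # power sources this PDU is wired to


@dataclass
class AvailabilityZone:
    """Hypothetical description of an AZ as a CSP might document it."""
    name: str
    power_sources: List[str]  # power lines or generators
    pdus: List[PDU]


def check_power_redundancy(az: AvailabilityZone) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings): MUST violations are errors, SHOULD violations are warnings."""
    errors: List[str] = []
    warnings: List[str] = []
    sources = set(az.power_sources)

    # MUST: at least two redundant power sources (power line or generator)
    if len(sources) < 2:
        errors.append(f"{az.name}: fewer than two power sources")

    for pdu in az.pdus:
        # SHOULD: each PDU has at least one redundant twin
        if pdu.twin is None:
            warnings.append(f"{pdu.name}: no redundant twin")
        # MUST: each PDU has circuits to at least two distinct known power sources
        # (assumed interpretation of the third rule)
        if len(set(pdu.circuits) & sources) < 2:
            errors.append(f"{pdu.name}: fewer than two circuits to distinct power sources")

    return errors, warnings


# Example: pdu-2 is only wired to one power source.
az = AvailabilityZone(
    name="az1",
    power_sources=["grid-a", "generator-1"],
    pdus=[
        PDU("pdu-1", twin="pdu-2", circuits=["grid-a", "generator-1"]),
        PDU("pdu-2", twin="pdu-1", circuits=["grid-a"]),
    ],
)
errors, warnings = check_power_redundancy(az)
print(errors, warnings)
```

Separating MUST violations from SHOULD violations mirrors the wording of the rules, so a CSP would fail only on the former.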
I think this should be left to the judgment of the CSP (also giving them a certain freedom of interpretation), while expecting that the user is able to achieve a certain level of higher availability.
Co-authored-by: Sven <[email protected]> Signed-off-by: josephineSei <[email protected]>
I tried to put in all information I got from CSPs, while keeping the focus on Availability Zones.
This keeps much of the physical redundancy part out, but if this should also be part of it, we need to discuss to what extent or if it may be better to refer to some document outside of this standard.
In today's IaaS call, we discussed a few open questions:
Network AZ
In the standard I discussed that it is possible to have Network AZs, but this has downsides for users. Thus I did not make any recommendations. We discussed whether we even want to discourage CSPs from using them ("SHOULD NOT"):
Cross-Attach AZ
The question was whether we want to encourage / allow / discourage or disallow this.
Overall
I sent a mail to the ML asking for feedback on the network AZ topic.
Single network AZ is not a problem for us. Neutron's HA capabilities are strong enough and our networks are small enough that we wouldn't gain anything from separate AZs.
I read through the standard after my vacation and looked through the minutes of the IaaS calls that took place in the meantime. I think we still need feedback from CSPs, so I wrote a mail to the SCS ML.
Added some comments and suggestions mostly revolving around spelling and phrasing.
I did notice there is a mix of capitalization for some terms: often "Storage" and "Compute" are capitalized (not everywhere 100% though) whereas "network" in the network AZ section is not while it is in others. I think this could also be aligned a bit better over the whole document.
There might only be a loss of a few packages within the los network ressources.

With having Compute and Storage in a good state (e.g. through having fire zones with a compute AZ each and storage being replicated over the fire zones) it would not have downsides to not have Availability Zones for the network service.
It might even be the opposite: Having resources running in certain Availability Zones might permit them from being scheduled in other AZs[^3].
> permit [...] from
Did you mean "prevent [...] from" here?
To be honest, I can't really tell as I haven't fully understood why there are no downsides from omitting AZs in network from this whole paragraph.
Maybe one or two details could be added to explain the reasoning behind this general statement?
I added a few more lines to this paragraph to explain this better. And yes, it was meant to be "prevent".
Co-authored-by: Markus Hentsch <[email protected]> Signed-off-by: josephineSei <[email protected]>
Generally I am fine with this, but I agree with one comment on rephrasing or dropping one statement.
FTR, plusserver's definition of AZ: https://docs.plusserver.com/en/general/plusserver-region-az/
LGTM
I do like the overall definition of the AZs, but I think that regulating e.g. how many PDUs a CSP has depends heavily on their design.
@artificial-intelligence and @markus-hentsch we've got feedback from CSPs and I added a note for manual testing. Could you check whether all your comments are addressed now?
## Physical Audits

In cases where it is reasonable to mistrust the provided documentation, a physical audit by a natural person - called auditor - send by the OSBA (?) should be performed.
@garloff If we want to have someone audit deployments in special cases, we need to define who will name such a person. Will that be the OSBA?
…lability-Zones-Standard.md Signed-off-by: josephineSei <[email protected]>
…119-w1-Availability-Zones-Standard.md Signed-off-by: josephineSei <[email protected]>
closes #539