Add multi-region support #856

Open
ezekg opened this issue Jun 6, 2024 · 9 comments

Comments

@ezekg
Member

ezekg commented Jun 6, 2024

For GDPR compliance, it would be useful to side-step lawyers and data processing agreements entirely by allowing customers to store their data in the EU.

We could do this with a primary-primary database setup. Each account would have a region, defined at account creation. And each request would accept a Keygen-Region header, either US or EU, with US being the default.

Like environments and the Keygen-Environment header, the Keygen-Region header would switch regions, i.e. databases, for the current request.
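A minimal sketch of the per-request switch described above, not an actual implementation. The `Keygen-Region` header and the US default come from this proposal; the function names, the connection strings, and the choice to accept lowercase values are assumptions for illustration.

```python
DEFAULT_REGION = "US"
SUPPORTED_REGIONS = {"US", "EU"}

# Hypothetical connection strings, one database per region.
DATABASES = {
    "US": "postgres://db.us.example.internal/keygen",
    "EU": "postgres://db.eu.example.internal/keygen",
}


class UnsupportedRegionError(ValueError):
    """Raised when the Keygen-Region header names an unknown region."""


def resolve_region(headers):
    """Return the region for a request, falling back to the US default."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("keygen-region")
    if raw is None or not raw.strip():
        return DEFAULT_REGION
    # Assumption: values are matched case-insensitively ("eu" == "EU").
    region = raw.strip().upper()
    if region not in SUPPORTED_REGIONS:
        raise UnsupportedRegionError(f"unsupported region: {raw!r}")
    return region


def database_for_request(headers):
    """Pick the database to use for the current request."""
    return DATABASES[resolve_region(headers)]
```

Rejecting unknown values instead of silently falling back to US is a design choice in this sketch: a typo in the header should not quietly write EU-bound data to the US database.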

What would be super cool would be to allow data to be intermixed, i.e. an account in the US region having data in both the US and EU regions, for compliance reasons.

See: https://sentry.engineering/blog/3m-dollar-dropdown

@ezekg
Member Author

ezekg commented Jun 6, 2024

If we were to always store account-level data in the US, it would be easier to implement multi-region support for a single account (i.e. intermixed data). Otherwise, we potentially need to query for the account in all regions in order to set the current tenant. And we would also need to assert uniqueness across regions, which would be pretty hard. Everything else, though, aside from the account, can be stored in the account's region by default.
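Roughly, the routing rule above could look like the following. This is a sketch under assumptions: `GLOBAL_MODELS`, `region_for`, and the `Account` shape are hypothetical names, and the optional `requested_region` stands in for the per-request override that intermixed data would need.

```python
from dataclasses import dataclass
from typing import Optional

HOME_REGION = "US"  # account-level data always lives here

# Hypothetical set of models that are never stored outside the home region.
GLOBAL_MODELS = {"Account"}


@dataclass(frozen=True)
class Account:
    id: str
    region: str  # chosen at account creation


def region_for(model_name: str, account: Account,
               requested_region: Optional[str] = None) -> str:
    """Decide which region's database holds a record of the given model."""
    if model_name in GLOBAL_MODELS:
        # Accounts are looked up in one place, so tenant resolution and
        # uniqueness checks never have to fan out across regions.
        return HOME_REGION
    # Everything else defaults to the account's region, unless the request
    # explicitly asked for another one (the intermixed-data case).
    return requested_region or account.region
```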

@ezekg
Member Author

ezekg commented Jun 6, 2024

This would also require servers and workers in each given region, to ensure compliance for data processing, but that can be handled via the Keygen-Region header (again, always US by default) and a location-aware load balancer or router (look at Fastly's offering).

@ezekg
Member Author

ezekg commented Jun 6, 2024

To simplify, we could implement multi-region data storage/residency first (i.e. database only), and full multi-region support later (servers, etc.).

@ezekg
Member Author

ezekg commented Jun 6, 2024

Need to note that any joins on the account, e.g. the pruning jobs, would need to be rewritten since accounts would only exist in the US region.

@ezekg
Member Author

ezekg commented Jun 6, 2024

If we were to intermix, regions would need to be implemented similarly to environments, where any record's belongs-to associations must also be in the same region. Otherwise we get problems where e.g. machine counts per license are inaccurate.

Edit: actually, this isn't the case since the parent wouldn't exist since it's in a separate database.

@ezekg
Member Author

ezekg commented Jun 6, 2024

What if we modeled this as Silos? Where each "regionable" (or "siloable") model belongs to a Silo, and a Silo belongs to a Region (either itself modeled or just an opaque string like backend).
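To make the shape concrete, here is a rough sketch of the Silo idea with the region kept as an opaque string. The field names and the `region_of` helper are assumptions, not a schema proposal.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Silo:
    id: str
    account_id: str
    region: str  # opaque string such as "US" or "EU"


@dataclass(frozen=True)
class License:
    """Example of a "siloable" model: it points at a silo, not a region."""
    id: str
    account_id: str
    silo_id: str


def region_of(record, silos: Mapping[str, Silo]) -> str:
    """Resolve a siloable record's region through its silo."""
    return silos[record.silo_id].region
```

The indirection means a record never stores a region itself, so moving a silo to another region would not require touching every record in it.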

@ezekg
Member Author

ezekg commented Jun 6, 2024

The per-request region switching would make enforcing account-level limits challenging, since resources could span multiple regions, i.e. databases. For example, an account's hard limit on ALUs for the Dev 0 tier could no longer be checked with a single query against one database. But we could always make multi-region support available only on Std or Ent tiers to work around that.
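To illustrate why this is awkward, a limit check would have to fan out to every region and sum the results. In this sketch the per-region counters are hypothetical callables standing in for one count query per regional database, and the numbers in the usage example are made up.

```python
from typing import Callable, Mapping

# Hypothetical: one counter per region, each running a count query
# against that region's database for the given account.
Counters = Mapping[str, Callable[[str], int]]


def count_across_regions(account_id: str, counters: Counters) -> int:
    """Sum an account's resource count over every region."""
    return sum(count(account_id) for count in counters.values())


def within_limit(account_id: str, counters: Counters, hard_limit: int) -> bool:
    """True if the account can add one more resource without hitting the limit."""
    return count_across_regions(account_id, counters) < hard_limit


if __name__ == "__main__":
    counters = {"US": lambda _id: 60, "EU": lambda _id: 30}
    print(within_limit("acct_1", counters, hard_limit=100))
```

Note that this is one round trip per region on every check, and it is not atomic: two concurrent requests in different regions could both pass the check.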

@ezekg
Member Author

ezekg commented Jun 6, 2024

Workers would need to be region-aware. For example, a license expiration worker would need to look at licenses across all regions, and the worker that waits on an artifact's upload would need to know which region the artifact was created in.
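A sketch of the two worker patterns mentioned above: a sweep that visits every region, and a job payload that carries the region it was enqueued from. The record shape, the job name, and both function names are assumptions for illustration.

```python
from datetime import datetime, timezone

REGIONS = ("US", "EU")


def expired_license_ids(licenses_by_region, now):
    """Collect (region, license id) pairs for licenses expired as of `now`.

    `licenses_by_region` maps a region to a list of dicts with "id" and
    "expiry" keys, standing in for one query per regional database.
    """
    expired = []
    for region in REGIONS:
        for lic in licenses_by_region.get(region, []):
            # A missing expiry means the license never expires.
            if lic["expiry"] is not None and lic["expiry"] <= now:
                expired.append((region, lic["id"]))
    return expired


def build_upload_job(artifact_id, region):
    """Build a job payload that remembers where the artifact was created."""
    return {
        "job": "wait_for_artifact_upload",  # hypothetical job name
        "artifact_id": artifact_id,
        "region": region,
    }
```

Keeping the region in the payload lets the worker connect to the right database directly instead of searching every region for the artifact.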

@ezekg
Member Author

ezekg commented Jun 6, 2024

If we only implement multi-region data storage, we also introduce significant latency from US servers to EU databases.
