Cosmos DB state store - does not support multi-master #701

Open
sebader opened this issue Feb 18, 2021 · 9 comments

sebader commented Feb 18, 2021

When using Azure Cosmos DB with multi-master writes, does the state store component take this into account when I'm running Dapr in multiple geographical locations, and redirect requests to the closest Cosmos DB region instead of always going to the primary region?

sebader commented Feb 18, 2021

OK, after digging into this a bit myself, I unfortunately have to say: no, it doesn't support it at all.

Let me explain the implications using the following setup:

I have 3 regional deployments of my app. Let's say an AKS cluster each in EastUS, WestEurope and EastAsia. I have a global load balancer (Azure Front Door or Traffic Manager) in front that sends client requests to the clusters based on client region and can transparently fail over if need be.
Cosmos DB is configured with multi-master write in all these regions. The Dapr state store is configured identically in all three regions.

This all works nicely when you test it. The data in the state store is available, no matter which region a request hits.

However: when you look at the Cosmos DB metrics, you will see that the primary Cosmos DB region gets all the requests. The primary region in this case is the first region in the list of replication locations.

This has a couple of severe implications:

  • Increased request latency. Instead of using the database endpoint that is available in, say, EastAsia, the requests to Cosmos DB go all the way to EastUS.
  • You pay for the cross-region traffic.
  • No immediate failover. If Cosmos DB in EastUS were to go down, you rely on Microsoft to disable that endpoint before a failover happens. Only then should all requests go to the next region in line (WestEurope).
  • Increased RU (request unit) requirement. Let's say you need 100,000 RUs overall in Cosmos DB to handle your requests around the world. If you can distribute those requests evenly across the regions, you only need to provision about 33k RUs, and each region then has 33k RUs of its own (you may want to overprovision a bit to cater for one region failing, etc.). If, however, all your requests go to a single region, you need to provision 100k RUs. Because of the replication, those 100k are also provisioned in every other region, where you don't use them at all. So you actually pay for 300k RUs all the time.
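
The RU arithmetic in the last bullet can be sketched as follows. This is only an illustration, assuming (as described above) that whatever throughput you provision is replicated to, and paid for in, every region. The function name `ru_requirements` and its parameters are hypothetical and not part of any Azure or Dapr API.

```python
def ru_requirements(total_demand_rus: int, region_count: int, spread_evenly: bool) -> tuple[int, int]:
    """Return (RUs to provision per region, total RUs paid for across all regions)."""
    if spread_evenly:
        # Each region only serves its own share of the load (ceiling division).
        per_region = -(-total_demand_rus // region_count)
    else:
        # A single region serves everything, so it must be sized for the full load.
        per_region = total_demand_rus
    # Assumption: the provisioned throughput is replicated to every region and billed there.
    return per_region, per_region * region_count


# Requests spread over three regions: about 33k RUs per region.
print(ru_requirements(100_000, 3, spread_evenly=True))   # (33334, 100002)
# All requests hit the primary region: 100k provisioned, 300k paid for.
print(ru_requirements(100_000, 3, spread_evenly=False))  # (100000, 300000)
```

The numbers ignore any headroom you might add to survive the loss of a region.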

How to work around this? Actually, I don't know :(
For other languages the Azure SDKs do offer ways to support multi-master (https://docs.microsoft.com/en-us/azure/cosmos-db/how-to-multi-master?tabs=api-async). The best solution is the one in the .NET SDK, where you tell your app which region it is running in and the SDK figures out the closest Cosmos DB endpoint (parameter ApplicationRegion). For other languages you can at least specify a list of preferred locations.
However, since the Azure SDK for Go does not support querying Cosmos DB in the first place, and the Dapr component therefore uses a 3rd party SDK (which hasn't been updated in a very long time), solving this might be a bigger issue.
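
To make the idea of a preferred-locations list concrete, here is a rough sketch of how one regional deployment could order the account's regions and fall back when one is down. It is not the .NET SDK's logic and not something the Dapr component does. The function names are hypothetical, and "own region first, the rest in replication order" is a simplifying assumption standing in for real proximity-based selection.

```python
def preferred_locations(app_region: str, account_regions: list[str]) -> list[str]:
    """Put the application's own region first; keep the others in replication order."""
    if app_region not in account_regions:
        # No local replica: fall back to the account's own ordering.
        return list(account_regions)
    return [app_region] + [r for r in account_regions if r != app_region]


def pick_region(preferred: list[str], unavailable: set[str]) -> str:
    """Return the first preferred region that is not currently marked unavailable."""
    for region in preferred:
        if region not in unavailable:
            return region
    raise RuntimeError("no Cosmos DB region available")


regions = ["East US", "West Europe", "East Asia"]
prefs = preferred_locations("East Asia", regions)
print(prefs)                              # ['East Asia', 'East US', 'West Europe']
print(pick_region(prefs, set()))          # East Asia
print(pick_region(prefs, {"East Asia"}))  # East US
```

With a list like this, a deployment in EastAsia would talk to its local endpoint in the normal case and only move to another region when the local one is unavailable.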

@sebader sebader changed the title Cosmos DB state store - does it properly support multi-master? Cosmos DB state store - does not support multi-master Feb 18, 2021

KaiWalter commented Feb 26, 2021

@sebader - please check out our configuration:
https://dev.to/kaiwalter/using-azure-private-links-and-private-dns-zones-with-globally-distributed-resources-4ce3

Entries in the private DNS zones of each region point to the nearest (regional) Cosmos DB private endpoint, so that the cluster / Dapr sidecar always writes "locally".

Does this make sense to you?

sebader commented Feb 26, 2021

Thanks @KaiWalter! This sure does look like an interesting workaround. But in the end it really is only that. You still don't get any real failover. If Cosmos DB has an issue in a region, you cannot fail over and basically have to shut off your entire region (AKS etc.).

Plus, you could probably achieve the same without Private Link: in each region, instead of using the default Cosmos DB connection string in your Dapr binding, you modify it to include the region (mycosmos-westeurope.documents.azure.com...). But again, you don't get failover.
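
A small sketch of what building such a region-specific endpoint could look like. The `<account>-<region>` host pattern is taken from the example above; the `https` scheme, port 443, and the helper name `regional_endpoint` are assumptions for illustration.

```python
def regional_endpoint(account: str, region: str) -> str:
    """Build a region-specific Cosmos DB endpoint, e.g. for 'West Europe'."""
    # Region names are written without spaces and in lowercase in the host name.
    region_slug = region.replace(" ", "").lower()
    # Assumption: HTTPS on port 443, as in the default account endpoint.
    return f"https://{account}-{region_slug}.documents.azure.com:443/"


print(regional_endpoint("mycosmos", "West Europe"))
# https://mycosmos-westeurope.documents.azure.com:443/
```

Each regional deployment would then put its own value into the state store component's endpoint setting. Because the endpoint is pinned to one region, the client has nowhere else to go if that region fails.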

@KaiWalter

@sebader for us, using Private Link and private DNS zones is primarily about shielding our environment (attack surface reduction). So it is not intended as a workaround for the Cosmos DB endpoint routing per se; that is just a side effect.
For us, if one of the main resources like AKS, SQL, Cosmos DB, ... goes down in a region, we would shift the whole workload to another region anyway. This is why we do not invest in failover on a resource level.
But again, our use case of multi-master may be too special in the context of this issue.

@dapr-bot

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@dapr-bot dapr-bot added the stale label Jul 28, 2021

dapr-bot commented Aug 4, 2021

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions.

@dapr-bot dapr-bot closed this as completed Aug 4, 2021

harvendra2022 commented Sep 17, 2024

@sebader - Dapr does not support a multi-master read-write configuration/component with Cosmos DB, nor does it provide automatic failover capabilities similar to those available in the Cosmos DB SDK. It would be highly beneficial if Dapr could extend its functionality to support multi-master read-write operations in Cosmos DB. It would therefore be greatly appreciated if this issue could be revisited and reopened for further consideration.

@yaron2 yaron2 reopened this Sep 17, 2024

yaron2 commented Sep 17, 2024

Done @harvendra2022

@github-actions github-actions bot removed the stale label Sep 17, 2024
@harvendra2022

Thank you, @yaron2!
