Adding a compression cost optimization section to the Cosmos DB docs #4151

Merged (4 commits) on May 22, 2024
@@ -175,7 +175,9 @@ az cosmosdb sql role assignment create \
--role-definition-id "$ROLE_ID"
```

## Optimizing Cosmos DB for bulk operation write performance
## Optimizations

### Optimizing Cosmos DB for bulk operation write performance

If you are building a system that only ever reads data from Cosmos DB by key (`id`), which is the default Dapr behavior when using the state management API or actors, you can optimize Cosmos DB for faster writes by excluding all paths from indexing. By default, Cosmos DB indexes every field inside a document. On write-heavy systems that run few or no queries on values within a document, this indexing policy increases the time it takes to write or update a document, and the effect is more pronounced in high-volume systems.

@@ -211,6 +213,18 @@ This optimization comes at the cost of queries against fields inside of document

{{% /alert %}}

### Optimizing Cosmos DB for cost savings

If you intend to use Cosmos DB only as a key-value store, consider serializing your state object to JSON and compressing it before persisting it to state, then decompressing it when reading it back. Cosmos DB bills your usage based on the maximum number of RU/s used in a given time period (typically each hour). As a rough approximation, RU usage is calculated as 1 RU per 1 KB of data you read or write. Compression reduces the size of the data stored in Cosmos DB, which in turn reduces RU usage.
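The compress-before-write and decompress-after-read steps described above could look roughly like the following. This is a minimal sketch, not part of the Dapr SDK: the helper names `compress_state` and `decompress_state` are hypothetical, gzip is just one possible compression choice, and the sample state object is invented for illustration.

```python
import gzip
import json


def compress_state(state: dict) -> bytes:
    """Serialize to compact JSON (no indentation), then gzip the bytes."""
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def decompress_state(blob: bytes) -> dict:
    """Reverse of compress_state: gunzip, then parse the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical, highly repetitive state object used only for illustration.
state = {"items": [{"sku": "widget", "qty": 1, "note": "standard shipping"}] * 500}

blob = compress_state(state)
restored = decompress_state(blob)

# Compare against the indented JSON form to see the size difference.
formatted_size = len(json.dumps(state, indent=2).encode("utf-8"))
compressed_size = len(blob)
print(f"formatted JSON: {formatted_size} bytes, compressed: {compressed_size} bytes")
```

The bytes returned by `compress_state` would be passed as the value to whatever state save call your application already uses, and `decompress_state` would be applied to the value read back. The actual ratio depends heavily on how repetitive your data is, so measure it with representative state objects.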

These savings are particularly significant for Dapr actors. While the Dapr State Management API base64-encodes your object before saving, Dapr actor state is saved as raw, formatted JSON, which spans multiple lines and includes indentation. Compression can significantly reduce the size of actor state objects. For example, if an actor state object is 75 KB when the actor is hydrated, reading it out of state uses 75 RU. If you then modify the state object and it grows to 100 KB, writing it to Cosmos DB uses 100 RU, for a total of 175 RU per I/O operation. If your actors handle 1,000 such requests per second, you need at least 175,000 RU/s to meet that load. With effective compression, the size reduction can be in the region of 90%, which means you need only about 17,500 RU/s to meet the same load.
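The arithmetic in the example above can be reproduced with a small estimator. This is a sketch under the same simplification the section uses (1 RU per 1 KB read or written); the function name `required_throughput` and its parameters are hypothetical, and real RU charges depend on more than item size.

```python
RU_PER_KB = 1  # simplification used in this section, not an exact billing rule


def required_throughput(read_kb, write_kb, requests_per_second, reduction_percent=0):
    """Estimate the RU/s needed for a read-then-write workload.

    reduction_percent is the assumed size reduction from compression (0-100).
    """
    ru_per_operation = (read_kb + write_kb) * RU_PER_KB * (100 - reduction_percent) / 100
    return ru_per_operation * requests_per_second


uncompressed = required_throughput(read_kb=75, write_kb=100, requests_per_second=1000)
compressed = required_throughput(
    read_kb=75, write_kb=100, requests_per_second=1000, reduction_percent=90
)
print(f"uncompressed: {uncompressed:,.0f} RU/s, compressed: {compressed:,.0f} RU/s")
```

Plugging in your own object sizes, request rate, and measured compression ratio gives a first estimate of whether the optimization is worth the added CPU cost.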

{{% alert title="Note" color="primary" %}}

This particular optimization only makes sense if you are saving large objects to state. The performance and memory tradeoff of compressing and decompressing on either end needs to make sense for your use case. Furthermore, once the data is saved to state, it is neither human-readable nor queryable. You should only adopt this optimization if you are saving large state objects as key-value pairs.

{{% /alert %}}

## Related links

- [Basic schema for a Dapr component]({{< ref component-schema >}})