document shard-based routing
pirtleshell committed Mar 4, 2024
1 parent e737fa0 commit fb04d8b
Showing 3 changed files with 114 additions and 1 deletion.
111 changes: 110 additions & 1 deletion architecture/PROXY_ROUTING.md
@@ -77,7 +77,7 @@ Now suppose you want multiple backends for the same host.

The proxy service supports height-based routing to direct requests that only require the most recent
block to a different cluster.
This support is handled via the [`PruningOrDefaultProxies` implementation](../service/shard.go#L17).

This is configured via the `PROXY_HEIGHT_BASED_ROUTING_ENABLED` and `PROXY_PRUNING_BACKEND_HOST_URL_MAP`
environment variables.
@@ -136,6 +136,114 @@ in `PROXY_BACKEND_HOST_URL_MAP`.

Any request made to a host not in the `PROXY_BACKEND_HOST_URL_MAP` map receives a 502 Bad Gateway response.

## Sharding

Taking the example one step further, suppose the backend consists of data shards, each containing a set of blocks. Although sharded routing can be configured without pruning-vs-default cluster routing, this example assumes it is enabled.

The above example supports fielding requests to a particular endpoint with pruning & archive clusters:
* request for tip-of-chain -> pruning cluster
* everything else -> archive cluster ("default")

The proxy service supports breaking down "everything else" further by defining "shards": clusters that contain a fixed set of block heights.

This is configured via the `PROXY_SHARDED_ROUTING_ENABLED` and `PROXY_SHARD_BACKEND_HOST_URL_MAP` environment variables:
* `PROXY_SHARDED_ROUTING_ENABLED` - flag to toggle this functionality
* `PROXY_SHARD_BACKEND_HOST_URL_MAP` - encodes the shard cluster urls and block ranges for a given endpoint.
This support is handled via the [`ShardProxies` implementation](../service/shard.go#L103).


The map is encoded as follows:
```
PROXY_SHARDED_ROUTING_ENABLED=true
PROXY_SHARD_BACKEND_HOST_URL_MAP=HOST_A>ENDBLOCK_A1|ROUTE_A1|ENDBLOCK_A2|ROUTE_A2,HOST_B>ENDBLOCK_B1|ROUTE_B1
```

This defines two shards for `HOST_A` and one shard for `HOST_B`:
* `HOST_A`'s shards:
* blocks 1 to `ENDBLOCK_A1` hosted at `ROUTE_A1`
* blocks `ENDBLOCK_A1`+1 to `ENDBLOCK_A2` hosted at `ROUTE_A2`
* `HOST_B`'s shard:
* blocks 1 to `ENDBLOCK_B1` hosted at `ROUTE_B1`

Shards are inclusive of their end blocks, and they must collectively contain all data from block 1 to the end block of the last shard.
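For illustration, decoding this encoding into per-host shard lists could be sketched in Go as follows. This is a minimal sketch, not the proxy service's actual parser; `shard` and `parseShardMap` are hypothetical names:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// shard pairs an inclusive end block with the backend URL that serves
// blocks from the previous shard's end block + 1 up to EndBlock.
type shard struct {
	EndBlock uint64
	URL      string
}

// parseShardMap decodes the HOST>END1|URL1|END2|URL2,... encoding
// into an ordered list of shards per host. Illustrative sketch only.
func parseShardMap(raw string) (map[string][]shard, error) {
	out := make(map[string][]shard)
	for _, hostEntry := range strings.Split(raw, ",") {
		host, spec, ok := strings.Cut(hostEntry, ">")
		if !ok {
			return nil, fmt.Errorf("missing '>' in %q", hostEntry)
		}
		fields := strings.Split(spec, "|")
		if len(fields)%2 != 0 {
			return nil, fmt.Errorf("odd number of fields for host %q", host)
		}
		for i := 0; i < len(fields); i += 2 {
			end, err := strconv.ParseUint(fields[i], 10, 64)
			if err != nil {
				return nil, fmt.Errorf("bad end block %q: %w", fields[i], err)
			}
			out[host] = append(out[host], shard{EndBlock: end, URL: fields[i+1]})
		}
	}
	return out, nil
}

func main() {
	m, err := parseShardMap("evm.data.kava.io>2000000|http://kava-shard-2M:8545|4000000|http://kava-shard-4M:8545")
	if err != nil {
		panic(err)
	}
	fmt.Println(m["evm.data.kava.io"][0].URL) // http://kava-shard-2M:8545
	fmt.Println(m["evm.data.kava.io"][1].EndBlock)
}
```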

Shards field requests that would route to the "Default" cluster in any of the above configurations:
* requests for `"earliest"` block are routed to the first defined shard
* any request for a specific height that is contained in a shard is routed to that shard.

All other requests continue to route to the default cluster. In this context, the default cluster is referred to as the "active" cluster (see below).

Requests for tx hashes or block hashes are routed to the "active" cluster.

### Shard Routing

When `PROXY_SHARDED_ROUTING_ENABLED` is `true`, "everything else" can be broken down further into clusters that contain fixed ranges of blocks.

As an example, consider a setup that has the following clusters:
* Pruning cluster (`http://kava-pruning:8545`)
* "Active" cluster - blocks 4,000,001 to chain tip (`http://kava-archive:8545`)
* Shard 2 - blocks 2,000,001 to 4,000,000 (`http://kava-shard-4M:8545`)
* Shard 1 - blocks 1 to 2,000,000 (`http://kava-shard-2M:8545`)

The proxy service can be configured as follows:
```
PROXY_HEIGHT_BASED_ROUTING_ENABLED=true
PROXY_SHARDED_ROUTING_ENABLED=true
PROXY_BACKEND_HOST_URL_MAP=evm.data.kava.io>http://kava-archive:8545
PROXY_PRUNING_BACKEND_HOST_URL_MAP=evm.data.kava.io>http://kava-pruning:8545
PROXY_SHARD_BACKEND_HOST_URL_MAP=evm.data.kava.io>2000000|http://kava-shard-2M:8545|4000000|http://kava-shard-4M:8545
```

This value is parsed into a map that looks like the following:
```
{
"default": {
"evm.data.kava.io" => "http://kava-archive:8545",
},
"pruning": {
"evm.data.kava.io" => "http://kava-pruning:8545",
},
"shards": {
2000000 => "http://kava-shard-2M:8545",
4000000 => "http://kava-shard-4M:8545"
}
}
```

All requests that would route to the "default" cluster in the "Default vs Pruning Backend Routing" example route as follows:
* requests for specific height between 1 and 2M -> `http://kava-shard-2M:8545`
* this includes requests for `"earliest"`
* requests for specific height between 2M+1 and 4M -> `http://kava-shard-4M:8545`
* requests for a block hash or tx hash -> the active cluster: `http://kava-archive:8545`.

Otherwise, requests are routed as they are in the "Default vs Pruning Backend Routing" example.
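The shard selection above can be sketched as a lookup over sorted shard end blocks. This is a simplified illustration with hypothetical names (`routeHeight`), not the actual `ShardProxies` implementation; height 0 stands in for a hash-based request whose height is unknown:

```go
package main

import (
	"fmt"
	"sort"
)

// routeHeight picks a backend URL for a request. shards maps each
// inclusive end block to the URL of the cluster serving it; activeURL
// serves everything beyond the last shard, plus requests whose height
// is unknown (block hash / tx hash lookups, modeled here as height 0).
func routeHeight(height uint64, shards map[uint64]string, activeURL string) string {
	if height == 0 { // unknown height: only the active cluster has full indexes
		return activeURL
	}
	ends := make([]uint64, 0, len(shards))
	for end := range shards {
		ends = append(ends, end)
	}
	sort.Slice(ends, func(i, j int) bool { return ends[i] < ends[j] })
	for _, end := range ends {
		if height <= end { // shards are inclusive of their end blocks
			return shards[end]
		}
	}
	return activeURL
}

func main() {
	shards := map[uint64]string{
		2000000: "http://kava-shard-2M:8545",
		4000000: "http://kava-shard-4M:8545",
	}
	active := "http://kava-archive:8545"
	fmt.Println(routeHeight(1, shards, active))       // http://kava-shard-2M:8545
	fmt.Println(routeHeight(2000001, shards, active)) // http://kava-shard-4M:8545
	fmt.Println(routeHeight(5000000, shards, active)) // http://kava-archive:8545
}
```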

![Proxy service configured with shard-based routing](images/proxy_service_sharding.jpg)

### "Active" Cluster

In practice, a full-archive node can be used as the active cluster. However, its data can be slimmed down because the node does not need the application data for blocks contained in the shards.

The optimally-sized active cluster runs on a unique data set that includes:
* At least one recent block - this will be the starting point for the node to begin syncing once spun up. Ideally, this is the last shard's end block + 1.
* A complete blockstore, cometbft state, and tx_index

The blockstore, cometbft state, and tx_index are required for fielding requests for data on unknown heights. These are requests for block hashes and transaction hashes. Because the proxy service can't know which height a particular hash is for (and therefore, to which shard the request should be routed), these complete databases are required to handle requests for the hashes.

The optimally-sized node data can be created from a full-archive node by pruning only the node's application state. On Kava, this can be accomplished with the `--only-app-state` flag of the `shard` command:
```
kava shard --start <last-shard-end-block-plus-1> --end -1 --only-app-state
```

The bulk of the data on cosmos-sdk chains like Kava is in the application.db, so pruning the application state for blocks covered by shards allows for a much smaller cluster footprint than a full-archive node.

### Shard Clusters

On Kava, data for shards can be created with the `shard` command of the Kava CLI from any node that contains the desired shard block range:
```
kava shard --home ~/.kava --start <shard-start-block> --end <shard-end-block>
```

## Metrics

When metrics are enabled, the `proxied_request_metrics` table tracks the backend to which requests
@@ -147,6 +255,7 @@ always `DEFAULT`.
When enabled, the column will have one of the following values:
* `DEFAULT` - the request was routed to the backend defined in `PROXY_BACKEND_HOST_URL_MAP`
* `PRUNING` - the request was routed to the backend defined in `PROXY_PRUNING_BACKEND_HOST_URL_MAP`
* `SHARD` - the request was routed to a shard defined in the `PROXY_SHARD_BACKEND_HOST_URL_MAP`

Additionally, the actual URL to which the request is routed is tracked in the
`response_backend_route` column.
Binary file added architecture/images/proxy_service_sharding.jpg
4 changes: 4 additions & 0 deletions service/shard.go
@@ -96,6 +96,10 @@ func shouldRouteToPruning(encodedHeight int64) bool {
return blockTagEncodingsRoutedToLatest[encodedHeight]
}

// ShardProxies handles routing requests for specific heights to backends that contain the height.
// The height is parsed out of requests that would route to the default backend of the underlying `defaultProxies`.
// If the height is contained by a backend in the host's IntervalURLMap, the request is routed to that URL.
// Otherwise, it forwards the request via the wrapped defaultProxies.
type ShardProxies struct {
*logging.ServiceLogger

