The `/metrics` end-point exposes metrics from the SSV node to Prometheus.
Prometheus should also hit the `/health` end-point in order to collect the health check metrics.
Even if Prometheus is not configured, the `/health` end-point can simply be polled by any HTTP client, as it doesn't contain metrics.
See below for the configuration of a local Prometheus service.
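A local Prometheus service can scrape both end-points with a configuration along these lines (the job names and target address are assumptions for this setup, with the node listening on port 15000):

```yaml
# prometheus.yml -- illustrative scrape config for a local SSV node
scrape_configs:
  - job_name: ssv
    metrics_path: /metrics
    static_configs:
      - targets: ['localhost:15000']
  - job_name: ssv_health
    metrics_path: /health
    static_configs:
      - targets: ['localhost:15000']
```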
`MetricsAPIPort` is used to enable Prometheus metrics collection.

Example:

```yaml
MetricsAPIPort: 15000
```

Or as an environment variable:

```
METRICS_API_PORT=15000
```
The following metrics are available:

| Metric | Description |
| --- | --- |
| `go_*` | Default Go runtime metrics exposed by the Prometheus client |
| `ssv:node_status` | Health check status of operator node |
| `ssv:eth1:node_status` | Health check status of eth1 node |
| `ssv:beacon:node_status` | Health check status of beacon node |
| `ssv:network:connected_peers{pubKey}` | Count connected peers for a validator |
| `ssv:network:ibft_decided_messages_outbound{topic}` | Count IBFT decided messages outbound |
| `ssv:network:ibft_messages_outbound{topic}` | Count IBFT messages outbound |
| `ssv:network:net_messages_inbound{topic}` | Count incoming network messages |
| `ssv:validator:ibft_highest_decided{lambda}` | The highest decided sequence number |
| `ssv:validator:ibft_round{lambda}` | IBFT round |
| `ssv:validator:ibft_stage{lambda}` | IBFT stage |
| `ssv:validator:ibft_current_slot{pubKey}` | Current running slot |
| `ssv:validator:running_ibfts_count{pubKey}` | Count running IBFTs by validator pub key |
| `ssv:validator:running_ibfts_count_all` | Count all running IBFTs |
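These metrics are served in the standard Prometheus text exposition format. As an illustration (the sample payload below is made up, not actual node output), a minimal Go sketch that extracts the `ssv:`-prefixed metric names from such a response body:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// ssvMetricNames returns the names of ssv metrics found in a
// Prometheus text-format payload, skipping comments and non-ssv lines.
func ssvMetricNames(body string) []string {
	var names []string
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // drop HELP/TYPE comments and blank lines
		}
		if !strings.HasPrefix(line, "ssv:") {
			continue // e.g. go_* runtime metrics
		}
		// the metric name ends at '{' (labels) or at the first space (value)
		end := strings.IndexAny(line, "{ ")
		if end == -1 {
			end = len(line)
		}
		names = append(names, line[:end])
	}
	return names
}

func main() {
	// Illustrative payload; real output comes from GET /metrics.
	sample := `# HELP ssv:node_status health check status of operator node
# TYPE ssv:node_status gauge
ssv:node_status 1
go_goroutines 42
ssv:network:connected_peers{pubKey="abc"} 5`
	fmt.Println(ssvMetricNames(sample))
}
```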
In order to set up a Grafana dashboard, do the following:

- Enable metrics (`MetricsAPIPort`)
- Set up Prometheus as mentioned in the beginning of this document and add it as a data source
- Job name is assumed to be 'ssv'
- Import dashboards to Grafana
- Align dashboard variables:
  - `instance` - container name, used in the 'instance' field for metrics coming from Prometheus. In the given dashboard, instance names are `ssv-node-v2-<i>`; make sure to change them according to your setup
  - `validator_dashboard_id` - exists only in the operator dashboard, points to the validator dashboard
Note: In order to show Process Health panels, the following K8S metrics should be exposed:

- `kubelet_volume_stats_used_bytes`
- `container_cpu_usage_seconds_total`
- `container_memory_working_set_bytes`
The health check route is available via `GET /health`.

If the node is healthy, it returns HTTP code `200` with an empty response:

```shell
$ curl http://localhost:15000/health
```

If the node is not healthy, the corresponding errors are returned with HTTP code `500`:

```shell
$ curl http://localhost:15000/health
{"errors": ["could not sync eth1 events"]}
```
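This contract can be sketched with a minimal handler; this is an illustration only, not the node's actual implementation, and `collectErrors` is a hypothetical stand-in for the node's internal checks:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// collectErrors is a hypothetical stand-in for the node's internal
// health checks (e.g. eth1 sync, beacon connectivity).
func collectErrors() []string {
	return nil // healthy
}

// healthResponse maps health check errors to the status code and body
// described above: 200 with an empty body when healthy, 500 with a
// JSON error list otherwise.
func healthResponse(errs []string) (int, string) {
	if len(errs) == 0 {
		return http.StatusOK, ""
	}
	body, _ := json.Marshal(map[string][]string{"errors": errs})
	return http.StatusInternalServerError, string(body)
}

func main() {
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		code, body := healthResponse(collectErrors())
		if body != "" {
			w.Header().Set("Content-Type", "application/json")
		}
		w.WriteHeader(code)
		w.Write([]byte(body))
	})
	log.Fatal(http.ListenAndServe(":15000", nil))
}
```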
Profiling can be enabled via config:

```yaml
EnableProfile: true
```

All the default `pprof` routes are available via HTTP:

```shell
$ curl http://localhost:15000/debug/pprof/goroutine?minutes\=20 --output goroutines.tar.gz
```

Open with the Go CLI:

```shell
$ go tool pprof goroutines.tar.gz
```

Or with the web UI:

```shell
$ go tool pprof -web goroutines.tar.gz
```

Another option is to visualize results in the web UI directly:

```shell
$ go tool pprof -web http://localhost:15001/debug/pprof/heap?minutes=5
```