Monitoring your Anapaya SCION CA

The Anapaya SCION CA provides multiple ways to monitor the health of the services. You can check the health of the services as a snapshot of the current state, or you can use the telemetry to monitor the services over time. A combination of both is recommended to ensure that the services are running smoothly and to detect any issues early on.

Health Checks

The step-ca-adapter and the step-ca-rotator expose health check endpoints on the admin API that can be used to assess the current health of the services. The default endpoints are:

| Service | Endpoint | Docs |
| --- | --- | --- |
| step-ca-adapter | http://localhost:41500/api/v1/health | spec |
| step-ca-rotator | http://localhost:41501/api/v1/health | spec |
Tip: In the API specification, expand the checks section in the health schema object. This shows you all the available health checks that are performed by the service.

Query the health check endpoints with curl or any other HTTP client. The response contains a JSON object with the health status of the service. For example, to check the health of the step-ca-adapter, you can run:

curl -X GET http://localhost:41500/api/v1/health
Example: step-ca-adapter health with missing TRC for ISD 1
{
  "health": {
    "checks": [
      {
        "check_id": "1004",
        "data": {
          "data_type": "not_found",
          "isd": 1
        },
        "detail": "failed to load TRC: failed to fetch TRC: [err=sql: no rows in result set; id=ISD1-B0-S0]",
        "name": "trc_available",
        "status": "failing"
      },
      {
        "check_id": "1002",
        "data": {
          "reported": "ok"
        },
        "detail": "step-ca health check succeeded",
        "name": "step_ca_available",
        "status": "passing"
      },
      {
        "check_id": "1001",
        "data": {
          "driver": "pg"
        },
        "detail": "database connection is available",
        "name": "database_available",
        "status": "passing"
      }
    ],
    "status": "failing"
  }
}
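
The response can also be evaluated programmatically. The following is a minimal Python sketch, not part of the product: the function names are hypothetical, and it assumes the response shape shown in the example above, with "passing" as the only healthy value for both the overall status and the individual checks. It also assumes that a failing service may answer with a non-2xx HTTP status while still returning the JSON body, which this page does not confirm.

```python
import json
import urllib.error
import urllib.request


def fetch_health(url, timeout=5.0):
    """Fetch and decode the health JSON from an admin API endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumption: an unhealthy service might reply with a non-2xx
        # status but still include the JSON health document in the body.
        return json.load(err)


def is_healthy(response):
    """True if the overall status is "passing" (assumed healthy value)."""
    return response.get("health", {}).get("status") == "passing"


def failing_checks(response):
    """Return (name, detail) for every check that is not "passing"."""
    checks = response.get("health", {}).get("checks", [])
    return [
        (check.get("name"), check.get("detail"))
        for check in checks
        if check.get("status") != "passing"
    ]
```

For instance, passing the result of `fetch_health("http://localhost:41500/api/v1/health")` to `failing_checks` would, for the response above, yield a single entry for `trc_available` together with its detail message.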

Telemetry

The step-ca-adapter and step-ca-rotator both expose an endpoint that can be used to retrieve telemetry data from the services. The telemetry data is exported in the form of Prometheus metrics. Metrics are exposed by default on the admin API, but can be configured to be exposed on a different endpoint. Consult the configuration documentation for more information.

The default endpoints are:

| Service | Endpoint |
| --- | --- |
| step-ca-adapter | http://localhost:41500/metrics |
| step-ca-rotator | http://localhost:41501/metrics |

To access these metrics, a Prometheus server (or similar) is required to ingest the metrics from each service. How to set this up is outside the scope of this document. Should you require assistance with integrating metrics in your monitoring setup, please contact Anapaya support.
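
For a quick manual inspection without a Prometheus server, the metrics endpoint can be read directly, since Prometheus metrics are served as plain text. The following Python sketch parses the common subset of that text format into (name, labels, value) tuples. It is an illustration only: `parse_metrics` is a hypothetical helper, escape sequences in label values are not decoded, and a real deployment should rely on Prometheus or an established client library instead.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
_SAMPLE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# One label pair inside the braces: key="value".
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        try:
            value = float(raw_value)  # also accepts NaN, +Inf, -Inf
        except ValueError:
            continue
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, value))
    return samples
```

The text to parse could be obtained with `urllib.request.urlopen("http://localhost:41500/metrics").read().decode()`; filtering the result by name, for example `step_ca_adapter_db_healthy`, then gives the current value of a single gauge.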

step-ca-adapter metrics

| Metric | Description | Labels | Type |
| --- | --- | --- | --- |
| step_ca_adapter_ca_backend_healthy | Indicates whether the step CA backend reports healthy. | None | gauge |
| step_ca_adapter_db_healthy | Indicates whether the database connection is healthy. | None | gauge |
| step_ca_adapter_http_request_latency_seconds | Latencies of HTTP requests. | api, method, status, route | histogram |
| step_ca_adapter_http_request_total | Total number of HTTP requests. The status label is the HTTP status code. | api, method, status, route | counter |
| step_ca_adapter_policy_violations_total | Total number of policy violations detected by the Step CA adapter. | violation | counter |
| step_ca_adapter_renewal_requests_total | Total number of renewal requests handled by the Step CA adapter. | result | counter |
| step_ca_adapter_tls_certificate_not_after_timestamp_seconds | The expiration time of the TLS certificate used by the Step CA adapter. | server_name | gauge |
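
A typical use of the certificate expiration gauge is to warn before the TLS certificate expires. The sketch below shows the arithmetic in Python. It assumes the gauge value is a Unix timestamp in seconds, as the `_timestamp_seconds` suffix suggests; the function names and the 14-day threshold in the usage note are illustrative choices, not recommendations from Anapaya.

```python
import time

SECONDS_PER_DAY = 86400


def seconds_until_expiry(not_after, now=None):
    """Seconds remaining until the NotAfter timestamp (negative if expired)."""
    if now is None:
        now = time.time()
    return not_after - now


def expires_within(not_after, days, now=None):
    """True if the certificate expires in fewer than `days` days."""
    return seconds_until_expiry(not_after, now) < days * SECONDS_PER_DAY
```

Calling `expires_within(value, 14)` with the gauge value for a given `server_name` would flag certificates with less than two weeks of validity left.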

step-ca-rotator metrics

| Metric | Description | Labels | Type |
| --- | --- | --- | --- |
| step_ca_rotator_db_healthy | Indicates whether the database connection is healthy. | None | gauge |
| step_ca_rotator_http_request_latency_seconds | Latencies of HTTP requests. | api, method, status, route | histogram |
| step_ca_rotator_http_request_total | Total number of HTTP requests. The status label is the HTTP status code. | api, method, status, route | counter |
| step_ca_rotator_mode | Current mode of the CA rotator (follower or leader). | isd_as, mode | gauge |
| step_ca_rotator_provision_cert_instance | Instance number of the provisioned certificate. | isd_as | gauge |
| step_ca_rotator_provision_cert_not_after_timestamp_seconds | NotAfter timestamp of the provisioned certificate. | isd_as | gauge |
| step_ca_rotator_provision_inconsistent_configuration | Indicates whether a provisioning failure has left the configuration in an inconsistent state. This happens when the configuration cannot be reverted after a failed provisioning of the CA certificate. | isd_as | gauge |
| step_ca_rotator_provision_key_version | Cloud KMS key version of the provisioned certificate. | isd_as | gauge |
| step_ca_rotator_provision_last_run_ok | Indicates whether the last provision run was successful. | isd_as | gauge |
| step_ca_rotator_provision_last_run_timestamp_seconds | Timestamp of the last provision run. | isd_as, result | gauge |
| step_ca_rotator_provision_runs_total | Total number of provision runs. | isd_as, result | counter |
| step_ca_rotator_rotate_last_run_ok | Indicates whether the last rotate run was successful. | isd_as | gauge |
| step_ca_rotator_rotate_last_run_timestamp_seconds | Timestamp of the last rotate run. | isd_as, result | gauge |
| step_ca_rotator_rotate_root_cert_not_after_timestamp_seconds | NotAfter timestamp of the root certificate. | isd_as | gauge |
| step_ca_rotator_rotate_root_signer_available | Indicates whether the root signer is available. | isd_as | gauge |
| step_ca_rotator_rotate_runs_total | Total number of rotate runs. | isd_as, result | counter |
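
Several of the rotator gauges act as boolean indicators and lend themselves to simple alerting conditions. The Python sketch below evaluates a handful of them for a single `isd_as`. It rests on an assumption this page does not state explicitly: that these gauges report 1 for "true" and 0 for "false". The rule table and the `rotator_problems` helper are hypothetical, and metrics missing from the input are not flagged.

```python
# (metric name, value that signals a problem, human-readable message)
# Assumption: boolean gauges use 1 for true and 0 for false.
RULES = [
    ("step_ca_rotator_db_healthy", 0.0, "database connection unhealthy"),
    ("step_ca_rotator_provision_last_run_ok", 0.0, "last provision run failed"),
    ("step_ca_rotator_rotate_last_run_ok", 0.0, "last rotate run failed"),
    ("step_ca_rotator_rotate_root_signer_available", 0.0, "root signer unavailable"),
    (
        "step_ca_rotator_provision_inconsistent_configuration",
        1.0,
        "configuration left in an inconsistent state",
    ),
]


def rotator_problems(gauges):
    """Return messages for every rule whose gauge holds its problem value.

    `gauges` maps metric names to their current values for one isd_as.
    """
    return [
        message
        for name, bad_value, message in RULES
        if gauges.get(name) == bad_value
    ]
```

In a Prometheus-based setup the same conditions would more naturally be written as alerting rules; the function form here is only meant to make the intended reading of each gauge concrete.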