Monitoring your Anapaya SCION CA
The Anapaya SCION CA provides multiple ways to monitor the health of the services. You can check the health of the services as a snapshot of the current state, or you can use the telemetry to monitor the services over time. A combination of both is recommended to ensure that the services are running smoothly and to detect any issues early on.
Health Checks
The step-ca-adapter and the step-ca-rotator expose health check endpoints on the admin API that can be used to assess the current health of the services. The default endpoints are:
Service | Endpoint | Docs |
---|---|---|
step-ca-adapter | http://localhost:41500/api/v1/health | spec |
step-ca-rotator | http://localhost:41501/api/v1/health | spec |
In the API specification, expand the checks section in the health schema object. This shows you all the available health checks that the service performs.
Query the health check endpoints using curl or any other HTTP client. The response contains a JSON object with the health status of the service. For example, to check the health of the step-ca-adapter, you can run:
    curl -X GET http://localhost:41500/api/v1/health

    {
      "health": {
        "checks": [
          {
            "check_id": "1004",
            "data": {
              "data_type": "not_found",
              "isd": 1
            },
            "detail": "failed to load TRC: failed to fetch TRC: [err=sql: no rows in result set; id=ISD1-B0-S0]",
            "name": "trc_available",
            "status": "failing"
          },
          {
            "check_id": "1002",
            "data": {
              "reported": "ok"
            },
            "detail": "step-ca health check succeeded",
            "name": "step_ca_available",
            "status": "passing"
          },
          {
            "check_id": "1001",
            "data": {
              "driver": "pg"
            },
            "detail": "database connection is available",
            "name": "database_available",
            "status": "passing"
          }
        ],
        "status": "failing"
      }
    }
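For scripted checks, the same response can be evaluated programmatically. The following is a minimal Python sketch, not part of the product: the helper names (`fetch_health`, `failing_checks`, `is_healthy`, `main`) are hypothetical, and it assumes that the status values are `passing` and `failing` as in the example response above and that the endpoint answers with a successful HTTP status even when a check fails.

```python
import json
import urllib.request
from typing import List, Tuple

# Default admin API endpoint of the step-ca-adapter, taken from the table above.
HEALTH_URL = "http://localhost:41500/api/v1/health"


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the health document.

    ASSUMPTION: the endpoint returns a 2xx status even when checks fail;
    otherwise urllib raises HTTPError and the caller has to handle it.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def failing_checks(payload: dict) -> List[Tuple[str, str]]:
    """Return (name, detail) for every check whose status is not "passing"."""
    checks = payload.get("health", {}).get("checks", [])
    return [
        (check.get("name", "?"), check.get("detail", ""))
        for check in checks
        if check.get("status") != "passing"
    ]


def is_healthy(payload: dict) -> bool:
    """True only if the overall status reported by the service is "passing"."""
    return payload.get("health", {}).get("status") == "passing"


def main() -> int:
    """Print the failing checks and return a shell-style exit code."""
    doc = fetch_health()
    for name, detail in failing_checks(doc):
        print(f"{name}: {detail}")
    return 0 if is_healthy(doc) else 1
```

Calling `main()` against the example response above would print the `trc_available` check with its detail message and return 1.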
Telemetry
The step-ca-adapter and step-ca-rotator both expose an endpoint that can be used to retrieve telemetry data from the services. The telemetry data is exported in the form of Prometheus metrics. Metrics are exposed by default on the admin API, but can be configured to be exposed on a different endpoint. Consult the configuration documentation for more information.
The default endpoints are:
Service | Endpoint |
---|---|
step-ca-adapter | http://localhost:41500/metrics |
step-ca-rotator | http://localhost:41501/metrics |
To access these metrics, a Prometheus server (or similar) is required to ingest the metrics from each service. How to set this up is outside the scope of this document. Should you require assistance with integrating metrics in your monitoring setup, please contact Anapaya support.
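For a quick manual look at the exported values without a Prometheus server, the text served on the metrics endpoint can be read directly. The sketch below is an illustration only and does not replace a proper monitoring setup: `fetch_metrics` and `parse_metrics` are hypothetical helpers, and the parser handles only simple sample lines of the Prometheus text exposition format (metric name, optional labels, value, optional timestamp).

```python
import re
import urllib.request
from typing import Dict, List, Tuple

# Default metrics endpoint of the step-ca-adapter, taken from the table above.
METRICS_URL = "http://localhost:41500/metrics"

# name{label="value",...} value [timestamp]
_SAMPLE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Download the raw metrics text from the service."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> List[Sample]:
    """Parse sample lines into (name, labels, value); skip comments and junk."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = _SAMPLE.match(line)
        if match is None:
            continue  # not a sample line this simple parser understands
        name, raw_labels, raw_value = match.groups()
        try:
            value = float(raw_value)
        except ValueError:
            continue
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, value))
    return samples
```

For example, `parse_metrics(fetch_metrics())` returns one tuple per exported sample, which can then be filtered by metric name.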
step-ca-adapter metrics
Metric | Description | Labels | Type |
---|---|---|---|
step_ca_adapter_ca_backend_healthy | Indicates whether the Step CA backend reports healthy. | None | gauge |
step_ca_adapter_db_healthy | Indicates the database connection is healthy. | None | gauge |
step_ca_adapter_http_request_latency_seconds | Latencies of HTTP requests. | api , method , status , route | histogram |
step_ca_adapter_http_request_total | Total number of HTTP requests. Status label is the HTTP status code. | api , method , status , route | counter |
step_ca_adapter_policy_violations_total | Total number of policy violations detected by the Step CA adapter. | violation | counter |
step_ca_adapter_renewal_requests_total | Total number of renewal requests handled by the Step CA adapter. | result | counter |
step_ca_adapter_tls_certificate_not_after_timestamp_seconds | The expiration time of the TLS certificate used by the Step CA adapter. | server_name | gauge |
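The `step_ca_adapter_tls_certificate_not_after_timestamp_seconds` gauge is typically used to warn before a certificate expires. The following Python sketch shows the arithmetic under two assumptions that the table does not state explicitly: the value is a Unix timestamp in seconds (as the metric name suggests), and a 14-day warning threshold, which is an arbitrary example value. The function names are hypothetical.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def days_until_expiry(not_after_ts: float, now: Optional[float] = None) -> float:
    """Days between `now` and the NotAfter timestamp (negative if expired).

    ASSUMPTION: `not_after_ts` is a Unix timestamp in seconds.
    """
    if now is None:
        now = time.time()
    return (not_after_ts - now) / SECONDS_PER_DAY


def expires_soon(
    not_after_ts: float,
    threshold_days: float = 14.0,  # example threshold, adjust to your policy
    now: Optional[float] = None,
) -> bool:
    """True if the certificate expires in fewer than `threshold_days` days."""
    return days_until_expiry(not_after_ts, now) < threshold_days
```

The same calculation applies to the `*_not_after_timestamp_seconds` gauges exposed by the step-ca-rotator.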
step-ca-rotator metrics
Metric | Description | Labels | Type |
---|---|---|---|
step_ca_rotator_db_healthy | Indicates the database connection is healthy. | None | gauge |
step_ca_rotator_http_request_latency_seconds | Latencies of HTTP requests. | api , method , status , route | histogram |
step_ca_rotator_http_request_total | Total number of HTTP requests. Status label is the HTTP status code. | api , method , status , route | counter |
step_ca_rotator_mode | Current mode of the CA rotator (follower or leader). | isd_as , mode | gauge |
step_ca_rotator_provision_cert_instance | Instance number of the provisioned certificate. | isd_as | gauge |
step_ca_rotator_provision_cert_not_after_timestamp_seconds | NotAfter timestamp of the provisioned certificate. | isd_as | gauge |
step_ca_rotator_provision_inconsistent_configuration | Indicates whether a provisioning failure has left the configuration in an inconsistent state. This happens when the configuration cannot be reverted after a failed provisioning of the CA certificate. | isd_as | gauge |
step_ca_rotator_provision_key_version | The cloudkms key version of the provisioned certificate. | isd_as | gauge |
step_ca_rotator_provision_last_run_ok | Indicates whether the last provision run was successful. | isd_as | gauge |
step_ca_rotator_provision_last_run_timestamp_seconds | Timestamp of the last provision run. | isd_as , result | gauge |
step_ca_rotator_provision_runs_total | Total number of provision runs. | isd_as , result | counter |
step_ca_rotator_rotate_last_run_ok | Indicates whether the last rotate run was successful. | isd_as | gauge |
step_ca_rotator_rotate_last_run_timestamp_seconds | Timestamp of the last rotate run. | isd_as , result | gauge |
step_ca_rotator_rotate_root_cert_not_after_timestamp_seconds | NotAfter timestamp of the Root certificate. | isd_as | gauge |
step_ca_rotator_rotate_root_signer_available | Indicates if the root signer is available. | isd_as | gauge |
step_ca_rotator_rotate_runs_total | Total number of rotate runs. | isd_as , result | counter |
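Several of the rotator gauges are boolean indicators that lend themselves to simple alert conditions. The sketch below flags the ones that signal a problem. It is an illustration under a stated assumption: the gauges follow the common Prometheus convention of 1 for true and 0 for false, which the table above does not confirm. The `rotator_problems` helper and the sample tuple layout (name, labels, value) are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[str, Dict[str, str], float]

# Metric name -> value that signals a problem.
# ASSUMPTION: boolean gauges use 1 for true and 0 for false.
PROBLEM_VALUES = {
    "step_ca_rotator_db_healthy": 0.0,
    "step_ca_rotator_provision_last_run_ok": 0.0,
    "step_ca_rotator_provision_inconsistent_configuration": 1.0,
    "step_ca_rotator_rotate_last_run_ok": 0.0,
    "step_ca_rotator_rotate_root_signer_available": 0.0,
}


def rotator_problems(samples: Iterable[Sample]) -> List[str]:
    """Return one message per gauge that currently holds its problem value."""
    problems: List[str] = []
    for name, labels, value in samples:
        # Metrics not listed in PROBLEM_VALUES yield None and never match.
        if PROBLEM_VALUES.get(name) == value:
            isd_as = labels.get("isd_as", "-")
            problems.append(f"{name} (isd_as={isd_as}) = {value:g}")
    return problems
```

In a Prometheus-based setup, the same conditions would normally be expressed as alerting rules; this sketch only illustrates which values to watch for.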