
Set up a monitoring host

This page explains how to set up monitoring for your Anapaya SCION environment. Anapaya's monitoring stack is based on Prometheus, Grafana, Loki, and Alertmanager. All of them are open-source tools with extensive documentation and community support.

The technical requirements for monitoring Anapaya appliances are:

  1. The management interface of each monitored Anapaya appliance must be reachable from the monitoring host.
  2. Firewalls between the monitoring host and the appliances must allow HTTP(S) connections to the appliance's monitoring port.
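You can check both requirements from the monitoring host by opening a TCP connection to the appliance's telemetry address. The Python sketch below is a minimal example of that check; the helper name port_reachable is hypothetical, and a successful connection only shows that the port is open through the firewall, not that metrics are being served.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves host names and tries each resolved address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and DNS failures all raise OSError subclasses.
        return False
```

For example, port_reachable("192.0.2.10", 9100) checks a hypothetical appliance whose telemetry address is 192.0.2.10:9100; substitute the address from your appliance configuration.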

Enable telemetry on the Anapaya appliance

In the appliance configuration of each monitored host, verify that there is a telemetry entry, as shown in the snippet below. The appliance exposes its metrics on the configured address, which uses the ip:port format.



{
  "management": {
    "telemetry": {
      "address": ""
    }
  }
}
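Once telemetry is enabled, you can confirm from the monitoring host that the address actually serves metrics. The sketch below assumes plain HTTP (use https:// if your appliance serves TLS) and the /metrics path, which is what Prometheus scrapes when a job does not set metrics_path, as in the scrape job in Set up Prometheus. The helper fetch_metric_names is hypothetical and only understands the Prometheus text exposition format.

```python
import urllib.request


def fetch_metric_names(address: str, timeout: float = 5.0) -> set:
    """Scrape http://<address>/metrics once and return the metric names it exposes."""
    with urllib.request.urlopen(f"http://{address}/metrics", timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    names = set()
    for line in body.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" metadata comments.
        if not line or line.startswith("#"):
            continue
        # A sample line starts with the metric name, optionally followed by {labels}.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names
```

An empty result, or an exception such as a refused connection, means Prometheus will not be able to scrape that address either.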

Alternative approach

It is also possible to enable telemetry by configuring the management API endpoint.

Set up Prometheus

Set up Prometheus by following the official Prometheus instructions. In particular:

  1. Ensure you have the latest version of Prometheus installed. Consult the installation guide.

  2. Follow the target configuration instructions and use the code example below to monitor an Anapaya appliance. For each host that is to be monitored, add an entry in the targets section. The appliance address is the one configured in the telemetry section of the appliance configuration, as shown in Enable telemetry on the Anapaya appliance. It has the format host:port, where host can be either an IP address or a hostname.

    
    
    - job_name: 'anapaya-appliance'
      honor_labels: true
      metric_relabel_configs: # Add this config if you are using the Anapaya Grafana dashboards.
        - source_labels: ['hostname']
          target_label: 'shortname'
        - source_labels: ['__name__']
          regex: 'target_info'
          action: drop
      static_configs:
        - targets:
            - 
            - 
          labels:
            product: 
            project: 

important

If you use the recommended Grafana dashboards, set the product label correctly. The dashboards depend on this label.

The possible values are: core, edge, gate, and ca.

Optimal scrape interval

Anapaya dashboards work best with a scrape interval of 5 seconds.
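When you manage many appliances, the scrape job above, the allowed product values, and the 5-second interval can be combined in a small generator. The Python sketch below renders the same job as YAML text and rejects unknown product values and targets that are not in host:port form. The function names and the choice to set scrape_interval on the job (which overrides the global interval for this job only) are assumptions for illustration, not part of Anapaya's published configuration.

```python
VALID_PRODUCTS = ("core", "edge", "gate", "ca")


def q(value: str) -> str:
    """Quote a string as a YAML single-quoted scalar (a literal ' is written as '')."""
    return "'" + value.replace("'", "''") + "'"


def anapaya_scrape_job(targets, product, project, scrape_interval="5s"):
    """Render one Prometheus scrape job for Anapaya appliances as YAML text."""
    if product not in VALID_PRODUCTS:
        raise ValueError(f"product must be one of {VALID_PRODUCTS}, got {product!r}")
    if not targets:
        raise ValueError("at least one host:port target is required")
    for target in targets:
        # rpartition splits on the last colon, so "host:port" yields ("host", ":", "port").
        host, sep, port = target.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"target {target!r} is not in host:port format")
    lines = [
        "- job_name: 'anapaya-appliance'",
        f"  scrape_interval: {scrape_interval}",  # overrides the global interval for this job only
        "  honor_labels: true",
        # The relabeling below is only needed for the Anapaya Grafana dashboards.
        "  metric_relabel_configs:",
        "    - source_labels: ['hostname']",
        "      target_label: 'shortname'",
        "    - source_labels: ['__name__']",
        "      regex: 'target_info'",
        "      action: drop",
        "  static_configs:",
        "    - targets:",
    ]
    lines += [f"        - {q(target)}" for target in targets]
    lines += [
        "      labels:",
        f"        product: {q(product)}",
        f"        project: {q(project)}",
    ]
    return "\n".join(lines) + "\n"
```

For example, print(anapaya_scrape_job(["192.0.2.10:9100"], "edge", "my-project")) produces an entry to place under scrape_configs in prometheus.yml; the address and project name here are hypothetical.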

Recording and alerting rules

Prometheus supports recording rules, which precompute expressions and save the results as new time series, and alerting rules, which fire alerts when a condition is met. Prometheus sends firing alerts to Alertmanager, which can integrate them with your alerting system. For each alert, you can specify the event that triggers it, its scope and severity, and a summary and description of the firing alert. Refer to Recommended alert rules for more information.
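In a Prometheus rule file, these parts map onto the fields of an alerting rule: expr and for define the triggering event, labels carry attributes such as severity, and annotations hold the summary and description. The sketch below renders one such rule as YAML text to show that structure; the group name, alert name, expression, and duration in the test are hypothetical and are not taken from the Anapaya rule set.

```python
def q(value: str) -> str:
    """Quote a string as a YAML single-quoted scalar (a literal ' is written as '')."""
    return "'" + value.replace("'", "''") + "'"


def alert_rule_group(group, alert, expr, duration, severity, summary, description):
    """Render a Prometheus rule file containing one alerting rule as YAML text."""
    return "\n".join([
        "groups:",
        f"  - name: {q(group)}",
        "    rules:",
        f"      - alert: {q(alert)}",
        f"        expr: {q(expr)}",      # the condition (event) that triggers the alert
        f"        for: {duration}",      # how long the condition must hold before firing
        "        labels:",
        f"          severity: {q(severity)}",
        "        annotations:",
        f"          summary: {q(summary)}",
        f"          description: {q(description)}",
    ]) + "\n"
```

An expression such as up{job="anapaya-appliance"} == 0 would fire when Prometheus can no longer scrape an appliance from the job configured above.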

Anapaya provides a recommended set of alert rules that you can use as a starting point. The rule files are available in Anapaya's software repository, with a predefined rule set for each product:

  • anapaya-alerts-core for CORE,
  • anapaya-alerts-edge for EDGE,
  • anapaya-alerts-gate for GATE,
  • anapaya-alerts-scion-ca for the Anapaya SCION CA.

There is also an anapaya-alerts-external package that includes all recommended alerts.

For example, to download version v0.40.2 of the EDGE alert rules, run:

curl -O https://dl.cloudsmith.io/public/anapaya/public/raw/names/anapaya-alerts-edge/versions/v0.40.2/anapaya-alerts-edge-v0.40.2.yml

Adjust your Prometheus configuration to use the downloaded alert rule file. Follow the official instructions on adding an alert rule file to your Prometheus configuration.
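The two download URLs on this page share one pattern, so other packages and versions can be fetched the same way. The sketch below builds such URLs, downloads a file, and renders the top-level rule_files section of prometheus.yml. It assumes every package follows the pattern of the example URLs; package_url, download, and rule_files_section are hypothetical helper names.

```python
import urllib.request

BASE_URL = "https://dl.cloudsmith.io/public/anapaya/public/raw/names"


def package_url(name: str, version: str, extension: str) -> str:
    """Build a download URL following the pattern of the example URLs on this page."""
    return f"{BASE_URL}/{name}/versions/{version}/{name}-{version}.{extension}"


def download(url: str, destination: str) -> None:
    """Save the response body to `destination`, similar to `curl -O`."""
    with urllib.request.urlopen(url) as resp, open(destination, "wb") as out:
        out.write(resp.read())


def rule_files_section(paths) -> str:
    """Render the top-level rule_files section of prometheus.yml."""
    return "rule_files:\n" + "".join(f"  - '{p}'\n" for p in paths)
```

For example, download(package_url("anapaya-alerts-edge", "v0.40.2", "yml"), "anapaya-alerts-edge-v0.40.2.yml") fetches the same file as the curl command above. Restart or reload Prometheus after editing rule_files.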

Runbooks

To handle the published alerts, follow the Runbooks.

Set up Grafana

Set up Grafana by following the official Grafana instructions. Before you start, make sure that Prometheus is set up and running as described in Set up Prometheus. At a high level, you need to:

  1. Install Grafana and log in.
  2. Configure Prometheus as a datasource for Grafana.
  3. Add recommended Grafana dashboards or create your own dashboards.
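Step 2 can also be scripted against Grafana's HTTP API instead of the UI. The sketch below builds, but does not send, a POST /api/datasources request that registers Prometheus as a data source using basic authentication. The helper name and all values in the test (Grafana on localhost:3000 and Prometheus on localhost:9090, both projects' default ports) are assumptions; adapt them to your installation.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url, user, password, name, ds_type, ds_url):
    """Build (but do not send) a POST /api/datasources request for Grafana's HTTP API."""
    payload = {
        "name": name,
        "type": ds_type,      # "prometheus" here; "loki" works the same way for Loki
        "url": ds_url,        # address Grafana's backend uses to reach the data source
        "access": "proxy",    # queries are proxied through the Grafana server
    }
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",  # Grafana basic auth
        },
        method="POST",
    )
```

Pass the returned request to urllib.request.urlopen to send it.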

Anapaya provides a recommended set of Grafana dashboards that you can use as a starting point. Archives containing the dashboards as JSON files are available in Anapaya's software repository, and the JSON files can be imported into Grafana. Each product has a predefined archive with multiple dashboards:

  • anapaya-dashboards-core for CORE,
  • anapaya-dashboards-edge for EDGE,
  • anapaya-dashboards-gate for GATE,
  • anapaya-dashboards-scion-ca for the Anapaya SCION CA.

There is also an anapaya-dashboards-external archive that includes all recommended dashboards.

For example, to download version v0.40.2 of the combined dashboards, run:

curl -O https://dl.cloudsmith.io/public/anapaya/public/raw/names/anapaya-dashboards-external/versions/v0.40.2/anapaya-dashboards-external-v0.40.2.zip

Import a JSON dashboard by following the official instructions.
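If you prefer to import the dashboards without the UI, the sketch below reads every JSON file from a downloaded archive, such as anapaya-dashboards-external-v0.40.2.zip, and wraps each one in the body expected by Grafana's POST /api/dashboards/db endpoint. It makes no assumption about the folder layout inside the archive and sends nothing. It assumes the dashboards do not declare __inputs placeholders; dashboards that do should be imported through the UI, which prompts for the data source.

```python
import json
import zipfile


def load_dashboards(archive_path: str) -> dict:
    """Return {member name: dashboard JSON} for every .json file in the archive."""
    dashboards = {}
    with zipfile.ZipFile(archive_path) as zf:
        for member in zf.namelist():
            if member.endswith(".json"):
                dashboards[member] = json.loads(zf.read(member).decode("utf-8"))
    return dashboards


def import_payload(dashboard: dict) -> dict:
    """Wrap a dashboard for Grafana's POST /api/dashboards/db endpoint."""
    if "__inputs" in dashboard:
        # Dashboards exported for external sharing declare __inputs that this
        # endpoint does not resolve; import those through the Grafana UI instead.
        raise ValueError("dashboard declares __inputs; import it through the UI")
    model = dict(dashboard)  # shallow copy so the loaded dashboard is not modified
    model["id"] = None       # let Grafana assign its own internal id
    return {"dashboard": model, "overwrite": True}
```

Each payload can be sent as JSON in a POST request to /api/dashboards/db, authenticated as in the data source step.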

Set up Loki

Monitoring host

Anapaya uses Loki to collect the logs exported by monitored appliances. Follow the official instructions on setting up Loki on the monitoring host.

Ensure that Loki is added as a data source in Grafana, as explained in the instructions.
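Before pointing appliances or Grafana at Loki, you can check that it is up using Loki's GET /ready endpoint, which returns HTTP 200 once Loki is ready to accept traffic. The sketch below wraps that check; loki_ready is a hypothetical helper, and the base URL you pass (for example http://localhost:3100, Loki's default port) is an assumption about your setup.

```python
import urllib.request


def loki_ready(loki_url: str, timeout: float = 5.0) -> bool:
    """Return True if Loki's GET /ready endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{loki_url.rstrip('/')}/ready", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError (e.g. 503 while Loki is starting) both subclass OSError.
        return False
```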

Monitored appliances

On each Anapaya appliance to be monitored, add the following snippet to the management.telemetry section of the appliance configuration. You can adapt the snippet below in place: set url to the URL of the Loki instance to which the appliance should send its logs.



appliance-cli edit config 'management.telemetry.logging: {
  "logging_type": "LOKI",
  "loki": {
    "basic_auth": {
      "password": "",
      "username": ""
    },
    "url": ""
  }
}'
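If you configure many appliances, you may prefer to generate the command above rather than edit it by hand. The sketch below (loki_logging_command is a hypothetical helper) rebuilds the same appliance-cli invocation from a URL and optional credentials and returns it as a string for you to run; it uses only the command form shown above and assumes no other appliance-cli options. Which form of url your appliance expects is not stated on this page, so check it before use.

```python
import json
import shlex


def loki_logging_command(url: str, username: str = "", password: str = "") -> str:
    """Return the appliance-cli command from this page filled in with Loki settings."""
    logging_cfg = {
        "logging_type": "LOKI",
        "loki": {
            "basic_auth": {"password": password, "username": username},
            "url": url,
        },
    }
    argument = "management.telemetry.logging: " + json.dumps(logging_cfg, indent=2)
    # shlex.quote wraps the argument in single quotes, as in the command above,
    # and escapes any single quotes that appear in the password or URL.
    return "appliance-cli edit config " + shlex.quote(argument)
```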

Set up Alertmanager

Alertmanager handles alerts sent by Prometheus: it deduplicates and groups them and routes them to the correct receiver integration. Follow the official instructions on setting up Alertmanager. As stated in its documentation, Alertmanager can be integrated with a wide variety of alerting systems, including email, Slack, and Opsgenie.
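Prometheus only sends alerts to Alertmanager instances listed in the alerting section of prometheus.yml, and a synthetic alert is a quick way to check that your receivers are wired up. The sketch below renders that section and builds, without sending, a POST request to Alertmanager's /api/v2/alerts endpoint carrying one short-lived test alert. The helper names, the label values, and the localhost:9093 address in the test (Alertmanager's default port) are assumptions for illustration.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def alerting_section(alertmanager_targets) -> str:
    """Render the prometheus.yml section that tells Prometheus where Alertmanager runs."""
    lines = ["alerting:", "  alertmanagers:", "    - static_configs:", "        - targets:"]
    lines += [f"            - '{t}'" for t in alertmanager_targets]
    return "\n".join(lines) + "\n"


def synthetic_alert_request(alertmanager_url: str, minutes: int = 5) -> urllib.request.Request:
    """Build a POST /api/v2/alerts request carrying one synthetic alert."""
    now = datetime.now(timezone.utc)
    alerts = [{
        # Hypothetical label values; choose ones your routing tree actually matches.
        "labels": {"alertname": "MonitoringPipelineTest", "severity": "info"},
        "annotations": {"summary": "Manual test alert to verify Alertmanager routing"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),  # resolves after `minutes`
    }]
    return urllib.request.Request(
        url=f"{alertmanager_url.rstrip('/')}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen should make the test alert appear in the Alertmanager UI and, if routing is correct, at the configured receiver.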