SCION/CP-PKI
This guide explains how to troubleshoot SCION- and CP-PKI-related aspects of Anapaya appliances.
Current configuration and state
The current SCION configuration can be retrieved from the appliance using the following command:
appliance-cli get config -f body.config.scion
To get the current SCION state of the appliance, use the following command:
appliance-cli info scion
This lists all the SCION ASes that are configured on the appliance and shows the state of crypto material and the state of the SCION interfaces.
Common problems
TRC for local ISD missing
appliance-cli info scion
SCION ASes
- 1-ff00:1:1
Crypto:
- TRC for local ISD ❌
...
Without a TRC for the local ISD, the appliance cannot receive and validate topology information and therefore there will be no SCION connectivity.
Refer to TRC handling for how to provision the TRC.
AS certificate missing or expired
appliance-cli info scion
SCION ASes
- 1-ff00:1:1
Crypto:
...
- AS certificate ❌
Without a valid AS certificate, the appliance cannot receive and validate topology information and therefore there will be no SCION connectivity.
Refer to Certificate handling for how to create a CSR and request a certificate. If the appliance is part of a cluster and a sibling appliance already has a valid AS certificate, refer to Request AS certificate via sibling appliance.
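When the output of appliance-cli info scion is collected by a script, the failing checks can be picked out by looking for the ❌ marker. The following is a minimal sketch, assuming the output keeps the line-per-check layout shown above. The function name failed_checks and the sample text (which combines both failures in one output) are illustrative assumptions, not part of the appliance tooling.

```python
def failed_checks(info_output: str) -> list[str]:
    """Return the names of the checks marked with a cross in the given text."""
    failed = []
    for line in info_output.splitlines():
        if "❌" in line:
            # Drop the cross and the leading list dash to keep only the check name.
            failed.append(line.replace("❌", "").strip().lstrip("-").strip())
    return failed


# Hypothetical sample that combines the two failures described above.
sample = """SCION ASes
- 1-ff00:1:1
Crypto:
- TRC for local ISD ❌
- AS certificate ❌
"""

print(failed_checks(sample))
```

An empty list means that no check in the captured output is marked as failing.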
SCION interface is down
The appliance cannot send or receive SCION traffic on a SCION interface which is down.
Refer to the corresponding SCIONInterfaceStateDown to find out how to investigate the issue.
Uploading AS certificate fails
If the AS certificate is in PEM format, make sure that the certificate chain has exactly two certificates: the AS certificate and the issuer certificate. Also, make sure that there is no trailing line in the certificate chain.
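Both conditions can be checked before uploading. The sketch below counts the PEM certificate blocks and flags anything that follows the last one. It interprets "no trailing line" as "nothing but a single newline after the final END marker", which is an assumption. The function name check_pem_chain is hypothetical.

```python
import re

# A PEM certificate block, from its BEGIN marker to its END marker.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)
END_MARKER = "-----END CERTIFICATE-----"


def check_pem_chain(pem_text: str) -> list[str]:
    """Return a list of problems found in the chain; empty if none."""
    problems = []
    count = len(PEM_CERT.findall(pem_text))
    if count != 2:
        problems.append(f"expected 2 certificates, found {count}")
    # Assumed interpretation: at most one newline may follow the last block.
    tail = pem_text.rsplit(END_MARKER, 1)[-1] if count else ""
    if tail not in ("", "\n"):
        problems.append("unexpected content after the last certificate")
    return problems
```

The check is purely textual. It does not verify that the first block is the AS certificate or that the second block is its issuer.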
SCION connectivity issues
This section provides basic guidelines for troubleshooting some common network issues caused by misconfigured SCION services.
Issue: Assume that you are operating the SCION AS 1-ff00:1:1 and you are notified that the connectivity from the host EDGE-1 in your AS to the neighboring AS 1-ff00:1:2 is lost. This can be a loss of SCION connectivity or IP connectivity over IP-in-SCION tunneling.
In practice, your alerting system, which sits on top of the monitoring system, should inform you about such an incident. You might be able to extract information from the alerts that helps you find the source of the issue. This guide does not rely on such information because it depends on your monitoring and alerting systems.
The steps taken here for troubleshooting should be perceived solely as recommendations. Furthermore, they are meant to assist you with resolving only a small subset of issues you might encounter in practice.
A reasonable first step is to log into EDGE-1 and check the set of SCION paths to the AS 1-ff00:1:2.
Not all expected paths alive considers the case where you do not see the full set of paths you expect. It explains three potential causes and how to resolve them.
All expected paths alive covers the case where all the expected paths are alive. It considers two possible causes and shows how to resolve them.
Not all expected paths alive
A basic sanity check for SCION connectivity-related issues is to log into EDGE-1 in the AS 1-ff00:1:1 and run the showpaths command. This command shows the set of available paths to a particular destination. The --refresh flag forces the scion tool to fetch fresh paths from the local SCION control service.
scion showpaths 1-ff00:1:2 --refresh
If there is no path, the output looks like:
Available paths to 1-ff00:1:2
Error: no path found
It is also possible that you do not see the complete set of paths you expect, or that some of them are in the timeout state instead of alive. For example, you expect to see the path [1-ff00:1:1 2>3 1-ff00:1:2], which corresponds to the link from interface 2 in 1-ff00:1:1 to interface 3 in 1-ff00:1:2, but it is not present. Run an IP ping between EDGE-1 and the corresponding router in AS 1-ff00:1:2. If this works, there is connectivity on the IP underlay connecting EDGE-1 and the router in 1-ff00:1:2. In that case, the connectivity issue is probably on the SCION layer. If, on the other hand, the IP ping does not work, the root cause is probably in the lower layers, e.g., a misconfiguration of the underlay network or an issue with networking hardware. This document assumes that the root cause is at the SCION layer and explains the three most likely scenarios.
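Comparing the expected paths against the showpaths output can be scripted. The following is a rough sketch, assuming that each path is printed on one line containing the bracketed hop sequence (such as [1-ff00:1:1 2>3 1-ff00:1:2]) and the word alive or timeout. The exact line layout, the sample text, and the function names are assumptions for illustration and should be checked against the output of your scion tool version.

```python
import re

# A bracketed hop sequence; requiring ">" skips index markers such as "[0]".
HOPS = re.compile(r"\[([^\]]*>[^\]]*)\]")
STATUS = re.compile(r"\b(alive|timeout)\b")


def path_states(showpaths_output: str) -> dict[str, str]:
    """Map each hop sequence found in the output to its reported state."""
    states = {}
    for line in showpaths_output.splitlines():
        hops = HOPS.search(line)
        status = STATUS.search(line)
        if hops and status:
            states[hops.group(1)] = status.group(1)
    return states


def missing_or_dead(showpaths_output: str, expected: list[str]) -> list[str]:
    """Return the expected paths that are absent or not in the alive state."""
    states = path_states(showpaths_output)
    return [path for path in expected if states.get(path) != "alive"]


# Hypothetical output; the real layout may differ.
sample = """Available paths to 1-ff00:1:2
[0] Hops: [1-ff00:1:1 1>4 1-ff00:1:2] Status: alive
[1] Hops: [1-ff00:1:1 2>3 1-ff00:1:2] Status: timeout
"""

expected = ["1-ff00:1:1 1>4 1-ff00:1:2", "1-ff00:1:1 2>3 1-ff00:1:2"]
print(missing_or_dead(sample, expected))
```

With the "Error: no path found" output shown earlier, no line matches and every expected path is reported as missing.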
Scenario 1: endpoint misconfiguration
One potential cause is that there is an error in the configuration of EDGE-1. This is especially likely if you have just configured EDGE-1. Furthermore, if a non-empty subset of the paths is available, the AS certificate issue that we discuss in the next section can be ruled out on our side.
The issue could be simply caused by a typo in an IP address or a missing entry. In the example above, you need to check the configuration of interface 2 in your AS. If this is the problem, fix the misconfiguration, configure the appliance with the new configuration, and then check that you see the set of expected paths.
Scenario 2: AS certificate issue
If there is no valid AS certificate configured on EDGE-1, the appliance cannot create valid path segments from the beacons because it cannot sign them. As a result, the showpaths command does not display any path. Thus, the AS certificate might be the source of the problem.
Get the list of AS certificates that are configured on the appliance:
appliance-cli get cppki/certificates
If there is no AS certificate configured on EDGE-1, the output is:
{
  "certificate_chains": []
}
A missing AS certificate can be due to forgetting to configure an AS certificate, deleting the certificate accidentally, or a failure of automatic certificate renewal, e.g., after a prolonged connectivity issue lasting several days.
To resolve the issue, you need to add a valid AS certificate to EDGE-1. In general, an AS certificate needs to be requested from one of the CAs of the local ISD. The initial certificate is requested with an out-of-band mechanism. See Certificate handling for more details on listing, generating, and installing AS certificates.
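The empty-list check is easy to automate once the JSON output of appliance-cli get cppki/certificates has been captured as text. The sketch below is only an illustration. The helper name has_entries is hypothetical, and nothing is assumed about the shape of the individual list entries. The same helper also works for the trcs key returned by appliance-cli get cppki/trcs.

```python
import json


def has_entries(cli_json: str, key: str) -> bool:
    """Return True if the list stored under `key` exists and is non-empty."""
    data = json.loads(cli_json)
    return bool(data.get(key))


# Output shown above for an appliance without an AS certificate.
print(has_entries('{"certificate_chains": []}', "certificate_chains"))
```

A result of False only says that nothing is configured. It does not say whether a configured certificate is still valid.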
Scenario 3: time synchronization issue
If the appliance has a valid AS certificate but does not have any paths to the SCION network, its clock might be out of sync, which leaves the appliance unable to verify beacons and create path segments.
Check the current date, timezone and NTP status:
timedatectl status
For example, this output shows that the timezone is UTC and that NTP synchronization is not working:
Local time: Mon 2023-10-30 12:39:43 UTC
Universal time: Mon 2023-10-30 12:39:43 UTC
RTC time: Mon 2023-10-30 12:39:43
Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: no
NTP service: active
RTC in local TZ: no
A time synchronization failure can be due to a wrong configuration of the NTP servers or to the servers being unreachable. NTP servers must be reachable over the underlay IP network and should be configured in the appliance configuration. See System for detailed information on how to configure time servers.
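The "System clock synchronized" line of the timedatectl status output is the field that indicates the problem. As a small sketch, assuming the key and value layout shown in the example output, the field can be extracted like this (the function name clock_synchronized is hypothetical):

```python
from typing import Optional


def clock_synchronized(status_output: str) -> Optional[bool]:
    """Return True or False from the 'System clock synchronized' line.

    Returns None if the line is not present in the given text.
    """
    for line in status_output.splitlines():
        # Split at the first colon only, so timestamps in values stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "System clock synchronized":
            return value.strip() == "yes"
    return None


sample = """Local time: Mon 2023-10-30 12:39:43 UTC
Universal time: Mon 2023-10-30 12:39:43 UTC
RTC time: Mon 2023-10-30 12:39:43
Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: no
NTP service: active
RTC in local TZ: no
"""

print(clock_synchronized(sample))
```

Note that "NTP service: active" in the same output only means that the service is running, not that the clock has been synchronized.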
As a temporary solution, set the time manually:
timedatectl set-time '2015-11-20 16:14:50'
However, to avoid future time synchronization problems, configure NTP servers and make sure they are reachable.
It is not necessary to configure a timezone for the SCION network to be operational. If you prefer, you can set the timezone using the command:
timedatectl set-timezone UTC
We recommend using UTC everywhere since it makes it easier to correlate events across timezones.
All expected paths alive
Scenario 1: Domain misconfiguration
Assume that ping from the end host Endhost-1 in the AS 1-ff00:1:1 to the end host Endhost-2 in the AS 1-ff00:1:2, which should be reachable over the IP-in-SCION tunneling, does not work. Meanwhile, running a showpaths command towards AS 1-ff00:1:2 displays all the expected paths between the two ASes.
Inspect the prefixes advertised by the local SCION AS (i.e., 1-ff00:1:1) and the prefixes learnt from the remote SCION ASes (in particular, 1-ff00:1:2).
These prefixes are exposed by the appliance on a debug endpoint:
appliance-cli get debug/scion-tunneling/sgrp/domains
Below is an example of what the output could look like:
{
  "domains": {
    "your-domain-name": {
      "announced": ["10.0.10.0/24"],
      "received": [""]
    }
  }
}
In this case, no prefix from remote ASes has been learned.
If there is a discrepancy between the set of expected and learnt prefixes, the domain is probably misconfigured.
- Fix the configuration and configure the appliance with the modified configuration.
- Check the HTTP status page to confirm that the changes appear there too.
- Try the ping command from Endhost-1 to Endhost-2.
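The comparison between expected and learnt prefixes can be sketched as a set difference over the JSON returned by the debug endpoint. The function name missing_prefixes and the expected prefix 10.0.20.0/24 below are hypothetical. The JSON structure is taken from the example output above.

```python
import json


def missing_prefixes(domains_json: str, domain: str, expected: set[str]) -> set[str]:
    """Return the expected prefixes that were not learnt for the given domain."""
    data = json.loads(domains_json)
    received = data["domains"][domain]["received"]
    # The example output contains an empty string when nothing was learnt,
    # so empty entries are dropped before comparing.
    learnt = {prefix for prefix in received if prefix}
    return set(expected) - learnt


sample = """{
  "domains": {
    "your-domain-name": {
      "announced": ["10.0.10.0/24"],
      "received": [""]
    }
  }
}"""

# 10.0.20.0/24 stands in for a prefix you expect from the remote AS.
print(missing_prefixes(sample, "your-domain-name", {"10.0.20.0/24"}))
```

Prefixes are compared as plain strings, so the expected set must use the same notation as the appliance output.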
Scenario 2: TRC issue
For the appliance to join the SCION network and communicate with other nodes, it has to be configured with a set of TRCs. These TRCs form the trust anchors for verifying all of the control plane data that is exchanged in the SCION protocol. Therefore, the lack of a trusted TRC on the appliance results in a loss of connectivity.
Get the list of configured TRCs on the appliance:
appliance-cli get cppki/trcs
If there is no TRC configured on the appliance, the output is as follows:
{
  "trcs": []
}
This indicates that no TRC is configured on the appliance. To fix the issue, install a valid TRC on this appliance. See TRC handling for more details on generating and installing a TRC.
A missing TRC can be due to forgetting to configure a TRC or deleting the TRC accidentally. If the latter has happened, the showpaths command may function correctly for some time.