Common operations

This page describes common operations that are helpful when troubleshooting.

Collect appliance debug dump

A debug dump is a compressed journald log of the last hour. Among other things, it contains snapshots of metrics and appliance API endpoints taken at regular intervals. Always include a debug dump when filing a bug report.

The following command takes a debug dump and stores the result in debugdump.zst:

appliance-cli debug dump -o debugdump.zst

Unreachable API

If the HTTP-API of the appliance is not properly configured or not reachable, you can specify the --use-journalctl option. Specifying this option bypasses the appliance API and makes use of journalctl directly.

To share the debug dump with Anapaya Support, please upload it at upload.anapaya.net and attach the upload information to your support request.

Time range filtering

When using --use-journalctl, you can filter logs by time range using the --since and --until options. These options accept various time formats:

Absolute timestamps: "YYYY-MM-DD HH:MM:SS" (e.g., "2024-01-15 14:30:00")
Keywords: "yesterday", "today", "tomorrow", "now"
Relative times: "-1h" (1 hour ago), "+30m" (30 minutes from now)

For more details on time specifications, see the systemd documentation (the systemd.time manual page).

Examples:

Create a debug dump for a specific time range using absolute timestamps:

appliance-cli debug dump --use-journalctl --since "2024-01-15 14:00:00" --until "2024-01-15 15:00:00" -o debugdump.zst

Create a debug dump from yesterday until now:

appliance-cli debug dump --use-journalctl --since yesterday -o debugdump.zst

Create a debug dump for the last 2 hours:

appliance-cli debug dump --use-journalctl --since "-2h" -o debugdump.zst

Create a debug dump from 30 minutes ago until 5 minutes ago:

appliance-cli debug dump --use-journalctl --since "-30m" --until "-5m" -o debugdump.zst
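
When the time of an incident is known, the absolute --since and --until values can be derived instead of typed by hand. The following is a minimal sketch, assuming GNU date is available on the host. The helper name incident_window and its arguments are hypothetical and not part of appliance-cli.

```shell
#!/bin/sh
# incident_window TIMESTAMP MINUTES
# Prints two lines: the start and the end of a window of +/- MINUTES around
# TIMESTAMP, in the "YYYY-MM-DD HH:MM:SS" format accepted by --since/--until.
# Hypothetical helper; assumes GNU date. Timestamps are in local time.
incident_window() {
  center="$(date -d "$1" +%s)" || return 1
  span=$(( $2 * 60 ))
  date -d "@$(( center - span ))" "+%Y-%m-%d %H:%M:%S"
  date -d "@$(( center + span ))" "+%Y-%m-%d %H:%M:%S"
}

incident_window "2024-01-15 14:30:00" 30
```

For the call shown, the two printed lines are 2024-01-15 14:00:00 and 2024-01-15 15:00:00, which are the values used in the first example above. Pass the first line to --since and the second to --until.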

Collect coredumps

If the dataplane process has crashed, a coredump file is created that contains valuable debugging information. Coredump files are stored in the /var/tmp/cores/ directory and should be provided to Anapaya customer support for analysis.

To collect a coredump:

SSH to the given machine.
Compress the newest coredump file:
```
tar -czf coredump.tar.gz "$(ls -t /var/tmp/cores/core.vpp* | head -n 1)"
```
Multiple coredumps
There may be multiple coredump files in the /var/tmp/cores/ directory. To list them, use:
ls -l /var/tmp/cores/
You can also compress a specific or all coredump files by adjusting the tar command accordingly.
Upload the resulting coredump.tar.gz file at upload.anapaya.net and attach the upload information to your support request.
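
Adjusting the tar command to include every coredump could look like the following sketch. The function name archive_cores is hypothetical; the core.vpp* pattern is the one used in the command above.

```shell
#!/bin/sh
# archive_cores DIR OUT
# Packs every core.vpp* file in DIR into the gzip-compressed tarball OUT.
# OUT should be an absolute path, because tar runs from inside DIR so that
# the archive contains bare file names rather than full paths.
# Fails with a tar error if no file matches the pattern.
archive_cores() {
  ( cd "$1" && tar -czf "$2" core.vpp* )
}
```

A possible invocation is archive_cores /var/tmp/cores "$PWD/coredumps.tar.gz". Check the available disk space first if several coredumps are present.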

Gather appliance information

To collect appliance-related information to provide it to the Anapaya customer support:

SSH to the given machine.
Collect general information by running:
```
appliance-cli info > appliance.info
```
Fetch the appliance configuration by running:
```
appliance-cli get config > config.json
```

Secrets in the config (prior to v0.39.0)

In old versions (prior to v0.39.0) the appliance configuration contained secrets, so please remove them before sending the information to anyone!

Gather general host information

Collect host-related information to provide it to the Anapaya customer support:

SSH to the given machine.
Run
```
sudo lshw
```

Check docker services

Check whether the services (run as docker containers) are running:

SSH to the given machine

Use docker ps -a:

$ docker ps -a
CONTAINER ID  IMAGE                  COMMAND                 CREATED     STATUS     PORTS  NAMES
c718397beaf9  scion-all:v0.32.2      "/app/scion-all netw…"  7 days ago  Up 7 days         dataplane-control
5beecfb5d081  vpp-dataplane:v0.32.2  "/usr/bin/vpp -c /sh…"  7 days ago  Up 7 days         dataplane
...

The output shows whether each service is up and for how long it has been running. If a service has been up for only a very short time, it may be crash-looping.
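
The uptime check can be partly automated. The sketch below filters the STATUS column for containers that are restarting or have been up for less than a minute. The function name flag_short_uptime and the one-minute threshold are assumptions made for illustration; the status wording is the human-readable form printed by docker ps.

```shell
#!/bin/sh
# flag_short_uptime
# Reads "NAME<TAB>STATUS" lines on stdin and prints the names of containers
# that are restarting or have been up for less than a minute.
# Hypothetical helper; the threshold is an arbitrary choice.
flag_short_uptime() {
  awk -F '\t' '
    $2 ~ /^Restarting/ { print $1; next }
    $2 ~ /^Up (Less than a second|[0-9]+ seconds?)( |$)/ { print $1 }
  '
}
```

Feed it with docker ps -a --format '{{.Names}}\t{{.Status}}' | flag_short_uptime. A service that shows up repeatedly when the command is run a few times in a row is a likely crash-loop candidate.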

For further information please refer to the official Docker documentation.

Change log level

Change the log level to debug for collecting more information:

SSH to the given machine

Change the log level of a specific service to debug:

appliance-cli services log level $service-name debug

warning

Revert your changes after troubleshooting.

Inspect docker service logs

Inspect the logs of services running as docker containers:

SSH to the given machine.
If needed, see the list of services:
```
docker ps -a
```
Inspect the logs by running the following command:
```
docker logs $service-name
```

Recent logs

To see only the recent logs use:

docker logs $service-name --since=time-duration

For example, to check the logs of the last minute, run:

docker logs $service-name --since=1m

Save logs to file

To save the logs (both stdout and stderr) in a file use:

docker logs $service-name > filename 2>&1

Search in logs

To grep through the logs use

docker logs $service-name 2>&1 | grep query

For further information please refer to the official Docker documentation.

Restart a service

Automatic restart

The Anapaya appliance restarts failed services automatically, so manual restarting is likely to be useful only when the service is stuck and/or unresponsive.

Restart a service using the appliance-cli:

appliance-cli post debug/services/$service-name/restart

where $service-name is the name of the service you want to restart. To get the possible values for $service-name, use the following command:

appliance-cli get debug/services

Using docker restart

Alternatively, you can restart a service by running the following commands:

SSH to the given machine
Run

docker restart $service-name

For further information, please refer to the official Docker documentation.

Clean up docker images

Remove docker images that are no longer used:

SSH to the given machine.
List all docker images by running:
```
docker image ls
```
Remove old unused images by running:
```
docker image prune
```

For further information please refer to the official Docker documentation.

Connect to the BGP daemon's interactive console

Connect to the BGP daemon's shell:

SSH to the given machine.
Open the interactive console by running:
```
docker exec -it frr vtysh
```

For further information on the console please refer to the official FRR documentation.

Check systemd services

Check if the systemd services are running:

SSH to the given machine

Run

systemctl list-units '$service-name'

$ systemctl list-units 'appliance*'
UNIT                        LOAD   ACTIVE SUB     DESCRIPTION
appliance-host.service      loaded active running Anapaya Appliance Host Service
appliance-installer.service loaded active running Anapaya Appliance Installer
...

To get a more detailed overview of a specific service, use

systemctl status $service-name

$ systemctl status appliance-installer.service
      ● appliance-installer.service - Anapaya Appliance Installer
   Loaded: loaded (/etc/systemd/system/appliance-installer.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2023-02-13 08:26:29 UTC; 7min ago
Main PID: 166 (appliance-insta)
   Tasks: 13 (limit: 38262)
   CGroup: /system.slice/appliance-installer.service
         └─166 /usr/bin/appliance-installer --config /etc/anapaya/installer/appliance-installer.toml
...

With these commands you can see whether the service is active and running and for how long it has been running.

note

A systemd service can be restarted using:

systemctl restart $service-name

Inspect systemd service logs

View the systemd service logs:

SSH to the given machine.
If needed, you can list the appliance-related services by running:
```
systemctl list-units 'appliance*'
```
Inspect the logs by running:
```
journalctl -eu $service-name
```

Recent logs

To see only the recent logs, use the --since flag. For example, to see only the logs from today, use:

journalctl -eu $service-name --since today

To show the most recent 20 entries, use the -n 20 option.

Save logs to file

To save the logs in a file use:

journalctl -u $service-name > filename

Search in logs

To grep through the logs use:

journalctl -eu $service-name | grep query

Check the systemd-timesyncd service

The Anapaya appliance uses the systemd-timesyncd service, which acts as an NTP client and connects to a pool of NTP servers for time synchronization. The following actions provide some starting points for troubleshooting the timesyncd service called systemd-timesyncd.service. For further information please refer to the official documentation.

SSH to the given machine.
Check if the system clock is synchronized and if NTP service is active using:
```
timedatectl status
```
Check if the service is running, as described in Check systemd services.
Check the logs of the service, as described in Inspect systemd service logs.

Restart the service:

systemctl restart systemd-timesyncd.service

Find the configured NTP servers:

cat /etc/systemd/timesyncd.conf | grep NTP
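
A plain grep for NTP also matches commented-out lines, which only document defaults. To see only the settings that are actually in effect in the file, a stricter pattern can be used. This is a minimal sketch; the function name active_ntp_settings is hypothetical, and drop-in files under /etc/systemd/timesyncd.conf.d/ are not considered.

```shell
#!/bin/sh
# active_ntp_settings [FILE]
# Prints only the uncommented NTP= and FallbackNTP= lines of a timesyncd
# configuration file (default: /etc/systemd/timesyncd.conf).
# Returns a non-zero status if no such line exists.
active_ntp_settings() {
  grep -E '^[[:space:]]*(NTP|FallbackNTP)=' "${1:-/etc/systemd/timesyncd.conf}"
}
```

If the function prints nothing, no server is set explicitly in that file.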

Disk usage analysis

This section contains some helpful commands for investigating low disk space.

SSH to the given machine.
Check the current space:
```
df -h path
```
Check the list of the current files:
```
ls -l path
```
The du command can be used to get a more detailed overview of which directory consumes how much space. You can vary the max-depth option or the starting directory:
```
du -cha --max-depth=1 / | grep -E "M|G"
```

For further information about the du command please refer to the official documentation.
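
Sorting the du output by size often makes the largest consumers easier to spot than filtering with grep. The following sketch assumes GNU du and sort; the function name largest_entries is hypothetical.

```shell
#!/bin/sh
# largest_entries PATH [COUNT]
# Lists PATH and its direct children, largest first, limited to COUNT lines
# (default 10). Errors such as unreadable directories are suppressed.
# Assumes GNU du (--max-depth) and GNU sort (-h).
largest_entries() {
  du -ha --max-depth=1 "$1" 2>/dev/null | sort -rh | head -n "${2:-10}"
}
```

For example, largest_entries /var 5 prints the total for /var together with its largest direct children. Repeat the call on the largest child to drill down.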

Clean up disk space

There are several ways to free up disk space, grouped below by context.

APT related

Remove outdated packages:
```
sudo apt autoclean
```
Remove orphaned packages which are no longer needed:
```
sudo apt autoremove
```
Clean the entire APT cache:
```
sudo apt clean
```

Systemd journal logs

Check systemd journal logs:
```
journalctl --disk-usage
```
Clear the logs that are older than 3 days:
```
sudo journalctl --vacuum-time=3d
```

Docker images

Remove unused docker images as described in Clean up docker images.

Fix topology synchronization error

Appliances in a cluster share their topology information with each other. This either happens statically through configuration or dynamically through an exchange protocol. For further information on how to configure topology synchronization in the appliance configuration, refer to Topology Synchronization. The instructions below should help to identify a misconfiguration.

Check the logs of the appliance-controller service. The logs should contain an error describing the misconfiguration.
Fix the misconfigured appliances and update them.

Inspect SCION paths used for IP-in-SCION tunneling

While troubleshooting SCION connectivity, it is often useful to check the available paths for each domain. This section provides an overview of how to achieve this.

SSH to the given machine.
Show the currently available paths for all domains and traffic matchers by running the following command. This also shows whether the path is alive, dead (no probes are passing through), expired or similar.
```
appliance-cli inspect scion-tunneling summary --all-paths
```

Show the currently used paths for a specific domain.

appliance-cli inspect scion-tunneling summary --all-paths \
  --domain $domain

For the used paths for a specific traffic matcher within the given domain, run:

appliance-cli inspect scion-tunneling summary --all-paths \
  --domain $domain --traffic-matcher $traffic-matcher

Ping the underlay network

When investigating an issue, it is often helpful to determine whether the underlying IP connectivity is the problem.

For further information, please refer to the official ping documentation.

tip

The ping command runs indefinitely, unless specified otherwise:

ping -c number destination

Changing the source address is possible either directly via the address or the interface name:

ping destination -I interface/address

The default time interval between successive packet transmissions is one second. You can specify a custom interval in seconds:

ping -i interval destination

It may be useful to test how the underlying network deals with big packets. Use the -s option to set the packet size and -M do to prohibit fragmentation:

ping -s size -M do destination
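
The value given to -s is the ICMP payload size, not the size of the full packet. For IPv4 without header options, the largest payload that fits into a given MTU is the MTU minus 20 bytes for the IPv4 header and 8 bytes for the ICMP header. The helper below is a small sketch of that arithmetic; its name is hypothetical.

```shell
#!/bin/sh
# icmp_payload_for_mtu MTU
# Prints the largest ping payload (-s value) that fits into one IPv4 packet
# of the given MTU: MTU minus 20 bytes IPv4 header minus 8 bytes ICMP header.
# Assumes IPv4 without header options.
icmp_payload_for_mtu() {
  echo $(( $1 - 20 - 8 ))
}

icmp_payload_for_mtu 1500
```

For an MTU of 1500 this prints 1472, so ping -s 1472 -M do destination tests whether full-size packets pass without fragmentation.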
