
EDGE failover testing

Reliable network connectivity is essential for any organization. If you use Anapaya's SCION-based networking, it's important to test failover between your Anapaya EDGE appliances. This guide explains how to perform failover tests for the two main high-availability setups: VRRP and BGP.

tip

A failover test is recommended for every new redundant Anapaya EDGE deployment before going into production.

Preparation

To prepare a failover test:

  • Define and inform involved parties: Ensure that all relevant stakeholders are informed about the planned test.

  • Verify redundant configuration: If applicable, double-check that both Anapaya EDGEs are correctly configured for either VRRP or BGP high availability according to the documentation.

  • Establish baseline performance: Monitor and document the normal network traffic patterns and performance metrics under the regular operation of the EDGEs. This will serve as a benchmark to compare against the metrics during and after the failover test.

  • Prepare monitoring tools: Have monitoring tools such as ping or traceroute ready to observe the failover process in real time. If the operator of the Anapaya EDGEs is also involved in the test, ensure that they have access to the Anapaya EDGE Grafana dashboards.

  • Define success criteria: Clearly outline what constitutes a successful failover.
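
One way to make a success criterion measurable is to bound the interruption seen by a continuous ping. The following is a minimal sketch under stated assumptions: the helper name failover_within_limit is hypothetical, and the 3-second limit in the usage line is an arbitrary example, not a value recommended by Anapaya.

```shell
# Sketch only: failover_within_limit is a hypothetical helper.
# It approximates the outage as <lost replies> * <ping interval in seconds>
# and succeeds if that value does not exceed <limit in seconds>.
failover_within_limit() {
  local lost="$1" interval="$2" limit="$3"
  awk -v l="$lost" -v i="$interval" -v m="$limit" \
    'BEGIN { exit !(l * i <= m) }'
}

# Usage (8 lost replies at a 0.2 s interval, example limit of 3 s):
#   failover_within_limit 8 0.2 3 && echo "PASS" || echo "FAIL"
```

Agree on the limit with the involved parties before the test, so that the result can be judged against it afterwards.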

Testing failover in a VRRP deployment

Testing a VRRP setup involves simulating a failure of connectivity between the LAN and the primary EDGE and observing the backup's takeover.

Step 1: Identify the primary Anapaya EDGE

Before initiating the test, you need to determine which of the two EDGEs is currently the primary. This can typically be done by checking the VRRP status on the EDGE appliances or by observing traffic flow on the Grafana dashboards.

Step 2: Monitor connectivity

Before simulating a failure, establish a continuous ping (or similar) from a device in the LAN to a remote destination through the Anapaya EDGEs. This will help you observe any packet loss during the failover process.
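
A minimal sketch of such a continuous ping, assuming a Linux host in the LAN with the iputils version of ping. The helper names stamp_lines and start_ping_log and the default log file name are hypothetical; replace <remote-ip> with a destination reached through the EDGEs.

```shell
# Sketch only: assumes Linux with iputils ping; helper names are hypothetical.

# Prefix every line read from stdin with the current wall-clock time.
stamp_lines() {
  while IFS= read -r line; do
    printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
  done
}

# Ping the target five times per second and append timestamped results
# to a log file. -O reports each missing reply before the next request
# is sent, which makes the interruption visible in the log.
start_ping_log() {
  local target="$1" logfile="${2:-failover-ping.log}"
  ping -O -i 0.2 "$target" | stamp_lines >> "$logfile"
}

# Usage (runs until interrupted with Ctrl+C):
#   start_ping_log <remote-ip> vrrp-test.log
```

A short interval gives a finer-grained view of the interruption than the default of one request per second.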

Step 3: Simulate failure of connectivity to the primary EDGE

There are several ways to simulate a failure of the connectivity to the primary device:

  • Shut down/reboot: Shut down or reboot the primary EDGE.
  • Disconnect LAN interface: Disconnect the LAN interface of the primary EDGE to trigger the VRRP failover mechanism, either by unplugging the LAN cable or by disabling the LAN port on the switch to which the primary EDGE is connected.

Step 4: Monitor the failover process

As soon as the failure is initiated, start monitoring the network traffic and the status of the backup EDGE.

  • Observe VRRP state change: On the backup EDGE, monitor its VRRP state. It should transition from "backup" to "master."
  • Continuous ping: Keep an eye on the continuous ping test you started earlier. You may see a brief interruption in connectivity, but traffic should resume flowing through the backup EDGE.

note

The Virtual IP (VIP) itself is not pingable.
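
As an independent check from the LAN side, you can watch which device is sending VRRP advertisements, since only the current master sends them. The following is a sketch under assumptions: it requires a Linux host on the same LAN segment as the EDGEs with tcpdump installed, the switch must deliver the advertisements to that host, and vrrp_senders is a hypothetical helper name.

```shell
# Sketch only: vrrp_senders is a hypothetical helper.
# It reads tcpdump text output on stdin and prints the source address
# and priority of each VRRP advertisement.
vrrp_senders() {
  awk '/VRRP/ {
    src = ""; prio = ""
    for (i = 1; i <= NF; i++) {
      if ($i == ">") src = $(i - 1)
      if ($i == "prio") { prio = $(i + 1); sub(/,$/, "", prio) }
    }
    if (src != "") print src, "prio", prio
  }'
}

# Usage (IP protocol 112 is VRRP; -l line-buffers the output):
#   sudo tcpdump -l -n -i <interface> 'ip proto 112' | vrrp_senders
```

During a successful failover, the source address in the output should change from the primary EDGE's LAN address to that of the backup EDGE.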

Step 5: Verify connectivity

Once the failover is complete, thoroughly verify that network connectivity is fully restored.

  • Test application connectivity: Ensure that all applications that rely on the Anapaya EDGEs are functioning correctly.
  • Analyze packet loss: Review the results of your continuous ping test to determine the extent of packet loss during the failover.
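
The packet loss analysis can be sketched as follows, assuming the ping output was saved to a file and follows the Linux iputils format, in which each reply line contains "bytes from" and "icmp_seq=<n>". The helper name ping_gaps is hypothetical.

```shell
# Sketch only: ping_gaps is a hypothetical helper.
# It reads saved ping output on stdin and reports every run of
# missing replies, based on gaps in the icmp_seq numbers.
ping_gaps() {
  awk '
    /bytes from/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^icmp_seq=/) {
          split($i, kv, "=")
          seq = kv[2] + 0
          if (seen && seq > last + 1) {
            n = seq - last - 1
            printf "lost %d replies between seq %d and %d\n", n, last, seq
            lost += n
          }
          last = seq
          seen = 1
        }
      }
    }
    END { printf "total lost: %d\n", lost }
  '
}

# Usage:
#   ping_gaps < vrrp-test.log
```

Multiplying the number of lost replies by the ping interval gives an approximate duration of the interruption.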

Step 6: Restore the original primary EDGE

After successful testing, you can bring the original primary EDGE back online. In a standard VRRP configuration, the original primary should preempt the backup and become the primary again once it is fully operational. Monitor this process to ensure a smooth transition back to the original state.

Testing failover in a BGP deployment

Testing a BGP failover involves simulating a failure that causes the BGP session on the active EDGE to go down, prompting the LAN and SCION networks to reroute traffic through the secondary EDGE.

Step 1: Identify the active traffic path

In a BGP setup, both EDGEs can potentially be active. Use BGP monitoring tools, the traceroute command or the Grafana dashboards to identify the primary path that traffic is currently taking.
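
To compare the path before and after the failover, you can record the hops reported by traceroute. This is a minimal sketch, assuming a Linux host in the LAN and numeric traceroute output in which each hop line starts with the hop number followed by the address. The helper name hop_list and the file names are hypothetical.

```shell
# Sketch only: hop_list is a hypothetical helper.
# It reads "traceroute -n" output on stdin and prints
# "<hop number> <address>" for each hop line, skipping the header.
hop_list() {
  awk '$1 ~ /^[0-9]+$/ { print $1, $2 }'
}

# Usage:
#   traceroute -n <remote-ip> | hop_list > path-before.txt
#   (simulate the failure, wait for convergence)
#   traceroute -n <remote-ip> | hop_list > path-after.txt
#   diff path-before.txt path-after.txt
```

A hop that does not answer is printed as "*", so a difference between the two files does not always mean that the path changed.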

Step 2: Monitor connectivity

Before simulating a failure, establish a continuous ping (or similar) from a device in the LAN to a remote destination through the Anapaya EDGEs. This will help you observe any packet loss during the failover process.

Step 3: Simulate a failure

There are several ways to simulate a failure of the connectivity to the primary device:

  • EDGE shutdown/reboot: The first option is to shut down or reboot the primary EDGE.
  • BGP session shutdown: The most targeted approach is to administratively shut down the BGP session towards the primary Anapaya EDGE. This will cause it to withdraw its advertised routes, and the local network's BGP routers will converge on the routes advertised by the secondary EDGE.

Step 4: Monitor traffic rerouting

The key to a successful BGP failover test is to observe the BGP convergence process and the subsequent traffic rerouting.

  • Check routing tables: Examine the BGP routing tables on your local routers. The routes that were previously learned from the primary EDGE should be removed, and the routes from the secondary EDGE should become the active paths.
  • Continuous ping: Keep an eye on the continuous ping test you started earlier. You may see a brief interruption in connectivity, but traffic should resume flowing through the secondary EDGE.

Step 5: Verify connectivity

Once the failover is complete, thoroughly verify that network connectivity is fully restored.

  • Test application connectivity: Ensure that all applications that rely on the Anapaya EDGEs are functioning correctly.
  • Analyze packet loss: Review the results of your continuous ping test to determine the extent of packet loss during the failover.

Step 6: Restore the primary path

Once you have completed your testing, you can re-enable the BGP session towards the primary Anapaya EDGE. The network should then reconverge, and if your BGP policies are configured to prefer the primary path, traffic should shift back. Monitor this process to ensure it happens as expected.

Testing failover of the ISP

In addition to testing the failover between the Anapaya EDGEs, it's also important to test the failover of the ISPs connected to the EDGEs. This can be done by simulating a failure of the primary ISP connection and observing the failover to the secondary ISP.

info

No impact on IP connectivity is expected during this test, as the failover is handled on the SCION level.

Step 1: Identify the primary ISP connection

Before initiating the test, you need to determine which ISP connection is currently the primary. This can typically be done by inspecting the SCION tunneling routing state on the Anapaya EDGEs or by observing traffic flow on the Grafana dashboards.

Step 2: Monitor connectivity

Before simulating a failure, establish a continuous ping (or similar) from a device in the LAN to a remote destination through the Anapaya EDGEs. This will help you observe any packet loss during the failover process.

Step 3: Simulate a failure of the primary ISP connection

To simulate a failure of the primary ISP connection:

  • Disconnect the WAN interface: Disconnect the WAN interface of the primary ISP connection to trigger the failover mechanism, either by unplugging the WAN cable or by disabling the port on the switch to which the primary ISP connection is attached.

  • Set the WAN interface to down: If you have access to the Anapaya EDGE shell, you can set the WAN interface to down using the following command:

    ip link set dev <interface> down

warning

Setting the interface to down with the ip link command is only supported on Anapaya EDGE appliances running v0.40.0 or later.

Step 4: Monitor the path failover process

As soon as the failure is initiated, start monitoring the network traffic and the status of the secondary ISP connection. Use the SCION tunneling routing state to observe the path change.

Step 5: Verify connectivity

Once the failover is complete, thoroughly verify that network connectivity is fully restored.

  • Test application connectivity: Ensure that all applications that rely on the Anapaya EDGEs are functioning correctly.
  • Analyze packet loss: Review the results of your continuous ping test to determine the extent of packet loss during the failover.

Step 6: Restore the primary ISP connection

After successful testing, you can bring the primary ISP connection back online. Monitor this process to ensure a smooth transition back to the original state.

Either reconnect the cable or, if you set the interface to down, bring it back up using the following command:

    ip link set dev <interface> up