VyOS - 100Gbit BGP - Part 2

In the previous blog I ran a test between two VyOS hosts that were indirectly connected to each other through two switches. It was amazing how easily VyOS could reach these speeds with just a few minor tweaks. However, that only exercises a small part of the entire stack. In this blog I add two new 100G hosts and add routing, and instead of testing traffic between two directly connected hosts, I test between two hosts that are only indirectly connected, so the traffic has to be forwarded by the 100G VyOS hosts in the middle.

The new hosts contain different NICs than the hosts we previously tested, namely the Mellanox MT27800 (ConnectX-5). After performing all the steps from the previous blog, the speed remained low (<10 Gbps). The fix turned out to be the following, which seems to be specific to Mellanox:

ethtool -C <interface> adaptive-rx off
ethtool -C <interface> adaptive-tx off
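
Note that the uppercase -C is the set variant; the lowercase -c only shows the coalescing parameters, which makes it handy for verifying that adaptive moderation is really off:

ethtool -c <interface> | grep -i adaptive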

Specifications of the newly added hosts:

  • AMD EPYC 7543P 32-Core Processor
  • 128GB DDR4 RAM
  • 2x 1TB NVMe
  • 1x Mellanox MT27800 (2x 100Gbps)
  • 1x Intel X710 (2x 10Gbps) - not in use currently

The current setup

How is it all connected

This is a representation of the current setup. These are two physically separate locations with several hundred kilometers of DWDM between them. Latency is very stable at 5 ms. On the right side you will find the two 100G servers I used for the previous blog.

I'll skip the routing configuration because that's a topic of its own; it can be done in several ways, each with its own advantages and disadvantages. In my case I've configured BGP on all hosts (a minimal sketch follows the traceroute below) and can now trace the whole path:

traceroute to 10.255.255.150 (10.255.255.150), 30 hops max, 60 byte packets
 1  192.168.3.100 (192.168.3.100)  4.794 ms  4.793 ms  4.728 ms
 2  192.168.1.200 (192.168.1.200)  4.986 ms  5.030 ms  4.954 ms
 3  10.255.255.150 (10.255.255.150)  9.927 ms *
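
For anyone looking for a starting point, a minimal BGP session on VyOS looks roughly like the snippet below (1.4-style syntax). The AS numbers and the announced prefix are purely illustrative and not my actual configuration; the neighbor address is simply the first hop from the traceroute above.

set protocols bgp system-as 65001
set protocols bgp neighbor 192.168.3.100 remote-as 65002
set protocols bgp neighbor 192.168.3.100 address-family ipv4-unicast
set protocols bgp address-family ipv4-unicast network 10.255.255.0/24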

Initial results

Again, not what we were hoping for.

With the current optimizations from the previous blog I achieve approximately 50 Gbit/s with -P 5, a lot worse than what I saw between the two directly connected hosts. A single iperf stream (without the -P flag) reaches roughly 10 Gbit/s.
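
For reference, this boils down to a plain multi-stream iperf towards the far host from the traceroute above; my exact invocation may have differed slightly:

iperf -c 10.255.255.150 -P 5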

[  4] 0.0000-10.0170 sec  10.1 GBytes  8.66 Gbits/sec
[  2] 0.0000-10.0169 sec  10.2 GBytes  8.75 Gbits/sec
[  5] 0.0000-10.0169 sec  10.0 GBytes  8.58 Gbits/sec
[  3] 0.0000-10.0168 sec  10.1 GBytes  8.65 Gbits/sec
[  1] 0.0000-10.0167 sec  10.2 GBytes  8.71 Gbits/sec
[SUM] 0.0000-10.0008 sec  50.5 GBytes  43.4 Gbits/sec

Let's take a step back and see what's going on by running an iperf between two servers connected via DWDM:

[ ID] Interval       Transfer     Bandwidth
[  6] 0.0000-10.0103 sec  12.7 GBytes  10.9 Gbits/sec
[  7] 0.0000-10.0103 sec  15.6 GBytes  13.4 Gbits/sec
[  8] 0.0000-10.0103 sec  16.9 GBytes  14.5 Gbits/sec
[  1] 0.0000-10.0103 sec  16.3 GBytes  14.0 Gbits/sec
[  4] 0.0000-10.0264 sec  12.5 GBytes  10.7 Gbits/sec
[  5] 0.0000-10.0103 sec  12.3 GBytes  10.5 Gbits/sec
[  2] 0.0000-10.0266 sec  12.0 GBytes  10.2 Gbits/sec
[  3] 0.0000-10.0264 sec  9.97 GBytes  8.54 Gbits/sec
[SUM] 0.0000-10.0021 sec   108 GBytes  92.9 Gbits/sec

Exactly the same iperf but now to a server 1 hop further:

[ ID] Interval       Transfer     Bandwidth
[  4] 0.0000-10.0173 sec  8.54 GBytes  7.32 Gbits/sec
[  8] 0.0000-10.0172 sec  8.16 GBytes  7.00 Gbits/sec
[  7] 0.0000-10.0171 sec  8.58 GBytes  7.36 Gbits/sec
[  5] 0.0000-10.0172 sec  8.38 GBytes  7.18 Gbits/sec
[  1] 0.0000-10.0171 sec  8.48 GBytes  7.27 Gbits/sec
[  2] 0.0000-10.0171 sec  8.94 GBytes  7.67 Gbits/sec
[  6] 0.0000-10.0171 sec  7.91 GBytes  6.79 Gbits/sec
[  3] 0.0000-10.0333 sec  7.83 GBytes  6.71 Gbits/sec
[SUM] 0.0000-10.0051 sec  66.8 GBytes  57.4 Gbits/sec

When enabling Receive Packet Steering (RPS) I ran into a bug, so I set the RPS/XPS masks directly via sysfs with the command below:

find /sys/class/net/*/queues/[rt]x-[01234567]/[rx]ps_cpus -exec sh -c '[ -w {} ] && echo f > {} 2>/dev/null' \;
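
That one-liner writes the CPU mask f (CPUs 0-3) into every writable RPS (rps_cpus) and XPS (xps_cpus) mask for queues 0-7. A more readable equivalent, assuming the same mask and queue range, would be something like:

for f in /sys/class/net/*/queues/rx-[0-7]/rps_cpus \
         /sys/class/net/*/queues/tx-[0-7]/xps_cpus; do
    # skip masks that do not exist or are not writable
    [ -w "$f" ] && echo f > "$f"
done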

The speed looked a lot better after that, around 85 Gbit/s of forwarding throughput!

Since throughput between the directly connected hosts is still higher than the forwarded traffic, I made one last adjustment to the Broadcom 100G cards, increasing the number of channels:

# ethtool -l eth2
Channel parameters for eth2:
Pre-set maximums:
RX:		37
TX:		37
Other:		n/a
Combined:	74
Current hardware settings:
RX:		0
TX:		0
Other:		n/a
Combined:	16
[edit]
# ethtool -L eth2 combined 74
[edit]
# ethtool -L eth3 combined 74
[edit]
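
Keep in mind that ethtool changes like these are not persistent. One generic way to reapply them after a reboot on VyOS (not something I cover in this post) is the post-config bootup script, which runs once the configuration has been loaded:

# /config/scripts/vyos-postconfig-bootup.script
ethtool -L eth2 combined 74
ethtool -L eth3 combined 74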

After this change, the throughput towards the first, second and third hop is identical (92 Gbit/s). That was quite easy!