VyOS - 100Gbit BGP - Part 2
In the previous blog I ran a test between two VyOS hosts that were indirectly connected through two switches. It was amazing how easily VyOS could achieve these speeds with just a few minor tweaks. However, that only tests a small part of the entire stack. In this blog I add two new 100G hosts and routing, and instead of testing traffic between two directly connected hosts, I test between two indirectly connected hosts whose traffic has to pass through the 100G VyOS host.
The new hosts contain different cards than the hosts we previously tested, namely the Mellanox MT27800 (ConnectX-5). After performing all the steps from the previous blog, the speed remained low (<10 Gbit/s). The solution, which seems to be specific to Mellanox, was disabling adaptive interrupt coalescing:
ethtool -C <interface> adaptive-rx off
ethtool -C <interface> adaptive-tx off
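Note that ethtool needs the uppercase -C to change coalescing settings; the lowercase -c only reads them back. A minimal sketch for applying and verifying the change on both ConnectX-5 ports (the interface names here are an assumption, substitute your own):

# Disable adaptive coalescing on both assumed Mellanox ports, then verify
for i in eth2 eth3; do
    ethtool -C "$i" adaptive-rx off adaptive-tx off
    ethtool -c "$i" | grep -i adaptive   # should report: Adaptive RX: off  TX: off
done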
Specifications of the newly added hosts:
- AMD EPYC 7543P 32-Core Processor
- 128GB DDR4 RAM
- 2x 1TB NVMe
- 1x Mellanox MT27800 (2x 100Gbps)
- 1x Intel X710 (2x 10Gbps) - not in use currently
The current setup
How it is all connected
This is a representation of the current setup: two physically separate locations with several hundred kilometers of DWDM between them. Latency is very stable at 5 ms. On the right side you will find the two 100G servers I used for the previous blog.
I'll skip the routing configuration because that's a different topic; it can be done in several ways, each with its advantages and disadvantages. In my case I've configured BGP on all hosts (a minimal sketch follows the traceroute below) and can now trace the whole path:
traceroute to 10.255.255.150 (10.255.255.150), 30 hops max, 60 byte packets
1 192.168.3.100 (192.168.3.100) 4.794 ms 4.793 ms 4.728 ms
2 192.168.1.200 (192.168.1.200) 4.986 ms 5.030 ms 4.954 ms
3 10.255.255.150 (10.255.255.150) 9.927 ms *
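For completeness, a rough sketch of what the BGP part of such a config can look like in VyOS 1.4 syntax; the AS numbers, neighbor address and announced network below are assumptions for illustration, and older releases use set protocols bgp <asn> ... instead of system-as:

# Hypothetical AS numbers and prefixes, one neighbor shown
set protocols bgp system-as 65001
set protocols bgp neighbor 192.168.3.100 remote-as 65002
set protocols bgp neighbor 192.168.3.100 address-family ipv4-unicast
set protocols bgp address-family ipv4-unicast network 10.255.255.0/24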
Initial results
Again, not what we were hoping for.
With the current optimizations from the previous blog, I achieve approximately 50 Gbit/s with -P 5, a lot worse than what I saw between the two directly connected hosts. A single iperf stream (without the -P flag) shows roughly 10 Gbit/s.
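The output below comes from a run along these lines (iperf2 syntax; the destination address is an assumption based on the traceroute above):

# Five parallel TCP streams for ten seconds, across the routed path
iperf -c 10.255.255.150 -P 5 -t 10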
[ 4] 0.0000-10.0170 sec 10.1 GBytes 8.66 Gbits/sec
[ 2] 0.0000-10.0169 sec 10.2 GBytes 8.75 Gbits/sec
[ 5] 0.0000-10.0169 sec 10.0 GBytes 8.58 Gbits/sec
[ 3] 0.0000-10.0168 sec 10.1 GBytes 8.65 Gbits/sec
[ 1] 0.0000-10.0167 sec 10.2 GBytes 8.71 Gbits/sec
[SUM] 0.0000-10.0008 sec 50.5 GBytes 43.4 Gbits/sec
Let's take a step back and see what's going on by running an iperf between two servers connected via DWDM:
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0103 sec 12.7 GBytes 10.9 Gbits/sec
[ 7] 0.0000-10.0103 sec 15.6 GBytes 13.4 Gbits/sec
[ 8] 0.0000-10.0103 sec 16.9 GBytes 14.5 Gbits/sec
[ 1] 0.0000-10.0103 sec 16.3 GBytes 14.0 Gbits/sec
[ 4] 0.0000-10.0264 sec 12.5 GBytes 10.7 Gbits/sec
[ 5] 0.0000-10.0103 sec 12.3 GBytes 10.5 Gbits/sec
[ 2] 0.0000-10.0266 sec 12.0 GBytes 10.2 Gbits/sec
[ 3] 0.0000-10.0264 sec 9.97 GBytes 8.54 Gbits/sec
[SUM] 0.0000-10.0021 sec 108 GBytes 92.9 Gbits/sec
Exactly the same iperf, but now to a server one hop further:
[ ID] Interval Transfer Bandwidth
[ 4] 0.0000-10.0173 sec 8.54 GBytes 7.32 Gbits/sec
[ 8] 0.0000-10.0172 sec 8.16 GBytes 7.00 Gbits/sec
[ 7] 0.0000-10.0171 sec 8.58 GBytes 7.36 Gbits/sec
[ 5] 0.0000-10.0172 sec 8.38 GBytes 7.18 Gbits/sec
[ 1] 0.0000-10.0171 sec 8.48 GBytes 7.27 Gbits/sec
[ 2] 0.0000-10.0171 sec 8.94 GBytes 7.67 Gbits/sec
[ 6] 0.0000-10.0171 sec 7.91 GBytes 6.79 Gbits/sec
[ 3] 0.0000-10.0333 sec 7.83 GBytes 6.71 Gbits/sec
[SUM] 0.0000-10.0051 sec 66.8 GBytes 57.4 Gbits/sec
This suggests the forwarding host itself is the bottleneck: by default all packets arriving on a given hardware queue are processed on a single core, and Receive Packet Steering (RPS) lets the kernel spread that work across more CPUs. When enabling RPS I ran into a bug, so I set the masks directly via sysfs with the command below:
find /sys/class/net/*/queues/[rt]x-[01234567]/[rx]ps_cpus -exec sh -c '[ -w {} ] && echo f > {} 2>/dev/null' \;
The speed looked a lot better after that, around 85Gbit/s of forwarding speed!
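For clarity: that one-liner writes the mask f (binary 1111, i.e. CPUs 0-3) into rps_cpus of receive queues 0-7 and xps_cpus of transmit queues 0-7 of every interface. Per queue it expands to something like the following (eth2 and queue 0 are just examples; a wider mask such as ffffffff would include all 32 cores):

# Steer receive processing of eth2 queue rx-0 to CPUs 0-3 (mask f = 0b1111)
echo f > /sys/class/net/eth2/queues/rx-0/rps_cpus
# The transmit-side equivalent (XPS) for queue tx-0
echo f > /sys/class/net/eth2/queues/tx-0/xps_cpus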
Since throughput directly between the hosts is still better than the forwarded throughput, I made one last adjustment to the Broadcom 100G cards: increasing the number of channels. Each combined channel is a receive/transmit queue pair with its own interrupt, so more channels allow more cores to take part in packet processing:
# ethtool -l eth2
Channel parameters for eth2:
Pre-set maximums:
RX: 37
TX: 37
Other: n/a
Combined: 74
Current hardware settings:
RX: 0
TX: 0
Other: n/a
Combined: 16
[edit]
# ethtool -L eth2 combined 74
[edit]
# ethtool -L eth3 combined 74
[edit]
After this change, the speed to the first, second and third hop is identical (92 Gbit/s). That was quite easy!
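One caveat: ethtool settings do not survive a reboot. On VyOS, a way to make them persistent is the post-configuration boot script, roughly like this (interface names as above):

#!/bin/bash
# /config/scripts/vyos-postconfig-bootup.script
# Re-apply the channel count on the router's Broadcom ports after boot;
# the same approach works for the coalescing fix on the Mellanox hosts.
for i in eth2 eth3; do
    ethtool -L "$i" combined 74
done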