Network Monitoring – Traps vs. Polling

As a network administrator, I have been (partly) responsible for monitoring network infrastructures, and sometimes entire companies, for many customers. For some of them I have redesigned the monitoring completely.

Many companies currently use tools such as Nagios, SolarWinds or a similar package. In my opinion these tools are first-generation software, mainly because it takes a lot of time to set up the triggers. If you also have to deal with different vendors, it becomes even more complex: every vendor has its own event codes and descriptions, which leads to inconsistent and therefore unclear alerts.

Many parties also rely on SNMP traps. Although you will not be able to do without SNMP traps completely, relying only on these traps is dangerous, for several reasons.

Why we cannot trust SNMP Traps

An SNMP trap is a single UDP datagram sent by the device. Consider, for example, a temperature that exceeds its limit: the switch sends a single UDP datagram, once. Unlike TCP, UDP does not check whether the datagram ever arrived. TCP has retransmissions for this; UDP has no delivery control whatsoever.

In the real world, a switch can get into trouble due to (for example) a spanning-tree recalculation, and the SNMP trap does not arrive as a result. The notification will not be sent again and will never be registered. The same applies if the connection between your monitoring host and the device is unstable. That is not something you can count on during a disruption.
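To illustrate the fire-and-forget behaviour, here is a minimal Python sketch. It does not send a real SNMP trap; it only sends a plain UDP datagram to a local port where nothing is listening (the payload text and the helper name are made up for the illustration). The send call still succeeds, and the sender never learns that nobody received the message.

```python
import socket


def send_fire_and_forget(payload: bytes, host: str = "127.0.0.1") -> int:
    """Send one UDP datagram to a port where nothing is listening."""
    # Bind a throwaway socket to discover a free port, then close it again
    # so that nobody is listening on that port any more.
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    probe.bind((host, 0))
    unused_port = probe.getsockname()[1]
    probe.close()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # sendto() returns as soon as the datagram is handed to the kernel.
        # There is no acknowledgement and no retransmission.
        return sender.sendto(payload, (host, unused_port))
    finally:
        sender.close()


sent = send_fire_and_forget(b"temperature threshold exceeded")
print(f"sent {sent} bytes without any error or acknowledgement")
```

From the sender's point of view a lost trap looks exactly like a delivered one, which is why polling is needed as a safety net.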

Next-generation alerts

As I wrote above, I see the mentioned software packages as first-generation monitoring. This is mainly due to the way alerts are generated. For example, alerts are often set to go off when the bandwidth consumption of a gigabit interface reaches 90% or when a hard disk has only 20% space left.

But is it useful to know that bandwidth is going through that specific interface? Maybe someone is watching a video on Netflix that first has to buffer. Is it useful to know that a hard drive only has 20% space left? If it is a small disk it might, but if it's a 2TB disk then this does not seem worth mentioning.
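The difference between a brief peak and sustained load can be sketched in a few lines of Python. The sample values below are invented for the illustration, and the function name is my own; the idea is simply to look at how often an interface is above the threshold instead of alerting on a single sample.

```python
def fraction_above(samples_mbps, link_mbps=1000, threshold=0.9):
    """Return the fraction of samples at or above threshold * link speed."""
    limit = link_mbps * threshold
    hits = sum(1 for sample in samples_mbps if sample >= limit)
    return hits / len(samples_mbps)


# Invented samples: one short burst (someone buffering a video) ...
burst = [120, 950, 130, 110, 140, 125, 135, 115, 120, 130]
# ... versus an interface that is busy almost all the time.
saturated = [920, 950, 940, 910, 300, 930, 960, 925, 915, 935]

print(f"burst:     {fraction_above(burst):.0%} of samples above 90%")
print(f"saturated: {fraction_above(saturated):.0%} of samples above 90%")
```

A classic threshold alert fires in both cases; only the second one really needs attention.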

Future-proof approach to Monitoring and Alerts

The only way to generate correct alerts, ones that actually require action, is to base them on metrics. Metrics, metrics and I say again: metrics. You can collect these metrics from the devices via SNMP polling, and with the collected data a trend line can be mapped out. Using the examples above, trend analysis tells us whether the gigabit interface hits 90% consumption regularly or only occasionally. We can also determine how long it will take before a disk is really full and whether it requires action now.
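As a rough sketch of such a trend analysis, the Python snippet below fits a straight line (ordinary least squares) through daily disk usage samples and estimates how many days remain until the disk is full. The sample data and the function name are assumptions for the illustration; a real setup would use the metrics collected by polling and probably a more robust model.

```python
def days_until_full(samples, capacity_gb):
    """Estimate the days left until the disk is full.

    samples: list of (day, used_gb) tuples, oldest first.
    Returns None when usage is flat or shrinking.
    """
    n = len(samples)
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    variance = sum((day - mean_day) ** 2 for day, _ in samples)
    if variance == 0:
        return None
    # Least-squares slope: growth in GB per day.
    slope = sum((day - mean_day) * (used - mean_used)
                for day, used in samples) / variance
    if slope <= 0:
        return None
    intercept = mean_used - slope * mean_day
    day_full = (capacity_gb - intercept) / slope
    return day_full - samples[-1][0]


# Both disks have 20% space left and grow by 2 GB per day (invented data).
small_disk = [(0, 72), (1, 74), (2, 76), (3, 78), (4, 80)]            # 100 GB
large_disk = [(0, 1592), (1, 1594), (2, 1596), (3, 1598), (4, 1600)]  # 2 TB

print(days_until_full(small_disk, 100))    # about 10 days: act now
print(days_until_full(large_disk, 2000))   # about 200 days: no alert needed
```

The same "20% left" threshold means something completely different for the two disks, which is exactly why the forecast is more useful than the static percentage.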

Collecting metrics opens doors: you get much more insight into what is happening on the infrastructure, outages can be prevented instead of resolved, better advice can be given about capacity and growth, and life gets easier for administrators.

The last huge difference is that all alerts are displayed in the same, uniform way. We are no longer dependent on the different vendors!
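Part of what makes polling vendor-independent is that the standard interface counters (such as ifInOctets and its 64-bit variant ifHCInOctets from the IF-MIB) mean the same thing on every device. The sketch below shows how two polled counter values can be turned into a utilization figure. The numbers and the function name are assumptions for the example, and it only handles a single counter wrap between two polls.

```python
def utilization(prev_octets, curr_octets, interval_s, link_bps, counter_bits=64):
    """Convert two octet counter samples into link utilization (0.0 to 1.0)."""
    # The modulo takes care of one counter wrap between the two polls.
    delta_octets = (curr_octets - prev_octets) % (2 ** counter_bits)
    bits_per_second = delta_octets * 8 / interval_s
    return bits_per_second / link_bps


# Gigabit interface polled every 60 seconds (invented counter values).
busy = utilization(1_000_000_000, 7_750_000_000, 60, 1_000_000_000)
print(f"utilization: {busy:.0%}")

# A 32-bit counter that wrapped around between two polls.
wrapped = utilization(2**32 - 1000, 500, 60, 1_000_000_000, counter_bits=32)
print(f"octets transferred across the wrap: {wrapped * 1_000_000_000 * 60 / 8:.0f}")
```

Storing these values per poll is what produces the time series that the trend analysis works on.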

Steps to improve or set up network monitoring

  1. Draw the network infrastructure in a real-time map view
    No network administrator likes to make network drawings. Make it fun by playing with the real-time display of trends and data and at the same time gain insight into the network and up-to-date documentation.
  2. Transform from network monitoring to service and chain monitoring
    By gaining insight into the end-to-end availability of the service, it also becomes clear what the impact is for the customer.
  3. Reclassify alerts based on impact
    Considering point 2, we now know what the impact is for a customer. Many services are built redundantly, so the impact of a single failure is smaller. Combine this with point 1 and you have a real-time view of where the disruption occurs, without first having to search the various devices for half an hour.
  4. Create a performance baseline
    By measuring the performance (response time) of the customer's services in combination with the availability and response time of the network, you can quickly determine whether there is congestion.
  5. Work with trend analysis and forecast alerting
    By making use of all collected data, many false alerts can be prevented. The same data can also be used to forecast future capacity and availability.
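Steps 2 and 3 can be sketched as a small piece of Python. This is only an illustration under my own assumptions (the tier names and the three severity levels are made up): a service chain is modelled as a set of tiers, each with one or more redundant members, and the alert severity follows from the impact on the service rather than from the individual device.

```python
def chain_status(chain):
    """Classify the impact on a service chain.

    chain: dict mapping a tier name to a list of booleans (True = member up).
    Returns "ok", "warning" (redundancy lost) or "critical" (service down).
    """
    status = "ok"
    for members in chain.values():
        members_up = sum(members)
        if members_up == 0:
            # A whole tier is gone, so the service itself is unavailable.
            return "critical"
        if members_up < len(members):
            # Still working, but running without full redundancy.
            status = "warning"
    return status


healthy = {"firewall": [True, True], "core-switch": [True, True], "web": [True, True]}
degraded = {"firewall": [True, False], "core-switch": [True, True], "web": [True, True]}
outage = {"firewall": [False, False], "core-switch": [True, True], "web": [True, True]}

for name, chain in (("healthy", healthy), ("degraded", degraded), ("outage", outage)):
    print(name, "->", chain_status(chain))
```

A single failed firewall in a redundant pair then shows up as a warning, not as the same red alert as a complete outage.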

Benchmarking SSDs with fio

fio, which stands for Flexible I/O Tester, is a free and open-source disk I/O tool used for both benchmarking and stress/hardware verification. I mainly use it for benchmarking Ceph or specific SSD hardware.

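When benchmarking an SSD, make sure it is pre-warmed by writing to the whole device first. Be aware that this overwrites all data on the target device. This can be done using the dd command: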

dd if=/dev/zero of=/dev/xvdb bs=100M &

After this you can start performance measurement with fio. My advice is to run this test for 6 to 8 hours in order to get real data out of it.

fio --filename=/dev/nvmeXnXpX --direct=1 --rw=randwrite --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=128k --iodepth=16 --numjobs=1 --time_based --runtime=86400 --group_reporting --name=benchtest

This command will run for 24 hours and perform a random write-only workload with 128k blocks using a single process.

Random Read test

sudo fio --name=randread --ioengine=libaio --iodepth=16 --rw=randread --bs=4k --direct=0 --size=512M --numjobs=4 --runtime=240 --group_reporting

This will use 4 processes, run for at most 240 seconds (4 minutes) and only perform random reads. Note that --direct=0 means the reads go through the page cache.

Random Read/Write test

sudo fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

This will do a mixed random read/write test (75% reads, 25% writes) on a 4 GB file.
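When comparing the results of these tests, keep in mind that IOPS and throughput are linked through the block size. The small Python sketch below shows the arithmetic; the IOPS figure is an invented example, not a measurement, and the helper names are my own.

```python
KIB = 1024
MIB = 1024 * 1024


def throughput_mib_s(iops, block_size_bytes):
    """Throughput in MiB/s for a given IOPS figure and block size."""
    return iops * block_size_bytes / MIB


def iops_for_throughput(mib_s, block_size_bytes):
    """IOPS needed to reach a given throughput at a given block size."""
    return mib_s * MIB / block_size_bytes


# Invented example: 10,000 IOPS with 4k blocks ...
small_blocks = throughput_mib_s(10_000, 4 * KIB)
# ... needs far fewer operations for the same throughput with 128k blocks.
large_blocks = iops_for_throughput(small_blocks, 128 * KIB)

print(f"10000 IOPS at 4k    = {small_blocks} MiB/s")
print(f"same rate at 128k   = {large_blocks} IOPS")
```

This is why a 4k test is usually read as an IOPS benchmark and a 128k test as a throughput benchmark.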


Ansible through Ubuntu (WSL) on Windows 10

Windows Subsystem for Linux (WSL) allows you to run Linux straight from your Windows desktop. I use this on a daily basis for running Ansible scripts without having to install VMs. Make sure you have installed all the latest updates.

Enable WSL feature

Open a PowerShell window as Administrator (search for PowerShell, right-click and choose Run as Administrator).

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

This will start the installation and, once finished, ask whether you would like to reboot your system. Go ahead and do that. When the reboot is done, search for 'bash' and open it. It will first ask a few questions. Answer them all, and once that is done you will have Ubuntu up and running.

Install Ansible

Now you are basically in a Linux environment, so you can install Ansible the typical way. In the 'bash' window, use these commands:

sudo apt-get -y install python-pip python-dev libffi-dev libssl-dev
sudo pip install ansible

Should you get any permission errors (I did not this time, but given how WSL works that could happen), run pip install with the --user flag. This installs Ansible in the user's home directory instead of globally.

You are done. Using the following command you can check what ansible version is now installed:

ansible --version

If you need the most recent version, see my other post on installing the latest version of Ansible on Ubuntu.


Install the latest version of Ansible on Ubuntu 16.04 / 18.04

Sadly, Ubuntu doesn't ship with the newest version of Ansible out of the box. You have to manually configure the PPA on your system in order to upgrade to the latest stable version. Use these commands to add the PPA:

$ sudo apt update
$ sudo apt upgrade
$ sudo apt install software-properties-common
$ sudo apt-add-repository ppa:ansible/ansible

Hit Enter when asked, and once the process is done update your apt repos:

$ sudo apt update

Now you can either upgrade or simply install ansible:

$ sudo apt install ansible

That should be all. Use the following to verify the Ansible version:

$ ansible --version

Ubuntu 18.04 resize/expand (root) filesystem

Running Ubuntu 18.04, I ran out of disk space on my main partition. I increased the disk size in VMware and needed to expand the partitions from within Ubuntu. Start by rescanning the disk for changes (as root):

echo 1 > /sys/class/block/sda/device/rescan

Verify that you can see the new (correct) disk space using:

fdisk -l

Create a new partition using "cfdisk": navigate to the free space and select "New". Then select "Write" to make sure the partition table gets written. Close cfdisk and either reboot or rescan to update your partition table. Now it's time to add the disk space. First, find the new partition number:

fdisk -l

In my case sda3 was created, so I'm going to create a new physical volume on it:

pvcreate /dev/sda3

Now extend my volume group with the newly created physical volume:

vgextend ubuntu-dev-box /dev/sda3

Extend the volume with all available (new) disk space:

lvextend -l+100%FREE /dev/ubuntu-dev-box/root

Now resize the filesystem:

resize2fs /dev/mapper/ubuntu--dev--box-root
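The /dev/mapper name can be confusing when the volume group name itself contains hyphens: device-mapper doubles every hyphen inside the volume group and logical volume names and joins the two with a single hyphen. A small Python sketch of that naming rule (the helper name is my own):

```python
def dm_name(volume_group, logical_volume):
    """Build the /dev/mapper name for an LVM logical volume."""
    # Hyphens inside the names are doubled; a single hyphen separates
    # the volume group from the logical volume.
    vg = volume_group.replace("-", "--")
    lv = logical_volume.replace("-", "--")
    return f"/dev/mapper/{vg}-{lv}"


print(dm_name("ubuntu-dev-box", "root"))
```

If in doubt, /dev/ubuntu-dev-box/root (the same path used for lvextend) points to the same device.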

SSH Tunnel to watch Netflix

I often use a 'hopping server' when connecting to clients, which means I need to log in twice each time. To make my life easier I sometimes use an SSH tunnel so I can connect to clients directly.

An SSH tunnel can also be useful when your office blocks Netflix 😉

Local Port Forwarding

This allows you to access remote servers directly from your local computer. Let's assume you want to use RDP (3389) to connect to a client's host and your hopping server is 'hopping.server' (replace <client-host> with the address of the client's host):

ssh -L 6000:<client-host>:3389 wieger@hopping.server

Now you can open Remote Desktop and connect to 'localhost:6000', directing you through the tunnel!

Remote Port Forwarding

This will make your local service/port accessible from a remote host. Sometimes I use this to keep a 'backdoor' so I can log in remotely (from home or wherever).

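Let's say you want to make a web application (TCP 443) available on port 6000 of the remote SSH server:

ssh -R 6000:localhost:443 <user>@<remote-ssh-server>

Now you should be able to connect to port 6000 on the remote host. By default the forwarded port only listens on the loopback interface of the remote host.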

Dynamic Forwarding (Proxy)

This is ideal for people who want to use the internet safely/anonymously or for offices where Netflix is blocked 😉

Use a remote server (e.g. your home server) to tunnel all web traffic: connect to it through SSH using the -D flag:

ssh -D 6000 <user>@<remote-server>

Now open up your browser settings, navigate to the connection properties and manually enter a SOCKS proxy server. Use localhost as host and 6000 as port. The tunnel will remain open as long as you are connected through SSH.
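The three forwarding types differ mainly in the order and meaning of the fields after the flag. The Python sketch below only builds the command lines so the field order is explicit; it does not open any connection. The host name client.example is a made-up placeholder, and 'wieger@hopping.server' is reused from the example above for all three.

```python
import shlex


def local_forward(local_port, target_host, target_port, ssh_host):
    """-L: listen locally, connect to target_host:target_port via ssh_host."""
    return ["ssh", "-L", f"{local_port}:{target_host}:{target_port}", ssh_host]


def remote_forward(remote_port, local_host, local_port, ssh_host):
    """-R: listen on ssh_host, connect back to local_host:local_port."""
    return ["ssh", "-R", f"{remote_port}:{local_host}:{local_port}", ssh_host]


def dynamic_forward(local_port, ssh_host):
    """-D: open a local SOCKS proxy that tunnels through ssh_host."""
    return ["ssh", "-D", str(local_port), ssh_host]


commands = [
    local_forward(6000, "client.example", 3389, "wieger@hopping.server"),
    remote_forward(6000, "localhost", 443, "wieger@hopping.server"),
    dynamic_forward(6000, "wieger@hopping.server"),
]
for command in commands:
    print(shlex.join(command))
```

In all three cases the first port is the one that starts listening: locally for -L and -D, on the remote SSH server for -R.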


mod_pagespeed Module on Ubuntu 18.04

mod_pagespeed is an open-source Apache module created by Google to help make the web faster by rewriting web pages to reduce latency and bandwidth. mod_pagespeed releases are available as precompiled Linux packages or as source. (See the release notes for information about fixed bugs.)


  1. Update system

    apt update -y
    apt upgrade -y

  2. Install Apache

    apt-get install apache2 -y

  3. Enable Apache Startup

    systemctl start apache2
    systemctl enable apache2

  4. Install mod_pagespeed (after downloading the mod-pagespeed-stable_current_amd64.deb package)

    dpkg -i mod-pagespeed-stable_current_amd64.deb
    systemctl restart apache2

  5. Verify mod_pagespeed is running

    curl -D- localhost | head | grep -i pagespeed
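What this check looks for is the X-Mod-Pagespeed response header that the module adds. A minimal Python sketch of the same check on a block of raw response headers follows; the sample headers are made up for the illustration and the function name is my own.

```python
def pagespeed_version(raw_headers):
    """Return the X-Mod-Pagespeed header value, or None if it is absent."""
    for line in raw_headers.splitlines():
        name, separator, value = line.partition(":")
        # Header names are case-insensitive, so compare in lower case.
        if separator and name.strip().lower() == "x-mod-pagespeed":
            return value.strip()
    return None


# Made-up sample of what the response headers could look like.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache/2.4.29 (Ubuntu)\r\n"
    "X-Mod-Pagespeed: 1.13.35.2-0\r\n"
    "Content-Type: text/html\r\n"
)

print(pagespeed_version(sample))
```

If the function returns None for your own server's headers, the module is not rewriting that response.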

Web Interface

mod_pagespeed has a very simple web interface to view statistics. If you do not care about that, skip this step.

nano /etc/apache2/mods-available/pagespeed.conf

Add these lines to it:

<Location /ps_admin>
    Order allow,deny
    Allow from localhost
    Allow from <your IP address>
    Allow from all
    SetHandler pagespeed_admin
</Location>

<Location /ps_global_admin>
    Order allow,deny
    Allow from localhost
    Allow from <your IP address>
    Allow from all
    SetHandler pagespeed_global_admin
</Location>

After restarting apache you can go to http://<your url>/ps_admin


BGP Route-Leak causes a significant shift in European internet traffic to Chinese Backbone network

A series of unfortunate configuration mistakes?

Last Thursday, June 6, 2019, around 12:00 noon Dutch time, a two-hour outage began at various customers, affecting both business fiber-optic connections and private internet connections. At first, the extent of the disruption was not completely clear.

After several reports from customers I noticed that complaints were also coming in on Dutch websites from people with connection problems. It soon turned out to be a national outage that primarily affected KPN (and indirectly many other parties, including national payment systems).

A first traceroute showed that the traffic between Haarlem and Amsterdam went through different (completely illogical) paths. This had all the symptoms of a BGP hijack, a suspicion that a Dutch website also expressed.

Doing a bit more digging on the route my traffic was taking, I saw that my VPN connection between Haarlem and Amsterdam was suddenly taking a detour through AS4134, known as the Chinanet Backbone network.

For two hours, much of Europe's Internet traffic passed through Chinese networks

This incident occurred after a Swiss hosting provider (Safe Host SA) started leaking more than 70,000 faulty routes to the Chinese backbone network. This Chinese network, in turn, forwarded these IP address announcements as valid to various large Tier-1 internet providers, causing a huge traffic shift toward the Chinese backbone network. The incident had a huge impact on the networks of Swisscom (AS3303), KPN (AS1130), Bouygues Telecom (AS5410) and SFR (AS21502).

Error or intention?

The official statement is that this was a configuration error by the Swiss hoster Safe Host. The remarkable thing is that the incorrectly advertised ranges were smaller and more specific than the ones advertised legitimately. Because routers prefer the most specific matching prefix, such routes pull traffic away from the legitimate announcements. Some websites mention that this could indicate the use of route optimizers, but Safe Host SA confirms on Twitter that they do not use BGP optimization software.
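Why more specific ranges matter can be shown with a few lines of Python. This is only a sketch of longest-prefix matching using the standard ipaddress module; the prefixes come from a documentation range and the labels are invented, so they are not the routes that were actually leaked.

```python
import ipaddress


def best_route(destination, routes):
    """Pick the most specific (longest prefix) route that covers destination.

    routes: list of (prefix, origin_label) tuples.
    Returns (prefix, origin_label) or None when nothing matches.
    """
    address = ipaddress.ip_address(destination)
    candidates = []
    for prefix, origin in routes:
        network = ipaddress.ip_network(prefix)
        if address in network:
            candidates.append((network.prefixlen, prefix, origin))
    if not candidates:
        return None
    # The longest prefix length wins, regardless of who announced it first.
    _, prefix, origin = max(candidates)
    return prefix, origin


routes = [
    ("203.0.113.0/24", "legitimate announcement"),
    ("203.0.113.0/25", "leaked more-specific"),
    ("203.0.113.128/25", "leaked more-specific"),
]

print(best_route("203.0.113.10", routes))
```

The legitimate /24 is still in the table, but every address inside it is covered by a more specific /25, so all traffic follows the leaked routes.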

Now that Safe Host itself indicates that it is not the use of route optimizers that caused this disruption, the question arises how it is possible that such specific routes have been (wrongly) advertised. And why only to Chinanet Backbone? Safe Host SA is still investigating and not responding to questions on Twitter.

I am very curious about the explanation of this incident, if it ever comes.

Worldwide concern

Chris C. Demchak and Yuval Shavitt released a publication in 2018 entitled "The Hidden Story of China Telecom’s BGP Hijacking". In this publication they describe how China has been able to divert specific traffic through Chinese PoPs several times via BGP hijacks, and the possible implications of this.

They also describe their concerns about the large (infrastructural) network presence of the Chinese backbone network in America (which is no different in the EU), while no other international network has a similar presence in China. This large presence is one of the things that makes it easy for the Chinese to conduct a BGP hijack on a large scale while at the same time protecting their own infrastructure.

"Using these numerous PoPs, CT (China Telecom) has already relatively seamlessly hijacked domestic US and cross US traffic and redirected it to China over days, weeks, and months as demonstrated in the examples below. The patterns of traffic revealed in traceroute research suggest repetitive IP hijack attacks committed by China Telecom."

The report argues that the Chinese government is using local ISPs for intelligence gathering by systematically hijacking BGP routes to reroute western traffic through its country, where it can log it for later analysis, and provides the following examples:

  • Starting from February 2016 and for about 6 months, routes from Canada to Korean government sites were hijacked by China Telecom and routed through China
  • On October 2016, traffic from several locations in the USA to a large Anglo-American bank headquarters in Milan, Italy was hijacked by China Telecom to China
  • Traffic from Sweden and Norway to the Japanese network of a large American news organization was hijacked to China for about 6 weeks in April/May 2017.
  • Traffic to the mail server (and other IP addresses) of a large financial company in Thailand was hijacked several times during April, May, and July 2017.

Doug Madory from Oracle confirmed in a blog post published on 5 November 2018 that AS4134 was redirecting traffic.

"In this blog post, I don’t intend to address the paper’s claims around the motivations of these actions. However, there is truth to the assertion that China Telecom (whether intentionally or not) has misdirected internet traffic (including out of the United States) in recent years. I know because I expended a great deal of effort to stop it in 2017."

How to prevent/secure this?

The problem with incidents like this is that the internet runs on BGP. BGP is a global protocol that runs between organizations and countries, crossing international borders. There is no single centralized authority, just internet providers that collaborate based on trust. The internet is intended to be open and transparent, so all ISPs are trusted to play nice. There are initiatives that can improve the security of BGP in general, but these must be introduced on a large scale.

On the 12th of June, the Dutch National Cyber Security Center issued a report in which it once again shows how great the (digital) risks are for Dutch society. In this report they also specifically mention the influence of China (among others).

"Countries such as China, Iran and Russia have offensive cyber programs against the Netherlands. This means that these countries use digital means to achieve both geopolitical and economic objectives at the expense of Dutch interests."

It is time to immediately acknowledge and address these risks.


Test internet speed using speedtest-cli

speedtest-cli is a great tool to test your internet speed using the Speedtest servers. Make sure you have Python installed before installing speedtest-cli.

Installing and Using Speedtest-CLI

  1. Update APT and install packages

    apt-get update; apt-get install python-pip speedtest-cli

  2. Test your speed!

    speedtest-cli
    Testing download speed........................................
    Download: 913.12 Mbit/s
    Testing upload speed..................................................
    Upload: 524.12 Mbit/s

  3. Share your speed 🙂

    speedtest-cli --share
    This will provide you with an image to share proving the speed.
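If you want to log the results over time instead of sharing a picture, the text output shown in step 2 is easy to parse. Below is a small Python sketch that extracts the download and upload figures; it assumes the output format shown above, and the function name is my own.

```python
import re

# Matches lines such as "Download: 913.12 Mbit/s".
SPEED_LINE = re.compile(r"^\s*(Download|Upload):\s+([\d.]+)\s+Mbit/s", re.MULTILINE)


def parse_speedtest(output):
    """Return a dict such as {"download": 913.12, "upload": 524.12}."""
    return {kind.lower(): float(value) for kind, value in SPEED_LINE.findall(output)}


# The output from step 2 above.
output = """\
Testing download speed........................................
Download: 913.12 Mbit/s
Testing upload speed..................................................
Upload: 524.12 Mbit/s
"""

result = parse_speedtest(output)
print(result)
```

Storing these two numbers periodically gives you a trend line for your internet connection as well.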


Load Balancing Remote Desktop

Using HAProxy to load balance between RDS servers is useful if you have more than one RDS server and want users to connect to a single IP.

  1. Install HAProxy

    sudo apt-get update
    sudo apt-get install haproxy

  2. Add the RDP VIP (virtual IP) and RDP hosts to /etc/haproxy/haproxy.cfg

    listen VIP1
        bind 193.x.x.x:3389
        mode tcp
        timeout client 1h
        timeout server 1h
        tcp-request inspect-delay 5s
        tcp-request content accept if RDP_COOKIE
        persist rdp-cookie
        balance rdp-cookie
        option tcpka
        option tcplog
        server win2k19-A <ip-of-win2k19-A>:3389 weight 10 check inter 2000 rise 2 fall 3
        server win2k19-B <ip-of-win2k19-B>:3389 weight 10 check inter 2000 rise 2 fall 3
        option redispatch

Now we have HAProxy running on a 193.x.x.x IP address. When you connect to that IP, it will direct you to one of the Windows 2019 machines. If one dies, HAProxy will remove it from the pool and you can reconnect to the one that is still online.
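The 'check inter 2000 rise 2 fall 3' part decides when a server is considered dead or alive again: a check runs every 2000 ms, a server is marked down after 3 consecutive failed checks and marked up again after 2 consecutive successful ones. The Python sketch below is a simplified model of that rise/fall logic, written for illustration only; it is not HAProxy's actual implementation.

```python
class HealthState:
    """Simplified rise/fall model of a health-checked server."""

    def __init__(self, rise=2, fall=3, up=True):
        self.rise = rise
        self.fall = fall
        self.up = up
        self.streak = 0  # consecutive checks that contradict the current state

    def record(self, check_ok):
        """Feed one check result and return whether the server counts as up."""
        if check_ok == self.up:
            # The result confirms the current state, so reset the streak.
            self.streak = 0
        else:
            self.streak += 1
            needed = self.fall if self.up else self.rise
            if self.streak >= needed:
                self.up = check_ok
                self.streak = 0
        return self.up


server = HealthState(rise=2, fall=3)
checks = [False, False, False, True, True]
history = [server.record(result) for result in checks]
print(history)
```

With a 2000 ms interval this means a dead server is taken out of rotation after roughly six seconds, and a recovered one returns after about four.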