Categories: Linux

Apply Database Partitions to a live Zabbix database – without downtime

Due to the growth of our database (> 1 TB), the Zabbix 'housekeeper' no longer worked properly. The best solution to this problem is database partitioning; however, with a database of this size, that takes a lot of time if you want to keep the data. We tried several approaches, and the one below was the only way we managed to implement partitioning without downtime.

The example below must be repeated for each table and takes several hours per table.

# Create an empty copy of the table that is to be partitioned
CREATE TABLE `history_log_tmp` LIKE `history_log`;
# Apply partitioning to the (still empty) copy; with the commonly used
# partitioning procedure the arguments are: schema, table, days of history
# to keep, partition interval in hours, and partitions to pre-create
CALL partition_maintenance('zabbix', 'history_log_tmp', 30, 24, 3);

# Swap the tables so Zabbix starts writing to the new, partitioned table,
# keeping the old one as a backup. A single RENAME TABLE statement is atomic
# (wrapping separate renames in BEGIN/COMMIT does not help, because RENAME
# TABLE causes an implicit commit), so Zabbix never sees a missing table.
RENAME TABLE history_log TO history_backup_log,
             history_log_tmp TO history_log;

# Export all data from the backup table to a file
# (SELECT ... INTO OUTFILE writes tab-separated values by default)
SELECT * INTO OUTFILE '/var/lib/mysql-files/history_backup_log.sql' FROM history_backup_log;

# Open MySQL Shell and start the import
# (util.importTable requires local_infile to be enabled on the server)
mysqlsh
shell.connect('localhost:3306')
util.importTable("/var/lib/mysql-files/history_backup_log.sql", {
    schema: "zabbix",
    table: "history_log",
    columns: ["itemid", "clock", "value", "ns"],
    dialect: "default",
    skipRows: 0,
    showProgress: true,
    fieldsOptionallyEnclosed: false,
    linesTerminatedBy: "\n",
    threads: 2,
    bytesPerChunk: "50M",
    maxRate: "10M"
})
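
As an optional sanity check (just a sketch, querying MySQL's standard information_schema; the actual partition names depend on the partitioning procedure you used), you can confirm that the table Zabbix is now writing to is really partitioned:

# List the partitions of the swapped-in table; a non-partitioned table
# would show a single row with PARTITION_NAME = NULL
SELECT PARTITION_NAME, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'zabbix' AND TABLE_NAME = 'history_log';
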
Categories: Ansible, Security

Ansible Tower – Custom Credentials Type

Within playbooks you occasionally connect to external applications or services, in my case Zabbix and ServiceNow. Because I need login details for these and do not want to leave them in plain text in playbooks, I use a 'Custom Credential Type'. The advantage of this is that I can use the login details within a playbook (as variables) while they are stored encrypted in Ansible Tower.

I first create a new credential type by defining the fields it will have and how these will be passed to my playbook. Credential types consist of two parts: "inputs" and "injectors".

  • Inputs:
    define the value types that are used for this credential – such as a username, a password, a token, or any other identifier that’s part of the credential.
  • Injectors:
    describe how these credentials are exposed for Ansible (or us) to use – this can be Ansible extra variables, environment variables, or templated file content.

Both of these configurations are specified as YAML or as JSON. In my case, the new credential type is called "ServiceNow", and I'm providing the instance, username and password as part of this credential type:

fields:
  - id: instance
    type: string
    label: ServiceNow Instance
  - id: username
    type: string
    label: ServiceNow Username
  - id: password
    type: string
    label: ServiceNow password
    secret: true
required:
  - instance
  - username
  - password

Then in the Injector configuration:

extra_vars:
  snow_instance: '{{ instance }}'
  snow_password: '{{ password }}'
  snow_username: '{{ username }}'

Now go to Credentials and add a new one, selecting "ServiceNow" as the Credential Type.

That's it! When you link this credential to your job template, you can use these credentials from within your playbook, as in the sketch below.
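
For example, a minimal playbook sketch that uses the injected variables; the uri task and the ServiceNow Table API call here are only an illustration:

---
- hosts: localhost
  gather_facts: false
  tasks:
    # snow_instance, snow_username and snow_password are injected by the
    # custom credential type; no secrets live in the playbook itself.
    - name: Query ServiceNow with the injected credentials
      ansible.builtin.uri:
        url: "https://{{ snow_instance }}.service-now.com/api/now/table/incident?sysparm_limit=1"
        user: "{{ snow_username }}"
        password: "{{ snow_password }}"
        force_basic_auth: true
        return_content: true
      register: snow_result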

Categories: Networking

Network Monitoring – Traps vs. Polling

As a network administrator, I have been (partly) responsible for monitoring network infrastructures, and sometimes entire companies, for many customers. For some of them I have even redesigned the monitoring completely.

Many companies currently use tools such as Nagios, SolarWinds or a similar package. These tools are (at least in my opinion) first-generation software packages, because setting up the triggers in particular takes a lot of time. If you also have to deal with different vendors, it becomes even more complex: every vendor has its own event codes and descriptions, which leads to inconsistent and therefore unclear alerts.

Many parties also rely on SNMP traps. Although you will not be able to do away with SNMP traps completely, relying only on these traps is dangerous, for several reasons.

Why we cannot trust SNMP Traps

An SNMP trap is a single UDP datagram sent by the device. Consider, for example, a temperature that exceeds a limit: the switch sends one UDP datagram, once. Unlike TCP, there is no check whether the datagram ever arrived; TCP has retransmissions for this, while UDP offers no delivery guarantee whatsoever.
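
To make this concrete, here is a minimal sketch in Python (using a TEST-NET address that nothing listens on) showing that a UDP sender gets no feedback at all, which is exactly the fate of a trap lost in transit:

import socket

# A UDP "send" succeeds locally even if nobody ever receives the datagram.
# The sender gets no error and will not retransmit; a lost SNMP trap is
# simply gone. 192.0.2.1 is a TEST-NET-1 address, 162 is the trap port.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(b"fake trap", ("192.0.2.1", 162))
print("sendto returned without error - no delivery guarantee")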

In the real world, a switch can get into trouble (for example due to a spanning-tree recalculation), and as a result the SNMP trap never arrives. Such notifications are not resent and will never be registered. The same applies when the connection between your monitoring host and the device is unstable. Not something you can count on in the event of disruptions.

Next-generation alerts

As I wrote above, I see the packages mentioned as first-generation monitoring, mainly because of the way they generate alerts. For example, alerts are often configured to fire when 90% of a gigabit interface's bandwidth is in use, or when a hard disk has only 20% of its space left.

But is it useful to know that a lot of bandwidth is going through that specific interface? Maybe someone is watching a Netflix video that first has to buffer. Is it useful to know that a hard drive has only 20% of its space left? On a small disk it might be, but on a 2 TB disk that hardly seems worth mentioning.

Future-proof approach to Monitoring and Alerts

The only way to generate correct alerts, ones that actually require action, is to base them on metrics. Metrics, metrics, and I say it again: metrics. You can collect these metrics via SNMP polling on the devices, and with the collected data a trend line can be mapped out. Using the examples above, a trend analysis tells us whether the gigabit interface regularly hits 90% utilisation. We can also determine how long it will take before a disk is really full and whether that requires action now. A sketch of what such trend-based triggers could look like follows below.
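
As an illustration (a sketch only, assuming Zabbix 4.x trigger syntax, a hypothetical host 'server', and items with per-second preprocessing where applicable), trend-based triggers fire on the predicted future rather than on a fixed threshold:

# Alert only when the root filesystem is predicted to run out of space
# within a week, based on the last hour of collected metrics
{server:vfs.fs.size[/,free].timeleft(1h,,0)}<1w

# Alert only when inbound traffic on eth0 is forecast to exceed 900M
# half an hour from now, based on the last hour of metrics
{server:net.if.in[eth0].forecast(1h,,30m)}>900M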

Collecting metrics opens doors: much more insight into what is happening on the infrastructure, outages that can be prevented instead of resolved, better advice about capacity and growth, and an easier life for administrators.

The last huge difference is that all alerts are presented in the same, uniform way. We are no longer dependent on the different vendors!

Steps to improve or setup network monitoring

  1. Draw the network infrastructure in a real-time map view
    No network administrator likes to make network drawings. Make it fun by playing with the real-time display of trends and data, and at the same time gain insight into the network and keep the documentation up to date.
  2. Transform from network monitoring to service and chain monitoring
    By gaining insight into the actual availability of the service, it also becomes clear what the impact is for the customer.
  3. Reclassify alerts based on impact
    Given point 2, we now know what the impact is for a customer. Many services are set up redundantly, which reduces that impact. Combine this with point 1 and you have a real-time view of where a disruption occurs, without first having to search through the various devices for half an hour.
  4. Create a performance baseline
    By measuring the customer's services on performance (response time), in combination with the availability and response time of the network, it can quickly be determined whether there is congestion.
  5. Work with trend analysis and forecast alerting
    By making use of all the collected data, many false alerts can be prevented. This data also gives an indication of future capacity and availability.