NVME / SSD block cache – bcache vs lvmcache benchmark

Why care about IO performance?

Recently I’ve acquired some new hardware and I want it to perform as fast as possible. The setup is quite trivial for a home desktop; nevertheless, I wanted it to excel at IO performance as it will also be used as my backup server. A common way to improve performance is by adding a cache system; this applies to many things in IT, and block devices are no exception.

The relevant hardware components for this post are the 7x 2TB drives and 1x NVME card. The setup is not ideal as the models are not all the same: some perform better than others, some are newer and others older. Nevertheless, money and storage capacity were important, and I wanted to use these drives anyway. Security is also very important, so all the data written to these drives (including the NVME card) must be encrypted. On the other hand, I want to be able to expand the raid devices when the time comes, so I also use LVM; as the file system I use XFS with the default settings.

You may wonder why I didn’t use a simpler setup with BTRFS or ZFS. Mostly because I wanted to use raid 5 or 6, and on BTRFS stability is still an issue with this form of raid. With ZFS, on the other hand, it would be difficult to grow the pool in the future.

The logical setup is as follows (a rough sketch of the commands used to build this stack appears after the list):

  • 1x raid 5 with 6 drives (+1 hot spare)
  • LVM on top of the raid device
  • Cache device or Logical volume
  • Block encryption layer – LUKS
  • File system – XFS
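
Below is a minimal sketch of how such a stack can be assembled; the device names, volume group name and sizes are illustrative placeholders rather than my exact setup, and the cache layer is added differently for lvmcache and bcache (see the test sections further down):

# RAID 5 with 6 active drives plus 1 hot spare (drive names are placeholders)
mdadm --create /dev/md0 --level=5 --raid-devices=6 --spare-devices=1 /dev/sd[b-h]

# LVM on top of the raid device
pvcreate /dev/md0
vgcreate vg_data /dev/md0
lvcreate -n lv_data -l 90%FREE vg_data

# LUKS encryption layer, then XFS with the default settings
cryptsetup luksFormat /dev/vg_data/lv_data
cryptsetup open /dev/vg_data/lv_data data_crypt
mkfs.xfs /dev/mapper/data_crypt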

The hardware list

  • NVME Samsung SSD 960 PRO 512GB
  • ST2000VN004-2E4164
  • ST2000VN004-2E4164
  • ST2000VN004-2E4164
  • WDC WD2003FYPS-27Y2B0
  • WDC WD200MFYYZ-01D45B1
  • WDC WD20EZRX-00D8PB0
  • WDC WD2000FYYZ-01UL1B1

The NVME device is used for the OS, home, etc., but it also contains an LVM logical volume to be used as cache for the raid device. The IOPS / bandwidth figures for the NVME are rather high: it goes all the way up to 440,000 IOPS with a bandwidth of 3.5GB/s, which is quite insane and more than I’ll ever exhaust with my day-to-day use, so it can spare a few IOPS to make my backups go a bit faster.

I’ve tested bcache and lvmcache; as a benchmark tool I’ve used iozone. I’ve done the tests with 256kB, 1MB and 8MB block sizes, and the test file is 96GB (as it needs to be bigger than the total RAM amount, 64GB).
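
The iozone runs looked roughly like this; the mount point is a placeholder and the exact flag combination is an approximation of the settings described above, not the literal command line I ran:

# One run per record size: -i 0/1/2 select the write, read and random
# read/write tests, -r is the record size and -s the test file size
for bs in 256k 1m 8m; do
    iozone -i 0 -i 1 -i 2 -r ${bs} -s 96g -f /mnt/test/iozone.tmp
done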

The initial test was made using the full setup without any caching system; it serves as the baseline for comparison.

Each test was done with the 3 different block sizes (256kB, 1MB, 8MB), and the cache mode for all the cached tests is “writeback”. The common stack:

  • md device, raid 5
  • lvm volume
  • luks
  • xfs

Results

Using no cache

Test setup


  • MD Raid 5
  • LVM lv data
  • Luks
  • XFS

Using lvmcache

Test setup

  • MD Raid 5
  • LVM lv data
  • LVM lv meta
  • LVM lv cache
  • LVM lv cache pool
  • Luks
  • XFS
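
A rough sketch of how this lvmcache stack can be built, assuming the NVME partition was added as a physical volume to the same volume group (device names and sizes are illustrative):

# Cache data and metadata LVs placed on the NVME PV
lvcreate -n lv_cache -L 100G vg_data /dev/nvme0n1p4
lvcreate -n lv_cache_meta -L 1G vg_data /dev/nvme0n1p4

# Combine them into a writeback cache pool and attach it to the data LV
lvconvert --type cache-pool --cachemode writeback --poolmetadata vg_data/lv_cache_meta vg_data/lv_cache
lvconvert --type cache --cachepool vg_data/lv_cache vg_data/lv_data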

Using bcache

Test setup

  • MD RAID5
  • LVM LV data
  • LVM LV cache
  • bcache volume
  • Luks
  • XFS
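
And a rough sketch of the bcache variant, assuming an LVM LV on the NVME is used as the caching device (paths and the cache set UUID are placeholders):

# Backing device is the raid LV, caching device is the NVME LV
make-bcache -B /dev/vg_data/lv_data
make-bcache -C /dev/vg_nvme/lv_cache

# Attach the cache set (UUID from bcache-super-show) and enable writeback
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
echo writeback > /sys/block/bcache0/bcache/cache_mode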

Test results – Benchmark graph

Conclusions

Regarding overall performance, the outcome was not what I expected. lvmcache really didn’t seem to improve the system performance: in some of the tests it was noticeably slower than the no-cache mdraid, and in some others just slightly faster. bcache, however, did show real improvement, being faster in all the tests, some by more than 30%.

Although bcache improves the system, it’s also the most difficult to set up: lvmcache is fully integrated into the LVM tools and the kernel, while bcache requires the installation of bcache-tools, which is not included by default on most distributions.

If you feel comfortable with Linux, block devices, mdraid and LVM I would recommend it without worries; if you’re not familiar with this set of tools I would recommend you test your setup before you run it in a server / desktop environment.

The performance benefits are worth the extra work.

Test raw report files

Below are the iozone-generated reports and the ods spreadsheet I used to build the graphs.

iozone_test_without_cache

iozone_test_with_cache_bcache

iozone_test_with_cache_lvmcache

Benchmark results

Decode CDP or LLDP packets with tcpdump

If you don’t have lldpd available you may use tcpdump to get all the LLDP information. Below are a few examples of how you can do it.

CDP

## This will often show you the Cisco chassis switch; you can then use your firm's asset management software to find the upstream switch.
## -s 1500 capture 1500 bytes of the packet (typical MTU size)
## ether[20:2] == 0x2000 – capture only packets whose 2-byte value at byte offset 20 is 0x2000 (the CDP protocol ID in the SNAP header)

interface=eth0 ; tcpdump -i ${interface} -v -s 1500 -c 1 'ether[20:2] == 0x2000'


LLDP

## Switch:

interface=eth0 ; tcpdump -i ${interface} -s 1500 -XX -c 1 'ether proto 0x88cc'


## Port and CDP Neighbor Info:

interface=eth0 ; tcpdump -i ${interface} -v -s 1500 -c 1 '(ether[12:2] == 0x88cc or ether[20:2] == 0x2000)'


If you need more info about CDP or LLDP have a look at the links below.
CDP stands for Cisco Discovery Protocol, a layer 2 protocol used to share information about other directly connected Cisco equipment (Wikipedia). LLDP stands for Link Layer Discovery Protocol and replaces CDP; it is a vendor-neutral Data Link Layer protocol used by network devices to advertise their identity, capabilities and neighbours (Wikipedia). This is useful to find out which VLAN your network interface is connected to (assuming you’re using tagged VLANs), or which port you’re plugged into on which switch.


pv – Concatenate files, or stdin, to stdout, with monitoring

PV – bottleneck

A few days ago I had the need to debug the output of a stream. The problem was that the output bandwidth was not always constant, and that seemed to affect the input: the application didn’t behave the same with different workloads.

A colleague told me to check out pv. As I had never used it before I checked the man page first, and it seemed promising. Below are some details about it.

pv can be used to:

  • --progress – show progress bar
  • --timer – show elapsed time
  • --eta – show estimated time of arrival (completion)
  • --rate – show data transfer rate counter
  • --average-rate – show data transfer average rate counter
  • --bytes – show number of bytes transferred
  • --format FORMAT – set output format to FORMAT
  • --numeric – output percentages, not visual information
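
For example, something like the following shows a progress bar, timer, ETA, rate and byte counter while draining a file to /dev/null (the file name is just an example):

pv --progress --timer --eta --rate --bytes some_big_file.iso > /dev/null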

You can also change the standard behaviour of a pipe (and this is probably the most interesting part); you’ll be able to:

  • --rate-limit RATE – limit transfer to RATE bytes per second
  • --buffer-size BYTES – use a buffer size of BYTES
  • --skip-errors – skip read errors in input
  • --stop-at-size – stop after --size bytes have been transferred

Here are 3 pv usage examples:

Limit the bandwidth available within a pipe:

In this case I’ll limit the write of a file to 1MiB/s (1,048,576 bytes per second) while writing a 10MB file (please note that the limits on both dd and pv are in bytes)

dd count=10 bs=1048576 if=/dev/zero | pv -L 1048576 | dd of=to_delete.file
10+0 records in [1021kiB/s] [ <=> ]
10+0 records out
10485760 bytes (10 MB) copied, 9.84052 s, 1.1 MB/s
10MiB 0:00:09 [1.01MiB/s] [ <=> ]
20400+100 records in
20480+0 records out
10485760 bytes (10 MB) copied, 9.93452 s, 1.1 MB/s

Write only a 5 MB file from the pipe:

dd count=10 bs=1048576 if=/dev/zero | pv -S -s 5242880 | dd of=to_delete.file
5MiB 0:00:00 [92.4MiB/s] [===========================================>] 100%
10240+0 records in
10240+0 records out
5242880 bytes (5.2 MB) copied, 0.073623 s, 71.2 MB/s

Increase the buffer size for faster transfers

With default buffers (512KB):

dd count=500 bs=1048576  if=/dev/zero | pv | dd of=/dev/null
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 0.684948 s, 765 MB/s
500MiB 0:00:00 [ 731MiB/s] [ <=>                                                                       ]
1024000+0 records in
1024000+0 records out
524288000 bytes (524 MB) copied, 0.683847 s, 767 MB/s

With a bigger buffer (5MB):

dd count=500 bs=1048576  if=/dev/zero | pv -B 5242880 | dd of=/dev/null
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 0.667252 s, 786 MB/s
500MiB 0:00:00 [ 750MiB/s] [ <=>                                                                       ]
1024000+0 records in
1024000+0 records out
524288000 bytes (524 MB) copied, 0.667482 s, 785 MB/s

If you want more information, or to check some other use cases, have a look at this post on cyberciti.

See you next time,

Pedro Oliveira


MySQL-ZRM and BackupPC – CentOS 7

MySQL-ZRM and BackupPC to the rescue

Backups can be a tricky thing. All of us that have done system administration, maintenance, systems engineering or architecture at some point had to choose a backup mechanism, which depending on the requirements can be a simple bash script that uses tar or rsync, a robust solution like BackupPC or Bacula, a backup appliance, and so on.

Today while reading the BackupPC mailing list someone asked about the best way to use it to back up a MySQL DB. As always there are a multitude of options; one of my favourite ones is combining MySQL-ZRM and BackupPC. I’m a fanboy of BackupPC, having used it for years both in personal and in enterprise projects, so I’m not going to describe how to install it or how to make it run on your system; there is a lot of information about this online (just check the BackupPC home page).

Although BackupPC is a great tool, it won’t guarantee the consistency of your databases at the moment of the copy; for that you need another tool, and my favourite one is MySQL-ZRM. MySQL-ZRM will make sure that your new MySQL or MariaDB backup is consistent, and this backup can then be retrieved by BackupPC and stored on the backup server.

Procedure

Installing MySQL-ZRM on CentOS 7

As the title of the post says I’ll be using CentOS 7, so the first thing I need to do is install the EPEL repo on my CentOS 7 server:

rpm -Uvh https://ftp.fau.de/epel/7/x86_64/e/epel-release-7-1.noarch.rpm

Now that we have the repo installed, we need to install MySQL-ZRM:

yum install -y MySQL-zrm

Considerations on MySQL-ZRM on CentOS 7

The main difference in the configuration is the backup mode, which can be:

  • RAW
  • Logical

RAW mode will give you the best performance possible during the backup; nevertheless, it requires that you use LVM, and I would only advise using it if you’re familiar with the concept. To start with, you should have a logical volume for your MySQL data dir (usually /var/lib/mysql/), and you should have free space available in your volume group: at least double the space MySQL needs for its operation during the backup, but please be generous here, as if you run out of space you will truncate your DBs.

On the other hand, the performance considerations may not hold, as they will vary with your use case; what RAW mode does guarantee is that there will be no locks on the DB during the backup. If you really need performance to be unaltered during the seconds or minutes of the backup, I would recommend a master/slave setup where you do the backups from the slave host, thus not impacting the master.
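
If you’re considering RAW mode, you can check how much free space your volume group has for the snapshot with the standard LVM tools (the volume group holding /var/lib/mysql will of course have a different name on your system):

# Show the LVs and the free space left in each volume group
lvs -o lv_name,lv_size,vg_name
vgs -o vg_name,vg_size,vg_free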

The Logical backup mode doesn’t have any special requirements; nevertheless, you’ll be “write locking” the tables for the duration of the backup. With recent hardware even big backups can be fast, but if you’re talking about a 200GB DB miracles won’t happen; in these cases I would recommend the RAW mode.


Setting up your MySQL server to make it suitable for MySQL-ZRM

To make your MySQL server suitable for MySQL-ZRM you need to create a user with the right set of permissions; also, if you are not backing up data on the same server that runs MySQL-ZRM, you need to enable TCP connections on your MySQL server.

Create mysql user with the correct set of permissions

mysql -h localhost -p # or whatever IP or hostname where your MySQL lives

grant select, insert, update, create, drop, reload, shutdown, alter, super, lock tables, replication client on *.* to 'backupuser'@'localhost' identified by 'very secret password';
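
You can verify the new user works by logging in with it and listing its grants:

mysql -u backupuser -p -e "SHOW GRANTS FOR CURRENT_USER();"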

Setting up MySQL-ZRM on CentOS 7

After installing MySQL-ZRM we need to set it up; to do this we edit its configuration.

The config file is located at:

/etc/mysql-zrm/mysql-zrm.conf

In this example we will use the Logical backup mode; the main configuration changes are:

backup-mode=logical
destination=/var/lib/mysql-zrm # backups destination folder (can be an NFS share, smb share, usb mount point, etc)
retention-policy=15D # how many days to keep the backup in the destination folder
compress=1 # compress backups: 1 = enabled, 0 = disabled
compress-plugin=/usr/bin/gzip # you're able to use another compression tool if you prefer
all-databases=1 # do you want to back up all the databases on the mysql server? In this case we do
user="backupuser" # authorized user to backup your databases
password="very secret password" # the password
host="your.server.hostname" # server host name
routines=1 # do we want to backup MySQL routines? In this case yes
verbose=0 # do we want the log to be verbose
mailto="backup-list@linux-geex.com" # backup admin email; if you have a local MTA correctly configured you'll receive an email according to the mail policy below
mail-policy=only-on-error
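
Once the configuration is in place it’s worth triggering a manual run before relying on cron; something along these lines (the backup-set name follows the hostname convention used below, and the verify-backup action depends on your ZRM version):

# Run a full (level 0) backup for this host's backup set, then verify it
mysql-zrm --action backup --backup-set `hostname -s` --backup-level 0
mysql-zrm --action verify-backup --backup-set `hostname -s`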

If you are backing up a remote server you’ll also need to enable TCP connections in my.cnf; this can be achieved by setting, in the [mysqld] section:

port = 3306
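
From the MySQL-ZRM host you can then confirm TCP connectivity with a quick test query (using the placeholder hostname from the configuration above):

mysql -h your.server.hostname -P 3306 -u backupuser -p -e "SELECT 1;"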

Please keep in mind that you should be very careful when exposing MySQL, so set up your iptables firewall to only allow connections from the backup server and other desired MySQL clients; below is an example of how to do it:

iptables -I INPUT -m tcp -p tcp --dport 3306 -i eth0 -s 10.2.3.4/32 -m comment --comment "Allow access from web server" -j ACCEPT

iptables -I INPUT -m tcp -p tcp --dport 3306 -i eth0 -s 10.2.3.5/32 -m comment --comment "Allow access from MySQL-ZRM server" -j ACCEPT

Where --dport is the destination port, -i eth0 is the interface where you want the filter to be active (you may skip it and the rule will apply to all interfaces), -s gives the allowed source IPs, and -j ACCEPT is the target for the rule, in this case to ACCEPT the packet.


Setting up MySQL-ZRM backup frequency

MySQL-ZRM uses cron to do the backups, so the frequency is the one defined in the cron entry. Many people just use the root crontab to do everything; although this is possible, it’s not the most correct way of doing it.

Again there are multiple possibilities of doing this:

  • Use mysql-zrm-scheduler, a tool that will help you create the crontab entry with the correct parameters; you can check how it works just by typing mysql-zrm-scheduler on the command line.
  • Edit the crontab entry directly if you know the parameters (my favourite, and all the parameters are also very well documented).

For a once-a-day backup of your database you would create the following file:

/etc/cron.d/mysql-zrm

With the following content:

0 1 * * * root /usr/bin/zrm-pre-scheduler --action backup --backup-set `hostname -s` --backup-level 0 --interval daily

0 3 * * * root /usr/bin/mysql-zrm --action purge

This will trigger a backup every night at 1:00 AM, and it will also trigger a purge of the old content at 3:00 AM. Please note that if you’re backing up a server other than localhost you should replace `hostname -s` with the FQDN of the desired server.


Integrating with BackupPC

Integration may be achieved by two distinct means:

  1. Let BackupPC retrieve the files from the destination folder specified above; this is the easiest option and will probably suit most setups.
  2. Trigger backup execution within BackupPC.

I’ll focus on the second option, as the first one is enabled by default if you include the destination in the folders to be backed up by BackupPC.


MySQL-ZRM scheduler configuration if integrated with BackupPC

Edit your /etc/cron.d/mysql-zrm like this:

0 3 * * * root /usr/bin/mysql-zrm --action purge

As you can see, the backup entry is missing: the command execution will be triggered by BackupPC instead.


BackupPC triggering MySQL-ZRM configuration

I’ll assume you already have your BackupPC server configured and that the destination folder is already in the path to be backed up.

  1. Log in to the BackupPC web interface
  2. Select the server that holds the DBs to be backed up
  3. Choose “Edit config”
  4. Choose the “Backup Settings” tab (default)
  5. Below “User Commands” there is a text box named “DumpPreUserCmd” where you’ll insert:

mysql-zrm-backup --backup-set `hostname -s` --backup-level 0

Conclusion

Setting up MySQL backups is not a hard task; there are a multitude of options out there and this is just one of them. I would recommend having a deep look at the official BackupPC and MySQL-ZRM documentation, as this post touches just the surface of what those two pieces of software can do.

As important as doing backups is a good test of recovering the data to the desired state. It’s not enough to be able to list the backup content: you should be able to restore the full service. You must check whether you are able to do it from a full backup, and then based on a differential or incremental backup. It’s also important to know what those are and to be “fluent” with the backup software. This may be the difference between a headache and getting your head cut off.
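
As a starting point for such a test, ZRM itself can list what’s restorable and perform the restore; roughly like this, where the timestamp directory is a placeholder you would take from the reporter output:

# List the restorable backups for this backup set
mysql-zrm-reporter --show restore-info --where backup-set=`hostname -s`

# Restore a chosen backup (point --source-directory at the reported directory)
mysql-zrm --action restore --backup-set `hostname -s` --source-directory /var/lib/mysql-zrm/$(hostname -s)/20140101120001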

Keep calm and keep your backups up to date!

Pedro M. S. Oliveira
