NVME / SSD block cache – bcache vs lvmcache benchmark

Why care about IO performance?

Recently I’ve acquired some new hardware and I want it to perform as fast as possible. The setup is quite trivial for a home desktop; nevertheless, I wanted it to excel at IO performance as it will be used as my backup server too. A common way to improve performance is to add a cache layer; this applies to many things in IT, and block devices are no exception.

The relevant hardware components for this post are the 7x 2TB drives and 1x NVME card. The setup is not ideal as the models are not all the same: some perform better than others, some are newer and others older. Nevertheless, money and storage capacity were important, and I wanted to use them anyway. Security is also very important, so all the data written to these drives (including the NVME card) must be encrypted. On the other hand, I want to be able to expand the raid devices when the time comes, so I also use LVM; as the file system I use XFS with the default settings.

You may wonder why I didn’t use a simpler setup with BTRFS or ZFS. Mostly because I wanted to use raid 5 or 6, and on BTRFS stability is still an issue with these raid levels. With ZFS, on the other hand, it would be difficult to grow the pool in the future.

The logical setup is as follows:

  • 1x raid 5 with 6 drives (+1 hot spare)
  • LVM on top of the raid device
  • Cache device or Logical volume
  • Block encryption layer – LUKS
  • File system – XFS

The hardware list

  • NVME Samsung SSD 960 PRO 512GB
  • ST2000VN004-2E4164
  • ST2000VN004-2E4164
  • ST2000VN004-2E4164
  • WDC WD2003FYPS-27Y2B0
  • WDC WD200MFYYZ-01D45B1
  • WDC WD20EZRX-00D8PB0
  • WDC WD2000FYYZ-01UL1B1

The NVME device is used for the OS, home, etc., but it also contains an LVM logical volume to be used as cache for the raid device. The IOPS / bandwidth figures of the NVME are rather high: it goes all the way up to 440,000 IOPS and a bandwidth of 3.5GB/s, which is quite insane and which I won’t be able to exhaust with my day-to-day use, so it can spare a few IOPS to make my backups go a bit faster.

I’ve tested bcache and lvmcache; as the benchmark tool I used iozone. I ran the tests with 256kB, 1MB and 8MB block sizes, with a 96GB test file (it needs to be bigger than the total RAM amount, 64GB).

The initial test was made using the full setup without any caching system; it serves as the baseline for comparison.

Each test was done with 3 different block sizes (256K, 1MB, 8MB); the cache mode for all tests is "writeback". The base setup under test:

  • md device, raid 5
  • lvm volume
  • luks
  • xfs

Results

Using no cache

Test setup

 

  • MD Raid 5
  • LVM lv data
  • Luks
  • XFS

Using lvmcache

Test setup

  • MD Raid 5
  • LVM lv data
  • LVM lv meta
  • LVM lv cache
  • LVM lv cache pool
  • Luks
  • XFS
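For reference, the lvmcache stacking above can be created with something like the following. Note that lvmcache requires the cache LVs to live in the same volume group as the data LV, so the NVME partition is assumed to have been added to the VG as a second PV; all names and sizes here are illustrative assumptions, not the exact ones used in the tests.

```shell
# Create the cache data and metadata LVs on the NVME physical volume
# (hypothetical VG "vg", PV "/dev/nvme0n1p4", and sizes)
lvcreate -L 100G -n lv_cache vg /dev/nvme0n1p4
lvcreate -L 1G -n lv_cache_meta vg /dev/nvme0n1p4

# Combine them into a cache pool, then attach the pool to the data LV
# in writeback mode (matching the cache mode used in the benchmarks)
lvconvert --type cache-pool --poolmetadata vg/lv_cache_meta vg/lv_cache
lvconvert --type cache --cachepool vg/lv_cache --cachemode writeback vg/lv_data
```

These commands require root and real block devices, so treat them as a sketch to adapt rather than something to paste blindly.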

Using bcache

Test setup

  • MD RAID5
  • LVM LV data
  • LVM LV cache
  • bcache volume
  • Luks
  • XFS
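The bcache stacking can be sketched as follows (device names are assumptions; make-bcache and bcache-super-show come from bcache-tools):

```shell
# Register the raid array as the backing device and the NVME LV as the cache device
make-bcache -B /dev/md0
make-bcache -C /dev/vg_nvme/lv_cache

# Find the cache set UUID, attach it to the backing device, and enable writeback
bcache-super-show /dev/vg_nvme/lv_cache | grep cset.uuid
echo <cset-uuid> > /sys/block/bcache0/bcache/attach
echo writeback > /sys/block/bcache0/bcache/cache_mode
```

After this, /dev/bcache0 is the device the LUKS layer sits on; replace <cset-uuid> with the UUID printed by bcache-super-show.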

Test results – Benchmark graph

Conclusions

As regards overall performance, the outcome is not what I expected. lvmcache really didn’t seem to improve system performance: in some of the tests it was quite a bit slower than the no-cache mdraid, and in others just slightly faster. bcache, however, did show real improvement, being faster in all the tests, some by more than 30%.

Although bcache improves the system, it’s also the more difficult of the two to set up: lvmcache is fully integrated in the LVM tools and in the kernel, while bcache requires the installation of bcache-tools, which is not installed by default on most distributions.

If you feel comfortable with Linux, block devices, mdraid and LVM I would recommend it without worries; if you’re not familiar with this set of tools I would recommend that you test your setup before running it in a server / desktop environment.

The performance benefits are worth the extra work.

Test raw report files

Below are the iozone generated reports and the ods spreadsheet I used to build the graphs.

iozone_test_without_cache

iozone_test_with_cache_bcache

iozone_test_with_cache_lvmcache

Benchmark results

Swap space increase on a running Linux server


If you see that your server is running out of swap you should add more RAM; nevertheless, this is not always possible, or maybe you need that extra amount for a very specific usage.

If this is the case you just need to add some more swap to your system. There are several use cases; I’ll just cover the 2 most common ones, with and without LVM.

 

Adding swap space without LVM

If you’re not using LVM and you don’t have any other location to put your new swap partition you can do it in one of the file systems available in the system.

  • Create a file that can be used as swap. If you have more than one file system available, choose the one with the best performance; in this case we will use /. The file will have 16GB and will be called extra_swap.fs.

dd if=/dev/zero of=/extra_swap.fs count=16000 bs=1048576

  • Format the file

mkswap /extra_swap.fs

  • Set the right permissions on the file

chmod 600 /extra_swap.fs; chown root:root /extra_swap.fs

  • Enable it

swapon /extra_swap.fs

  • Make it permanent (if needed)

echo "/extra_swap.fs swap swap defaults 0 0" >> /etc/fstab
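The steps above can be tried safely end to end with a scaled-down file in /tmp before committing to the real 16GB one (the path and 16 MiB size here are purely illustrative):

```shell
# Same sequence as above, but a 16 MiB file in /tmp instead of 16 GB in /
dd if=/dev/zero of=/tmp/extra_swap.fs count=16 bs=1048576
chmod 600 /tmp/extra_swap.fs
mkswap /tmp/extra_swap.fs
# chown root:root /tmp/extra_swap.fs   # then, as root:
# swapon /tmp/extra_swap.fs
```

Once the dry run looks right, repeat with the real path and size and enable it as root.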

 

Adding swap space with LVM (method 1)

This method applies if you have LVM and you’re not able to disable swap (for instance on production servers with high system load and memory usage)

  • Add a new 16GB volume (on a volume group called VolumeGroupName; you will need to adjust this to the desired volume group)

lvcreate -n extra_swap_lv -L16G VolumeGroupName

  • Format the volume

mkswap /dev/VolumeGroupName/extra_swap_lv

  • Enable it

swapon  /dev/VolumeGroupName/extra_swap_lv

  • Make it permanent (if needed)

echo "/dev/VolumeGroupName/extra_swap_lv swap swap defaults 0 0" >> /etc/fstab

 

Adding swap space with LVM (method 2)

If you are able to disable swap for a while (<10 minutes) this is the recommended method.

  • Disable your current swap volume (please take into consideration that this can have a negative impact on performance; use with caution).

swapoff /dev/VolumeGroupName/swap_volume_name

  • Expand your current volume by adding 16GB to the logical volume swap_volume_name (you will need to adjust this to the desired logical volume)

lvextend -L+16G /dev/VolumeGroupName/swap_volume_name

  • Format the volume

mkswap /dev/VolumeGroupName/swap_volume_name

  • Enable it

swapon  /dev/VolumeGroupName/swap_volume_name
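Whichever method you used, it’s worth confirming the kernel actually sees the new space:

```shell
# List the active swap areas and the overall memory/swap totals
swapon --show      # or: cat /proc/swaps
free -h
```

The new file or volume should appear in the swapon listing, and the Swap total in free should have grown by 16GB.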

 

References

Check the excellent RHEL manual about swap.

 

I hope you don’t have to go through this; as said before, the best option is to buy some more RAM.

Cheers,

Pedro M. S. Oliveira

 

CentOS 7 – How to setup your encrypted filesystem in less than 15 minutes


Nowadays setting up an encrypted file system is something that can be achieved in a matter of minutes; there’s a small drop in FS performance but it’s barely noticeable, and the benefits are countless.

All the major distributions allow you to conveniently set up the encrypted volume during installation, which is very convenient for your laptop/desktop; nevertheless, on the server side these options are often neglected.

With this how-to you’ll be able to set up an encrypted LVM volume on your CentOS 7 system in 8 easy steps and less than 15 minutes.

I’m assuming that you’re running LVM already, and that you have some free space available on your volume group (in this case 249G):

 

The steps:

 

lvcreate -L249G -n EncryptedStorage storage

 

Skip the shred command if you only have 15 minutes; look at the explanation below to see if you’re willing to do so.

 

shred -v --iterations=1 /dev/storage/EncryptedStorage

cryptsetup --verify-passphrase --cipher aes-cbc-essiv:sha256 --key-size 256 luksFormat /dev/storage/EncryptedStorage

cryptsetup luksOpen /dev/storage/EncryptedStorage enc_encrypted_storage

mkfs.ext4 /dev/mapper/enc_encrypted_storage

 

Edit /etc/crypttab and add the following entry:

 

enc_encrypted_storage /dev/storage/EncryptedStorage none noauto

 

Edit /etc/fstab and add the following entry:

 

/dev/mapper/enc_encrypted_storage /encrypted_storage ext4 noauto,defaults 1 2

 

Finally mount your encrypted volume

 

mount /encrypted_storage

 

 

After reboot you’ll need to run these two commands to have your encrypted filesystem available on your CentOS 7 system:

 

cryptsetup luksOpen /dev/storage/EncryptedStorage enc_encrypted_storage

mount /encrypted_storage

 

 

Now the steps explained.

Step 1:

 

lvcreate -L249G -n EncryptedStorage storage

I’ve created a volume of 249GB named EncryptedStorage on my volume group storage (each distribution has a naming convention for the volume group name, so you’d better check yours). Just type:

 

vgdisplay

The output:

--- Volume group ---
VG Name storage
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 3
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 1
Max PV 0
Cur PV 1
Act PV 1
VG Size 499.97 GiB
PE Size 32.00 MiB
Total PE 15999
Alloc PE / Size 15968 / 499.00 GiB
Free PE / Size 31 / 992.00 MiB
VG UUID tpiJO0-OR9M-fdbx-vTil-2dty-c7PF-xxxxxx

--- Volume group ---
VG Name centos
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 3
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 2
Max PV 0
Cur PV 1
Act PV 1
VG Size 23.51 GiB
PE Size 4.00 MiB
Total PE 6018
Alloc PE / Size 6018 / 23.51 GiB
Free PE / Size 0 / 0
VG UUID sncB8Z-0Upw-VrwH-DOPJ-hELz-377f-yyyyy

As you can see I have 2 volume groups: one installed by default on all VMs, called centos, and another one created by me, called storage. In this how-to I’m using the storage volume group.

Step 2:

 

shred -v --iterations=1 /dev/storage/EncryptedStorage

This command proceeds at the sequential write speed of your device and may take some time to complete. It is an important step to make sure no unencrypted data is left on a used device, and to obfuscate the parts of the device that contain encrypted data as opposed to just random data.

You may omit this step although not recommended.
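The effect is easy to demonstrate on a throwaway file; shred works the same way on any writable target, regular file or block device (the /tmp paths here are just for illustration):

```shell
# Create a 4 MiB file of zeros, then overwrite it once with pseudo-random data
dd if=/dev/zero of=/tmp/shred_demo.img bs=1M count=4
shred --iterations=1 /tmp/shred_demo.img

# The file is the same size, but its contents are no longer all zeros
dd if=/dev/zero of=/tmp/zeros.img bs=1M count=4
cmp -s /tmp/shred_demo.img /tmp/zeros.img || echo "overwritten with random data"
```

On a real multi-terabyte device the same single pass can take many hours, which is why the step is optional here.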

Step 3:

 

cryptsetup --verify-passphrase --cipher aes-cbc-essiv:sha256 --key-size 256 luksFormat /dev/storage/EncryptedStorage

On this step we format the volume with our selected block cipher; in this case I’m using AES in CBC mode, with ESSIV IV generation and a 256-bit key.

A block cipher is a deterministic algorithm that operates on fixed-size data blocks and allows encryption and decryption of bulk data. The block cipher mode describes the way the block cipher is repeatedly applied to bulk data to encrypt or decrypt it securely. An initialization vector (IV) is a block of data used for ciphertext randomization; the IV ensures that repeated encryption of the same plain text produces different ciphertext output. An IV must not be reused with the same encryption key, and for ciphers in CBC mode the IV must be unpredictable, otherwise the system could become vulnerable to certain watermark attacks (and this is the reason for the essiv:sha256 part).

 

Step 4:

 

cryptsetup luksOpen /dev/storage/EncryptedStorage enc_encrypted_storage

Here we open the encrypted volume and assign it to a device that will be mapped using device mapper; after this step you will be able to do regular block device operations, just like on any other lvm volume.

 

Step 5:

 

mkfs.ext4 /dev/mapper/enc_encrypted_storage

Format the volume with the default ext4 settings; you may use whatever flags you wish, though.

 

Step 6:

Edit /etc/crypttab and add the following line:

 

enc_encrypted_storage /dev/storage/EncryptedStorage none noauto

With this line we permanently enable the /dev/storage/EncryptedStorage volume assignment to the enc_encrypted_storage mapped device.

The noauto setting is important for the server to boot correctly if the block device password is not entered during the boot process; this will enable you to use a custom script, or to enter the password manually at a later stage over ssh.

 

Step 7:

Edit /etc/fstab and add the following entry:

 

/dev/mapper/enc_encrypted_storage /encrypted_storage ext4 noauto,defaults 1 2

This is where we map the previously mapped device to a mount point, in this case /encrypted_storage; the noauto value is set for the same reasons as in step 6.

 

Step 8

 

mount /encrypted_storage

Simple mount command. You’ll be able to store and access your files in /encrypted_storage; it will be a good place for the files you want to keep private on your CentOS system.

You may find more information about supported ciphers and options in the Red Hat documentation:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/

Cheers,

Pedro Oliveira

Testing BTRFS – Performance comparison on a high performance SSD (BTRfs vs Ext4)

Hi,
Today I was reading about btrfs and, as I’d never used it before, I thought I’d give it a try.
On my laptop I have an SSD with 256GB; there I created 2 LVM2 volumes to use and test btrfs.
It’s not the ideal solution because there’s an LVM layer in between, but I’m not in the mood for backing up, erasing, installing, erasing and installing again. So the tests I’m going to do are just on the FS itself, not on all the layers that btrfs supports. A good thing about using an SSD is that the access time is equal across the whole block device, so the position of the data on the device doesn’t matter; this is a very good opportunity to get measurements for both ext4 and btrfs.
Here’s the benchmark architecture, tools and setup:

Kernel:

Linux MartiniMan-LAP 2.6.38-31-desktop #1 SMP PREEMPT 2011-04-06 09:01:38 +0200 x86_64 x86_64 x86_64 GNU/Linux

LVM lv creation command:

lvcreate -L 20G -n TestingBTRfs /dev/mapper/system
lvcreate -L 20G -n TestingExt4fs /dev/mapper/system

LVM lvdisplay output:

--- Logical volume ---
LV Name /dev/system/TestingBTRfs
VG Name system
LV UUID zBYf0d-metk-VC9U-YkjE-z1Ts-NMLb-HzYmrJ
LV Write Access read/write
LV Status available
LV Size 20.00 GiB
Current LE 5120
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:3

--- Logical volume ---
LV Name /dev/system/TestingExt4fs
VG Name system
LV UUID FJEfiv-Hs9W-zGuV-sJIo-3INN-gh52-YgmsVl
LV Write Access read/write
LV Status available
LV Size 20.00 GiB
Current LE 5120
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:4

FS creation command:

mkfs.ext4 /dev/system/TestingExt4fs
mkfs.btrfs /dev/system/TestingBTRfs

Processor:

model name : Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz

HardDrive:

Device Model: SAMSUNG MMDOE56G5MXP-0VB

Mount command (as you can see I didn’t do any optimizations, noatime, etc):

/dev/mapper/system-TestingBTRfs on /mnt/btrf type btrfs (rw)
/dev/mapper/system-TestingExt4fs on /mnt/ext4 type ext4 (rw)

Test software:

'Iozone' Filesystem Benchmark Program

Version $Revision: 3.373 $
Compiled for 64 bit mode.

Command line used for the tests:

 ./iozone -Ra -r4k -r8k -r16k -r32k -r64k -r128 -r1024 -r4096k -r16384k -s1g

This command was used on both the btrfs and the ext4 volumes.
The options mean:

-R excel/office compatible output format
-a auto test mode
-r the record size (you can see I used several: 4k, 8k, …)
-s size of the test file (I used 1GB)

Here are the test results:

And the charts (The scale is logarithmic):


Conclusions

As you can see from the charts, for sequential reading/writing there’s a performance gain in BTRfs at the smaller record sizes, but the inverse is also true: EXT4 performs better at the larger record sizes.

If you look at random data access, whether reading or writing, you’ll see that EXT4 is far faster than BTRfs, and according to my daily usage pattern random access accounts for about 70% of the access to my hard drive. To be honest I’m a bit surprised by such a difference. I know I didn’t tune either of the file systems, but the purpose of this benchmark is precisely not having to: just playing with the defaults, like most of the installations out there.

Another conclusion that is really simple to draw is that bigger record sizes mean better performance.

For now I think I’ll stick to EXT4 and LVM; who knows, maybe sometime soon I’ll change to BTRFS. I’ll let it mature for a while, and advise you to do the same.

Cheers,

Pedro Oliveira

Using a recover CD to restore a backup made with BackupPC – BackupPC as disaster recovery

Sometimes things go wrong. We simply can’t avoid it; a simple power failure can harm your data and corrupt your system.

One of these days, on a normal work day, one small server I maintain had a hard disk failure (yes, it’s true, it happened again for the 3rd time this month). On this system I don’t have a RAID setup, so the data was lost. Well, no problem, I thought; in the end all the data is on my backuppc server.

BackupPC is one of my favorite tools: it’s great to manage, easy and very flexible. I’m not going to write about using backuppc to back up data, as there are plenty of docs and mailing lists out there that can give you excellent how-tos on the subject.

I booted with the OpenSuSE 11.1 DVD and selected rescue mode.

At the command prompt, using fdisk /dev/sda, I partitioned the drive like the old one (both drives were SATA II), though Linux is so flexible that you don’t even need to do that.

Usually I like to use a volume manager (LVM) but I was short on time and will, so I just created 3 partitions: sda1 (150MB for /boot), sda2 (4GB for swap) and sda3 (100GB for /), leaving the rest (400GB) unpartitioned. I’ll use the free space to create volumes afterwards and then move the data there.

Then I formatted the partitions:

mkswap /dev/sda2

mkfs.ext3 /dev/sda1

mkfs.ext3 /dev/sda3

After this I mounted the filesystems like this:

mount /dev/sda3 /mnt

Created the boot mount point in /mnt: mkdir /mnt/boot

mount /dev/sda1 /mnt/boot

So now we need to get all the data onto the file system… and this is the tricky part: we need an ssh server to do it (we could use nfs, or an http download and then untar, but I still like the sshd method better; it uses rsync, so the transfer is really fast).

To do this you need to set up an ssh server from a minimalistic boot system. This isn’t hard, just follow the steps:

First give this machine your old IP address, e.g.: ifconfig eth0 192.168.1.1

Create an sshd certificate. Remember this certificate is just temporary, only there so you can restore your backup; you may delete it afterwards. To create it just type:

ssh-keygen -t rsa -f /mnt/ssh_host_rsa_key -N ""

start sshd by typing:

/usr/sbin/sshd -h /mnt/ssh_host_rsa_key

This will start up sshd with all the default options.

Now just give a password to your root user or you won’t be able to log in:

passwd root

Add the backuppc public ssh key from the backup server to /root/.ssh/authorized_keys on the restore machine.

Finally, accept the restore machine’s host key on the backuppc server (you may do this by logging in to the backuppc server and connecting to the restore machine over ssh; it will ask you to add the machine’s key, just accept it). Then copy it to the backuppc user’s known_hosts file, e.g.:

tail -n 1 ~/.ssh/known_hosts >> ~backuppc/.ssh/known_hosts

Finally you’re done. If you find this long and complicated, don’t worry: by now you have configured an entire ssh daemon by hand.

Go to the BackupPC console, choose your host, select the backup you want and just press restore.

On the method choose rsync, and on the destination dir choose /mnt. Go out and take a coffee; the restore can take a while. After it’s done all you need is to reconfigure grub and maybe /etc/fstab.

Now that the restore is done, just check that /mnt/etc/fstab reflects the partition scheme; change it accordingly if it doesn’t.

Finally we need to set up grub: edit /mnt/boot/grub/menu.lst and check that your root partition is in the right place.
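For reference, a typical menu.lst entry looks something like this (the title, kernel paths and devices here are illustrative assumptions; check them against the restored files in /mnt/boot):

```
title openSUSE 11.1
    root (hd0,0)
    kernel /vmlinuz root=/dev/sda3 resume=/dev/sda2
    initrd /initrd
```

The important parts are that root (hd0,0) matches the /boot partition (sda1 in this layout) and that the kernel’s root= parameter points at the new / partition (sda3).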

Before you can run grub-install you need to mount 2 special file systems, /dev and /proc. How do you do this on a mounted and running system? The answer:

mkdir /mnt/proc; mkdir /mnt/dev

mount -o bind /proc /mnt/proc

mount -o bind /dev /mnt/dev

chroot /mnt

and finally the last command:

grub-install /dev/sda

If you got an ok, just reboot your system; don’t forget to eject the dvd before the system boots again.

I think this was my largest post; I hope you find it useful.

Cheers,

Pedro Oliveira