Today I had a problem with one of the servers we support: no web access, no SSH, and no console, just a stream of messages scrolling by so fast that I couldn't read them on the terminal. The fix was a simple hard reset, and the system came back online. The cause was a hard disk failure, but the system came back without trouble because we were using a RAID configuration. One of the disks didn't show up in the RAID array, and a few tests later we confirmed that this hardware fault was the cause of the downtime.
But why did the system go down because of a disk failure if it was running on RAID? Simple: the swap was spread across the disks but was not on a RAID volume, so there were no redundant swap partitions. When the system needed data that had been swapped out to the failed disk, that data was no longer available and the system came to a halt.
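To spot this situation before it bites, here is a minimal sketch that lists the active swap areas from /proc/swaps that are not on an md (software RAID) device. The function name unprotected_swap is my own, not a standard tool. It only looks at the device name, so a swap file, or swap on LVM on top of RAID, will also be listed and needs a manual check.

```shell
#!/bin/sh
# List active swap areas that are NOT on an md (software RAID) device.
# Reads /proc/swaps by default; another file can be passed for testing.
unprotected_swap() {
    swaps_file="${1:-/proc/swaps}"
    # Skip the header line and print the Filename column of non-/dev/md entries.
    awk 'NR > 1 && $1 !~ /^\/dev\/md/ { print $1 }' "$swaps_file"
}
```

Running unprotected_swap with no arguments on a setup like ours would print each per-disk swap partition.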
From now on we'll put swap on a redundant RAID volume so this doesn't happen again; a server should never stop because of a single disk problem. Live and learn.
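A minimal sketch of how a mirrored (RAID 1) swap could be set up with mdadm, assuming two spare partitions on different disks. The device names /dev/sda2, /dev/sdb2 and /dev/md1 and the helper create_raid_swap are hypothetical. By default it only prints the commands (DRY_RUN=1); the partitions must not be in use as swap (swapoff them first), and running it with DRY_RUN=0 as root will overwrite them.

```shell
#!/bin/sh
# Sketch: build a RAID 1 swap device from two partitions (hypothetical names).
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

create_raid_swap() {
    md_dev="$1"; part_a="$2"; part_b="$3"
    # Mirror the two partitions so swap survives the loss of either disk.
    run mdadm --create "$md_dev" --level=1 --raid-devices=2 "$part_a" "$part_b"
    # Write the swap signature on the RAID device, not on the member partitions.
    run mkswap "$md_dev"
    # Enable the new swap area right away.
    run swapon "$md_dev"
}
```

For example, create_raid_swap /dev/md1 /dev/sda2 /dev/sdb2 prints the three commands. To make it permanent, the old per-disk swap lines in /etc/fstab would be replaced by one such as /dev/md1 none swap sw 0 0.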
Pedro M. S. Oliveira
BTW, to rebuild the array I used mdadm. Below is a simple example of adding a partition back to a previously built array:
mdadm --manage /dev/md0 --add /dev/sda1
This command adds the partition /dev/sda1 to the RAID array /dev/md0. If the array is degraded, the kernel then starts rebuilding it onto the new member.
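While the array rebuilds, cat /proc/mdstat shows the recovery progress and mdadm --detail /dev/md0 shows the array state. For scripts or cron jobs, here is a small sketch (degraded_arrays is a hypothetical helper, not part of mdadm) that prints the arrays whose status field in /proc/mdstat, such as [UU], contains an underscore, meaning a member is missing.

```shell
#!/bin/sh
# Print the names of md arrays that are missing a member.
# In /proc/mdstat each array has a status field such as [UU]:
# U means the member device is up, _ means it is missing.
# Reads /proc/mdstat by default; another file can be passed for testing.
degraded_arrays() {
    mdstat="${1:-/proc/mdstat}"
    awk '
        # Remember which array the following lines belong to.
        /^md/ { name = $1 }
        # Find the [U_...] status field and report the array if a member is down.
        match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0) print name
        }
    ' "$mdstat"
}
```

An empty output means every array has all its members.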
If you want to learn more about RAID on Linux, just type man mdadm or mdadm --help.