Average file size in a directory using gawk


While tuning and benchmarking an HP backup device (HPD2D backup system), I needed to estimate the average file size on the IMAP server storage.
You might think I could just divide the used space by the number of files, but that wouldn't work here: I didn't want every file to count, only the maildir files that hold the email content.
So I wrote a small one-liner using gawk to do it for me:

find /home/vmail -type f -name '[0-9]*' -exec ls -l {} \; | gawk '{sum += $5; n++} END {if (n) print sum/n}'

First, find selects all the files whose names start with a digit under the base directory of my dovecot server (the IMAP storage), and ls -l lists each one because the size field is needed.
Second, a small gawk script sums the sizes (the fifth column of the ls -l output) and divides the total by the number of files.
This took a few hours to run, as the storage holding the files has 8TB of data.
Well, I hope this helps someone else.
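Part of that run time goes into starting one ls process per file. As a rough sketch of an alternative (not what I ran at the time), GNU find can print the size itself with -printf, so no ls is needed. The function name avg_file_size is just a name made up for this example, and it assumes GNU find; plain awk is used here, but gawk works the same way:

```shell
# avg_file_size is a hypothetical helper name, not part of any tool.
# Assumes GNU find (for -printf); prints the average size in bytes of
# regular files under directory $1 whose names start with a digit.
avg_file_size() {
    find "$1" -type f -name '[0-9]*' -printf '%s\n' |
        awk '{ sum += $1; n++ }
             END { if (n) printf "%.0f\n", sum / n; else print 0 }'
}
```

Calling it as avg_file_size /home/vmail should give the same kind of number as the one-liner above, rounded to whole bytes.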
Cheers,
Pedro

Listing storage (scsi) paths for use with multipath

Hi!

About a year ago I set up some Linux RHEL 6 servers with multipath access to an HP EVA storage.

Today I needed to do it again, and to do so I needed to list all the paths available to the storage device. Here's my command line (hope it helps someone else) to list all the paths and volumes:


for D in /dev/sd*[!0-9] ; do \
printf '%s ' "$D" ; scsi_id --page=0x83 --whitelisted --device="$D" ; \
done | sort -k 2

The output should be something like this:


/dev/sda 3600508b1001c927a634cedb90322b49e
/dev/sdb 3600508b4000744ff0000a00001fd0000
/dev/sdf 3600508b4000744ff0000a00001fd0000
/dev/sdj 3600508b4000744ff0000a00001fd0000
/dev/sdn 3600508b4000744ff0000a00001fd0000
/dev/sdd 3600508b4000744ff0000a000025c0000
/dev/sdh 3600508b4000744ff0000a000025c0000
/dev/sdl 3600508b4000744ff0000a000025c0000
/dev/sdp 3600508b4000744ff0000a000025c0000
/dev/sde 3600508b4000744ff0000a000025f0000
/dev/sdi 3600508b4000744ff0000a000025f0000
/dev/sdm 3600508b4000744ff0000a000025f0000
/dev/sdq 3600508b4000744ff0000a000025f0000
/dev/sdc 3600508b4000744ff0000a00002660000
/dev/sdg 3600508b4000744ff0000a00002660000
/dev/sdk 3600508b4000744ff0000a00002660000
/dev/sdo 3600508b4000744ff0000a00002660000


As you can see, this server has one local disk (sda, which is actually a hardware RAID1 config) and 16 paths to my storage device, which delivers 4 different volumes (4 paths to each volume).
Some time later I'll discuss the multipath configuration, but for now I just wanted to leave the command line that helps me list all the path IDs.
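Counting the paths per volume by eye gets tedious with more volumes. A small sketch that does the counting, assuming the "device id" two-column listing shown above is fed on standard input (count_paths is a name invented for this example):

```shell
# count_paths is a hypothetical helper name.
# Reads "device id" lines on stdin and prints, for each id,
# the number of paths followed by the devices that lead to it.
count_paths() {
    awk '{ n[$2]++; devs[$2] = devs[$2] " " $1 }
         END { for (w in n) printf "%s %d%s\n", w, n[w], devs[w] }' |
        sort
}
```

Piping the listing command into count_paths should show one line per volume, so a volume with a missing path stands out right away.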
Cheers,
Pedro Oliveira
