Average file size in a directory using gawk


While tuning and benchmarking an HP backup device (an HP D2D backup system), I needed to estimate the average file size of the IMAP server storage.
You might think I could just count the files and divide the used space by that count, but that wouldn't work here: I didn't want every file to count, only the maildir files that hold the email content.
So I wrote a little script (really just a one-liner) using gawk to do it for me:

find /home/vmail -type f -name '[0-9]*' -exec ls -l {} \; | gawk '{sum += $5; n++} END {if (n) print sum/n}'

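First, find selects all the files whose names start with a digit under the base directory of my Dovecot server (the IMAP storage), and each one is listed with ls -l because I need the size information, which is the fifth column of that output.
Second, a little gawk script sums the sizes and divides the total by the number of files, giving the average in bytes.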
Be warned that this took a few hours to run, as the storage holding the files has 8 TB of data.
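Part of that time goes into starting one ls process per file. A possible variation, shown here only as a sketch and not something that was timed on this storage, is to end the -exec clause with + so that find passes many files to each ls call. The function name avg_size is made up for illustration, and plain awk is used because nothing here is gawk-specific; gawk can be substituted as-is.

```shell
# Sketch: average size in bytes of the files under a directory whose
# names start with a digit. avg_size is a hypothetical helper name.
avg_size() {
    # "{} +" batches many file names into each ls call instead of
    # running one ls per file; the size is still the 5th column.
    find "$1" -type f -name '[0-9]*' -exec ls -l {} + |
        awk '{ sum += $5; n++ } END { if (n) print sum / n }'
}

# Usage (prints nothing if no file matches):
#   avg_size /home/vmail
```

The pattern is quoted so the shell hands it to find untouched, and the if (n) guard avoids a division by zero when nothing matches.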
Well, I hope this helps someone else.
Cheers,
Pedro
