Average file size in a directory using gawk



While tuning and benchmarking an HP backup device (an HP D2D backup system), I needed to estimate the average file size on the IMAP server’s storage.
You might think I could just count the files and then divide the used space by that number, but that wasn’t the case: I didn’t want every file to count, only the maildir files that hold the email content.
So I wrote a little script (really just a command line) using gawk to do it for me:

find /home/vmail -type f -name '[0-9]*' -exec ls -l {} + | gawk '{sum += $5; n++} END {if (n > 0) printf "%.0f\n", sum/n}'

First, find selects every file whose name starts with a digit under the base directory of my Dovecot server (the IMAP storage), and then ls -l lists them, because I need the size information (the fifth column).
Then a tiny gawk program adds up the sizes and divides the total by the number of files.
Be warned that this took a few hours, since the storage holding the files has 8 TB of data.
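Part of that time is likely spent starting one ls process per file. The sketch below avoids ls entirely, assuming GNU find, whose -printf '%s\n' prints each file's size in bytes. The function name avg_size and the /home/vmail default are only for illustration, and it uses plain awk (gawk behaves the same here).

```shell
#!/bin/sh
# Sketch (hypothetical helper): average size of the files whose names start
# with a digit, i.e. the maildir message files. Assumes GNU find (-printf).
avg_size() {
    # Default directory is the one from the post; pass another to override.
    # find prints one size in bytes per line; awk sums, counts and divides.
    find "${1:-/home/vmail}" -type f -name '[0-9]*' -printf '%s\n' |
        awk '{ sum += $1; n++ }
             END {
                 if (n > 0) { printf "%d files, average %.1f bytes\n", n, sum / n }
                 else       { print "no matching files" }
             }'
}
```

Call it as avg_size /home/vmail; it reports the file count along with the average, and prints a message instead of dividing by zero when nothing matches.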
Well, I hope this helps someone else.

Listing storage (SCSI) paths for use with multipath


About a year ago I set up some RHEL 6 Linux machines with multipath access to an HP EVA storage array.

Today I needed to do it again, and to do so I needed to list all the paths available to the storage device. Here’s my command line (I hope it helps someone else) to list all the paths and volumes:

# whole disks only (sda, sdb, ...); partitions end in a digit and are skipped
for D in /dev/sd*[!0-9] ; do
    # device name plus its page 0x83 WWID (on RHEL 6 scsi_id lives in /lib/udev)
    echo "$D $(scsi_id --page=0x83 --whitelisted --device="$D")"
done | sort -k 2

The output should be something like this:

/dev/sda 3600508b1001c927a634cedb90322b49e
/dev/sdb 3600508b4000744ff0000a00001fd0000
/dev/sdf 3600508b4000744ff0000a00001fd0000
/dev/sdj 3600508b4000744ff0000a00001fd0000
/dev/sdn 3600508b4000744ff0000a00001fd0000
/dev/sdd 3600508b4000744ff0000a000025c0000
/dev/sdh 3600508b4000744ff0000a000025c0000
/dev/sdl 3600508b4000744ff0000a000025c0000
/dev/sdp 3600508b4000744ff0000a000025c0000
/dev/sde 3600508b4000744ff0000a000025f0000
/dev/sdi 3600508b4000744ff0000a000025f0000
/dev/sdm 3600508b4000744ff0000a000025f0000
/dev/sdq 3600508b4000744ff0000a000025f0000
/dev/sdc 3600508b4000744ff0000a00002660000
/dev/sdg 3600508b4000744ff0000a00002660000
/dev/sdk 3600508b4000744ff0000a00002660000
/dev/sdo 3600508b4000744ff0000a00002660000

As you can see, this server has one local disk (sda, actually a hardware RAID 1 volume) and 16 paths to my storage device, which presents 4 different volumes (4 paths to each volume).
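To double-check the path count per volume, the listing can be grouped by WWID. This is a small sketch with a hypothetical function paths_per_volume that reads "device WWID" lines on standard input (the format the command above prints) and outputs one line per volume: the WWID, the number of paths and the devices behind it.

```shell
#!/bin/sh
# Sketch (hypothetical helper): group "device WWID" lines read from stdin by
# WWID and print "WWID number-of-paths device..." once per volume.
paths_per_volume() {
    # count[] tallies paths per WWID; devs[] collects the device names
    awk '{ count[$2]++; devs[$2] = devs[$2] " " $1 }
         END { for (id in count) printf "%s %d%s\n", id, count[id], devs[id] }' |
        sort
}
```

Pipe the scsi_id loop into it; with the output listed above you would expect the local disk’s WWID with 1 path and each of the four EVA volumes with 4 paths.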
Some time later I’ll discuss the multipath configuration, but for now I just wanted to leave here the command line that helps me list all the path IDs.
Pedro Oliveira

Cloud computing – a must, a hype, or something you already had under a different name?

Usually I write about technical stuff or my RC cars, but this time I’m going to write about cloud computing, which isn’t that technical.

Today I was reading two magazines: one had “Cloud computing you can’t afford to leave this one out” on the cover and the other “Cloud computing a must for every company”.

If you’re in IT, you’ve certainly heard about cloud computing, but let’s start by defining it: cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the “cloud” that supports them (Wikipedia definition).

Having said this, you are probably already using the cloud if you use Gmail, Hotmail or something similar. Apart from mail, you may be using Picasa storage, Dropbox, or even hi5 or Facebook to share photos, and if you have a blog it’s probably in the cloud too. But the cloud concept is wider. Imagine that your company has all its information in the cloud, along with all the applications that support your business, and that your systems are in the cloud too. You are left with just cheap PC clients, thin clients, or whatever equipment you use to connect to the Internet and to your piece of the cloud.

In theory this is a great tool: you won’t have to worry about uptime, backups, system maintenance, sysadmins, power failures or air conditioning. On the other hand, you’ll depend on your providers and your ISP. You won’t be free to change and you won’t be as versatile; your choices will be your providers’ choices, and in the end applications and systems won’t be made to suit your needs, but rather part of your needs and all of your provider’s needs. Apart from that, you’ll probably end up spending more than you would on your own IT.

Some time ago I was thinking of using Amazon S3 to back up my personal data (photos, home movies, documents) as well as my family’s. Right now I have BackupPC on a server doing all of it, backing up about 3.5 TB of data. With my usage profile Amazon would cost me about 350€ a month, so I dropped the idea as fast as I’d had it: with 2 months of service I could buy a new server to do all the backups, and with another month of service I could pay the electricity bill, space and man-hours for a year.

Then a client who happily uses Sugar CRM heard about “the cloud” and thought they could easily migrate from Sugar to Salesforce and move all the company’s applications to Google Apps. So we asked for prices, and the cloud came out about 960% more expensive than the regular cost of the applications and Sugar licenses, even including all the system maintenance, space and electricity costs.

So I started wondering: in the end I don’t see people paying less for the cloud; I see people with a smaller initial cost that ends up much greater than the original one.

I’m sure many of you have already done your own investigation of the cloud. Are you reaching the same conclusions?

So far I’ve been writing about costs; now let’s get to flexibility and limitations.

Usually, when talking about the cloud, everyone sells you the idea that the cloud is flexible, that it will suit your needs, and that it will grow when your business grows and shrink when your business is going through a bad time.

In the end your cloud won’t be that flexible: most “cloud providers” have well-established limits on CPU usage/time, on bandwidth and on connections per second, and if you need to go past those limits you’ll pay a lot for it. Then there’s the small print: sometimes you can get more processor power because you needed it, but then you have to keep it for a minimum period, sometimes a year or even more.

“But the cloud is cutting-edge innovation, so it’s worth paying for.” Once again, this isn’t entirely true: IBM has had a cloud scheme running for decades, in which corporate clients pay for processors, MIPS, processor time and memory usage. Apart from IBM, other companies such as HP and Sun have worked like this for ages.

So what’s new? In my opinion, what’s new is the way you interact with the cloud, with the browser as the central part and the point of unification. The larger bandwidth available today also made this possible, and the content is much richer.

I can see a really good use for home users who don’t want to worry about tech things. I see YouTube, Twitter, hi5, Facebook and others growing, and companies using them with a business mind. Honestly, though, I don’t see companies putting their secrets, their know-how, their experience and their core business in the hands of a cloud. I may be wrong, but right now I don’t see things moving that way (maybe I need glasses). I see a big fuss about the cloud, just as I’ve seen the dot-com bubble and the IT recession, the thin-client revolution and the virtualization boom. Now I see the cloud hype, and in a few months or years something new will come up and all this will be forgotten. I’ll see companies moving towards a new hype and investors spending their bucks on something else.

So, to conclude: I don’t think the cloud is a must; I think it’s something you already had under a different name, and it became a hype because of a lot of marketing and publicity. If you think about it a little, you’ll see who wins with all the hypes, and it usually isn’t your company or mine.


Pedro Oliveira

Using grep (and other GNU tools) to create an email list file

I’m not a fan of mass mail, but sometimes it’s useful: imagine you want to send a notice that your email address has changed, not only to your personal contacts but to everyone you have sent email to or received email from.

There’s a way. It’s a bit time-consuming, but it works very well (if you have a lot of email like I do, it may take a few hours). This command line searches the mail in your KMail storage directory and creates a file called EMAIL_LIST.txt in your home directory.

grep -rh "^\(From\|To\): " "$HOME/.$DESKTOP_SESSION/apps/kmail/" | grep -o "<[^<> @]\+@[^<> ]\+>" | tr -d "<>" | sort -u > "$HOME/EMAIL_LIST.txt"

If you want to run it over your whole home folder instead, the command line is:

grep -rhoI "<[^<> @]\+@[^<> ]\+>" "$HOME" | tr -d "<>" | sort -u > "$HOME/EMAIL_LIST.txt"
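If you also want addresses written without angle brackets (for example To: jane@example.com) and case differences collapsed, a variation along these lines may work. It is only a sketch: the function name extract_emails is hypothetical, the address regex is a simplification (not a full RFC 5322 parser), and lower-casing assumes you treat John@Example.com and john@example.com as the same contact.

```shell
#!/bin/sh
# Sketch (hypothetical helper): unique addresses found in the From: and To:
# header lines of every file under a directory.
# -h: no file names in the output, so colons in maildir names don't matter.
# -i: header names are case-insensitive ("To:" or "TO:").
# The second grep pulls out anything shaped like local@domain.tld, and tr
# lower-cases it so sort -u can drop duplicates that differ only in case.
extract_emails() {
    grep -rhiE '^(from|to):' "$1" |
        grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' |
        tr '[:upper:]' '[:lower:]' |
        sort -u
}
```

For example: extract_emails "$HOME/.$DESKTOP_SESSION/apps/kmail" > "$HOME/EMAIL_LIST.txt"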

Cheers, and I hope it helps!