Today did not start out so well: my client ran out of home directory space. Not that this was unexpected.
Yesterday I noticed we were down to 20GB of shared space. Sounds like a lot, given that there are fewer than 100 people and this is Unix home directory stuff. I sent out a warning that we were running out of space and it was time to do some cleanup. I guess it fell on deaf ears. This morning, it was down to 781MB. Someone had chewed up 19+ GB overnight.
Yeah, all of us could do that, but you need to remember this is a restricted corporate network: there are no torrents, and I have better internet bandwidth at home. That is a lot of space to vanish overnight in this environment.
Let's go back to yesterday evening, when I noticed we were running out of disk. We will ignore how I discovered it for the moment; ask at the meeting next week ;) Luckily, I had written a script to get usage information a couple of years ago, and I figured I'd just check the output from the weekly cron job. Well… it's broken… All of the paths to the home directories had changed. I fixed the script and started it running. In the past, it took 13 hours to run. I expected it would take longer this time, as the amount of space had almost doubled. It did, but only 16 hours (the new disk system is much faster).
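A minimal sketch of what a per-user usage script of this kind might look like in Python. The original script is not shown in this post, so the function names `dir_usage_bytes` and `usage_by_home` and the idea of passing in a home root are assumptions for illustration. It sums apparent file sizes, which will not exactly match what the filer reports in blocks.

```python
import os

def dir_usage_bytes(path):
    """Sum the apparent sizes of all files under path.

    Symlinks are not followed, and files that vanish or cannot be
    read mid-walk are skipped rather than aborting the whole run.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file disappeared or permission denied
    return total

def usage_by_home(home_root):
    """Return {directory name: bytes used} for each directory directly under home_root."""
    usage = {}
    with os.scandir(home_root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                usage[entry.name] = dir_usage_bytes(entry.path)
    return usage
```

Walking every file like this is exactly why a run over a multi-terabyte volume takes hours: the cost is dominated by one metadata lookup per file.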
To put all of this in perspective, this is a 5TB volume. Yes, 5000GB. Assuming 100 people, that's 50GB/person. I set an arbitrary cutoff at 40GB and got 20 people exceeding that. The ensuing email to all of them, listing everyone's usage, was amusing, especially as we went to 0 available disk around the time I sent it. The reactions were what made it amusing: "I have how much disk space used? There is no way I'm using 350GB!"
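The arithmetic and the cutoff check are simple enough to sketch in a few lines of Python. The function name `over_cutoff` and the sample user names are hypothetical; only the 5000GB volume, the 100-person headcount, and the 40GB cutoff come from the numbers above.

```python
VOLUME_GB = 5000      # 5TB volume
HEADCOUNT = 100       # assumed number of users
CUTOFF_GB = 40        # arbitrary cutoff

# Even share if the volume were split equally: 50GB per person.
fair_share_gb = VOLUME_GB / HEADCOUNT

def over_cutoff(usage_gb, cutoff_gb=CUTOFF_GB):
    """Return (user, GB) pairs strictly above the cutoff, largest first."""
    offenders = [(user, gb) for user, gb in usage_gb.items() if gb > cutoff_gb]
    return sorted(offenders, key=lambda pair: pair[1], reverse=True)

# Hypothetical sample data, not real usage figures.
sample = {"user_a": 350, "user_b": 12, "user_c": 41}
for user, gb in over_cutoff(sample):
    print(f"{user}: {gb}GB")
```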
So after the email, I got a lot of "drop-ins" hoping I could help them clean up without damaging anything. I personally nuked a 140GB directory in a home directory and didn't see any reduction in usage, and I could see the released space filling up almost as fast as we were deleting. After a bit I realized what was happening… This is on a NetApp, and the nightly snapshot was holding all of the released space. After deleting the snapshot (and with it the local safety net), we ended up with almost 1TB free. Nice start: 20% of the disk space recovered. I ended up sending out a note to all and sundry that we had made a great start, the immediate panic could subside, and everyone could work. AND that they were to keep cleaning up. I would be sending out summaries over the next few days.
Now I have a new report to put in place AND, if I play this right, I'll have the ability to migrate all the users to individual qtrees, which will make knowing how much disk space is in each account a simple query. I guess I'll be the first to migrate, as I happen to be one of the lowest users (and this is the account I had before, so it still has all of the data from my previous existence as well). I'll post a note on that after I've done it. It will at least be an interesting exercise. I have to dress it up a bit to top the admin in Texas. She has a space pig page. I'm going to borrow from the Muppets and use a Pigs in Space theme :D Hopefully some kind person has posted a picture of the Swinetrek to images.google.com to make my life easier.
So at the end of the day, it was a good day.