26 June 2007

Script to alert on high disk utilization

Below I've got a simple script that will check your disk utilization every time it's run and then alert you if it exceeds a certain level. It could be run as often as you like from cron. This script is not pretty but it's effective. It should probably be modified for your particular situation before you use it.

Although it's pretty simple, some parts of it may be mysterious if you're not familiar with the various Unix utilities it uses. Let's take a look at the code first, then I'll briefly explain some of the parts which you may want to tweak, and I'll touch on some of the other parts that would be useful to play with on the command line to learn more. So, to the script:

#!/usr/bin/env bash

# Quick exit: a single df run when nothing is at 95% or higher (the usual case).
if ! df -t ufs | grep ' \(9[5-9]\|10[0-9]\)% ' > /dev/null ; then
   exit 0
fi

# High utilization: 95% to 99%.
for MP in `df -t ufs | grep ' 9[5-9]% ' | awk '{ print $6 }'` ; do
   PU=`df -t ufs | grep " ${MP}$" | awk '{ print $5 }'`
   MSG="file system $MP at $PU"
   logger -p local3.warn -t `hostname | sed 's/\..*$//'` "$MSG"
done

# Full: 100% and over (UFS can pass 100% because of its reserve).
for MP in `df -t ufs | grep ' 10[0-9]% ' | awk '{ print $6 }'` ; do
   PU=`df -t ufs | grep " ${MP}$" | awk '{ print $5 }'`
   MSG="${MP}: file system full, $PU"
   logger -p local3.crit -t `hostname | sed 's/\..*$//'` "$MSG"
done

The script just checks the output of the df (disk free) command and sends a message to the system log if any filesystem exceeds a certain level. I've got separate loops for high disk utilization and full disk utilization (100% and over) so that you can use different messages and syslog priorities (local3.warn versus local3.crit). The first, small code block, the if clause, is there for efficiency: when there is no alerting condition (presumably the case most of the time), df runs just once.
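
As an alternative sketch (not the script above), awk can strip the "%" and compare the capacity numerically. Then a single df run is enough for both levels, and changing the threshold means changing a number instead of a regular expression. The function name disk_levels and its argument are hypothetical. It assumes FreeBSD's default df layout: a header line, capacity in field 5 and mount point in field 6.

```bash
#!/usr/bin/env bash

# disk_levels [WARN_PCT]   (hypothetical helper)
# Reads df-style output on stdin and prints one line per filesystem at or
# above WARN_PCT (default 95):
#   "crit MOUNT PCT%" at 100% or more, "warn MOUNT PCT%" otherwise.
# Assumes the first line is df's header, capacity is field 5 and the mount
# point is field 6, as in FreeBSD's default df output.
disk_levels() {
    awk -v warn="${1:-95}" '
        BEGIN   { warn += 0 }            # make sure the threshold is numeric
        NR == 1 { next }                 # skip the header line
        {
            pct = $5
            sub(/%$/, "", pct)           # "96%" -> "96"
            pct += 0                     # compare as a number, not a string
            if (pct >= 100)        print "crit", $6, $5
            else if (pct >= warn)  print "warn", $6, $5
        }'
}

# Possible use in place of the two loops (one df run in total):
#   df -t ufs | disk_levels 95 | while read -r level mp pu; do
#       logger -p "local3.$level" -t "$(hostname | sed 's/\..*$//')" \
#           "file system $mp at $pu"
#   done
```

Because the comparison is numeric, values like 102% or 110% from the UFS reserve are covered without a separate '10[0-9]' pattern.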

One thing you might want to change in this script is the logging commands, which use logger. You may prefer to send an email instead of a log message (or maybe both). For example, you might change the logger lines above to something like this:

   echo "$MSG" | mail -s "`hostname | sed 's/\..*$//'` disk warning" bsdguy@fake.net

Of course, you could also do both by just adding the mail line before or after the logger line. If you use email, though, consider carefully how often you will run the job and how long it might take you to get to the machine and correct the issue. Don't spam yourself with a ton of mail! A good approach would be to break this into two scripts which are run at different frequencies and perhaps with different alerting methods (this would also make the "if" block unnecessary).

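Another thing you might want to change is the level at which the script will alert. Notice that I'm just using grep to identify the high percentages. So, the part of the regular expression that says '9[5-9]' matches any number from 95 to 99. If I wanted to match anything from 80% up, I'd replace it with '[89][0-9]'. If I really had to match anything at 85% and higher, I'd need an alternation, grouped with \( \) just as in the if clause: ' \(8[5-9]\|9[0-9]\)% '. The grouping matters. Without it, ' 8[5-9]\|9[0-9]% ' means either ' 8[5-9]' or '9[0-9]% ', and ' 8[5-9]' alone would also match a block count such as 8612345 in another column. Whatever you change in the loops, change in the if clause too, or the script will exit before it ever reaches them. (The \| alternation is a GNU grep extension to basic regular expressions; with grep -E you would write ' (8[5-9]|9[0-9])% ' instead.)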
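
To check a pattern before you drop it into the script, a small helper can try every percentage against it. The function name matches_pct is hypothetical. It wraps the fragment the same way the if clause does (a space, the grouped fragment, then "% ") and relies on grep accepting \| in a basic regular expression, which GNU grep does.

```bash
#!/usr/bin/env bash

# matches_pct FRAGMENT   (hypothetical helper)
# Prints every percentage from 0 to 110 that the basic-regex FRAGMENT
# matches when wrapped as ' \(FRAGMENT\)% ', mimicking a df capacity field.
# The \| alternation inside FRAGMENT is a GNU grep extension.
matches_pct() {
    local n
    for ((n = 0; n <= 110; n++)); do
        # Build a fake capacity field such as " 95% " and test it.
        if printf ' %d%% \n' "$n" | grep -q " \($1\)% "; then
            echo "$n"
        fi
    done
}

# Examples:
#   matches_pct '9[5-9]'           # 95 through 99
#   matches_pct '8[5-9]\|9[0-9]'   # 85 through 99
#   matches_pct '10[0-9]'          # 100 through 109
```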

The regular expression '10[0-9]' is not a mistake. FreeBSD's UFS keeps a reserve of space that only root can use (8% by default, adjustable with tunefs -m), and df reports capacity against the space available to ordinary users, so it can show more than 100% utilization!

Be careful cutting and pasting: the spaces inside the quotes of the regular expressions are needed.

If you're still learning Unix, try taking apart the pipelines in the script (several commands connected by '|') and running them on the command line in parts. For example, first run "df -t ufs", then run "df -t ufs | grep ' 9[5-9]% '", and so on.

This script should also work on Linux with minor modifications. The main one is replacing the filesystem type given to "df -t" (ufs) with whatever your Linux systems use, such as ext3.
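
You probably don't have a nearly full filesystem handy for taking the pipelines apart. One option is to feed the same stages some canned text instead. The function sample_df is hypothetical, and its device names and numbers are made up. They only imitate FreeBSD's df column layout.

```bash
#!/usr/bin/env bash

# sample_df   (hypothetical helper)
# Prints invented df output in FreeBSD's column layout so the script's
# pipeline stages can be tried without a nearly full disk.
sample_df() {
    cat <<'EOF'
Filesystem  1K-blocks   Used  Avail Capacity  Mounted on
/dev/ad0s1a    507630 480000      0   102%    /
/dev/ad0s1d   1012974 960000  11000    96%    /var
/dev/ad0s1e   8123456 100000 7000000     1%   /usr
EOF
}

# Stage 1: the raw listing.
#   sample_df
# Stage 2: only the lines at 95-99% (the /var line).
#   sample_df | grep ' 9[5-9]% '
# Stage 3: just the mount point from those lines ("/var").
#   sample_df | grep ' 9[5-9]% ' | awk '{ print $6 }'
# Stage 4: the capacity looked up again by mount point ("96%").
#   sample_df | grep " /var$" | awk '{ print $5 }'
```

Swapping sample_df for "df -t ufs" in each stage gives you the script's real pipelines again.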


06 June 2007

How many disklabel partitions should I have?

In a previous post, I talked about my typical disk layout. I don't do anything radical. Starting from the standard auto-defaults, I eliminate the "/tmp" partition, linking "/tmp" back to "/var/tmp", and then tweak the partition sizes based on factors like the machine's future role and the total space available. To be clear, I'm just talking about the filesystem partitions here, not the swap partition(s).
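
Here is one way the "/tmp" link could be done. It is a sketch and not necessarily the exact steps I take. The function name link_tmp and its ROOT argument are hypothetical. ROOT lets you rehearse in a scratch directory before running it against "/". It is best done right after installation or in single-user mode, while nothing is using "/tmp".

```bash
#!/usr/bin/env bash

# link_tmp ROOT   (hypothetical helper)
# Replaces ROOT/tmp with a relative symlink to var/tmp. ROOT/tmp must be
# missing, an empty directory, or already a symlink (left unchanged).
link_tmp() {
    local root="${1:?usage: link_tmp ROOT}"
    root="${root%/}"                      # "/" becomes "", so paths read /tmp

    if [ -L "$root/tmp" ]; then
        return 0                          # already linked; leave it alone
    fi

    # /var/tmp needs the usual sticky, world-writable mode (1777).
    mkdir -p "$root/var/tmp" || return 1
    chmod 1777 "$root/var/tmp" || return 1

    if [ -d "$root/tmp" ]; then
        rmdir "$root/tmp" || return 1     # fails if /tmp is not empty
    fi
    ln -s var/tmp "$root/tmp"
}

# Rehearsal:  d=$(mktemp -d); mkdir "$d/tmp"; link_tmp "$d"; ls -l "$d"
# For real:   link_tmp /
```

Refusing to touch a non-empty "/tmp" is deliberate: rmdir fails instead of silently throwing files away.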

Why change the layout at all? Why does it matter? If you've got a big disk, why shouldn't you make lots of partitions?

The problem with lots of partitions is that they create bottlenecks. Rather than having your entire disk available, you really only have the free space in whichever partition is being written to. Exceed that and you'll get the dreaded "warning: filesystem full" message and probably application errors. So, in the absence of any other information, the best partition layout is one big partition for the entire disk. That's our starting point.

Right away, however, there is another partition that we should separate from the single mega-partition: we should have a separate root partition ("/"). The first reason for this is that FreeBSD is laid out so that the essential files and utilities normally end up in the root partition. "/etc", "/bin", and "/sbin" should all be on the root partition, where they will be easily available for booting, for repairing the other filesystems, and so on. Historically, part of the idea was that you might not even be able to mount the other filesystems until you'd used the utilities on the root filesystem to repair them. That may not apply in quite the same way now that we have background fsck. There have also been concerns about enabling soft updates on the root partition; see the FreeBSD FAQ, for example, though even the FAQ discounts those concerns somewhat.

So, what's the bottom line? On examination, none of the issues with a single large filesystem partition appears to be critical; you probably could lay out your disk that way and it probably would work fine. But I haven't tried that, and I maintain that having a separate root partition is still the best way to go, even if it's less crucial than it once was. The important thing is that you have nothing to lose by making a separate "/" partition and several things to gain: faster booting and better recovery and repair, for example.

On all of my machines I also add a partition for "/var". The rationale for this is precisely that it is a bottleneck! I want to isolate the logs, mail spool, and other write-heavy data from the rest of my disk, so that if (or when) things go wrong, they won't fill the entire disk. This is a double-edged sword, of course. If you stuck with just two partitions (root and everything else), you'd have a lot longer to notice a runaway log. But I've found BSD to be very resilient to a full "/var" partition. (I've also found that it's good to have some alerting for high utilization. :) )

Another rationale for the separate "/var" partition is to allow FreeBSD to optimize the I/O of those files. Chapter 2 of the Handbook puts it like this:
"Putting these [/var] files on another filesystem allows FreeBSD to optimize the access of these files without affecting other files in other directories that do not have the same access pattern."
... Which makes sense since "/var" will presumably have many small writes, unlike the other file systems.

And, voilà, the rest of the disk should be dedicated to the "/usr" partition. Since "/home" is linked there and all third-party software is installed there by default, you're probably going to need all that space.