A Smart World After All

Volume Number: 21 (2005)
Issue Number: 3
Column Tag: Programming

Source Hound

by Dean Shavit

Having a hard drive die on you can be a life-altering experience. I'll never forget the first time it happened to me. It was 1996, and I was sitting, drinking an espresso in a local coffee house, working on an essay for my Ph.D. comprehensive examinations in English, when my PowerBook 520c froze. I didn't think much of it at the time, as I was running System 7.5, which tended to freeze without warning, but when I rebooted, I found myself staring dumbly at the blinking question mark. Hours later, after running Norton Utilities and seeing that my 160 meg hard drive was now recognized as an 80 meg hard drive, I realized that my essays and research were lost. Sure, I had a two-week-old backup, but I'd been in a frenzy, working hard to make a deadline, and hadn't thought to keep my backup current. I took it as a sign, and even though I'm occasionally called "Dr. Dean" when I show up on-site to troubleshoot an ailing Mac, there's now no other reason to call me Doctor.

At the time, I wasn't what one would call a "computer professional," though I did some consulting work in the area of Desktop Publishing, training on QuarkXPress and Illustrator, and managed to subsidize my desire to become a poet while suffering as a poor grad student. Though I was quite aware of certain signs of hard disk trouble, such as "stiction," which would delay startup, or the "whine," which indicated bearings that were about to go, my trusty PowerBook 520c exhibited no signs of any disk problems, not one disk error, not one bad sector. Its hard drive died suddenly, without even a hoarse whisper. I'm really not waxing poetic; back in the mid-90s, the only real clues that a hard drive was dying (other than the occasional -36 disk error or bad sector turned up by a disk utility) were audible (a scratch, a ping, a scrape), so the hyperactive herky-jerky animation of Dr. Peter Norton rubbing his stethoscope over a hard disk platter in Norton Utilities 3 wasn't really that far from the reality of troubleshooting a failing disk.

It was rather amazing how accepting I was of the situation. I turned in my books and moved on to a career in Macintosh technology. I took responsibility for being a dummy and not backing up. I never once thought to ask, how come my Mac didn't alert me that there was a problem with my disk? In 1996, we didn't have such high expectations of our personal computers. We wanted them to work, to print, and sometimes to connect to a network. A cell phone or pager was a luxury, and email accounts were generally bundled with a job, or student enrollment at a University, or an AOL or CompuServe membership.

Besides my life-changing data loss, it seems that 1996 was a pivotal year for hard disks in general. In April of that year, a group of researchers from several drive manufacturers was busy hammering out version 2.0 of SFF-8035i, a standard for hard drive diagnostics which had been proposed the previous year, defining thirty attributes related to performance and reliability that hard disks should track internally. The standards they developed became known as the Self-Monitoring, Analysis and Reporting Technology systems, now referred to as SMART.

Disks Will Die

One of the biggest issues veterinarians have is that the pets they treat don't have the ability to communicate specific symptoms of disease. Dogs and cats with organ failure generally don't complain about pain, but simply seem to slow down, not run as fast, or jump as high. No matter how perceptive the vet, or how experienced, I've often heard them exclaim, "I wish the animals could tell me what was wrong." How strange would it be if, like an automobile, the pet had a diagnostic port that would reveal a score for how well its heart, kidneys, or intestines were functioning? Automobiles have such diagnostic ports, and some older (much much older) Macs used to have them as well.

Hard disks are mechanical devices, just like Zip disks or floppy disks. They have motors. They have platters that rotate at speeds that would make the engines in most cars overheat and explode. They have little arms with magnetic heads that dance around picking up blocks of data. Frankly, it's an amazing testament to the engineering skills of hard disk manufacturers that they are as reliable as they are. With OS X, hard disks will get even more of a workout if there's not enough RAM installed in a machine, due to virtual memory page outs, leading to a premature demise of the startup disk.

Most end-users and far too many system administrators don't realize that current hard disks are keeping an internal log of their own performance, based on the SMART attributes codified and updated with the ATA-3, -4, and -5 standards. Modern SCSI disks have SMART capabilities as well. Some of the attributes that SMART tracks are: (see table)

SMART attribute implementations still vary, somewhat, by manufacturer, but they have enough in common to, at the very least, give a pass/about to fail status report when queried. Starting with OS X 10.3, Apple's own Disk Utility includes a simple SMART test, which will display a one-line S.M.A.R.T. status report when you select a physical disk:

ID# | Attribute     |    Description
____|_______________|___________________________________________________________________________
1   | Raw Read      |    Count of non-corrected read errors. More errors (a smaller value)
    | Error Rate    |    denotes a deteriorating disk surface.
____|_______________|___________________________________________________________________________
2   | Throughput    |    Throughput (I/O) performance of the hard disk.
    | Performance   |
____|_______________|___________________________________________________________________________
3   | Spin Up Time  |    Average spindle spin up time (from parked to ready).
____|_______________|___________________________________________________________________________
4   | Start/Stop    |    Count of spindle start and stop cycles.
    | Count         |
____|_______________|___________________________________________________________________________
5   | Reallocated   |    Count of reallocated sectors. A great indication of a failing disk.
    | Sectors Count |    Reallocated sectors are marked as "unusable," which is why modern hard
    |               |    disks don't show bad sectors.
____|_______________|___________________________________________________________________________
7   | Seek Error    |    Count of seek errors. This indicates errors when the heads have a
    | Rate          |    mechanical failure or when the heads positioned over a data block
    |               |    cannot read it due to poor disk surface conditions.
____|_______________|___________________________________________________________________________
8   | Seek Time     |    A lower value indicates mechanical problems with the heads or
    | Performance   |    disk surface.
____|_______________|___________________________________________________________________________
9   | Power-On      |    Total number of hours the drive has been powered up.
    | Hours         |
____|_______________|___________________________________________________________________________
194 | Internal      |    How hot the temperature sensors say the drive is. Can also be a great
    | Temperature   |    indicator of how hot the inside of a computer is.

Figure 1.

The SMART test is also available from the Terminal in OS X, via the diskutil command, which is specific to OS X. First, you have to get the hard drive identifier:

In this case, the hard disk identifier is disk0, whereas the volume identifier is disk0s3. Working with the hard disk identifier, we can then query diskutil to get info on the drive, which will include the SMART Status:

minime:~ dean$ diskutil info disk0|grep -i smart
   SMART Status:       Verified

Note that in piping to the grep command, I use the -i switch to turn off case sensitivity. Diskutil is very useful for shell scripts, such as the one I wrote below:

#!/bin/sh
## This script is designed to get the SMART status of a drive and send an email notice on fail
## to test, change the "good=1" below to "good=0" and you should receive a warning email
## Dean Shavit, MOST Training & Consulting dean@macworkshops.com http://www.macworkshops.com
## This is for OS X Machines with ATA drives only
##
## Step 1: Define a variable for a functional drive by counting the number of SMART Verified Disks
status=`diskutil info disk0|grep -ci verified`
## Step 2: Define a number for comparison against a failed drive
good=1
## Step 3: Define the warning message for the body of the email
warning="houston we have a problem!"
## Step 4: Define a variable for the computer name
box=`/usr/sbin/scutil --get ComputerName`
## Step 5: Define a variable - email address of person to notify
admin="dean@macworkshops.com"
## Step 6: compare current status with good status, if a match, echo, if not, notify
if [ "$status" -eq "$good" ]; then
    echo "$status"
else
    ## write the message body to a temp file, then mail that file to the admin
    echo "From: $box- $warning!!!" > /tmp/houston.txt
    mail -s "SMART Alert Report" "$admin" < /tmp/houston.txt
fi

This script, when run as a cron job, will check the hard disk periodically for SMART status and email you when the drive has a problem. There are two status codes that Disk Utility can report: "Verified" if none of the SMART attributes exceed their normal thresholds, or "About to Fail," which will appear in red letters, indicating that the drive will fail and should be replaced. Of course, "failed" isn't a status that the drive can report, but is a state! It is important to understand that even if the hard disk passes the SMART test, there's always the chance that the drive might suddenly expire, without ever issuing a warning. The SMART tests are diagnostic tests, and should never be used as a replacement for a good backup strategy. If anything, the advance warnings will help indicate when a drive needs to be replaced, so as to minimize any possible downtime of a server or workstation.
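Counting the word "verified" treats a disk that reports no SMART status at all the same as a disk that is about to fail. As a minimal sketch (not part of the script above), the status line could instead be sorted into three cases. The function name smart_state is my own invention, and the only status strings assumed are the two Disk Utility wordings already mentioned, "Verified" and "About to Fail":

```shell
#!/bin/sh
# Classify the "SMART Status:" line of `diskutil info` output read on stdin.
# Prints one of: verified, failing, unknown.
# smart_state is a hypothetical helper name, not part of diskutil.
smart_state() {
    # Keep only the first SMART status line and lowercase it for matching.
    line=$(grep -i 'smart status' | head -n 1 | tr '[:upper:]' '[:lower:]')
    case "$line" in
        *verified*)        echo verified ;;
        *"about to fail"*) echo failing ;;
        *)                 echo unknown ;;   # no SMART line, or unexpected wording
    esac
}

# Demo with canned text. On a real Mac the call would be:
#   diskutil info disk0 | smart_state
printf '   SMART Status:       Verified\n' | smart_state
```

With this in place, a cron script could mail a different message for "failing" than for "unknown," so a disk that simply doesn't report SMART data doesn't raise a false alarm.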

For those who don't want to script or want an easier path to SMART notification, there's a great donationware utility called SMARTReporter downloadable at http://homepage.mac.com/julianmayer that provides a nice Cocoa GUI for accomplishing the same scheduled diagnostic and warning email, without having to use the Terminal. For those consultants or admins who support off-site users, this can provide an invaluable early warning of a drive failure.

SMARTReporter Icon

SMARTReporter's a snap to install: just drag and drop it from its disk image (.dmg) into your Applications or Utilities folder. When running, it can provide a SMART status indicator on your menu bar:

SMARTReporter Status Indicator

SMARTReporter's preferences allow an admin to change the interval of the check, define an icon set for the menu bar, set up email information, and even launch another application if there's a hard disk failure, such as a special AppleScript applet to throw up an alert to the user or give instructions on what to do. For email alerts, it can either use the mail information specified, or borrow the Apple Mail.app settings. If I had my druthers, Apple would include much the same functionality in future versions of Disk Utility.

SMARTReporter Preferences

Not So Smart

SMART reporting in OS X does have its limits, however. SCSI and FireWire disks aren't supported, so if an admin is predisposed (like I am) to use SCSI drives for internal RAID mirrors on servers other than Xserves, the diskutil command cannot get the SMART status of a SCSI disk:

host2:~ mostadmin$ diskutil list
/dev/disk1
   #:  type name                   size       identifier
   0:  Apple_partition_scheme      *17.0 GB   disk1
   1:  Apple_partition_map         31.5 KB    disk1s1
   2:  Apple_Driver_OpenFirmware   512.0 KB   disk1s2
   3:  Apple_Boot_RAID             17.0 GB    disk1s3

host2:~ mostadmin$ sudo diskutil info disk1
   Device Node:          /dev/disk1
   Device Identifier:    disk1
   Mount Point:        
   Volume Name:        

   Partition Type:   Apple_partition_scheme
   Bootable:         Not bootable
   Media Type:       Generic
   Protocol:         SCSI

   Total Size:       17.0 GB
   Free Space:       0.0 B

   Read Only:        No
   Ejectable:        No
   OS 9 Drivers:     Yes
   Low Level Format: Not Supported

Note that limiting SMART reporting to ATA disks is something inherent in the way that OS X treats disks. FireWire disks, though, which always contain ATA or SATA drives running on FireWire bridge boards, don't report SMART status to diskutil either. This can be a major problem for Mac admins, who are increasingly relying on FireWire storage devices for second-tier file services or, in some cases, as low-cost backup devices. In this arena, all enclosures are definitely not created equal, as shown by Granite Digital's Firevue(TM) drive enclosures (http://www.granitedigital.com), which feature an LCD panel that displays the SMART status of the disk housed inside. This gives their enclosures quite an advantage over others, considering that OS X doesn't have the ability to get past the bridge board to read the SMART status of the ATA drive inside. Look for other enclosure makers to follow suit in the near future.

Smart, And Its Mirror, Trams

Ok, I'm not talking about Disney World here, and the super long trains of trams that shuttle folks back and forth from the Magic Kingdom to the parking lots. I am talking about RAID mirrors on OS X, however. One of the problems many admins have complained about with OS X and OS X Server is the lack of notification that a RAID Level 1 (mirror) is going to go south. The upside is that if the workstation or server in question does have a mirror, the other drive will continue to function until the failed one is replaced and the mirror's rebuilt. That's why the word SMART is mirrored above (ha).

Setting up a RAID Level 1 (mirror set) is easy in OS X. After booting off the installation CD for OS X or OS X Server, drag the disks you want to mirror into the RAID window in Disk Utility and choose mirror, then create the mirror set. The result will be a RAID disk that appears as two stacked hard drives in the drive and volume list on the left hand side.

Mirror Raid Level 1 in Disk Utility

Of course, diskutil is available for those who want to create, repair, check, or destroy a mirror set from the terminal. For example, even though SMART status might not be available for SCSI disks, it's easy to script notification for a SCSI mirror, and any RAID 1 mirror for that matter, all based on output from diskutil.

So what I'm looking for in the script below is simply an OK from both disks that are members or "slices" of the RAID:

host2:~ mostadmin$ diskutil checkRAID disk3  
RAID SETS
---------

Name:          boot
Unique ID:     bootf5b49d82471e11d98b06003065be09be
Type:          Mirror
Status:        Running
Device Node:   disk3
----------------------------------------------------------
#   Device Node    Status
----------------------------------------------------------
0   disk1          OK
1   disk2          OK
----------------------------------------------------------

Here is the script that checks the mirror set; note how similar it is to the script that checks the SMART status of a single hard drive:

#!/bin/sh
## This script is designed to get the status of a mirror set and send an email notice on fail
## to test, change the "good=2" below to "good=1" and you should receive a warning email
## Dean Shavit, MOST Training & Consulting dean@macworkshops.com http://www.macworkshops.com
## This is for servers using software RAID mirror sets only
##
## Step 1: Define a variable for a functional raid by counting the number of good disks
status=`diskutil checkraid disk3|grep -c OK`
## Step 2: Define a number for comparison against a failed raid
good=2
## Step 3: Define the warning message for the body of the email
warning="houston we have a problem!"
## Step 4: Define a variable for the computer name
box=`/usr/sbin/scutil --get ComputerName`
## Step 5: Define a variable - email address of person to notify
admin="dean@macworkshops.com"
## Step 6: compare current status with good status, if a match, echo, if not, notify
if [ "$status" -eq "$good" ]; then
    echo "$status"
else
    ## write the message body to a temp file, then mail that file to the admin
    echo "From: $box- $warning!!!" > /tmp/houston.txt
    mail -s "RAID Alert Report" "$admin" < /tmp/houston.txt
fi

So, with both the SMART script and the RAID script running as cron jobs, it's easy to get an advance warning of an impending single disk failure, or to find out when a RAID Level 1 has a drive fail or go out of sync. It's tougher with FireWire or SCSI drives, but if they're used in a RAID Level 1 configuration, the script above can provide the same kind of warning for them that it can for ATA disks under OS X or OS X Server.
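Counting OKs says that something is wrong but not which slice. A rough sketch of one way to name the culprit follows, assuming member rows keep the "number, device node, status" layout shown in the checkRAID output above. The function name failed_members is hypothetical, and "Failed" in the demo is a made-up status string, since the article only shows healthy members:

```shell
#!/bin/sh
# Print the device node of every RAID member whose status is not "OK".
# Reads `diskutil checkRAID` output on stdin; member rows are assumed to
# look like "0   disk1          OK" (index, device node, status).
# failed_members is a hypothetical helper name.
failed_members() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^disk[0-9]+$/ && $3 != "OK" { print $2 }'
}

# Demo with canned text; "Failed" is an invented status for illustration.
# On a real server the call would be:
#   diskutil checkRAID disk3 | failed_members
failed_members <<'EOF'
#   Device Node    Status
0   disk1          OK
1   disk2          Failed
EOF
```

The output of failed_members could be dropped straight into the body of the warning email, so whoever gets paged knows which drive to pull.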

Maxwell Smart

Recently, I was onsite for a couple of months helping a customer upgrade two hundred Macs to OS X. Many of the machines running OS 9 showed signs of hard drive failure, such as frequent boot failures, corrupt directories, slow spin-up times and metallic whining sounds. One day, a department manager's hard drive failed, taking several years of email with it. We sent the drive to DriveSavers (http://www.drivesavers.com) for hardware data recovery, but no dice: the mechanism was too far gone, with too much damage done to recover what was needed. It caused quite a stir in the IT department, and a discussion about early warning of hard drive failures. This customer had several UNIX/Linux experts on staff who were very interested in learning more about OS X but, interestingly enough, tended to treat an OS X workstation as they would a Linux server. They'd try this command or that command and comment that OS X was very different. I shared my SMART alert script with them, and they found SMARTReporter on VersionTracker. However, they wanted a constant real-time log of SMART status on each Mac OS X box, and more sophisticated scripts that would report back attribute information such as temperature or reallocated (bad) sectors. It was time to go hunting for open-source tools.

The first stop was at the Maxwell web site, http://maxwell.sourceforge.net. Maxwell's a tiny command-line program that queries a hard drive for SMART information and can either report a pass/about to fail condition, just like Disk Utility, or a more comprehensive report of SMART attributes. The source code for Maxwell is less than 100k in size, and it's such a simple program that compiling it doesn't require a configuration step first.

1. To install Maxwell, download the source code from SourceForge, then unpack the tarball into a folder on the Desktop. Open a Terminal window, then navigate to that folder:

2. Issue the following commands as in the example below:

[minime:~/Desktop/maxwell-0.5.1] dean% sudo make install
cc   -c -o maxwell.o maxwell.c
cc   -o maxwell -framework IOKit -framework CoreFoundation maxwell.o
/usr/bin/install -d -m 755 /usr/local/doc/maxwell
/usr/bin/install -m 644 LICENSE /usr/local/doc/maxwell/LICENSE
/usr/bin/install -m 644 README /usr/local/doc/maxwell/README
/usr/bin/install -d -m 755 /usr/local/bin
/usr/bin/install -m 755 maxwell /usr/local/bin/maxwell
/usr/bin/install -d -m 755 /usr/local/man/man8
/usr/bin/install -m 644 maxwell.8 /usr/local/man/man8/maxwell.8
[minime:~/Desktop/maxwell-0.5.1] dean%

Once installed, Maxwell can be run to get a simple pass/fail status by invoking it at the command line:

[minime:~] dean% maxwell
Device: TOSHIBA MK8026GAX                                 Reported PASS status

Or, for a more complete SMART report, it can be run with the -r switch:

[minime:~] dean% maxwell -r
BSD Path:/dev/disk0
Serial:            54IK0355T
Model: TOSHIBA MK8026GAX                       
Firmware: PA002B  
Device supports S.M.A.R.T. operations
SMART self-test supported
SMART error logging supported
S.M.A.R.T. operations are enabled
SMART self-test enabled
SMART error logging enabled
Off line collection status is 0
Time to complete Off line Data collection: 3 hours, 20 minutes
Status is GOOD
   TEST                                                  THRSH   VALUE   STATUS    RAW
--------------------------------------------------------------------------------------
(  1)   Raw Read Error Rate                              50      100     0x0b00    0
(  2)   Throughput Performance                           50      100     0x0500    0
(  3)   Spin Up Time                                     1       100     0x2700    1417
(  4)   Start/Stop Count                                 0       100     0x3200    430
(  5)   Reallocated Sector Count                         50      100     0x3300    0
(  7)   Seek Error Rate                                  50      100     0x0b00    0
(  8)   Seek Time Performance                            50      100     0x0500    0
(  9)   Power-On Hours Count **                          0       94      0x3200    2540
( 10)   Spin Retry Count                                 30      108     0x3300    0
( 12)   Device Power Cycle Count                         0       100     0x3200    335
(192)   Power Off Retract Count                          0       100     0x3200    2
(193)   Load/Unload Cycle Count                          0       78      0x3200    228634
(194)   Device Temperature                               0       100     0x2200    223339413546
(196)   Reallocation Event Count                         0       100     0x3200    0
(197)   Current Pending Sector Count                     0       100     0x3200    0
(198)   Off-Line Scan Uncorrectable Sector Count         0       100     0x3000    0
(199)   Ultra DMA CRC Error Count                        0       200     0x3200    0
(220)   Unknown                                          0       100     0x0200    184
(222)   Unknown                                          0       98      0x3200    977
(223)   Unknown                                          0       100     0x3200    0
(224)   Unknown                                          0       100     0x2200    0
(226)   Unknown                                          0       100     0x2600    241
(240)   Unknown                                          1       100     0x0100    0
--------------------------------------------------------------------------------------
Device temperature is 30 degrees centigrade
--------------------------------------------------------------------------------------
The names listed above may not actually be correct for each test value. The real values 
printed may make no sense whatever and the temperature may be some crazy value. 
Different device manufacturers do things differently :(.
--------------------------------------------------------------------------------------
Device: TOSHIBA MK8026GAX                        Reported PASS status

With this much information at hand, it's quite a simple matter to filter this output so that if, say, the Reallocated Sector Count test had a RAW value other than zero, it would trigger an alert email; the same goes for a temperature that reaches a certain level, or a spin retry count greater than zero. The man page for Maxwell also suggests that it was designed precisely for scripting purposes and for integration with cron. While Maxwell is certainly a notch above the simplistic pass/fail test of diskutil, it isn't exactly the comprehensive, enterprise-wide type of solution that my customer was looking for, either. They wanted something more integrated into the operating system, a tool that would keep a log, and send that log back to a central syslog server, where trends in hard disk wear and tear could be monitored over a period of time.
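To give an idea of what such a filter might look like, here is a minimal sketch that pulls the RAW column out of the attribute rows and flags non-zero values for attributes 5 (Reallocated Sector Count) and 10 (Spin Retry Count). It assumes the row layout in the report above, with the ID in parentheses at the start of the line and RAW as the last column. The function names attr_raw and smart_warnings are my own, and the RAW value of 3 in the demo is invented so there is something to report:

```shell
#!/bin/sh
# Print "ID RAW" for each attribute row of `maxwell -r` output read on stdin.
# Rows are assumed to look like:
#   (  5)   Reallocated Sector Count    50   100   0x3300   0
# attr_raw and smart_warnings are hypothetical helper names.
attr_raw() {
    awk '/^\( *[0-9]+\)/ {
        id = $0
        sub(/^\( */, "", id)    # drop the leading "(" and its padding
        sub(/\).*/, "", id)     # drop ")" and the rest of the row
        print id, $NF           # RAW is the last column
    }'
}

# Print one warning line per watched attribute (5 and 10) with a non-zero RAW.
smart_warnings() {
    attr_raw | awk '($1 == 5 || $1 == 10) && $2 != 0 { print "attribute " $1 " raw=" $2 }'
}

# Demo with canned rows (RAW of 3 is invented). Real use would be:
#   maxwell -r | smart_warnings
smart_warnings <<'EOF'
(  5)   Reallocated Sector Count                         50      100     0x3300    3
( 10)   Spin Retry Count                                 30      108     0x3300    0
EOF
```

Any output from smart_warnings could then be mailed the same way the diskutil script mails its warning. Given Maxwell's own caveat that names and values vary by manufacturer, the watched IDs would need checking against each drive model.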

It's A Smart World

"It's a world of laughter, world of cheer, it's a world of hope, and a world of fear." Sound familiar? I can almost hear the little mechanical people singing! Our company currently keeps a few servers at Equinix, a state-of-the-art colocation facility near downtown Chicago. With giant steel doors, handprint readers, bag checks, and multiple security stations, it's not hard to imagine at times you're entering the back room of a Disney World exhibit filled with trade secrets. It's an interesting side note, that over the years, I've seen more and more Xserves and Xserve RAIDs populating the cabinets at Equinix. I'm sure many a pager would go off if one or more of those Xserve RAIDs had a drive failure, or if one of the internal disks inside an Xserve died, triggering an alert by the Server Monitor Application.

But now the Mac Mini, Apple's little darling unveiled at Macworld, seems to have caught on as the "server for the rest of us." There's even a hosting service for Mac Minis, called "Macminicolo," http://www.macminicolo.com, that allows Mac owners to "park" their Minis in a cabinet for as little as $29.95 per month. While a colocated Xserve with a dedicated 1U space in a rack at a facility goes for the street price of about $150 per month, including one megabit of bandwidth (200 GB of transfer per month), there's absolutely no doubt that the proprietor of Macminicolo can certainly cram a whole lot more Mac Minis into the same cubic space an Xserve would occupy.

Rack of Xserves, Rack of Minis

On a side note, the success of (and need for) such a colocation facility, where "everyone can have their own server," shines a glaring spotlight on one of the great weaknesses of Apple's OS X Server software: the lack of virtualization. In the Linux world (on IBM and Red Hat and Sun solutions), virtualization of admin tools and even the actual processors is a standard way of allowing companies to "rent" the resources of the server, without having to actually "own the box." That might mean only being able to see your users when you log into a virtualized server, or your file system, or, on higher end solutions, your virtual server running on a Logical Partition of the server's resources (LPAR). OS X Server's Admin Tools have absolutely no virtualization capabilities as of this writing. It's going to be quite a while before OS X would allow me to log into an eight or sixteen processor Xserve and administer my LPAR, which would appear as a single, dual, or four processor server, with my own environment, file system, list of users and server daemons. Today, Apple's solutions seem to focus on miniaturization, not virtualization.

Aside from the obvious management dilemmas that would face the sysadmins of Macminicolo, where Mac Minis are smushed into a rack three across and three deep with their power supplies and cooling fans, it seems that the system administrators there have found a cool way to monitor their farm of Mac Minis:

InsideOut monitoring was developed to monitor mac a [sic] based server from the inside out. A headless application that is placed in the startup folder of any OSX machine contacts a master logging server. We monitor factors such as network availability, Hard Drive space, network traffic moved, temperature and the up/down status of specified applications. Customers subscribing to this service can choose to receive email notification if limits or events trigger warnings.

This sounds to me very much like the capabilities of the open-source monitoring software Nagios (http://www.nagios.org), which can log all of the above events and present reports, graphs, and network maps in a web page. Although temperature, application stats, network activity and free hard drive space are all good things to monitor, I would suggest that in a situation with multiple Mac Minis with laptop hard drives stuffed like muffins into a rack, SMART status should be the number one indicator monitored. Should a hard drive fail, there'll not only be an unhappy customer who will have lost all of their data if they don't have an offsite backup, but there'll also be an unhappy technician scrambling to pull out a single Mini and replace the hard drive as quickly as possible. If a SMART monitor could give advance warning of a failure, that would make the whole process much easier on everyone.

So, in addition to an enterprise or medium-sized business needing to monitor the SMART status on lots of hard drives in client workstations, it seems that Macminicolo would be another good example of where it would work quite well. However, the solutions I've looked at so far, like diskutil and the script I wrote, SMARTReporter, and Maxwell, are all missing an important capability: none of them can write a log of SMART events, either on the local machine or to a syslog server on the network, which, of course, was the same thing my customer wanted for their Macs.

The Smart Test Of All

Much of the really great open-source software for OS X seems to originate from the Linux community, and such is the case with the smartmontools, http://smartmontools.sourceforge.net, which by all accounts is the top tier solution for SMART monitoring and logging in a non-commercial distribution. Bruce Allen, maintainer of the smartmontools project, explains the origins of the package in an article that appeared in the Linux Journal:

By profession I am a physicist. My research group runs a large computing cluster with 300 nodes and 600 disk drives, on which more than 50TB of physics data are stored. I became interested in SMART several years ago when I realized it could help reduce downtime and keep our cluster operating more reliably. For about a year I have been maintaining an open-source package called smartmontools. . .

In July 2004, Geoff Keating ported the smartmontools to OS X for ATA drive support only, (though the Linux version includes support for ATA/ATAPI-3 to -7 disks and SCSI disk and tape devices) bringing a sophisticated and flexible SMART monitoring suite to OS X for the first time since Granite Digital discontinued their commercial SMARTVue(TM) software nearly two years ago. Smartmontools consists of two main components, a command, smartctl, and a daemon, smartd. While smartctl functions like an improved version of Maxwell with various switches to control the verbosity and depth of the reports and self-tests, it's smartd that really takes the monitoring to a real-time level, where any changes in SMART status, not just errors, are logged to a specified syslog facility.

To install the smartmontools from the source code, make sure you have the latest Xcode tools (Apple Developer Tools) installed, then download the source tarball from http://smartmontools.sourceforge.net, and expand it to a folder on your Desktop. There are many download packages available, but the one we're looking for is called smartmontools-5.32.tar.gz. Open the Terminal, navigate inside the folder and do the following:

minime:~/Desktop/smartmontools-5.32 dean$ ./configure

You'll see a long list of checking for (various components). When the configure process is finished, type the following:

minime:~/Desktop/smartmontools-5.32 dean$ sudo make install

After a page or two of build feedback, smartmontools is now installed on your system. To start smartd, type the following command:

minime:~/dean$ /usr/local/etc/rc.d/init.d/smartd start

To run smartd automatically on startup, type the following:

minime:~/dean$ sudo ditto /usr/local/etc/rc.d/init.d/ /Library/StartupItems/SMART/

Now you'll need to add a line to the end of your /etc/hostconfig file that reads:

SMART=-YES-

That will allow the SMART StartupItem to run every time your Mac boots, and you'll also see the reassuring message "Starting SMART disk monitoring" as the daemon loads before the Login Window appears. Now that smartd is running as part of the OS X startup process, the next step's to configure logging:

minime:~/dean$ sudo touch /var/log/smartd.log

This creates an empty logfile to contain the smartd output. The next step is to add the line below to the /etc/syslog.conf file so that syslogd knows where to write that output, in this case using the free local3 log facility:

local3.* /var/log/smartd.log

or, if you'd like to send the log to a remote syslog server, then

local3.* @my.syslogserver.domain

and, if you'd like to make sure that the smartd.log file is rotated weekly, modify the /etc/periodic/weekly/500.weekly file, so that the following loop statement has smartd.log in the list of rotated log files.

for i in ftp.log lookupd.log lpr.log mail.log netinfo.log hwmond.log ipfw.log smartd.log; do

Test it with:

minime:/etc/periodic/weekly dean$ sudo ./500.weekly

You should see the following line appear:

Rotating log files: ftp.log lookupd.log lpr.log mail.log netinfo.log ipfw.log smartd.log
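The edit to the rotation list can also be scripted rather than typed. What follows is a minimal sketch, assuming the loop line in your copy of 500.weekly looks like the one shown above; add_rotated_log is a hypothetical helper name, and it prints the modified script to standard output instead of changing the original, so you can inspect the result first.

```shell
#!/bin/sh
# add_rotated_log is a hypothetical helper, not part of OS X.
# Usage: add_rotated_log SCRIPT LOGNAME
# Prints SCRIPT with LOGNAME appended to the first "for i in ... .log ...; do"
# line, unless LOGNAME is already listed there. SCRIPT itself is not modified.
add_rotated_log() {
    awk -v name="$2" '
        !seen && /^[ \t]*for i in .*\.log.*; do/ {
            if (index($0, name) == 0) {
                sub(/; do/, " " name "; do")
            }
            seen = 1
        }
        { print }
    ' "$1"
}
```

For example, add_rotated_log /etc/periodic/weekly/500.weekly smartd.log > /tmp/500.weekly.new writes a candidate file that you can compare with diff before copying it back into place with sudo.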

Lastly, you'll need to modify the /usr/local/etc/smartd.conf file to specify which drives you'd like to monitor and which tests you'd like to run, as well as an admin notification email address. Remember, in OS X the startup disk is almost always disk0 if the computer has a single physical drive; the /dev/hda placeholders in the smartd.conf file are meant for other UNIX systems, not OS X.

First, open up the /usr/local/etc/smartd.conf file in your favorite editor. Luckily, the smartd.conf file has an excellent synopsis of what each command and directive accomplishes, so it's pretty self-explanatory:

minime dean$ sudo pico /usr/local/etc/smartd.conf

Find the line that reads:

DEVICESCAN

And edit it so that it reads

#DEVICESCAN

This tells smartd to monitor only the devices you list explicitly, rather than scanning for devices on its own.

Find the line that reads:

#/dev/hda -a -o on -S on -s (S/../.././02|L/../../6/03)

And edit it so that it reads

/dev/disk0 -a -o on -S on -s (S/../.././02|L/../../6/03)

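This directive monitors all SMART attributes (-a), enables automatic offline data collection (-o on) and attribute autosave (-S on), and schedules self-tests (-s): a short test every day during the 2 a.m. hour and a long test every Saturday during the 3 a.m. hour. The results are reported to the smartd.log file.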
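The -s argument in that line is a regular expression. According to the smartd.conf documentation, it is compared against a string of the form T/MM/DD/d/HH: test type (S for short, L for long), month, day of the month, day of the week (1 is Monday, 7 is Sunday), and hour. The sketch below uses grep to approximate that comparison so you can try out a pattern before trusting it. The schedule_matches helper is a hypothetical name of my own, and grep's matching may differ in small details from what smartd does internally.

```shell
#!/bin/sh
# schedule_matches is a hypothetical helper for experimenting with -s patterns.
# Usage: schedule_matches REGEX STRING
# Succeeds if the extended regular expression matches STRING,
# where STRING looks like "S/03/15/2/02" (T/MM/DD/d/HH).
schedule_matches() {
    printf '%s\n' "$2" | grep -E -q -- "$1"
}

# The pattern from the smartd.conf line above.
pattern='(S/../.././02|L/../../6/03)'

# Build the MM/DD/d/HH part for the current hour.
# %u prints the day of the week as 1 (Monday) through 7 (Sunday).
now=$(date +%m/%d/%u/%H)

for type in S L; do
    if schedule_matches "$pattern" "$type/$now"; then
        echo "$type test would be scheduled for $type/$now"
    else
        echo "$type test not scheduled for $type/$now"
    fi
done
```

Swapping in your own pattern and a hand-written date string, such as L/03/19/6/03 for a Saturday at 3 a.m., is a quick way to check that a schedule means what you think it means.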

Find the line that reads:

#/dev/hdc -H -m admin@example.com

And edit it so that it reads

/dev/disk0 -H -m dean@macworkshops.com

This directive will notify you by email if the simple health check (pass/about to fail) shows that the disk is in trouble. Now, restart the smartd process with the following command so that smartd will re-read its configuration file:

minime:~/dean$ sudo killall -HUP smartd
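Because smartd.conf is mostly comments, it's easy to leave a stray example line active or to miss one you meant to enable. A small sketch for double-checking the file: active_directives is a hypothetical helper name, and it simply prints the lines that are neither blank nor commented out.

```shell
#!/bin/sh
# active_directives is a hypothetical helper, not part of smartmontools.
# Usage: active_directives FILE
# Prints only the lines of FILE that are not blank and do not start with "#"
# (leading whitespace is ignored when deciding).
active_directives() {
    grep -v -E '^[[:space:]]*(#|$)' "$1"
}
```

Running active_directives /usr/local/etc/smartd.conf after the edits above should list only the lines you uncommented or changed; if DEVICESCAN still shows up, it was not commented out.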

Now the smartd.log is visible in the Console application, located in the /Applications/Utilities folder, and we can read the events we just kicked off. Your Mac now has a real-time log of SMART events and a notification that is emailed as soon as a failure is detected, rather than waiting for cron to do its thing.

With log analysis tools such as Oak (http://www.ktools.org/oak.html), smartd logs can be analyzed to spot trends of failures that might occur in waves because of a defective batch of hard drives. Remember the Quantum Bigfoot 5¼-inch hard drives from the late 90s? They were never shipped in a Mac, but at the time Compaq put them in tens of thousands of its Deskpro workstations, and nearly every single disk drive had to be replaced. A tool like smartd can help prepare companies for a run of defective hard drives that might die en masse. Other features of smartd include a database of hard drive SMART attributes (not nearly complete, but you can contribute a report for your own hard drive by capturing the output of smartctl -a and emailing it in). The other "half" of smartmontools is the smartctl command, which can be run from cron scripts or directly from the command line, much like Maxwell.
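For the cron route, a wrapper around smartctl -H can be very small. The following is a sketch under a few assumptions: that smartctl landed in /usr/local/sbin (the default for a source install), that the health check prints a line ending in "self-assessment test result: PASSED" for a healthy ATA drive, and that the function names are hypothetical ones chosen for this example.

```shell
#!/bin/sh
# Hypothetical cron helper; the names and the default path are assumptions.
SMARTCTL=${SMARTCTL:-/usr/local/sbin/smartctl}

# Reads "smartctl -H" output on standard input.
# Succeeds only if the output reports a PASSED health assessment.
health_passed() {
    grep -q 'self-assessment test result: PASSED'
}

# Usage: check_disk /dev/disk0
# Prints a one-line verdict and returns non-zero if the check did not pass.
check_disk() {
    if "$SMARTCTL" -H "$1" 2>/dev/null | health_passed; then
        echo "$1: SMART health check passed"
    else
        echo "$1: SMART health check did NOT pass (or smartctl could not run)"
        return 1
    fi
}

# Only run a check when a device is given on the command line,
# so the file can also be sourced for its functions.
if [ $# -gt 0 ]; then
    check_disk "$1"
fi
```

Saved as a script and called from root's crontab with /dev/disk0 as its argument, this prints a verdict that cron will mail to the crontab owner. Treating "could not run" the same as a failure is a deliberate choice here: a silent monitoring script is worse than a noisy one.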

Console

Everybody Now . . .

I realize that downloading, compiling, and configuring smartmontools isn't for everyone, but it certainly is a rewarding journey for those administrators who want peace of mind and a logfile, at least until the storm hits. But even for consultants who support home users, smartmontools, a simple script, or SMARTReporter can be a real lifesaver. Wouldn't it be nice to call a customer and say, "Hey, I'm going to come over with a brand-new hard disk, back up your data, and replace your drive, because it's going to fail . . ." Or better yet, "Back up your data, now, before it's too late!" Or even, "It seems like those Hitachi hard drives are dropping sectors like crazy, maybe we'd better turn up the air conditioners." So, for those who'd like to get started quickly with smartmontools, I've whipped up an Installer package that puts smartmontools on your Mac, as well as the requisite StartupItem. However, you'd still need to configure the /etc/hostconfig, /etc/syslog.conf, and /usr/local/etc/smartd.conf files and create a smartd.log file yourself. The installer is available along with the scripts in this article at http://www.themachelpdesk.com. Now, I'd like everybody to join me ". . . There's so much that we share, that it's time we're aware, it's a SMART world after all. . ."

In Next Month's "Source Hound"

Steve Jobs, in his keynote address at MacWorld, repeated the following mantra over and over "H.264, H.264, H.264." Sure, H.264 support is with QuickTime 7, but if you want it now, it's available today, not from Apple, but from the deepest trenches of the open-source world, the realm of the Coneheads and the video rogues, where encoders are encoders and QuickTime is QuickTime, and never the twain shall meet, or will they?

Dean Shavit is an ACSA (Apple Certified System Administrator) who loves to use a Mac, but hates paying for software. Since he's not into breaking the law, his most common response to any cool solution is: "Does that cost money?" If it does, you can bet he's on the hunt for an Open-Source or freeware alternative. Besides surfing for hours, following the scent of great source code, he's a partner at MOST Training & Consulting in Chicago, where he trains system administrators in OS X and OS X Server, conducts large-scale Mac Deployment and Upgrade projects, and writes for his own website, http://www.themachelpdesk.com. If you have questions or comments you can contact him: dean@macworkshops.com.
