Backup Management
Volume Number: 16 (2000)
Issue Number: 3
Column Tag: Systems
by Paul Shields
A guide to managing local and remote backups for your Mac
Introduction
When was the last time you lost data because of a hard drive failure? How about by accidentally deleting a file whose name you did not recognize or that you felt you no longer needed? Have you ever lost data because of a fire or theft? All of these represent real situations that people face every day. How long before you encounter a similar situation?
Backups are one of the most critical, but often neglected, tasks in proper data management. Both home users and network administrators have lax attitudes towards backups until the day comes when they lose data, especially important business-related data. With the refinements in hardware and software technologies, backups are easy and relatively inexpensive, even for small office and home users.
How effective is a company where the list of customers is lost because of malicious activity by a former employee or Internet hacker? Can a company survive if its financial system lies in ruins because a hardware failure rendered it useless? As a systems administrator, your responsibility is to protect corporate computing assets. Since data is becoming one of the most valuable assets a company has, implementing a reliable backup system is essential. As a business owner, you must take an interest in the processes used to protect your assets. There is no excuse for being caught off-guard.
Backup Terminology
Before beginning a discussion on implementing a backup system, we must define some of the common terms.
Full backup
A full backup is a complete copy of everything on the source drive to the backup media. Most users and administrators run a full backup on a weekly or monthly basis.
Incremental backup
An incremental backup only copies the data that has changed since the last backup. Administrators will run incremental backups on a daily basis to capture the data files added or updated during the day.
Differential backup
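A differential backup copies the data that has changed since the last full backup. It is important to understand the difference between an incremental and a differential backup: an incremental is measured from the most recent backup of any kind, so each one stays small, while a differential is measured from the last full backup, so it grows through the week but a restore needs only the full backup and the latest differential. Administrators use differential backups to facilitate more complex tape rotation schemes and sometimes in database environments.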
Archiving
Too many administrators fail to understand the difference between archives and backups. They fail to see the powerful capabilities of archiving and use their normal backups as archival storage. Archiving is the process of moving data from one medium (typically a hard drive) to another (optical disks or tapes). Unlike a backup, archiving deletes the files from the original location. Companies may use archiving to store things such as historical financial data. Archiving saves space on hard drives, while retaining the ability to restore the data quickly.
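The difference between the full, incremental, and differential backups defined above comes down to which cutoff time each one compares file changes against. The following Python sketch illustrates the idea only; the function name, the file list, and the dates are hypothetical and do not represent how any particular backup package is implemented.

```python
from datetime import datetime

def select_files(files, kind, last_full, last_backup):
    """Return the names of the files a backup of the given kind would copy.

    files       -- dict mapping file name to last-modified time
    kind        -- "full", "incremental", or "differential"
    last_full   -- time of the most recent full backup
    last_backup -- time of the most recent backup of any kind
    """
    if kind == "full":
        return sorted(files)              # everything, changed or not
    if kind == "incremental":
        cutoff = last_backup              # changes since the last backup of any kind
    elif kind == "differential":
        cutoff = last_full                # changes since the last full backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, mtime in files.items() if mtime > cutoff)

# Hypothetical example data
files = {
    "customers.db": datetime(2000, 3, 7, 9, 0),   # changed Tuesday
    "invoice.doc": datetime(2000, 3, 8, 14, 0),   # changed Wednesday
    "logo.pict": datetime(2000, 2, 1, 10, 0),     # unchanged since February
}
last_full = datetime(2000, 3, 4, 22, 0)     # Saturday night full backup
last_backup = datetime(2000, 3, 7, 22, 0)   # Tuesday night incremental

print(select_files(files, "incremental", last_full, last_backup))
print(select_files(files, "differential", last_full, last_backup))
```

On Wednesday night the incremental picks up only the file changed since Tuesday's run, while the differential also picks up Tuesday's change again because it still measures from Saturday's full backup.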
Picking the Right Media
Each media type has dramatically different characteristics including lifespan, storage capacity, re-usability, and cost. Even within a category such as tape drives, the number and variety of options available is broad enough to suit a wide spectrum of users. One important factor in selecting a medium is the length of time you plan to store it. Table 1 provides a lifespan comparison of some of the most popular backup media.
Media | Lifespan (years) |
Magnetic Tape | 1 - 3 (newer tape technologies are extending this) |
Magnetic Disks | 3 - 5 |
Optical Disks | 30 |
Write-once CD-ROM | 30+ |
Table 1. Average lifespan of common backup media. This is strictly a shelf life and does not factor in the number of uses.
A common mistake is to pick a media type based on the needs of today's systems. The data requirements of businesses are growing at a phenomenal rate, estimated by the Gartner Group to be as high as 70% per year. A system that barely meets the capacity and performance requirements of your current systems will be inadequate within 18 months and the search will be on for a replacement. Switching media types is an expensive and time-consuming proposition.
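To see how quickly that growth rate erodes headroom, the short Python sketch below compounds a data set forward in time. It is a rough planning aid under stated assumptions: growth is treated as smooth compound growth at the 70% figure quoted above, and the 10 GB starting size and 20 GB capacity are hypothetical.

```python
import math

def projected_size(current_gb, annual_growth, months):
    """Data size after `months` of compound growth at `annual_growth` per year."""
    return current_gb * (1 + annual_growth) ** (months / 12)

def months_until_full(current_gb, capacity_gb, annual_growth):
    """Months until data growing at `annual_growth` per year outgrows capacity_gb."""
    if current_gb >= capacity_gb:
        return 0.0
    years = math.log(capacity_gb / current_gb) / math.log(1 + annual_growth)
    return years * 12

# Hypothetical site: 10 GB of data today, growing 70% per year.
print(round(projected_size(10, 0.70, 18), 1))      # size in GB after 18 months
print(round(months_until_full(10, 20, 0.70), 1))   # months until 20 GB is too small
```

Under these assumptions the data more than doubles in 18 months, so a drive that is only twice the size of today's data is outgrown in well under that time.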
Tape drives
Tape is the standard in the backup systems of the majority of corporations. Tape offers extremely high capacities, excellent performance characteristics, and a reasonable lifespan. The variety of formats makes tape a versatile medium that can fit in a number of environments. Table 2 provides a summary of the most common tape types and their specifications.
Type | Capacity (uncompressed) | Speed (uncompressed) | Drive Costs | Media Costs |
QIC (Travan) | TR-4: 4 GB | 25 MB/min | $350 | $7/GB |
QIC (Travan) | TR-5: 10 GB | 60 MB/min | $550 | $4/GB |
DAT | DDS-2: 4 GB | 46 MB/min | $650 | $2/GB |
DAT | DDS-3: 12 GB | 60 MB/min | $1050 | $2/GB |
DAT | DDS-4: 20 GB | 180 MB/min | $1350 | $2.50/GB |
DLT | DLT 4000: 20 GB | 90 MB/min | $2000 | $4/GB |
DLT | DLT 7000: 35 GB | 300 MB/min | $4000 | $2.50/GB |
AIT | AIT: 35 GB | 180 MB/min | $2000 | $3/GB |
AIT | AIT II: 50 GB | 360 MB/min | $4000 | $2.75/GB |
Table 2. Tape types and specifications. Most drive manufacturers offer compression and advertise compressed storage capacities that are roughly double their uncompressed capacities.
For a home or small business user, the low cost of the Travan drives makes them an appealing option. The catch is the high media cost per GB and the slow performance. Although 60 MB/min sounds fast, a typical hard drive is now 4 - 8 GB, which translates into a full backup time of 1 - 2 hours. A backup that long is a nuisance, and you are unlikely to want to give up access to your machine for a few hours while it runs.
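The backup time estimate above is simple division, and it is worth repeating for your own drive sizes. This Python sketch assumes 1024 MB per GB and a drive that sustains its rated uncompressed speed for the whole run, which real backups rarely do; the function name is illustrative.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes

def full_backup_minutes(data_gb, speed_mb_per_min):
    """Minutes needed to write data_gb at a sustained speed in MB per minute."""
    return data_gb * MB_PER_GB / speed_mb_per_min

# A full 4 GB and a full 8 GB drive on a TR-5 Travan drive (60 MB/min)
for size_gb in (4, 8):
    minutes = full_backup_minutes(size_gb, 60)
    print(f"{size_gb} GB at 60 MB/min: about {minutes / 60:.1f} hours")
```

The results land slightly above one and two hours respectively, in line with the estimate in the text, and they do not include any verification pass.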
For a large organization, picking the right technology depends on the type of backups. Most administrators deploy DAT tape drives for network backups because they are less expensive and offer good performance and capacity. AIT and DLT offer more capacity and better performance but are expensive. DLT and AIT are perfect for situations where you attach the tape directly to a fileserver because of the large amounts of data on the server.
The reason DLT and AIT may not fit well in a typical network backup is the potentially slow performance of moving data between machines. The latest releases of Retrospect and Mac OS 8.5, combined with a quality TCP/IP network, can come close to the speeds needed to use the full capacity of a DLT or AIT system.
To find out why DAT may be better for network backups than DLT, compare the performance of a 10Base-T network with that of DLT drives. At best, a 10Base-T network can move data at about 1 MB/sec. A DLT drive writes data at a minimum of 1.5 MB/sec (the 90 MB/min of the DLT 4000 in Table 2). This gap is even wider with the new AIT II tapes and the proposed SuperDLT format. DAT drives more closely match the performance of a 10Base-T network and have a lower cost per MB. DLT is still viable and I use it for network backups on a regular basis, primarily because of its longer lifespan and better durability.
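The mismatch described above can be put in numbers by treating the slower of the network and the tape drive as the bottleneck. The Python sketch below uses the uncompressed drive speeds from the tape table and the roughly 1 MB/sec 10Base-T figure from the text; the function names are illustrative, and real throughput also depends on the clients, the software, and network traffic.

```python
def effective_rate_mb_s(network_mb_s, drive_mb_min):
    """Sustained backup rate in MB/sec: the slower of network and tape drive."""
    return min(network_mb_s, drive_mb_min / 60)

def drive_utilization(network_mb_s, drive_mb_min):
    """Fraction of the drive's rated speed that the network can keep fed."""
    return effective_rate_mb_s(network_mb_s, drive_mb_min) / (drive_mb_min / 60)

TEN_BASE_T_MB_S = 1.0  # approximate best case quoted in the text

# Uncompressed speeds in MB/min, taken from the tape table
drives = {"DDS-3": 60, "DLT 4000": 90, "DLT 7000": 300, "AIT II": 360}
for name, speed in drives.items():
    pct = drive_utilization(TEN_BASE_T_MB_S, speed) * 100
    print(f"{name}: network can feed about {pct:.0f}% of rated speed")
```

A drive that is fed well below its rated speed cannot use the performance you paid for, which is the argument for matching the drive to the network.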
Another reason to choose tape as a backup medium is the availability of tape libraries. A tape library is a device that contains one or more tape drives along with dozens or hundreds of tape slots. These systems have a robot that can automatically swap tapes as directed by the host computer. Tape libraries greatly enhance the storage capabilities of a backup server. No longer do administrators have to limit backups to what will fit on a single tape or constantly monitor the server and rotate tapes. The software automatically swaps tapes as they fill. Retrospect supports a number of tape library systems via the Advanced Driver Kit option.
Most tape drives have a SCSI interface, which poses a problem for iMac and B&W G3 owners without a SCSI card. Until Firewire or USB connections become more common on tape drives, users must consider alternatives or buy an adapter. USB is a low-speed interface and while adapters exist, the performance they offer will be disappointing. Firewire offers performance levels higher than traditional SCSI, so the use of an adapter should provide adequate performance. Tape drives with native Firewire should begin to appear in the next 6 - 12 months.
CD and DVD
Over the last few years, CD and DVD have become viable backup media for small businesses and for critical data in large businesses. As shown in Table 3, both CD and DVD drives offer good storage capacity and performance with moderate media costs.
Media | Capacity (uncompressed) | Speed | Drive Costs | Media Costs |
CD | 650 MB | 600 KB/sec | $379 | R: $1/GB; RW: $10/GB |
DVD | 5.2 GB (2.6 GB/side) | 1300 KB/sec | $599 | $8/GB |
Table 3. CD and DVD specifications. These media types offer smaller capacities than tape but have other advantages that make them appealing for archival purposes.
The real advantage of a CD or DVD is the long lifespan of the media. Optical media can last as long as 30 years before oxidation starts to affect the stored data. This makes optical media perfect for storing archives of critical data such as financials or personnel records.
One note of caution when considering DVD is that the capacity is 2.6 GB per side. Switching sides on the DVD recorder is a manual process, which interferes with scheduling unattended backups.
Removable drive technologies
Zip drives are almost universally available and are relatively inexpensive. Jaz drives offer more capacity and better performance while retaining a relatively low cost. These media, though, do not offer extremely high levels of reliability and are best suited for file transfer, occasional ad-hoc backups, and short-term storage.
Another problem with removable drives is the high cost of the cartridges. A typical Jaz cartridge costs $90 and stores 1 GB of data. A typical hard drive is now in the 6 GB range. If we assume the hard drive is full and that you run a full backup once per week, then the yearly cost approaches $28,000. While you may be able to reduce this cost slightly by recycling cartridges, these costs do not account for having duplicates of your backup media (highly recommended) nor the media used for daily incremental backups. This compares poorly to DAT, which under the same conditions has a yearly cost of just over $1,000.
One promising technology is the Orb drive from CastleWood http://www.castlewood.com/. The Orb uses traditional hard drive media and offers the potential for improved reliability as compared to the Zip/Jaz technologies. EIDE versions are shipping now, while SCSI, USB, and Firewire versions should be available by the middle of the year. The Orb solves some of the media cost problem by reducing media costs to approximately one-third the cost of Jaz media. While this difference does not make the Orb a competitor to tape in most situations, it may make it acceptable in some circumstances.
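The Jaz figure above follows from the number of cartridges a full backup needs, the price per cartridge, and the number of backups per year. The Python sketch below reproduces that arithmetic under the same assumptions as the text: the drive is full, a fresh set of cartridges is used every week, and nothing is recycled or duplicated. The function name is illustrative.

```python
import math

def yearly_media_cost(data_gb, cartridge_gb, cartridge_cost, backups_per_year=52):
    """Cost of buying a fresh set of cartridges for every full backup in a year."""
    cartridges_per_backup = math.ceil(data_gb / cartridge_gb)  # whole cartridges only
    return cartridges_per_backup * cartridge_cost * backups_per_year

# 6 GB drive, 1 GB Jaz cartridges at $90 each, one full backup per week
print(yearly_media_cost(6, 1, 90))
```

Rounding up to whole cartridges matters for partially used media: 6.5 GB of data would need seven 1 GB cartridges per backup, not six and a half.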
Which one is right for your needs?
Most users should avoid removable magnetic drives because of their limited capacity and short lifespans. A small business with a few machines could probably do well with the older DAT technologies (DDS-2 or DDS-3). Both offer good performance and low media costs. The newer DDS-4 drives also support the older tapes, so when it comes time to upgrade there is an evolutionary path.
If your needs are beyond the capacity of DAT, both DLT and AIT are excellent solutions. DLT has been in the marketplace longer and is potentially a more stable technology. Sony is pushing the AIT standards hard, though, and the recent AIT II drives surpass DLT in performance, capacity, and media costs. A network administrator will not go wrong with either of these technologies, especially for backups of large data servers.
Media Rotation
Where to put your backup tapes?
Few administrators take the appropriate actions when it comes to storing their backup tapes. All too often when you ask the systems administrator where the backup tapes are, they point to a cabinet in their office. This is strictly a matter of convenience and complacency. Administrators hesitate to send the tapes off-site because as soon as they do, a user will request a file restore and they must bring the tapes back on-site. The complacency is because so few network administrators have been through a major disaster that resulted in a major loss. Few companies have a disaster recovery plan they enforce and there is little incentive for the network administrator to create one.
Home users and small businesses should invest in a fireproof safe at a minimum. A better solution is to find a secure off-site storage location like a safe deposit box at a local bank. Either way, the data is safe from the most common hazards such as fire, theft, or other unfavorable environmental conditions.
For large organizations, storing the tapes at a remote office or an off-site storage company is ideal. When selecting a site, ensure that the provider maintains proper environmental conditions and takes measures to control access to the backup media. Our local off-site storage company will not reveal its exact location under normal circumstances. They provide a courier pick-up and drop-off service for transporting the tapes between sites. We ship the tapes in sealed boxes to ensure that no one tampers with the contents.
Once you select a storage location, it's time to decide how often the media will rotate off-site.
Developing a rotation schedule
The media rotation schedule forms the basis of your backup schedule. It's at this point that you start asking questions like "How much data can I afford to lose?" There is no set standard for defining a schedule but there are a few guidelines. The exact schedule you choose will depend on the amount of backup data and the critical nature of the data.
The most common method is to do one full backup per week, usually over a weekend, and incremental backups the rest of the week. You have the option of placing the full and incremental backups on separate tapes, allowing you to ship the full backups off-site immediately on Monday morning. The incremental backups from the rest of the week are on a second set of media. You then ship the incremental tapes off-site on Friday before the next full backup. This gets the media off-site quickly and minimizes the amount of data at risk.
The problem with such an aggressive off-site storage policy is the inevitable restore request. The administrator must pull tapes back on-site to complete the restores. A potential solution is to bring the older backup tapes back on-site, which may cover some restore requests and minimize the amount of tape shuffling.
Another option is to duplicate tapes before sending them off-site. While this method does double your media costs, it allows you to have one copy off-site for disaster recovery and one on-site for restores. To duplicate backup media you will need a machine with two drives. Retrospect has a feature to facilitate duplication of media.
The important thing is to keep the tapes off-site as much as possible. The whole purpose of backups is to provide safe and secure storage for critical data. Leaving them on-site means they are vulnerable to the same disasters that may destroy the original source.
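The weekly pattern described above (one full backup over the weekend, incrementals during the week, full tapes shipped on Monday and incremental tapes on Friday) can be written down as a small lookup. This Python sketch is only one reading of that schedule: it assumes the full backup runs on Saturday, incrementals run Monday through Friday, nothing runs on Sunday, and Friday's shipment leaves after that day's incremental. The function name and shipment wording are illustrative.

```python
from datetime import date

def plan_for(day):
    """Return (backup_type, shipment) for a calendar date.

    Assumed schedule: full backup on Saturday, incrementals Monday-Friday,
    nothing on Sunday. The full tape ships off-site on Monday and the
    week's incremental tapes ship on Friday.
    """
    weekday = day.weekday()  # Monday is 0, Sunday is 6
    if weekday == 5:
        backup = "full"
    elif weekday == 6:
        backup = None
    else:
        backup = "incremental"

    if weekday == 0:
        shipment = "ship full backup tape off-site"
    elif weekday == 4:
        shipment = "ship incremental tapes off-site"
    else:
        shipment = None
    return backup, shipment

# One week starting Saturday, March 4, 2000
for d in range(4, 11):
    day = date(2000, 3, d)
    print(day.strftime("%a %b %d"), plan_for(day))
```

Writing the schedule down this explicitly makes it easy to see how many days of data sit on-site at any moment, which is the number the rotation policy is trying to minimize.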
Picking a Backup Package
There are few options when it comes to quality backup software for the Mac. Dantz Development's Retrospect is the standard by which we judge other packages. Three versions of Retrospect are available. The target customer for each package is different and selecting the one that meets your needs is relatively easy.
Retrospect
Retrospect is the most comprehensive backup package for the Mac and will be the package administrators choose for their network backups. Retrospect can back up both local and remote drives. It also provides support for backing up Windows 95/98/NT machines from your Macintosh server. There is an add-on package, the Advanced Driver Kit, that adds support for high-end tape devices and tape libraries.
Retrospect Network Backup Kit
This package includes both the Retrospect server and Retrospect client software. This is the package network administrators will need in order to do centralized backups of all their machines using one or more servers. Each Retrospect client requires a unique license and additional licenses are available in 5, 10, and 50 packs.
Retrospect Express
Retrospect Express is what one might call the consumer version of Retrospect. The Express version lacks support for tape drives, but still has the ability to back up to local removable drives or over the Internet (via FTP). These features, along with its ease of use and low price, make it a package suitable for home users.
Backups for One
Local backups are best suited for home users and very small offices where there are only a few machines. In an office of dozens of users, there is a certain economy of scale to switching from local backups to a centralized dedicated backup server.
Many administrators will also use local backups for servers with large amounts of data to improve backup performance. Trying to back up several hundred gigabytes of data over the network is a time-consuming task that may not complete in a constrained backup window.
Configuring Retrospect for local backups
Retrospect offers an excellent utility called Easy Script to guide you through the process of building a backup script and schedule (Figure 1). Easy Script will automatically run the first time you launch Retrospect. Otherwise, it is available from the Help menu or the Scripts tab in the main window.
Figure 1. Retrospect Easy Script utility for generating a backup script and schedule. Easy Script is available from the Help menu or the Scripts tab.
Individuals and network administrators will find the interface of Easy Script easy to use and thorough. Easy Script is part of both Retrospect and Retrospect Express. When you start Easy Script, it asks if you are configuring the script for local or network backups. The next step is to select the backup media (Figure 2).
Figure 2. Selecting the media type in Retrospect's Easy Script utility. Retrospect Express would not include the tape options.
Each step in the Easy Script process includes additional information on the options available. After selecting how often you want to back up, Retrospect asks how often you want to rotate the media. The most commonly recommended pattern is to back up daily and rotate media on a weekly basis (Figure 3).
Figure 3. Selecting the media rotation in Retrospect. While available, few should have a reason to select the no rotation option.
After selecting the media rotation schedule, Retrospect provides a summary of the intended backup schedule and prompts the user to provide a preferred time for backups (Figure 4). When you click create, Retrospect will prompt you for the name of the storageset catalogs (Figure 5).
The default names are the word "Storageset" followed by a letter (A, B, C). While this naming convention works, it is not ideal. A more descriptive naming convention would be appropriate. One suggestion is to use the name of the server such as "Finance Backup Server A" and "Finance Backup Server B." These names indicate the server and backup rotation, making it easier to track which tapes go with each server. There is a 31-character limit because of filename restrictions in the Macintosh OS.
Figure 4. Retrospect displays a summary of the backup schedule and prompts for a preferred backup time. At this point Retrospect will build the backup script.
Figure 5. The final step is to name the catalog files. Retrospect provides default names, but most administrators will prefer to set a more descriptive name.
At this point, your backup server is ready to go and the only thing left is to load the media and wait for the next scheduled backup. Retrospect offers more flexibility to control the backup process, but most home users and small offices can ignore these options and use the default settings.
Backups for Everyone
Configuring Retrospect for network clients
The process of configuring the server to back up network clients is almost identical except for one step, the configuration of each network client. For network backups, you will need a unique license key for each Retrospect client. License packs are available in units of 5, 10, and 50. Some OEM drive manufacturers will include a 3-user license with high-end DLT, DAT, and AIT drives.
The first step in setting up network clients is to install the Retrospect client control panel on each computer (Figure 6). The control panel is mostly unobtrusive and, in my experience, has a tremendous record for stability and compatibility with both other extensions and OS updates. The name of the file has a special character at the beginning to force it to the bottom of the alphabetical list of control panels, thus causing it to load last.
Figure 6. The Retrospect Remote control panel on each network client is easy to install and has few compatibility issues.
After installing the Retrospect client on each computer, you follow the same Easy Script process outlined for a single machine environment. The only change is the prompt to find and configure clients (Figure 7). Retrospect supports AppleTalk and TCP/IP connections to clients. When you first install the Retrospect client on a Mac, it will generally default to AppleTalk. Once captured and configured in Retrospect, the administrator can change a client to use TCP/IP. Windows clients only support TCP/IP. The primary advantage of TCP/IP is a potential for a 2X increase in backup performance.
When you click on a zone, Retrospect sends out a broadcast to the zone looking for clients. If you select TCP/IP, the server will send out a broadcast to the subnet looking for clients. If you want to configure a TCP/IP client on another subnet, you will need to know its IP address or DNS name.
To configure the client, you double-click on it in the window and enter a license code. During the client configuration, Retrospect will also prompt you for a client name and password. Specifying a name allows an administrator to identify the machine, instead of relying on the Machine (File Sharing control panel) name or DNS name (PC clients). While you can choose to leave this password blank, doing so creates a major security hole in your backup system.
The password ensures that only authorized backup servers attach to the client and pull the client's data. By not setting a password, you leave the machine open to anyone on your network configuring a copy of Retrospect to capture and backup the data on the client. The person with the rogue backup server now has access to other people's data.
With a password configured, when a Retrospect server attempts to connect to the client for the first time, a password prompt will appear and the administrator must enter the correct password. The server will cache the password so that it can connect to the client for future backups without having to prompt the administrator.
For those administrators with sensitive data that requires an even higher level of security, Retrospect offers the option of encrypting the data before sending it across the network. This keeps someone from configuring a protocol analyzer on the network and capturing the data as it travels between client and server. On newer machines, the performance overhead of encryption is minimal.
Figure 7. The interface for adding network clients supports both AppleTalk and TCP/IP clients.
After configuring each client, the process continues with the schedule and media rotation settings.
The problem with portables
The main issue when backing up clients over the network is finding an appropriate time to run the backups. Portable computers, which users tend to disconnect from the network at night, complicate the issue. The most common mistake is to configure the client machines during the day, when they are all on and connected to the network, without realizing that some of them are portables. When backups run at night, those machines are gone.
The most common solution was to pick a time of day when there is a high probability of the machine being present on the network, such as lunchtime, and schedule backups of portables to occur then. This method presents two basic problems. First, there is still no guarantee that the machine will be present on the network. Second, lunchtime varies from user to user and many may be in the middle of completing a critical task when the server starts a backup. No matter how fast the machine, backups can be an intrusive function that affects performance. In the past, we have found that the user reaction to such an immediate slowdown in performance is to restart the machine in the hope that it will solve the problem. This results in the cancellation of the backup of that machine. The Retrospect client provides a partial solution to the problem by letting a user lower the priority of the backups in the Retrospect control panel preferences. With the faster processors available today, this option can allow backups to proceed at almost full speed while maintaining usability for the user.
An administrator can carefully monitor the backup logs and attempt to contact the user in cases of repeated backup failures, but this is a burdensome task. Contacting the user via the telephone can be just as difficult as finding their machine on the network. In addition, just because you manage to get hold of them does not mean they will let you initiate a backup of their system immediately.
Retrospect 4.0 and later provide a solution to this problem that I have yet to find an equivalent of in any other software package. In version 4, Dantz introduced the concept of a backup server. Administrators configure backup server scripts the same as any other backup script, although you cannot use the Easy Script tool. The administrator selects the backup media, clients, storagesets, and schedule. A typical backup server script might look like the example shown in Figure 8.
The difference is that a backup server script runs continuously, constantly monitoring the network for clients. The backup server works through the list of clients and checks to see if the client is visible on the network. If the client is visible, the backup script checks the amount of time since the last backup (a configurable option). If the client is due for a backup, then the server sends a notice to the client informing them that backup will begin in 20 seconds (another administrator configurable option). The user has a few options at this point. They can let the backup proceed, which is what happens by default or they can reschedule it for another time. If they choose to reschedule the backup, the server notes the requested date/time and makes an effort to retry the backups then.
Figure 8. A typical backup server script backs up all local and network drives to storageset C or D. This script is always active but will not back up a particular machine more than once a day unless the user requests it.
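The decision the backup server makes on each pass through its client list, as described above, can be sketched in a few lines of Python. This is a simplified model of the described behavior, not Dantz's implementation: the `Client` class, its field names, and the 24-hour default interval are assumptions chosen for illustration, and the 20-second user notice is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Client:
    name: str
    visible: bool                              # is the machine on the network right now?
    last_backup: datetime                      # when it was last backed up
    deferred_until: Optional[datetime] = None  # set if the user rescheduled

def clients_due(clients, now, interval=timedelta(hours=24)):
    """Return the names of clients the server would back up on this pass."""
    due = []
    for client in clients:
        if not client.visible:
            continue  # not on the network, try again on a later pass
        if client.deferred_until and now < client.deferred_until:
            continue  # user asked for a later time
        if now - client.last_backup >= interval:
            due.append(client.name)
    return due

now = datetime(2000, 3, 8, 12, 0)
clients = [
    Client("PowerBook", True, datetime(2000, 3, 6, 12, 0)),
    Client("iMac", True, datetime(2000, 3, 8, 2, 0)),
    Client("G3", False, datetime(2000, 3, 1, 2, 0)),
    Client("G4", True, datetime(2000, 3, 5, 2, 0), datetime(2000, 3, 8, 15, 0)),
]
print(clients_due(clients, now))
```

Because the check runs continuously rather than at a fixed hour, a portable that appears on the network for only part of the day is still picked up once it becomes visible and overdue.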
Using the backup server script gives the user another option. Often a portable user will come in to the office for a short period to get a specific task done. If they know they will only be in the office for a short time and want to ensure backups initiate, they can go to the Retrospect control panel and schedule a backup with the server. They can specify a certain time and date or simply tell the server to back them up ASAP. The server will move their machine to the top of the list and begin backups as soon as any in-progress backups complete.
The only problem is explaining to the user the difference between ASAP and now. Many of them assume that by clicking ASAP, they are telling the server to back them up immediately. Since Retrospect can only back up one client at a time, the request goes into a queue. Depending on what stage the in-progress backup is at, this could mean a significant wait.
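The difference between ASAP and now is easiest to see as a queue with a single backup in progress. The Python sketch below models only what the text describes: an ASAP request moves a client to the front of the waiting list but never interrupts the running backup. The class and method names are hypothetical, and how Retrospect orders several competing ASAP requests is not covered by the text, so this model simply puts the most recent request first.

```python
from collections import deque

class BackupQueue:
    """One backup runs at a time; ASAP requests jump to the front of the line."""

    def __init__(self, clients):
        self.pending = deque(clients)
        self.in_progress = None

    def start_next(self):
        """Finish the current backup (if any) and start the next pending one."""
        self.in_progress = self.pending.popleft() if self.pending else None
        return self.in_progress

    def request_asap(self, client):
        """Move a client to the front without interrupting the running backup."""
        if client == self.in_progress:
            return
        if client in self.pending:
            self.pending.remove(client)
        self.pending.appendleft(client)

queue = BackupQueue(["File Server", "iMac", "PowerBook"])
queue.start_next()               # the file server backup begins
queue.request_asap("PowerBook")  # portable user clicks ASAP
print(queue.in_progress)         # still the file server
print(queue.start_next())        # the PowerBook goes next
```

If the backup in progress is a large file server, the portable user may wait a long time even though the request was honored, which is worth explaining to users in advance.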
While the backup server option is infinitely better than the alternatives, it still has limits. Tape rotation is not an integral part of the backup server script. The backup server will use any tape that matches one of the configured storageset options. Administrators must be careful to load and name the tapes before the backup server script initiates. There is no way to code in the script to rotate tapes on a weekly basis, so inserting the wrong tape or using the wrong name may result in incremental backups using the wrong storageset.
A configuration that combines the power of regular backup scripts and the backup server might look like this. The administrator defines two scripts, one called Full backups and one called Daily backups. The administrator sets the full backup script to alternate between tapes on a weekly basis and to run every Saturday. The daily backups are a backup server script and run continuously Monday - Friday. The administrator inserts a blank tape on Friday evening, which the full backup script sees and names. On Monday the administrator can leave the same tape in and store both the Full and Daily backups on the same tape. The other option is to remove the tape, insert a new blank tape, and name it to match the storageset from the Full backup. This ensures that Retrospect bases the next incremental backups on the most recent full.
Summary
Backups are a critical but often neglected task. Proper backups are relatively easy to manage and the time spent in doing backups will be worth every minute the next time you or one of your users accidentally deletes a critical file.
There are three basic steps to developing a backup plan. First, select a media type that meets your storage and archival needs. For most home users, magnetic media or low-end tape drives will be adequate. Network administrators will want a high-performance drive mechanism like DAT, DLT, or AIT. If the goal is long-term archiving, then CD-ROM or DVD is a perfect solution.
The second step is deciding on a media rotation schedule and planning a storage location for the backup media. You should make every effort to store backup media off-site such as in a safe-deposit box or other storage facility. While on-site, tapes should be in a fireproof safe. How often you rotate backup tapes between sites depends on the critical nature of the data and the number of restore requests you process on a daily basis.
The final step is configuring the software for your local machine or on a server that can handle backing up remote clients. Once implemented most backup systems are easy to maintain. Retrospect offers some excellent reporting capabilities that make it easy to monitor the backup status of all machines. This does not mean you can configure the system and ignore it.
Paul Shields is currently the Technology Advisor and Project Manager for a major telecommunications firm in Dallas, TX. In his role Paul selects and implements the critical technologies that comprise the backbone of the computing infrastructure. He also writes for several publications including AppleLinks http://www.applelinks.com/ and develops disaster recovery plans that minimize downtime and maximize recoverability. Feel free to forward any questions or comments to him at pshields@cyberramp.net.