OSX IP Failover, Part 2
Volume Number: 23 (2007)
Issue Number: 04
Column Tag: Network Administration
OSX IP Failover, A Beginners Guide (Part 2)
Giving IP Failover a Real Job
By Ben Greisler
Introduction
In the previous article, we looked at the general concepts involved with IP failover on OSX and how to set it up. At this point, if you had set up a pair of machines and tried out the failover process, you would simply have had one machine take the IP address of the other. While that is certainly nice, it doesn't really do anything productive for us. In this article we will look at making IP failover do some real work.
NOTE: Part 1 of this article contained a small terminology error. The line "Within that folder can be 4 subfolders: PreAcq, PostAcq, PreRel and PostRel." should read "Within that folder can be 4 scripts: PreAcq, PostAcq, PreRel and PostRel." The items are scripts, not subfolders. I apologize for any confusion.
IP Failover Considerations
At its core, IP failover describes one machine taking the IP of another. Unless we tell the machines otherwise, this is all that will happen. We need to take a few additional steps for the secondary machine to be able to take on the personality of the failed primary server.
We first need to determine what services need to be running on the primary machine so we can make sure they are available on the backup machine when failover occurs. Is it a web server? A file server? An application server? Are there ancillary services running that support the primary function such as a database that feeds a web server? Everything that is running on the primary server needs to run on the secondary.
If you need authentication services for the secondary server in an Open Directory system, consider making the secondary server a replica of the OD master. This gives you built-in and automatic failover for OD.
Another consideration is how we get the data that the primary server uses over to the secondary server. For static content such as web pages, it may be appropriate to make a copy of the data on the secondary server and be done with it. The problem with that is if the data changes at any point, you have to remember to update it on both machines. If you don't, you will end up with old data being served by the secondary server. This indicates the need for shared data between the two servers, where any update or change is automatically reflected on both.
So what are our choices for synchronized data? If the amount of data is relatively small, we could write a script to copy the data from the primary server to the secondary server on a regular basis. This may be appropriate for web pages or content that doesn't change that much. If we have data that is constantly changing, we may want to look into shared storage such as an Xsan volume. Another choice is to use external storage that gets mounted on demand when the failover occurs.
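The periodic copy mentioned above could be sketched with rsync over SSH. This is only a sketch under stated assumptions: the host name, the paths and the function name sync_content are hypothetical, and it assumes key-based SSH login from the secondary server to the primary is already set up.

```shell
#!/bin/sh
# Sketch: pull content from the primary server onto the secondary server.
# sync_content is a hypothetical helper name, not part of Mac OS X Server.

sync_content() {
    src="$1"     # e.g. root@primary.example.com:/Library/WebServer/Documents/
    dest="$2"    # e.g. /Library/WebServer/Documents/
    # -a keeps permissions and timestamps; --delete removes files that
    # no longer exist on the primary; -e ssh runs the transfer over SSH.
    if rsync -a --delete -e ssh "$src" "$dest"; then
        logger "IP failover sync: copied $src to $dest"
    else
        logger "IP failover sync: rsync from $src failed"
        return 1
    fi
}

# Run the sync only when a source and a destination are given.
if [ "$#" -eq 2 ]; then
    sync_content "$1" "$2"
fi
```

Saved on the secondary server and called from cron with the two arguments, this would refresh the copy on whatever schedule you choose. The trailing slash on the source path tells rsync to copy the folder's contents rather than the folder itself.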
Action Scripts
Part of the process of IP failover is the server checking the /Library/IPFailover/<IP address> folder on the secondary server for various scripts. The script names need to begin with PreAcq, PostAcq, PreRel or PostRel. If there are multiple scripts of one of the 4 types, they will be performed in alphabetical order (e.g., PostAcq_a, PostAcq_b, etc.).
A script with the PreAcq prefix is run before the network interface on the secondary server takes on the IP of the primary server during failover.
A script with the PostAcq prefix is run after the network interface on the secondary server takes on the IP of the primary server during failover.
A script with the PreRel prefix is run before the network interface on the secondary server releases the IP of the primary server during failback.
A script with the PostRel prefix is run after the network interface on the secondary server releases the IP of the primary server during failback.
The scripts can be written to perform just about any action you need at each stage of the process. You may want them to write to a log or send informational emails. You may need them to start or stop services and mount or unmount volumes.
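Because scripts of the same type run in alphabetical order, it can help to preview how a set of names will sort before relying on them. The sketch below is one way to do that; the function name list_scripts and the script names in the usage note are hypothetical, and it assumes plain byte-order sorting, which may not match exactly how the failover process orders names.

```shell
#!/bin/sh
# Sketch: show the scripts of one type in sorted order.
# list_scripts is a hypothetical helper, not part of IP failover itself.

list_scripts() {
    dir="$1"       # e.g. /Library/IPFailover/<IP address>
    prefix="$2"    # PreAcq, PostAcq, PreRel or PostRel
    # LC_ALL=C forces plain byte-order sorting regardless of locale.
    ls "$dir" | grep "^${prefix}" | LC_ALL=C sort
}
```

Calling list_scripts with the failover folder and PostAcq would list only the PostAcq scripts. Note that alphabetical order puts PostAcq_10 before PostAcq_2, so two-digit names such as PostAcq_10_log and PostAcq_20_start_web keep the intended order obvious.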
For example, we might need to mount a volume on the secondary server since we don't want to have the same volume mounted on both servers at the same time, assuming we don't have them as part of a SAN. By the way, a SAN is a great idea for this situation as we can avoid many of the pitfalls of having to mount and unmount volumes. The fstab entry for preventing the mounting of a volume will look something like this:
UUID=2B228FFC-B727-2910-A3B9-917CBAD7134F none hfs rw,noauto
We will need a PreAcq script that may look like this:
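#!/bin/sh
# Look up the device identifier (e.g. disk1s2) of the volume named "webstuff"
DEVICE=$(diskutil list | grep webstuff | awk '{ print $6 }')
diskutil mount "$DEVICE"
# Make sure the volume is mounted read-write
mount -uw /Volumes/webstuff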
This PreAcq script mounts the volume named "webstuff". Mounting it by hand like this is required if we have edited fstab to prevent the volume from mounting when the secondary server starts up. Once we have mounted the volume and the secondary server has taken over the IP of the primary server, we need to start the services on the secondary server. We might use a PostAcq script that looks like this:
sudo serveradmin start web
or
sudo serveradmin start afp
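A slightly more defensive PostAcq script might confirm that the shared volume really is mounted before starting anything. The following is a sketch only: the function name start_services, the SERVICES list and the script file name are assumptions, and it assumes the script already runs with root privileges (otherwise prefix the serveradmin call with sudo as above).

```shell
#!/bin/sh
# PostAcq sketch: start services only if the shared volume is mounted.
VOLUME="/Volumes/webstuff"   # volume name taken from the PreAcq example
SERVICES="web"               # assumption: add afp here if you serve files

start_services() {
    # mount prints lines such as "/dev/disk1s2 on /Volumes/webstuff (...)"
    if ! mount | grep -q " on $1 "; then
        logger "PostAcq: $1 is not mounted, not starting services"
        return 1
    fi
    for service in $SERVICES; do
        serveradmin start "$service"
        logger "PostAcq: asked serveradmin to start $service"
    done
}

# Act only when installed under a PostAcq name, e.g. PostAcq_20_start_services.
case "${0##*/}" in
    PostAcq*) start_services "$VOLUME" ;;
esac
```

Logging each step with logger leaves a trail in the system log, which is useful when you need to reconstruct what happened during a failover.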
Once we have the secondary server up and running we are in good shape. While I recommend not directly failing a server back, you might decide that this is fine for your needs. In this case we need to look into the PreRel and PostRel scripts. Because the services are about to be handed back to the primary server, we need to stop them on the secondary server. The PreRel script might look like this:
sudo serveradmin stop web
or
sudo serveradmin stop afp
Then, in the same script or in a separate PreRel script (named so that it sorts after the one that stops the services), you will need to unmount the volume that will be picked back up by the primary server:
sudo diskutil unmount /Volumes/webstuff
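Putting both steps in one PreRel script guarantees that the services stop before the volume is unmounted. Here is a minimal sketch, assuming the script runs as root; the function name release_volume and the SERVICES list are hypothetical.

```shell
#!/bin/sh
# PreRel sketch: stop services first, then unmount the shared volume.
VOLUME="/Volumes/webstuff"
SERVICES="web"               # assumption: add afp here if you serve files

release_volume() {
    for service in $SERVICES; do
        serveradmin stop "$service"
    done
    # Unmount only after the services have been told to stop.
    if diskutil unmount "$1"; then
        logger "PreRel: unmounted $1"
    else
        logger "PreRel: could not unmount $1"
        return 1
    fi
}

# Act only when installed under a PreRel name, e.g. PreRel_10_release.
case "${0##*/}" in
    PreRel*) release_volume "$VOLUME" ;;
esac
```

If the unmount fails, for example because a process still has files open on the volume, the log entry makes that visible before the primary server tries to mount it.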
Once the secondary server has relinquished the primary server's IP, the last thing to run is the PostRel script or scripts. You might want to put an email notification script together or simply log the event:
logger "Completed failback"
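The email notification could be combined with the log entry in a single PostRel script. This is a sketch under assumptions: the address admin@example.com and the function name notify are placeholders, and it assumes the server can deliver mail sent with the mail command.

```shell
#!/bin/sh
# PostRel sketch: log the failback and send a notification email.
ADMIN_EMAIL="admin@example.com"   # hypothetical address, change as needed

notify() {
    message="Completed failback on $(hostname) at $(date)"
    logger "$message"
    # mail reads the message body from standard input.
    echo "$message" | mail -s "IP failover: failback complete" "$1"
}

# Act only when installed under a PostRel name, e.g. PostRel_10_notify.
case "${0##*/}" in
    PostRel*) notify "$ADMIN_EMAIL" ;;
esac
```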
AFP Special Considerations
One of the obvious uses of IP failover is to provide seamless protection for file sharing. IP failover can be used for AFP file servers, but AFP has an interesting twist: it uses a special cache at /etc/AFP.conf to determine whether incoming connections are new or reconnects. The Finder on a client machine that was disconnected during failover will attempt a reconnect rather than a new connection. However, if the secondary server has picked up AFP duties from the primary server, it won't know that the client is trying for a reconnect and not a new connection.
So what do we do about this? The key is to give AFP.conf a place to live on shared storage. This might be an Xserve RAID volume that both machines can see at the same time. I know that we try to avoid the situation where more than one server can write to a volume, but we have to do that in this case. We also need to set the reconnectKeyLocation key in /Library/Preferences/com.apple.AppleFileServer.plist to the shared location:
<key>reconnectKeyLocation</key>
<string>/Volumes/AFPToken/AFP.conf</string>
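One way to set that key from the command line is the defaults tool. Treat this as a sketch rather than the official procedure: the function name set_reconnect_key and the file name in the usage note are hypothetical, it would need to be run as root on both servers, and you should check whether the AFP service must be stopped first so that it does not overwrite the change.

```shell
#!/bin/sh
# Sketch: point the AFP reconnect key at shared storage.
PLIST="/Library/Preferences/com.apple.AppleFileServer"   # domain, no .plist suffix
KEY_FILE="/Volumes/AFPToken/AFP.conf"                    # shared location from the article

set_reconnect_key() {
    defaults write "$1" reconnectKeyLocation -string "$2"
    # Read the value back so the change can be confirmed by eye.
    defaults read "$1" reconnectKeyLocation
}

# Apply only when asked to, e.g.: sh set_afp_key.sh apply
if [ "$1" = "apply" ]; then
    set_reconnect_key "$PLIST" "$KEY_FILE"
fi
```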
And while it is tempting to do so, I can't recommend using IP failover for network home folders. Try it if you want, but I suspect you will find that the clients will have issues with it, even with automatic reconnect.
Conclusion
IP failover can be a powerful tool when it is well thought out and matched to the appropriate need. It is not the end all, be all of high availability, but it can help us add an occasional extra "9" to our uptime percentages. The intention of this article was to give you the most basic tools needed to implement IP failover and to give you a jumping off point. Take these tools and examples and try it out on your own.
Ben has worked on Apple-based technology integration projects from Maine to Japan while learning all the way. When not collecting frequent flyer miles he spends his favorite time with his wife and 2.5-year-old daughter at their home outside of Philadelphia. He can be reached at magikben@mac.com.