Diagnose Viruses
Volume Number: 6
Issue Number: 5
Column Tag: Programmer's Workshop
Programmer, Heal Thyself
Diagnosing Virus Infections
By Mike Morton, University of Hawaii
Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.
Introduction
Macintosh users have a variety of defenses against infection by computer viruses. Public-domain, shareware, and commercial applications are available to prevent, detect, and repair virus infection. But most of these solutions require some effort on the user's part, and users who have never had their software infected are (understandably) not motivated to use these tools. This article describes how your application can detect whether it's been infected, and gives sample code in Think C. Self-diagnosis is not the whole solution, but it can be part of it -- it helps alert users early so they can start repairing their disks as soon as possible.
Many thanks to Andrew Levin and virus maven John Norstad for their comments on this project.
Checksums
All known Macintosh viruses work by adding new CODE resources or expanding existing CODE resources. (Of course, there might be other methods of infection, but why think up new methods to encourage the turkeys who write viruses?) At first, I thought the obvious method to detect changed code was with a checksum: just add up all the bytes in all the code resources. There are a couple of problems with this.
First, your code is going to look something like:
/* 1 */
#define EXPECTED_SUM 12345
short actualSum;
actualSum = sum_of_CODE ();
if (actualSum != EXPECTED_SUM)
virus_alert (actualSum);
During development, you don't know the sum. When you run the application, it'll compute the actual checksum and display it. So you change the #define and recompile the code. But now the code is different (because the constant is different), and sums to a new value. There might be no end to this cycle, because the checksum includes its own value. There are ways around this moving-target problem, but it's still a hassle.
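One possible way around the moving target, offered only as a sketch: keep the expected sum in a slot which the summing loop skips, so that changing the stored value doesn't change the computed sum. The function name sum_skipping and the slot offsets are hypothetical, and a plain byte buffer stands in for the CODE resources.

```c
#include <stddef.h>

/* Sum every byte of buf except the slotLen bytes starting at slotOff,
   which is where the expected sum itself is assumed to be stored. */
unsigned short sum_skipping (const unsigned char *buf, size_t len,
                             size_t slotOff, size_t slotLen)
{
    unsigned short sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
    {
        if (i >= slotOff && i < slotOff + slotLen)
            continue;               /* skip the stored expected value */
        sum = (unsigned short) (sum + buf [i]);
    }
    return sum;
}
```

With this arrangement you can patch the stored value as often as you like without starting the cycle again, but you still have to locate the slot in the built application, which is part of the hassle.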
Second, let's assume that the detection technique described here becomes standard. A widely-used technique is likely to become the victim of viruses designed specifically to thwart it. A clever virus could attach CODE which is checksum-neutral -- the additional bytes sum to zero. No matter what the checksum algorithm is, you can extend the summed data with virus code and then append constants to keep the checksum the same. Sure, you can make the checksum method more and more convoluted, but this only encourages virus authors and wastes everyone's time.
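To make the checksum-neutral idea concrete, here is a small sketch using a deliberately simple checksum (the sum of all bytes, modulo 256). This algorithm is an assumption chosen for illustration, not the one any real tool uses, and byte_sum and neutral_pad are hypothetical names.

```c
#include <stddef.h>

/* A deliberately simple checksum: the sum of all bytes, modulo 256. */
unsigned char byte_sum (const unsigned char *buf, size_t len)
{
    unsigned char sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum = (unsigned char) (sum + buf [i]);
    return sum;
}

/* Return the one padding byte which, appended after the added bytes,
   brings the modulo-256 sum back to what it was before they were added. */
unsigned char neutral_pad (const unsigned char *added, size_t addedLen)
{
    return (unsigned char) (0u - byte_sum (added, addedLen));
}
```

For example, if the added bytes are 200 and 100, they sum to 44 modulo 256, so a padding byte of 212 cancels them and the checksum of the whole is unchanged. Note that the data still had to grow to make this work.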
Checksums and related check-methods are intended to catch unintentional changes made to data, and you can make the chance of undetected errors vanishingly small. But I don't think they're always appropriate against malicious changes.
Length sums
Instead, you can make life much harder for the virus by taking advantage of the fact that it must somehow add code to an application. It's hard for virus code to replace the existing code in an application without crippling it. So what if you sum the total length of the CODE resources? This detects the changes wrought by every known Macintosh virus, and doesn't suffer from the self-reference problem (where changing the expected checksum also changes the real checksum).
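The idea can be sketched in portable C, away from the Toolbox. Here an array of sizes stands in for the length of each CODE resource, and the function name lengths_look_wrong is hypothetical; the listings at the end of the article show the actual resource calls.

```c
/* Return 1 (apparent infection) if the number of resources or their
   total length differs from what was recorded at build time. */
int lengths_look_wrong (const long *sizes, int count,
                        int expectedCount, long expectedLen)
{
    long total = 0;
    int i;

    for (i = 0; i < count; i++)
        total += sizes [i];

    return (count != expectedCount) || (total != expectedLen);
}
```

Both an enlarged resource and an extra resource trip the check, and changing the expected values changes only constants, not the lengths being summed (as long as the constants compile to the same size).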
Of course, you can sum other types of resources besides CODE. You probably don't want to checksum things like MENU or ICON, since that would prevent users from legitimately changing them. But if you store executable code in a custom control -- a CDEF resource -- for instance, you might want to check for tampering with those resources.
Using the diagnosis routines
The diagnostic code (see the source listing for virusCheck.c and also details in virusCheck.h) is fairly simple, but there are some tricks to calling it.
Probably the ugliest problem shows up under Think C, or any other development system which runs applications inside it. If you run an application inside Think C, and you use a resource file (named project.rsrc), the current resource file will be this file -- not your application -- when the application starts up. So, the first trick is never to try to do diagnosis in this environment. If you use such a resource file (and it has no CODE resources of its own), a quick-and-dirty check for whether the application is standalone is to do this at startup:
/* 2 */
if (Count1Resources ('CODE') == 0)
{ /* we're under Think C */ }
else
{ /* we're standalone */ }
The demo application checks this and refuses to continue running inside Think C. You'll probably want something in your application like:
/* 3 */
if (Count1Resources ('CODE'))
{ /* check for viruses */
}
The simplest diagnosis routine is vCodeCheck (), which checks if the CODE resources in the current resource file look right -- whether there are the right number of them and they have the right total length. You pass the count and length you expect, and a flag to say whether it should report errors with a debugger. A sample invocation might be:
/* 4 */
/* Expected values: */
#define COUNT 10
#define LENGTH 10000
if (Count1Resources ('CODE'))
if (vCodeCheck (COUNT, LENGTH, 1))
Alert ( );
Of course, the count of code segments and the total length of them will be different for your application. Make up any values you like for the first time around. (In Think C, the count of code segments seems to be two greater than the number of segments visible in the project -- one for the jump table, one for initialization?) Build a standalone application and run it with a debugger installed. If the values aren't right (and they're probably not), the diagnosis routines will spit out messages in your debugger like:
Got count <right> for resource type CODE instead of <wrong>
and
Got length <right> for resource type CODE instead of <wrong>
You'll also probably get the message from your Alert call, since the function will report an infection.
Go back to the source, and change the #defines for the count and length to be the right values, from the debugger output. Then set the last argument to vCodeCheck () to be -1, to prevent debugger messages. Lastly, build your application and run it again. You shouldn't see any message from the debugger or the Alert call.
[Why -1 and 1 for the flag instead of true and false? Because the code which Think C emits to pass true and false differs in length, so turning off debugger output would change the length of the CODE. Most compilers will pass 1 and -1 with code of the same length. Sorry.]
You should do this every time you're ready to distribute a standalone version of the application. If you build new versions often, you might want to conditionally compile out the code to save a lot of time, then put it back before shipping.
Be sure to test that it diagnoses correctly. Make a couple of working copies of it; use ResEdit to add a CODE resource into one and type a few extra bytes onto the end of a CODE resource in another. Launch both and make sure they diagnose themselves. Note that some development environments (like Think C) will lock and/or protect resources, and you may need to undo this protection before altering them with ResEdit.
Lastly, a slightly more general form of checking for any kind of resource can be had with the vResCheck () function. This is identical to vCodeCheck (), except that you pass the type of resource as a first argument. You may want to use this for resources of type CDEF, LDEF, WDEF, or in general anything which contains executable code.
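If you check several resource types, a table of expected values keeps the calls in one place. The sketch below is portable C rather than Toolbox code. CheckRow, CheckFn, check_all, and stub_check are hypothetical names, the type is passed as a C string rather than a ResType, and stub_check is only a placeholder (it flags WDEF and nothing else) so the example can run without a Mac. In a real application the checker would be a thin wrapper around vResCheck ().

```c
#include <stddef.h>
#include <string.h>

/* One row per resource type the application wants to verify. */
typedef struct
{
    const char *type;       /* four-character type, e.g. "CDEF" */
    short expectedCount;    /* calibrated count for this type */
    long expectedLen;       /* calibrated total length for this type */
} CheckRow;

/* Same shape as vResCheck (): returns nonzero on apparent tampering. */
typedef int (*CheckFn) (const char *type, short expectedCount,
                        long expectedLen, short report);

/* Placeholder checker for illustration only: pretends WDEF was altered. */
int stub_check (const char *type, short expectedCount,
                long expectedLen, short report)
{
    (void) expectedCount;
    (void) expectedLen;
    (void) report;
    return strcmp (type, "WDEF") == 0;
}

/* Run the checker over every row; return how many types failed. */
int check_all (const CheckRow *rows, size_t nRows, CheckFn check,
               short report)
{
    int failures = 0;
    size_t i;

    for (i = 0; i < nRows; i++)
        if (check (rows [i].type, rows [i].expectedCount,
                   rows [i].expectedLen, report))
            failures++;
    return failures;
}
```

Each row still has to be calibrated separately, in the same way as the CODE values.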
Notes on the code
The file virusCheck.h contains important notes on using this technique, and you should at least skim it. A couple of especially important notes:
It suggests a few ways to make non-functional changes to the code, so that viruses won't evolve which recognize and disable the code.
You must verify that the System is 3.2 or newer, or that the ROM is Mac Plus or newer.
You can call the check function from more than one place at more than one time.
There's very little interesting in virusCheck.c -- just a straightforward walk through the resources in the application's resource file. Note that in case of errors, the function chooses to be conservative and reports this as an infection.
One interesting problem which I haven't resolved is whether all the GetResource calls will have any significant effect. The function sets ResLoad to false, so no resources are actually loaded. But some proofreaders have pointed out that extra master pointers are allocated. For CODE resources, this is no big deal -- your application's main loop will probably unload them anyway. But I'd be interested in comments on what to do here to minimize side-effects.
This is pretty generic Mac code, and it should port easily to any other C system, and reasonably to any language which can access the Toolbox. Probably the thing you'll have to watch out for the most is how constants are compiled. For instance, as pointed out above, the Think C version of this can't use true and false because they compile to different lengths.
Notes on the sample application
The demonstration program is pretty simple. It sets the count and length of CODE resources with #defines, and calls vCodeCheck (). (First, it asks if you want errors reported via the debugger, so you can calibrate the #defined values for your version of the C compiler and libraries.) Then it calls the function two more times to make sure it detects an incorrect count and length of CODE. That's all!
It doesn't demonstrate checking of types of resources other than CODE, but that's pretty straightforward.
Other methods
You can do things to protect your application at the time you build it, as well as at the time it runs.
John Norstad, principal author of Disinfectant, suggests that you mark resources as protected and/or locked -- Think C already does this for you.
In Disinfectant, John wants to discourage any changes at all, so he:
marks the application's resource map read-only
checksums all resources at startup
(this means you can't modify menus, etc.)
marks the application shared and locked
Of course, the usual defenses are still important -- keep several up-to-date virus-diagnosis applications on a locked floppy disk and inspect all your disks regularly. If you're producing a commercial product, do everything you can think of -- twice -- to check the master disk of an application you are about to ship.
You may also want to manually inspect your application with ResEdit, to help spot new kinds of viruses.
Breaking the news
So, your application thinks it's infected. Should you refuse to run? I think not -- what if some weird INIT or some change in system software is tripping you up? You might not really be infected. And the user may desperately need to use your application. (AppleLink refuses to run, but that's because it might transmit an infection.) You might choose to offer the user the option of continuing or quitting, but in a recent application, I stuck with this terse explanation:
It's probably not worth suggesting an application to repair the infection, since there's no easy way to know if the virus is too recent.
Above all, don't be flippant -- some users have no idea what a virus is or what it means. Make it clear that they should get help from an expert. If your application includes a manual, you might want to devote a little space to recommending a user group or two for help. Also, to be on the safe side, if your manual describes the self-diagnosis feature, remind the user that self-diagnosis shouldn't be relied upon as a substitute for more general diagnosis applications, which are regularly updated.
The last word
In the war on biological viruses, new antibiotics eventually bring about the evolution of resistant strains. The same is true of software defenses: copy-protection, system security, and virus diagnosis/repair all are made obsolete, though by intentional evolution instead of haphazard selection.
Apple's Developer Technical Support even declines to offer much in the way of specific help. A note I got from them in October 1989 said, "Supporting an anti-Virus procedure in an application is something that MacDTS will not be able to support. It is similar to attempting to support copy protection: it will be a never ending battle."
I agree that an officially-supported diagnosis or protection method is a bad idea, but not for the same reason. A never-ending battle per se isn't a problem (does Apple quit competing with IBM just because there's no end in sight?). But any standard solution invites attacks targeted specifically for its methods. While I hope the virusCheck functions see wide use, I hope that people will modify them to prevent viruses from recognizing them -- and improve them to detect future strains.
Evolving new diagnosis and protection techniques, remember, is only part of the picture. User education, careful administration of shared machines (such as in academic computing centers), and active efforts to find virus authors are needed. (But see Jim Matthews' sobering letter in the September '89 Communications of the ACM for thoughts on overreacting to digital diseases.)
It's sad that some developers have to spend their time producing high-quality defenses against viruses, but sadder still that equally-talented developers waste everyone's time because they don't have the maturity to put their abilities to good use. As Spock said to Trelane in the Star Trek episode "The Squire of Gothos," "I object to you. I object to intellect without discipline. I object to power without constructive purpose." Let's hope MacTutor doesn't need many follow-ups to this article.
Listing: virusCheck.h
/* virusCheck.h -- Functions for self-diagnosis of virus infections.
Copyright © 1989 by Michael S. Morton.
Special thanks to John Norstad and Andrew Levin for advice. You may
copy, alter, use, and distribute these routines if you leave this file
unchanged up to this line.
Think C version.
Notes:
----
You are STRONGLY urged to make non-functional changes to both C functions,
to discourage the invention of viruses which recognize this code and
disable it. Specifically:
- all parameters and local variables are now declared "register"; delete
the "register" keyword for randomly-chosen variables
- declare your own variables and pepper the code with assignments involving
them -- "a = b+c/d*e+f". (Be sure to avoid division by 0.)
- reorder pairs of lines which are preceded by this comment: "You can
swap the order of the next two lines"
- test your application (see below) after all these changes
- remember that you can call this function from more than one place
in your application
To calibrate your application:
- set the application's calls to vResCheck or vCodeCheck to pass 1
for the "report" parameter
- build a standalone, double-clickable application
- make sure that you have a debugger installed which can intercept calls
to DebugStr () -- MacsBug or TMON will do
- run the application; if you get messages of the form:
"Got count CC for resource type <type>, instead of <expected>"
"Got length LL for resource type <type>, instead of <expected>"
then change the arguments in your calls to vCodeCheck() and vResCheck()
to pass CC for count and LL for length
To test your application's virus-detection:
- calibrate it as above
- change the application's calls to pass -1 for reporting
- build a standalone, double-clickable application
- use ResEdit to add a CODE resource from anywhere to your application
- launch the application and make sure it detects and reports infection
- delete the added CODE resource or build the application again
Both of these C functions require EITHER a Mac Plus or 512KE or later,
OR System file 3.2 or later (for the "one deep" resource calls).
The application must check this before calling these.
N.B.: I'm not 100% sure that System 3.2 will work on 128K/512K ROM;
please try it if you expect your application to work on this configuration.
The "report" parameter takes 1 and -1, not 1 and 0, because many compilers
will compile a parameter 0 in less space than a parameter 1.
If these functions encounter an unexpected error, they act conservatively
and assume there's an infection.
If you're working under Think C, the checksum will be different depending
on whether your application is running as a project or as a standalone
application.
You may want to use this technique (invented by David Oster, I think)
which tests whether the Think C environment is present. It relies
on the fact that your project's resource file is the current
resource file when you start up, and it contains no CODE resources.
if (Count1Resources ('CODE'))-- are we standalone or project?
if (vCodeCheck ( )) -- standalone: do the check
{ } -- check failed: report virus
For this to work, you must have a resource file "project.rsrc".
In certain obscure cases, you may find that changing the
arguments changes the code length. For example, changing:
vCodeCheck (3, 0L, 1);
to
vCodeCheck (3, 12345L, 1);
will do this. If this is a problem, move the constant out of the code
with something like:
static long expected = 12345L;
vCodeCheck (3, expected, 1);*/
#ifndef _virusCheck_ /* already seen this */
#define _virusCheck_ /* yes: don't define it again */
/* vCodeCheck -- Check for apparent alteration of CODE resources. Return
TRUE if the count/length do NOT match, meaning an apparent infection.*/
extern Boolean vCodeCheck (
short expectedCount,/* expected number of CODEs */
long expectedLen, /* expected total size of CODEs */
short report); /* >0 => report errors to developer */
/* vResCheck -- Check for apparent alteration of resources. Return TRUE
if the count/length do NOT match, meaning an apparent infection.*/
extern Boolean vResCheck (
ResType type, /* type of resource to sum */
short expectedCount,/* expected number of resources */
long expectedLen, /* expected total size of resources */
short report); /* >0 => report errors to developer */
#endif /* _virusCheck_ */
Listing: virusCheck.c
/* virusCheck.c -- Functions for self-diagnosis of virus infections.
Copyright © 1989 by Michael S. Morton.
You may copy, alter, use, and distribute these routines if you leave
this file unchanged up to this line.
See notes in the .h file.
Think C 3.0 version.*/
/* History:
26-Nov-89 -- MM --No longer needs strings library.
Various small documentation changes.
6-Nov-89 -- MM --First version.
Enhancements needed:
Should we do a ReleaseResource for some resources, to ditch the master
pointer which gets allocated only because we asked for it? How can we
know when to do this? For CODE resources, it's not a big problem, since
there aren't vast numbers of them.
Consider checking if the ROM/System file is recent enough for us */
#include "virusCheck.h" /* get our own prototypes */
/* Local prototypes: */
static void fail (char *kind, ResType type, long actual, long expected);
static void append (char **pPtr, char *s);
/* vResCheck -- Check that the resources of a specified type in the application
haven't been altered. Return TRUE if there's apparent tampering. */
extern Boolean vResCheck (type, expectedCount, expectedLen, report)
register ResType type; /* INPUT: type of resource to sum */
register short expectedCount;
/* INPUT: expected number of resources */
register long expectedLen;
/* INPUT: exp. total len of resources */
register short report;
/* INPUT: >0 => report errors w/debugger */
{register short actCount;
/* actual count of rsrcs of this type */
register long actLen;/* actual total length of resources */
register Handle rsrc; /* resource to check */
register Boolean failFlag = false;
/* any problems encountered? */
register short oldResFile;
/* for preserving current resource file */
register Boolean oldResLoad;
/* for preserving ResLoad flag */
/*Switch to the application's resource file. Note that all resource
calls from here on are the "one deep" calls from Inside Mac, vol. IV.
*/
/*You can swap the order of the next two lines: */
oldResFile = CurResFile ();
/* remember initial resource file */
oldResLoad = ResLoad; /* remember ResLoad state */
/*You can swap the order of the next two lines: */
UseResFile (CurApRefNum);/* search application for resources */
SetResLoad (false); /* don't load resources right away */
/*You can swap the order of the next two lines: */
actLen = 0;/* initialize length */
actCount = Count1Resources (type);/* how many of this type are there?
*/
if (actCount != expectedCount)
{ if (report > 0) /* is the developer listening? */
fail ("count", type, actCount, expectedCount);
failFlag = true;/* TAMPERING DETECTED */
} /* end of mismatched resource count */
while (actCount)/* loop actCount down to 1 */
{ /* Get the resource's handle, but don't load it. */
rsrc = Get1IndResource (type, actCount);
/* see if it's already in memory */
if (! rsrc)/* not available? */
{ if (report > 0) /* is the developer listening? */
DebugStr ("\pResource not available!");
failFlag = true;
/* error detected; ASSUME TAMPERING */
goto EXIT; /* sorry, Dr. Dijkstra */
}
/*You can swap the order of the next two lines: */
actLen += SizeResource (rsrc); /* sum up length of rsrcs of this type
*/
--actCount;/* get next index number */
} /* end of loop through resources */
if (actLen != expectedLen)
{ if (report > 0) /* is the developer listening? */
fail ("length", type, actLen, expectedLen);
failFlag = true; /* TAMPERING DETECTED */
}
EXIT: /* goto here on tampering or error */
/*You can swap the order of the next two lines: */
UseResFile (oldResFile); /* restore original resource file */
SetResLoad (oldResLoad); /* restore original loading state */
return failFlag;/* TRUE => error or tampering */
}/* end of vResCheck () */
/* vCodeCheck -- Check CODE resources haven't been altered.*/
extern Boolean vCodeCheck (expectedCount, expectedLen, report)
register short expectedCount; /* expected number of CODEs */
register long expectedLen;/* expected total size of CODEs */
register short report;
/* INPUT: >0 => report errors w/debugger */
{
return vResCheck ('CODE', expectedCount, expectedLen, report);
}/* end of vCodeCheck () */
/* fail -- dump a string like: "Got <kind> <actual> for resource type
<type>, instead of <expected>" */
static void fail (kind, type, actual, expected)
char *kind; /* INPUT: "count" or "length" */
ResType type; /* INPUT: resource type which failed */
long actual, expected; /* INPUT: counts or lengths */
{char buffer [100]; /* for accumulating output message */
char *bufp; /* pointer into buffer[] */
Str255 actualText, expectedText;
/* formatted from params */
union /* to get ResType to be like string */
{ char resName [5];
ResType theType;
} u;
NumToString ((long) actual, & actualText);
PtoCstr ((char *) & actualText);
NumToString ((long) expected, & expectedText);
PtoCstr ((char *) & expectedText);
u.theType = type; /* set up resource type */
u.resName [4] = '\0'; /* to be a NUL-ended C string */
bufp = buffer; /* point to output buffer */
append (& bufp, "Got ");
append (& bufp, kind);
append (& bufp, " ");
append (& bufp, (char *) & actualText);
append (& bufp, " for resource type ");
append (& bufp, u.resName);
append (& bufp, " instead of ");
append (& bufp, (char *) & expectedText);
*bufp++ = '\0';
CtoPstr (buffer);
DebugStr (buffer);
}/* end of fail () */
/* append -- Append a string to an output buffer. This routine lets
us avoid pulling in the strings library. */
static void append (pPtr, s)
char **pPtr; /* UPDATE: VAR ptr to output */
register char *s; /* INPUT: string to append */
{register char *p; /* output ptr */
register char c;
p = *pPtr; /* pick up output pointer */
while (c = *s++)/* loop through all non-nulls */
*p++ = c;/* storing them in buffer */
*pPtr = p; /* return updated output pointer */
}/* end of append () */
Listing: virusDemo.c
/* virusDemo.c -- Demonstration of virus self-diagnosis.
Copyright © 1989 by Michael S. Morton. */
/* History: 26-Nov-89 -- MM --First version.*/
#include "virusCheck.h" /* get our own prototypes */
/* Local and C library prototypes: */
void main (void);
int printf (char *formatn, ...);
int getche (void);
/* NOTE: The total length of the CODE resources will vary with your development
system, so you'll have to calibrate this.*/
#define CODELENGTH 15948L
/* actual length of CODE resources */
/* NOTE: The number of CODE resources will vary with how you arrange
your project. This count assumes that all source files and libraries
are grouped into a single segment. The count is higher because of the
resources which Think C adds.
*/
#define CODECOUNT (1+2) /* actual count of CODE resources */
#define WRONGLENGTH (CODELENGTH+1) /* guaranteed to fail */
#define WRONGCOUNT (CODECOUNT+1) /* ditto */
void main ()
{char dbgResponse; /* user response for debugger query */
short debugger; /* debugger installed? 1=y, -1=n */
if (! Count1Resources ('CODE'))
{ SysBeep (5);
printf ("This demo must be run standalone, not within Think C\n");
printf ("Press any key to exit ");
getche ();
ExitToShell ();
}
printf ("Report unexpected errors with the debugger? ");
dbgResponse = getche ();
printf ("\n");
if ((dbgResponse == 'y') || (dbgResponse == 'Y'))
debugger = 1;
else debugger = -1;
printf ("\nChecking -- shouldn't fail ");
if (vCodeCheck (CODECOUNT, CODELENGTH, debugger))
printf ("FAILED! *** This application is apparently infected. ***\n");
else printf ("didn't fail.\n");
printf ("\nChecking -- should fail on count ");
if (vCodeCheck (WRONGCOUNT, CODELENGTH, -1))
printf ("failed.\n");
else printf ("DIDN'T FAIL!\n");
printf ("\nChecking -- should fail on length ");
if (vCodeCheck (CODECOUNT, WRONGLENGTH, -1))
printf ("failed.\n");
else printf ("DIDN'T FAIL!\n");
printf ("\nPress any key to exit ");
getche ();
}/* end of main () */