Jan 00 Getting Started
Volume Number: 16 (2000)
Issue Number: 1
Column Tag: Getting Started
Getting Started
by Dan Parks Sydow
An introduction to including speech in a Macintosh program
In February and March of last year, Getting Started discussed how a Macintosh program plays pre-recorded digitized sounds. In April's article we looked at how a program can record and later play back sounds - including speech. But there's another way - an easier and disk space-saving way - to give your Mac application speech capabilities. This month we'll look at the Speech Manager and how its functions allow your program to easily generate spoken words. In the example program you'll see that the words that are to be spoken can be supplied in a variety of ways: either by hard-coding them in your program, including them in strings in a resource, or by allowing the user to enter them.
Speech Basics
A program that includes the ability to generate synthesized speech - speech that results from the conversion of text to spoken sound - uses the Speech Manager. The Speech Manager accomplishes this with the assistance of a speech synthesizer. The Speech Manager passes text to a synthesizer. It is the synthesizer's built-in dictionaries and sets of pronunciation rules that enable text to be processed and turned into recognizable speech. After the synthesizer does its job, it passes the converted data to the Sound Manager for output to the Mac's audio hardware.
The Speech Manager, speech synthesizers, and the Sound Manager are all system software. As is often the case in Mac programming, to achieve your results you won't have to know all the details of the system software that does the dirty work. Instead, you'll become familiar with the Toolbox functions that serve as your interface between your programming ideas and low-level code that carries out your wishes.
Initializations
Most programs don't make use of speech, so the function prototypes for the speech-related Toolbox functions may not automatically be included in your CodeWarrior project. To ensure that they are, include the Speech.h universal header file near the top of your source code:
#include <Speech.h>
If your program is to make use of a speech synthesizer, it needs to verify that the host computer supports speech. To do that, begin with a call to Gestalt():
OSErr err;
long response;
long mask;
err = Gestalt( gestaltSpeechAttr, &response );
if ( err != noErr )
    DoError( "\pError calling Gestalt" );
If Gestalt() does its work without fail, it puts a value of noErr in the OSErr variable err. Then it's time to examine the value returned in response:
mask = 1 << gestaltSpeechMgrPresent;
if ( ( response & mask ) == 0 )
    DoError( "\pSpeech Manager not present " );
The gestaltSpeechAttr selector code results in Gestalt() returning more than one piece of information in response. Here we're only interested in whether the system the program is running on has the Speech Manager present. That information is held in one bit in the response parameter - the bit defined by the Apple-defined constant gestaltSpeechMgrPresent. To determine if one particular bit in a variable is set (on), a bitwise AND operation is necessary. In the previous snippet we're looking to see if the bit number defined by the constant gestaltSpeechMgrPresent is set in the response value returned by Gestalt(). Some bit-shifting sets up mask so that it can be used in the test of response. If the AND operation results in a non-zero value, then the test passes and we know the user's machine supports speech. If the operation results in a value of 0, DoError() is invoked to post an error message and to exit.
Getting used to bit shifting takes some people a little time and practice. If the above explanation isn't clear, read this extra explanation.
A function may return a variable that embeds several pieces of information in that one variable. It does this by having different bits in the variable represent different things. That is, several or all of the individual bits in the byte or bytes of the variable tell the program whether certain things are true or not. A variable of this type may consist of any number of bits. If such a variable is declared to be of type short, it is 16 bits, so it can hold up to 16 pieces of information (often referred to as flags). If the variable is declared to be of type long, it is 32 bits, so it can hold up to 32 flags. To keep things simple, consider just eight bits of such a variable, and suppose they look like this:
00100011
Bit numbering is from right to left, with the first bit considered bit number 0. In this example bit 0 has a value of 1, bit 1 has a value of 1, bit 2 has a value of 0, and so forth. If we want to know the value held in bit 5, we'd need to look at the sixth bit from the right. Starting with the rightmost bit and counting from zero we see that bit 5, the sixth bit, has a value of 1. You can easily see this, but the program needs to use a mask and the bitwise AND operation to determine this. By placing a value of 1 in bit 5 (the sixth bit) of a mask variable, and then ANDing that mask with a variable that holds a number of flags, the value of only the sixth bit of the variable is revealed. Here the above variable is AND'ed with a mask that has just the sixth bit set:
  00100011 (variable value)
& 00100000 (mask value)
  --------
  00100000 (result of AND)
A bitwise AND operation looks at the corresponding bits of two operands and returns a value of 1 for that bit position if and only if both bits have a value of 1. In the above example, only bit 5, the sixth bit from the right, has a value of 1 in both operands. That's shown in the result, which has only a single value of 1. To make use of our result we simply check to see whether the overall result is 0. If the bit of interest was 0, then the result of the AND operation would be 0. If the bit of interest was 1, then the result of the AND operation would be non-zero (the exact value of the result would depend on which bit position held the value of 1, but in any case the result would not be 0). Refer back to the speech check to see that we don't care about the exact value of the result of the AND operation - we only care whether the value of the result is or isn't 0.
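The single-bit test described above can be sketched in plain C, independent of the Toolbox. BitIsSet() is a hypothetical helper name introduced only for this illustration; it isn't part of the SpeechIntro listing or of any Apple header.

```c
/* Hypothetical helper: reports whether one bit of a flags value is set. */
static int BitIsSet( unsigned long flags, int bitNumber )
{
    unsigned long mask = 1UL << bitNumber;  /* a single 1 in the bit of interest */

    return ( flags & mask ) != 0;           /* non-zero AND result: bit is set */
}
```

Using the eight-bit example value 00100011 (0x23 in hexadecimal), BitIsSet( 0x23, 5 ) yields 1 and BitIsSet( 0x23, 2 ) yields 0.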
Speaking a String
The SpeakString() function is a Toolbox routine that makes speech easy to accomplish. Pass this function a Pascal-formatted string and SpeakString() sees to it that the text that makes up that string turns into speech that emits from the user's Mac. Here's an example:
OSErr err;
err = SpeakString( "\pThis is a test." );
The string that's passed to SpeakString() can also be in the form of a string variable or constant, as in:
Str255 theString = "\pThis is another test.";
err = SpeakString( theString );
It's good programming practice to check the OSErr value that SpeakString() returns. Your program can handle an error as it sees fit - I'll handle such an error by passing a descriptive message to my own DoError() error-handling function:
if ( err != noErr )
    DoError( "\pError attempting to speak a phrase" );
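If the idea of a Pascal-formatted string is new to you, the following sketch may help. On the Mac the "\p" prefix asks the compiler to store the string with a leading length byte rather than a trailing zero byte. The code below builds the same layout by hand in standard C. ExampleStr255 and CToPascalString() are hypothetical names used only for this illustration (the real Str255 type comes from the universal headers).

```c
#include <string.h>

/* Hypothetical stand-in for the Toolbox Str255 type: one length byte
   followed by up to 255 characters. */
typedef unsigned char ExampleStr255[256];

/* Copies a C string into Pascal format, truncating at 255 characters. */
static void CToPascalString( const char *source, ExampleStr255 dest )
{
    size_t length = strlen( source );

    if ( length > 255 )
        length = 255;                     /* the length byte can't exceed 255 */

    dest[0] = (unsigned char)length;      /* byte 0 holds the character count */
    memcpy( &dest[1], source, length );   /* characters follow, no terminator */
}
```

For the phrase "This is a test." the length byte ends up holding 15, and the 15 characters follow it directly.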
The Speech Manager produces asynchronous speech. Asynchronous speech means that before a call to SpeakString() completes, control is returned to your program. That means that the code following a call to SpeakString() could very well start executing before a spoken phrase is completed. That might be okay, but it could also produce unintended results. For instance, if your program speaks two phrases in a row, you may not hear both phrases in full. To avoid this situation, force your program to generate synchronous speech. That is, have your program speak a string only after a previous string has been completely spoken. To accomplish this, have your program enter a "do-nothing" loop after speech starts. You can easily determine when this loop should terminate by using the Speech Manager function SpeechBusy():
while ( SpeechBusy() == true )
    ;
Because SpeakString() generates speech asynchronously, the while loop begins executing almost immediately after SpeakString() begins executing. SpeechBusy() returns the number of active speech channels. A speech channel is a data structure that keeps track of traits of speech, such as the voice to be used for speaking. When SpeakString() executes, it opens a speech channel. When SpeakString() completes, it closes that channel. If SpeechBusy() is called while SpeakString() is speaking text, SpeechBusy() returns a value of 1 (or more than 1 if other speech channels happen to be active at the same time). When SpeakString() completes, a call to SpeechBusy() returns a value of 0. Knowing this, you can opt to replace the above snippet with the following code. My preference is to use true for clarity, but keep in mind that a comparison to true matches only a return value of exactly 1, so the version below is the safer choice if other speech channels might be active at the same time:
while ( SpeechBusy() > 0 )
    ;
While SpeakString() is executing, SpeechBusy() keeps returning a value of 1 and the while loop keeps cycling. Only when speech completes will SpeechBusy() return a value of 0, and only then will the while loop terminate to let the program go on its way.
Let's put the above code snippets together to see just how a complete string of text is spoken:
OSErr err;

err = SpeakString( "\pThis is a test." );
if ( err != noErr )
    DoError( "\pError attempting to speak a phrase" );

while ( SpeechBusy() == true )
    ;
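To see how the two forms of the wait loop differ, here is a small simulation in standard C. It doesn't call the Speech Manager at all: SimulatedSpeechBusy() is a hypothetical stand-in that returns a count of "active channels" and lets one channel finish on each call. Treat it as a sketch of the polling logic only, not as a model of how the Speech Manager schedules speech.

```c
/* Simulated count of active speech channels. In a real program this
   information comes from the Speech Manager, not from a variable. */
static short gSimulatedChannels = 0;

/* Hypothetical stand-in for SpeechBusy(): reports the current count and
   then lets one channel "finish", so each poll moves time forward. */
static short SimulatedSpeechBusy( void )
{
    short active = gSimulatedChannels;

    if ( gSimulatedChannels > 0 )
        gSimulatedChannels--;
    return active;
}

/* Polls until no channels remain; returns how many polls saw activity. */
static int WaitUntilCountIsZero( short startingChannels )
{
    int polls = 0;

    gSimulatedChannels = startingChannels;
    while ( SimulatedSpeechBusy() > 0 )
        polls++;
    return polls;
}

/* Same loop, but comparing against 1, which is what "== true" does. */
static int WaitWhileCountIsOne( short startingChannels )
{
    int polls = 0;

    gSimulatedChannels = startingChannels;
    while ( SimulatedSpeechBusy() == 1 )
        polls++;
    return polls;
}
```

With one simulated channel both loops wait for a single poll. With two channels, WaitUntilCountIsZero( 2 ) waits through two polls, while WaitWhileCountIsOne( 2 ) falls through immediately because the first count it sees is 2 rather than 1.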
SpeechIntro
This month's program is SpeechIntro. As has become our habit, the example program doesn't make use of menus. Instead, the focus is on the speech-related code rather than menu and event code. SpeechIntro demonstrates how a program can generate speech from hard-coded text, user-entered text, and resource-supplied text. When run, SpeechIntro starts by speaking two phrases. Just after launching SpeechIntro you'll hear your Mac say "This is the first phrase." A moment later your computer says "And this is the second phrase." After that a dialog box like the one shown in Figure 1 appears. Here you supply the text to speak. Enter a few words, then click the Speak button to hear those words. When finished, click the Done button to dismiss the dialog box and to bring on a second dialog box. This one (shown in Figure 2) lets you hear speech that originates as string resources. After clicking the Speak Short String and Speak Long String buttons, click the Quit button to end the program.
Figure 1. Entering a phrase for SpeechIntro to speak.
Figure 2. Listening to speech generated from resources.
Creating the SpeechIntro Resources
Begin by creating a new folder named SpeechIntro in your CodeWarrior development folder. Start up ResEdit and then create a new resource file named SpeechIntro.rsrc. Specify that the SpeechIntro folder serve as the resource file's destination. This resource file will hold resources of the types shown in Figure 3.
Figure 3. The SpeechIntro resources.
The resource file will hold one alert resource - ALRT 128. Corresponding to this ALRT is DITL 128. Together these two resources define the program's error-handling alert. The resource file also holds two dialog resources - DLOG 129 and 130 - and a DITL resource for each DLOG (DITL 129 and 130). Figure 4 shows DLOG 129 - the DLOG resource that's used to define the look of the dialog box shown back in Figure 1. Note that the type of dialog box, and its exact size and screen placement, aren't relevant. Figure 5 shows the corresponding DITL resource.
Figure 4. The DLOG for the text input dialog box.
Figure 5. The DITL for the text input dialog box.
DLOG 130 defines the look of the program's second dialog box - the one shown back in Figure 2. Again, the type, size, and placement of the dialog box resulting from DLOG 130 aren't critical. Figure 6 shows the DITL that corresponds to DLOG 130.
Figure 6. The DITL for the resource-supplied text dialog box.
Clicking on either the Speak Short String or Speak Long String in the dialog box shown in Figure 2 results in the program reading a string from one string list resource - STR# 128. Figure 7 shows the two strings that reside in this one resource.
Figure 7. The STR# used by the resource-supplied text dialog box.
Creating the SpeechIntro Project
Create a new project by launching CodeWarrior and then choosing New Project from the File menu. Use the MacOS:C_C++:MacOS Toolbox:MacOS Toolbox Multi-Target project stationery for the new project. Uncheck the Create Folder check box before clicking the OK button. Now name the project SpeechIntro.mcp, and make sure the project's destination is the SpeechIntro folder.
Next, add the SpeechIntro.rsrc resource file to the project. Remove the SillyBalls.rsrc file. Feel free to remove the ANSI Libraries folder from the project window if you want, as this project doesn't use any of these libraries.
If you plan on making a PowerPC version (or fat version) of the SpeechIntro program, make sure to add the SpeechLib library to the PowerPC targets of your project. Choose Add Files from the Project menu and maneuver on over to this library. You'll find it in the Metrowerks CodeWarrior:MacOS Support:Libraries:MacOS Common folder. If you don't find the library in that folder, use Sherlock to search your hard drive for it. When you add the library to the project CodeWarrior displays a dialog box asking you which targets to add the library to. Check the two PPC targets.
Now create a new source code window by choosing New from the File menu. Save the window, giving it the name SpeechIntro.c. Choose Add Window from the Project menu to add the new empty file to the project. Remove the SillyBalls.c placeholder file from the project window. Now you're all set to type in the source code.
If you want to save yourself some work, connect to the Internet and head over to MacTech's ftp site at ftp://ftp.mactech.com/src/mactech/volume16_2000/16.01.sit. There you'll find the SpeechIntro source code file available for downloading.
Walking Through the Source Code
SpeechIntro.c starts with the inclusion of the Speech.h file. If you attempt to compile the file and you receive a number of errors related to undefined functions, then you forgot to include this universal header file.
/********************** includes *********************/
#include <Speech.h>
After the #include come a number of constants. The constant kALRTResID defines the ID of the ALRT resource used to define the error-handling alert. Constant kDLOGUserResID defines the ID of the DLOG resource used for the dialog box that accepts user input. The constants kSpeakButton, kDoneButton, and kPhraseEdit are constants denoting the item numbers of the three items in the user-input dialog box. The constant kDLOGStringsResID defines the ID of the DLOG resource used for the dialog box that makes use of resource-supplied strings. The constants kSpeakShortButton, kSpeakLongButton, and kQuitButton correspond to the item numbers of the three items in that dialog box. The program's two strings are stored in a STR# resource with an ID of kStringListResID. The two strings are items number kShortStrIndex and kLongStrIndex in the string list resource.
/********************* constants *********************/
#define kALRTResID 128
#define kDLOGUserResID 129
#define kSpeakButton 1
#define kDoneButton 2
#define kPhraseEdit 3
#define kDLOGStringsResID 130
#define kSpeakShortButton 1
#define kSpeakLongButton 2
#define kQuitButton 3
#define kStringListResID 128
#define kShortStrIndex 1
#define kLongStrIndex 2
Next come the program's function prototypes.
/********************* functions *********************/
void ToolBoxInit( void );
void SpeakCodeStrings( void );
void SpeakUserInputStrings( void );
void SpeakResourceStrings( void );
void DoError( Str255 errorString );
The main() function of SpeechIntro starts with the declaration of three variables that are used in the determination of whether speech generation is possible on the host computer. After the Toolbox is initialized and the speech-related tests are made, three application-defined functions are invoked. SpeakCodeStrings() demonstrates how a program speaks text that originates as strings hard-coded into the program. SpeakUserInputStrings() provides an example of how a program accepts text from the user and then speaks that text. Finally, SpeakResourceStrings() shows how a program can load, then speak, the text from items in a string list resource.
/********************** main *************************/
void main( void )
{
    OSErr  err;
    long   response;
    long   mask;

    ToolBoxInit();

    err = Gestalt( gestaltSpeechAttr, &response );
    if ( err != noErr )
        DoError( "\pError calling Gestalt" );

    mask = 1 << gestaltSpeechMgrPresent;
    if ( ( response & mask ) == 0 )
        DoError( "\pSpeech Manager not present " );

    SpeakCodeStrings();
    SpeakUserInputStrings();
    SpeakResourceStrings();
}
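One detail of a flag test like the one in main() deserves a closer look: in C the == operator binds more tightly than &, so the AND operation needs its own parentheses. The sketch below, written in standard C with a hypothetical flag position (kExampleFlagBit stands in for a constant such as gestaltSpeechMgrPresent), contrasts the parenthesized test with what the unparenthesized form actually computes.

```c
/* Hypothetical flag position, standing in for a constant such as
   gestaltSpeechMgrPresent. */
#define kExampleFlagBit 3

/* Correct test: the AND happens first, then the comparison with 0. */
static int FlagMissing( long response )
{
    long mask = 1L << kExampleFlagBit;

    return ( response & mask ) == 0;
}

/* What "response & mask == 0" really means: the comparison mask == 0
   is evaluated first (it is 0 here), and ANDing anything with 0 gives
   0, so this version never reports a missing flag. */
static int FlagMissingUnparenthesized( long response )
{
    long mask = 1L << kExampleFlagBit;

    return (int)( response & ( mask == 0 ) );
}
```

FlagMissing() returns 1 for a response of 0 and 0 for a response with bit 3 set. FlagMissingUnparenthesized() returns 0 in both cases, which means an error branch guarded by it would never run.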
ToolBoxInit() remains the same as previous versions.
/******************** ToolBoxInit ********************/
void ToolBoxInit( void )
{
    InitGraf( &qd.thePort );
    InitFonts();
    InitWindows();
    InitMenus();
    TEInit();
    InitDialogs( nil );
    InitCursor();
}
SpeakCodeStrings() uses calls to SpeakString() to speak text, and SpeechBusy() to time the start of one speech with the end of another. The code in SpeakCodeStrings() was covered in detail earlier in this article.
/***************** SpeakCodeStrings ******************/
void SpeakCodeStrings( void )
{
    OSErr  err;

    err = SpeakString( "\pThis is the first phrase." );
    if ( err != noErr )
        DoError( "\pError attempting to speak a phrase" );

    while ( SpeechBusy() == true )
        ;

    err = SpeakString( "\pAnd this is the second phrase." );
    if ( err != noErr )
        DoError( "\pError attempting to speak a phrase" );

    while ( SpeechBusy() == true )
        ;
}
SpeakUserInputStrings() opens and controls a standard modal dialog box. If you've ever included a non-movable dialog box in any of your own programs, or if you read last month's Getting Started article on QuickTime, then most of the SpeakUserInputStrings() code should look familiar to you. The function begins with a number of local variable declarations. After that the dialog box is opened and displayed:
/*************** SpeakUserInputStrings ***************/
void SpeakUserInputStrings( void )
{
    DialogPtr  theDialog;
    Boolean    done = false;
    short      item;
    short      type;
    Handle     handle;
    Rect       rect;
    Str255     theString;
    OSErr      err;

    theDialog = GetNewDialog( kDLOGUserResID, nil, (WindowPtr)-1L );
    ShowWindow( theDialog );
    SetPort( theDialog );
Next, SpeakUserInputStrings() enters a loop that executes until the dialog box Done button is clicked. When the user clicks the Done button, the local variable done gets set to true, the loop ends, and the dialog box is dismissed. Each call to the Toolbox function ModalDialog() returns the item number of a clicked on item (if in fact any item is clicked on). This item number is used in a switch statement to determine how the mouse click is to be handled.
    while ( done == false )
    {
        ModalDialog( nil, &item );

        switch ( item )
        {
            case kSpeakButton:
                GetDialogItem( theDialog, kPhraseEdit, &type,
                               &handle, &rect );
                GetDialogItemText( handle, theString );
                err = SpeakString( theString );
                if ( err != noErr )
                    DoError( "\pError attempting to speak a phrase" );
                while ( SpeechBusy() == true )
                    ;
                break;

            case kDoneButton:
                done = true;
                break;
        }
    }

    DisposeDialog( theDialog );
}
A click on the Speak button results in a call to the Toolbox function GetDialogItem(). This function returns information about the item whose number is passed to it - here the edit text item, kPhraseEdit - including a handle to the item. We're interested in that handle, which we use in a call to GetDialogItemText(). This routine gets a copy of the text that's currently in the text item referenced by the supplied handle, and returns that text to the program in the string variable theString. The text in this string is then spoken by making a call to the very handy SpeakString() function.
A click on the Done button dismisses this dialog box. It also returns control to main(), where a call to SpeakResourceStrings() is made.
The SpeakResourceStrings() routine is set up much like the SpeakUserInputStrings() function: several local variables are declared, a dialog box is opened, and a loop is performed to check for mouse button clicks on items in the dialog box.
/**************** SpeakResourceStrings ***************/
void SpeakResourceStrings( void )
{
    DialogPtr  theDialog;
    short      item;
    Boolean    done = false;
    Str255     theString;
    OSErr      err;

    theDialog = GetNewDialog( kDLOGStringsResID, nil, (WindowPtr)-1L );
    ShowWindow( theDialog );
    SetPort( theDialog );

    while ( done == false )
    {
        ModalDialog( nil, &item );

        switch ( item )
        {
            case kSpeakShortButton:
                GetIndString( theString, kStringListResID,
                              kShortStrIndex );
                err = SpeakString( theString );
                if ( err != noErr )
                    DoError( "\pError attempting to speak a phrase" );
                while ( SpeechBusy() == true )
                    ;
                break;

            case kSpeakLongButton:
                GetIndString( theString, kStringListResID,
                              kLongStrIndex );
                err = SpeakString( theString );
                if ( err != noErr )
                    DoError( "\pError attempting to speak a phrase" );
                while ( SpeechBusy() == true )
                    ;
                break;

            case kQuitButton:
                done = true;
                break;
        }
    }

    DisposeDialog( theDialog );
}
SpeakResourceStrings() relies on the Toolbox routine GetIndString(). If the user clicks on the Speak Short String button, the kSpeakShortButton case section executes. Here a call to GetIndString() returns the string that's defined as the first item in the program's string list resource. That returned string is then passed to the SpeakString() function to be spoken. A click on the Speak Long String button results in a similar action - the difference being that the program ends up speaking the second of the two strings stored in the program's STR# resource. When the user is finished experimenting with this dialog box a click on the Quit button dismisses the dialog box and ends the program.
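To get a feel for what an indexed string lookup involves, here is a rough sketch in standard C. It assumes a string list laid out as a 2-byte count followed by Pascal strings packed one after another, and it works on an ordinary memory buffer rather than a real resource. ExampleGetIndString() is a hypothetical name for illustration; it is not the Toolbox routine, and the real GetIndString() also takes care of loading the resource for you.

```c
#include <string.h>

/* Copies the index-th (1-based) Pascal string out of a buffer laid out
   like a string list: a 2-byte count followed by packed Pascal strings.
   Leaves dest as an empty Pascal string if index is out of range.
   ExampleGetIndString is a hypothetical name, not a Toolbox routine. */
static void ExampleGetIndString( unsigned char *dest,
                                 const unsigned char *list, short index )
{
    short count = (short)( ( list[0] << 8 ) | list[1] );  /* high byte first */
    const unsigned char *p = list + 2;                    /* first string */
    short i;

    dest[0] = 0;                                          /* default: empty */
    if ( index < 1 || index > count )
        return;

    for ( i = 1; i < index; i++ )
        p += 1 + p[0];                                    /* skip length byte + text */

    memcpy( dest, p, (size_t)p[0] + 1 );                  /* copy length byte + text */
}
```

Because each string is found by stepping over the ones before it, index 1 refers to the first string and index 2 to the second, which matches the way kShortStrIndex and kLongStrIndex are defined in the listing.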
DoError() is unchanged from prior versions. A call to this function results in the posting of an alert that holds an error message. After the alert is dismissed the program ends.
/********************** DoError **********************/
void DoError( Str255 errorString )
{
    ParamText( errorString, "\p", "\p", "\p" );

    StopAlert( kALRTResID, nil );

    ExitToShell();
}
Running SpeechIntro
Run SpeechIntro by choosing Run from CodeWarrior's Project menu. After the code is compiled, CodeWarrior runs the program. After the program speaks two phrases, the dialog box shown back in Figure 1 appears. After entering some text to speak, and then clicking the Speak button, you should be satisfied that user-entered text can result in spoken words. After clicking the Done button you see the dialog box shown back in Figure 2. Click the buttons to hear the text stored in the program's resource file. When finished, click the Quit button to exit the program.
Till Next Month...
Like movie-playing, speech is a nifty technology that adds flair to your program. Speech won't always be necessary, but when planning out your next project you should closely examine the things your program is to do, and see if any parts of the program could be enhanced by the inclusion of speech.
In this article we've touched on the basics of speech. As you can imagine, there's much more to Apple's implementation of speech, including speech channels that allow for variations in speech, and voices. Next month we'll explore these topics so that your program can really take advantage of the power of the spoken word...