Mar 00 Getting Started
Volume Number: 16 (2000)
Issue Number: 3
Column Tag: Getting Started
Speech and Voices
by Dan Parks Sydow
Writing a Macintosh program that makes use of different voices
Two months ago, Getting Started introduced the topic of adding computer-generated speech to a Mac program. In that article you learned how to verify that the user's Mac is able to output speech and then use the Toolbox SpeakString() function to speak the text of one string. Last month's Getting Started article carried on with the topic of speech by demonstrating how a program can create a speech channel in order to speak more than a single string. This month we complete the trilogy of speech articles by seeing how a speech channel is used by a program that wants to alter the voice that's used in the generation of speech.
Speech Basics Review
A Mac program that is to speak should include the Speech.h universal header file so that the compiler recognizes the speech-related Toolbox functions. The program should also verify that the user's Mac is able to generate speech. After that, a string of text can be spoken by calling the Toolbox function SpeakString(), which takes a Pascal-formatted string as its only argument. Because SpeakString() returns before the speech has finished, the program then calls the Toolbox function SpeechBusy() repeatedly until the Mac has finished talking. If the following snippet doesn't look familiar, refer back to the January Getting Started article for more information.
#include <Speech.h>
OSErr err;
long response;
long mask;
err = Gestalt( gestaltSpeechAttr, &response );
if ( err != noErr )
DoError( "\pError calling Gestalt" );
mask = 1 << gestaltSpeechMgrPresent;
if ( ( response & mask ) == 0 )
DoError( "\pSpeech Manager not present " );
err = SpeakString( "\pThis is a test." );
if ( err != noErr )
DoError( "\pError attempting to speak a phrase" );
while ( SpeechBusy() == true )
;
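The mask test in this snippet hinges on C operator precedence: == binds more tightly than &, so the bitwise AND needs its own parentheses. Here is a minimal, portable sketch of the same bit test in plain C. The function BitIsSet() and the constant kSpeechMgrPresentBit are hypothetical stand-ins (the latter for gestaltSpeechMgrPresent); neither is part of the Toolbox.

```c
#include <stdbool.h>

/* Hypothetical stand-in for gestaltSpeechMgrPresent, which names a bit
   position (not a ready-made mask) in the Gestalt response. */
enum { kSpeechMgrPresentBit = 0 };

/* Returns true when the given bit is set in the response word. */
static bool BitIsSet( long response, int bit )
{
    long mask = 1L << bit;

    /* The inner parentheses matter: without them, C evaluates
       mask == 0 first and then ANDs that result with response. */
    return ( response & mask ) != 0;
}
```

A caller would report the Speech Manager as missing when BitIsSet( response, kSpeechMgrPresentBit ) returns false.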
Speech Channels Review
Last month's Getting Started introduced speech channels. In order to specify which voice to use, your program needs to allocate such a channel - so take a quick look at some of the code from last month.
A speech channel is a data structure that holds descriptive information about the voice that is to be used to speak. If your program needs to alternate voices, you may want to create two or more speech channels - one for each voice. The Toolbox function NewSpeechChannel() creates a new speech channel record and returns a SpeechChannel - a pointer to that record.
SpeechChannel channel;
OSErr err;
err = NewSpeechChannel( nil, &channel );
The first NewSpeechChannel() argument is a pointer to a voice specification data structure. Passing a value of nil as the first argument results in assigning the system default voice as the voice to be used by speech that eventually comes from this new channel.
Voices
To this point we've relied on the default voice - the voice the user's Mac uses when a program doesn't specify a particular voice. As shown in Figure 1, the user uses the Speech control panel to determine which voice is to be the default voice.
Figure 1. The Speech control panel.
As shown on the left of Figure 2, the Speech control panel includes a pop-up menu that lets the user choose a voice. And as shown on the right of Figure 2, each voice in the Speech control panel pop-up menu corresponds to a voice in the System Folder of the user's Mac (in the Voices folder in the Extensions folder, to be exact).
Figure 2. Each voice in the Speech control panel pop-up menu corresponds to a system voice file.
As shown in Figure 2, voices each have a name. So it might seem logical that your program would choose a voice based on its name. However, such is not the case. Voice files can be added to and removed from the Voices folder, and there's no way of ensuring that any one particular voice is present in the Voices folder of every Mac user. So rather than specifying a voice by name, your program should specify one or more characteristics the desired voice should have. After that your program can examine the voices present on the user's machine in order to find an appropriate voice. When a match is found, the voice can be associated with a speech channel, and subsequent text that is sent to that channel will be spoken using the desired voice.
The data that makes up any one voice can be placed in a VoiceDescription data structure. Such a structure has several fields, the most important of which are the gender and age fields. We'll look at these fields after first learning how a VoiceDescription is obtained.
Obtaining the Description of a Voice
To find a voice that meets your needs, your program will examine the voices in the Voices folder of the user's machine. Begin this chore by determining how many voices are in that folder. A call to the Toolbox function CountVoices() returns this number in its one argument. Use the number of voices as a loop counter, where the body of the loop examines each voice for a suitable match.
OSErr err;
short numVoices;
short i;
err = CountVoices( &numVoices);
for ( i = 1; i <= numVoices; i++ )
{
// obtain a voice description for one voice
}
Obtaining a voice takes two steps. The first step is to get a VoiceSpec for a voice. The VoiceSpec consists of an identification number of the speech synthesizer for which the voice was created, and an identification number for the voice itself. Any number of voices can share the same speech synthesizer ID, but each of the voices that share a synthesizer ID has a unique voice ID. Passing the Toolbox function GetIndVoice() an index of a voice in the Voices folder results in GetIndVoice() returning a VoiceSpec for that one voice. We'll be placing the call to GetIndVoice() within the above for loop, so we can use the current loop counter as the index:
VoiceSpec theVoiceSpec;
err = GetIndVoice( i, &theVoiceSpec );
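To make the two-part identity of a voice concrete, here is a small portable sketch in plain C. MockVoiceSpec is a hypothetical stand-in that mirrors the two fields of the Toolbox VoiceSpec (creator and id), and SameVoice() is a helper invented for this illustration rather than a Toolbox call.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the Toolbox VoiceSpec, which pairs a
   synthesizer ID (creator) with a voice ID (id). */
typedef struct
{
    unsigned long creator;   /* ID of the synthesizer the voice was made for */
    unsigned long id;        /* ID of the voice within that synthesizer */
} MockVoiceSpec;

/* Two specs name the same voice only when both IDs agree. */
static bool SameVoice( const MockVoiceSpec *a, const MockVoiceSpec *b )
{
    return ( a->creator == b->creator ) && ( a->id == b->id );
}
```

Two voices built for the same synthesizer share a creator value, so it's the id field that tells them apart.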
The VoiceSpec for a voice lets us access the VoiceDescription structure of that voice. Call the Toolbox function GetVoiceDescription() to do that.
VoiceDescription voiceDesc;
err = GetVoiceDescription( &theVoiceSpec, &voiceDesc, sizeof( voiceDesc ) );
The first argument to GetVoiceDescription() is a pointer to the just-obtained VoiceSpec. The second argument is a pointer to the VoiceDescription structure that GetVoiceDescription() fills in. The third argument is the size, in bytes, of that VoiceDescription structure. Use the sizeof operator to get this value. Here's how the voice-selecting code looks - so far:
OSErr err;
short numVoices;
short i;
VoiceSpec theVoiceSpec;
VoiceDescription voiceDesc;
err = CountVoices( &numVoices);
for ( i = 1; i <= numVoices; i++ )
{
err = GetIndVoice( i, &theVoiceSpec );
err = GetVoiceDescription( &theVoiceSpec, &voiceDesc, sizeof( voiceDesc ) );
// examine the characteristics of this one voice
}
Selecting a Voice Based On Characteristics
At this point we know how to determine the number of voices in the user's Voices folder and then get a VoiceDescription for each voice. Now we need to look at how we select a voice based on information in that VoiceDescription structure. For most programmers the most meaningful fields of this structure are the gender field and the age field.
The gender field can have one of three values, each represented by an Apple-defined constant. The kMale and kFemale gender values are self-explanatory. The third gender value, kNeuter, describes a voice that is robotic-sounding. The age field holds the approximate age that a speaker of a voice would have.
If you want your program to generate speech using a male voice, you'll be testing the gender field of a VoiceDescription structure. Assuming we've used the above method to obtain a VoiceDescription for a voice, that test looks like this:
if ( voiceDesc.gender == kMale )
If you want your program to generate speech using a voice of a teenager, then you'd test the age field of a VoiceDescription structure. Again assuming a VoiceDescription has already been obtained, the code to accomplish the task at hand looks like this:
if ( ( voiceDesc.age > 12 ) && ( voiceDesc.age < 20 ) )
// we've found a teenage voice
And if we want to specify a voice based on both the gender and the age? Combine both tests (which order you perform the tests in isn't important).
if ( ( voiceDesc.age > 12 ) && ( voiceDesc.age < 20 ) )
{
if ( voiceDesc.gender == kMale )
// we've found the voice of a teenage male
}
Using the Selected Voice
When we've found VoiceDescription information that matches our voice interests, we've found a suitable voice. If working from within a loop, now's the time to terminate the loop and make use of the VoiceSpec.
for ( i = 1; i <= numVoices; i++ )
{
err = GetIndVoice( i, &theVoiceSpec );
err = GetVoiceDescription( &theVoiceSpec, &voiceDesc, sizeof( voiceDesc ) );
if ( ( voiceDesc.age > 12 ) && ( voiceDesc.age < 20 ) )
{
if ( voiceDesc.gender == kMale )
break; // theVoiceSpec now holds the matching voice
}
}
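The search-and-stop pattern can be tried out away from the Speech Manager. The sketch below is plain C that walks an ordinary array instead of calling GetIndVoice() and GetVoiceDescription(). MockVoiceDesc, the kMock constants, and FindTeenageMale() are all hypothetical names made up for this example, standing in for the VoiceDescription fields and gender constants discussed above.

```c
/* Hypothetical stand-ins for the Toolbox gender constants and for the
   two VoiceDescription fields this article examines. */
enum { kMockNeuter = 0, kMockMale = 1, kMockFemale = 2 };

typedef struct
{
    short gender;
    short age;
} MockVoiceDesc;

/* Returns the 1-based index of the first teenage male voice, or 0 when
   no voice matches. Counting from 1 mirrors the loop used with
   GetIndVoice() in the article. */
static short FindTeenageMale( const MockVoiceDesc *voices, short numVoices )
{
    short i;
    short found = 0;

    for ( i = 1; i <= numVoices; i++ )
    {
        const MockVoiceDesc *v = &voices[i - 1];

        if ( ( v->age > 12 ) && ( v->age < 20 ) && ( v->gender == kMockMale ) )
        {
            found = i;
            break;      /* stop at the first match */
        }
    }
    return found;
}
```

Returning 0 for "no match" is just one convenient convention for this sketch; the SpeechVoice program reports that case with an error code instead.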
Last month you saw the creation of a new speech channel handled in this manner:
SpeechChannel channel;
err = NewSpeechChannel( nil, &channel );
Using nil as the first argument meant the returned speech channel makes use of the default voice. If we now instead pass the just-obtained VoiceSpec, the speech channel will be created such that it makes use of the voice referenced by this VoiceSpec:
err = NewSpeechChannel( &theVoiceSpec, &channel );
Last month's Getting Started article introduced SpeakText(), the Toolbox function that speaks text from a buffer. The first argument to that routine is a SpeechChannel. Now that the channel variable references a speech channel that's associated with a specific voice, that voice (rather than the system default voice) will be used in the generation of the speech that SpeakText() emits.
err = SpeakText( channel, (Ptr)(str + 1), str[0] );
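The second and third SpeakText() arguments make sense once you recall how a Pascal string is laid out: byte 0 holds the length and the characters follow it. Here is a portable sketch of that layout in plain C. PStr255 is a hypothetical stand-in for the Toolbox Str255 type, and CToPascal() and PascalText() are helpers invented for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* A Pascal string stores its length in byte 0, followed by the
   characters, with no terminating NUL. PStr255 is a hypothetical
   stand-in for the Toolbox Str255 type. */
typedef unsigned char PStr255[256];

/* Builds a Pascal string from a C string, truncating at 255 characters. */
static void CToPascal( const char *src, unsigned char *dst )
{
    size_t len = strlen( src );

    if ( len > 255 )
        len = 255;
    dst[0] = (unsigned char)len;
    memcpy( dst + 1, src, len );
}

/* Returns the address of the first character and stores the character
   count in *outLen. These correspond to str + 1 and str[0] in the
   SpeakText() call. */
static const unsigned char *PascalText( const unsigned char *str, size_t *outLen )
{
    *outLen = str[0];
    return str + 1;
}
```

Skipping the length byte is what keeps the synthesizer from trying to pronounce it as a character.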
SpeechVoice
This month's program is SpeechVoice. When you run SpeechVoice you'll be presented with the dialog box pictured in Figure 3.
Figure 3. The SpeechVoice dialog box.
In the SpeechVoice dialog box you click on one of two radio buttons to select a voice to use, then click the Play Speech button to have the program speak the string "Is this the voice you're looking for?" using the selected voice. If the user's machine doesn't hold a voice that has the specified characteristics, the program instead uses the system default voice. You can repeat the process of choosing a voice and speaking the phrase as often as you wish. When finished, click the Quit button to end the program.
Creating the SpeechVoice Resources
Start your resource development by creating a new folder named SpeechVoice in your main CodeWarrior folder. Start ResEdit and create a new resource file named SpeechVoice.rsrc. Specify that the SpeechVoice folder act as the resource file's destination. This resource file requires four resources, two of which you're very familiar with: the one ALRT and one of the two DITLs. ALRT 128 and the corresponding DITL 128 together define the program's error-handling alert. If the SpeechVoice program doesn't experience a serious error while executing, then this alert won't be seen by the user.
The remaining two resources are DLOG 129 and its corresponding DITL 129. Together these resources define the dialog box shown back in Figure 3. The size, placement, and type of dialog that the DLOG defines aren't too critical, though it makes sense to select the type that doesn't have a close box (since the dialog is to remain on-screen until the user quits). Figure 4 shows the DITL that defines the type and placement of the items in the dialog box. To view the item numbers (as displayed in Figure 4), check Show Item Numbers from ResEdit's DITL menu. Take note of the item numbers - they'll appear as constants in the source code.
Figure 4. The SpeechVoice resources.
Creating the SpeechVoice Project
Create a new project by launching CodeWarrior and choosing New Project from the File menu. Use the MacOS:C_C++:MacOS Toolbox:MacOS Toolbox Multi-Target project stationery for the new project. Uncheck the Create Folder check box, then click the OK button. Name the project SpeechVoice.mcp and choose the existing SpeechVoice folder as the project's destination.
Add the SpeechVoice.rsrc resource file to the project, then remove the SillyBalls.rsrc file. You can remove the ANSI Libraries folder if you want as the project won't be making use of any ANSI C libraries.
If you plan on making a PowerPC version or fat version of the SpeechVoice program, add the SpeechLib library to the two PowerPC targets of your project. As mentioned in last month's Getting Started, you'll want to choose Add Files from the Project menu and then maneuver your way to this library. The most likely spot to find this library is in the Metrowerks CodeWarrior:MacOS Support:Libraries:MacOS Common folder. If it's not there, use Sherlock to search your hard drive. After adding the library to the project, CodeWarrior displays a dialog box asking you which targets to add the library to. Check the two PPC targets.
Now create a new source code window by choosing New from the File menu. When you save the window, give it the name SpeechVoice.c. Choose Add Window from the Project menu to add the new empty file to the project. Now remove the SillyBalls.c placeholder file from the project window. You're all set to type in the source code.
If you want to save yourself a little typing, connect to the Internet and visit MacTech's ftp site at ftp://ftp.mactech.com/src/mactech/volume16_2000/16.03.sit. There you'll find the SpeechVoice source code file available for downloading.
Walking Through the Source Code
SpeechVoice.c begins with the inclusion of the Speech.h file:
/********************** includes *********************/
#include <Speech.h>
After the #include comes a number of constants, most of which are resource-related. The constant kALRTResID holds the ID of the ALRT resource used to define the error-handling alert. kDLOGResID holds the ID of the DLOG resource used to define the program's dialog box. The constants kPlaySpeechButton, kSetSpeechWomanRadio, kSetSpeechRobotRadio, and kQuitButton each hold the item number of one of the items in the dialog box (compare these constants to the items in DITL 129 as shown back in Figure 4). The constant kNoMatchingVoiceErr is our own (arbitrary) value that we'll use in the event the program fails in its attempt to find an appropriate voice. The constants kControlOn and kControlOff are used in the turning on and off of the dialog box radio buttons.
/********************* constants *********************/
#define kALRTResID 128
#define kDLOGResID 129
#define kPlaySpeechButton 1
#define kSetSpeechWomanRadio 2
#define kSetSpeechRobotRadio 3
#define kQuitButton 4
#define kNoMatchingVoiceErr -999
#define kControlOn 1
#define kControlOff 0
Next come the program's function prototypes.
/********************* functions *********************/
void ToolBoxInit( void );
void OpenSpeechDialog( void );
SpeechChannel OpenOneSpeechChannel( VoiceSpec );
OSErr GetVoiceSpecBasedOnAgeGender( VoiceSpec *,
short, short, short );
void DoError( Str255 errorString );
The main() function of SpeechVoice begins with the declaration of three variables, all of which are used in the determination of whether speech generation is possible on the user's Mac.
/********************** main *************************/
void main( void )
{
OSErr err;
long response;
long mask;
After the Toolbox is initialized, the speech-related tests (as described in January's Getting Started) are made. After that, main() ends with a call to the application-defined function OpenSpeechDialog().
ToolBoxInit();
err = Gestalt( gestaltSpeechAttr, &response );
if ( err != noErr )
DoError( "\pError calling Gestalt" );
mask = 1 << gestaltSpeechMgrPresent;
if ( ( response & mask ) == 0 )
DoError( "\pSpeech Manager not present " );
OpenSpeechDialog();
}
ToolBoxInit() remains the same as previous versions.
/******************** ToolBoxInit ********************/
void ToolBoxInit( void )
{
InitGraf( &qd.thePort );
InitFonts();
InitWindows();
InitMenus();
TEInit();
InitDialogs( nil );
InitCursor();
}
OpenSpeechDialog() is responsible for opening, displaying, and monitoring the program's dialog box. It's also responsible for generating speech. The function begins with a host of variable declarations, each to be discussed at the time it's used in the function:
/***************** OpenSpeechDialog ******************/
void OpenSpeechDialog( void )
{
DialogPtr dialog;
short oldRadio;
short type;
Handle handle;
Rect rect;
short item;
Boolean done = false;
OSErr err;
SpeechChannel channel;
Str255 str = "\pIs this the voice you're looking for?";
short ageLow;
short ageHigh;
short gender;
VoiceSpec defaultVoiceSpec;
VoiceSpec theVoiceSpec;
VoiceDescription voiceDesc;
Creating a new dialog box, and saving a pointer to it in variable dialog, is the first order of business:
dialog = GetNewDialog( kDLOGResID, nil, (WindowPtr)-1L );
Both radio buttons will initially appear unselected, so it's up to us to turn one on. GetDialogItem() returns (among other pieces of information) a handle to the item named in the second argument. This generic handle is then typecast to a ControlHandle and used in a call to SetControlValue() to turn the radio button on. To let the routine know which radio button is now on, the variable oldRadio is set to match the item number of the just turned on button. The voice criteria are set to match that button as well, so that they hold meaningful values if the user clicks Play Speech before clicking either radio button.
GetDialogItem( dialog, kSetSpeechWomanRadio, &type,
&handle, &rect );
SetControlValue( ( ControlHandle )handle, kControlOn );
oldRadio = kSetSpeechWomanRadio;
gender = kFemale;
ageLow = 20;
ageHigh = 40;
Now we'll call ShowWindow() to handle the case of a DLOG resource that specified that the dialog box be initially invisible. A call to SetPort() ensures that the newly opened dialog box is the window that receives updating.
ShowWindow( dialog );
SetPort( dialog );
Before jumping into the loop that will watch for, and handle, the user's actions, we need to take one more preliminary step. As shown earlier in this article, the GetVoiceDescription() function is usually called with a pointer to a VoiceSpec as the first argument. If a value of nil is passed instead, then a VoiceDescription for the system default voice is returned. We'll do that here in order to get, and save, the VoiceSpec of the default voice. If our later attempts to find a particular voice fail, we'll use the system voice.
err = GetVoiceDescription( nil, &voiceDesc, sizeof( voiceDesc ) );
defaultVoiceSpec = voiceDesc.voice;
We now begin the while loop that executes until the user clicks the dialog box Quit button. When the user clicks on any one of the four items in the dialog box, ModalDialog() returns the item number of the clicked-on item. We use that returned value in a switch statement:
while ( done == false )
{
ModalDialog( nil, &item );
switch ( item )
{
If the user clicks on the radio button labeled Woman: 20 - 40 years, then the following code executes:
case kSetSpeechWomanRadio:
GetDialogItem( dialog, oldRadio, &type, &handle, &rect );
SetControlValue( (ControlHandle)handle, kControlOff );
GetDialogItem( dialog, kSetSpeechWomanRadio, &type, &handle, &rect );
SetControlValue( (ControlHandle)handle, kControlOn );
oldRadio = kSetSpeechWomanRadio;
gender = kFemale;
ageLow = 20;
ageHigh = 40;
break;
The above code turns the previously on button off, then turns the clicked-on button on. Working in that order means that a click on the button that's already on leaves it on. The newly turned on button is now considered the old button - information that's needed the next time a radio button is clicked. Three local variables, gender, ageLow, and ageHigh, are then set to values appropriate to the selected option. We'll use these values when the user eventually clicks the Play Speech button.
If the user instead clicks on the radio button labeled Robot: any age, then the following code executes. Recall that the Apple-defined constant kNeuter is used to specify a robotic voice. Since we've set up this option such that there are no age restrictions on the voice, we set the lowest acceptable age to 0 and the highest acceptable age to an arbitrarily large value (keeping in mind that a variable of type short can hold a value no larger than 32,767).
case kSetSpeechRobotRadio:
GetDialogItem( dialog, oldRadio, &type, &handle, &rect );
SetControlValue( (ControlHandle)handle, kControlOff );
GetDialogItem( dialog, kSetSpeechRobotRadio, &type, &handle, &rect );
SetControlValue( (ControlHandle)handle, kControlOn );
oldRadio = kSetSpeechRobotRadio;
gender = kNeuter;
ageLow = 0;
ageHigh = 30000;
break;
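The two radio button cases differ only in the values they assign, which suggests a table-driven alternative. The following is a sketch of that idea in plain C, not the approach SpeechVoice itself takes. VoiceCriteria, kVoiceTable, and CriteriaForItem() are hypothetical names, the kMock constants stand in for the Toolbox gender constants, and the item numbers 2 and 3 are the values of kSetSpeechWomanRadio and kSetSpeechRobotRadio from the constants listing.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the Toolbox gender constants. */
enum { kMockNeuter = 0, kMockMale = 1, kMockFemale = 2 };

/* One row per radio button: the dialog item number and the voice
   characteristics that button asks for. */
typedef struct
{
    short item;
    short gender;
    short ageLow;
    short ageHigh;
} VoiceCriteria;

static const VoiceCriteria kVoiceTable[] =
{
    { 2, kMockFemale, 20, 40 },      /* Woman: 20 - 40 years */
    { 3, kMockNeuter, 0, 30000 }     /* Robot: any age */
};

/* Returns the criteria for a radio button item, or NULL when the item
   number doesn't belong to a radio button. */
static const VoiceCriteria *CriteriaForItem( short item )
{
    size_t i;
    size_t count = sizeof( kVoiceTable ) / sizeof( kVoiceTable[0] );

    for ( i = 0; i < count; i++ )
        if ( kVoiceTable[i].item == item )
            return &kVoiceTable[i];
    return NULL;
}
```

With a table like this, supporting another radio button would mean adding one row rather than another case with its own assignments.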
Clicking either of the radio buttons doesn't cause a search for the desired voice to take place. Instead, we wait until the user clicks the Play Speech button. When the user takes that step, we call our application-defined function GetVoiceSpecBasedOnAgeGender(). When we pass this function the address of a VoiceSpec, a range of ages, and a gender, the function searches the system for a voice with matching characteristics and fills in the VoiceSpec variable with the voice specification for that voice (more on this function ahead):
case kPlaySpeechButton:
err = GetVoiceSpecBasedOnAgeGender( &theVoiceSpec, ageLow, ageHigh, gender );
GetVoiceSpecBasedOnAgeGender() is written such that a failed attempt to find a voice results in the returning of an error value of -999, or kNoMatchingVoiceErr. In such a situation we set our local theVoiceSpec variable to the previously saved default voice. For any other error value we instead call our own error-handling routine that displays a short message and terminates the program.
if ( err == kNoMatchingVoiceErr )
theVoiceSpec = defaultVoiceSpec;
else if ( err != noErr )
DoError( "\pError finding voice" );
Now it's time to open a new speech channel. In last month's Getting Started we wrote OpenOneSpeechChannel() to handle that task. Here we call a slightly modified version of that routine. In this new version we pass a VoiceSpec along for use in opening the channel. This is the VoiceSpec returned by GetVoiceSpecBasedOnAgeGender() (or the VoiceSpec of the default voice in the event that a matching voice wasn't found).
channel = OpenOneSpeechChannel( theVoiceSpec );
if ( channel == nil )
DoError( "\pError opening a speech channel" );
With a new speech channel open (and with that channel associated with the desired voice), it's time to test things out by speaking a phrase. Variable str holds the text to speak. The SpeakText() routine speaks the text of that string (refer to last month's Getting Started for the details on SpeakText()).
err = SpeakText( channel, (Ptr)(str + 1), str[0] );
if ( err != noErr )
DoError( "\pError attempting to speak a phrase" );
while ( SpeechBusy() == true )
;
err = DisposeSpeechChannel( channel );
if ( err != noErr )
DoError( "\pError disposing speech channel" );
break;
When the user is finished, a click of the Quit button ends the dialog box loop, and ends the program.
case kQuitButton:
done = true;
break;
}
}
DisposeDialog( dialog );
}
OpenSpeechDialog() made calls to two application-defined routines: GetVoiceSpecBasedOnAgeGender() and OpenOneSpeechChannel(). Here's that first routine:
/*********** GetVoiceSpecBasedOnAgeGender ************/
OSErr GetVoiceSpecBasedOnAgeGender(
VoiceSpec *theVoiceSpec,
short ageLow,
short ageHigh,
short gender )
{
OSErr err;
short numVoices;
short i;
VoiceDescription voiceDesc;
err = CountVoices( &numVoices );
if ( err != noErr )
return ( err );
for ( i = 1; i <= numVoices; i++ )
{
err = GetIndVoice( i, theVoiceSpec );
if ( err != noErr )
return ( err );
err = GetVoiceDescription( theVoiceSpec, &voiceDesc,
sizeof( voiceDesc ) );
if ( err != noErr )
return ( err );
if ( (voiceDesc.age >= ageLow) && (voiceDesc.age <= ageHigh) )
if ( voiceDesc.gender == gender )
return ( noErr );
}
return ( kNoMatchingVoiceErr );
}
We can get by with very little descriptive information about this routine because its code has already been described in this article. Recall that CountVoices() returns the number of voices in the user's system, and that number can then be used as the upper limit of a loop. Each pass through the loop calls GetIndVoice() to obtain a voice specification for one voice, and then calls GetVoiceDescription() to get the voice description of that one voice. From the voice description it can be determined if a suitable match has been made. Here we look to see if the voice's age falls into the range of ageLow and ageHigh, and whether the voice is of the proper gender. If a match is made, the return statement ends the loop. At this point theVoiceSpec holds the voice specification for a matching voice. If the loop finishes without a match, the routine returns kNoMatchingVoiceErr and the caller substitutes the default voice.
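The error-code-plus-fallback arrangement can be exercised without the Speech Manager. The plain C sketch below searches an ordinary array and falls back to a default when nothing matches. MockVoice, FindVoiceID(), ChooseVoiceID(), and the kMock constants are hypothetical names for this illustration, and a long voiceID stands in for a full VoiceSpec. One deliberate difference from the routine above: this version writes to the output parameter only when a match is found.

```c
/* Hypothetical stand-ins for the Toolbox constants and for our own
   kNoMatchingVoiceErr. */
enum { kMockNeuter = 0, kMockMale = 1, kMockFemale = 2 };
enum { kMockNoErr = 0, kMockNoMatchingVoiceErr = -999 };

typedef struct
{
    long  voiceID;   /* stands in for a VoiceSpec */
    short gender;
    short age;
} MockVoice;

/* Stores the ID of the first matching voice in *outID and returns
   kMockNoErr. When nothing matches, *outID is left untouched and
   kMockNoMatchingVoiceErr is returned. */
static short FindVoiceID( const MockVoice *voices, short numVoices,
                          short ageLow, short ageHigh, short gender,
                          long *outID )
{
    short i;

    for ( i = 0; i < numVoices; i++ )
    {
        if ( ( voices[i].age >= ageLow ) && ( voices[i].age <= ageHigh ) &&
             ( voices[i].gender == gender ) )
        {
            *outID = voices[i].voiceID;
            return kMockNoErr;
        }
    }
    return kMockNoMatchingVoiceErr;
}

/* Returns a matching voice ID, or defaultID when no voice matches. */
static long ChooseVoiceID( const MockVoice *voices, short numVoices,
                           short ageLow, short ageHigh, short gender,
                           long defaultID )
{
    long id = defaultID;

    if ( FindVoiceID( voices, numVoices, ageLow, ageHigh, gender, &id )
            == kMockNoMatchingVoiceErr )
        id = defaultID;
    return id;
}
```

Keeping the "no match" case as a distinct error value lets the caller treat it as a routine outcome while still handling any other failure as serious.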
The second application-defined function called by OpenSpeechDialog() is OpenOneSpeechChannel(). As you saw in last month's article, this routine calls NewSpeechChannel() to create a new speech channel. Last month we passed nil as the first argument, telling NewSpeechChannel() to associate the system default voice with the channel. Here we pass in a voice specification and use that as the first argument, telling NewSpeechChannel() to associate this voice with the new channel.
/*************** OpenOneSpeechChannel ****************/
SpeechChannel OpenOneSpeechChannel( VoiceSpec theVoiceSpec )
{
SpeechChannel channel;
OSErr err;
err = NewSpeechChannel( &theVoiceSpec, &channel );
if ( err != noErr )
channel = nil; // no channel was created, so there's nothing to dispose
return ( channel );
}
DoError() is unchanged from prior versions. A call to this function results in the posting of an alert that holds an error message. After the alert is dismissed the program ends.
/********************** DoError **********************/
void DoError( Str255 errorString )
{
ParamText( errorString, "\p", "\p", "\p" );
StopAlert( kALRTResID, nil );
ExitToShell();
}
Running SpeechVoice
Run SpeechVoice by choosing Run from CodeWarrior's Project menu. After the code is compiled, CodeWarrior launches the SpeechVoice program. After you're satisfied that the program does in fact select and use an appropriate voice, click the Quit button to quit.
Till Next Month...
Two months ago you saw how a Mac program can speak the text in a single string. Last month you learned how your program can use a buffer and a speech channel to speak larger amounts of text. And finally, this month you read up on how to select a voice and then use that voice in generating speech. You can learn still more about speech and voices by reading the Sound volume of Inside Macintosh. You can also learn more about voices by experimenting with the SpeechVoice code. For starters, try adding a third radio button that specifies different voice characteristics. By the time you have speech fully integrated into your own program, you'll be ready to read up on a new topic in next month's Getting Started article...