The Road to Code: Getting Oriented
Volume Number: 23 (2007)
Issue Number: 10
Column Tag: The Road to Code
Introduction to
Object-Oriented Programming
by Dave Dribin
Introduction
So far in The Road to Code, we have gone over variables, functions, control statements, and dynamic memory allocation. Now it's time to start putting all these pieces of the puzzle together and go over one of the main tenets of modern computer programming: object-oriented programming. Remember that Apple's native programming language is called Objective-C, which provides object-oriented extensions to the C language. The kind of programming we have been doing in straight C is called procedural programming. Even though the C language uses functions, the academic name for these functions is procedures. Where the focus of procedural programming is on procedures and data structures, object-oriented programming focuses on objects. Since objects are an evolutionary extension to procedures and data structures, this article will cover using data structures in C. This will lay the groundwork for us to finally start writing Objective-C code in next month's article.
Data Structures
Throughout this article, we will be writing code that deals with geometric rectangles. This is not the sexiest topic, but it is often used as an introduction to object-oriented programming. The nice thing about rectangles is that everyone knows what they are, so they require very little explanation. Their properties also require only knowledge of simple arithmetic. In case your geometry is a bit rusty, here's a simple diagram of a rectangle in the Cartesian coordinate system:
Figure 1: Rectangle in the Cartesian coordinate system
From this diagram, we can note the four points that make up this rectangle:
The lower left point is (5, 5)
The upper left point is (5, 10)
The upper right point is (15, 10)
The lower right point is (15, 5)
From these four points, we know the four edges of this rectangle:
The left edge has an X-coordinate of 5
The bottom edge has a Y-coordinate of 5
The right edge has an X-coordinate of 15
The top edge has a Y-coordinate of 10
From these four edges, we can determine the width and height of the rectangle:
The width is: right edge - left edge = 15 - 5 = 10 units
The height is: top edge - bottom edge = 10 - 5 = 5 units
Finally, from the width and height, we can calculate the area and perimeter:
The area is: width x height = 10 x 5 = 50 square units
The perimeter is: (2 x width) + (2 x height) = (2 x 10) + (2 x 5) = 30 units
Even though these calculations are fairly simple to do without a computer, I'm going to walk us through writing a program to help us calculate the area and perimeter of rectangles. Rectangles are used a lot in computer graphics, so this is more than just a trivial example used to demonstrate a point. I promise.
Area Calculations
Here is a simple program that calculates the area from the four edges of the rectangle:
Listing 1: main.c Area calculation
#include <stdio.h>
int main(int argc, const char * argv[])
{
float leftX = 5;
float bottomY = 5;
float rightX = 15;
float topY = 10;
float area;
area = (rightX - leftX) * (topY - bottomY);
printf("Area is: %.2f\n", area);
return 0;
}
I snuck in one shortcut that we haven't covered before: I assigned a value to each variable in its declaration. This saves us four lines of code, since we do not need to assign the variables separately. Assigning a value to a variable in its declaration is called initializing the variable. Apart from this shortcut, the rest is straightforward, and the output when run is:
Area is: 50.00
We know from our manual calculations that this is correct. Even though the area calculation is fairly simple, if we wanted to do multiple area calculations, we would put this into a function to avoid possible duplication errors:
Listing 2: main.c Area function
#include <stdio.h>
float rectangleArea(float leftX, float bottomY, float rightX, float topY)
{
return (rightX - leftX) * (topY - bottomY);
}
int main(int argc, const char * argv[])
{
float leftX1 = 5;
float bottomY1 = 5;
float rightX1 = 15;
float topY1 = 10;
float leftX2 = 0;
float bottomY2 = 0;
float rightX2 = 4;
float topY2 = 4;
float area;
area = rectangleArea(leftX1, bottomY1, rightX1, topY1);
printf("Area 1 is: %.2f\n", area);
area = rectangleArea(leftX2, bottomY2, rightX2, topY2);
printf("Area 2 is: %.2f\n", area);
return 0;
}
Providing Structure
In Listing 2, we have chosen to represent a rectangle as four different variables representing the four edges of a rectangle. It can get very cumbersome to declare four different variables for every rectangle we want to use. The C language provides a construct called structures that group together multiple variables into one package. Here's how we would declare a structure for our rectangle:
struct rectangle
{
float leftX;
float bottomY;
float rightX;
float topY;
};
We can declare a variable of this structure using the following syntax:
struct rectangle rectangle;
This declares a variable named rectangle that is of type struct rectangle. It may seem a little weird to have a structure and variable with the same name, but this is perfectly legal and actually quite common. This is legal because the full name of the structure is "struct rectangle" so it's not ambiguous what "rectangle" refers to.
To use the elements of the rectangle, such as leftX and bottomY, you would put a period or dot between them. Thus to set the four edges of our rectangle as in Figure 1, we would use the following code:
rectangle.leftX = 5;
rectangle.bottomY = 5;
rectangle.rightX = 15;
rectangle.topY = 10;
Even though we can access the individual elements, the structure as a whole can be passed around as a single unit. We can change our function signature to take this structure:
float rectangleArea(struct rectangle r)
{
return (r.rightX - r.leftX) * (r.topY - r.bottomY);
}
This function now takes a single argument, instead of four. Its type is struct rectangle. Also, the elements are accessed with dots, just as when we assigned to them. When we call this function, we pass just the structure variable:
area = rectangleArea(rectangle);
Packaging up our rectangle as a structure provides us a few benefits, but there is one final change I'd like to go over.
Type Definitions
Declaring a structure variable is ever so slightly different from declaring a variable as int or float. You need to include both words, "struct rectangle", in front of the variable name. If you leave off the word struct, the compiler will give you an error. However, the C language allows us to create our own types that look even more like built-in types, such as int or float. We can define our own types using the typedef keyword. Thus, we could include this line after our structure declaration to define our own type called Rectangle:
typedef struct rectangle Rectangle;
The typedef keyword makes a new type called Rectangle that is an alias for the type struct rectangle. This allows us to use the word Rectangle to declare variables and function arguments, instead of the more verbose struct rectangle:
Rectangle rectangle;
I vastly prefer using typedefs for structures. I find it makes the resulting code clearer with less extraneous words and is less error prone since I don't have to remember to use the struct keyword. I also like to use a capital letter for new type names. This simple convention makes it clear which words are types and which are variables without much thought. This is very handy when looking at code someone else wrote or even code you wrote a while back.
Because using a typedef with a structure is such a common idiom, we can combine the structure definition and type definition into one:
typedef struct
{
float leftX;
float bottomY;
float rightX;
float topY;
} Rectangle;
Because we are using a typedef, we don't need to give the structure itself a name (though that is legal). Using this technique, we can rewrite Listing 2 as:
Listing 3: main.c Rectangle structure
#include <stdio.h>
typedef struct
{
float leftX;
float bottomY;
float rightX;
float topY;
} Rectangle;
float rectangleArea(Rectangle r)
{
return (r.rightX - r.leftX) * (r.topY - r.bottomY);
}
int main(int argc, const char * argv[])
{
Rectangle rectangle1;
Rectangle rectangle2;
float area;
rectangle1.leftX = 5;
rectangle1.bottomY = 5;
rectangle1.rightX = 15;
rectangle1.topY = 10;
rectangle2.leftX = 0;
rectangle2.bottomY = 0;
rectangle2.rightX = 4;
rectangle2.topY = 4;
area = rectangleArea(rectangle1);
printf("Area 1 is: %.2f\n", area);
area = rectangleArea(rectangle2);
printf("Area 2 is: %.2f\n", area);
return 0;
}
We now have a rectangle structure and a function that uses the structure. This combination of a structure and functions that manipulate that structure is called a data structure. Thus, we have begun to write a rectangle data structure.
Multiple Source Files
So far, we have only used one source file, named main.c. But as programs get larger, you will want to split your program into multiple source files. This helps organize your code into logical blocks, and allows you to reuse code in multiple source files. For example, now that we have a data structure to represent a rectangle, we may want to use this in other parts of our program. The best way to solve this problem is by putting rectangleArea in its own source file.
When using multiple source files, the build process combines all the functions in all the source files into one program. The program starts execution with the main function. There is one catch: each function name must be unique. This means that there can be only one rectangleArea function definition across all source files. If two different source files define the same function, you will get an error when the program is built.
We can put the rectangleArea function in its own source file, say rectangle.c, but then we need a way for other source files, like main.c, to be able to use it. If we just try to call it inside main.c, the compiler will complain that it does not know about a type named Rectangle or a function named rectangleArea. You've already seen and used the solution to this problem: header files.
We've used header files to give us access to system functions like printf and malloc, but header files are not anything magic. We can easily create our own. To tell the compiler that a function named rectangleArea exists, we put this code into a header file, for example, rectangle.h:
Listing 4: rectangle.h
typedef struct
{
float leftX;
float bottomY;
float rightX;
float topY;
} Rectangle;
float rectangleArea(Rectangle r);
This code looks very similar to the code before the main function in Listing 3. It contains the structure declaration, just as before, but we do not include the body of the rectangleArea function. This body-less function is called a function declaration. It tells the compiler that there is a function named rectangleArea somewhere in our program, and it shouldn't complain if someone tries to call it. You can declare a function multiple times, but you must define it only once. Of course, we do need to put the body of the function somewhere, so we put it in its own source file, named rectangle.c:
Listing 5: rectangle.c
#include "rectangle.h"
float rectangleArea(Rectangle r)
{
return (r.rightX - r.leftX) * (r.topY - r.bottomY);
}
This contains our function definition, with the full body, but it also starts off with a #include line. This is necessary because the header contains the structure and type definition for Rectangle. Without the header file, the compiler wouldn't know what Rectangle was. This line is also similar to how we include the stdio.h header file for printf. You'll notice that double quotes ("rectangle.h") are used instead of angle brackets (<stdio.h>). The general rule is that angle brackets are used to include system header files, whereas double quotes are used to include user-defined header files.
I haven't told you how to actually create a new source and header file in Xcode, but the procedure is quite painless. As you have probably noticed, on the left hand side of the Xcode window, you will see a Groups & Files list. If you open the disclosure triangles, you will see the various files of your project. In the Source group, you will see the file named main.c that we have been using so far.
Figure 2: Groups & Files list
To create a new source file, make sure that the Source group is highlighted as in Figure 2, because this is where we want the new source file to be placed. Then, select New File... from the File menu. This will open up a New File dialog box, as shown in Figure 3. Choose C File, under the BSD category, and click Next. Now you are prompted to enter the name of the new file, as shown in Figure 4, so type rectangle.c. You'll notice that by default Xcode will automatically create a header file named rectangle.h for you. Since this is what we want, leave that checked and click Finish. You should see two files added to your Source group on the left-hand side, as in Figure 5.
Figure 3: New File dialog
Figure 4: New C File dialog
Figure 5: New files added to Source group
If you click on these new files, your editor window will allow you to edit the contents of the selected file. Go ahead and edit rectangle.h and rectangle.c to match Listing 4 and Listing 5, respectively. Finally, we need to change main.c to use our new header file. We can replace the structure and function definition with the #include statement, as in Listing 6.
Listing 6: main.c using rectangle.h
#include <stdio.h>
#include "rectangle.h"
int main(int argc, const char * argv[])
{
Rectangle rectangle1;
Rectangle rectangle2;
float area;
rectangle1.leftX = 5;
rectangle1.bottomY = 5;
rectangle1.rightX = 15;
rectangle1.topY = 10;
rectangle2.leftX = 0;
rectangle2.bottomY = 0;
rectangle2.rightX = 4;
rectangle2.topY = 4;
area = rectangleArea(rectangle1);
printf("Area 1 is: %.2f\n", area);
area = rectangleArea(rectangle2);
printf("Area 2 is: %.2f\n", area);
return 0;
}
With our rectangle structure and function in its own header file, we can use it in other source files besides main.c, too. All we would need to do is include rectangle.h in this other file, and it could also calculate the area of rectangles. Because the code in rectangle.h allows other code to use our data structure, it is called the interface. And because the code in rectangle.c is where the actual function definition is, it is called the implementation. Separating the interface from the implementation gives us a lot of flexibility to reuse common code in different parts of the program.
Add Perimeter Calculations
With our reusable rectangle data structure in place, we can now add other functions that operate on the Rectangle structure. For example, to add a function that calculates the perimeter, modify rectangle.h to match Listing 7 and rectangle.c to match Listing 8.
Listing 7: rectangle.h with rectanglePerimeter
typedef struct
{
float leftX;
float bottomY;
float rightX;
float topY;
} Rectangle;
float rectangleArea(Rectangle r);
float rectanglePerimeter(Rectangle r);
Listing 8: rectangle.c with rectanglePerimeter
#include "rectangle.h"
float rectangleArea(Rectangle r)
{
return (r.rightX - r.leftX) * (r.topY - r.bottomY);
}
float rectanglePerimeter(Rectangle r)
{
return 2*(r.rightX - r.leftX) + 2*(r.topY - r.bottomY);
}
This adds a declaration for the rectanglePerimeter function to the interface and its definition to the implementation. An example of how we could use this in main.c is Listing 9.
Listing 9: main.c using rectanglePerimeter
#include <stdio.h>
#include "rectangle.h"
int main(int argc, const char * argv[])
{
Rectangle rectangle;
rectangle.leftX = 5;
rectangle.bottomY = 5;
rectangle.rightX = 15;
rectangle.topY = 10;
printf("Area is: %.2f\n", rectangleArea(rectangle));
printf("Perimeter is: %.2f\n", rectanglePerimeter(rectangle));
return 0;
}
If we were to run this program, the output would be:
Area is: 50.00
Perimeter is: 30.00
With the concept of reusable data structures added to your toolbox, you are now one step closer to object-oriented programming. While the idea of reusable data structures is not unique to object-oriented programming, it is a primary concept. When we write our first Objective-C code next month, you will see what I mean.
Encapsulation
Even though our structure stores the four edges, both our area and perimeter calculation use the width and height of the rectangle. We may want to change our structure to store the lower left corner (leftX and bottomY) along with the width and height to make these calculations easier:
typedef struct
{
float leftX;
float bottomY;
float width;
float height;
} Rectangle;
This still allows us to represent any geometric rectangle, but we avoid recalculating the width and height over and over again. Now, we can re-write our functions to use these new structure elements:
float rectangleArea(Rectangle r)
{
return r.width * r.height;
}
float rectanglePerimeter(Rectangle r)
{
return (2*r.width) + (2*r.height);
}
There, much simpler! But a side effect of this is that we just broke our code in main.c. It's still trying to set rightX and topY, which no longer exist. What if we still want to create our rectangle using the edges? Rather than changing main to use width and height, let's create a new function that takes the four edges and initializes the new structure elements:
Rectangle rectangleInitWithEdges(float leftX, float bottomY,
float rightX, float topY)
{
Rectangle r;
r.leftX = leftX;
r.bottomY = bottomY;
r.width = rightX - leftX;
r.height = topY - bottomY;
return r;
}
This means we have to change our main function as follows:
int main(int argc, const char * argv[])
{
Rectangle rectangle;
rectangle = rectangleInitWithEdges(5, 5, 15, 10);
printf("Area is: %.2f\n", rectangleArea(rectangle));
printf("Perimeter is: %.2f\n", rectanglePerimeter(rectangle));
return 0;
}
This now gives us the best of both worlds. We can still create rectangle structures using the four edges, but we can more naturally calculate the area and perimeter. The height and width are calculated only once, inside rectangleInitWithEdges. Notice that the main function now never accesses the structure's elements directly. It only uses functions defined in the interface to manipulate the structure. The implementation is now the only code that accesses the structure's elements. This kind of code organization, where only the implementation accesses a structure's elements, is called encapsulation.
Encapsulation is a very good goal of software design. It gives data structure writers more freedom to implement their data structure without affecting the users of the data structure. In fact, because our rectangle data structure now uses encapsulation, we can change the structure back to the original with four edges, and our code in main.c will stay exactly the same! We would have to change our area and perimeter calculations, but those are internal details.
The truth of the matter is that programs are constantly changing over time. The programs we have been writing so far have been fairly trivial. But in real programs, you will be constantly making modifications, either to add features or fix bugs. Whenever you make changes, there is the possibility of introducing new bugs. Limiting the scope of those changes will make sure you introduce as few bugs as possible. It will also allow you to make changes faster, since less code will need to be modified.
Modifying Rectangles
Okay, so if users of rectangles should no longer access the structure directly, what's the proper way to modify the rectangle? To preserve our encapsulation, we should add functions to the interface that do this. If we wanted to change the right edge of our rectangle, we could write a function like this:
void rectangleSetRightX(Rectangle r, float rightX)
{
r.width = rightX - r.leftX;
}
Unfortunately, this will not work. Remember from our article on pointers that function arguments are completely separate variables from the ones passed in. The same holds true even for structures. The compiler will make a copy of the rectangle structure, and this function operates on the copy of the rectangle. When function arguments are copied like this, it is called passing by value. To solve this, we can pass a pointer to the rectangle, which is called passing by reference, as such:
void rectangleSetRightX(Rectangle * r, float rightX)
{
(*r).width = rightX - (*r).leftX;
}
When we change the first function argument to a pointer, we need to dereference the pointer inside the function. Thus, the syntax "(*r).width" dereferences the pointer and then accesses the structure element. When dereferencing pointers to structures, it is important to use parentheses around the pointer dereference, to avoid any ambiguity of what the star means. Because it is cumbersome to use parentheses, the C language provides a shortcut syntax for dereferencing structure pointers using an arrow syntax:
void rectangleSetRightX(Rectangle * r, float rightX)
{
r->width = rightX - r->leftX;
}
The arrow syntax, which is really two characters, a dash followed by the greater than sign, "->", allows us to dereference a pointer to a structure and access an element using a cleaner syntax. With this modification, the function declarations in our header file become:
Rectangle rectangleInitWithEdges(float leftX, float bottomY,
float rightX, float topY);
void rectangleSetRightX(Rectangle * r, float rightX);
float rectangleArea(Rectangle r);
float rectanglePerimeter(Rectangle r);
Notice that we sometimes pass the rectangle by value and sometimes by reference. This can be confusing to the user of our data structure. Since we must use a pointer for rectangleSetRightX, I'm going to change all of our functions to use pointers, for consistency:
void rectangleInitWithEdges(Rectangle * r,
float leftX, float bottomY, float rightX, float topY);
void rectangleSetRightX(Rectangle * r, float rightX);
float rectangleArea(Rectangle * r);
float rectanglePerimeter(Rectangle * r);
I also took the liberty of changing rectangleInitWithEdges to take a pointer to a Rectangle, too. This is also more consistent with our other functions. In fact, now every function takes a Rectangle * as its first argument. This means we have to change main.c to use the ampersand (address of) operator:
Listing 10: main.c using pointers
#include <stdio.h>
#include "rectangle.h"
int main(int argc, const char * argv[])
{
Rectangle rectangle;
rectangleInitWithEdges(&rectangle, 5, 5, 15, 10);
printf("Area is: %.2f\n", rectangleArea(&rectangle));
printf("Perimeter is: %.2f\n", rectanglePerimeter(&rectangle));
rectangleSetRightX(&rectangle, 20);
printf("Area is: %.2f\n", rectangleArea(&rectangle));
printf("Perimeter is: %.2f\n", rectanglePerimeter(&rectangle));
return 0;
}
Our code is now very consistent looking. It also allows us to use dynamic memory allocation for the rectangle, too. If we wanted to only use pointers in our application, we could re-write this using:
Listing 11: main.c using dynamic memory allocation
#include <stdio.h>
#include <stdlib.h>
#include "rectangle.h"
int main(int argc, const char * argv[])
{
Rectangle * rectangle;
rectangle = malloc(sizeof(Rectangle));
rectangleInitWithEdges(rectangle, 5, 5, 15, 10);
printf("Area is: %.2f\n", rectangleArea(rectangle));
printf("Perimeter is: %.2f\n", rectanglePerimeter(rectangle));
rectangleSetRightX(rectangle, 20);
printf("Area is: %.2f\n", rectangleArea(rectangle));
printf("Perimeter is: %.2f\n", rectanglePerimeter(rectangle));
free(rectangle);
return 0;
}
We must now allocate the memory with malloc before initializing it, and then return the memory to the system with free when we are done, to avoid a memory leak. Other than that, and dropping the ampersands because rectangle is already a pointer, our code is identical to Listing 10. Notice that sizeof works on structures, too. Here are the complete final listings for rectangle.h and rectangle.c, just to make sure we're looking at the whole picture:
Listing 12: Final rectangle.h
typedef struct
{
float leftX;
float bottomY;
float width;
float height;
} Rectangle;
void rectangleInitWithEdges(Rectangle * r,
float leftX, float bottomY, float rightX, float topY);
void rectangleSetRightX(Rectangle * r, float rightX);
float rectangleArea(Rectangle * r);
float rectanglePerimeter(Rectangle * r);
Listing 13: Final rectangle.c
#include "rectangle.h"
void rectangleInitWithEdges(Rectangle * r,
float leftX, float bottomY, float rightX, float topY)
{
r->leftX = leftX;
r->bottomY = bottomY;
r->width = rightX - leftX;
r->height = topY - bottomY;
}
void rectangleSetRightX(Rectangle * r, float rightX)
{
r->width = rightX - r->leftX;
}
float rectangleArea(Rectangle * r)
{
return r->width * r->height;
}
float rectanglePerimeter(Rectangle * r)
{
return (2*r->width) + (2*r->height);
}
Conclusion
In summary, we've written a small rectangle data structure with its own interface and implementation files using proper encapsulation. Users of our data structure can use the interface functions to access properties of the rectangle, without accessing its internal structure elements. Combining structures and functions into reusable data structures with separate interface and implementation files like this is what object-oriented code is all about. So congratulations! You've actually been writing object-oriented code! Well, there is a bit more to object-oriented code than this, but we've learned about 75% of what makes object-oriented code so special. Next month, we will finally learn some Objective-C code, and you will see how Objective-C makes it even easier to write object-oriented code.