C++ Overview
Volume Number: 5
Issue Number: 9
Column Tag: Jörg's Folder
By Jörg Langowski, MacTutor Editorial Staff
An Overview Of C++
MacHack '89 brought me not only a colorful set of screwdrivers from APDA, but also a new assignment: yours truly is supposed to run a tutorial column on C++. C++ is a very interesting programming language. Supposedly, the new Finder was written in it. Also, it exists on a couple of Unix systems. In fact, when we were shopping for a Unix system recently, we met a representative who assured us that C++ would be delivered with the system. Jordan Matthews and other people from Apple spoke very highly of C++ at MacHack, and assured us that we would get our hands on a pre-release of Apple's C++ for MPW, supposedly to be delivered by the end of this year.
As you might have guessed, Apple hasn't sent us the pre-release yet, and I have yet to use a working C++ compiler. So far my only hands-on experience is with Bjarne Stroustrup's book, The C++ Programming Language (Addison-Wesley 1987), which I highly recommend.
The style of the book is rather terse, and you have to work your way through. A good example is that after the introduction, not much is said about object-oriented programming, until you hit page 213:
"A list specified in terms of pointers to a class can hold objects of any class derived from that class. That is, it may be heterogeneous. This is probably the single most important and useful aspect of derived classes, and it is essential in the style of programming presented in the following example. That style of programming is often called object based or object oriented; it relies on operations applied in a uniform manner to objects on heterogeneous lists."
So now you know what you've really done when you used MacApp... The fact that Stroustrup refers to object-oriented programming in this rather abstract way made me dig out an old introduction to Simula 67, which was the first language to introduce object-oriented concepts. There, too (the book dates from 1973), no reference is made to OOP as we know it today. All the important constructs - classes, instances, methods, overriding - are already there, and one could have implemented today's programming style in Simula; only computers were much smaller, and most programs did not demand OOP concepts.
The C++ Design
Stroustrup's team designed C++ for dealing with simulation problems not unlike those that Simula was developed for. However, C++ is a much broader concept than simply a set of object-oriented extensions to C; it is a redesign of the C language. To use another quotation from Stroustrup's book, C++ was designed to enable larger programs to be structured in a rational way so that it would not be unreasonable for a single person to deal with 25,000 lines of code. To achieve such an ambitious goal, the most important point is to allow the user to extend the language to accommodate new shorthand notations for things that have to be done over and over again. For instance, in a program that uses matrix algebra, given the 25-row by 35-column matrix C and 25-row by 15-column matrix B,
A = ^B*C
is much easier to read than
matmul (A,transpose(B),C,15,25,35).
To be able to use such a shorthand for matrix multiplication, we would need two features built into the language: a. data structures that carry additional information, such as row and column size for a matrix, which is normally hidden from the user; and b. the capability to redefine operators - like * or ^ - depending on the context in which they are used. The latter feature would then cause a * to behave differently depending on whether it is used to multiply two integers, reals, vectors or matrices. Some of this behavior is already built into most compilers: integer and real multiply generate different code. But this built-in behavior cannot be modified. C++ allows you to define what an operator means for your own types in any odd way.
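As a rough illustration of point b, the sketch below gives * two new meanings for a small user-defined type while the built-in integer multiply keeps working. The type Vec2 and the helper demo_multiply are hypothetical names chosen for this example; they do not come from Stroustrup's book.

```cpp
#include <cassert>

// Hypothetical two-component vector; the name Vec2 is an assumption
// made for this sketch and does not appear in the column.
struct Vec2 {
    double x, y;
};

// Vec2 * Vec2 is defined here as the dot product.
double operator*(const Vec2& a, const Vec2& b) {
    return a.x * b.x + a.y * b.y;
}

// double * Vec2 scales each component.
Vec2 operator*(double s, const Vec2& v) {
    Vec2 r = { s * v.x, s * v.y };
    return r;
}

// Exercises the built-in and the two user-defined meanings of '*'.
double demo_multiply() {
    Vec2 a = { 1.0, 2.0 };
    Vec2 b = { 3.0, 4.0 };
    int n = 2 * 3;            // built-in integer multiply
    Vec2 c = 2.0 * a;         // user-defined scaling: (2, 4)
    double dot = c * b;       // user-defined dot product: 2*3 + 4*4
    assert(n == 6);
    return dot;
}
```

The compiler picks the version of * from the operand types alone, which is what lets a matrix type reuse the familiar notation.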
Classes
Let's assume we wanted to define an array structure, matrix, whose size is not defined at compile time and for which space will be dynamically allocated at run time. In C, one might write
typedef struct matrix
{
    int rows, cols;
    int *m; /* pointer to matrix data */
} matrix;
matrix a;
and then write an indexing function elem(i,j) which refers to the (i,j)th element of the matrix a by
*(a.m + i*a.cols + j).
(No factor for the element size is needed; pointer arithmetic on an int* already counts in units of int.) Of course, it would be much easier to simply define a two-dimensional array and write a[i][j], but let's stay with this definition for a while; unlike the usual array definition, this matrix is resizeable and space is allocated dynamically at run time. We would have to find a block of memory to hold the matrix data and put a pointer to it in m.
When we access an array, we are often not interested in its actual dimensions, as long as the indices are not out of range. In C++ we can define the matrix type in such a way that only certain functions have access to information private to the array (such as its dimensions), and all access to the array's data is done through these access functions, called methods. Data structures that may carry private information are called classes in C++. (According to the manual, classes are user-defined types - the most general definition that one might imagine!) A class is just like a struct in which some of the fields cannot be seen from the outside, and in which the interface to these private fields is defined through method declarations. The C++ class definition for the matrix type would look very similar to a struct declaration, with some additions. The syntax of the class declaration is:
/* 1 */
class matrix
{
    int rows, cols;
    int *m; /* pointer to matrix data */
public:
    int rowsize() { return rows; }
    int colsize() { return cols; }
    void set_size(int,int);
    int& elem(int,int);
    matrix(int,int);
    ~matrix();
};
matrix a(10,20);
Those of you who have had some experience with NEON [let's make the point again that it is a shame that NEON has disappeared...] might recall that its class definition looked similar:
:class matrix <super object
2 <indexed
int rows
int cols
:M rowsize ;M
etc
;class
NEON, however, did not have the label public: for separating the private and public parts of the class declaration. In NEON, all variables were private and all the methods were public.
The C++ class declaration is similar to a C struct declaration, with the possibility to include functions and to hide parts of the declaration from the outside. The public functions in a class that constitute the interface to the outside world are called methods.
There are two principal ways to define a method. One can write the method code inside the class declaration (as for rowsize and colsize in the example above), or one can just declare the method and write the method code later, as for set_size or elem. elem returns the reference to an integer that is the (i,j)th element of the matrix and might be defined as follows:
/* 3 */
int& matrix::elem(int i, int j) { return m[i*cols + j]; }
There is a fundamental difference between methods defined inside and outside of a class declaration. The methods defined outside will be called through a subroutine call, while methods defined inside are candidates for inline expansion by the compiler. Writing a.rowsize() will then not generate a JSR to the function code, but code that directly references the hidden field a.rows. However, any method that is defined outside a class declaration may also be defined as an inline method by prefixing it with the keyword inline:
/* 4 */
inline int& matrix::elem(int i, int j) { return m[i*cols + j]; }
There are two more special methods in the class declaration which carry the name of the class, or respectively the class name prefixed with a tilde (~). These are the so-called constructor and destructor methods; they are called when a new object is declared (as in matrix a(10,20);) or deleted (when one leaves the block that the object was declared in). Constructors and destructors are important when heap space has to be allocated for an object (our matrix will need it) and deallocated when the object is no longer defined.
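To make the role of the constructor and destructor concrete, here is a minimal sketch of how the matrix class might be completed. It is one possible implementation under stated assumptions, not the book's: set_size is left out, copying is disabled to keep the example short, assert stands in for real error handling, and demo_matrix is a hypothetical helper.

```cpp
#include <cassert>

class matrix {
    int rows, cols;
    int* m;                               // pointer to matrix data on the heap
public:
    matrix(int r, int c);                 // constructor: allocates the data
    ~matrix();                            // destructor: releases it
    int rowsize() { return rows; }
    int colsize() { return cols; }
    int& elem(int i, int j);
private:
    // Copying is disabled in this sketch so that two objects never
    // share (and later free twice) the same block of memory.
    matrix(const matrix&);
    matrix& operator=(const matrix&);
};

matrix::matrix(int r, int c) {
    assert(r > 0 && c > 0);
    rows = r;
    cols = c;
    m = new int[rows * cols];
    for (int k = 0; k < rows * cols; k++) m[k] = 0;
}

matrix::~matrix() {
    delete[] m;
}

int& matrix::elem(int i, int j) {
    assert(i >= 0 && i < rows && j >= 0 && j < cols);
    return m[i * cols + j];
}

// Hypothetical helper: fills a matrix and reads one element back.
int demo_matrix() {
    matrix a(10, 20);                     // constructor runs here
    for (int i = 0; i < a.rowsize(); i++)
        for (int j = 0; j < a.colsize(); j++)
            a.elem(i, j) = i * 100 + j;
    return a.elem(3, 7);                  // destructor runs when a goes out of scope
}
```

Because elem returns a reference, the same call serves for both reading and writing an element.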
Operators
Our dynamically sized matrix might be defined in a slightly different way which allows the elements to be accessed in the usual way, writing a[i][j] instead of a.elem(i,j). One first defines a one-dimensional array class (as in Stroustrup's book):
/* 5 */
class vector
{
    int* v;
    int sz;
public:
    vector(int); ~vector();
    int size () { return sz; }
    void set_size(int);
    int& operator[](int);
    int& elem(int i) { return v[i]; }
};
and then builds the two-dimensional class on top of it:
/* 6 */
class matrix : vector
{
    vector** mv; /* array of pointers to the row vectors */
    int rows, cols;
public:
    matrix(int,int); ~matrix();
    int rowsize () { return rows; }
    int colsize () { return cols; }
    void set_size(int,int);
    vector& operator[](int);
    int& elem(int i, int j) { return (*mv[i])[j]; }
};
(I hope this is approximately correct; I'm still waiting for the C++ system to try this out, and getting ready for your embarrassing remarks.) In the program, one would declare matrix a(10,20) and access the (i,j)th element by writing a[i][j]. Here a[i] returns a reference to the ith row vector, and the second [] indexes into that row. The array indexing operator [] has been redeclared in the class declaration, and can now support checking of index bounds, if we so wish.
The actual implementation of the operators has of course to be done separately. We would write
/* 6 */
int& vector::operator[](int i) { /* body of code */ }
and
/* 7 */
vector& matrix::operator[](int i) { /* body of code */ }
to implement the new definitions.
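The operator bodies left open here might be filled in roughly as follows. This is a sketch under assumptions rather than the book's code: matrix simply owns an array of pointers to row vectors instead of deriving from vector, set_size is omitted, copying is disabled, assert stands in for a proper error routine, and demo_index is a hypothetical helper.

```cpp
#include <cassert>

class vector {
    int* v;
    int sz;
    vector(const vector&);                // copying disabled in this sketch
    vector& operator=(const vector&);
public:
    vector(int n) : v(new int[n]), sz(n) {
        for (int k = 0; k < sz; k++) v[k] = 0;
    }
    ~vector() { delete[] v; }
    int size() { return sz; }
    int& operator[](int i);
};

// Bounds-checked element access for one row.
int& vector::operator[](int i) {
    assert(i >= 0 && i < sz);
    return v[i];
}

class matrix {
    vector** mv;                          // one heap-allocated vector per row
    int rows, cols;
    matrix(const matrix&);                // copying disabled in this sketch
    matrix& operator=(const matrix&);
public:
    matrix(int r, int c);
    ~matrix();
    int rowsize() { return rows; }
    int colsize() { return cols; }
    vector& operator[](int i);
};

matrix::matrix(int r, int c) : mv(new vector*[r]), rows(r), cols(c) {
    for (int i = 0; i < rows; i++) mv[i] = new vector(cols);
}

matrix::~matrix() {
    for (int i = 0; i < rows; i++) delete mv[i];
    delete[] mv;
}

// Returns row i by reference, so that a[i][j] chains into vector::operator[].
vector& matrix::operator[](int i) {
    assert(i >= 0 && i < rows);
    return *mv[i];
}

// Hypothetical helper: writes and reads elements through a[i][j].
int demo_index() {
    matrix a(10, 20);
    a[2][5] = 42;
    a[9][19] = a[2][5] + 1;
    return a[9][19];
}
```

Both index checks happen on every access; an out-of-range row is caught by matrix::operator[] and an out-of-range column by vector::operator[].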
The matrix multiplication operator may now be defined easily. We write
/* 8 */
matrix operator*(matrix& a, matrix& b)
{
    /* inner dimensions must agree: columns of a = rows of b */
    if (a.colsize() != b.rowsize()) error("index mismatch");
    matrix c(a.rowsize(), b.colsize());
    for (int i=0 ; i<a.rowsize() ; i++)
        for (int j=0 ; j<b.colsize() ; j++)
        {
            int sum = 0;
            for (int k=0 ; k<a.colsize() ; k++)
                sum = sum + a[i][k]*b[k][j];
            c[i][j] = sum;
        }
    return c;
}
Again, I hope this would work in an actual example. It is not the most efficient way to program the matrix multiplication; a better way to do it would be using friend definitions. This concept is explained in Stroustrup's book, and I'm going to come back to it in the next column, where I can supply some examples.
The expression a*b, where a and b are of type matrix, would return another object of class matrix, which contains the product of a and b. To make sense of the expression c = ^a*b, we would also have to define a transpose operator and the assignment operator, =. Note that C++ only lets us overload its existing operators with their existing number of operands, and ^ is a binary operator (exclusive or), so it cannot serve as a prefix transpose; a unary operator such as ~, or an ordinary function transpose(a), would have to stand in for it. I won't write these definitions down here; you might try to work them out, or better, test them if you have a C++ system available.
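For readers who want a starting point, one possible set of definitions is sketched below. It rests on several assumptions: the matrix stores its data in a single flat block, elements are reached through elem rather than [], and the transpose is spelled as unary ~ because C++ only allows ^ to be overloaded as a binary operator. The helper demo_transpose_product is hypothetical.

```cpp
#include <cassert>

class matrix {
    int rows, cols;
    int* m;
public:
    matrix(int r, int c) : rows(r), cols(c), m(new int[r * c]) {
        for (int k = 0; k < rows * cols; k++) m[k] = 0;
    }
    // Copy constructor: needed because the operators return matrices by value.
    matrix(const matrix& o) : rows(o.rows), cols(o.cols), m(new int[o.rows * o.cols]) {
        for (int k = 0; k < rows * cols; k++) m[k] = o.m[k];
    }
    ~matrix() { delete[] m; }
    matrix& operator=(const matrix& o);
    int rowsize() const { return rows; }
    int colsize() const { return cols; }
    int& elem(int i, int j) { return m[i * cols + j]; }
    int elem(int i, int j) const { return m[i * cols + j]; }
};

// Assignment: build the copy first, then release the old data.
matrix& matrix::operator=(const matrix& o) {
    if (this != &o) {
        int* fresh = new int[o.rows * o.cols];
        for (int k = 0; k < o.rows * o.cols; k++) fresh[k] = o.m[k];
        delete[] m;
        m = fresh;
        rows = o.rows;
        cols = o.cols;
    }
    return *this;
}

// Transpose written as unary ~ (an assumption of this sketch).
matrix operator~(const matrix& a) {
    matrix t(a.colsize(), a.rowsize());
    for (int i = 0; i < a.rowsize(); i++)
        for (int j = 0; j < a.colsize(); j++)
            t.elem(j, i) = a.elem(i, j);
    return t;
}

matrix operator*(const matrix& a, const matrix& b) {
    assert(a.colsize() == b.rowsize());
    matrix c(a.rowsize(), b.colsize());
    for (int i = 0; i < a.rowsize(); i++)
        for (int j = 0; j < b.colsize(); j++) {
            int sum = 0;
            for (int k = 0; k < a.colsize(); k++)
                sum += a.elem(i, k) * b.elem(k, j);
            c.elem(i, j) = sum;
        }
    return c;
}

// Hypothetical check: a = ~b * c with b 2x3 and c 2x2 gives a 3x2 result.
int demo_transpose_product() {
    matrix b(2, 3), c(2, 2), a(1, 1);
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 3; j++) b.elem(i, j) = i * 3 + j + 1;  // 1..6
    c.elem(0, 0) = 1; c.elem(0, 1) = 2;
    c.elem(1, 0) = 3; c.elem(1, 1) = 4;
    a = ~b * c;                           // unary ~ binds tighter than *
    assert(a.rowsize() == 3 && a.colsize() == 2);
    return a.elem(2, 1);
}
```

The assignment operator resizes its left-hand side, so a may be declared with any dimensions before the product is stored in it.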
Operator redefinition is one of the most important concepts of C++, since it makes the code much more readable. The redefinition of an existing operator (like +, *, etc.) is called operator overloading; when such a redefined operator is used, the compiler will automatically search the existing definitions to find one that works on the data types provided. Thus, even if one redefined * for matrices, integer and real multiplications would still work as before. I have not found out yet whether dynamic binding is possible for operators by declaring them virtual (see below), but I'm sure I'll soon be able to test that.
Class Hierarchies
We have seen the syntax of a class definition which was derived from another class, class matrix : vector { }. If we define a derived class this way, none of the methods in the superclass will be accessible through an object of the subclass; all subclass methods have to be explicitly defined in the subclass declaration. If we write, on the other hand, class matrix : public vector { }, any method from class vector that is not redefined in class matrix is usable on objects of class matrix as well. This is the way we very often wish objects to behave; methods that are redefined in a subclass should override the superclass definition, but if an object does not know about a method it should look for a definition higher up in the hierarchy.
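The difference between the two forms of derivation can be seen in a small sketch. All class names here (base, hidden_base, open_base) and the helper demo_derivation are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

class base {
    int n;
public:
    base(int k) : n(k) {}
    int size() { return n; }
};

// Private derivation (the default for 'class'): size() is hidden from users
// of hidden_base unless the derived class re-exports it explicitly.
class hidden_base : base {
public:
    hidden_base(int k) : base(k) {}
    int length() { return size(); }       // members may still call size()
};

// Public derivation: size() stays part of the derived class's interface.
class open_base : public base {
public:
    open_base(int k) : base(k) {}
};

// Hypothetical helper: uses each class through its visible interface.
int demo_derivation() {
    hidden_base h(3);
    open_base o(4);
    // h.size();                          // would not compile: size() is private here
    assert(o.size() == 4);
    return h.length() + o.size();
}
```

With private derivation the superclass becomes an implementation detail; with public derivation the subclass can be used wherever the superclass interface is expected.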
In a class hierarchy we should therefore be able to apply a method to an arbitrary object whose exact type is not known at compile time. If the object's type is known at compile time, the compiler will simply generate a JSR to the appropriate method code, passing arguments as required. This is known as early binding in object-oriented jargon. If the type is not known, we must check at run time what type of object is given the method call, and see whether the method is defined in the object's class declaration or somewhere higher up in the hierarchy. This is called late binding. In languages that look up methods entirely at run time, an error message is generated if the method can't be found for a particular object; C++ instead checks at compile time that the method is declared for the class through which the object is accessed.
Late binding is important if we have a list of objects to which the same method should be applied, for instance a list of shapes - rectangles, circles, polygons - to be drawn on a screen. If pointers to the shapes are kept in an array shapelist, we could then simply write
/* 9 */
for (i=0;i<N;i++) shapelist[i]->Draw();
to draw all the objects. This is very similar to Object Pascal, where we would write analogously
{10}
for i :=1 to N do shapelist[i].draw;
However, in Object Pascal late binding is always used when early binding can't be applied. In C++, we have to tell the compiler that a method could be used for late binding by declaring it virtual:
/* 11 */
class TShape {
    TShape *Next, *Prev;
    Rect boundRect;
    RgnHandle ShapeRgn;
public:
    virtual void Create(Rect *theRect);
    virtual void Track(Rect *oldRect, Rect *newRect);
    virtual void Draw();
    virtual void Erase();
    virtual void Free();
};
This is the generic definition of a shape for which methods for drawing, erasing, etc. exist, but may or may not be defined in the top class; they may be overridden in the descendant classes, and the actual binding may be known only at run time. The figure illustrates the definition of a class hierarchy of shapes in C++ and in Object Pascal.
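A stripped-down version of such a hierarchy shows late binding at work. To keep the sketch compilable without the Macintosh Toolbox, the Rect and RgnHandle fields are left out and drawing is replaced by a hypothetical method Sides; the derived class names and demo_late_binding are likewise assumptions made for this example.

```cpp
#include <cassert>

// Stand-in base class; the Toolbox types from the column (Rect, RgnHandle)
// are left out so the sketch compiles anywhere.
class TShape {
public:
    virtual ~TShape() {}
    virtual int Sides() { return 0; }     // hypothetical method, overridden below
};

class TRectangle : public TShape {
public:
    virtual int Sides() { return 4; }
};

class TTriangle : public TShape {
public:
    virtual int Sides() { return 3; }
};

class TCircle : public TShape {
    // does not override Sides(): inherits TShape::Sides()
};

// Hypothetical helper: walks a heterogeneous list through base-class pointers.
int demo_late_binding() {
    TRectangle r;
    TTriangle t;
    TCircle c;
    TShape* shapelist[3] = { &r, &t, &c };
    int total = 0;
    for (int i = 0; i < 3; i++)
        total += shapelist[i]->Sides();   // resolved at run time
    assert(shapelist[2]->Sides() == 0);   // TCircle falls back to TShape
    return total;
}
```

The list holds pointers rather than objects; storing the shapes by value in an array of TShape would copy only the base part of each object and lose the overriding methods.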
This more or less concludes my quick overview of the main characteristics of C++ (of course, all the features of C are still present in the language). Don't laugh at the mistakes that are probably still in the examples; this happens when one writes programs without a compiler. There are many details I haven't gone into here; we'll get to know them in the following columns, with corresponding examples. Forth friends, don't despair; you'll get your share soon again, too.