Nov 01 Database Design
Volume Number: 17 (2001)
Issue Number: 11
Column Tag: Database Programming
An Introduction to Relational Database Design and SQL
by Paul Shields
Introduction
Databases have become a fundamental requirement of managing almost any business. Everyone needs to store some type of data and a well designed and managed database can make tracking and manipulating this data trivial. Building simple databases is easy, especially with the GUI tools provided by most database vendors. As our databases get more complex however, we need to develop a better understanding of database design principles. Poor database design can lead to the storage of large amount of duplicate data, data integrity problems caused by the accidental deletion of critical information, and difficulties in inserting new data into the database.
Once we have a complete database design, we need tools to manage and manipulate the data and database structure. SQL (Structured Query Language) is just such a language. The advantage of SQL is that it is designed to be platform and database independent, allowing you to easily move data between different vendor products. SQL is designed from the ground up to be optimized for manipulating relational databases, thus much of its power comes from being combined with a truly relational database. Not all relational databases include SQL support though, so don't assume that its absence is an indication of a databases internal capabilities.
Relational Databases
To better understand database designs and eventually SQL command structure, we should first consider the principles of relational database design. There is a lot of disagreement over all of the exact definition and feature set of a relational database. Here we define some of the most basic requirements and features that are common to most relational databases.
In over-simplified terms, a relational database is a database where all data visible to the user is organized strictly as tables of data values and all database operations work on those tables. This definition serves as an introduction to several terms common in the database world.
Column/Entity
A column is a collection of values with a common data type and definition. The range of values associated with a single column is known as the domain. There are two components to the domain, the physical definition and logical constraints. The physical definition states the generic type of data such as an integer, or string. The logical constraints are arbitrary constraints imposed by the database designer such as a valid integer between 1 and 100.
If we were to create a table to store the state component of a person's address, the domain would be a string that matches any of the 50 states.
Tables
A table is a collection of columns brought together based on some common grouping. A table can contain one or more columns. Each table in a database is assigned a unique name so that it can be referenced using SQL or other scripting tools. The rules associated with a table are as follows:
- Each row defines a single record in the table
- Each column represents one item of data stored about each row/record
- Each column in a table is assigned a unique name
- There is no standard limit on number of columns per table, but the norm is 255
- Rows are in no particular order and there are no constraints on the number of rows in a table
- Listing 1 shows a simple table that has two columns, FirstName and LastName.
Listing 1: Simple Table Definition
FirstName Lastname
Mary Jones
John Smith
Larry Stone
Primary key
Since the rows have no specific order, you cannot refer to them by location (first, last, third, etc.) This creates a problem if we want to reference a specific row of the table. In a well-designed database, every table has a column or columns marked as the primary key. The primary key is a special designation that ensures that no two rows have the same set of values. We can use the value stored in the primary key to find and extract the data from a specific row of the table
Primary keys come in many forms. When working with a table that describes people (such as an employee or customer table), we most often assign a unique employee or customer number. Our table would now look like the one in Listing 2:
Listing 2: Simple Table with a Primary Key
EmployeeID FirstName Lastname
12 Mary Jones
2 John Smith
34 Larry Stone
Relationships
In a relational database there is no ability to define explicit pointers, creating a parent/child relationship. Instead, relationships are defined by the sharing of common data. The data values in one column of Table 1 are used as the domain for the values in a column in Table 2. The value in Table 2 can now be used to reference information from Table 1.
Looking at the tables in listing 3 we can see that in the students table we store a list of students and the activities in which they participate. We can use the value in the Activity field of the student table to look up information on the activity, such as cost, in the Activities table. The data is still stored in two locations, we have simply defined a relation between the tables that says the Activity column in the students table relates to the Activity column in the Activities table.
Listing 3: Two tables with a defined relationship
Here we have two tables; one holds student information and the other information on activities at school.
Table 1: Students
StudentID
FirstName
LastName
Activity
Table 2: Activities
Activity
Cost
Location
Foreign key
The definition of a foreign key is a combination of relations and primary keys. If the values in a column from one table match the values of a primary key column in another table, the first is called a foreign key reference. If we were to modify the table definition in listing 3 to note that the Activity field of the Activities table is a primary key, then the reference from the Students table would be known as a foreign key reference.
A foreign key reference creates a parent/child relationship between tables. A table can have more than one foreign key to define relationships with multiple tables. Foreign keys are used to reduce the amount of data stored in a database. We can now use the values stored in one column to look up information in another table. This reduces duplication of data across tables. Duplication can lead to issues in database management, most significant of which is data integrity, the process of ensuring that the data is valid. These issues are discussed further in the section on data normalization.
Codd's twelve rules
In a 1985 article in the magazine ComputerWorld, Ted Codd presented 12 rules that a database must obey if it is to be considered truly relational. These rules have become the basis for most definitions of relational databases. Few databases comply with every rule in the list, but they represent a good starting point for understanding the core features of a relational database.
- The Information Rule: All information in a relational database is represented explicitly at the logical level in exactly one way—by values in tables.
- Guaranteed Access Rule: Each and every datum (atomic value) in a relational database is guaranteed to be logically accessible by resorting to a table name, primary key value, and column name.
- Systematic Treatment of Null Values: Null values (distinct from empty character string or a string of blank characters and distinct from zero or any other number) are supported in the fully relational DBMS for representing missing information in a systematic way, independent of data type.
- Dynamic On-line Catalog Based on the Relational Model: The database description is represented at the logical level in the same way as ordinary data, so authorized users can apply the same relational language to its interrogation as they apply to regular data.
- Comprehensive Data Sublanguage: A relational system may support several languages and various modes of terminal use (for example, the fill-in-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and whose ability to support all of the following is comprehensible: data definition, view definition, data manipulation (interactive and by program), integrity constraints, and transaction boundaries (begin, commit, and rollback).
- View Updating Rule: All views that are theoretically updateable are also updateable by the system.
- High-level Insert, Update, and Delete: The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update, and deletion of data.
- Physical Data Independence: Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representation or access methods.
- Logical Data Independence: Application programs and terminal activities remain logically unimpaired when information preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
- Integrity Independence: Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.
- A minimum of the following two integrity constraints must be supported:
- Entity integrity: No component of a primary key is allowed to have a null value.
- Referential integrity: For each distinct non-null foreign key value in a relational database, there must exist a matching primary key value from the same domain.
- Distribution Independence: A relational DBMS has distribution independence. Distribution independence implies that users should not have to be aware of whether a database is distributed.
- Non-subversion Rule: If a relational system has a low-level (single-record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules or constraints expressed in the higher-level (multiple-records-at-a-time) relational language.
There is an implied rule in this definition. "For any system that is claimed to be a relational database management system, that system must be able to manage data entirely through its relational capabilities." No currently shipping commercial database claims to meet all 12 rules, yet we can safely assume they are relational and can satisfy our needs.
Relations and Normalization
The discussion of relations and foreign keys leads directly to the next major topic in database design, normalization. Normalization is the process of eliminating duplicates, validating data integrity, and defining relationships between tables. There are several levels to database normalization; each one designed to solve a different set of problems. It is absolutely critical that you normalize your database design before creating any tables as it is very difficult to change a database structure once filled with data.
When designing a database, the first pass design is most likely a collection of columns/entities based on existing forms or processes. This leaves us with a collection of tables and columns whose relationships may be poorly defined. Poorly defined relationships can lead to modification anomalies and data integrity issues. The process of normalization is designed to eliminate such anomalies.
For an example of a relationship with modification anomalies, refer to Listing 4. Here we define a student-activity table designed to track student participation in various school activities along with the fees associated with the activity. The problem with this table is that when we insert data into the table we must update two different relations, the student-activity relation and the activity-cost relation. If we delete a student, we not only delete the activity associated with the student, but potentially the information on how much the activity cost. This is known as a deletion anomaly.
Listing 4: Student activity information
A table to track student/activity participation, along with the fee associated with each activity.
Primary key: (StudentID, Activity)
Table 1: Student-Activity
StudentID Activity Fee
Jogging $100
Theatre $150
300 Skiing $300
Suppose we want to add in the cost of a new activity such as canoeing that costs $75. We cannot add this information unless we also have a student who wants to participate in the activity. This is called an insertion anomaly.
Normalization is the process of redefining these relationships to eliminate the modification anomalies. There are several levels of normalization that we will cover in detail, along with a few derivative techniques based on the mathematics of relations (well beyond the scope of this article). Each level of normalization eliminates a specific problem. One final issue to consider before diving into normalization is that every time we break a relationship into multiple components, we may have to introduce relational integrity constraints. Such constraints may impose limitations on our ability to insert data into a specific table. These restraints can be part of the underlying data definition or managed at the application level.
Another term to be familiar with is functional dependency. Functional dependency is a relationship between or among entities (columns) of a database. If we have two entities, x and y, y is functionally dependent on x, if the value of x determines the value of y. The common notation for a functional dependency is:
x->y
In this case, x is known as the determinant, because x determines the value of y. Functional dependencies can exist between group of entities. For example,
(x,y)->z
z->(y,z)
In the first example, we cannot subdivide the functional dependency because neither x nor y can determine the value of z on their own. In the latter example we can subdivide the functional dependency into x->y and x->z since x determines the value of both y and z.
First Normal Form (1NF)
Any table of data that meets the definition of a relation is considered to be in 1NF. Remember the definition of a relation is:
- All columns must be single value fields
- All entries within a single column must be of the same data type
- No two rows may be identical
This definition covers just about any grouping of data columns that includes a primary key to ensure uniqueness of the rows.
Second Normal Form (2NF)
A relation is in second normal form if all non-key entities are dependent on all key entities. If a table has only a single column as the primary key, it is by definition already in 2NF. This is because all of the other columns are dependent upon the single key column. For those tables with composite keys, we need to ensure that there are no partial dependencies.
Consider the table in listing 4. The column that contains the fee information is dependent upon the composite key of StudentID and Activity. The reality though is that the fee is only partially dependent on the key. The activity is what determines the fee each student pays. There is no dependency between the fee and the StudentID. This is called a partial dependency and 2NF eliminates such anomalies.
The easiest way to solve a problem like this is to create a second relation. In this case, we create a second table to store the Activity-Fee information (listing 5).
Listing 5: Student activity information as two relations
Table 1: Student-Activity
StudentID(key) Activity
100 Jogging
200 Theatre
300 Skiing
Table 2: Activity Fee
Activity(key) Fee
Jogging $100
Theatre $150
Skiing $300
Third Normal Form (3NF)
To understand 3NF, consider the example in listing 6, a table for storing student housing information. In this relation we have a functional dependency of StudentID->Building because a student can only live in one place at a time. We also have a functional dependency of Building->Cost since each dorm room has a fixed cost based on the building. This creates a transitive dependency of StudentID->Building->Cost, whereby StudentID determine the value in the Cost column.
Listing 6: Student housing information
A table containing student housing information.
Primary key: StudentID
Table 1: Student-Housing
StudentID Building Cost
100 Crockett Hall $2100
200 Davis Hall $2150
300 Hall Hall $1300
The table in listing 6 is in 2NF, but still has problems. If we were to delete an entry, we not only delete the information for that student, but information on how much it cost to live in a particular dorm. For example if we delete the first row, we lose the information on how much Crockett Hall cost because there is no other student in the table living in that dorm. This is a case of the deletion anomaly mentioned earlier.
A relation is in third normal form if it is in second normal form and has no transitive dependencies. Again, we can address this issue by breaking the relation into two relations as shown in listing 7.
Listing 7: Student activity information as two relations
Table 1: Student-Building
StudentID(key) Building
100 Crockett Hall
200 Davis Hall
300 Hall Hall
Table 2: Building-Cost
Building(key) Cost
Crockett Hall $2100
Davis Hall $2150
Hall Hall $1300
Boyce-Codd Normal Form (BCNF)
While 3NF eliminate most anomalies, it does address them all. Consider the example in listing 8. Assume that a student can have more than one major, a major can have one or more advisors, and that an advisor can advise on only one major. StudentID cannot be the sole key because a student can have multiple majors and thus multiple advisors.
We can use a combination of either StudentID and Major to determine the advisor or a combination of StudentID and Advisor to determine the major. Thus, we have two sets of candidate keys, one of which we will select as the primary key. Since the advisor name aslo determines the major (a functional dependency) it is known as a determinant. While this table meets all the requirements of 1NF, 2NF, and 3NF it still has a deletion anomaly. If we delete a student, we risk losing information on which advisors cover specific majors. Again, we can solve this problem by breaking the relation into two, separating out the Student-Major information from the Major-Advisor information.
Listing 8: Student advisor information
A table that defines a relation between students, majors, and advisors.
Primary key: (StudentID, Major)
Candidate Key: (StudentID, Advisor)
Table 1: Advisors
StudentID Major Advisor
100 Chemistry Smith
200 Art Jones
300 History Johnson
Like in the other cases, we can solve this problem by breaking the relation into two components. We could create one table to hold the student-advisor relation and another to hold the advisor-major relation.
Fourth normal form (4NF)
4NF addresses issues associated with tracking two one-to-many relations within a single table as shown in listing 9. The table defines a relation that stores StudentID, major, and activity. A student can have one or more major and participate in one or more activities. Here, we store duplicate information on student 100 because he participates in two activities but has only one major. Again, the solution is to break the relation into two as show in listing 10.
Listing 9: Student information
A table to store student major and activity information.
Primary key: StudentID
Table 1: Students
StudentID Major Activity
100 Chemistry Skiing
100 Chemistry Theatre
200 Art Swimming
300 History Bowling
Listing 10: Student information
A table to store student major and activity information.
Table 1: Majors
Primary key: StudentID
StudentID Major
100 Chemistry
200 Art
300 History
Table 1: Majors
Primary key: StudentID
StudentID Activity
100 Skiing
100 Theatre
200 Swimming
300 Bowling
Once in 4NF, most database designers can feel comfortable in stopping, knowing that they have resolved all deletion and insertion anomalies. Additional normal form definition exist, but most are based on mathematical theories and do little to address additional known anomalies. The most prevalent of these is Domain Key Normal Form (DKNF), proposed by R. Fagin in 1981. By definition any relation in DKNF has no anomalies and a relation with no anomalies is in DKNF.
SQL, the database programming language
Now that we have a fundamental understanding of relational databases, let's take a look at SQL (Structured Query Language), the standard language used for building and accessing relational databases. While there are alternatives to SQL and implementing SQL is not a core requirement of a relational database, it is a very common feature.
SQL is a tool for organizing, managing, and retrieving data from a computer database. The SQL language is essentially a programming language for relational databases. SQL is independent of the underlying database structure.
The flow of information between the user and the database using SQL is similar to that illustrated in Figure 1.
Figure 1: The flow of commands and data between the user and the database.
The first concept to understand is the role of the database management system (DBMS). The DBMS handles the management of the underlying database, handling all communications between the user and the database. Requests for data are sent to the DBMS (as are all database management calls) where they are processed and turned into actual read and writes to the database engine. This makes the DBMS moderately independent of the underlying database engine and often they are run on separate machines for performance reasons.
The process for processing an SQL command is:
- DBMS processes SQL request
- DBMS retrieves data from database by sending a request to the database engine
- DBMS returns data to user
- SQL can control all DBMS functions including:
- Data definitions
- Data retrieval
- Data manipulation
- Access control
- Data sharing
- Data integrity
How SQL manages to accomplish this is based on two principles. First, the SQL command language is broad in scope. Second, SQL draws much of its power from the underlying definition of relational databases.
Remember that one of the fundamental rules of a relational database is the Information rule. The Information rule stipulates that all information regarding the structure of the database must be represented as values in tables. This means that just like the regular information in a database, we can query and change the values associated with the database structure. SQL is indifferent to the fact that we are changing the values in the tables that control the structure of the database or those that represent end-user data. This allows us to use the same commands for database management as we use to query end-user data.
SQL is not, however, a complete programming language. First, it lacks conditional tests (IF) and flow control (GOTO, DO, and FOR) statements. Some database vendors may offer extensions to the SQL language to accomplish these functions, but they are not part of the SQL92 standard. SQL can however be integrated into other programming languages. Finally, despite the name, SQL lacks the explicit structure of languages like C or Pascal
The primary advantages of SQL are:
- Vendor independence/Portability - For the most part, SQL allows the user to change the underlying brand of database and DBMS without having to rewrite the SQL code. (Variations in the standard and proprietary extensions can complicate the issue)
- High-level, English like structure - This makes SQL relatively easy to use compared to other programming languages.
- Standardized - The standard is established by ANSI, with most vendors now supporting the more recent SQL 92 version.
- Dynamic data definition - SQL offers advanced data processing commands that allow for embedded and multi-level database queries.
Installing mysql
Apple provides a version of MySQL with OS X Server (available through the Software Update utility), but users running OS X client are on their own. Before jumping into deploying a whole database server, many users prefer to install a development version on a desktop machine. There is nothing inherent in MySQL or OS X client that keep the two from working together.
MySQL version 3.23.40 is the latest version of MySQL commonly available for OS X. There are a number of different installers available on the web, some are just a basic install of MySQL, while others include GUI management tools. For now, we will stick with the out-of-the-box, non-GUI installs, saving the GUI tools for a later date. This keeps the installation simple, but does require that you use the Mac OS X command-line for administration.
Before installing MySQL on an OS X client machine, I highly recommend first enabling the ‘root' user. The MySQL installer that creates the GRANT tables (the GRANT tables hold all the security information), seems ot detect whether or not the root user is enabled. If the system root user is not enabled, the MySQL ‘root' seems to be established in a state that prcents you from using the account properly. Without a fully enabled ‘root' user, you cannot administer the database. To enable the root user under OS X follow these steps:
- Launch NetInfo Manager from the Applications->Utilities folder
- Select Domain->Security->Authenticate
- Enter your administrator password
- Select Domain->Security->Enable Root User
- You will be prompted to enter a root password
- Quit NetInfo Manager
One of the best installers is put together by Marc Liyanage and is available at http://www.entropy.ch/software/macosx/. This is a basic port of the MySQL code, packaged in an easy to use Mac OS X installer.
The first step is to create a user account for the MySQL install. This account is used so that the MySQL server application is run independent of any active user or the root account, thus making it more secure. To create the account:
- Open System Preferences -> Users
- Create a new user
Name: MySQL user
Short Name: mysql
Password: whatever you want but don't leave it blank!
You are now completely ready to install MySQL.
- Run the Installer package (all items are installed in /usr/local/bin/)
- Open a terminal window
- Type "sudo mysql_install_db", enter administrator password when asked
- Type "sudo chown -R mysql /usr/local/var/"
- Type "sudo safe_mysqld —user=mysql &"
- Type "mysql test" to load the console application and begin working on the test database
At this point you have MySQL installed and running. If you restart your machine, you will need to rerun the command from step 5. Alternatively, you can download an alternate installer that automatically configures MySQL to run at startup.
The default database is called ‘test' and is completely unsecured. Security of the internal database structure is separate from the OS-level security. There are a few things that you should do if your database is hosted on a server with public Internet access.
The mysql_install_db script initializes the grant tables to contain the following set of privileges:
- The MySQL root user is created as a superuser who can do anything. Connections must be made from the local host. The initial root password is empty, so anyone can connect as root without a password and be granted all privileges if they have access to the machine.
- An anonymous user is created that can do anything with databases that have a name of ‘test' or starting with ‘test_'. Connections must be made from the local host. This means any local user can connect without a password and be treated as the anonymous user.
Other privileges are denied.
Because your installation is initially wide open, one of the first things you should do is specify a password for the MySQL root user. You can do this using:
[localhost:/usr/local/bin] pshields% mysql -u root mysql
mysql> UPDATE user SET Password=PASSWORD(‘new_password') WHERE user='root';
mysql> FLUSH PRIVILEGES;
You can also use the SET PASSWORD statement:
[localhost:/usr/local/bin] pshields% mysql -u root mysql
mysql> SET PASSWORD FOR root=PASSWORD(‘new_password');
Only users with write/update access to the mysql database can change the password for others users. All normal users (not anonymous ones) can only change their own password.
If you update the password in the user table directly using the first method, you must tell the server to re-read the grant tables (with FLUSH PRIVILEGES).
You may wish to leave the root password blank so that you don't need to specify it while you perform additional setup or testing. However, be sure to set it before using your installation for any real production work.
If you run into problems, you can recreate the GRANT tables from scratch. To re-create the grant tables, remove all the `.frm', `.MYI', and `.MYD' files in the directory ‘/usr/local/var/mysql'. Then run the mysql_install_db script referenced in the installation steps.
Refer to the MySQL documentation http://www.mysql.com/documentation/mysql/ for more detailed information on securing MySQL and creating users.
Creating a database in mysql
This section provides a very cursory look at creating tables and inserting data. Once you have gone through the process of building your tables and normalizing them on paper, you can use SQL to create them in the database. All these examples are built using the test database. The MySQL documentation can provide more assistance with creating a new database that is customized to meet your needs.
To create a simple table of user names:
mysql> create table users ( firstname varchar(15), lastname varchar(15));
Notice a few things about the structure of the command. First, every command must be terminated with a semicolon (;). This means that a command can span multiple lines of input. For example you might enter the previous command like this:
mysql> create table users (
->firstname varchar(15),
->lastname varchar(15) );
In this command we create a table with the name "users". Table and column names are case-sensitive. Table names must be unique while column names within a table are unique but can be re-used between tables. Finally, the datatype we used to store the name information is a varchar of length 15. This is equivalent to specifying an ASCII string of maximum length 15 characters. There are several datatypes in the SQL language including integer, float, char, varchar, double, date, and money among others.
Once the table is created, we can begin to insert data. To insert data into the users table, use the command:
mysql> insert into users (firstname, lastname) values (‘John', ‘Smith')
mysql> insert into users values (‘John', ‘Smith')
When inserting data, we specify the table name, along with a parenthetical list of columns for which data will be provided. For each column in the first part of the command, we must specify a value to store. Since our ‘users' tables uses varchars (strings), we need to put the values in quotes. The second version of the command has the column names omitted. We can do this when we will be providing data for every column in the table as part of the insert. The values for the insert must be listed in the order the columns were created originally if you don't specify the column names.
Finally, if you want to see the a listing of the column names in a table, you can use the command:
mysql> desc users;
+— — — -+— — — — — — -+— — — +— — -+— — — — -+— — — -+
| Field | Type | Null | Key | Default | Extra |
+— — — -+— — — — — — -+— — — +— — -+— — — — -+— — — -+
| first | varchar(15) | YES | | NULL | |
| last | varchar(15) | YES | | NULL | |
+— — — -+— — — — — — -+— — — +— — -+— — — — -+— — — -+
2 rows in set (0.03 sec)
Listing 11 demonstrates a more complex table creation command that includes defining a primary key and setting the properties of the SID column so that a user cannot insert a row without providing a value for SID. One idiosyncrasy of MySQL is that fields defined as ‘NOT NULL' will have a default value assigned unless otherwise specified. Thus the table defined in listing 11 will actually allow for one row entry without a SID specified, assigning the row a value of 0 for the SID, when a second row is inserted without specifying a SID, the user gets an error message indicating there is a duplicate entry. There is no easy way around this within MySQL because of the internal logic used to store the table definition. It is a minor issue though that you will need to be aware of and is an example of how different databases interpret the same SQL code and definitions.
Listing 11: Example Create with NOT NULL and PRIMARY KEY set
mysql> create table students (
-> SID integer not null,
-> firstname varchar(20),
-> lastname varchar(20),
-> primary key (SID));
Query OK, 0 rows affected (0.01 sec)
mysql> desc students;
+— — — — — -+— — — — — — -+— — — + — — +— — — — -+— — — -+
| Field | Type | Null | Key | Default | Extra |
+— — — — — -+— — — — — — -+— — — + — — +— — — — -+— — — -+
| SID | int(11) | | PRI | 0 | |
| firstname | varchar(20) | YES | | NULL | |
| lastname | varchar(20) | YES | | NULL | |
+— — — — — -+— — — — — — -+— — — + — — +— — — — -+— — — -+
3 rows in set (0.02 sec)
mysql> insert into students (firstname, lastname) values (‘john', ‘smith');
Query OK, 1 row affected (0.03 sec)
mysql> insert into students (firstname, lastname) values (‘john', ‘smith');
ERROR 1062: Duplicate entry ‘0' for key 1
Every time you run a command, MySQL provides a summary of how long the command took to execute. This is useful later on when building more complex queries. One thing you have to be careful with when writing SQL, is building queries that don't consumer significant amounts of CPU time. If you build such a query and make it available to many users, their simultaneous use of the command could bring your database to its knees.
Summary
Designing relational database can be a complex undertaking. The process of ensuring that your data is properly normalized while at times tedious is critical to the long-term success of your database implementation. The best practice is to proceed in small steps, making one change at a time, validating the change in structure and moving on to the next item.
Once the database design is done, start looking at how to implement the structure within the tools you selected. While most modern relational databases offer SQL as a primary programming tool, other offer proprietary languages and GUI tools both designed to offer enhanced functionality or ease of use.
MySQL is just one of many Relational Database Management Systems (RDBMS) for the Mac platform. It has the primary advantage of being free, but has some limitations when compared to the commercial offerings. Other products in the RDBMS field are 4D from ACI, FileMaker Pro, and FrontBase. Unfortunately, high-end systems like Oracle are not currently available for the Mac OS X platform, but you can use tools like WebObjects to connect to an Oracle database running on another platform.
Paul Shields is currently an Advisor and Project Manager for a major telecommunications firm in Dallas, TX. In his role Paul selects and implements the critical technologies that comprise the backbone of the computing infrastructure. Paul is also the Senior Editor for AppleLinks
http://www.applelinks.com/ and The Business Mac
http://www.thebusinessmac.com/. Feel free to forward any questions or comments to him at pshields@applelinks.com.