Mac in the Shell-Learning Python on the Mac: Modules
Volume Number: 25
Issue Number: 03
Column Tag: Mac in the Shell
Learning Python on the Mac: Modules
Organize code into reusable entities
by Edward Marczak
Introduction
Python has an incredible amount to offer, and its concept of modules is near the top of the list of its strengths. Modules bring a lot of functionality to the language, and let an author reuse code in as many different applications as they wish. Since the start of our journey down the path of Python, we've covered basic data types, flow control and functions. Modules will be one of the last very high-level concepts we cover for a bit, but they are an important one. Like most Mac in the Shell columns, this one, too, is best read with a Terminal.app window open and your favorite text editor up and running. So, clear your brain, and let's dive in.
What is a Module?
Modules, really, are just files. Like many things in Python, some very powerful features come with little fanfare, and they 'just work.' By placing functions, variables and code into a standalone file, you create an organized, self-contained, reusable unit with its own namespace. In the past two months, you've seen the import keyword, and just had to trust me that it did something useful.
In actuality, import reaches out into the file system, locates, reads and runs an external module. The imported file becomes an object in the importing script, and all definitions in the file become attributes of that object. For illustrative purposes, let's imagine this: top-level ("main") file a.py imports b.py. b.py also imports c.py and also imports some modules from the standard Python library of modules. This is shown in Figure 1.
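To make the "module becomes an object" idea concrete, here is a minimal sketch using the standard string module (my choice for illustration; any module would do). The print calls use parentheses so that the snippet also runs under later Python versions.

```python
import string  # any standard-library module would do here

# The imported file is now an ordinary object bound to the name "string".
print(type(string).__name__)             # module

# Definitions inside the file are attributes of that object.
print(string.digits)                     # 0123456789
print(hasattr(string, 'ascii_letters'))  # True

# Attributes can also be fetched by name at runtime, like on any object.
digits = getattr(string, 'digits')
```

Nothing here is specific to string; the same attribute access works on the modules we write ourselves later in this column.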

Figure 1 - Python high-level program architecture.
In essence, the introductory Mac in the Shell articles introducing Python were incredibly simplified; it's rare that a Python program won't have the need to import a module. While we'll be concentrating on importing modules written in Python, do note that Python can actually import a variety of different module types.
How does python know where to find modules? The python interpreter walks a pre-defined set of locations to know where to look. After looking in the same directory as the importing program, an environment variable named PYTHONPATH is consulted. PYTHONPATH is a simple list of directories in which to search for modules. It is constructed just like the PYTHONDOCS variable, covered in the second article introducing Python (December 2008, issue 24.12). After searching PYTHONPATH, python will search among the standard library (location defined at compile time). Finally, python includes any paths listed in .pth files (a slightly advanced topic that we won't be covering this month).
Most people don't ever have to create a PYTHONPATH variable, as the default locations serve most needs. However, if you're sharing development among a few people and need to have a common location for specific modules, you'll need to point PYTHONPATH to your custom location. Like the PATH variable, PYTHONPATH is simply a colon-separated list of paths, searched in order (from left to right). For example, in the bash shell:
export PYTHONPATH="/usr/share/python/"
With this path set, python will additionally search for modules in the /usr/share/python directory.
All of this comes together for the python interpreter as sys.path. From a running program, or, via the interactive interpreter, import the built-in sys module and print sys.path. Here's an abridged listing:
>>> import sys
>>> print sys.path
['', '/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python25.zip',
'/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5']
Note that the path starts with an empty entry, which is how the program's directory is searched.
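As a rough sketch of how the search path drives imports, the snippet below writes a tiny module into a scratch directory and then adds that directory to sys.path by hand. The module name pathdemo_mod and its GREETING variable are made up for this illustration. Appending to sys.path at runtime is only a stand-in for what a PYTHONPATH entry does when the interpreter starts.

```python
import os
import sys
import tempfile

# Create a scratch directory holding a tiny module (hypothetical name).
module_dir = tempfile.mkdtemp()
module_file = os.path.join(module_dir, 'pathdemo_mod.py')
handle = open(module_file, 'w')
handle.write('GREETING = "found me"\n')
handle.close()

# The scratch directory is not on the search path yet, so an import
# attempted at this point would fail. Appending it mimics the effect
# of listing the directory in PYTHONPATH.
sys.path.append(module_dir)

import pathdemo_mod
print(pathdemo_mod.GREETING)   # found me
print(pathdemo_mod.__file__)   # shows where the module was located
```

The __file__ attribute is a handy way to confirm which copy of a module python actually picked up when several directories are in play.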
Let's start with a simple example to demonstrate modules.
Using Modules
Sticking with the a.py, b.py and c.py concept, let's define them for real. Follow along with these short listings.
Listing 1 - a.py
#!/usr/bin/env python
import b

if b.IsInAdmin('root'):
    print "root is in the admin group"
else:
    print "root is not in the admin group"
As stated earlier, the import statement reads a module file and executes it. So, the line import b reads the file b.py and executes it like any other Python program. The if statement tests the return value of IsInAdmin(), a function object defined in b. We can also say that the object is in b's namespace. Listing 2 reveals what b.py is doing.
Listing 2 - b.py
#!/usr/bin/env python
import c

def IsInAdmin(user):
    # Fetch the list of admin users from module c, then test membership.
    admins = c.GetAdminGroup()
    return user in admins
b.py starts off with its own import: c.py. b.py defines one function, IsInAdmin(). Finally, look at c.py to see how it works its magic.
Listing 3 - c.py
#!/usr/bin/env python
import plistlib

def GetAdminGroup():
    # Read the local admin group record and return its list of users.
    admin_plist = plistlib.readPlist('/var/db/dslocal/nodes/Default/groups/admin.plist')
    return admin_plist['users']
c.py imports plistlib, something you don't have to define, as it's part of the standard modules shipped with OS X. plistlib gives the author an easy way to read and write OS X's plist files (XML-format plists only, currently; binary plists are not handled). Here, we'll directly grab the admin.plist file and return the contents of the users key.
Save all three of these files in the same directory, mark a.py executable (chmod 770 a.py), and run it via sudo (it needs root privileges as the admin.plist file is protected). The output looks simply like this:
$ sudo ./a.py
root is in the admin group
Overall, that was fairly little work. Of course, in a real-world scenario, you wouldn't need to break it down as far as we did here. However, GetAdminGroup() might make a really nice addition to a larger OS X-specific library.
Imports and Namespaces
We've spoken a little about the concept of namespaces without formally defining it. A namespace is a mapping from names to objects. The important thing to realize is that there is no relation between names in different namespaces. Two modules can define x, and there will be no conflict. Internally to python, namespaces are implemented as dictionaries (remember those? Covered briefly in our introduction - November 2008, issue 24.11). To see this in action, run the interactive python shell (by typing python at the shell prompt) in the same directory as the modules we've been working on. A module's dictionary is hidden away in the __dict__ attribute. It responds to all standard dictionary messages. For example, to list the contents of the object, import our c.py module:
>>> import c
and then list the keys:
>>> c.__dict__.keys()
['GetAdminGroup', 'plistlib', '__builtins__', '__file__', '__name__', '__doc__']
This is the same list that you get with the dir() function, covered in earlier columns. However, direct access to the dictionary does open the door to some fairly advanced trickery and techniques.
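Here is a small sketch of the "namespaces are dictionaries" idea. Rather than touching files, it builds two empty module objects by hand with types.ModuleType; the names first and second are hypothetical and exist only for this illustration.

```python
import types

# Build two empty module objects by hand (hypothetical names).
first = types.ModuleType('first')
second = types.ModuleType('second')

# Each module can bind the same name without affecting the other.
first.x = 1
second.x = 'one'

# Attribute access and the underlying dictionary are two views of one thing.
print(first.x == first.__dict__['x'])   # True
print(second.__dict__['x'])             # one

# Writing through the dictionary creates an attribute as well.
first.__dict__['y'] = 2
print(first.y)                          # 2
```

Both modules define x, yet neither value disturbs the other, which is exactly the "no relation between names in different namespaces" property described above.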
The point of this is that an import statement will import objects into their own namespace. The name of a given namespace is derived from the filename. This means that module filenames must be valid Python identifiers. You can certainly create a module named while.py, but since "while" is a reserved keyword, you won't be able to import it with the import statement. Our c.py module has been imported into a namespace of 'c'.
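If you want to check a filename before committing to it, something like the following sketch may help. The importable_name() helper and its pattern are my own illustration, not part of Python; the pattern only covers plain ASCII names. The keyword module, however, is part of the standard library.

```python
import keyword
import re

# Pattern for a plain ASCII identifier: a letter or underscore first,
# then letters, digits or underscores.
IDENTIFIER = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*$')

def importable_name(filename):
    """Return True if filename (e.g. 'c.py') yields a usable module name."""
    name = filename[:-3] if filename.endswith('.py') else filename
    return bool(IDENTIFIER.match(name)) and not keyword.iskeyword(name)

print(importable_name('c.py'))          # True
print(importable_name('while.py'))      # False: "while" is a keyword
print(importable_name('my-module.py'))  # False: "-" is not allowed in a name
```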
This also means that you should be careful to name your module something useful, as you'll be referring to it in code... or will you? Well, that depends. It depends somewhat on how you import a module, and somewhat on style. I'll say it now: my personal preference is that you always import as we've been showing, by using the import keyword and referencing the namespace in code. Some people consider this bulky, though. Look at our plistlib usage:
plistlib.readPlist()
Instead of simply calling readPlist, we have to qualify it. This prevents it from conflicting with any other function (object, really) of that name. Importing plistlib created a namespace for that module. However, there is a way to import into the current namespace, and that relies on the from keyword variant.
To import an entire module, or just a select part of it into the current namespace, you tell python, "from [module] import [what]". The "what" can reference a specific object in the module, or use a wildcard. To import all of c.py directly into the current namespace, b.py can issue:
from c import *
b.py could then make the following change:
admins = GetAdminGroup()
See? No need to qualify the function call. Also, be aware that a subset of a module can be imported by specifying it explicitly:
from plistlib import readPlist
While importing a single name may seem to lessen the burden on the python interpreter, it doesn't in reality: python reads the entire module file in any case (or loads its pre-compiled byte-code version), so there are no real performance or time savings. The real problem is one of maintenance for the author. This is especially true when multiple people are working on a project, each contributing a module. As you may guess, conflicting object names simply get crushed, and the last name defined 'wins.' You, the author, lose, of course. A module imported with a wildcard can also gain new names later and pollute the namespace of the importing application without you realizing it. With a standard import, names from different modules stay in their own namespaces and cannot collide.
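The "last name defined wins" behavior is easy to see for yourself. This sketch uses two standard-library functions that happen to share a name, os.path.split and re.split; the pairing is just a convenient example I picked.

```python
import os.path
import re

from os.path import split   # binds the name "split" to os.path.split
from re import split        # rebinds "split"; the earlier binding is gone

print(split is re.split)        # True
print(split is os.path.split)   # False

# The qualified names are unaffected and remain unambiguous.
print(os.path.split('/usr/share/python')[1])   # python
print(re.split(',', 'a,b,c'))                  # ['a', 'b', 'c']
```

Note that python raises no error and prints no warning when the second import replaces the first, which is why this kind of bug can go unnoticed.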
To Be Executed
I mentioned above that when a module is imported, it is run like any other Python application. This is true; it's how objects get created, just like in the top-level file. We can prove this, too. Consider the following code:
Listing 4 - msgtest.py
#!/usr/bin/env python
print "Running top-level python file"
import msgmod
print "Finishing up top-level file"
Listing 5 - msgmod.py
#!/usr/bin/env python
print "Running msgmod.py module"
Running msgtest.py yields the following output:
Running top-level python file
Running msgmod.py module
Finishing up top-level file
Not surprising if you've been following along! The same goes for any assignment statements in a module. Of course, function definitions are read and turned into objects, but their contents are not executed until explicitly called.
Sometimes, however, a file has value as both a top-level file and as something to be imported. As a top-level file, we want it to execute, but as an imported file, we just want it imported, ready at our beck and call. There's a Python-ism that addresses this. A built-in variable, __name__, is assigned in each module at runtime. When a file is run as the top-level application, __name__ is "__main__"; when it is imported, __name__ is the module's name instead. Instead of just writing statements at the top indent-level, every statement in the top-level file is put into a function (or class) definition. The only top-level statement tests __name__, and calls your main function as appropriate. For example, our a.py program would be altered like this:
#!/usr/bin/env python
import b

def main():
    if b.IsInAdmin('root'):
        print "root is in the admin group"
    else:
        print "root is not in the admin group"

if __name__ == '__main__':
    main()
This way, if we had other useful functions in a.py, it could be run directly, or imported as a module.
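To see the other half of the idiom, where the file is imported rather than run, here is a sketch that writes a small guarded module to a scratch directory and imports it. The module name guarddemo_mod and its calls list are invented for this demonstration.

```python
import os
import sys
import tempfile

# Source for a hypothetical module that uses the __name__ test.
SOURCE = '''
calls = []

def main():
    calls.append('main ran')

if __name__ == '__main__':
    main()
'''

# Write the module to a scratch directory and make it importable.
module_dir = tempfile.mkdtemp()
handle = open(os.path.join(module_dir, 'guarddemo_mod.py'), 'w')
handle.write(SOURCE)
handle.close()
sys.path.insert(0, module_dir)

import guarddemo_mod

# During an import, __name__ holds the module's own name, so main() is skipped.
print(guarddemo_mod.__name__)   # guarddemo_mod
print(guarddemo_mod.calls)      # []

# The function is still available to call on demand.
guarddemo_mod.main()
print(guarddemo_mod.calls)      # ['main ran']
```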
I will be using this idiom going forward.
Fin
For this month, that's enough. Modules are such a core concept in Python, it's worth it to make sure it's understood before moving on. Next month, we'll talk a little more about style, and then cover classes.
As an aside, I met many people at Macworld who were really excited by this column turning its attention to Python. I'd love to hear ideas from all readers, so, letters@mactech.com and emarczak@mactech.com are ready and waiting for mail.
Media of the month: "Synthetic Worlds: The Business and Culture of Online Games," by Edward Castronova. Interesting work that covers, mainly, the economy of virtual worlds (ok, MMOs). The book is a touch dated at this point, but overall, it should make the reader think about how much more our technology is touching and influencing our 'non-technological' world.
See you next month!
Ed Marczak is the Executive Editor of MacTech Magazine. He lives in New York with his wife, two daughters and various pets. He has been involved with technology since Atari sucked him in, and has followed Apple since the Apple I days. He spends his days on the Mac team at Google, and free time with his family and/or playing music. Ed is the author of the Apple Training Series book, "Advanced System Administration v10.5," and has written for MacTech since 2004.