Mac in the Shell: Learning Python on the Mac: Functions
Volume Number: 25 (2009)
Issue Number: 02
Column Tag: Mac in the Shell
Modularizing and simplifying your code
by Edward Marczak
Introduction
We've been learning Python on the Mac and have so far covered the basics. We need to introduce a little more foundation this month. Functions, in any language, allow the author to create reusable blocks of code. This is important from a few perspectives: code reuse, the ability to refactor easily, and debugging. Functions lead to building libraries, and both are critical concepts in becoming an effective Python programmer and part of an essential foundation for creating an OS X utility or application.
Jumping In
Let's get started. Like variables, functions don't need to be declared ahead of time. Need a function? Just define it. Similarly, functions may be defined anywhere in the source file, and in any order, as long as a function's def statement has run before the function is called. Since a function defines a code block, the rules of indentation apply: choose spaces or tabs (staying consistent with the style you use in the rest of your code) and indent the entire function body. Here's an example:
#!/usr/bin/env python
def SayHello():
    print "Hello"
SayHello()
This short program simply prints "Hello." Not too exciting, and it really could have been accomplished in one line. But it does help illustrate the basics: a function is created by using the def keyword, supplying a function name (unique to this source file), followed by parentheses and terminated with a colon character. The lines following need to be consistently indented; the first line that lowers the indentation level ends the code block that comprises the function. The previous example is really as simple as it can get. Let's look at something a little more useful: a function definition that prints the square of a number passed into it.
def Square(number):
    print number, "squared is", number ** 2
Once defined, this function can be called as many times as you need, individually or in a loop:
for i in range(1,6):
    Square(i)
This will produce:
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
However, there's a problem with this function. What if we only want to compute the value of a square and keep it for later, rather than print it out? That's where the return keyword comes in. return sends a value back to the caller. So, our Square() function could be rewritten like this:
def Square(number):
    return number ** 2
and would need to be called like this:
x = Square(44)
or, used in-line like this:
print "The square of 78 is %d." % (Square(78))
How, you may ask, is this any better than the first version? The answer is two-fold: first, you'll typically use functions to build up more complex sections of code. The times that you have a one-line or very simple function usually involve creating a function for readability purposes. Second, by returning a value, you can store it for later, rather than display it or otherwise use it immediately.
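To make that second point concrete, here is a small sketch that stores the values returned by Square() and reuses them later. The variable names (squares, total, largest, message) are mine, made up for illustration, and the block builds a string instead of using the print statement so that it runs unchanged under Python 2 and Python 3.

```python
def Square(number):
    return number ** 2

# Store the results now...
squares = []
for i in range(1, 6):
    squares.append(Square(i))

# ...and use them later, as often as needed, without recomputing.
total = sum(squares)
largest = max(squares)

# A returned value can also be dropped straight into an expression.
message = "The square of 78 is %d." % Square(78)
```

None of this is possible with the first version of Square(), which prints its result and hands nothing back to the caller.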
Libraries
Let's build up a small set of useful functions, and learn some new Python skills along the way. We'll improve and expand on these functions as time goes on. Some of this is a little more advanced than what we've covered so far, so just trust me for now; it will all be covered in future columns. Here's the beginning of the code, along with the first function:
Listing 1a - Wrapper for subprocess
#!/usr/bin/env python
import subprocess
def RunProc(command, env_vars=None):
    """Wrap subprocess into something reasonable.
    Args:
      command: List containing command and arguments
      env_vars: Shell environment variables that should be present for execution
    Returns:
      stdout: output of the command
    """
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            env=env_vars)
    (stdout, stderr) = proc.communicate()
    return stdout
Note that I'm also shooting for good habits, here: commented code, readable variable names and consistent, clean code style (space after comma, spaces around equal signs, and so on). The above code wraps the subprocess library calls to make it easier to run a child process. It's less than perfect right now, but we'll correct that in due time (not this column, however). Let's get on to a useful OS X-related function.
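First, though, a hedged sketch of how a wrapper like this gets called. The function is restated so the block stands on its own, with one addition that is my assumption rather than part of Listing 1a: universal_newlines=True, so the output comes back as text under Python 3 as well as Python 2. The environment variable MACTECH_DEMO and the tiny child script are made up for the demonstration.

```python
import os
import subprocess
import sys

def RunProc(command, env_vars=None):
    """Variant of Listing 1a that returns text under Python 2 and 3."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            env=env_vars,
                            universal_newlines=True)  # assumption: text output wanted
    (stdout, stderr) = proc.communicate()
    return stdout

# Copy the current environment and add one made-up variable.
child_env = dict(os.environ)
child_env['MACTECH_DEMO'] = 'hello'

# The command is a list: the program first, then each argument separately.
child_code = "import os; print(os.environ['MACTECH_DEMO'])"
output = RunProc([sys.executable, '-c', child_code], child_env)

# Command output normally ends with a newline, so strip it before use.
greeting = output.strip()
```

Passing the command as a list means no shell is involved, so arguments that contain spaces need no extra quoting.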
Listing 1b - GetLocalUsers function.
def GetLocalUsers():
    """Fills a dictionary with all system users
    Args:
      None
    Returns:
      Dictionary, filled with uid/username pairs
    """
    command = ['/usr/bin/dscl','.','list','/Users']
    output = RunProc(command)
    userList = output.split('\n')
    userDict = {}
    for userName in userList:
        if userName != '':
            command = ['dscl','.','read','/Users/'+userName,'uid']
            output = RunProc(command)
            uidLocation = output.split(':')
            uid = uidLocation[2].strip()
            userDict[int(uid)] = userName  # integer key, so userDict[74] works
    return userDict
The GetLocalUsers() function fills a dictionary that maps each uid (the key) to the short name assigned to it. It takes no parameters, and can be called like this:
userDict = GetLocalUsers()
To extract information, simply request the value using the uid as the key:
print "User ID 74 is %s." % (userDict[74])
(It's _mysql, if you're curious and not typing in code while reading this).
There are some new concepts in Listing 1b. The split string function (line 10 of Listing 1b) creates a list by breaking the string apart at each occurrence of its argument. For example, the string 'This is a string', when split on a space ('This is a string'.split(' ')), becomes the following list:
['This', 'is', 'a', 'string']
Any character can be used to provide the boundary marker.
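A short sketch of split in action, using made-up sample strings rather than live dscl output (the record format shown is an assumption modeled on what Listing 1b expects). It also shows why the listing skips empty strings: text that ends with a newline leaves an empty final element after the split.

```python
sentence = 'This is a string'
words = sentence.split(' ')

# Output that ends with a newline produces a trailing empty string.
listing = 'root\ndaemon\n_mysql\n'
names = listing.split('\n')

# Assumed sample of a "key: value" record, split on the colon.
record = 'dsAttrTypeNative:uid: 74\n'
fields = record.split(':')

# The third field still carries a leading space and a newline;
# strip() removes both before the conversion to an integer.
uid = int(fields[2].strip())
```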
This function is nice, but could certainly be improved. What if we only wanted standard users, and not all of the system users? That's where function arguments come in. Like passing arguments into a command line application, functions can accept arguments, too. Alter our function to do so:
def GetLocalUsers(incSystemUsers):
This will allow us to pass a value into the function. Great, but we need to do something with it. Let's expect this to be a Boolean True or False: if True, include the system accounts; if False, do not. (I'm classifying accounts based on uid: only system accounts have a uid of less than 500.) Update the code portion of the function to utilize this new variable (the first ten lines are the same as before; I just wanted to give some context):
    command = ['/usr/bin/dscl','.','list','/Users']
    output = RunProc(command)
    userList = output.split('\n')
    userDict = {}
    for userName in userList:
        if userName != '':
            command = ['dscl','.','read','/Users/'+userName,'uid']
            output = RunProc(command)
            uidLocation = output.split(':')
            uid = uidLocation[2].strip()
            if incSystemUsers:
                userDict[int(uid)] = userName
            else:
                if int(uid) > 499:
                    userDict[int(uid)] = userName
    return userDict
We use a simple if/else statement to determine if the user should be included in the dictionary that is returned. We should update our comments about this in the function, too.
Now, we have to include a value when we call the function. Like this:
userDict = GetLocalUsers(False)
This is an improvement to this function. Frankly, though, more often than not, you probably do only want standard users.
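If you'd like to try the filtering logic without a Mac handy, here is a minimal sketch that works on sample data instead of calling dscl. FilterUsers() and the sample list are hypothetical stand-ins, not part of the column's library, and the nested if/else is collapsed into a single condition that behaves the same way.

```python
def FilterUsers(records, incSystemUsers):
    """Build a uid/username dictionary from (username, uid) pairs.

    Hypothetical stand-in for GetLocalUsers(): it takes sample data
    rather than querying the directory service.
    """
    userDict = {}
    for userName, uid in records:
        # Keep everyone when incSystemUsers is True; otherwise keep
        # only accounts with a uid of 500 or higher.
        if incSystemUsers or int(uid) > 499:
            userDict[int(uid)] = userName
    return userDict

# Made-up sample data: uids arrive as strings, as they do from dscl.
sample = [('root', '0'), ('_mysql', '74'), ('ed', '501')]

everyone = FilterUsers(sample, True)
standard_only = FilterUsers(sample, False)
```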
Default Arguments
Python functions understand default arguments. Change the function definition to read:
def GetLocalUsers(incSystemUsers=False):
Now, if we don't tell GetLocalUsers how to behave, by not including an argument, it will assign False and continue on. So now, we can once again call the function with no arguments:
userDict = GetLocalUsers()
This is equivalent to calling GetLocalUsers(False). We really only have to pass in a value of True if we wish to override the default behavior. It is possible to combine default and non-default arguments in a single function definition. For example:
def Mount(path, method="afp", mount_point="/Volumes", shadow=False):
This fictional function, Mount(), has four parameters, only one of which you are required to provide when calling it: path. The remainder use the defaults provided. Therefore, this function can be called in the following ways:
Mount("serv.example.com/share")
Mount("serv.example.com/share", "nfs")
Mount("serv.example.com/share", "nfs", "/tmp/mnt")
These examples use simple positional arguments: you need to know the position of each argument and pass in the values in the correct sequence. When creating a function definition, parameters with defaults must be grouped at the end of the list; Python rejects a definition in which a parameter without a default follows one with a default. Grouping them at the end is what allows them to be omitted.
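Since Mount() is fictional, the sketch below uses a stub that simply returns whatever it received, which makes it easy to see how positional values and defaults line up. The last part uses the built-in compile() to show, without stopping the script, that Python refuses a definition where a parameter without a default follows one with a default; the Broken name is made up.

```python
def Mount(path, method="afp", mount_point="/Volumes", shadow=False):
    # Stub: report the values as received rather than mounting anything.
    return (path, method, mount_point, shadow)

only_path = Mount("serv.example.com/share")
with_method = Mount("serv.example.com/share", "nfs")
with_mount_point = Mount("serv.example.com/share", "nfs", "/tmp/mnt")

# A parameter without a default may not follow one that has a default.
try:
    compile("def Broken(method='afp', path): pass", "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
```

Note that reaching mount_point positionally means supplying method too, even when its default would have been fine.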
Python also supports specifying the argument that is being passed in. This allows us to call the function like this:
Mount("serv.example.com/data.dmg","http",shadow=True)
Note that we're passing in the value for shadow in the third position, where we would typically specify mount_point. This is an excellent way to pass parameters into a function, as it makes it exceptionally obvious what is taking place. Which of the following calls is easier to read?
ConvertFile('rawdata.txt','sales.csv','csv',True)
or:
ConvertFile(infile='rawdata.txt', outfile='sales.csv',
            format='csv', Totals=True)
Hmmmm? Using named arguments is a good habit to get into.
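Two more properties of named arguments are worth a quick sketch, again using a stub version of the fictional Mount() that only reports what it received. Named arguments can be given in any order, and a misspelled name is caught immediately instead of being silently ignored. The misspelling shadwo is deliberate.

```python
def Mount(path, method="afp", mount_point="/Volumes", shadow=False):
    # Stub: report the values as received rather than mounting anything.
    return {'path': path, 'method': method,
            'mount_point': mount_point, 'shadow': shadow}

# Positional values first, then a named argument that skips mount_point.
mixed = Mount("serv.example.com/data.dmg", "http", shadow=True)

# Everything named, in a different order: the result is the same.
all_named = Mount(shadow=True, method="http",
                  path="serv.example.com/data.dmg")

# A name the function does not define raises TypeError.
try:
    Mount("serv.example.com/data.dmg", shadwo=True)
    typo_caught = False
except TypeError:
    typo_caught = True
```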
Variable Length Argument Lists
As a final discourse on functions, what if you don't know how many parameters a function may receive, or the number may change from call to call? Python supports variable-length argument lists. Two prefixes on a parameter name designate how these values are received. If a single asterisk is used, the parameter receives any excess positional arguments as a tuple, defaulting to an empty tuple. If a double asterisk is used, the parameter receives any excess keyword arguments as a new dictionary, defaulting to an empty dictionary. Let's see how this works. Here's a short sample function that accepts any number of arguments:
def SomeFunction(*args):
    print args
It can be called like this:
SomeFunction(24, 'blah', 'cold', 88, [22,34,66],'starch')
Note that arguments do not need to be of the same type. Pass in strings, integers, or even lists and dictionaries. The same goes for the double-asterisk form, with one exception: since you're creating a dictionary, you need to pass both the key and the value.
Changing the function definition to:
def SomeFunction(**args):
allows the function to be called like this:
SomeFunction(number=24, some_string='blah',
             a_list=[22,34,66], string2='starch')
In our case, this will cause the function to output the following (the order of the keys may differ, since dictionaries are unordered):
{'a_list': [22, 34, 66], 'some_string': 'blah', 'string2': 'starch', 'number': 24}
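Here is a side-by-side sketch of the two forms, with the functions returning what they collected so the results can be inspected rather than just printed. CollectPositional() and CollectKeywords() are names made up for this illustration.

```python
def CollectPositional(*args):
    # args is a tuple holding every positional argument passed in.
    return args

def CollectKeywords(**kwargs):
    # kwargs is a dictionary built from the name=value pairs passed in.
    return kwargs

positional = CollectPositional(24, 'blah', [22, 34, 66])
keywords = CollectKeywords(number=24, some_string='blah')

# With no arguments at all, each form falls back to its empty default.
nothing_positional = CollectPositional()
nothing_keyword = CollectKeywords()
```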
Finally, realize that you can mix variable arguments along with standard and named arguments. In the examples used above, since we only specified a variable argument, passing in an argument is completely optional.
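As a minimal sketch of that mixing, the hypothetical Describe() function below takes one required parameter, one parameter with a default, and both variable forms. The option strings and the timeout keyword are invented for the example; the function just returns how the arguments were sorted out.

```python
def Describe(path, method="afp", *extras, **options):
    # path is required, method has a default, extras collects any
    # further positional values, options collects any named leftovers.
    return (path, method, extras, options)

# Only the required argument: everything else takes its default.
minimal = Describe("serv.example.com/share")

# Two standard values, two extra positionals, one extra keyword.
full = Describe("serv.example.com/share", "nfs", "ro", "nosuid",
                timeout=30)
```

The order in the definition matters: standard parameters come first, then the single-asterisk parameter, then the double-asterisk parameter.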
In Conclusion
I was hoping to rip through functions and talk about modules in this column, but then I started writing, and realized the scope of the topic. I want to do justice to both functions and modules, as they're both critical foundational subjects. Next month, we'll build on the functions introduced this month and dig into modules.
Media of the month: I'm not going to call out any one particular 'thing,' but will give this directive: Find something new. If you've only been exposed to the Mac, go pick up a book (or find a web page) about Windows, FreeBSD or Linux. Compare to OS X. If you're a MySQL person, download and explore PostgreSQL, or learn the command-line interface to SQLite. You get the idea: stretch your boundaries.
Ed Marczak is the Executive Editor of MacTech Magazine. He lives in New York with his wife, two daughters and various pets. He has been involved with technology since Atari sucked him in, and has followed Apple since the Apple I days. He spends his days on the Mac team at Google, and free time with his family and/or playing music. Ed is the author of the Apple Training Series book, "Advanced System Administration v10.5," and has written for MacTech since 2004.