Mac in the Shell: Debugging Python
Volume Number: 25
Issue Number: 11
Column Tag: Mac in the Shell
Stepping through code with pdb
by Edward Marczak
Welcome
We've been covering Python in this column for the last few months. We've gone from the basics, such as built-in data types and variable assignment, through more advanced concepts such as creating classes and integrating with Cocoa via PyObjC. The intent was never to suggest that this column alone would turn you into a master Python programmer, but to give you the tools and direction to become one. One tool you will need on that journey is a decent debugger. Debuggers are used less often with scripting languages like Python and Ruby. However, when something just isn't working as expected and you can't figure out why, a peek at the code while it's running is invaluable. This month, I'll show you how to do that in Python using the Python debugger, "pdb."
Do The Needful
Using a Shell
The instructions in this column always try to respect the way people are used to working. However, debuggers are interactive and grew up in a shell environment. Some editors do integrate with debuggers, but that is outside the scope of this article. Edit in whatever editor you like, but we're going to run and debug from a shell. (I think the general unease with the shell is lessening in the Mac community...right?) So, fire up Terminal.app (or iTerm, Terminator, etc.) and we'll get started.
Learning Your History
A debugger is itself a program that lets you examine another running program. With a debugger you can:

- step through the target program's code one line at a time;
- examine the values of variables at a given point in the code;
- run until a chosen breakpoint;
- examine the state of a program after a crash or exception.

One of the better-known multi-language debuggers is the GNU Debugger, or "gdb." You could use gdb to debug Python, but only at the level of the interpreter itself. Python also ships with a Python-specific debugger with a gdb-like command set, called pdb: the Python Debugger.
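pdb is built on the standard library's bdb module, which has long relied on sys.settrace() to get a callback from the interpreter as each line runs. That hook is how a debugger can pause between lines.

The sketch below uses the same hook to record which lines of a function execute. It is written in Python 3 syntax, unlike the column's Python 2 listings. trace_lines and add_up are hypothetical names used only for illustration.

```python
import sys


def trace_lines(func, *args):
    """Call func(*args) and record the line offsets executed inside func.

    Offsets are relative to the 'def' line, so 1 is the first body line.
    """
    executed = []
    target = func.__code__

    def tracer(frame, event, arg):
        if event == "call":
            # Only trace the frame running func itself; ignore other calls.
            return tracer if frame.f_code is target else None
        if event == "line":
            executed.append(frame.f_lineno - target.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the hook, even if func raises
    return result, executed


def add_up(limit):
    total = 0
    for i in range(limit):
        total += i
    return total


result, lines = trace_lines(add_up, 3)
print(result)  # 3
print(lines)   # the loop header and body offsets repeat once per iteration
```

A real debugger does far more with each callback, such as checking breakpoints and prompting you for commands. Still, the per-line event is the same raw material.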
Programs in many scripting languages are relatively short, and debuggers are often unnecessary. Many scripters are used to sprinkling print or logging statements through their code. These reveal the value of particular variables at particular points in the program's execution. However, you may have realized that Python is a bit more grown up than many traditional scripting languages, and many fairly large systems are written in it. As an application grows and accumulates dependencies, a dedicated debugger becomes not only useful, but necessary.
Getting Started
We'll start off with some simple code as an example of basic debugging. You write the code in listing 1 in the hope of finding the prime numbers up to and including 10.
Listing 1: prime_debug.py - sample code for debugging.
#!/usr/bin/python
for n in range(2, 10):
    for x in range(2, n):
        if n % x == 0:
            print n, 'equals', x, '*', n/x
            break
        else:
            print n, 'is a prime number'
Of course, you run this code and see something a little different than you expected—there are two problems in this code. A basic reason for debugging! (Kudos if you already see the errors).
The Python debugger is implemented as a module, so, like other modules, you need to import it. Add the following import after the shebang line:
import pdb
You'll also need to pick a point where you want to start tracing. Since this is an example, we'll start right at the top. Add a call to the set_trace() function immediately after the import statement:
pdb.set_trace()
Now you can just run the program. Mark it as executable first with chmod 770 prime_debug.py or similar. This time, though, you'll see something different when the program runs. Something like this:
$ ./prime_debug.py
> /Users/marczak/dev/py/prime_debug.py(7)<module>()
-> for n in range(2, 10):
(Pdb)
What you are looking at is the pdb interactive debugger waiting for your command. You'll see this whenever pdb.set_trace() is called. At that point, pdb pauses execution, displays the statement it will execute next, and shows its prompt. For our purposes, we want to execute this line (for n in range(2, 10)), so we enter n, for "next." After pressing Return, we're greeted with new information and a new prompt:
-> for x in range(2, n):
(Pdb)
Ah! We've moved on to the next line of the program, and are looking at the next statement to execute. To execute it, simply press Return: when you enter an empty line, pdb repeats the last command you gave it. Do this a few times until you're comfortable with the display and what you're looking at.
Just so we can get back in sync, quit the debugger and we'll start again. To quit pdb at any time, issue a q command. You'll see a traceback ending in "bdb.BdbQuit" and find yourself back at a shell prompt.
Run your program again, let it drop into the debugger, and let's do something a little more useful this time. Tracing program flow is useful, but being able to examine the values of variables is just as useful. You're now waiting for the first line of the program to execute: "for n in range(2, 10)." If you try to examine the variable n right now, you'll receive an error. This line hasn't executed yet, so n isn't defined.
First, execute this first line by entering n for "next," then enter p n, which stands for "print the contents of n." You can display the contents of any variable with the p ("print") command. In our example, the output should look like this:
-> for x in range(2, n):
(Pdb) p n
2
This is completely in line with our expectations: n is 2, right at the beginning of its range. (Note that the displayed line is the next line to execute, not the line that assigned the variable we're examining.)
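You can also try these commands without typing them each time. pdb.Pdb accepts stdin and stdout arguments, so a session can be scripted.

The sketch below is written in Python 3 syntax. count_up is a hypothetical function, loosely shaped like Listing 1's outer loop. The script feeds the debugger four inputs and captures what it prints:

- n;
- a blank line, which repeats n just as pressing Return does;
- p n;
- c, to let the function finish.

```python
import io
import pdb


def count_up(limit):
    squares = []
    for n in range(2, limit):
        squares.append(n * n)
    return squares


# One pdb command per line, exactly as you would type it at the (Pdb) prompt.
# The empty line repeats the previous command, like pressing Return.
commands = io.StringIO("n\n\np n\nc\n")
transcript = io.StringIO()

# When stdout is supplied, pdb reads commands with stdin.readline()
# instead of prompting interactively; readrc=False ignores any ~/.pdbrc.
debugger = pdb.Pdb(stdin=commands, stdout=transcript, nosigint=True, readrc=False)
result = debugger.runcall(count_up, 5)  # stops at the first line of count_up

print(result)                 # [4, 9, 16]
print(transcript.getvalue())  # locations, prompts, and the value printed by "p n"
```

runcall() stops at the first line of the function. The two n steps then leave execution just inside the loop, where p n shows 2.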
Finding our problem
Let's go off and find our problem, which is actually two-fold. The output currently looks like this:
3 is a prime number
4 equals 2 * 2
5 is a prime number
5 is a prime number
5 is a prime number
6 equals 2 * 3
7 is a prime number
7 is a prime number
7 is a prime number
7 is a prime number
7 is a prime number
8 equals 2 * 4
9 is a prime number
9 equals 3 * 3
This is all technically correct, but ugly. What's with the repeating lines? Also, we wanted to find values through 10, not 9. Since the first time we see the repeating lines is when n is equal to 5, let's find that point. Run the program, step through each line using the n command until you see the first output of "5 is a prime number." It will look like this:
> /Users/marczak/dev/py/prime_debug.py(9)<module>()
-> if n % x == 0:
(Pdb)
> /Users/marczak/dev/py/prime_debug.py(14)<module>()
-> print n, 'is a prime number'
(Pdb)
5 is a prime number
> /Users/marczak/dev/py/prime_debug.py(8)<module>()
-> for x in range(2, n):
(Pdb)
Now, pay attention as you continue to step through. After a few iterations (or sooner), it should become clear what's happening. The if condition being False is fine. The problem is that the else clause then runs the print statement on every pass through the inner loop. We only want to print that notification once, on the way out of the loop, when no factor was found. So our logic error is simple: the else statement has the wrong level of indentation. Python allows an else clause on a for loop, and it runs only when the loop finishes without hitting break. Un-indenting the else one level makes it part of the inner for loop. The entire loop should look like this:
for n in range(2, 10):
    for x in range(2, n):
        if n % x == 0:
            print n, 'equals', x, '*', n/x
            break
    else:
        print n, 'is a prime number'
Again, notice the subtle difference in indentation for the else portion: it's really part of the inner for loop. If you're "too close" to your code, that's an easy one to miss. Debugging can be like explaining your code to a rubber duck: you know how it's supposed to work, but you only have the "aha!" moment as you step through it.
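To see the for...else rule in isolation, the sketch below wraps the corrected loop in a function. It collects the messages in a list instead of printing them. It is written in Python 3 syntax, and describe_numbers is a hypothetical name.

The fix has one side effect. For n = 2, range(2, 2) is empty, so the inner loop never breaks and its else clause runs. That is why the corrected loop reports 2 as prime, while the original output started at 3.

```python
def describe_numbers(limit):
    """Corrected loop from the column, returning its messages as a list."""
    messages = []
    for n in range(2, limit):
        for x in range(2, n):
            if n % x == 0:
                messages.append("%d equals %d * %d" % (n, x, n // x))
                break
        else:
            # Runs only when the inner loop finished without hitting break,
            # including when range(2, n) was empty (n == 2).
            messages.append("%d is a prime number" % n)
    return messages


for line in describe_numbers(10):
    print(line)
```

Each number now produces exactly one line, starting with "2 is a prime number" and ending with "9 equals 3 * 3".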
Make it Easier
We found our major error, but now have another: we want to print primes up through and including the number 10. If you're like me, you need a refresher at this point as to where you are in the code. Issuing an l (not "one," but "ell," for "list") will do just that:
(Pdb) l
  3     import pdb
  4
  5     pdb.set_trace()
  6
  7     for n in range(2, 10):
  8  ->     for x in range(2, n):
  9             if n % x == 0:
 10                 print n, 'equals', x, '*', n/x
 11                 break
 12         else:
 13             # loop fell through without finding a factor
Ah! Now I know where I am. All we're really interested in from this point on is the value of n. Stepping through the remainder of the code shows that the outer for loop exits after 9. Didn't we ask it to run until 10?
Yes we did, but that's our misconception. The Python documentation for range() explains that the stop value is excluded: range(2, 10) produces 2 through 9. To include 10, the outer loop needs to read for n in range(2, 11).
Even if this isn't a mistake you would make, it's still a useful exercise: you won't always be debugging your own code.
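range(start, stop) produces start, start + 1, and so on up to stop - 1. One way to get the "up to and including" behavior the example asked for is to pass limit + 1 as the stop value.

The sketch below shows this with a hypothetical primes_through() helper, written in Python 3 syntax. Because 10 itself isn't prime, the inclusive bound is easier to see with a limit of 11.

```python
def primes_through(limit):
    """Return the primes from 2 up to and including limit."""
    primes = []
    # range() excludes its stop value, so add 1 to include limit itself.
    for n in range(2, limit + 1):
        for x in range(2, n):
            if n % x == 0:
                break
        else:
            primes.append(n)
    return primes


print(list(range(2, 10)))  # [2, 3, 4, 5, 6, 7, 8, 9]
print(primes_through(10))  # [2, 3, 5, 7]
print(primes_through(11))  # [2, 3, 5, 7, 11]
```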
Dealing with Functions
There are a few more pdb commands to understand before you tackle larger Python programs. Specifically, you'll want to know how to deal with functions. Take, for example, the code in listing 2.
Listing 2: dict_iterate.py
#!/usr/bin/python
import pdb
pdb.set_trace()
def _PrintDict(a_dict):
    """Recursively iterate over a dictionary, printing results
    Args:
      a_dict: The dictionary to print
    """
    for item in a_dict:
        # a_dict (not dict) as the parameter name keeps the built-in dict type visible
        if type(a_dict[item]) == dict:
            _PrintDict(a_dict[item])
        else:
            print "%s: %s" % (a_dict[item], type(a_dict[item]))
def main():
    """Main routine"""
    aDict = {'color': 'blue',
             'count': 15,
             'cust_info': {'pid': '94758476', 'uid': '348576'},
             'style': 'fruit'}
    _PrintDict(aDict)
if __name__ == "__main__":
    main()
This should look vaguely familiar to anyone who read the previous column on Python. Start this program running and step through it with n. You'll see Python execute each def statement, creating a function object for each function. If you keep tracing with n ("next"), this program will end very quickly. That's because when n reaches a line that calls a function, it runs the entire call without stopping inside it. It then pauses at the next line of the current code. So stop tracing with n when you arrive at the call to main():
-> if __name__ == "__main__":
(Pdb) n
> /Users/marczak/dev/py/dict_iterate.py(30)<module>()
-> main()
We want to step into main(), so go ahead and enter s (for "step"). You should be greeted with:
-> def main():
showing that you're now looking at the definition of main(). Keep stepping with s, since we also want to step into the call to _PrintDict().
When you do arrive in the _PrintDict() function, there's a for loop. Once you've traversed that loop, you may no longer care about the rest of the function. You may just want to get back to where you were before entering it. pdb has a solution for you: r, for "return." Essentially, it means "finish up this function and return." pdb runs the rest of the function and stops just before it returns, displaying --Return--. One more n or s takes you back to the caller.
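Because pdb.Pdb accepts stdin and stdout arguments, you can script a session to see exactly where s and r stop. The sketch below is written in Python 3 syntax, and outer and inner are hypothetical functions. It runs three commands:

1. s enters inner(), and pdb reports --Call--.
2. r runs the rest of inner() and stops as it is about to return, reporting --Return--.
3. c lets the program finish.

```python
import io
import pdb


def inner(k):
    doubled = k * 2
    return doubled


def outer():
    value = inner(3)
    return value + 1


# s: step into inner(); r: run until inner() returns; c: continue to the end.
commands = io.StringIO("s\nr\nc\n")
transcript = io.StringIO()
debugger = pdb.Pdb(stdin=commands, stdout=transcript, nosigint=True, readrc=False)

result = debugger.runcall(outer)  # stops at the first line of outer()
output = transcript.getvalue()

print(result)                   # 7
print("--Call--" in output)     # True: s stopped on entry to inner()
print("--Return--" in output)   # True: r stopped as inner() was returning
```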
Be aware! Stepping into functions will sometimes have an unintended consequence: stepping into a library that you've imported. That is often not the code you're interested in debugging, though it may be. If you accidentally step into a library function (PyObjC code included), just remember the r command and return until you're back where you expect.
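Stepping into _PrintDict() is also a good way to catch a subtle naming pitfall. Suppose a function's parameter is called dict. Inside that function, the name dict refers to the argument, not the built-in type. A test like type(value) == dict then compares a type against a dictionary, so it is always False and the recursive branch never runs.

The sketch below shows the effect, alongside a version with a non-shadowing name that does recurse. It is written in Python 3 syntax. nested_keys_shadowed and walk_dict are hypothetical names, and walk_dict uses isinstance() rather than a type() comparison.

```python
def nested_keys_shadowed(dict):
    """The parameter named 'dict' hides the built-in dict type here."""
    return [key for key in dict if type(dict[key]) == dict]


def walk_dict(mapping, prefix=""):
    """Recursively collect 'key: value' strings, descending into sub-dictionaries."""
    lines = []
    for key in sorted(mapping):
        value = mapping[key]
        if isinstance(value, dict):  # 'dict' is the built-in type in this function
            lines.extend(walk_dict(value, prefix + key + "."))
        else:
            lines.append("%s%s: %s" % (prefix, key, value))
    return lines


a_dict = {'color': 'blue',
          'count': 15,
          'cust_info': {'pid': '94758476', 'uid': '348576'},
          'style': 'fruit'}

print(nested_keys_shadowed(a_dict))  # []: the nested dictionary is never detected
for line in walk_dict(a_dict):
    print(line)
```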
More pdb Features
You now know the core of pdb and can do some serious debugging. However, pdb offers a lot more. We'll save some of it for another column, but two more features are worth passing on now.
The easy way out: c, for "continue." At some point you may have traced through everything you wanted to, but not want to abort the program with a quit (q) command. In that case, use continue: it picks up and runs the remainder of the program. It stops again only if it reaches another breakpoint, such as a second pdb.set_trace() call.
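Even better, pdb runs your Python program in an interactive environment. Anything you type that isn't a pdb command is executed as a Python statement in the current context. So you can alter variables just by assigning to them. If a variable's name matches a pdb command, such as n in listing 1, prefix the statement with an exclamation point, as in !n = 7.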
(Pdb) x = 7
(Pdb) p x
7
Imagine the simple code in listing 3.
Listing 3: math.py
#!/usr/bin/python
import pdb
pdb.set_trace()
x = 5
for i in range(1, 10):
    print i + x
At any time after x gets assigned, you can reassign it. Your debugging session can look like this:
$ ./math.py
> /Users/marczak/dev/py/math.py(7)<module>()
-> x = 5
(Pdb) n
> /Users/marczak/dev/py/math.py(9)<module>()
-> for i in range(1, 10):
(Pdb)
> /Users/marczak/dev/py/math.py(10)<module>()
-> print i + x
(Pdb)
6
> /Users/marczak/dev/py/math.py(9)<module>()
-> for i in range(1, 10):
(Pdb) x = 20
(Pdb) n
> /Users/marczak/dev/py/math.py(10)<module>()
-> print i + x
(Pdb)
22
This is fantastic news if you want to test your code for fragility in scenarios where variables reach certain values. Keep in mind, though, that your change will be overwritten if the program later reassigns that variable itself.
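You can rehearse the same kind of experiment without typing by giving pdb a scripted stdin and using Pdb.run(). The sketch below is written in Python 3 syntax, and SOURCE and namespace are hypothetical names. It traces module-level code modeled on Listing 3, held in a string. The script steps until the first sum has been recorded, assigns x = 20 at the prompt, and continues, so the later sums use the new value.

```python
import io
import pdb

# Module-level code modeled on Listing 3; sums are collected instead of printed.
SOURCE = """
x = 5
results = []
for i in range(1, 4):
    results.append(i + x)
"""

# Step through the first append, reassign x at the prompt, then continue.
commands = io.StringIO("n\nn\nn\nn\nx = 20\nc\n")
transcript = io.StringIO()
debugger = pdb.Pdb(stdin=commands, stdout=transcript, nosigint=True, readrc=False)

namespace = {}
debugger.run(SOURCE, namespace)  # executes SOURCE under pdb, using namespace as globals

print(namespace["x"])        # 20
print(namespace["results"])  # later sums reflect x = 20
```

The statement x = 20 isn't a pdb command, so pdb executes it as Python in the traced code's namespace, just as it does interactively.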
Reference Sheet
Here's a handy list of the pdb topics discussed in this article:
pdb library: import pdb
Start tracing: pdb.set_trace()
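n: execute the next line (stepping over function calls)
Return (empty line): repeat the last command
s: step into a function
r: return from the current function
c: continue running the program
p: print the value of a variable or expression
l: list the source around the current line
q: quit pdb (aborting the program with bdb.BdbQuit)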
Conclusion
You may focus on shorter programs now, but as your skills improve, your programs should grow in complexity and size. At some point you'll likely confound yourself, and a debugger comes in very handy at those times. Fortunately, Python has a debugger that is friendly to use and easy to add to your program.
Media of the month: I've been in a "back to school" frame of mind, though really I'm just studying on my own. There's too much to learn, right? So I've taken the school approach: one topic a day, rotating through them and studying each night. Now, I'm not recommending anything this drastic, but I'd bet there's one subject that you've wanted to learn. Now is the time. Hit up your local bookstore, university or Amazon.com, find a book and go. If you're still an actual student, well, keep going!
Until next month, keep scripting.
Ed Marczak is the Executive Editor for MacTech Magazine, and has written the Mac in the Shell column since 2004.