Mac in the Shell: Learning Python on the Mac
Volume Number: 24
Issue Number: 11
Column Tag: Mac in the Shell
An important, built-in scripting language
by Edward Marczak
Introduction
Python is a scripting language that breaks the mold in many respects. It is fully object-oriented, although you're not forced into the "fully" part. It uses indentation to denote blocks of code, and it's named after Monty Python. It's an ideal scripting language for OS X. The next few Mac in the Shell columns will be dedicated to learning Python using OS X. If you're new to Python, or rusty on the basics, here's the place to start.
History
Although Python is enjoying a bit of a surge in popularity, it was created over 15 years ago, in 1990 by Guido van Rossum, and is quite mature. Again, think Monty Python, the comedy troupe, not the reptile variety. Python is designed to be relatively simple to learn. Also, thanks to being open source, there's a version of Python for just about any platform you can think of (and if there isn't already, you can get it running there, too!). You can leverage your Python skills wherever there's a Python interpreter, be it Windows, Linux, OS X or portable device.
OS X Tiger ships with Python version 2.3, and Leopard ships with version 2.5 (2.5.1, to be specific). If you need to use version 2.4 for compatibility with existing code, or for your company's standards, source and binary installers are available. See http://www.pythonmac.org/packages/ for a start, if needed.
Jumping In
Interestingly, much like the bash shell that this column has covered, the Python binary is both a run-time interpreter and an interactive shell. I'm going to present these examples by using bash in Terminal.app for now, and get into editors in future columns. So, let's jump in! Start by opening Terminal.app (found in Applications > Utilities). Type python and press return:
$ python
Python 2.5.1 (r251:54863, Apr 15 2008, 22:57:26)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
The python interpreter prints some introductory information, and then leaves you at a prompt. The triple greater-than prompt lets you know that you're in Python's interactive shell; python is now "listening." Test this out:
>>> 1+1
2
>>> a="blue"
>>> print a
blue
As you can see, the python interpreter is acting on our input, and displaying output where appropriate. Of course, what we type needs to be valid:
>>> call_home
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'call_home' is not defined
Variables defined in an interactive session persist only for the duration of that session:
>>> myName = "Ed"
>>> print "Hello, "+myName
Hello, Ed
>>> ^D
$ python
Python 2.5.1 (r251:54863, Apr 15 2008, 22:57:26)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print "Hello, "+myName
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'myName' is not defined
The "^D" on the fourth line represents pressing control-d, and is how we exit the interactive python interpreter. As you can see, the myName variable was lost when we exited the interpreter, and was unknown to the new interpreter. This makes the interactive python shell a great playground, where you really can't damage much. Despite this, I end up not using the interactive environment very often. There are a few quirks to using it, and it doesn't always model the way a script will run. Instead, I use a text editor and run the program at a bash prompt. For now, we can use vi/emacs/pico or whatever you're comfortable with. If you're a hard-line GUI person, but haven't settled on a GUI editor yet, the free TextWrangler from Bare Bones Software is very highly recommended (http://www.barebones.com/products/textwrangler/).
Hello, World!
It's probably not too surprising that I'd start off with the classic "hello, world" program. It certainly is an easy way to ensure that our basic environment is working and that we know how to code a basic example. In your text editor, enter the following:
#!/usr/bin/env python
print "Hello, World!"
Save the file as hello.py when done. If you're strictly using the shell, set the executable bit on the file:
$ chmod 770 hello.py
and run it:
$ ./hello.py
Hello, World!
(If you've opted to use TextWrangler, you can skip the chmod/run dance, and even skip save, and choose "Run" from the "#!" menu. Output will show up in a new file. If you want to be really cool, add a shortcut to Run using System Preferences > Keyboard & Mouse > Keyboard Shortcuts.) "Hello, World" is pretty basic, but it illustrates a few things:
Python is installed and working on your machine.
Python can be used as a run-time interpreter.
The most basic structure of a Python program.
You can copy from an article and type correctly.
Python is designed to be simple, with clear syntax. Not AppleScript-like syntax, but consistent and fairly obvious. It tends to be shorter, yet clearer than other scripting languages. Without explanation, look at the following code, and you can surely understand what it does:
#!/usr/bin/env python
a = 53
secondNum = 17
message = "The sum is: "
sum1 = a + secondNum
message = message + str(sum1)
print message
(This simply prints "The sum is: 70". Yes, it was needlessly long for demonstration purposes).
While a bit contrived, this is also a useful exercise, and shows us a few things about the language:
Identifiers (or, "variables") have no decorators (like a dollar-sign prefix).
Variable names must start with a letter or an underscore ('_'); the remainder can consist of letters (upper or lowercase), underscores or digits (0-9).
Variable case matters! "variable1" and "Variable1" are not the same variable name.
Python tends to do what you expect. While the plus sign adds two integers in the first occurrence above (a + secondNum), it is overloaded to also concatenate strings as seen in its second use above (message + str(sum1)).
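The last two points can be checked with a short sketch. The names total, Total and label are invented for illustration and are not part of the listing above. Each print is written with parentheses around a single value so that the sketch should also run on interpreters newer than the 2.x versions that ship with OS X.

```python
#!/usr/bin/env python
# Hypothetical names, chosen only to show that case matters.
total = 53
Total = 17            # a different variable from "total"
label = "The sum is: "

# The plus sign adds two integers...
both = total + Total

# ...and concatenates two strings, but it will not mix the two types.
try:
    message = label + both         # str + int raises TypeError
except TypeError:
    message = label + str(both)    # convert explicitly with str()

print(message)
```

The str() call in the original listing is there for exactly this reason: Python will not guess whether you meant addition or concatenation.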
What is not obvious from this example is a point I made earlier: Python is a fully object-oriented language, and everything is treated as an object (but don't feel objectified yourself!). Variables are names that refer to objects held in memory; former and current C/asm programmers should understand this implicitly. To demonstrate this, the id() function can be used to retrieve an object's identity, which in the standard Python interpreter is its current memory location. This time, I will use the interactive shell. Type python in a command shell to enter the Python interpreter.
Assign a string to the variable a:
>>> a = 'this is a string'
display it:
>>> a
'this is a string'
Let's see its memory address:
>>> id(a)
445040
Now, add on to a:
>>> a = a+" that is not a bulldog!"
and show the address of a again:
>>> id(a)
6223136
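The address is completely different! What happened? Strings are immutable, so the "a = a+..." command doesn't add on to the existing string in a. Instead, it creates a new string object from the result of the operation on the right hand side (RHS) of the equal sign, and then makes the name a refer to that new object. The old string is left behind, to be reclaimed once nothing refers to it.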
Not having to worry about underlying memory is a hallmark of most scripting languages. Python takes it to another level, though, by handling garbage collection/memory reclamation, allowing you access to those lower level details and handling memory efficiently. Watch this demonstration:
Create a variable:
>>> g=5
Create a second variable equal to the first:
>>> h=g
Print the second variable:
>>> h
5
Find out where they both live in memory:
>>> id(g)
8402264
>>> id(h)
8402264
However, reassign one of the variables that point to the same memory:
>>> h=7
...and the former is not altered:
>>> g
5
>>> id(h)
8402240
So, when one variable is assigned to another, both names simply point to the same location in memory. In this example, like the previous example, assigning something new to one of the variables does not alter that shared object; it makes the name refer to a different object, which will, most likely, live at a different location in memory. While you don't really have to care about where Python stores data in memory, this is a core concept in Python, so please ensure that you understand it before moving on.
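To tie the two demonstrations together, here is a minimal sketch that gives the string from the earlier example a second name before adding on to it. The variable b is an addition for illustration only, and the "is" operator is used as a shorthand for comparing identities.

```python
a = 'this is a string'
b = a                     # b is a second name for the same string object
shared_before = a is b    # True: both names refer to one object

# Build a new string and make the name a refer to it; b is not touched.
a = a + " that is not a bulldog!"
shared_after = a is b     # False: a now refers to the new object

print(b)                  # the original string is still intact
print(shared_before)
print(shared_after)
```

Because b still refers to the original string, that object stays alive; only the name a has moved on.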
Data Types
While no declaration of a variable is needed before use, Python is a strongly typed language: every object has a type, and an object's type cannot be changed once the object is created (a variable name, however, can later be assigned an object of a different type). As mentioned earlier, everything in Python is an object. Every object has the basic attributes of identity (memory address), type (how to treat what's in that memory address) and value (what is actually stored in that memory address). Certain objects are immutable, meaning that their value cannot change once assigned. Conversely, other types are mutable, and can change their value. What are these types? Here is a list of basic data types built into Python:
Numeric:
Integers: elements from the mathematical set of integers.
Plain Integer (int): range from at least -2147483648 through 2147483647 (the range can be larger on machines with a larger word size).
Long Integer: Basically gives an unlimited range, subject to available memory only (virtual included).
Boolean: False and True (which behave like 0 and 1, respectively).
Floating Point Numbers: Double-precision floating point numbers.
Complex Numbers: These represent complex numbers as a pair of machine-level double precision floating point numbers.
Sequences: Any data type in the sequence category is a finite ordered set, indexed by non-negative numbers.
Strings: Simply, an array of 8-bit bytes.
Unicode: An array of Unicode code units. In other words, a Unicode string.
Tuples: Immutable; this type is a true Python-ism, and represents an arbitrary list of objects, formed by comma-separated lists of expressions.
Lists: Mutable; a comma separated list of arbitrary Python objects.
Sets: Unlike sequences, sets are unordered, finite collections of unique, immutable objects. This also means that sets cannot be indexed like sequences can.
Sets: Mutable unordered collections.
Frozen Sets: Immutable unordered collections.
Mappings: Mappings are finite sets of objects, indexed by arbitrary index sets. This type is mutable and fast.
Dictionary: This type is a collection of key/data pairs. Also known as a hash table.
Files: Represents an open file. Remembering that everything in Unix is treated as a file, this type includes on-disk files, but also stdin, stdout and stderr as well.
Because everything is an object, the list is more expansive than I'm detailing here (for example, classes are a type, also). Custom types can also be defined and imported into a program, although custom types are made up of the foundational types. For our introductory purposes, though, this is just the right amount of information.
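As a rough illustration of the list above, the following sketch creates one object of several of the built-in types, asks each for its type name, and then contrasts a mutable list with an immutable tuple. The sample values and variable names are arbitrary choices made for this example.

```python
# One sample object for several of the built-in types listed above.
samples = [
    42,                 # plain integer
    3.14,               # floating point number
    2 + 3j,             # complex number
    True,               # Boolean
    "text",             # string
    (1, 2),             # tuple
    [1, 2],             # list
    set([1, 2]),        # set
    frozenset([1, 2]),  # frozen set
    {"key": "data"},    # dictionary
]
type_names = [type(item).__name__ for item in samples]
print(", ".join(type_names))

# A list is mutable: it changes in place and keeps its identity.
colors = ["red", "green"]
before = id(colors)
colors.append("blue")
same_list = (id(colors) == before)

# A tuple is immutable: assigning to an element raises TypeError.
point = (1, 2)
try:
    point[0] = 9
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(same_list)
print(tuple_changed)
```

The first line of output should read int, float, complex, bool, str, tuple, list, set, frozenset, dict, followed by True and False.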
You've already seen examples of strings and integers in the examples in this article. Since they're the most familiar and natural to most people, I'll continue to focus on those types for now.
Let's Talk about Strings
Strings in Python fall under the category of a sequence. In effect, they are arrays, and can be treated as such. Python has fast and effective string manipulations and idioms that you'll need to be comfortable with before moving on to anything even remotely advanced.
For this portion of the article, you can use the interactive interpreter to gain an appreciation of strings and string manipulation a little faster, and then we'll put it together afterwards. Get into a shell, type python, and look for the triple-greater-than prompt. The best way to learn about strings is through examples. Assign "I am a string" to the variable "text":
>>> text="I am a string"
Display it, to verify:
>>> text
'I am a string'
Select only the first character from the variable text:
>>> text[0]
'I'
(Remember: we start counting from 0.) Selecting part of a sequence is called a slice in Python. Slice of character range 2 through 4:
>>> text[2:4]
'am'
(A slice includes the first character specified up through but not including the specified ending character). Display element 5 through the remainder of the sequence:
>>> text[5:]
'a string'
Leave the start parameter empty to start from the beginning:
>>> text[:4]
'I am'
A slice can also accept an optional third parameter to specify a step:
>>> text[::2]
'Ia  tig'
(In other words, start at zero, run through the end, giving every second character). Negative values for the step start at the end of the sequence:
>>> text[::-1]
'gnirts a ma I'
In most other languages, there are dedicated sub-string functions to perform the kind of manipulations you've seen here. Python generalizes this by allowing slices to work for any sequence type. We'll see more of this as we get into other types.
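As a small preview of that generality, the sketch below applies the same slice notation to a list and a tuple. The sample data is invented for illustration.

```python
words = ["I", "am", "a", "list"]
trio = ("first", "second", "third")

middle = words[1:3]       # elements 1 and 2, just as with a string
backwards = words[::-1]   # a negative step walks from the end
leading = trio[:2]        # slicing a tuple gives back a tuple

print(middle)
print(backwards)
print(leading)
```

Slicing a list returns a list and slicing a tuple returns a tuple, in the same way that slicing a string returns a string.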
Strings can be concatenated and printed in a few different ways. Let's assign two variables as strings:
>>> a="Hello"
>>> b="there"
The print statement can use a comma, which separates the items with a space character in the output:
>>> print a,b
Hello there
The plus sign can be used to add strings together:
>>> print a+" "+b
Hello there
That's the raw basics of strings, with a little left over for next column.
How Python Works
Before we wrap up this column, I want to make it perfectly clear how this is all working and some unique things about Python. First, Python programs are simply text files. That's it; any text editor can be used to create and edit source code for a Python program. These text files typically are named with a .py suffix, but that's not technically necessary, although I'd urge you to get into the habit of doing so. These text files are run through the Python interpreter. Unlike many other scripting languages that read a line of source and then execute it, Python has a trick up its sleeve.
Before running source code, the Python interpreter creates an intermediate byte-compiled version of the program. This machine-independent version of the source is created automatically for you. It is this data that is then fed to a Python virtual machine. All of this is done to speed execution. When a byte-code compiled version of your code is created, it is saved alongside the source with a .pyc ("python compiled") extension. (Of course, it will only do this if you have write permissions to the directory with the source. If you don't, no big deal: python will create the byte-code in memory, and drop it on program completion.) Here's another speed-saving step: if python finds a byte-code compiled .pyc file that is still current with the source (in other words, the source code hasn't been touched since the .pyc file was created), python simply uses the already compiled .pyc file. For larger programs, this is a huge benefit in startup time, because the compile step is skipped; once loaded, the code runs at the same speed either way.
If you've been following along, and you created a "hello.py" program, but notice there's no corresponding .pyc file, you're right! Python only saves byte-code for source that is imported, a topic we'll cover next month. In the meantime, if you want to fool python into creating a byte-code compiled version, we can (although be aware that you do not need to do this in the real world; it is for illustration purposes only). Create a new program called shell.py in the same directory as hello.py, and type in the following short program:
#!/usr/bin/env python
import hello
Do the save-chmod dance outlined earlier, and run it. You'll see the familiar "Hello, World!" output. If you now list the directory, though, you should have a new file named hello.pyc. This is a complete representation of the source in hello.py, simply pre-compiled for the Python virtual machine. In fact, python will happily run this without the original source. Remove or otherwise rename hello.py. Now, ask python to run the byte-code:
$ python hello.pyc
Hello, World!
This ability will come in handy in other ways, something we'll cover in future columns.
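Besides the import trick, byte-compilation can also be requested explicitly. The following is a minimal sketch, assuming the standard library's py_compile module; the temporary directory and file names are placeholders chosen for this example, and the .pyc path is passed in explicitly so that the result does not depend on where a particular interpreter version prefers to store byte-code.

```python
import os
import py_compile
import shutil
import tempfile

# Work in a scratch directory so nothing in your own folders is touched.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "hello.py")
compiled = os.path.join(workdir, "hello.pyc")

# Write a tiny program to compile.
handle = open(source, "w")
handle.write('print("Hello, World!")\n')
handle.close()

# Byte-compile the source, writing the result to the path we name.
py_compile.compile(source, compiled)

created = os.path.exists(compiled)
has_content = created and os.path.getsize(compiled) > 0
print(created)
print(has_content)

# Clean up the scratch directory.
shutil.rmtree(workdir)
```

Like the shell.py example, this is for illustration; in normal use, Python handles byte-code for imported source on its own.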
Fin
While this was a simple, gentle introduction to Python, it presents the right foundation for us to build on. It also should give you enough to fool around with. Consider a bit of homework, just to make you go through the process: write a program that creates two integer variables, "start" and "end" and one string, "text". Have the program print a slice of the string using the variables and a print statement that precedes the string with "The slice is: ".
Overall, Python is easy, and makes a great first scripting language or introduction to programming. Next month, we'll cover the exercise, and get in a little deeper.
Media of the month: I don't think I've ever had a media selection tie into the column at all. However, for just this once, I'll go for it: "The Complete Monty Python's Flying Circus 16-Ton Megaset" on DVD by Eric Idle, John Cleese, Carol Cleveland, Terry Gilliam, and Terry Jones. Old fan or new, it's just funny.
Until next month, go practice some Python!
Ed Marczak is the Executive Editor of MacTech Magazine and the author of Mac OS X Advanced System Administration v10.5. Offline time is spent with his wife, daughters, rabbit, turtle and fish.