Adding Regular Expressions To Your Cocoa Application.

Volume Number: 19 (2003)
Issue Number: 4
Column Tag: Cocoa Development

Using MOKit to add the ability to match regular expressions in Cocoa.

by Ron Davis

Does your application need to parse data out of a bunch of text, or match strings that vary somewhat but follow a regular syntax? Do you have a Find command in your text editor? If so, you need to add regular expression matching to your app. Regular expressions are textual representations of string-matching patterns. They go beyond just finding a fixed string and let you do things like find a string that begins and ends with certain characters but can have anything in the middle, or a string that contains four digits followed by a letter.

I've been around the Mac a long time and never really thought about grep or regex or other commands that use regular expressions. But Mac OS X changes that. Every UNIX geek out there knows about grep and its various offspring. Scripting languages like Perl use regular expressions as well, so I thought I needed to learn about them. Once I did I was hooked, and wanted to use them in my own applications. That led me to Mike Ferris' MOKit, a Cocoa framework that lets you easily deal with regular expressions in your application.

Introduction to Regular Expressions

We'll start with a quick look at regular expression syntax for those of you who have no idea what I'm talking about. The introduction will be fast and shallow. If you need more information, check out the URLs in the Bibliography at the end of the article.

  
Symbol         Meaning                         Example

character      The character typed, with       a is a, b is b, etc.
               the exception of special
               characters.

[character-    Any one of a range of           [a-d] = a, b, c, or d.
 character]    characters.

.              Period matches any one
               character, except line
               breaks.

#              Matches any digit.              0,1,2,3,4,5,6,7,8,9

\r             return

\t             tab

\              The escape character, like      \. matches a period.
               in printf. Putting a            \\ matches a backslash.
               backslash in front of a
               special character allows
               that character to be matched.

?              0 or 1 of the previous          ca?t matches cat or ct,
               character.                      but not caat.

*              0 or more of the previous       ca*t matches ct, cat,
               character.                      caat, caaat.

+              1 or more of the previous       ca+t matches cat, caat,
               character.                      caaat, but not ct.

[^characters]  Any character but the ones      [^r23] matches any
               after the caret.                character but r, 2, or 3.

pattern |      Match pattern or pattern.       ca|t matches ca or t,
pattern                                        but not cat.

(pattern)      Matching: treats what is in     (ca)*t matches cat or
               the parentheses as a single     cacat, but not ct.
               character.
               Searching: delineates the       c(.*?)t on the string
               information to be remembered    coat returns "oa".
               in a find.

The last pattern there gives you a hint that regular expressions can be used in two different ways. One way is matching, where you have a string and you want to know whether it matches a regular expression. This returns a Boolean value: either the string matches or it doesn't. The other way to use regular expressions is to find a substring or substrings in a longer string. When you do this you give an expression and you specify what part of the matched string you want back by placing that part in parentheses.

Let's look at an example or two. Say you let the user input a ZIP code and you want to make sure they didn't put any letters in there. You could get their input string and compare it against the regular expression "#+", which matches 1 or more digits, but wouldn't match an empty string, nor one with letters in it. Note that this checks only that the input is all digits, not how many digits there are.
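To make the matching case concrete, here is a minimal sketch of the same digits-only check. It is not MOKit code: it is plain C (which compiles unchanged inside an Objective-C source file) using the POSIX regex.h functions regcomp and regexec as a stand-in. POSIX has no # shorthand, so the digit class is spelled [0-9], and the function name is_all_digits is made up for illustration.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when `input` is one or more digits and nothing else.
   The ^ and $ anchors force the pattern to cover the whole string,
   which is what "matching" (as opposed to "finding") means here. */
bool is_all_digits(const char *input)
{
    regex_t re;

    /* REG_NOSUB: we only want a yes/no answer, not match positions. */
    if (regcomp(&re, "^[0-9]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return false;                 /* pattern failed to compile */

    bool ok = (regexec(&re, input, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

Calling is_all_digits on "1234567" should succeed, while an empty string or "12a45" should be rejected.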

Now say you have an HTML tag for a link like <A HREF=http://www.radproductions.net/>RAD productions</A> and you want to pull out the URL. You could search with the regular expression "=(.*?)>" and you would get back http://www.radproductions.net/. You may wonder why the ? is there. On its own, ".*" means match 0 or more of any character, and because > is a character too, the match would run on to the last > in the string instead of stopping at the first one. This is called a greedy search. Putting the ? after the * tells it to match only as far as the next part of the expression.
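The greedy behavior is easy to see in a small experiment. The sketch below is again plain C with POSIX regex.h, not MOKit. POSIX extended expressions have no non-greedy ? modifier, so the sketch swaps in a different technique to get the same effect: the negated class [^>]* stops at the first >. The helper name first_capture is hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copies the text captured by the first parenthesized subexpression of
   `pattern`, applied to `text`, into `out` (truncating if needed).
   Returns 0 on success, -1 if the pattern is invalid, has no
   subexpression, or does not match. */
int first_capture(const char *pattern, const char *text,
                  char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];          /* m[0] = whole match, m[1] = first group */
    int rc = -1;

    if (outsz == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (re.re_nsub >= 1 && regexec(&re, text, 2, m, 0) == 0
        && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= outsz)
            len = outsz - 1;
        memcpy(out, text + m[1].rm_so, len);
        out[len] = '\0';
        rc = 0;
    }
    regfree(&re);
    return rc;
}
```

With the link tag above as input, the greedy pattern "=(.*)>" captures everything up to the final >, including the link text and most of the closing tag, while "=([^>]*)>" captures only the URL.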

MOKit

MOKit is a Cocoa framework written by Mike Ferris. It contains some text manipulation classes, one of which handles regular expressions. The underlying regular expression engine is actually a standard package written by Henry Spencer and used in one form or another by a lot of interesting things such as Tcl and Perl. MOKit classes are "not public domain, but they are free" according to the web page. The code can be downloaded at http://www.lorax.com/FreeStuff/MOKit.html. You can get both compiled frameworks and the source to MOKit. Version 2.6 was used for this article.

MOKit has two main parts, classes for text completion and classes for regular expressions. We'll only be talking about the regular expression classes here. These classes are MORegularExpression and MORegexFormatter. MORegularExpression is the main class for handling the evaluation of regular expressions. It is the one we'll use in our sample code. Here's its declaration.

Listing 1: MORegularExpression interface.

@interface MORegularExpression : NSObject <NSCopying, NSCoding> {
  @private
    NSString *_expressionString;
    NSString *_lastMatch;
    NSRange _lastSubexpressionRanges[MO_REGEXP_MAX_SUBEXPRESSIONS];
    void *_compiledExpression;
    BOOL _ignoreCase;
}
+ (BOOL)validExpressionString:(NSString *)expressionString;
+ (id)regularExpressionWithString:(NSString *)expressionString
                       ignoreCase:(BOOL)ignoreCaseFlag;
+ (id)regularExpressionWithString:(NSString *)expressionString;
- (id)initWithExpressionString:(NSString *)expressionString
                    ignoreCase:(BOOL)ignoreCaseFlag;
- (id)initWithExpressionString:(NSString *)expressionString;
- (NSString *)expressionString;
- (BOOL)matchesString:(NSString *)candidate;
- (NSRange)rangeForSubexpressionAtIndex:(unsigned)index
                               inString:(NSString *)candidate;
- (NSString *)substringForSubexpressionAtIndex:(unsigned)index
                                      inString:(NSString *)candidate;
- (NSArray *)subexpressionsForString:(NSString *)candidate;
@end

As you can see, it is a fairly simple class. To use a regular expression in your code you create an instance of this class. If you need to keep it around, using the initWithExpressionString methods will probably be easiest. If you're just going to use it in the scope of a single method, use the class methods regularExpressionWithString, so you won't have to deal with releasing. Both of these methods have twins that take an ignoreCase parameter which, if set to YES, will cause evaluations to ignore the case of the characters in the expression and the search string. If you don't explicitly set case sensitivity then searches are case sensitive. Here's an example of how to create an expression for finding HREFs in a string of HTML:

MORegularExpression*   linkURLExp = [MORegularExpression regularExpressionWithString: 
                                    @"<A HREF=.*?</A>" ignoreCase:YES];

If you want to make sure the expression you create is valid you can call the class method validExpressionString, which will return YES if the expression is a valid regular expression. If you want to know what an MORegularExpression object's expression is you can get it from the expressionString accessor.
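For readers without MOKit at hand, the two ideas above (checking that an expression is valid, and ignoring case) can be sketched in plain C with POSIX regex.h. This is only a rough analogue of validExpressionString and the ignoreCase flag, not MOKit's implementation; the function names is_valid_pattern and pattern_found are invented for this sketch.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Rough analogue of validExpressionString: true if `pattern` compiles. */
bool is_valid_pattern(const char *pattern)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    regfree(&re);
    return true;
}

/* Rough analogue of the ignoreCase flag: true if `pattern` is found
   anywhere in `text`, optionally without regard to letter case. */
bool pattern_found(const char *pattern, const char *text, bool ignore_case)
{
    regex_t re;
    int flags = REG_EXTENDED | REG_NOSUB | (ignore_case ? REG_ICASE : 0);

    if (regcomp(&re, pattern, flags) != 0)
        return false;                 /* invalid pattern: treat as no match */

    bool found = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

An unbalanced pattern such as "(ca" should be reported as invalid, and "<A HREF=" should be found in a lowercase "<a href=index.html>" only when ignore_case is true.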

Now we can actually do some evaluations. As I said previously, there are two ways to use regular expressions: to match a string and to find a sub-string. If you have a string and you want to make sure it conforms to the regular expression you created, you can pass it into matchesString and the result will tell you if it matches. This is what MORegexFormatter does. It is a formatter you can add to a field, and it will validate the value in that field against the regular expression you give it.

Getting sub-expressions is interesting. If you just want to find the location in the target string of a sub-string, you can use the rangeForSubexpressionAtIndex method. If you want the whole sub-string back as a new NSString*, use substringForSubexpressionAtIndex, passing the string you are searching in as the inString parameter. The index says which parenthesized value you want back. There can be 0 to 20 sets of parentheses in a MOKit expression, and the index indicates which one you want the range for. So you could create an expression like "<A HREF=(.*?)>(.*?)</A>" to search for a link in an HTML page. If you used the HTML in Listing 2 and asked for index 0, you would get the whole link tag: "<A HREF=http://www.radproductions.net/>R.A.D. Productions</A>". If you asked for index 1, you'd get the URL back: "http://www.radproductions.net/". If you asked for index 2, you'd get back the link text "R.A.D. Productions".

Listing 2: Sample HTML

<HTML>
<TITLE>R.A.D. Productions Home Page</TITLE>
<BODY>
<A HREF=http://www.radproductions.net/>R.A.D. Productions</A>
</BODY>
</HTML>

In a nutshell, that is all there is to finding sub-strings with MORegularExpression. The last method in the interface, subexpressionsForString, is there for backwards compatibility and I'm not even going to explain it.
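The index scheme (0 for the whole match, 1 and up for each set of parentheses) is the same one POSIX regexec uses for its match array, so it can be sketched in plain C without MOKit. The text_range struct and range_for_subexpression function below are hypothetical names that loosely mirror NSRange and rangeForSubexpressionAtIndex; the pattern uses [^>]* and [^<]* in place of .*? because POSIX has no non-greedy modifier.

```c
#include <regex.h>
#include <stddef.h>

/* A location/length pair, loosely modeled on NSRange. */
typedef struct {
    size_t location;
    size_t length;
} text_range;

/* Fills *out with the range of subexpression `index` (0 = whole match)
   for the first match of `pattern` in `text`, ignoring case.
   Returns 0 on success, -1 on a bad pattern, no match, or bad index. */
int range_for_subexpression(const char *pattern, const char *text,
                            size_t index, text_range *out)
{
    enum { MAX_GROUPS = 10 };         /* arbitrary limit for this sketch */
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    int rc = -1;

    if (index >= MAX_GROUPS
        || regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return -1;

    if (index <= re.re_nsub
        && regexec(&re, text, MAX_GROUPS, m, 0) == 0
        && m[index].rm_so != -1) {
        out->location = (size_t)m[index].rm_so;
        out->length = (size_t)(m[index].rm_eo - m[index].rm_so);
        rc = 0;
    }
    regfree(&re);
    return rc;
}
```

Run against the text of Listing 2 with the pattern "<A HREF=([^>]*)>([^<]*)</A>", index 1 should cover the URL and index 2 the link text, and an index with no matching set of parentheses should fail.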

There is one tricky thing about using MORegularExpression on a large amount of text. What happens if you want to find every link in an HTML page? substringForSubexpressionAtIndex is only going to return the first occurrence in the string. It turns out there is no way to say "start searching at character n in the candidate string." What I did was chop the front off the string after each search and then search the remainder for the next one. Here's my code to find all of the links and their URLs in an HTML page.

Listing 3: Finding all of the links.

-(void)handleHTML:(NSString*)inHTML
{
   // Matches one link. Subexpression 1 is the URL,
   // subexpression 2 is the link text.
   MORegularExpression*   bothExp = 
                        [MORegularExpression 
                        regularExpressionWithString:
                        @"<A HREF=(.*?)>(.*?)</A>" 
                        ignoreCase:YES];
   
   // Limits the search to the text between the <HTML> tags.
   MORegularExpression*   startStopExp = 
                        [MORegularExpression 
                        regularExpressionWithString:
                        @"<HTML>(.*?)</HTML>"];
   NSRange               range;
   NSString*            curString = [startStopExp 
                        substringForSubexpressionAtIndex:1
                        inString:inHTML];
   
   do 
   {
      // Index 0 is the whole link tag.
      range = [bothExp rangeForSubexpressionAtIndex:0
                     inString:curString ];
      if ( range.length > 0 )
         {
         NSString*   URLString;
         NSString*   linkString;
         NSURL*      fullURL;
         
         URLString = [bothExp 
                        substringForSubexpressionAtIndex:1
                        inString:curString];
         // Resolve relative URLs against the page's URL.
         fullURL = [NSURL URLWithString:URLString 
                        relativeToURL:baseURL];
         URLString = [fullURL absoluteString];
         
         linkString = [bothExp
                        substringForSubexpressionAtIndex:2
                        inString:curString];
         // Sending length to nil returns 0, so this also
         // rejects nil strings.
         if ( [linkString length] > 0 && 
               [URLString length] > 0 )
            {
            [self addURL:URLString withText:linkString];
            }
         // Drop everything up to the end of this match.
         curString = [curString substringFromIndex:
                        (range.location + range.length)];
         }
   }
   while ([curString length] > 0 && 
               range.location != NSNotFound );
}

A little explanation. The method is in a class that has a method addURL. The class also keeps two arrays, one for URLs and one for the link text. When you call addURL the URL and the link string are added to the arrays for future reference. The class also knows what the URL of the page you are parsing is, and saves it in a variable called baseURL.

The first thing the method does is set up our regular expression for links. Then it makes a new string that will contain only the text between the <HTML> tags. You can use this to limit the search to just a certain part of the page. Then it sets up a loop, which will always execute once and will end when we don't get anything back from our search, or we run out of HTML to parse. Inside the loop we first try to find our expression's range in the HTML. If it isn't there, we're done. If we find something, then we use our expression to get the sub-string for the URL. Sometimes a URL will be relative, so we use NSURL with the page's URL to create a full URL. Then we ask for the second index, which is the link text. If we get both, we add them to our list.

If we find something, then we need to search from the end of the string we found. So we create a sub-string from our current HTML string, that starts at the end of what we found and ends at the end of the current string. This effectively chops off everything from the beginning of the string to the end of what we just found. Then we loop.
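The same search, chop, and repeat loop can be sketched outside MOKit. The version below is plain C with POSIX regex.h, offered as an illustration of the technique and not as a replacement for Listing 3. Instead of building a new sub-string each time, it moves a pointer past the end of the previous match, which has the same effect. The link_span struct and find_links function are hypothetical names, and [^>]* and [^<]* stand in for .*? because POSIX has no non-greedy modifier.

```c
#include <regex.h>
#include <stddef.h>

/* Offsets and lengths of one link's URL and text within the page. */
typedef struct {
    size_t url_start, url_len;
    size_t text_start, text_len;
} link_span;

/* Finds up to `max_links` links in `html`, skipping any whose URL or
   text is empty. Returns the number of entries written to `out`. */
size_t find_links(const char *html, link_span *out, size_t max_links)
{
    regex_t re;
    regmatch_t m[3];          /* whole tag, URL, link text */
    size_t count = 0;
    const char *cur = html;   /* start of the not-yet-searched text */

    if (regcomp(&re, "<A HREF=([^>]*)>([^<]*)</A>",
                REG_EXTENDED | REG_ICASE) != 0)
        return 0;

    while (count < max_links && regexec(&re, cur, 3, m, 0) == 0) {
        size_t base = (size_t)(cur - html);
        size_t url_len = (size_t)(m[1].rm_eo - m[1].rm_so);
        size_t text_len = (size_t)(m[2].rm_eo - m[2].rm_so);

        if (url_len > 0 && text_len > 0) {
            out[count].url_start = base + (size_t)m[1].rm_so;
            out[count].url_len = url_len;
            out[count].text_start = base + (size_t)m[2].rm_so;
            out[count].text_len = text_len;
            count++;
        }
        /* Chop off everything through the end of this match. The
           match always contains the literal tag text, so it is never
           empty and the loop always makes progress. */
        cur += m[0].rm_eo;
    }
    regfree(&re);
    return count;
}
```

On a page with two well-formed links and one link with an empty URL, find_links should report two entries, in document order.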

Hopefully you've seen the coolness of regular expressions and want to use them in your Cocoa apps. MOKit makes this easy. So go to Mike Ferris' website, download it, and add regular expressions to your app.

Bibliography

Mastering Regular Expressions, Jeffrey E. F. Friedl, http://www.ora.com/catalog/regex2/

Using Regular Expressions, Stephen Ramsay, http://etext.lib.virginia.edu/helpsheets/regex.html

Regular Expressions specification, http://www.opengroup.org/onlinepubs/007908799/xbd/re.html

A Tao of Regular Expressions, http://sitescooper.org/tao_regexps.html

BBEdit Grep Tutorial, http://www.anybrowser.org/bbedit/grep.shtml


Ron Davis is a long time Mac programmer, having worked on everything from Virex Anti-Virus to CodeWarrior. His day job is working for Alsoft, and his evening job is R.A.D. Productions, makers of Suck It Down and FinderEye.

 

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.