
Adding Regular Expressions To Your Cocoa Application.

Volume Number: 19 (2003)
Issue Number: 4
Column Tag: Cocoa Development

Using MOKit to add the ability to match regular expressions in Cocoa.

by Ron Davis

Does your application need to parse data out of a bunch of text, or match strings that can vary some, but have a regular syntax? Do you have a Find command in your text editor? If you do, you need to add regular expression matching to your app. Regular expressions are textual representations of string matching patterns. They go beyond just finding a string and let you do things like find a string that begins and ends with certain characters, but can have anything in the middle. Or a string that contains four numbers followed by a letter.

I've been around the Mac a long time and never really thought about grep or regex or other commands that use regular expressions. But OS X changes that. Every UNIX geek out there knows about grep and its various offspring. Scripting languages like Perl use regular expressions as well, so I thought I needed to learn about them. Once I did I was hooked, and wanted to use them in my own applications. That led me to Mike Ferris' MOKit, a Cocoa framework that lets you easily deal with regular expressions in your application.

Introduction to Regular Expressions

We'll start with a quick look at regular expression syntax for those of you who have no idea what I'm talking about. The introduction will be fast and shallow. If you need more information check out the URL in the Bibliography at the end of the article.

  
Symbol         Meaning                       Example
character      The character typed,          a is a, b is b, etc.
               with the exception of
               special characters.

[character -   Any of a range of             [a-d] = a, b, c, or d.
 character]    characters.

.              Period matches any one
               character, except line
               breaks.

#              Matches any digit.            0,1,2,3,4,5,6,7,8,9

\r             return

\t             tab

\              The escape character, as      \. matches a period.
               in printf. Putting a          \\ matches a backslash.
               backslash in front of a
               special character allows
               that character to be
               matched.

?              0 or 1 of the previous        ca?t matches cat or ct,
               character.                    but not caat.

*              0 or more of the              ca*t matches ct, cat,
               previous character.           caat, caaat.

+              1 or more of the              ca+t matches cat, caat,
               previous character.           caaat, but not ct.

^              Inside brackets: any          [^r23] = any character
               character but the ones        but r, 2, or 3.
               after the caret.

pattern |      Match pattern or pattern.     ca|t matches ca or t,
pattern                                      but not cat.

(pattern)      Matching: treats what is      (ca)*t matches cat or
               in the parentheses as a       cacat, but not ct.
               single character.
               Searching: delineates the     c(.*?)t on the string
               information to be             coat returns "oa".
               remembered in a find.

The last pattern there gives you a hint that regular expressions can be used in two different ways. One way is matching, where you have a string and you want to know if it conforms to a regular expression. This returns a Boolean value: either the string matches or it doesn't. The other way to use regular expressions is to find a substring or substrings in a longer string. When you do this you give an expression and you specify what part of the matched string you want back by placing that part in parentheses.

Let's look at an example or two. Say you let the user input a five digit zip code and you want to make sure they didn't put any letters in there. You could get their input string and compare it against the regular expression "#+", which matches 1 or more digits, but wouldn't match an empty string, nor one with letters in it.
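The same kind of yes-or-no check can be sketched without MOKit. What follows is a minimal example in plain C (which compiles unchanged inside an Objective-C file), assuming the POSIX regcomp and regexec functions from <regex.h> rather than MOKit's own API. The function name is_all_digits is made up for illustration. POSIX syntax has no # shorthand, so the digit class is written [0-9], and the ^ and $ anchors (not covered in the table above) pin the pattern to the whole string.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when `input` is one or more ASCII digits and nothing else.
   Hypothetical helper for illustration; not part of MOKit. */
bool is_all_digits(const char *input)
{
    regex_t re;
    bool matched;

    assert(input != NULL);

    /* REG_NOSUB: only a yes/no answer is needed, not match offsets. */
    if (regcomp(&re, "^[0-9]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    matched = (regexec(&re, input, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

Like "#+", this accepts any number of digits; it checks what the characters are, not how many there are.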

Now say you have an HTML tag for a link like <A HREF=http://www.radproductions.net/>RAD productions</A> and you want to pull out the URL. You could search with the regular expression "=(.*?)>" and you would get back http://www.radproductions.net/. You may wonder why the ? is there. If you just put ".*", which means match 0 or more of any character, the match runs on to the last > in the string, because slashes and brackets are characters too. This is called a greedy search. Putting the ? after the * tells it to stop as soon as it finds the next part of the expression.
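Here is a rough sketch of the same extraction in plain C, again assuming POSIX <regex.h> instead of MOKit. POSIX extended syntax has no non-greedy ? modifier, so this version swaps in a different technique: the negated class [^>]* matches everything up to, but not including, the first >. The function name extract_href is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copies the text captured by the parenthesized group of "=([^>]*)>"
   into `out` (NUL-terminated, truncated to fit `out_size`).
   Returns 1 if the pattern was found, 0 otherwise.
   Hypothetical helper for illustration; not part of MOKit. */
int extract_href(const char *html, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = first group */
    int found = 0;

    assert(html != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    if (regcomp(&re, "=([^>]*)>", REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, html, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= out_size)
            len = out_size - 1;
        memcpy(out, html + m[1].rm_so, len);
        out[len] = '\0';
        found = 1;
    }

    regfree(&re);
    return found;
}
```

Called on the link tag above, the group captures the text between the = and the first >, which is the URL.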

MOKit

MOKit is a Cocoa framework written by Mike Ferris. It contains some text manipulation classes, one of which handles regular expressions. The underlying regular expression engine is actually a standard package written by Henry Spencer and used in one form or another by a lot of interesting things such as Tcl and Perl. MOKit classes are "not public domain, but they are free" according to the web page. The code can be downloaded at http://www.lorax.com/FreeStuff/MOKit.html. You can get both compiled frameworks and the source to MOKit. Version 2.6 was used for this article.

MOKit has two main parts, classes for text completion and classes for regular expressions. We'll only be talking about the regular expression classes here. These classes are MORegularExpression and MORegexFormatter. MORegularExpression is the main class for handling the evaluation of regular expressions. It is the one we'll use in our sample code. Here's its declaration.

Listing 1: MORegularExpression interface.

@interface MORegularExpression : NSObject <NSCopying, NSCoding> {
  @private
    NSString *_expressionString;
    NSString *_lastMatch;
    NSRange _lastSubexpressionRanges 
                           [MO_REGEXP_MAX_SUBEXPRESSIONS];
    void *_compiledExpression;
    BOOL _ignoreCase;
}
+ (BOOL)validExpressionString:(NSString *)expressionString;
+ (id)regularExpressionWithString:(NSString *)
               expressionString ignoreCase:(BOOL)ignoreCaseFlag;
+ (id)regularExpressionWithString:(NSString *)
               expressionString;
- (id)initWithExpressionString:(NSString *)expressionString
                ignoreCase:(BOOL)ignoreCaseFlag;
    
- (id)initWithExpressionString:(NSString *)
               expressionString;
- (NSString *)expressionString;
- (BOOL)matchesString:(NSString *)candidate;
- (NSRange)rangeForSubexpressionAtIndex:(unsigned)index
                inString:(NSString *)candidate;
- (NSString *)substringForSubexpressionAtIndex:
               (unsigned)index inString:(NSString *)candidate;
- (NSArray *)subexpressionsForString:(NSString *)candidate;
@end

As you can see, it is a fairly simple class. To use a regular expression in your code you create an instance of this class. If you need to keep it around, using the initWithExpressionString methods will probably be easiest. If you're just going to use it in the scope of a single method, use the class methods regularExpressionWithString, so you won't have to deal with releasing. Both of these methods have twins that take an ignoreCase parameter which, if set to YES, will cause evaluations to ignore the case of the characters in the expression and the search string. If you don't explicitly set case sensitivity then searches are case sensitive. Here's an example of how to create an expression for finding HREFs in a string of HTML:

MORegularExpression*   linkURLExp = [MORegularExpression regularExpressionWithString: 
                                    @"<A HREF=.*?</A>" ignoreCase:YES];

If you want to make sure the expression you create is valid you can call the class method validExpressionString, which will return YES if the expression is a valid regular expression. If you want to know what an MORegularExpression object's expression is you can get it from the expressionString accessor.
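The idea behind a validity check is simply "try to compile the expression and see whether that fails". As a hedged illustration in plain C, assuming POSIX <regex.h> rather than MOKit (whose internals are not shown here), a check might look like the sketch below. The name is_valid_pattern and the optional error buffer are assumptions made for the example.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `pattern` compiles as a POSIX extended regular
   expression. On failure, a readable message is written to `errbuf`
   when one is supplied. Hypothetical helper; not part of MOKit. */
bool is_valid_pattern(const char *pattern, char *errbuf, size_t errbuf_size)
{
    regex_t re;
    int rc;

    assert(pattern != NULL);

    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        if (errbuf != NULL && errbuf_size > 0)
            regerror(rc, &re, errbuf, errbuf_size);
        return false;   /* nothing to free after a failed compile */
    }

    regfree(&re);
    return true;
}
```

An expression such as "ca+t" passes, while one with an unbalanced parenthesis such as "(ca" is rejected.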

Now we can actually do some evaluations. As I said previously, there are two ways to use regular expressions: to match a string and to find a sub-string. If you have a string and you want to make sure it conforms to the regular expression you created, you can pass it into matchesString and the result will tell you if it matches. This is what MORegexFormatter does. It is a formatter you can add to a field and it will validate the value in that field against the regular expression you give it.

Getting sub-expressions is interesting. If you just want to find the location in the target string of a sub-string, you can use the rangeForSubexpressionAtIndex method. If you want the whole sub-string back as a new NSString*, you use substringForSubexpressionAtIndex, passing the string you are searching in as the inString parameter. The index says which parenthesized part you want back. There can be 0 to 20 sets of parentheses in a MOKit expression, and the index indicates which one you want the range for. So you could create an expression like "<A HREF=(.*?)>(.*?)</A>" to search for a link in an HTML page. If you used the HTML in Listing 2 and asked for index 0, you would get the whole link tag: "<A HREF=http://www.radproductions.net/>R.A.D. Productions</A>". If you asked for index 1, you'd get the URL back: "http://www.radproductions.net/". If you asked for index 2, you'd get back the link text: "R.A.D. Productions".

Listing 2: Sample HTML

<HTML>
<TITLE>R.A.D. Productions Home Page</TITLE>
<BODY>
<A HREF=http://www.radproductions.net/>R.A.D. Productions</A>
</BODY>
</HTML>

In a nutshell, that is all there is to finding sub-strings with MORegularExpression. The last method in the interface, subexpressionsForString, is there for backwards compatibility and I'm not even going to explain it.

There is one tricky thing about using MORegularExpression on a large amount of text. What happens if you want to find every link in an HTML page? substringForSubexpressionAtIndex is only going to return the first occurrence in the string. It turns out there is no way to say "start searching at character n" in the candidate string. What I did was chop the front off the string after each search to find the next occurrence. Here's my code to find all of the links and their URLs in an HTML page.

Listing 3: Finding all of the links.

-(void)handleHTML:(NSString*)inHTML
{
   // Matches one link; subexpression 1 is the URL, 2 is the link text.
   MORegularExpression*   bothExp = 
                        [MORegularExpression 
                        regularExpressionWithString:
                        @"<A HREF=(.*?)>(.*?)</A>" 
                        ignoreCase:YES];
   
   // Limits the search to the text between the <HTML> tags.
   MORegularExpression*   startStopExp = 
                        [MORegularExpression 
                        regularExpressionWithString:
                        @"<HTML>(.*?)</HTML>"];
   NSRange               range;
   NSString*            curString = [startStopExp 
                        substringForSubexpressionAtIndex:1
                        inString:inHTML];
   
   do 
   {
      // Index 0 is the whole match: the complete link tag.
      range = [bothExp rangeForSubexpressionAtIndex:0
                     inString:curString ];
      if ( range.length > 0 )
         {
         NSString*   URLString;
         NSString*   linkString;
         NSURL*      fullURL;
         
         URLString = [bothExp 
                        substringForSubexpressionAtIndex:1
                        inString:curString];
         // Resolve relative URLs against the page's URL.
         fullURL = [NSURL URLWithString:URLString 
                        relativeToURL:baseURL];
         URLString = [fullURL absoluteString];
         
         linkString = [bothExp
                        substringForSubexpressionAtIndex:2
                        inString:curString];
         // Messaging nil returns 0, so this also rejects nil strings.
         if ( [linkString length] > 0 && 
               [URLString length] > 0 )
            {
            [self addURL:URLString withText:linkString];
            }
         // Drop everything up to the end of this match.
         curString = [curString substringFromIndex:
                        (range.location + range.length)];
         }
   }
   while ([curString length] > 0 && 
               range.location != NSNotFound );
}

A little explanation. The method is in a class that has a method addURL. The class also keeps two arrays, one for URLs and one for the link text. When you call addURL the URL and the link string are added to the arrays for future reference. The class also knows what the URL of the page you are parsing is, and saves it in a variable called baseURL.

The first thing the method does is set up our regular expression for links. Then it makes a new string that will contain only the text between the <HTML> tags. You can use this to limit the search to just a certain part of the page. Then it sets up a loop, which will always execute once and will end when we don't get anything back from our search, or we run out of HTML to parse. Inside the loop we first try to find our expression's range in the HTML. If it isn't there, we're done. If we find something, then we use our expression to get the sub-string for the URL. Sometimes a URL will be relative, so we use NSURL with the page's URL to create a full URL. Then we ask for the second index, which is the link text. If we get both, we add them to our list.

If we find something, then we need to search from the end of the string we found. So we create a sub-string from our current HTML string, that starts at the end of what we found and ends at the end of the current string. This effectively chops off everything from the beginning of the string to the end of what we just found. Then we loop.
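The chop-and-search-again loop carries over to other regular expression libraries too. As a minimal sketch, here is the same idea in plain C using POSIX <regex.h> instead of MOKit. Rather than building a new string each time, it moves a pointer past the end of each match. The struct link type, the find_links name, and the fixed field size are all assumptions made for the example; the negated classes [^>]* and [^<]* stand in for the non-greedy .*?, which POSIX extended syntax lacks; and relative URLs are left unresolved.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define LINK_FIELD_SIZE 256

/* One link found in the page: the HREF value and the visible text.
   Hypothetical type for illustration. */
struct link {
    char url[LINK_FIELD_SIZE];
    char text[LINK_FIELD_SIZE];
};

/* Copy the span [rm_so, rm_eo) of `base` into `dst`, truncating if
   needed and always NUL-terminating. */
static void copy_span(char *dst, size_t dst_size,
                      const char *base, regmatch_t span)
{
    size_t len = (size_t)(span.rm_eo - span.rm_so);
    if (len >= dst_size)
        len = dst_size - 1;
    memcpy(dst, base + span.rm_so, len);
    dst[len] = '\0';
}

/* Fills `out` with up to `max_links` links and returns how many were found. */
size_t find_links(const char *html, struct link *out, size_t max_links)
{
    regex_t re;
    regmatch_t m[3];   /* m[0] = whole tag, m[1] = URL, m[2] = link text */
    const char *cursor = html;
    size_t count = 0;

    assert(html != NULL && out != NULL);

    if (regcomp(&re, "<A HREF=([^>]*)>([^<]*)</A>",
                REG_EXTENDED | REG_ICASE) != 0)
        return 0;

    /* Offsets in m[] are relative to `cursor`, the string just searched. */
    while (count < max_links && regexec(&re, cursor, 3, m, 0) == 0) {
        copy_span(out[count].url, sizeof out[count].url, cursor, m[1]);
        copy_span(out[count].text, sizeof out[count].text, cursor, m[2]);
        count++;
        cursor += m[0].rm_eo;   /* resume right after this match */
    }

    regfree(&re);
    return count;
}
```

A match here is never empty, so the pointer always moves forward and the loop ends once no further link is found or the output array is full.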

Hopefully you've seen the coolness of regular expressions and want to use them in your Cocoa apps. MOKit makes this easy. So go to Mike Ferris' website, download it, and add regular expressions to your app.

Bibliography

Mastering Regular Expressions, Jeffrey E. F. Friedl, http://www.ora.com/catalog/regex2/

Using Regular Expressions, Stephen Ramsay, http://etext.lib.virginia.edu/helpsheets/regex.html

Regular Expressions specification, http://www.opengroup.org/onlinepubs/007908799/xbd/re.html

A Tao of Regular Expressions, http://sitescooper.org/tao_regexps.html

BBEdit Grep Tutorial, http://www.anybrowser.org/bbedit/grep.shtml


Ron Davis is a long time Mac programmer, having worked on everything from Virex Anti-Virus to CodeWarrior. His day job is working for Alsoft, and his evening job is R.A.D. Productions, makers of Suck It Down and FinderEye.

 

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.