TweetFollow Us on Twitter

Adding Regular Expressions To Your Cocoa Application.

Volume Number: 19 (2003)
Issue Number: 4
Column Tag: Cocoa Development

Adding Regular Expressions To Your Cocoa Application.

Using MOKit to add the ability to match regular expressions in Cocoa.

by Ron Davis

Does your application need to parse data out of a bunch of text, or match strings that can vary some, but have a regular syntax? Do you have a Find command in your text editor? If you do you need to add regular expression matching to your app. Regular Expressions are textual representations of strings match pattern. They go beyond just finding a string and let you do things like find a string that begins and ends with certain characters, but can have anything in the middle. Or a string that contain four numbers followed by a letter.

I've been around the Mac a long time and never really thought about grep or regex or other commands that use regular expressions. But OSX changes that. Every UNIX geek out there knows about grep and it various offspring. Scripting languages like Perl use regular expressions as well, so I thought I needed to learn about them. Once I did I was hooked, and wanted to use them in my own applications. That lead me to Mike Ferris' MOKit, a Cocoa framework that lets you easily deal with regular expressions in your application.

Introduction to Regular Expressions

We'll start with a quick look at regular expression syntax for those of you who have no idea what I'm talking about. The introduction will be fast and shallow. If you need more information check out the URL in the Bibliography at the end of the article.

  
Symbol         Meaning                       Example
character      The character typed,          A is a, b is b, etc.
               with the exception of 
               special characters.

[character -   Any of a range of .           [a-d] = a,b,c, or d.
 character]    characters
 
.              Period matches any one 
               character, except line 
               breaks.
               
#              Matches any digit.            0,1,2,3,4,5,6,7,8,9 
   
\r             return

\t             tab 

\              The escape character like     \. matches a period. 
               in printf. Putting a slash    \\ matches a slash.
               in front of a special 
               character allows that 
               character to be matched.
               
?              0 or 1 of the previous .      ca?t, matches cat, or ct,
               characters                    but not caat.

*              0 or more of the              ca*t, matches ct, cat, 
               previous characters           caat, caaat.
               
+              1 or more of the              ca+t, matches cat, caat,
               previous characters           caaat, but not ct.
               
^              any character but the         (^r23) any character 
               ones after the carat.         but r, 2, or 3.
   
pattern |      match pattern or pattern.     ca|t, matches ca or t, 
pattern                                      but not cat.

(pattern)      Matching: treats what is      (ca)*t, matches cat, or 
               in the parenthesis as a       cacat, but not ct.
               single character.   c(*?)t,   on string coat, 
               Searching: delineates the     returns "oa".
               information to be 
               remembered in a find.

The last pattern there gives you a hint that regular expression can be used in two different ways. One way is matching, where you have a string and you want to know if it is equal to a regular expression. This returns a Boolean value, either the string matches or it doesn't. The other way to use regular expressions is to find a substring or strings in a longer string. When you do this you give an expression and you specify what part of the matched string you want back by placing that part in parentheses.

Let's look at an example or two. Say you let the user input a seven digit zip code and you want to make sure they didn't put any letters in there. You could get their input string and compare it against the regular expression "#+", which matches 1 or more digits, but wouldn't match an empty string, nor one with letters in it.

Now say you have an HTML tag for a link like <A HREF=http://www.radproductions.net/>RAD productions</A> and you wanted to pull out the URL. You could search with the regular expression "=(.*?)>" and you would get back http://www.radproductions.net. You may wonder why the ? is there. If you just put ".*", which means match 0 or more characters, you get to the end of the string because quotes and brackets are characters too. This is called a greedy search. Putting the ? tells it to only search until it finds the next part of the expression string.

MOKit

MOKit is a Cocoa framework written by Mike Ferris. It contains some text manipulation classes, one of which handles regular expressions. The underlying regular expression engine is actually a standard package written by Henry Spencer and used in one form or another by a lot of interesting things such as tcl and perl. MOKit classes are "not public domain, but they are free" according to the web page. The code can be downloaded at http://www.lorax.com/FreeStuff/MOKit.html. You can get both compiled frameworks and the source to MOKit. Version 2.6 was used for this article.

MOKit has two main parts, classes for text completion and classes for regular expressions. We'll only be talking about the regular expression classes here. These classes are MORegularExpression and MORegexFormatter. MORegularExpression is the main class for handling the evaluation of regular expressions. It is the one we'll use in our sample code. Here's its declaration.

Listing 1: MORegularExpression interface.

@interface MORegularExpression : NSObject <NSCopying, NSCoding> {
  @private
    NSString *_expressionString;
    NSString *_lastMatch;
    NSRange _lastSubexpressionRanges 
                           [MO_REGEXP_MAX_SUBEXPRESSIONS];
    void *_compiledExpression;
    BOOL _ignoreCase;
}
+ (BOOL)validExpressionString:(NSString *)expressionString;
+ (id)regularExpressionWithString:(NSString *)
               expressionString ignoreCase:(BOOL)ignoreCaseFlag;
+ (id)regularExpressionWithString:(NSString *)
               expressionString;
- (id)initWithExpressionString:(NSString *)expressionString
                ignoreCase:(BOOL)ignoreCaseFlag;
    
- (id)initWithExpressionString:(NSString *)
               expressionString;
- (NSString *)expressionString;
- (BOOL)matchesString:(NSString *)candidate;
- (NSRange)rangeForSubexpressionAtIndex:(unsigned)index
                inString:(NSString *)candidate;
- (NSString *)substringForSubexpressionAtIndex:
               (unsigned)index inString:(NSString *)candidate;
- (NSArray *)subexpressionsForString:(NSString *)candidate;
@end

As you can see, it is a fairly simple class. To use a regular expression in your code you create an instance of this class. If you need to keep it around, using the initWithExpressionString methods will probably be easiest. If you're just going to use it in the scope of a single method, use the class methods regularExpressionWithString, so you won't have to deal with releasing. Both of these methods have twins that take an ignoreCase parameter which, if set to YES, will cause evaluations to ignore the case of the characters in the expression and the search string. If you don't explicitly set case sensitivity then searches are case sensitive. Here's an example of how to create an expression for finding HREFs in a string of HTML:

MORegularExpression*   linkURLExp = [MORegularExpression regularExpressionWithString: 
                                    @"<A HREF=.*?</A>" ignoreCase:YES];

If you want to make sure the expression you create is valid you can call the class method validExpressionString, which will return YES if the expression is a valid regular expression. If you want to know what an MORegularExpression object's expression is you can get it from the expressionString accessor.

Now we can actually do some evaluations. As I said previously, there are two ways to use regular expressions, to match a string and to find a sub-string. If you have a string and you want to make sure it conforms to the regular expression you created, you can pass it into matchesString and the result will tell you if it matches. This is what MORegexFormatter does. It is a formatter you can add to a field and it will validate the value in that field by the regular expression you give it.

Getting sub-expressions is interesting. If you just want to find the location in the target string of a sub-string, you can use the rangeForSubexpressionAtIndex method. If you want the whole sub-string back as a new NSString* you use the substringForSubexpressionAtIndex, passing the string you are searching for in the inString parameter. The index is which value in parentheses you want back. There can be 0 to 20 sets of parentheses in a MOKit expression, and the index indicates which one you want the range for. So you could create an expression like "<A HREF=(.*?)>(.*?)</A>" to search for a link in an HTML page. If we used the HTML in Listing 2, and you asked for index 0 you would get the whole HREF tag: "<A HREF=http://www.radproductions.net/>R.A.D. Productions</A>". If you asked for index 1, you'd get the link back "http://www.radproductions.net/". If you asked for index 2, you'd get back the text "R.A.D. Productions".

Listing 2: Sample HTML

<HTML>
<TITLE>R.A.D. Productions Home Page</TITLE>
<BODY>
<A HREF=http://www.radproductions.net/>R.A.D. Productions</A>
</BODY>
</HTML>

In a nutshell, that is all there is to finding sub-strings with MORegularExpression. The last method in the interface, subexpressionsForString, is there for backwards compatibility and I'm not even going to explain it.

There is one tricky thing about using MORegularExpression in a large amount of text. What happens if you want to find every link in an HTML page? substringForSubexpressionAtIndex is only going to return the first occurrence in the string. Turns out there is no way to say, start searching at character n in the candidate string. What I did was truncate the string after each search to find the next one. Here's my code to find all of the links and their URL in an HTML page.

Listing 3: Finding all of the links.

-(void)handleHTML:(NSString*)inHTML
{
   MORegularExpression*   bothExp = 
                        [MORegularExpression 
                        regularExpressionWithString:
                        @"<A HREF=(.*?)>(.*?)</A>" 
                        ignoreCase:YES];
   
   MORegularExpression*   startStopExp = 
                        [MORegularExpression 
                        regularExpressionWithString:
                        @"<HTML>(.*?)</HTML>"];
   NSString*            result = nil;
   NSRange               range;
   NSString*            curString = [startStopExp 
                        substringForSubexpressionAtIndex:1
                        inString:inHTML];
   
   do 
   {
      range = [bothExp rangeForSubexpressionAtIndex:0
                     inString:curString ];
      if ( range.length > 0 )
         {
         NSString*   URLString;
         NSString*   linkString;
         NSURL*      fullURL;
         
         result = [linkURLExp 
                        substringForSubexpressionAtIndex:0
                        inString:curString];
         URLString = [bothExp 
                        substringForSubexpressionAtIndex:1
                        inString:curString];
         fullURL = [NSURL URLWithString:URLString 
                        relativeToURL:baseURL];
         URLString = [fullURL absoluteString];
         
         linkString = [bothExp
                        substringForSubexpressionAtIndex:2
                        inString:curString];
         if ( linkString == nil || 
               URLString == nil || 
               ([linkString length]== 0) || 
                     ([URLString length]== 0) )
            {} else 
            {
            [self addURL:URLString withText:linkString];
            }
         curString = [curString substringFromIndex:
                        (range.location + range.length)];
         }
   }
   while ([curString length] > 0 && 
               range.location != NSNotFound );
}

A little explanation. The method is in a class that has a method addURL. The class also keeps two arrays, one for URLs and one for the link text. When you call addURL the URL and the link string are added to the arrays for future reference. The class also knows what the URL of the page you are parsing is, and saves it in a variable called baseURL.

The first thing the method does is set up our regular expression for links. Then it makes a new string that will contain only the text between the <HTML> tag. You can use this to limit the search to just a certain part of the page. Then it sets up a loop, which will always execute once and will end when we don't get anything back from our search, or we run out of HTML to parse. Inside the loop we first try to find our expression's range in the HTML. If it isn't there, were done. If we find something, then we use our expression to get the sub-string for the URL. Some times a URL will be relative, so we use NSURL with the page's URL to create a full URL. Then we ask for the second index, which is the link text. If we get both, we add it to our list.

If we find something, then we need to search from the end of the string we found. So we create a sub-string from our current HTML string, that starts at the end of what we found and ends at the end of the current string. This effectively chops off everything from the beginning of the string to the end of what we just found. Then we loop.

Hopefully you've seen the coolness of regular expressions and want to use them in your Cocoa apps. MOKit makes this easy and is easy to use. So go to Mike Ferris' website and download it and add regular expressions to your app.

Bibliography

Mastering Regular Expressions, Jeffrey E. F. Friedl,

http://www.ora.com/catalog/regex2/

Using Regular Expressions, Stephen Ramsay,

http://etext.lib.virginia.edu/helpsheets/regex.html

Regular Expressions specification,

http://www.opengroup.org/onlinepubs/007908799/xbd/re.html

A Tao of Regular Expressions, http://sitescooper.org/tao_regexps.html

BBEdit Grep Tutorial, http://www.anybrowser.org/bbedit/grep.shtml


Ron Davis is a long time Mac programmer, having worked on everything from Virex Anti-Virus to CodeWarrior. His day job is working for Alsoft, and his evening job is R.A.D. Productions, makers of Suck It Down and FinderEye.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

Typinator 9.1 - Speedy and reliable text...
Typinator turbo-charges your typing productivity. Type a little. Typinator does the rest. We've all faced projects that require repetitive typing tasks. With Typinator, you can store commonly used... Read more
ESET Cyber Security 6.11.414.0 - Basic i...
ESET Cyber Security provides powerful protection against phishing, viruses, worms, and spyware. Offering similar functionality to ESET NOD32 Antivirus for Windows, ESET Cyber Security for Mac allows... Read more
Opera 105.0.4970.29 - High-performance W...
Opera is a fast and secure browser trusted by millions of users. With the intuitive interface, Speed Dial and visual bookmarks for organizing favorite sites, news feature with fresh, relevant content... Read more
Slack 4.35.131 - Collaborative communica...
Slack brings team communication and collaboration into one place so you can get more work done, whether you belong to a large enterprise or a small business. Check off your to-do list and move your... Read more
Viber 21.5.0 - Send messages and make fr...
Viber lets you send free messages and make free calls to other Viber users, on any device and network, in any country! Viber syncs your contacts, messages and call history with your mobile device, so... Read more
Hazel 5.3 - Create rules for organizing...
Hazel is your personal housekeeper, organizing and cleaning folders based on rules you define. Hazel can also manage your trash and uninstall your applications. Organize your files using a familiar... Read more
Duet 3.15.0.0 - Use your iPad as an exte...
Duet is the first app that allows you to use your iDevice as an extra display for your Mac using the Lightning or 30-pin cable. Note: This app requires a iOS companion app. Release notes were... Read more
DiskCatalogMaker 9.0.3 - Catalog your di...
DiskCatalogMaker is a simple disk management tool which catalogs disks. Simple, light-weight, and fast Finder-like intuitive look and feel Super-fast search algorithm Can compress catalog data for... Read more
Maintenance 3.1.2 - System maintenance u...
Maintenance is a system maintenance and cleaning utility. It allows you to run miscellaneous tasks of system maintenance: Check the the structure of the disk Repair permissions Run periodic scripts... Read more
Final Cut Pro 10.7 - Professional video...
Redesigned from the ground up, Final Cut Pro combines revolutionary video editing with a powerful media organization and incredible performance to let you create at the speed of thought.... Read more

Latest Forum Discussions

See All

‘Sonic Dream Team’ Apple Arcade Review –...
What an unusual day we have arrived upon today. Now, Sonic the Hedgehog games aren’t a new thing for iOS gaming. The original Sonic the Hedgehog appeared on the classic iPod, so the Blue Blur got in the doors as fast as you would expect him to. The... | Read more »
PvP Basketball Game ‘NBA Infinite’ Annou...
Level Infinite and Lightspeed Studios just announced a new real-time PvP basketball game for mobile in the form of NBA Infinite (). NBA Infinite includes solo modes as well, collecting and upgrading current NBA players, managing teams, and more. It... | Read more »
New ‘Dysmantle’ iOS Update Adds Co-Op Mo...
We recently had a major update hit mobile for the open world survival and crafting adventure game Dysmantle ($4.99) from 10tons Ltd. Dysmantle was one of our favorite games of 2022, and with all of its paid DLC and updates, it is even better. | Read more »
PUBG Mobile pulls a marketing blinder wi...
Over the years, there have been a lot of different marketing gimmicks tried by companies and ambassadors, some of them land like Snoop Dog and his recent smoking misdirection, and some are just rather frustrating, let’s no lie. Tencent, however,... | Read more »
‘Goat Simulator 3’ Mobile Now Available...
Coffee Stain Publishing and Coffee Stain Malmo, the new mobile publishing studio have just released Goat Simulator 3 on iOS and Android as a premium release. Goat Simulator 3 debuted on PS5, Xbox Series X|S, and PC platforms. This is the second... | Read more »
‘Mini Motorways’ Huge Aurora Borealis Up...
Mini Motorways on Apple Arcade, Nintendo Switch, and Steam has gotten a huge update today with the Aurora Borealis patch bringing in Reykjavik, new achievements, challenges, iCloud improvements on Apple Arcade, and more. Mini Motorways remains one... | Read more »
Fan-Favorite Action RPG ‘Death’s Door’ i...
Last month Netflix revealed during their big Geeked Week event a number of new titles that would be heading to their Netflix Games service. Among them was Acid Nerve and Devolver Digital’s critically acclaimed action RPG Death’s Door, and without... | Read more »
SwitchArcade Round-Up: Reviews Featuring...
Hello gentle reader, and welcome to the SwitchArcade Round-Up for December 4th, 2023. I’ve been catching up on my work as much as possible lately, and that translates to a whopping six reviews for you to read today. The list includes Astlibra... | Read more »
‘Hi-Fi Rush’ Anniversary Interview: Dire...
Back in January, Tango Gameworks and Bethesda released one of my favorite games of all time with Hi-Fi Rush. As someone who adores character action and rhythm games, blending both together seemed like a perfect fit for my taste, but Hi-Fi Rush did... | Read more »
Best iPhone Game Updates: ‘Pizza Hero’,...
Hello everyone, and welcome to the week! It’s time once again for our look back at the noteworthy updates of the last seven days. Things are starting to chill out for the year, but we still have plenty of holiday updates ahead of us I’m sure. Some... | Read more »

Price Scanner via MacPrices.net

Apple is clearing out last year’s M1-powered...
Apple has Certified Refurbished 11″ M1 iPad Pros available starting at $639 and ranging up to $310 off Apple’s original MSRP. Each iPad Pro comes with Apple’s standard one-year warranty, features a... Read more
Save $50 on these HomePods available today at...
Apple has Certified Refurbished White and Midnight HomePods available for $249, Certified Refurbished. That’s $50 off MSRP and the lowest price currently available for a full-size Apple HomePod this... Read more
New 16-inch M3 Pro MacBook Pros are on sale f...
Holiday MacBook deals are live at B&H Photo. Apple 16″ MacBook Pros with M3 Pro CPUs are in stock and on sale for $200-$250 off MSRP. Their prices are among the lowest currently available for... Read more
Christmas Deal Alert! Apple AirPods Pro with...
Walmart has Apple’s 2023 AirPods Pro with USB-C in stock and on sale for $189.99 on their online store as part of their Holiday sale. Their price is $60 off MSRP, and it’s currently the lowest price... Read more
Apple has Certified Refurbished iPhone 12 Pro...
Apple has unlocked Certified Refurbished iPhone 12 Pro models in stock starting at $589 and ranging up to $350 off original MSRP. Apple includes a standard one-year warranty and new outer shell with... Read more
Holiday Sale: Take $50 off every 10th-generat...
Amazon has Apple’s 10th-generation iPads on sale for $50 off MSRP, starting at $399, as part of their Holiday Sale. Their discount applies to all models and all colors. With the discount, Amazon’s... Read more
The latest Mac mini Holiday sales, get one to...
Apple retailers are offering Apple’s M2 Mac minis for $100 off MSRP as part of their Holiday sales. Prices start at only $499. Here are the lowest prices available: (1): Amazon has Apple’s M2-powered... Read more
Save $300 on a 24-inch iMac with these Certif...
With the recent introduction of new M3-powered 24″ iMacs, Apple dropped prices on clearance M1 iMacs in their Certified Refurbished store. Models are available starting at $1049 and range up to $300... Read more
Apple M1-powered iPad Airs are back on Holida...
Amazon has 10.9″ M1 WiFi iPad Airs back on Holiday sale for $100 off Apple’s MSRP, with prices starting at $499. Each includes free shipping. Their prices are the lowest available among the Apple... Read more
Sunday Sale: Apple 14-inch M3 MacBook Pro on...
B&H Photo has new 14″ M3 MacBook Pros, in Space Gray, on Holiday sale for $150 off MSRP, only $1449. B&H offers free 1-2 day delivery to most US addresses: – 14″ 8-Core M3 MacBook Pro (8GB... Read more

Jobs Board

Mobile Platform Engineer ( *Apple* /AirWatch)...
…systems, installing and maintaining certificates, navigating multiple network segments and Apple /IOS devices, Mobile Device Management systems such as AirWatch, and Read more
Omnichannel Associate - *Apple* Blossom Mal...
Omnichannel Associate - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Read more
Senior Product Manager - *Apple* - DISH Net...
…Responsibilities** We are seeking an ambitious, data-driven thinker to assist the Apple Product Development team as our Wireless Product division continues to grow Read more
Senior Product Manager - *Apple* - DISH Net...
…Responsibilities** We are seeking an ambitious, data-driven thinker to assist the Apple Product Development team as our Wireless Product division continues to grow Read more
Senior Software Engineer - *Apple* Fundamen...
…center of Microsoft's efforts to empower our users to do more. The Apple Fundamentals team focused on defining and improving the end-to-end developer experience in Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.