Adding Regular Expressions To Your Cocoa Application.

Volume Number: 19 (2003)
Issue Number: 4
Column Tag: Cocoa Development

Using MOKit to add the ability to match regular expressions in Cocoa.

by Ron Davis

Does your application need to parse data out of a bunch of text, or match strings that can vary somewhat but follow a regular syntax? Do you have a Find command in your text editor? If so, you need to add regular expression matching to your app. Regular expressions are textual representations of string-matching patterns. They go beyond just finding a string and let you do things like find a string that begins and ends with certain characters but can have anything in the middle, or a string that contains four numbers followed by a letter.

I've been around the Mac a long time and never really thought about grep or regex or other commands that use regular expressions. But OS X changes that. Every UNIX geek out there knows about grep and its various offspring. Scripting languages like Perl use regular expressions as well, so I thought I needed to learn about them. Once I did, I was hooked and wanted to use them in my own applications. That led me to Mike Ferris' MOKit, a Cocoa framework that lets you easily deal with regular expressions in your application.

Introduction to Regular Expressions

We'll start with a quick look at regular expression syntax for those of you who have no idea what I'm talking about. The introduction will be fast and shallow. If you need more information, check out the URLs in the Bibliography at the end of the article.

  
Symbol         Meaning                       Example
character      The character typed,          a matches a, b matches b,
               with the exception of         etc.
               special characters.

[character-    Any one of a range of         [a-d] matches a, b, c, or d.
 character]    characters.

.              Any one character, except
               line breaks.

#              Any digit.                    0, 1, 2, 3, 4, 5, 6, 7, 8, 9

\r             Return.

\t             Tab.

\              The escape character, as      \. matches a period.
               in printf. Putting a          \\ matches a backslash.
               backslash in front of a
               special character allows
               that character to be
               matched.

?              0 or 1 of the previous        ca?t matches cat or ct,
               character.                    but not caat.

*              0 or more of the              ca*t matches ct, cat,
               previous character.           caat, caaat.

+              1 or more of the              ca+t matches cat, caat,
               previous character.           caaat, but not ct.

[^characters]  Any character but the         [^r23] matches any character
               ones after the caret.         but r, 2, or 3.

pattern |      Matches either pattern.       ca|t matches ca or t,
pattern                                      but not cat.

(pattern)      Matching: treats what is      (ca)*t matches cat or
               in the parentheses as a       cacat, but not ct.
               single character.

               Searching: delineates the     c(.*?)t, on the string
               information to be             coat, returns "oa".
               remembered in a find.

The last pattern there gives you a hint that regular expressions can be used in two different ways. One way is matching, where you have a string and you want to know if it is equal to a regular expression. This returns a Boolean value: either the string matches or it doesn't. The other way to use regular expressions is to find a substring or strings in a longer string. When you do this, you give an expression and you specify what part of the matched string you want back by placing that part in parentheses.

Let's look at an example or two. Say you let the user input a five-digit ZIP code and you want to make sure they didn't put any letters in there. You could get their input string and compare it against the regular expression "#+", which matches 1 or more digits, but wouldn't match an empty string, nor one with letters in it.

Now say you have an HTML tag for a link like <A HREF=http://www.radproductions.net/>RAD productions</A> and you want to pull out the URL. You could search with the regular expression "=(.*?)>" and you would get back http://www.radproductions.net/. You may wonder why the ? is there. If you just put ".*", which means match 0 or more of any character, the match keeps going to the last > in the string, because brackets are characters too. This is called a greedy search. Putting the ? after the * tells it to stop as soon as it finds the next part of the expression string.
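To see the difference between the two searches side by side, here is a minimal sketch that runs both patterns over the same tag. It uses the MORegularExpression class described in the MOKit section below, calling only methods declared in Listing 1. The MOKit/MOKit.h umbrella header and the helper names FirstGroup and CompareGreedyAndLazy are assumptions made for illustration.

```objc
#import <Foundation/Foundation.h>
#import <MOKit/MOKit.h>   // assumption: MOKit's umbrella header

// Hypothetical helper: returns whatever the first set of
// parentheses in the pattern captured, or nil if nothing matched.
static NSString *FirstGroup(NSString *pattern, NSString *text)
{
    MORegularExpression *exp =
        [MORegularExpression regularExpressionWithString:pattern];
    return [exp substringForSubexpressionAtIndex:1 inString:text];
}

// Hypothetical helper: logs what each pattern captures.
static void CompareGreedyAndLazy(void)
{
    NSString *tag =
        @"<A HREF=http://www.radproductions.net/>RAD productions</A>";

    // With the ?, the capture should stop at the first >.
    NSLog(@"non-greedy: %@", FirstGroup(@"=(.*?)>", tag));

    // Without it, the capture should run on to the last >.
    NSLog(@"greedy:     %@", FirstGroup(@"=(.*)>", tag));
}
```

Comparing the two log lines is a quick way to check how a pattern behaves before building it into your application.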

MOKit

MOKit is a Cocoa framework written by Mike Ferris. It contains some text manipulation classes, one of which handles regular expressions. The underlying regular expression engine is actually a standard package written by Henry Spencer and used in one form or another by a lot of interesting things such as Tcl and Perl. MOKit classes are "not public domain, but they are free" according to the web page. The code can be downloaded at http://www.lorax.com/FreeStuff/MOKit.html. You can get both compiled frameworks and the source to MOKit. Version 2.6 was used for this article.

MOKit has two main parts, classes for text completion and classes for regular expressions. We'll only be talking about the regular expression classes here. These classes are MORegularExpression and MORegexFormatter. MORegularExpression is the main class for handling the evaluation of regular expressions. It is the one we'll use in our sample code. Here's its declaration.

Listing 1: MORegularExpression interface.

@interface MORegularExpression : NSObject <NSCopying, NSCoding> {
  @private
    NSString *_expressionString;
    NSString *_lastMatch;
    NSRange _lastSubexpressionRanges 
                           [MO_REGEXP_MAX_SUBEXPRESSIONS];
    void *_compiledExpression;
    BOOL _ignoreCase;
}
+ (BOOL)validExpressionString:(NSString *)expressionString;
+ (id)regularExpressionWithString:(NSString *)
               expressionString ignoreCase:(BOOL)ignoreCaseFlag;
+ (id)regularExpressionWithString:(NSString *)
               expressionString;
- (id)initWithExpressionString:(NSString *)expressionString
                ignoreCase:(BOOL)ignoreCaseFlag;
    
- (id)initWithExpressionString:(NSString *)
               expressionString;
- (NSString *)expressionString;
- (BOOL)matchesString:(NSString *)candidate;
- (NSRange)rangeForSubexpressionAtIndex:(unsigned)index
                inString:(NSString *)candidate;
- (NSString *)substringForSubexpressionAtIndex:
               (unsigned)index inString:(NSString *)candidate;
- (NSArray *)subexpressionsForString:(NSString *)candidate;
@end

As you can see, it is a fairly simple class. To use a regular expression in your code you create an instance of this class. If you need to keep it around, using the initWithExpressionString methods will probably be easiest. If you're just going to use it in the scope of a single method, use the class methods regularExpressionWithString, so you won't have to deal with releasing. Both of these methods have twins that take an ignoreCase parameter which, if set to YES, will cause evaluations to ignore the case of the characters in the expression and the search string. If you don't explicitly set case sensitivity then searches are case sensitive. Here's an example of how to create an expression for finding HREFs in a string of HTML:

MORegularExpression*   linkURLExp = [MORegularExpression regularExpressionWithString: 
                                    @"<A HREF=.*?</A>" ignoreCase:YES];

If you want to make sure the expression you create is valid you can call the class method validExpressionString, which will return YES if the expression is a valid regular expression. If you want to know what an MORegularExpression object's expression is you can get it from the expressionString accessor.
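Put together, checking and creating an expression might look like the sketch below. It relies only on the methods declared in Listing 1. The MOKit/MOKit.h umbrella header and the function name LinkExpression are assumptions for illustration, not part of MOKit.

```objc
#import <Foundation/Foundation.h>
#import <MOKit/MOKit.h>   // assumption: MOKit's umbrella header

// Hypothetical helper: returns a case-insensitive expression for
// HTML links, or nil if MOKit rejects the pattern.
static MORegularExpression *LinkExpression(void)
{
    NSString *pattern = @"<A HREF=(.*?)>(.*?)</A>";

    // Validate first, so a typo in the pattern is caught here
    // rather than showing up later as a search that never matches.
    if (![MORegularExpression validExpressionString:pattern]) {
        NSLog(@"Not a valid regular expression: %@", pattern);
        return nil;
    }

    MORegularExpression *exp =
        [MORegularExpression regularExpressionWithString:pattern
                                              ignoreCase:YES];

    // The accessor hands back the pattern the object was built from.
    NSLog(@"Created expression: %@", [exp expressionString]);
    return exp;
}
```

Validating up front matters most when the pattern comes from the user, for example from a Find panel, rather than from a string constant in your code.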

Now we can actually do some evaluations. As I said previously, there are two ways to use regular expressions: to match a string and to find a sub-string. If you have a string and you want to make sure it conforms to the regular expression you created, you can pass it into matchesString and the result will tell you if it matches. This is what MORegexFormatter does. It is a formatter you can add to a field, and it will validate the value in that field against the regular expression you give it.
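As a small illustration of matching, the sketch below checks that a string holds nothing but digits, along the lines of the ZIP code example earlier. It assumes, as described above, that matchesString tests the whole candidate string rather than searching inside it. The MOKit/MOKit.h header and the function name IsAllDigits are hypothetical, and the pattern uses the [0-9] range form from the syntax table so the sketch does not depend on how the engine treats #.

```objc
#import <Foundation/Foundation.h>
#import <MOKit/MOKit.h>   // assumption: MOKit's umbrella header

// Hypothetical helper: YES if the input is one or more digits
// and nothing else.
static BOOL IsAllDigits(NSString *input)
{
    // [0-9]+ means "one or more characters in the range 0 to 9".
    MORegularExpression *digits =
        [MORegularExpression regularExpressionWithString:@"[0-9]+"];

    // Assumption: matchesString: compares the entire string with
    // the expression, as the article describes for matching.
    return [digits matchesString:input];
}
```

You might call this from a text field's action method and beep or show an alert when it returns NO.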

Getting sub-expressions is interesting. If you just want to find the location of a sub-string in the target string, you can use the rangeForSubexpressionAtIndex method. If you want the sub-string itself back as a new NSString*, use substringForSubexpressionAtIndex, passing the string you are searching in as the inString parameter. The index says which parenthesized value you want back. There can be 0 to 20 sets of parentheses in a MOKit expression, and the index indicates which one you want the range for. So you could create an expression like "<A HREF=(.*?)>(.*?)</A>" to search for a link in an HTML page. If you used the HTML in Listing 2 and asked for index 0, you would get the whole HREF tag: "<A HREF=http://www.radproductions.net/>R.A.D. Productions</A>". If you asked for index 1, you'd get the link back: "http://www.radproductions.net/". If you asked for index 2, you'd get back the text "R.A.D. Productions".
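The indexes are easier to keep straight with a little code in front of you. This sketch wraps the link expression in a helper that returns one piece of the first link it finds. It uses only methods from Listing 1. The MOKit/MOKit.h header and the names LinkPart and LogFirstLink are assumptions for illustration.

```objc
#import <Foundation/Foundation.h>
#import <MOKit/MOKit.h>   // assumption: MOKit's umbrella header

// Hypothetical helper: returns one piece of the first link in html.
//   index 0 = the whole tag
//   index 1 = the URL (first set of parentheses)
//   index 2 = the link text (second set of parentheses)
static NSString *LinkPart(NSString *html, unsigned index)
{
    MORegularExpression *linkExp =
        [MORegularExpression
            regularExpressionWithString:@"<A HREF=(.*?)>(.*?)</A>"
                             ignoreCase:YES];
    return [linkExp substringForSubexpressionAtIndex:index
                                            inString:html];
}

// Hypothetical helper: logs all three pieces of the first link.
static void LogFirstLink(NSString *html)
{
    NSLog(@"tag:  %@", LinkPart(html, 0));
    NSLog(@"URL:  %@", LinkPart(html, 1));
    NSLog(@"text: %@", LinkPart(html, 2));
}
```

In real code you would create the expression once and reuse it rather than rebuilding it on every call. It is rebuilt here only to keep the sketch short.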

Listing 2: Sample HTML

<HTML>
<TITLE>R.A.D. Productions Home Page</TITLE>
<BODY>
<A HREF=http://www.radproductions.net/>R.A.D. Productions</A>
</BODY>
</HTML>

In a nutshell, that is all there is to finding sub-strings with MORegularExpression. The last method in the interface, subexpressionsForString, is there for backwards compatibility and I'm not even going to explain it.

There is one tricky thing about using MORegularExpression on a large amount of text. What happens if you want to find every link in an HTML page? substringForSubexpressionAtIndex is only going to return the first occurrence in the string. It turns out there is no way to say "start searching at character n in the candidate string." What I did was truncate the string after each search to find the next one. Here's my code to find all of the links and their URLs in an HTML page.

Listing 3: Finding all of the links.

-(void)handleHTML:(NSString*)inHTML
{
   // Matches a whole link. Index 1 is the URL, index 2 the link text.
   MORegularExpression*   bothExp = 
                        [MORegularExpression 
                        regularExpressionWithString:
                        @"<A HREF=(.*?)>(.*?)</A>" 
                        ignoreCase:YES];
   
   // Limits the search to the text between the <HTML> tags.
   MORegularExpression*   startStopExp = 
                        [MORegularExpression 
                        regularExpressionWithString:
                        @"<HTML>(.*?)</HTML>"];
   NSString*            result = nil;
   NSRange               range;
   NSString*            curString = [startStopExp 
                        substringForSubexpressionAtIndex:1
                        inString:inHTML];
   
   do 
   {
      // Find the next link in what is left of the page.
      range = [bothExp rangeForSubexpressionAtIndex:0
                     inString:curString ];
      if ( range.length > 0 )
         {
         NSString*   URLString;
         NSString*   linkString;
         NSURL*      fullURL;
         
         // Index 0 is the whole tag.
         result = [bothExp 
                        substringForSubexpressionAtIndex:0
                        inString:curString];
         URLString = [bothExp 
                        substringForSubexpressionAtIndex:1
                        inString:curString];
         // Resolve relative URLs against the page's own URL.
         fullURL = [NSURL URLWithString:URLString 
                        relativeToURL:baseURL];
         URLString = [fullURL absoluteString];
         
         linkString = [bothExp
                        substringForSubexpressionAtIndex:2
                        inString:curString];
         // Only keep the link if we got both pieces.
         if ( linkString != nil && 
               URLString != nil && 
               ([linkString length] > 0) && 
               ([URLString length] > 0) )
            {
            [self addURL:URLString withText:linkString];
            }
         // Chop off everything up to the end of this match.
         curString = [curString substringFromIndex:
                        (range.location + range.length)];
         }
   }
   while ([curString length] > 0 && 
               range.location != NSNotFound );
}

A little explanation. The method is in a class that has a method addURL. The class also keeps two arrays, one for URLs and one for the link text. When you call addURL the URL and the link string are added to the arrays for future reference. The class also knows what the URL of the page you are parsing is, and saves it in a variable called baseURL.

The first thing the method does is set up our regular expression for links. Then it makes a new string that will contain only the text between the <HTML> tags. You can use this to limit the search to just a certain part of the page. Then it sets up a loop, which will always execute once and will end when we don't get anything back from our search, or we run out of HTML to parse. Inside the loop we first try to find our expression's range in the HTML. If it isn't there, we're done. If we find something, then we use our expression to get the sub-string for the URL. Sometimes a URL will be relative, so we use NSURL with the page's URL to create a full URL. Then we ask for the second index, which is the link text. If we get both, we add them to our list.

If we find something, then we need to search from the end of the string we found. So we create a sub-string from our current HTML string, that starts at the end of what we found and ends at the end of the current string. This effectively chops off everything from the beginning of the string to the end of what we just found. Then we loop.
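One side effect of chopping the string is that every range you get back is relative to the shortened string, not the original page. If you need positions in the original text, say to highlight matches in a text view, you can keep a running count of how much has been chopped off. The sketch below is one way to do that, not part of MOKit. The MOKit/MOKit.h header and the name AllMatchRanges are assumptions, and the test for a failed search (NSNotFound or a zero length) follows the checks used in Listing 3.

```objc
#import <Foundation/Foundation.h>
#import <MOKit/MOKit.h>   // assumption: MOKit's umbrella header

// Hypothetical helper: returns an array of NSValue-wrapped NSRanges,
// one per match of exp, measured from the start of the original text.
static NSArray *AllMatchRanges(MORegularExpression *exp, NSString *text)
{
    NSMutableArray *ranges = [NSMutableArray array];
    NSString *rest = text;
    NSUInteger offset = 0;   // characters chopped off so far

    while ([rest length] > 0) {
        NSRange found = [exp rangeForSubexpressionAtIndex:0
                                                 inString:rest];

        // Assumption: a failed search reports NSNotFound or a zero
        // length. Stopping on zero length also avoids looping
        // forever on a pattern that can match nothing.
        if (found.location == NSNotFound || found.length == 0)
            break;

        // Shift the range back into the original string's coordinates.
        NSRange absolute = NSMakeRange(offset + found.location,
                                       found.length);
        [ranges addObject:[NSValue valueWithRange:absolute]];

        // Chop off everything through the end of this match.
        NSUInteger consumed = found.location + found.length;
        offset += consumed;
        rest = [rest substringFromIndex:consumed];
    }
    return ranges;
}
```

Each range can be read back with rangeValue and passed to substringWithRange: on the original string.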

Hopefully you've seen the coolness of regular expressions and want to use them in your Cocoa apps. MOKit makes this easy. So go to Mike Ferris' website, download it, and add regular expressions to your app.

Bibliography

Mastering Regular Expressions, Jeffrey E. F. Friedl, http://www.ora.com/catalog/regex2/

Using Regular Expressions, Stephen Ramsay, http://etext.lib.virginia.edu/helpsheets/regex.html

Regular Expressions specification, http://www.opengroup.org/onlinepubs/007908799/xbd/re.html

A Tao of Regular Expressions, http://sitescooper.org/tao_regexps.html

BBEdit Grep Tutorial, http://www.anybrowser.org/bbedit/grep.shtml


Ron Davis is a long time Mac programmer, having worked on everything from Virex Anti-Virus to CodeWarrior. His day job is working for Alsoft, and his evening job is R.A.D. Productions, makers of Suck It Down and FinderEye.

 

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.