Remote Search Services

Volume Number: 15 (1999)
Issue Number: 12
Column Tag: Web Site Design

Remote Search Services for Web Sites

by Avi Rappoport

Adding search to your site - even if you don't own the server

What is Remote Site Searching?

When you want to add search to your site, you may have some technical difficulties. Perhaps your site is hosted on a large server somewhere, or you have an uncooperative web administrator, or the challenges of adding a CGI are too daunting. Never fear! You can outsource your search to a remote site search service and let someone else worry about the gory details.

The indexer and search engine run on the remote server: they use a web indexing robot, or spider, to follow links on your site and read the pages, then store every word in the index file on that server. When it comes time to search, the form on your local Web page sends a message to the remote search engine. Although the request is going through the Web, the process doesn't change - it just has to travel a little farther. The remote search engine takes the search terms, matches the words in the index, sorts the matching pages according to relevance, and creates an HTML page with the results. When a searcher clicks on a result link, they will see the page from your site, just as though the search came from there. It's easy and painless for practically everyone.

This review covers the range of remote search services, their features and their drawbacks. It will teach you to prepare your site, try indexing it, test the search, customize the results, keep the search up to date, and choose the right program for your long-term needs.

What you Get With Remote Search Services

  • No need for server access: Even if your site is hosted and you have FTP access only, you can run a search engine.
  • No need to learn CGIs or server systems: You never need to install any software, worry about version compatibility, or learn about permissions and paths (or pay someone else to do so).
  • Easy administration: The remote search service will provide a set of Web pages for administration, rather than making you learn about command lines or config files.
  • No load on your server: Search engines require significant resources, such as CPU time during searching and retrieval, as well as disk space. Outsourcing to a remote server moves the load away from you. In addition, these servers are usually in data centers with excellent connectivity and 24/7 administration.
  • Minimal initial investment: Instead of paying for a search engine up front, you can pay a small monthly fee. Some services are free, showing advertising with the search results.
  • Easy to switch: If you aren't happy with your search service, it's easy to switch to another.

The Tradeoffs

  • Advertising or continuing costs: You must pay every month or allow your searchers to see other people's advertising.
  • Less control over the indexing: If your data changes frequently (hourly or daily), most of these services will not index that often.
  • Dependent on outside service: If the service's search engine gets busy, it may delay responses for your site, and there's not much you can do.
  • Less capacity: The remote search services have a page limit, usually somewhere between 200 and 5000 pages. While many can go higher than that, they can't handle hundreds of thousands of pages.
  • Fewer special features: Each search engine has its own special features, but you have more choices if you plan to run your own engine, such as indexing password-protected areas or word processing file formats, or adding a thesaurus or a spellchecker.
  • Intranet privacy: Organizations running intranets (internal networks using standard software) generally want to keep control of all their data, rather than allowing external systems to access it.
  • Multi-site indexing: Most remote services allow you to index just the sites you control. With a local search engine, you can index other sites and create a public search portal.

Remote Site Search Services Covered

The following services are covered in this review, and also have pages and examples on this site.

Atomz <http://www.atomz.com/>

  • free for up to 500 pages and fewer than 5,000 searches per month (no ads, just a logo)
  • paid version: 250 pages & 2.5K searches @ $75 per year; 500 pages & 5K searches/month @ $150 per year; 1,000 pages & 10K searches/month @ $300 per year; 2,500 pages & 25K searches/month @ $600 per year; 5,000 pages & 50K searches/month @ $1,200 per year

FreeFind <http://www.freefind.com/>

  • free (with advertising); can handle up to 32 MB of HTML (flexible), and will "sample" sites if they get large.

intraSearch (WhatUSeek) <http://www.whatUseek.com/intraSearch/>

  • free (with advertising) to at least 10,000 pages

MondoSearch (remote version) <http://www.mondosearch.com/>

  • paid version only: 1 - 1,000 pages: $144; to 5,000 pages: $585; to 10,000 pages: $990; above: contact sales@mondosoft.com
  • local server version also available

PicoSearch <http://www.picosearch.com/>

  • free (with advertising), to 5,000 pages
  • paid version: $6.99 per month (12 month commitment); $9.99 per month (3 month commitment)

PinPoint <http://pinpoint.netcreations.com/>

  • free (with advertising) to 5,000 pages

SearchButton <http://www.searchbutton.com/>

  • free (with advertising), for up to 5,000 pages, 30,000 searches per month
  • paid version: up to 1,000 pages: $300 per year; up to 5,000 pages: $600 per year (limit of 30,000 searches per month); for more pages, contact company

SiteMiner <http://www.siteminer.com/>

  • free (with advertising), to 10,000 + pages

Webinator (remote version) <http://www.thunderstone.com/texis/indexsite>

  • free (with Thunderstone logo), to 5,000 pages
  • local server version also available, can do thousands and millions of pages

Checking Links and Pages

Before you install any search engine with an indexing spider, you must make sure it can find the pages on your site. The good news is that cleaning up your links will improve your accessibility to the large public search engines (such as AltaVista, Google, HotBot and Infoseek), and make it easier for you to run an automated site mapper.

Robot Spider Compatibility

The indexing spiders follow links from a starting page, so use a home page if you have good text links, or a site map page.

Whole sites: Robots.txt

The first thing is to check the "robots.txt" file. This is a standard file for web servers that sits at the root of your site, and excludes robots that are not welcome on the site, or in certain specific directories (though this is voluntary). If you run your own server, you control this file: otherwise your host server administrator controls it.

You want to make sure that this file exists, and that it allows at least your indexing spider to access your directories. You may need to negotiate with your web hosting provider on this point, as this file must be stored in the root folder of the web host.

For more information on this topic, see Search Indexing Robots and Robots.txt <http://www.searchtools.com/info/robots/robots-txt.html> and the WebMasters Guide to the Robots Exclusion Protocol <http://info.webcrawler.com/mak/projects/robots/exclusion-admin.html>.
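To check what a robots.txt file permits before you sign up, you can test it locally. The following is a minimal sketch using Python's standard robots.txt parser; the file contents, the spider name ExampleSearchBot and the example.com URLs are all made up for illustration, so substitute your own file and the user-agent name your search service documents.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the real file sits at the root of the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleSearchBot
Disallow: /drafts/
"""

def allowed(robots_txt, agent, url):
    """Return True if the robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# ExampleSearchBot is an assumed spider name, not a real service's robot.
print(allowed(ROBOTS_TXT, "ExampleSearchBot", "http://www.example.com/index.html"))
print(allowed(ROBOTS_TXT, "ExampleSearchBot", "http://www.example.com/drafts/a.html"))
print(allowed(ROBOTS_TXT, "OtherBot", "http://www.example.com/private/a.html"))
```

Remember that the protocol is voluntary: this shows what a well-behaved spider should do, not what every spider actually does.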

Individual Pages: META ROBOTS tag

The other way that page designers can control robots and spiders is by using the META ROBOTS tags. These are particularly useful if you have a hosted site and don't want to bother your server administrator.

For example, if you have a directory listing or site map page, you can tell the spiders to follow the links but not index the text on the page by placing the following information into the HTML header: <meta name="robots" content="noindex,follow">. If you have pages with useful data but inappropriate links, such as a web calendar page with duplicate links to other calendar pages, use <meta name="robots" content="index,nofollow">.

For more information, see Search Indexing Robots and the Robots Meta Tags <http://www.searchtools.com/info/robots/robots-meta.html> and the Webmaster guide above.
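To see how a spider might interpret these tags, here is a small sketch that reads the META ROBOTS directives from a page and reports whether the page may be indexed and whether its links may be followed. The class and function names are invented for this example, and real spiders may treat edge cases differently.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())

def robots_policy(html):
    """Return (may_index, may_follow); both default to True when no tag is present."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    may_index = "noindex" not in found and "none" not in found
    may_follow = "nofollow" not in found and "none" not in found
    return may_index, may_follow

site_map = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
calendar = '<html><head><meta name="robots" content="index,nofollow"></head></html>'
print(robots_policy(site_map))
print(robots_policy(calendar))
```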

Good Links and Bad Links

Indexing spiders tend to be pretty dumb. They know about the simple HREF links, but just get lost on anything more complex. Spiders and robots may not follow links in:

  • image maps (especially server-side image maps)
  • redirect and META Refresh tags
  • Framesets
  • DHTML layers
  • ActiveX controls
  • JavaScript menus and pages
  • Java pages and site maps
  • Flash or Shockwave (unless you use the AfterShock options to generate HTML text and links!)
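To illustrate why these link types cause trouble, the sketch below imitates a very simple spider that only understands plain HREF links. It is an illustration, not the code of any service reviewed here; the page content and the example.com address are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect only the plain <a href="..."> links, as a simple spider would."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Skip javascript: pseudo-links, which need a script engine to follow.
            if href and not href.lower().startswith("javascript:"):
                self.links.append(urljoin(self.base_url, href))

# Hypothetical page mixing one plain link with links a simple spider misses.
PAGE = """
<a href="products.html">Products</a>
<a href="javascript:openMenu()">Menu</a>
<img src="map.gif" usemap="#nav">
<script>document.write('<a href="hidden.html">Hidden</a>')</script>
"""

collector = LinkCollector("http://www.example.com/index.html")
collector.feed(PAGE)
print(collector.links)
```

Only the Products link is collected: the JavaScript menu, the image map and the link written by the script are invisible to this kind of parser.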

Check Your Links

To give yourself a spider-eye view, try a text browser such as Lynx, or a graphical browser with images and JavaScript turned off, and no Plug-Ins: this will give you a good view of what the spiders see.

Don't rely on your content-management system to check local links: it knows too much about the structure of your site and the special formats you use!

To make sure all your local links work, run a link-checking robot such as Big Brother for Mac & Unix <http://pauillac.inria.fr/%7Efpottier/bb.html.en>, or use a service such as NetMechanic <http://www.netmechanic.com/>. If these services can follow the links, there's a good chance that your search indexing robot can do the same.

Solution: Supplement Complex Links

If you find you have problems, there are two ways around bad links: both require work, but they will make the indexing spiders happy.

  • Alternate Navigation: add alternate links in <NOSCRIPT> and <NOFRAMES> tags, lists of the links from image maps, simple alternate pages for DHTML and Java pages, etc. This should work for all kinds of robots and spiders.
  • Site Page Listing: make a page or sitemap with links to every page on your site. This is hard to maintain and synchronize with your other changes. You can't use a site mapper application that uses a link-following robot, because it will have the same problems that the search engine spiders have.
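If you keep a list of your page paths (from your file system or content-management system, for example), a short script can regenerate the listing page so it stays in step with your site. This is a minimal sketch; the function name, the page title and the sample paths are assumptions for illustration.

```python
from html import escape

def site_listing(paths, title="Site Map"):
    """Build a simple HTML page with one plain text link per page path."""
    items = "\n".join(
        '<li><a href="{0}">{0}</a></li>'.format(escape(path, quote=True))
        for path in sorted(paths)
    )
    return (
        "<html><head><title>{}</title></head>\n"
        "<body><ul>\n{}\n</ul></body></html>"
    ).format(escape(title), items)

# Hypothetical paths; in practice, gather these from your own site files.
page = site_listing(["/products/index.html", "/about.html", "/contact.html"])
print(page)
```

Because the links are ordinary HREF links, any spider that can read the listing page can reach every page on it.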

Five for the Price of One

The good news is that all this work will pay off in five ways:

  1. Your search engine robot spider can find your pages
  2. The robot spiders for the webwide public search engines, such as HotBot, Infoseek and AltaVista, can find your pages
  3. A robot-based link checker can check your links
  4. A robot-based site map creator can find your pages to make a map
  5. Your site is now accessible to blind and visually-disabled web surfers (as described in the W3C Web Accessibility Initiative), and those using text browsers such as PDAs.

Test the Index

Many of the search services require minimal commitment on your part. All you have to do is go to the service Web site, register with a user ID or email address and password, then give them the home page URL. The search service will send their indexing spider to follow links on your site very quickly, so try to do this during a quiet time.

Once you have signed up, you'll see all the setup and configuration options in the browser interface. Some are more elaborate than others: Atomz has a bunch of tabs and subpanes within the tabs, FreeFind has a nice Wizard interface. Webinator has a fairly elaborate mail-back access control: you must have an email address on the server to index that server.

If your server is slow, you are charged by the byte, or you have long files, choose a service that will do smart updating, and only get the contents of pages if they have changed.
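The services don't document exactly how they decide that a page has changed. One plausible approach, sketched here as an assumption rather than a description of any particular service, is to compare the page's HTTP Last-Modified header with the time of the previous indexing run and to fetch the full page only when it is newer.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def needs_reindex(last_modified_header, last_indexed):
    """Return True if the page changed after it was last indexed."""
    if not last_modified_header:
        # No date from the server: fetch the page to be safe.
        return True
    return parsedate_to_datetime(last_modified_header) > last_indexed

# Hypothetical header value, as a web server might send it.
header = "Thu, 16 Sep 1999 12:00:00 GMT"
print(needs_reindex(header, datetime(1999, 9, 15, tzinfo=timezone.utc)))
print(needs_reindex(header, datetime(1999, 9, 17, tzinfo=timezone.utc)))
```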

If you have access to your web site log or monitor window, you can watch the spider as it follows links throughout your site. Otherwise, or in addition, choose a service that provides reports on the indexing process.

Try Searching Your Site

Remote search services provide almost-instant gratification: you can test them as soon as they're done with the indexing. Most of them have a test search form on their site: if not, copy their form to your local page and try it out.

Searching

There are two basic kinds of search queries: those which match pages on your site that contain every search term, and those that match any search term, though they may not show you every matching page. A few services will let searchers choose the best approach.


Figure 1. PicoSearch result showing all pages which have any of the search terms.


Figure 2. SiteMiner result for the same search, showing all three pages with all search terms.
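The difference between the two kinds of matching can be sketched with a toy word index. This is a simplified illustration with made-up pages, not how any of these services is implemented; real engines also handle punctuation, word forms and ranking.

```python
def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, query, mode="all"):
    """Return pages matching all of the query words, or any of them."""
    matches = [index.get(word, set()) for word in query.lower().split()]
    if not matches:
        return set()
    if mode == "all":
        return set.intersection(*matches)
    return set.union(*matches)

# Hypothetical pages standing in for a small site.
pages = {
    "/a.html": "remote search services",
    "/b.html": "search forms",
    "/c.html": "remote indexing robots",
}
index = build_index(pages)
print(sorted(search(index, "remote search", mode="all")))
print(sorted(search(index, "remote search", mode="any")))
```

The "all" search returns only the one page that contains both words, while the "any" search returns all three pages.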

If your site contains text from other languages, you need to watch out for letter matching issues. Some search engines can only match the 26 English characters, while others can match diacritical characters (such as î and á) and special characters (ø and ß). PicoSearch and MondoSearch also offer multilingual interfaces. Non-Roman scripts such as Arabic, Russian and Japanese are even harder, although PicoSearch offers results in Chinese.

Relevance Ranking

When you do a search, and the engine locates a set of pages that match your search, it has to sort them as best it can. This is particularly difficult with one and two word searches - it's hard to tell which is the most relevant page (the best match).

Like hairstyles and music, success in relevance ranking is a matter of taste. You should do a number of searches to see what you think of any search engine you choose. Try searches with just one word, others with two, and still others with four or five. This should give you a feeling for the kinds of relevance ranking that a search engine will do.
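None of the services publishes its ranking formula, so the following is only a toy sketch of one simple idea: score each page by how often the search words appear in it. The pages and function names are hypothetical, and real engines weigh many more factors.

```python
def score(text, query):
    """Count how many times the query words occur in the page text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())

def rank(pages, query):
    """Return matching page URLs, highest score first (ties sorted by URL)."""
    scored = [(score(text, query), url) for url, text in pages.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [url for points, url in ordered if points > 0]

# Hypothetical pages for illustration.
pages = {
    "/a.html": "search search tips",
    "/b.html": "search forms",
    "/c.html": "contact us",
}
print(rank(pages, "search"))
```

With a one-word query, the only evidence this scheme has is the word count, which is why short searches are so hard to rank well.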

Search Forms

Search forms are the user interfaces to the search engine, so you can have several different forms, for your various needs.

  • Search Field: this is a very small form with a text field and Search button: it can go on your front page or even in the navigation bar on every page.
  • Simple Search: a search form with an additional option or two, which may be on a search page with instructions and tips. Note that SiteMiner has a JavaScript search form, rather than a normal HTML form.


    Figure 3. MondoSearch Simple Search Form.

  • Advanced Search: lets the searcher have more control over the search, with options for date ranges or special zones. Only SearchButton, MondoSearch and Webinator include advanced search forms, though they have slightly quirky options.

Each of the site search services provides an HTML or JavaScript search form for you to copy and paste to a page on your site. All you have to do is put the form into a page (you don't even have to post the page on your site at first). When you, or a searcher, type text into the field and click the Search button, the browser reads the ACTION attribute of the FORM tag, connects to the search server, and sends the form items, including the hidden site ID, so the server can tell which site you mean to search.

When the remote search server gets the form command, it looks in the index, matches the search words, and organizes the results. The URL of the results page is that of the search service, not of your server, because that's where the results page is coming from, but the URLs for the found pages themselves include your server name.
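To make the form mechanics concrete, the sketch below builds the kind of request URL a browser would send for a form that uses the GET method. The service address and the field names site_id and q are hypothetical; each service defines its own ACTION URL and field names in the form it gives you, and some forms may use POST instead.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def search_url(action, site_id, terms):
    """Combine the form's ACTION URL with the hidden site ID and search terms."""
    return action + "?" + urlencode({"site_id": site_id, "q": terms})

# search.example.com, site_id and q are made-up names for illustration.
url = search_url("http://search.example.com/query", "12345", "remote search")
print(url)

# The search server can recover the same fields from the query string.
fields = parse_qs(urlsplit(url).query)
print(fields)
```

Note that the host in the URL is the search service's, not yours, which is why the results page appears to come from their server.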

Note: SiteMiner only has a JavaScript search box: site visitors without JavaScript must follow a link to the SiteMiner site to search. This limits your audience and makes searching hard for people with old browsers, PDAs and other new client hardware, and for those with impaired vision who use speaking browsers.

Customizing Results Pages

Everyone is familiar with webwide public search engines and their lists of results. A local search results page is very similar, although for the best user experience, the search results page should look and feel like the rest of your site. If you are using a remote search service as a permanent part of your site, be sure that you choose a service that lets you customize the page design enough for your comfort level.

Simple Customization features

  • page color: let you select the page color, so it matches the rest of your site design
  • background: set the background graphic
  • text and link colors: keep the text, link, active link and visited link colors consistent with the rest of your site


Figure 4. FreeFind Results Page Options Wizard.

Page Design

Some services let you lay out your results page, including the page sections above, to the left, to the right, and below the results list. This allows you to include your normal navigation and site structure links, showing searchers more about the scope of your site. The service usually provides fields for you to paste in your HTML code, and you will probably have to try a couple of times to get the right relationship with the results list, so this is only practical for those who have some experience with HTML tags.


Figure 5. Webinator field to insert page header in HTML.

Advertising

Several of the free site search services will display banner advertising on the search results, although none of the paid versions will do so. For many sites, it's a fair trade for searching services, but for others, such as libraries and public schools, advertising is inappropriate, so they should choose a version without advertising.


Figure 6. PinPoint default result page, showing banner advertising.

Search Result List Items

As with the results page, the list of pages which match the search is familiar. Some search engines let you customize the elements of the items on this list, which lets you match the layout to the data you have. For example, some sites have useful URLs which give some context to the page, while others are just confusing.

Other features may include

  • a ranking number or graphic indicator of how well the engine thinks the page matches the search terms
  • a file modification date (in two-digit US date format: 09/16/99, four-digit year-first format: 1999-09-16, or some other format)
  • a file size (best in K, Kilobytes, rather than bytes)


Figure 7. intraSearch result showing items with URL, size and update date.
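The date and size formats mentioned above are easy to compare side by side. This small sketch (with invented function names) shows a modification date in both the two-digit US format and the year-first format, and a file size rounded up to whole K.

```python
from datetime import date

def format_size(size_bytes):
    """Show a file size in K, rounding up so small pages never read as 0K."""
    return "{}K".format(-(-size_bytes // 1024))

def format_dates(modified):
    """Return the date in two-digit US format and in year-first format."""
    return modified.strftime("%m/%d/%y"), modified.isoformat()

print(format_size(2500))
print(format_dates(date(1999, 9, 16)))
```

The year-first format sorts correctly as plain text and avoids confusion between month-first and day-first readers.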

If you have carefully written META DESCRIPTION tag contents for each of your pages, so they'll rank well and look great in webwide search engines, you will probably want your site search to display them as well. Be sure to choose a remote search service that will show these.

Otherwise, some services do a good job of extracting useful text, while others just grab the text from the top of the page.


Figure 8. SearchButton result showing selected text extracted from pages.

Some services extract lines containing the search terms and/or highlight the words which match the search terms.


Figure 9. Atomz result showing items with top text and matching text.

Care and Feeding of Your Site Search

Although the remote search service takes care of the server side of things, you still have to keep track of its status, even if it's just to make sure it's still running (these services have been fairly reliable so far). You should also perform test searches: some that you repeat every time, others that check new information on your site. And, as you change the layout and design of your site, make sure that the search form and results page reflect these changes.

Updating the Index

To keep your search index synchronized with the content on your site, you'll need to set up some kind of update schedule. If your site changes rarely, you can tell the service when to re-index. However, if your site changes more often, you will want to set up a scheduled update.

Watching the Searches

Analyzing your search log or report can teach you what your visitors are looking for - it's like having a free, automated market research survey. For example, if you have a movie site and everyone starts searching for the Blair Witch Project, you know it's hot, and can make sure you have good information so they don't go somewhere else.


Figure 10. SearchButton Report Options.
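If your service gives you access to the raw search log, even a few lines of code can show what visitors look for most. This sketch assumes a simplified log with one query per line, which is a made-up format; real logs will need some parsing first.

```python
from collections import Counter

def top_queries(log_lines, n=3):
    """Return the n most common queries, ignoring case and stray spaces."""
    counts = Counter(
        line.strip().lower() for line in log_lines if line.strip()
    )
    return counts.most_common(n)

# Hypothetical log entries for a movie site.
log = [
    "Blair Witch Project",
    "blair witch project",
    "showtimes",
    "Blair Witch Project ",
    "tickets",
    "showtimes",
]
print(top_queries(log, n=2))
```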

How to Choose a Remote Search Service

Read through the listings above, and try out the search engines in the SearchTools search page <http://www.searchtools.com/search/>. Think about which of the features we describe is vital, and which you can live without (it's like buying a car). Then try out two or three that have the most important elements, and see how well they fit with your site.

Product Special Features and Issues

Atomz

  • Advantages: A very powerful and configurable service, provides lots of control over indexing, follows complex links nicely, indexes PDF files, has many search options, and provides good schedules for updating. The results page and listing layout is entirely configurable using HTML and a simple tag-based scripting language, and there are no banner ads on the results page, just an Atomz logo. Free to 500 pages, paid version for more pages.

  • Disadvantages: The free version will only index 500 pages, and the indexer ignores the "NOFOLLOW" tag. The search finds every possible match and, by default, also finds plurals and other word forms and retrieves synonyms and soundalike words: it just finds too much!

FreeFind

  • Advantages: Allows you to index multiple sites (up to 32 MB of HTML data), updates indexes often, and handles complex links nicely. The search form lets users choose to match "any words" or "all words". Nice administration wizard interface walks through the options. Free, with advertising, to 32 MB of HTML text.
  • Disadvantages: Banner ads on the results page, not many useful options for customizing the results page or matched items, sometimes the server is slow to respond, no indexing reports, limited search reports.

intraSearch (WhatUSeek)

  • Advantages: Allows you to index multiple sites, good indexing of complex URLs, search finds only pages which match all search words, will update once a week. Free, with advertising, to 500 pages.
  • Disadvantages: Ignores robots.txt, which is usually an uncooperative move; not much results page customizing possible; index report is the "site map"; there is only minimal search reporting.

MondoSearch

  • Advantages: Good with complex links such as redirects and framesets, and shows frame page results in context; lets the administrator control the speed of indexing; search form can include a choice of "any words" or "all words"; handles extended Roman characters; marks pages by language; unusual results format that shows pages in categories, with an otherwise very flexible results page layout; can be customized for any language; local server version available.


    Figure 11. MondoSearch Category Results.

  • Disadvantages: no built-in update schedule, browser administration somewhat disorganized, very modal (you must click OK before changing pages). Paid only.

PicoSearch

  • Advantages: good with complex links such as redirects and framesets, tends to follow many links. Excellent index reporting, especially the live online version. Recognizes extended Roman characters and can be customized to show results in many languages including Chinese. Free, with advertising, to 5,000 pages.
  • Disadvantages: No update scheduling: you must do it interactively each time. Search finds every possible match, which is usually too many pages. Free version is not very customizable for results pages or match items. Minimal search reporting. We also saw a distressing error when trying to search a site that was not yet indexed.

PinPoint

  • Advantages: Very configurable results page design and results listing options, thorough index report by email. Free, with advertising, to 5,000 pages.
  • Disadvantages: Some problems following complex links, no update scheduling so you must do it interactively every time. Search finds every possible match, which is often too many pages. Problems with extracting text for page description. Almost no search reporting, which makes it hard to track usage. Server can be slow.

SearchButton

  • Advantages: It's good with complex links such as redirects, and indexes automatically once a month. The search form offers a link to the Advanced form, for power users. The search reports are excellent and the service also provides access to the search log for even more detail. Free, with advertising, to 5,000 pages. Can request no-ads version for small public-service sites, no-advertising paid version available.
  • Disadvantages: You may want to index more frequently than the free version allows. Search finds every possible match (even when that's not wanted). Search form HTML can be hard to locate, and the results page and result item customization is very limited.

SiteMiner

  • Advantages: good with complex links such as redirects, search finds only pages which match all search words. Free, with advertising, to 10,000+ pages.
  • Disadvantages: JavaScript-only search box (no HTML), problems with extended Roman characters, minimal results page and result item customization, no update scheduling, minimal search reporting.

Webinator

  • Advantages: good with complex links such as client image maps, search finds only pages which match all search words, great results-page layout customization options, automatically updates every two weeks and on demand. Free, with logo, to 5,000 pages: local server version available.
  • Disadvantages: Complex access system for logging into administration site, title problem with server redirects, no result item customization, no update scheduling, minimal search reporting.

As you can see, there's no one search engine that has all the advantages. Which one you should choose depends on your site, and your particular needs. You won't know what you like until you take a couple of test drives!


Avi Rappoport is the Principal Consultant for Search Tools Consulting, specializing in Web Site, Intranet and Portal search engines. She reports and analyzes the industry for SearchTools.com (which runs on a PowerMac 6100). You can contact her at consult@searchtools.com.
Disclaimer: Search Tools Consulting has consulting relationships with MondoSearch and SearchButton, but we do not allow our customers to influence our reviews.

 


All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.