Your Internet Service Shopping List
![[Catalogs]](img/subject.gif)
![[Databases]](img/search.gif)
![[All-in-Ones]](img/multiple.gif)
This is the real meat of the Matrix and where it earns its name. For each
server described on its own page, the charts below present a checklist of
features and attributes in a unified interface. In addition, various
sections of the charts are linked to relevant descriptive or background
material located in this document or elsewhere in the collection.
Matrix Keys
In creating these checklists, I wanted to do more than indicate the
presence or lack of a feature with a simple check/no-check system. On the
other hand, I also wanted to avoid a rating scale from 1-10 or 1-100,
simply because I didn't want to judge whether one server's feature was 5%
better or worse than another's.
To this end, I chose the following four-element scale, which sorts the
evaluations logically (and graphically):
- '*' Bullet or Asterisk
- An excellent and rather complete implementation of the function. For
example, a system with boolean searching that supports complex queries.
- '+' Plus Sign
- Acceptable functionality, but lacking a robust implementation. Such a
system would let you select whether to join multiple keywords with
Boolean AND or Boolean OR, but nothing more.
- '-' Minus Sign
- A poor level of support for the function, typically because it is
implemented in terms of other functionality. An example of this would
be a system that simply decides whether multiple keywords should be
connected by Boolean AND or Boolean OR, regardless of user convenience.
- ' ' Blank
- The function is simply not supported on this server.
A plus on one feature and a minus on another do not represent an absolute
judgment of quality; rather, I mean each rating to indicate a rough
comparison of the same feature across servers. If you still feel that a
mistake or injustice has been made, please feel free to mail me.
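As a rough illustration of how the four symbols form an ordered scale, here
is a minimal Python sketch. It is not taken from the charts themselves: the
names `Rating`, `SYMBOLS`, and `best_server`, and the server names in the
usage note, are hypothetical and chosen only for this example.

```python
from enum import IntEnum


class Rating(IntEnum):
    """The four-element scale, ordered from unsupported to excellent."""
    BLANK = 0   # ' ' : function not supported
    MINUS = 1   # '-' : poor level of support
    PLUS = 2    # '+' : acceptable, but not robust
    STAR = 3    # '*' : excellent, rather complete


# Map each chart symbol to its place on the scale.
SYMBOLS = {" ": Rating.BLANK, "-": Rating.MINUS,
           "+": Rating.PLUS, "*": Rating.STAR}


def best_server(ratings):
    """Return the server with the highest rating for ONE feature.

    `ratings` maps a server name to the symbol shown in the chart.
    Comparing symbols only makes sense within the same feature column.
    """
    return max(ratings, key=lambda name: SYMBOLS[ratings[name]])
```

For example, `best_server({"ServerA": "+", "ServerB": "*", "ServerC": " "})`
picks `"ServerB"`. The scale is deliberately coarse: it orders the symbols
but does not say how much better one is than another.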
Overview Matrix
Overview Criteria
- Catalog or Index Name:
The abbreviated name of the Web Index.
- Evaluation of the Index:
A comprehensive value that reflects the author's evaluation of the
index for general Internet navigation and for locating particular
documents or services, on a scale of 1 to 5 Spiders.
- Number of Documents:
The number of Web, Gopher, FTP, and other documents referenced by this
collection. Keep in mind that some subject catalogs keep a short list
of hand-picked rich resources, but the effectiveness of a search
engine is proportional to the size of its database.
- Contains Web and Gopher Links:
The index contains resources found on the Web, Gopher, or both.
- FTP, UseNet, ListServ, IRC:
The index contains pointers to FTP or other well-known types of
Internet resources.
- Other Databases:
The index contains information gathered from sources that are not
located on the Internet, such as MedLine, newspaper newswires,
or other commercial databases.
- Clarity of the Interface:
The layout of the search interface and other pages is easy to learn
and use. Well-designed services will offer navigation aids across
the collection; poor services will be disorganized or littered with
obscure link icons.
- Speed of the Interface:
Relative speed of the server's links and download time for the images.
Typically reflects how many users connect to the index, the quality of
the search software, the speed of the server hardware, and the server's
support for load balancing.
- Image Download Time:
For each service with a logo or imagemap interface, how much does it
affect the download time? A small logo is acceptable, but large images,
numerous icons, and textured backgrounds cost download speed and
bandwidth.
- Text-only Support:
Support for text-only browsers and for users who disable images. This
means including functional navigational aids and text alternatives to
the information in images or icons.
- HTML Forms Support:
Reflects how efficiently the server uses HTML Forms in the search
interface and in feedback links.
- Non-Forms Support:
Certain browsers do not support HTML Forms and must rely on simpler
search interfaces. If the server supports non-forms searches, how
robust and useful is the search engine?
- Located in the United States:
The Web server or a mirror site is located in the United States (not
even Canada), indicating faster and more reliable network access.
Features List
- Catalog or Index Name:
The abbreviated name of the Web Index.
- Subject Index:
The information in the index is organized by subject area, typically
in a hierarchic tree of information.
- Organization:
The organization of the links, including number and depth of subjects,
as well as cross-references and general ease of use.
- Abstracts:
Amount of descriptive information that accompanies links.
- Searchable Index:
The information in the index is stored in a database, which is
queried by entering relevant search criteria, called keywords. The
results are then displayed as a list of links to the matching documents.
- Multiple Keywords:
Users can expand or restrict database searches by entering more
than one keyword. Additional options are often necessary for
flexible control over the query.
- Natural Language:
The ability to accept a query in natural language (as if asking a
simple question), ignoring common or incidental words (the, is, and)
and extracting relevant keywords for the search.
- Boolean Searching:
For servers that allow Boolean Searching, this field reflects the
sophistication of the feature. Many servers automatically join
keywords with Boolean AND or Boolean OR, but only Alta Vista and
EINet Galaxy support complex criteria.
- Proximity Searching:
The ability to select a document based on the proximity of the keywords
in the text. As a rule of thumb, this means that documents with
incidental keyword matches will be rated lower than others with highly
relevant content.
- Keyword Phrase Searching:
When entering highly related keywords, it may be desirable to treat them
as a single phrase to encourage the search engine to find them
together. For example, using "Bill Gates" as a query will
generate better matches than the Boolean "Bill AND Gates".
Phrase searching provides much more specific functionality than either
Boolean Searching or Proximity Searching.
- Regular Expressions:
A sophisticated method for specifying keyword patterns, using wildcard
characters and other matching functions; it is generally available on
search engines that are based on Perl or grep software.
- Substring Searches:
This feature represents the ability to enter a complete or partial word
and generate matches containing it. Exceptional servers will examine
the keywords a user has entered, and identify the appropriate root word
to use as a substring search; I commonly refer to this functionality as
Root or Suffix Management.
- Sorts Search Results:
Many search engines will list the result sets in order of calculated
relevance, typically listing the best matches toward the top of the
results document and degenerating into poorer matches toward the bottom.
This feature makes it easy for users to identify and print only
the best 10 or 50 matches in a set.
- Limits Search Results:
Some servers allow the user to specify a maximum number of documents
to return, thus providing better response time and a focused result
set. Certain servers enforce a maximum number of search results, to
lessen the server load or to encourage user subscription.
- Richness of Match Descriptions:
This value reflects relevant background information for the documents
in the result set, such as match quality, file location, file size,
file timestamp, or extracted passages. The more information that a
service provides, the easier it is to identify useful documents
in the match set.
- Custom Search Software:
The software that performs a search is critical to the speed and
functionality of the service. Servers written using Perl, awk, or
other simple scripting tools are much slower than custom software
solutions written in C or using special database software.
- Searches Filenames and URLs:
Servers that can search the filenames and locations are useful for
locating documents in a particular location or by a particular
author. Unfortunately, such searches often interfere with keyword
searches because machine or user names may incidentally match
search criteria (such as www-genome.wi.mit.edu or
forestry.umn.edu).
- Searches Summaries and Keywords:
This is perhaps the most reliable type of search, because the
search engine examines very specific words or phrases, rather
than the incidental text and file locations. Summaries are a good
way to assist search software in finding relevant documents, but
require administrative work and the establishment of consistent
descriptors and vocabulary control standards.
- Searches Document Fulltext:
The most flexible type of search, this method applies a brute force
search to the complete content of the documents for possible matches.
Although time-consuming and prone to error, fulltext searches can
be simplified or focused with tools such as root management or
proximity searching.
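Several of the search features described above (Natural Language, Boolean
Searching, Keyword Phrase Searching, Proximity Searching, and sorting and
limiting results) can be illustrated with a small Python sketch. It is only
a toy under stated assumptions, not how any of the reviewed servers is
implemented: the stop-word list, the sample documents in `DOCS`, and every
function name are hypothetical, and relevance is reduced to the distance
between two keywords.

```python
import re

# Hypothetical stop-word list; a real engine would use a much longer one.
STOP_WORDS = {"the", "is", "and", "a", "an", "of", "in", "what", "who"}

# Hypothetical sample documents, invented for this example.
DOCS = {
    "doc1": "Bill Gates founded Microsoft",
    "doc2": "The bill passed, and the garden gates were repaired",
    "doc3": "Gates of the old city",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(question):
    """Natural Language: drop common words and keep the rest as keywords."""
    return [w for w in tokenize(question) if w not in STOP_WORDS]


def matches_boolean(words, keywords, mode="AND"):
    """Boolean Searching: AND needs every keyword, OR needs at least one."""
    present = [k in words for k in keywords]
    return all(present) if mode == "AND" else any(present)


def matches_phrase(words, phrase):
    """Phrase Searching: the phrase's words must be adjacent and in order."""
    target = tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def min_distance(words, first, second):
    """Proximity: smallest gap in words between two keywords, else None."""
    pos_a = [i for i, w in enumerate(words) if w == first]
    pos_b = [i for i, w in enumerate(words) if w == second]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)


def search(docs, first, second, limit=10):
    """Sort matches so the closest keyword pairs come first, then cap them."""
    scored = []
    for name, text in docs.items():
        gap = min_distance(tokenize(text), first, second)
        if gap is not None:
            scored.append((gap, name))
    return [name for gap, name in sorted(scored)[:limit]]
```

With these sample documents, `search(DOCS, "bill", "gates")` returns
`["doc1", "doc2"]`: both documents satisfy "Bill AND Gates", but only doc1
contains the phrase "Bill Gates", so it sorts first. Passing `limit=1`
keeps only that best match.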
This collection is Copyright © 1995-6 by Matt Slot, but has been designed
for public use. Permission is hereby granted for unlimited print and electronic
redistribution. Your feedback is appreciated.
Matt Slot * fprefect@ambrosiasw.com * 11/22/96