Search Engine Help
In order to be more consistent with the way major search engines work, MHC's search engine now assumes,
unless you tell it otherwise, that all words you type must appear on the page in order for it to be
returned in the search results. Think of this as an implicit "AND" of the words you type; previously,
the search engine used an implicit "OR".
As a result of this change, it is no longer necessary to use the + symbol to imply "AND". Instead, use
the word OR (in uppercase) between two words whenever a page needs to contain only one of them.
The word OR can also be abbreviated with a single vertical bar | character. See below for examples.
The search engine consists of two main parts: a robot and the search program
that is activated by one of these pages. The robot starts late at night when
usage is low and looks for links in each page on the system. Eventually,
it covers every one of the many thousands of HTML files that are somehow linked
to the main index.shtml page. It intentionally excludes pages
that are part of a user's personal Web space rather than the main system.
This method is very similar to that used by the Web-wide search engines like
Google and Alta Vista.
From this process the robot generates a database of all the words contained
in all of the pages. The site map is a second database
that gets generated by a similar robot.
When you enter a search request, the search engine script looks in these two databases
for all of the pages containing the words or phrases you asked for.
Because these two robots only operate once per day, any pages added to
the system since they last ran will not appear in the results of a search.
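The word database can be pictured as an inverted index, that is, a mapping from each word to the pages that contain it. The following is a minimal sketch of that idea in Python, not the robot's actual code. The function name build_index and the sample pages are hypothetical.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page paths that contain it.

    `pages` is a dict of {path: page text}. Both are illustrative stand-ins
    for whatever the robot has fetched.
    """
    index = defaultdict(set)
    for path, text in pages.items():
        # Lowercase the text and treat anything that is not a letter or
        # digit as a separator.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index

# Hypothetical pages, for illustration only.
pages = {
    "/dining/menu.html": "Food and potato dishes",
    "/news/today.html": "Potatoes and other foodstuff",
}
index = build_index(pages)
```

A page added after the index was built is simply absent from the mapping, which is why it cannot appear in the results until the next nightly run.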
The most basic search is done by just entering one or more words, separated
by spaces. The search engine will give you a list of pages that contain, for each
word you entered, at least one word which begins with those letters. For instance, if you enter food potato,
the request would match a page having the words food and potato,
or one with foodstuff and potatoes, somewhere in it.
Note that, in the case of entries from the site map, only the pages' titles
are considered. This type of search is best when you think that the term you
are searching for is contained in one of the major pages on the site. It helps
to eliminate pages that may only deal with a topic in passing.
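The basic matching rule described above (every word you type must be the beginning of some word on the page) can be sketched as follows. This is an illustration under stated assumptions rather than the engine's own code. The helper names words and matches_all_prefixes are hypothetical.

```python
import re

def words(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_all_prefixes(query, text):
    """True if every query term begins at least one word in the text."""
    page_words = words(text)
    return all(
        any(word.startswith(term) for word in page_words)
        for term in words(query)
    )
```

With this rule, food potato matches a page containing foodstuff and potatoes, but not a page that mentions only food.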
In order to prevent the search engine from considering words that only start
with a word you are looking for, you can enclose the word in double-quotes.
If you enter "food", only pages containing that exact word will match;
pages with foodstuff will not.
If you want to limit the search to an exact phrase, you can enter it in
double-quotes as well.
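Exact matching for a quoted word or phrase can be sketched as a search for the same sequence of whole words, side by side and in order. This is a minimal sketch and the function name contains_phrase is hypothetical.

```python
import re

def words(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(phrase, text):
    """True if the words of `phrase` appear in `text` as whole words,
    next to each other and in the same order."""
    target = words(phrase)
    page_words = words(text)
    n = len(target)
    if n == 0:
        return False
    # Slide a window of n words across the page and compare.
    return any(
        page_words[i:i + n] == target
        for i in range(len(page_words) - n + 1)
    )
```

A single quoted word is just a phrase of length one, so "food" matches food but not foodstuff.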
A couple of notes about what you enter:
-
Case does not matter. Asking for jones or JOnes will match
Jones just fine.
-
All punctuation is ignored, so entering Mr. Jones is the same as entering
Mr Jones.
-
Some words (like a, the, and, to, etc.) are so
common they are always ignored, except when in double-quotes. The search
engine will give you an error message if you have not entered any "unique"
keywords for your search.
-
Even though most of the examples here use one or two words, you can actually
use any number of them.
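The notes above amount to a small clean-up step applied to what you type. The sketch below handles plain, unquoted words only, and it rests on assumptions: the stop-word list holds just the four words named above (the real list is longer and is not given here), punctuation is treated as a separator, and the function name keywords is hypothetical.

```python
import re

# Only the words named in the notes above. The engine's full list is unknown.
STOP_WORDS = {"a", "the", "and", "to"}

def keywords(query):
    """Lowercase the query, drop punctuation, and remove common words.

    Raises ValueError when nothing is left, mirroring the error message
    the search engine gives when no unique keywords were entered.
    """
    terms = re.findall(r"[a-z0-9]+", query.lower())
    kept = [term for term in terms if term not in STOP_WORDS]
    if not kept:
        raise ValueError("no unique keywords in query")
    return kept
```

Under these assumptions, Mr. Jones and Mr Jones reduce to the same two keywords, and a query made up only of common words is rejected.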
The results of a search are organized into two sections, each of which contains three columns:
Matches in the Site Map
| Location within site map hierarchy |
| Match Gauge | Page Title | Link |
| Library, Information & Technology Services : MHC Archives and Special Collections : rare2.htm |
| | Dante Catalog Home Page | /lits/library/arch/dante.htm |
| Library, Information & Technology Services : MHC Archives and Special Collections : rare2.htm : Dante Catalog Home Page |
| | Dante Catalog | /lits/library/arch/dante.gia.htm |
| | Dante Illustrators | /lits/library/arch/danteill.htm |
Matches in Pages
| Match Gauge | Page Title | Link |
| | Descriptions of images in the Inferno | /lits/library/arch/danteimgrt.html |
| | Dante Catalog | /lits/library/arch/dantegia.htm |
| | Dante Catalog | /lits/library/arch/test/dante.gia.htm |
The first section shows matches from titles of entries in the site map, if this option is enabled.
The list is organized based on the pages' hierarchy within the map, and their relative scores.
Above each grouping appears a "path" which details how each page fits into the hierarchy.
The first column in each section is a gauge which indicates how well the particular page
matches your query. Since the results are sorted with the best matches first,
the first entry will always be a completely red bar (text-based browsers
show this as 100%). Other matches are expressed as a percentage of the best
match.
The search engine calculates the worth of a match based on the number
of times the term appears on the page and where the term is located. For
instance, a term contained in a page's title or in a large text header is
considered to be more important than one in the body of a page.
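The gauge and the weighting can be sketched together. The weights below are hypothetical: the description above says only that a title or header match counts for more than a body match, not by how much. The function names raw_score and gauge_percentages are also illustrative.

```python
# Hypothetical weights, chosen only to show the idea.
WEIGHTS = {"title": 5, "header": 3, "body": 1}

def raw_score(counts):
    """Score one page. `counts` maps a location ("title", "header" or
    "body") to the number of times the term appears there."""
    return sum(WEIGHTS[location] * n for location, n in counts.items())

def gauge_percentages(scores):
    """Sort pages best first and express each score as a whole percentage
    of the best score. Scores are assumed to be positive, since every
    listed page matched at least once."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ordered:
        return []
    best = ordered[0][1]
    return [(page, round(100 * score / best)) for page, score in ordered]
```

Because every score is divided by the best one, the first entry always comes out at 100%.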
The page's title, if any, is taken from the HTML
<TITLE> tag. The link gives the full path of the matching
page, and provides you with a link you can click on to go there.
Below the matches, a number of statistics are given, for example:
270 matches total, first 100 available, 1-15 shown.
FOOD=1351 FIGHT=1359
[ Next 15 matches ]
This display means that the word FOOD was found on 1351 pages, and
the word FIGHT was found on 1359. The total number of pages containing
both words is 270. To conserve system resources, the search engine makes
only the first 100 matches available, and of those, matches 1 through 15
are now being shown. To go to the next 15 best matches, click on the
link provided.
The search engine normally shows 15 matches at a time. The
Advanced Search page allows you to change this.
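The arithmetic behind the statistics line can be sketched as below. The limit of 100 and the default of 15 per page come from the description above. The function name status_line is hypothetical, and the wording the engine uses when fewer than 100 pages match is an assumption.

```python
def status_line(total, start, per_page=15, cap=100):
    """Build the statistics line for one page of results.

    `total` is the number of matching pages and `start` is the 1-based
    position of the first match shown on this page.
    """
    available = min(total, cap)                     # never more than the cap
    end = min(start + per_page - 1, available)      # last match on this page
    return f"{total} matches total, first {available} available, {start}-{end} shown."

print(status_line(270, 1))
```

For the example above this prints the same line, and on the last page (starting at match 91) only ten matches remain to be shown.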
There may be times when you want to look for pages that match either
one word or a second. For this case, use the format word1
OR word2. For example, entering food OR potato will
match pages which contain either of these words. It is important to type the word OR in uppercase.
By using a - you can exclude a word. Entering food
-potato will match pages which contain food, but
not those which contain both food and potato.
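OR and exclusion map naturally onto set operations over the lists of pages found for each word. The following sketch assumes a small, made-up word index. The name pages_for and the page paths are hypothetical.

```python
def pages_for(prefix, index):
    """All pages containing at least one word that begins with `prefix`."""
    found = set()
    for word, pages in index.items():
        if word.startswith(prefix):
            found |= pages
    return found

# A made-up index of word -> pages, for illustration only.
index = {
    "food": {"/a", "/b"},
    "foodstuff": {"/c"},
    "potato": {"/b", "/d"},
}

food = pages_for("food", index)
potato = pages_for("potato", index)

either = food | potato    # food OR potato
both = food & potato      # food potato
without = food - potato   # food -potato
```

The difference leaves out page /b, which contains both words, exactly as described for food -potato.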
Using the modifier url: you can force the search to only match those
pages whose location (URL) begins with a certain path. url: should
be followed by the absolute path of interest, without
http://www.mtholyoke.edu at the beginning. A search for food
url:/offices/comm/csj looks for pages with the word
food in issues of the College Street Journal.
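The url: modifier behaves like a filter applied to the matching pages. A minimal sketch, in which the function name filter_by_url and the example paths are hypothetical:

```python
def filter_by_url(paths, prefix):
    """Keep only the pages whose absolute path begins with `prefix`."""
    return [path for path in paths if path.startswith(prefix)]

# Made-up matches for the word food.
matches = [
    "/offices/comm/csj/example1.html",
    "/dining/example2.html",
]
in_csj = filter_by_url(matches, "/offices/comm/csj")
```

Only the first of the two example paths survives the filter, because the second lies outside /offices/comm/csj.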
There is also a separate Advanced Search page
which allows you to choose from a list of locations on this site and to set
the number of matches per page in the result.
-
Searches are not sensitive to capitalization. Punctuation is ignored.
-
word1 word2 Matches documents containing both a word beginning with
word1 and a word beginning with word2
-
"word" Matches documents containing the word exactly
-
"word1 word2" Matches documents containing the exact phrase
-
word1 OR word2 Matches documents containing a word beginning with
word1 or word2
-
word1 -word2 Matches documents containing a word beginning with
word1 and no words beginning with word2
-
word1 url:word2 Matches documents containing a word beginning with
word1, but only when their URLs start with
word2
-
Examples:
mary lyon Matches Mary Lyon; Lyons of Maryland; or Lyon, Mary
"mary" Matches Mary Lyon or Mary Smith
"mary lyon" Matches Mary Lyon
mary OR lyon Matches Maryland; Lyon, France; or Mary Smith
"mary" -lyon Matches Mary, but not on pages containing Lyon
map url:/adm Matches pages in the Admission Center which talk about the campus map
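The query syntax summarized above can be broken down with a small parser. This is a sketch under stated assumptions, not the search engine's own code: the names parse_query and make_term and the shape of the result are hypothetical, and OR or | is recognized only when it stands alone between two terms.

```python
import re

# A token is either a quoted phrase (optionally preceded by -) or a run of
# non-space characters.
TOKEN = re.compile(r'-?"[^"]*"|\S+')

def make_term(token):
    """Return (text, kind), where kind is "exact" for a quoted term and
    "prefix" for a plain one."""
    exact = len(token) >= 2 and token.startswith('"') and token.endswith('"')
    return (token.strip('"').lower(), "exact" if exact else "prefix")

def parse_query(query):
    groups = []        # each group is a list of alternatives joined by OR
    excluded = []      # terms introduced with -
    url_prefix = None  # path given after url:
    join_next = False  # True right after an OR or | token
    for token in TOKEN.findall(query):
        if token in ("OR", "|"):
            join_next = True
        elif token.startswith("url:"):
            url_prefix = token[len("url:"):]
        elif token.startswith("-") and len(token) > 1:
            excluded.append(make_term(token[1:]))
        else:
            term = make_term(token)
            if join_next and groups:
                groups[-1].append(term)   # alternative to the previous term
            else:
                groups.append([term])     # a new required term
            join_next = False
    return {"groups": groups, "excluded": excluded, "url": url_prefix}
```

A page would then match when it satisfies at least one alternative in every group, none of the excluded terms, and the URL prefix if one was given.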