AmiGO: Search Relevance

From GO Wiki
Revision as of 15:48, 11 February 2008 by Maria

[From Amelia's WG message:]

If you just want to try it out, please go to:

http://toy.lbl.gov:9006/cgi-bin/amigo/search.cgi

to give it a whirl. I've left in the info about the best match. Comments and bug reports gratefully received! The relevance algorithm now has the following tweaks:

  • automatically remove 'complex' or 'activity' from the end of terms so that users are not penalised for not knowing GO speak
  • search results where words appear in the same order as in the query phrase score higher than results where the order differs
  • whole word matches score higher than partial matches

For those who are interested, the basic calculation performed to generate the relevance score is:

relevance = 1 - ( remainder / ( querystr + remainder ))

where:

  • querystr = length of the query phrase
  • remainder = a figure based on the length of the search result after the query phrase has been removed

This is then multiplied by a factor depending on which field the search result appears in; e.g. if it's a related synonym, the factor is 0.5, if it's a term name, the factor is 1.0, etc.
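
As a rough illustration of the calculation above, here is a minimal Python sketch (not the actual AmiGO code). The function name, the field keys and the empty-query check are assumptions made for the example; only the two field factors quoted above are included.

```python
# Field factors: only the two values quoted in the message are listed here.
# The dictionary keys are hypothetical names chosen for this sketch.
FIELD_FACTORS = {"term_name": 1.0, "related_synonym": 0.5}


def relevance(query_len, remainder, field="term_name"):
    """relevance = 1 - (remainder / (querystr + remainder)), scaled by field."""
    if query_len <= 0:
        # Assumption: an empty query is rejected rather than scored.
        raise ValueError("query phrase must not be empty")
    base = 1 - remainder / (query_len + remainder)
    return base * FIELD_FACTORS[field]


# Nothing left over after removing the query from a term name.
print(relevance(10, 0))
# Remainder as large as the query, found in a related synonym.
print(relevance(10, 10, "related_synonym"))
```

With a remainder of zero the base score is 1, and it falls towards 0 as the remainder grows relative to the query length; the field factor then scales that figure.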

The figure for the 'remainder' is now calculated using this formula:

remainder =

       # word chars x word char weighting  (currently 1.0)
       + # word boundaries x boundary weighting  (currently 0.25)
       + (# non word chars + # search matches in same order as query phrase) x non word char weighting  (currently 0.25)

Here '#' means 'number of', and word characters are a-z, 0-9 and _
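
The remainder formula can be sketched in Python along these lines. This is an illustration under stated assumptions, not the AmiGO implementation: the function names are made up, matching is assumed to be case-insensitive, and the way word boundaries and in-order matches get counted is not specified above, so they are simply passed in as numbers.

```python
import re

# Weightings as currently quoted in the message.
WORD_CHAR_WEIGHT = 1.0
BOUNDARY_WEIGHT = 0.25
NON_WORD_CHAR_WEIGHT = 0.25


def count_chars(leftover):
    """Return (word chars, non-word chars) for the leftover text.

    Word characters are a-z, 0-9 and _ ; ignoring case is an assumption.
    """
    word = len(re.findall(r"[a-z0-9_]", leftover, re.IGNORECASE | re.ASCII))
    return word, len(leftover) - word


def remainder_score(word_chars, boundaries, non_word_chars, in_order_matches):
    """Weighted sum of the four counts, following the formula above."""
    return (word_chars * WORD_CHAR_WEIGHT
            + boundaries * BOUNDARY_WEIGHT
            + (non_word_chars + in_order_matches) * NON_WORD_CHAR_WEIGHT)


# Illustrative input: " binding" is what is left of a hypothetical result.
# The boundary count (2) and in-order match count (1) are made-up values.
word, non_word = count_chars(" binding")
print(remainder_score(word, 2, non_word, 1))
```

Keeping the weightings as named constants makes it easy to try different values and see how the score shifts between whole-word and partial matches.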

I can mess around with the weightings to alter the relative importance of exact word matches, etc.

Please try it out and give me any feedback you might have.