AmiGO: Search Relevance


[From Amelia's WG message:]

If you just want to try it out, please go to:

http://toy.lbl.gov:9006/cgi-bin/amigo/search.cgi

to give it a whirl. I've left in the info about the best match. Comments and bug reports gratefully received! The relevance algorithm now has the following tweaks:

  • automatically remove 'complex' or 'activity' from the end of terms so that users are not penalised for not knowing GO speak
  • search results where words appear in the same order as the query phrase score higher than where the order is different
  • whole word matches score higher than partial matches
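As an illustration of the first tweak, here is a minimal Python sketch of stripping a trailing 'complex' or 'activity' from a term name. It is not the AmiGO code: the function name `strip_go_suffix` is hypothetical, and the choices to match case-insensitively and to leave a term that consists only of one of those words untouched are assumptions.

```python
import re

# Assumption: the suffix must be a separate final word, preceded by whitespace.
_GO_SUFFIX = re.compile(r"\s+(?:complex|activity)$", re.IGNORECASE)


def strip_go_suffix(term: str) -> str:
    """Drop a trailing 'complex' or 'activity' word from a term name."""
    return _GO_SUFFIX.sub("", term.strip())


if __name__ == "__main__":
    for name in ("protein kinase activity", "proteasome complex", "complex assembly"):
        print(repr(name), "->", repr(strip_go_suffix(name)))
```

With this, a query for "protein kinase" would be compared against "protein kinase" rather than "protein kinase activity", so the leftover text that counts against the score is shorter.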

For those who are interested, the basic calculation performed to generate the relevance score is:

relevance = 1 - ( remainder / ( querystr + remainder ))

where:

  • querystr = length of the query phrase
  • remainder = a figure based on the length of the search result after the query phrase has been removed

This is then multiplied by a factor that depends on the field in which the search result appears; e.g. the factor is 1.0 for a term name, 0.5 for a related synonym, etc.
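The calculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the field keys `term_name` and `related_synonym`, and the error handling are hypothetical, and only the two factors quoted above (1.0 and 0.5) come from the message.

```python
# Field factors: only these two values are given in the text;
# the dictionary keys are hypothetical labels.
FIELD_FACTORS = {
    "term_name": 1.0,
    "related_synonym": 0.5,
}


def relevance(querystr: float, remainder: float, field: str) -> float:
    """relevance = 1 - (remainder / (querystr + remainder)), scaled by field."""
    if querystr + remainder <= 0:
        # Assumption: an empty query with nothing left over has no defined score.
        raise ValueError("querystr + remainder must be positive")
    base = 1 - remainder / (querystr + remainder)
    return base * FIELD_FACTORS[field]


if __name__ == "__main__":
    # Exact match on a term name: nothing left over, so the score is 1.0.
    print(relevance(10, 0, "term_name"))
    # Leftover as long as the query, found in a related synonym.
    print(relevance(10, 10, "related_synonym"))
```

A remainder of 0 gives the maximum score for the field, and the score falls towards 0 as the remainder grows relative to the query length.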

The figure for the 'remainder' is now calculated using this formula:

remainder =
       (# word chars x word char weighting)  (currently 1.0)
     + (# word boundaries x boundary weighting)  (currently 0.25)
     + ((# non-word chars + # search matches in same order as query phrase) x non-word char weighting)  (currently 0.25)

Here '#' means 'number of'. Word characters are a-z, 0-9 and _
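A rough Python sketch of the remainder formula follows, using the weightings quoted above. The message does not say how word boundaries or in-order matches are counted, so those two figures are taken as plain inputs here. The helper `count_chars`, its case-insensitive matching, and its treatment of whitespace as a non-word character are assumptions, and all names are hypothetical.

```python
import re

# Weightings as currently quoted in the text.
WORD_CHAR_WEIGHT = 1.0
BOUNDARY_WEIGHT = 0.25
NON_WORD_CHAR_WEIGHT = 0.25


def count_chars(leftover: str) -> tuple[int, int]:
    """Return (word chars, non-word chars) for the leftover text.

    Word characters are a-z, 0-9 and _ (matched case-insensitively here,
    which is an assumption); everything else counts as non-word.
    """
    word_chars = len(re.findall(r"[a-z0-9_]", leftover, flags=re.IGNORECASE))
    return word_chars, len(leftover) - word_chars


def remainder(word_chars: int, boundaries: int,
              non_word_chars: int, in_order_matches: int) -> float:
    """Weighted sum of the counts, term by term as in the formula above."""
    return (word_chars * WORD_CHAR_WEIGHT
            + boundaries * BOUNDARY_WEIGHT
            + (non_word_chars + in_order_matches) * NON_WORD_CHAR_WEIGHT)


if __name__ == "__main__":
    words, others = count_chars("protein kinase")
    # The boundary count (2) and in-order match count (0) are made-up inputs.
    print(remainder(words, 2, others, 0))
```

Keeping the three weightings as module-level constants makes it easy to try different values and see how the relative importance of whole-word matches changes.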

I can mess around with the weightings to alter the relative importance of exact word matches, etc.

Please try it out and give me any feedback you might have.