AmiGO: Search Relevance
[From Amelia's WG message:]
If you just want to try it out, please go to:
http://toy.lbl.gov:9006/cgi-bin/amigo/search.cgi
to give it a whirl. I've left in the info about the best match. Comments and bug reports gratefully received! The relevance algorithm now has the following tweaks:
- automatically remove 'complex' or 'activity' from the end of terms so that users are not penalised for not knowing GO speak
- search results where words appear in the same order as the query phrase score higher than where the order is different
- whole word matches score higher than partial matches
For those who are interested, the basic calculation performed to generate the relevance score is:
relevance = 1 - ( remainder / ( querystr + remainder ))
where:
- querystr = length of the query phrase
- remainder = a figure based on the length of the search result after the query phrase has been removed
This is then multiplied by a factor that depends on which field the search result appears in: for example, the factor is 1.0 for a term name and 0.5 for a related synonym.
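A minimal Python sketch of the calculation above, for illustration only. The function name, the dictionary and the field labels are hypothetical; only the two factors quoted in the message (1.0 for a term name, 0.5 for a related synonym) come from the source, and the factors for other fields are not listed here.

```python
# Field factors quoted in the message; other fields exist but their
# factors are not given, so they are deliberately left out.
FIELD_FACTORS = {
    "term name": 1.0,
    "related synonym": 0.5,
}


def relevance(querystr, remainder, field):
    """Return the relevance score for one search result.

    querystr  -- length of the query phrase
    remainder -- figure based on what is left of the result once the
                 query phrase has been removed
    field     -- name of the field the result appears in
    """
    if querystr + remainder == 0:
        raise ValueError("querystr and remainder cannot both be zero")
    base = 1 - (remainder / (querystr + remainder))
    return base * FIELD_FACTORS[field]
```

With this sketch, a result that matches the query exactly (remainder of 0) in a term name scores 1.0, and the same exact match in a related synonym scores 0.5.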
The figure for the 'remainder' is now calculated using this formula:
remainder =
  (# word chars x word char weighting (currently 1.0))
  + (# word boundaries x boundary weighting (currently 0.25))
  + ((# non word chars + # search matches in same order as query phrase) x non word char weighting (currently 0.25))
Word characters are a-z, 0-9 and _.
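The remainder formula could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the AmiGO code: the function and constant names are hypothetical, the leftover text is lower-cased before counting, word boundaries are counted with the regular expression `\b`, and the number of same-order matches is passed in by the caller because the message does not say how it is derived.

```python
import re

# Weightings quoted in the message (current values).
WORD_CHAR_WEIGHT = 1.0
BOUNDARY_WEIGHT = 0.25
NON_WORD_CHAR_WEIGHT = 0.25


def remainder_score(leftover, same_order_matches=0):
    """Score the text left over once the query phrase has been removed.

    leftover           -- what remains of the search result
    same_order_matches -- number of search matches in the same order as
                          the query phrase (supplied by the caller)
    """
    text = leftover.lower()  # assumption: counting is case-insensitive
    # Word characters are a-z, 0-9 and _
    word_chars = len(re.findall(r"[a-z0-9_]", text))
    # Assumption: a "word boundary" is any position matched by \b
    boundaries = len(re.findall(r"\b", text, flags=re.ASCII))
    non_word_chars = len(text) - word_chars
    return (
        word_chars * WORD_CHAR_WEIGHT
        + boundaries * BOUNDARY_WEIGHT
        + (non_word_chars + same_order_matches) * NON_WORD_CHAR_WEIGHT
    )
```

For a leftover of "ab cd" this counts 4 word characters, 4 word boundaries and 1 non-word character, giving 4 x 1.0 + 4 x 0.25 + 1 x 0.25 = 5.25.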
I can mess around with the weightings to alter the relative importance of exact word matches, etc.
Please try it out and give me any feedback you might have.