AmiGO Manual: Live Search: Difference between revisions

From GO Wiki
Latest revision as of 19:21, 24 March 2010

Overview

In addition to the traditional search, AmiGO now provides Live Search, a method of rapidly searching using pre-computed indexes. Beyond speed, Live Search provides boolean operators, wildcards, and fuzzy searches.

When searching, the first column of results is a score for how closely the search engine thought your query resembled the indexed document. It is important to know that, to generate this score, different fields are weighted differently; for example, the word "cho" appearing as a gene product's symbol is weighted more heavily than the same word appearing in a long list of gene product synonyms. For more control over this behavior, please see the Fields section in Basic Usage and the Advanced Usage section.

Basic Usage

The largest initial difference, when compared to the traditional AmiGO search, is that results are returned incrementally as you type. Note that the search only increments when at least three characters have been entered and when you are typing forward with standard characters (spaces, backspaces, and the like will not increment the search). All examples in the Basic Usage section assume that you are doing a gene product search (by clicking on the "Gene Product" tab in the search interface).
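The incremental rule described above can be sketched as a small predicate. This is only an illustration, not AmiGO's actual client code: the function name, the `min_chars` parameter, and the reading of "standard characters" as "anything that is not whitespace" are all assumptions. The default threshold of 3 follows the "ple" example used in this section.

```python
def should_search(previous: str, current: str, min_chars: int = 3) -> bool:
    """Decide whether a keystroke should trigger a new incremental search.

    Hypothetical model of the rule in the text: the user must be typing
    forward (exactly one character appended), the new character must not
    be whitespace, and enough characters must have been entered.
    """
    # Typing forward: the new text is the old text plus one character.
    typed_forward = (
        len(current) == len(previous) + 1 and current.startswith(previous)
    )
    if not typed_forward:
        return False
    # Assumption: spaces and similar characters do not trigger a search.
    if current[-1].isspace():
        return False
    return len(current) >= min_chars


if __name__ == "__main__":
    print(should_search("pl", "ple"))   # typing forward, three characters
    print(should_search("ple", "pl"))   # backspace
```

Under this model, deleting the last letter and retyping it produces a fresh forward keystroke, which is why that action triggers a new search.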

For our first example, let's say that you want to search for "pleckstrin". As you type it in, you will get results incrementally returned for "ple", "plec", and finally "pleckstrin". If you continued and entered:

pleckstrin domain

You would get a list of gene products that contain in their record "pleckstrin", "domain", or both (see Boolean Operators below). Finally, if you want to search for the phrase "pleckstrin domain", where the words occur side by side, you would have to put the phrase in quotes:

"pleckstrin domain"

Boolean Operators (and/or/not)

You may also use boolean logic and nesting in the search. For example, if you wanted to see all records that contain both "pleckstrin" and "domain" in them, you would enter:

pleckstrin and domain

To exclude "domain" from "pleckstrin" results, you would enter:

pleckstrin and not domain

To get everything that has "pleckstrin", "domain", or both, you would enter:

pleckstrin or domain

Note that:

pleckstrin or domain
pleckstrin domain

are functionally equivalent; spaces between words are treated as an implicit "or". This is why you need to quote phrases when you want the words to appear together.
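To make the implicit "or" concrete, the following sketch rewrites a query so that every implied "or" is spelled out. It is a toy model of the behavior described above, not the parser that AmiGO or Lucene actually uses; the function names and the tokenizing regular expression are assumptions made for illustration.

```python
import re

OPERATORS = {"and", "or", "not"}

# Quoted phrases (optionally prefixed by "field:") stay whole,
# parentheses are separate tokens, everything else splits on whitespace.
TOKEN_RE = re.compile(r'(?:[^\s()"]+:)?"[^"]*"|[()]|[^\s()"]+')


def make_or_explicit(query: str) -> str:
    """Insert "or" wherever two operands are separated only by a space."""
    out = []
    for tok in TOKEN_RE.findall(query):
        if out:
            prev = out[-1]
            prev_ends_operand = prev == ")" or (
                prev != "(" and prev.lower() not in OPERATORS
            )
            tok_starts_operand = tok == "(" or (
                tok != ")" and tok.lower() not in OPERATORS
            )
            if prev_ends_operand and tok_starts_operand:
                out.append("or")
        out.append(tok)

    # Re-join without adding spaces just inside parentheses.
    text = ""
    for tok in out:
        if text and not text.endswith("(") and tok != ")":
            text += " "
        text += tok
    return text


if __name__ == "__main__":
    print(make_or_explicit("pleckstrin domain"))
    print(make_or_explicit('"pleckstrin domain"'))
```

The quoted phrase passes through unchanged because it is a single token, which mirrors why quoting is needed to keep the words together.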

We can also nest our boolean search with parentheses. If you wanted to search for "top" in conjunction with either "alpha" or "beta", you would enter:

top and (alpha or beta)

For more about the results from "top" (and why there are not as many as you might expect), please see the next section.

Wildcards

For example, let's say that we search for "top". In the results, the fly gene that we were looking for, Topoisomerase 1 with the symbol Top1, does not appear. Unlike the OpenSearch widgets available for AmiGO, Live Search does not automatically match words that merely contain "top"; you must explicitly instruct the search engine to look for them by adding the wildcard "*" at the end of the word. By entering:

top*

You will now see the fly gene you were searching for returned as the first result. It is important to know that wildcards cannot be the first character in a word.

In addition to the '*' wildcard, there are several others that can do things like fuzzy searches or proximity. Please see Advanced Usage below.
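The difference between searching for "top" and "top*" can be sketched as follows. This is a simplified model for illustration only: it assumes that a record is indexed as a list of lowercased words, and the function name is hypothetical. Only the trailing '*' wildcard from this section is modeled.

```python
def term_matches(term: str, words: list[str]) -> bool:
    """Return True if a single search term matches any word of a record.

    Without a wildcard the term must equal a whole word; with a trailing
    "*" it only has to be a prefix of a word.
    """
    term = term.lower()
    if term.startswith("*"):
        # Mirrors the rule that a wildcard cannot start a word.
        raise ValueError("a wildcard cannot be the first character of a word")
    lowered = [w.lower() for w in words]
    if term.endswith("*"):
        prefix = term[:-1]
        return any(w.startswith(prefix) for w in lowered)
    return term in lowered


if __name__ == "__main__":
    record = ["Top1", "Topoisomerase", "1"]  # assumed tokenization
    print(term_matches("top", record))
    print(term_matches("top*", record))
```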

Fields

It is important to note that when searching, this new engine treats all of the information about a term or gene product as a single document by default. If you are interested in, say, the term "kinase activity" and entered it into the search box:

kinase activity

You might be disappointed to find that the first returned term is "JUN kinase kinase kinase activity". However, from the point of view of the search engine, with the triple kinase in the name and all of the kinase activity appearing in the synonyms, it looks like a very close match for what you are asking.

In order to narrow the search down to terms that have the phrase "kinase activity" in the name of the term, you would enter:

  name:"kinase activity"

Similarly, for a gene product, you might enter:

full_name:chocolate

While these two special fields may be the most common, there are a number of distinct special fields for terms and gene products. For a complete list of available fields, please see Additional Search Fields below.

Filtering

The term and gene product searches have different (and hopefully self-explanatory) filters that you can apply to them to reduce the number of returned results. Let's say that you've searched for the phrase:

"pleckstrin domain"

and wish to see only the results for RGD. You can do this by clicking the "RGD" item in the "Data source" filter box. Multiple filters can be added by holding a control key (varies by operating system and browser) while clicking on filters. Any combination of filters may be added; to remove a filter set, click on the "No filter" item, the first item in each filter set.

Advanced Usage

For full information about the query syntax and special characters, please see the Lucene documentation (http://lucene.apache.org/java/1_4_3/queryparsersyntax.html).

Additional Search Fields

Term Search Fields

The following is a complete list of fields that can be used in a term search, as well as how they correspond to the GO database.

  • acc (acc from the term table)
  • name (name from the term table)
  • ontology (name of the ontology term corresponding to this term from the term table)
  • synonym (term_synonym from term_synonym table)

Gene Product Search Fields

The following is a complete list of fields that can be used in a gene product search, as well as how they correspond to the GO database.

  • dbxref (gene product's xref_dbname and xref_key from the dbxref table joined by ':')
  • full_name (full_name from the gene_product table)
  • symbol (symbol from the gene_product table)
  • species (gene_product's ncbi_taxa_id from the species table)
  • scientific (generated scientific species name)
  • source (gene product's xref_dbname from the dbxref table)
  • gptype (gene_product's type name from the term table)
  • gpsynonym (gene_product's product_synonym from the gene_product_synonym table)
  • homolset (generated "yes" or "no")
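As a sketch of how the field lists above might be used when building queries programmatically, the helper below checks a field name against the lists and produces a "field:text" clause, quoting the text when it contains spaces. The helper and its names are hypothetical and are not part of AmiGO; only the field names themselves come from the lists in this section.

```python
TERM_FIELDS = {"acc", "name", "ontology", "synonym"}
GENE_PRODUCT_FIELDS = {
    "dbxref", "full_name", "symbol", "species", "scientific",
    "source", "gptype", "gpsynonym", "homolset",
}
FIELDS = {"term": TERM_FIELDS, "gene_product": GENE_PRODUCT_FIELDS}


def field_query(search_type: str, field: str, text: str) -> str:
    """Build a single field-restricted clause such as name:"kinase activity"."""
    if field not in FIELDS[search_type]:
        raise ValueError(f"unknown {search_type} field: {field}")
    if '"' in text:
        # Escaping is out of scope for this sketch.
        raise ValueError("text must not contain double quotes")
    # Multi-word text must be quoted so the words are kept together.
    if any(ch.isspace() for ch in text):
        return f'{field}:"{text}"'
    return f"{field}:{text}"


if __name__ == "__main__":
    print(field_query("term", "name", "kinase activity"))
    print("aatf and " + field_query("gene_product", "homolset", "yes"))
```

The second call builds a query for gene products that contain "aatf" and are also members of a homolog set.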

Fuzzy Searches and Levenshtein Distance

This section is not yet complete. In the meantime, please see the Lucene documentation. All of the documentation there should be valid for AmiGO.
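For readers unfamiliar with the term, Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into another. The sketch below is a standard dynamic-programming implementation for illustration; it is not the code used by AmiGO or Lucene, and how a given distance maps to a fuzzy match is described in the Lucene documentation rather than here.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between two strings."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("roam", "foam"))
    print(levenshtein("kitten", "sitting"))
```

In the Lucene query syntax, a fuzzy search is requested by appending a tilde to a single word, for example "roam~".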

Exotic Searches

In addition to the types of searches listed above, there are also more exotic types of search that involve ranges, proximity, optional wildcards, boosting, result weights, and others. Unfortunately, these are outside the scope of this document, but are part of the standard syntax. For a more in-depth look at these, please see the Lucene documentation.

Troubleshooting

  • Live Search may occasionally drop a packet of results as you type (especially for fast typists). If you think you should have gotten results, but the display seems "stuck", try erasing the last letter and retyping it.
  • Live Search is a new piece of software and some of the AmiGO_Labs caveats may apply as bugs are worked out. If you have a problem, please contact the GO Helpdesk.
  • While we aim to be functional on as wide a variety of platforms and browsers as possible, this software depends on features that may not be found in older systems. A complete compatibility list is currently under construction.