AmiGO Manual: BLAST

From GO Wiki
Revision as of 16:54, 16 March 2011 by Girlwithglasses (talk | contribs) (→‎Entering sequences)

What is BLAST?

BLAST is a search algorithm designed to find sequence similarities (Altschul et al., 1990). An online guide to BLAST searching can be found at the NCBI BLAST Help Manual. The AmiGO BLAST server uses WU-BLAST (Gish, W. 1996-2004); technical information may be found at Washington University BLAST Archives.


What does the BLAST search do?

The AmiGO BLAST server searches the GO protein sequence database, which comprises protein sequences of genes and gene products that have been annotated to a GO term and submitted to the GO Consortium. Protein queries are searched using BLASTP, while nucleotide queries are searched using BLASTX. There is no need to specify which program to use, but if more than one sequence is submitted in a single query, all sequences must be of the same type.
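The manual does not describe how the server tells protein from nucleotide input. The sketch below shows one common heuristic (the fraction of A, C, G, T, U and N letters) and the same-type rule for multi-sequence queries. The function names and the 0.9 cutoff are assumptions made for illustration; this is not AmiGO's actual detection logic.

```python
# Hypothetical heuristic for illustration; not AmiGO's actual detection logic.
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_sequence_type(sequence, threshold=0.9):
    """Return 'nucleotide' if most letters are A/C/G/T/U/N, else 'protein'.

    The threshold of 0.9 is an assumed cutoff, not a documented value.
    """
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    hits = sum(1 for ch in letters if ch in NUCLEOTIDE_LETTERS)
    return "nucleotide" if hits / len(letters) >= threshold else "protein"


def choose_program(sequences):
    """Pick BLASTP or BLASTX for a query set, rejecting mixed sets."""
    types = {guess_sequence_type(seq) for seq in sequences}
    if len(types) != 1:
        raise ValueError("all sequences in a query must be of the same type")
    return "BLASTP" if types == {"protein"} else "BLASTX"
```

Checking a query set locally in this way, before submitting it, would catch a mixed protein and nucleotide submission early.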

Entering sequences

The BLAST query form accepts three methods for submitting a query sequence:

  • Enter a UniProtKB accession ID, for example P55269. There is a list of tools for ID mapping if your IDs are in another format.
  • Upload a file containing sequences in FASTA format.
  • Paste FASTA sequence(s) into the textbox.

When entering more than one sequence, separate queries with a line break.

GOst allows BLAST queries of up to 100 sequences; the total number of residues cannot exceed 3 million. More information on FASTA format can be found on Wikipedia.
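The limits above can be checked locally before submitting a query. The following is a minimal sketch, assuming plain FASTA input in which each record starts with a ">" header line; the helper names parse_fasta and check_query_limits are hypothetical and not part of AmiGO.

```python
MAX_SEQUENCES = 100        # limit stated in this manual
MAX_RESIDUES = 3_000_000   # limit stated in this manual


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            if header is None:
                header = ""  # sequence data with no header line
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_query_limits(text):
    """Return a list of problems; an empty list means the query fits."""
    records = parse_fasta(text)
    total = sum(len(seq) for _, seq in records)
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences exceeds {MAX_SEQUENCES}")
    if total > MAX_RESIDUES:
        problems.append(f"{total} residues exceeds {MAX_RESIDUES}")
    return problems
```

An empty list from check_query_limits means the query is within both limits.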

BLAST parameters

Expect threshold

The expect threshold (E value) is the statistical significance threshold for reporting matches against database sequences: it is the maximum expect value a hit may have and still be returned. If the expect value ascribed to a match is greater than the expect threshold, the match will not be reported. Lower expect threshold values are more stringent, leading to fewer chance matches being reported (source: NCBI BLAST help).
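The reporting rule can be sketched as a simple filter. The hit records here are hypothetical dictionaries with an "e_value" key, invented for illustration; they are not AmiGO's or WU-BLAST's output format.

```python
def filter_hits(hits, expect_threshold):
    """Keep only hits whose E value does not exceed the threshold.

    `hits` is an assumed list of dicts with an "e_value" key.
    """
    return [hit for hit in hits if hit["e_value"] <= expect_threshold]


# Illustrative hits with made-up names and E values.
example_hits = [
    {"name": "hit_a", "e_value": 1e-30},
    {"name": "hit_b", "e_value": 0.05},
    {"name": "hit_c", "e_value": 2.0},
]

strict = filter_hits(example_hits, 0.001)   # keeps hit_a only
lenient = filter_hits(example_hits, 0.1)    # keeps hit_a and hit_b
```

Lowering the threshold from 0.1 to 0.001 drops the weaker match, which is what "more stringent" means in practice.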

Maximum number of alignments

Select the number of target sequences to display in the results. Choosing fewer sequences produces results faster.

BLAST filter

Filtering is on by default and masks low complexity regions in the query sequence. In a protein search, masked regions appear as 'X's in the alignment, while in a nucleotide search they appear as 'N's. The score and E value of a match may be affected by filtering, since it effectively shortens the query length.
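To see how much of a query remains after filtering, the masked characters in a filtered sequence can be counted. This is a rough sketch with a hypothetical helper name; it assumes every 'X' (or 'N') in the sequence comes from masking, which will not hold if the original sequence already contained those ambiguity codes.

```python
def unmasked_length(filtered_sequence, mask_char="X"):
    """Count residues that were not replaced by the mask character.

    Use mask_char="X" for protein queries and "N" for nucleotide queries.
    """
    mask = mask_char.upper()
    return sum(1 for ch in filtered_sequence.upper() if ch != mask)


protein_remaining = unmasked_length("MKTXXXXXAYI")          # 6 of 11 residues
nucleotide_remaining = unmasked_length("ATGNNNNCGT", "N")   # 6 of 10 bases
```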


Hit Submit to submit the query to the BLAST server. An intermediate page containing the BLAST parameters is displayed while the BLAST job is running. This page will automatically refresh until the BLAST job is finished.

BLAST Results

Query Summary

The Query Summary displays the BLAST parameters used for the BLAST job. If more than one sequence was included in the query set, each sequence will return a separate results page. Results for each of the sequences can be viewed by clicking on the page numbers at the top of the page, or all the results can be viewed on a single page by clicking on 'View All Results'.

Query Sequence

The Query Sequence section of the BLAST results page displays the sequence used as the query.

High Scoring Gene Products

The High Scoring Gene Products table lists the gene products that meet the expect threshold selected as a BLAST parameter on the query page. The results are listed in ascending order of P-value. The P-value is the probability that a match with a score at least this high arose by chance, while the E-value is the expected number of such chance matches. A P-value can be converted to an E-value and vice versa using the following formula:

P = 1 - e^{-E}

When E < 0.01, P-value and E-value are nearly identical.
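The conversion in both directions can be sketched in a few lines. The function names are illustrative; the inverse, E = -ln(1 - P), follows from rearranging the formula above. math.expm1 and math.log1p are used because they stay accurate for the very small values typical of good hits.

```python
import math


def e_to_p(e_value):
    """P = 1 - e^(-E), computed as -expm1(-E) for accuracy near zero."""
    if e_value < 0:
        raise ValueError("E-value must be non-negative")
    return -math.expm1(-e_value)


def p_to_e(p_value):
    """E = -ln(1 - P), the inverse of the formula above; requires 0 <= P < 1."""
    if not 0 <= p_value < 1:
        raise ValueError("P-value must satisfy 0 <= P < 1")
    return -math.log1p(-p_value)


small = e_to_p(0.01)   # about 0.00995, nearly identical to E
large = e_to_p(10.0)   # close to 1, very different from E
```

For large E the P-value saturates near 1, which is why the two measures only agree closely when E is small.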

In the column titled 'Name', gene products that show high similarity to the query sequence are listed. The name of the gene product is a hyperlink to the gene product details page for that gene or gene product.

The column titled 'Species' lists the species each gene product comes from. The species name is a hyperlink to the NCBI Taxonomy Browser.

The checkbox on the left of each row of the table can be used to download the FASTA sequence for one or more proteins, or to get an overview of the annotations to a set of gene products. Select the gene products you are interested in, pick the appropriate option using the radio buttons at the bottom of the page, and hit Submit Query to view the results.

Raw BLAST Data

The raw BLAST results, or alignment details, are displayed below the table of High Scoring Gene Products. Further details on interpreting the BLAST results are available in the NCBI BLAST Help Manual.