InterProScan

From GO Wiki

InterProScan is a tool that scans your protein sequence against a range of protein signatures. These signatures mainly represent protein families, functional domains, or active sites, so a match can be used to infer the potential function of your protein. Many of the signatures have been integrated into the <a href="http://www.ebi.ac.uk/InterPro/">InterPro database</a> and therefore have GO terms associated with them. This makes InterProScan a quick way to associate functional information and GO terms with your protein of interest.
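As a rough illustration of how InterPro matches lead to GO annotations, the Python sketch below builds a lookup table from InterPro accessions to GO IDs and then collects the GO IDs for a list of hits. It assumes a mapping file whose data lines look like "InterPro:IPRnnnnnn description > GO:term name ; GO:nnnnnnn", which to our understanding is the layout of the interpro2go mapping file; the function names are hypothetical, and this is not how InterProScan itself is implemented.

```python
from collections import defaultdict

PREFIX = "InterPro:"


def parse_interpro2go(lines):
    """Build a mapping from InterPro accession to a set of GO IDs.

    Assumed (not verified) line layout:
        InterPro:IPRnnnnnn description > GO:term name ; GO:nnnnnnn
    Blank lines and lines starting with '!' are treated as comments.
    """
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        # The GO ID is whatever follows the last ';' on the line.
        left, _, go_part = line.rpartition(";")
        accession = left.split()[0]
        if accession.startswith(PREFIX):
            accession = accession[len(PREFIX):]
        mapping[accession].add(go_part.strip())
    return mapping


def go_terms_for_hits(hits, mapping):
    """Collect the sorted, de-duplicated GO IDs for a list of InterPro hits."""
    terms = set()
    for accession in hits:
        # .get() leaves the defaultdict unchanged for unmapped accessions.
        terms |= mapping.get(accession, set())
    return sorted(terms)
```

Passing the InterPro accessions reported by a scan to go_terms_for_hits returns one combined list of GO IDs; accessions with no GO mapping are simply skipped.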

Tutorial

Tutorial on <a href="http://www.ebi.ac.uk/2can/tutorials/function/InterProScan.html">InterProScan</a>
<a href="http://www.ebi.ac.uk/2can/tutorials/function/InterProScan2.html">Schematic diagram</a> showing what happens to sequences entered into InterProScan.

Where to find InterProScan

InterProScan can be accessed in three different ways:
1) Via the <a href="http://www.ebi.ac.uk/InterProScan/">EBI website</a>
2) Via the <a href="http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html">EBI web services</a>
3) By <a href="ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/index.html">downloading</a> the program to your local servers and running it there.


If you only have a small number of sequences to characterize, it is probably easiest to run them through the EBI website. You will get your results in graphical, tabular and XML formats, all of which contain links to InterPro and GO. Currently, because the service is so popular, we have unfortunately had to restrict submissions to one sequence at a time.
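Because the website accepts only one sequence per submission, a multi-sequence FASTA file has to be split into individual records first. The Python sketch below is one minimal way to do that; the function names and the 60-residue line width are illustrative choices, not something the EBI service requires.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Any lines appearing before the first '>' header are ignored.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def to_single_fasta(header, sequence, width=60):
    """Format one record as a stand-alone FASTA string wrapped at `width` residues."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```

Each string returned by to_single_fasta can then be pasted into the web form on its own.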

Alternatively, if you can program in Perl, you can use EBI's InterProScan web service to submit multiple searches in parallel (up to 20 sequences at once). All you need to do is install the appropriate client from the page listed in point 2) above, and you can then submit your sequences. Again, you can get your results back in a variety of formats, all with links to GO terms and InterPro.
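The page describes a Perl client; the Python sketch below only illustrates the general idea of keeping no more than 20 submissions in flight at once. The submit argument is a hypothetical stand-in for whatever call your web-service client makes to send one sequence and wait for its result; no real EBI client API is shown here.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 20  # the parallel-submission limit quoted on this page


def run_all(records, submit, max_parallel=MAX_PARALLEL):
    """Call submit(header, sequence) for every record, at most max_parallel at a time.

    `submit` is a hypothetical callable supplied by the caller.
    Results are returned in the same order as the input records.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(submit, header, seq) for header, seq in records]
        return [future.result() for future in futures]
```

Using a fixed-size thread pool means the limit is enforced by the pool itself, so the calling code never has to count outstanding jobs by hand.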

Most people with large numbers of sequences to characterize download the stand-alone version of the program, as there is no real limit to the number of sequences you can search at any one time. However, the program is computationally very expensive and should only be installed if you are sure that your hardware setup can cope with its demands. The stand-alone version can be run from the Unix command line, or a web interface can be installed for it.

Both the EBI-based services and the stand-alone version come with extensive documentation to help you use them.


Protein or Nucleotide Sequence

In all three cases, you can start with either protein or nucleotide sequences. If you start with a nucleotide sequence, it can be translated in multiple reading frames and the resulting open reading frames (ORFs) characterized instead. You can choose from a variety of output formats to display the results.
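To make the nucleotide route concrete, the Python sketch below translates a DNA sequence in all six reading frames using the standard genetic code and then keeps stop-free stretches as candidate ORFs. It defines an ORF simply as a run of codons between stop codons, which is one common convention; the minimum length of 30 residues and the function names are illustrative assumptions, and InterProScan's own translation step may use different rules.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X', stops '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    return [translate(dna[f:]) for f in range(3)] + [translate(rc[f:]) for f in range(3)]


def orfs(dna, min_len=30):
    """Collect stop-free stretches of at least min_len residues from all six frames."""
    result = []
    for protein in six_frames(dna):
        for segment in protein.split("*"):
            if len(segment) >= min_len:
                result.append(segment)
    return result
```

Each sequence returned by orfs could then be scanned as an ordinary protein sequence.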