InterProScan
InterProScan is a software tool that scans your sequence against a range
of protein signatures. These signatures mainly represent protein
families, functional domains, or active sites, and so can be used to
infer the potential function of your protein. Many of the protein
signatures have been integrated into the
<a href="http://www.ebi.ac.uk/InterPro/">InterPro database</a>
and therefore have GO terms associated with them. In this way,
you can use InterProScan to quickly associate functional information
and GO terms with your protein of interest.
Tutorial
Tutorial on <a href="http://www.ebi.ac.uk/2can/tutorials/function/InterProScan.html">InterProScan</a>
<a href="http://www.ebi.ac.uk/2can/tutorials/function/InterProScan2.html">Schematic diagram</a> showing what happens to sequences entered into InterProScan.
Where to find InterProScan
InterProScan can be accessed in three different ways:
1) Via the <a href="http://www.ebi.ac.uk/InterProScan/">EBI website</a>
2) Via the <a href="http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html">EBI web services</a>
3) By <a href="ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/index.html">downloading</a> the program to your local servers and running it.
If you only have a small number of sequences to characterize, it's
probably easiest to run them via the EBI website: you will get your
results in graphical, tabular, and XML formats, all of which contain
links to InterPro and GO. Because the service is so popular, we have
unfortunately had to restrict submissions to one sequence at a time
for now.
Alternatively, if you can program in Perl, you can use EBI's
InterProScan web service to submit multiple searches in parallel (up to
20 sequences at once). All you need to do is install the
appropriate client from the page linked in point 2) above, and you can
then submit your sequences. Again, you can get your results back in a
variety of formats, all with links to GO terms and InterPro.
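The limit of 20 sequences at once means that a larger set has to be split into groups before submission. The following is a minimal Python sketch of that batching step only, assuming the input is FASTA-formatted text. The names `parse_fasta` and `batches` are hypothetical helpers, not part of any EBI client, and the submission call itself is deliberately left out because it depends on the client you install.

```python
from typing import Iterator, List, Tuple

# Limit quoted in the text above; adjust if the service limit changes.
MAX_PER_BATCH = 20

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batches(records: List[Record], size: int = MAX_PER_BATCH) -> Iterator[List[Record]]:
    """Yield consecutive groups of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]
```

Each group yielded by `batches` could then be handed to whichever client you use, waiting for one group to finish before sending the next.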
Most people download the stand-alone version of the program if they
have large numbers of sequences to characterize, as there is no real
limit to the number of sequences you can search at any one time.
However, the program is very computationally expensive and should only
be installed if you are sure that your hardware setup is sufficient to
cope with the demands of the software. The stand-alone version
can be run either from the Unix command line or through a web
interface, which can also be installed.
Both the EBI-based services and the stand-alone version come with
extensive documentation to help you use them.
Protein or Nucleotide Sequence
In all three cases, you can start with either protein or nucleotide
sequences (if you start with a nucleotide sequence, it can be translated
in multiple reading frames and the resulting ORFs characterized instead).
You can choose from a variety of output formats to display the results.
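To illustrate what translating a nucleotide sequence in multiple frames involves, here is a minimal Python sketch of six-frame translation with the standard genetic code. It is not InterProScan's own translation step, which may differ. The function names are hypothetical, and an ORF is assumed here to be any stretch between stop codons that meets a minimum length.

```python
from itertools import product
from typing import Dict, List

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> Dict[int, str]:
    """Map frame number (+1..+3 forward, -1..-3 reverse) to its translation."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames: Dict[int, str] = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rev[offset:])
    return frames


def orfs(protein: str, min_len: int = 50) -> List[str]:
    """Return stop-free stretches of at least `min_len` residues (assumed ORF definition)."""
    return [segment for segment in protein.split("*") if len(segment) >= min_len]
```

Running `orfs` on each of the six translations gives the candidate protein sequences that could then be searched against the signatures; the `min_len` threshold is an arbitrary illustrative default.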