Annotation outreach group meeting 31st March 2007
Minutes for GO Outreach call of 3/30/07
Rama Balakrishnan, Evelyn Camon, Jennifer Clark, Harold Drabkin*, Michelle Gwinn, Pascale Gaudet, Fiona McCarthy
We attempted to arrive at a finished version of the IEA flowchart(s). The basic idea is that how one gets IEA annotations depends on many factors, including whether one can make use of UniProt, where the sequences were submitted (EMBL/GenBank, etc.), and so on.
How does one obtain records for one taxon ID from UniProt? Sending everything “through” UniProt has limitations, because UniProt does not have everything. For example, the farm animal sequences grouped in UniParc do not have UniProt IDs, only IPI accession IDs. UniProt also does not have a great deal of prokaryotic gene products. TIGR may be a better source for comparisons.
Things to emphasize: InterPro domain-to-GO mappings are meant to be broad. ISS via BLAST helps you get more specific.
HAMAP to GO: a more manually curated GO mapping
New in UniProt: SPCL to GO (subcellular localization to GO)
Harold attempted to clarify the MGI IEA flowchart:
Every night we download all the UniProt records for the mouse taxon ID. Each UniProt record has a section that lists EMBL and GenBank records (the nucleic acid versions). If any of these IDs match any gene in the MGI database, we keep that record and attach it to the marker. That means that we load the Swiss-Prot IDs, and the two accessions are linked in a relational database.
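The matching step described above can be sketched roughly as follows. This is only an illustration, not MGI's actual loader: the function name, the dictionary layout, and all identifiers (`UP_A`, `NUC_1`, `Marker1`, and so on) are made-up placeholders.

```python
from collections import defaultdict

def link_records_to_markers(uniprot_records, marker_seq_ids):
    """Attach each UniProt record to every marker that shares a nucleotide accession.

    uniprot_records: {uniprot_id: [EMBL/GenBank accessions listed in the record]}
    marker_seq_ids:  {marker: [nucleotide accessions already held for that marker]}
    Both layouts are assumptions made for this sketch.
    """
    # Invert marker -> accessions into accession -> markers for fast lookup.
    accession_to_markers = defaultdict(set)
    for marker, accessions in marker_seq_ids.items():
        for acc in accessions:
            accession_to_markers[acc].add(marker)

    links = defaultdict(set)
    for uniprot_id, xrefs in uniprot_records.items():
        for acc in xrefs:
            for marker in accession_to_markers.get(acc, ()):
                links[marker].add(uniprot_id)

    # Records with no matching accession are simply not kept.
    return {marker: sorted(ids) for marker, ids in links.items()}

# Placeholder identifiers, invented for illustration only.
uniprot_records = {
    "UP_A": ["NUC_1", "NUC_2"],
    "UP_B": ["NUC_9"],
    "UP_C": ["NUC_3"],
}
marker_seq_ids = {
    "Marker1": ["NUC_2", "NUC_3"],
    "Marker2": ["NUC_7"],
}
links = link_records_to_markers(uniprot_records, marker_seq_ids)
print(links)
```

In this toy input, `UP_B` shares no accession with any marker and is dropped, which mirrors the "keep the record only if an ID matches" rule.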
Each record also contains keywords. We don't load the keywords, but we map them to GO in house. The keyword2go mapping could be used, but MGI makes its own as it has particular needs.
We load the EC numbers and the domains into the database where they are known to apply to a given gene product (unless it is a TrEMBL record, in which case we load only the EC number). Not all domains are taken, as we would get odd results from a partially curated record; e.g., for the S6 kinase domain there were thousands of them.
We also load GO every night.
Some very broad mapping terms may be filtered out, e.g. enzyme.
There were questions about the term “marker”; it could be changed to “gene”. Again, this is specific to MGI (and maybe others), because we have many seq_ids (nucleic acid and protein) that are collected under the thing we call Marker. (Originally, MGI was a chromosome mapping database, so “marker” was appropriate: something one followed in crosses.) Evelyn indicated that there will be/are now manual rules applied to translating the keywords to GO terms, and that perhaps the rules should be shared. However, MGI strictly takes the keywords in the record and takes the GO term that each keyword maps to in the translation table with NO inspection (done nightly by HAL2000).
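As a rough illustration of the uninspected keyword translation just described, together with the filtering of very broad terms, a minimal sketch might look like this. The function name, the table layout, the keywords, and the GO IDs are all placeholders invented here; MGI's real translation table and exclusion list are not shown in these minutes.

```python
def keywords_to_go(record_keywords, keyword2go, excluded_keywords=frozenset()):
    """Translate a record's keywords to IEA annotations by plain table lookup.

    No inspection is done: every GO ID the table lists for a keyword is taken,
    unless the keyword is on the (assumed) exclusion list of overly broad terms.
    """
    annotations = []
    for keyword in record_keywords:
        if keyword in excluded_keywords:
            continue  # e.g. a very broad keyword such as "Enzyme"
        for go_id in keyword2go.get(keyword, ()):
            annotations.append((go_id, "IEA", keyword))
    return annotations

# Placeholder translation table; the GO IDs are made up for illustration.
keyword2go = {
    "Example kinase keyword": ["GO:9900001"],
    "Example membrane keyword": ["GO:9900002", "GO:9900003"],
    "Enzyme": ["GO:9900004"],
}

result = keywords_to_go(
    ["Enzyme", "Example kinase keyword", "Unmapped keyword"],
    keyword2go,
    excluded_keywords={"Enzyme"},
)
print(result)
```

Keywords with no entry in the table produce no annotation, and the third element of each tuple records which keyword the annotation came from.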
Think of ISS or IEA as a “suggestion”. When using ISS, consider the context within the organism when deciding whether to take the GO terms of the organism that the ISS points to.
Michelle: TIGR does not make much use of the UniProt resource, but uses PROSITE, Pfam, and TIGR2GO mappings.
If one wanted to do the IEA annotation on one's own, say for sequences that exist nowhere other than at the user's site, then it would appear that the best approach would be a domain scan using any of several tools, followed by use of the InterPro-to-GO (IP2GO) mappings.
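The second half of that approach, turning domain hits into GO terms, could be sketched as below. This assumes the mapping file uses the usual external2go line layout (`InterPro:IPRnnnnnn name > GO:term name ; GO:nnnnnnn`, with `!` comment lines); check the actual file before relying on the pattern. The function names and every identifier in the sample lines are placeholders, and the domain scan itself is assumed to have been run already with whatever tool is available.

```python
import re

# Assumed line layout: "InterPro:IPR###### name > GO:term name ; GO:#######"
LINE_RE = re.compile(r"^InterPro:(IPR\d+)\s.*>\s*GO:.*;\s*(GO:\d{7})\s*$")

def parse_interpro2go(lines):
    """Build {InterPro accession: [GO IDs]} from mapping-file lines."""
    mapping = {}
    for line in lines:
        if line.startswith("!"):
            continue  # comment/header line
        match = LINE_RE.match(line)
        if match:
            mapping.setdefault(match.group(1), []).append(match.group(2))
    return mapping

def go_terms_for_hits(domain_hits, mapping):
    """Collect the GO IDs mapped to the InterPro domains found in one sequence."""
    terms = set()
    for ipr in domain_hits:
        terms.update(mapping.get(ipr, ()))
    return sorted(terms)

# Placeholder lines in the assumed layout; the IDs are made up.
sample_lines = [
    "! example header comment",
    "InterPro:IPR900001 Placeholder domain A > GO:placeholder function ; GO:9900001",
    "InterPro:IPR900001 Placeholder domain A > GO:placeholder process ; GO:9900002",
    "InterPro:IPR900002 Placeholder domain B > GO:placeholder component ; GO:9900003",
]
mapping = parse_interpro2go(sample_lines)

# Suppose the domain scan reported these two domains for a local sequence.
terms = go_terms_for_hits(["IPR900001", "IPR900009"], mapping)
print(terms)
```

A domain with no entry in the mapping (here `IPR900009`) simply contributes no GO terms. Because the mappings are meant to be broad, the resulting terms are best treated as suggestions.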