ABOUT | MYSQL | MORE  

  The idea/method of mapping Gene Ontology (GO) terms onto SCOP domains was first described in the SUPERFAMILY 2011 NAR database issue paper. As a result of this generalised method applied to Open Biomedical Ontologies (OBO) of diverse contexts and protein domains of different levels, we generated a resource `dcGO` officially released in the dcGO 2013 NAR database issue paper.

  Over time (10 years on), this resource has evolved to support domain-centric annotations not only for SCOP superfamilies/families and also for Pfam and InterPro families. Such annotations are made available in terms of a variety of knowledge contexts, ranging from functions, pathways, regulators and hallmarks, to phenotypes and diseases, further down to the information about drugs.

  In parallel with the growth in knowledgebase (both protein domains and ontologies), the dcGO now provides users the opportunities, including:
  •  FACETED SEARCH, an entry point to mine the dcGO resource with the keyword in query returning term-specific pages and domain-specific pages;
  •  ONTOLOGY HIERARCHY, enabling the browsing along with annotated domains and crosslinked terms (according to shared domain-centric ontology annotations);
  •  ENRICHMENT ANALYSIS, identifying the knowledge of functions, phenotypes and diseases enriched/over-represented within an input list of protein domains.

  Please refer to  User Manual detailing how to mine the dcGO resource.

dcGOdb_202301.sql.gz    contains 9 tables, with the schema outlined below:



interpro_info


  A table about InterPro family domains    Table_interpro_info.txt.gz
mysql> DESC interpro_info;
+-------------+------------------+------+-----+---------+-------+
| Field       | Type             | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+-------+
| id          | varchar(20)      | NO   | PRI | NULL    |       |
| description | varchar(255)     | NO   |     | NULL    |       |
| level       | enum('interpro') | NO   | MUL | NULL    |       |
+-------------+------------------+------+-----+---------+-------+
    
  • The id column is the InterPro domain ID
  • The description column is the description for InterPro family domains
  • The level column is the domain level ('interpro')

mapping_interpro


  A table about ontology annotations for InterPro family domains (that is, term-domain mappings)    Table_mapping_interpro.txt.gz
mysql> DESC mapping_interpro;
+-----------+-------------+------+-----+---------+----------------+
| Field     | Type        | Null | Key | Default | Extra          |
+-----------+-------------+------+-----+---------+----------------+
| obo       | char(8)     | NO   | PRI | NULL    |                |
| term_id   | varchar(50) | NO   | PRI | NULL    |                |
| domain_id | varchar(20) | NO   | PRI | NULL    |                |
| ascore    | double      | YES  |     | NULL    |                |
| auto      | int         | NO   | UNI | NULL    | auto_increment |
+-----------+-------------+------+-----+---------+----------------+
    
  • The obo column is the ontology
  • The term_id column is the ontology term ID
  • The domain_id column is the InterPro domain ID
  • The ascore column is the annotation score [1-100]
  • The auto column is an increasing sequence of unique numbers, collectively acting as a primary key

mapping_pfam


  A table about ontology annotations for Pfam family domains (that is, term-domain mappings)    Table_mapping_pfam.txt.gz
mysql> DESC mapping_pfam;
+-----------+-------------+------+-----+---------+----------------+
| Field     | Type        | Null | Key | Default | Extra          |
+-----------+-------------+------+-----+---------+----------------+
| obo       | char(8)     | NO   | PRI | NULL    |                |
| term_id   | varchar(50) | NO   | PRI | NULL    |                |
| domain_id | varchar(20) | NO   | PRI | NULL    |                |
| ascore    | double      | YES  |     | NULL    |                |
| auto      | int         | NO   | UNI | NULL    | auto_increment |
+-----------+-------------+------+-----+---------+----------------+
    
  • The obo column is the ontology
  • The term_id column is the ontology term ID
  • The domain_id column is the Pfam domain ID
  • The ascore column is the annotation score [1-100]
  • The auto column is an increasing sequence of unique numbers, collectively acting as a primary key

mapping_scop


  A table about ontology annotations for SCOP superfamily and family domains (that is, term-domain mappings)    Table_mapping_scop.txt.gz
mysql> DESC mapping_scop;
+-----------+--------------------+------+-----+---------+----------------+
| Field     | Type               | Null | Key | Default | Extra          |
+-----------+--------------------+------+-----+---------+----------------+
| obo       | char(8)            | NO   | PRI | NULL    |                |
| term_id   | varchar(50)        | NO   | PRI | NULL    |                |
| domain_id | mediumint unsigned | NO   | PRI | NULL    |                |
| ascore    | double             | YES  |     | NULL    |                |
| auto      | int                | NO   | UNI | NULL    | auto_increment |
+-----------+--------------------+------+-----+---------+----------------+
    
  • The obo column is the ontology
  • The term_id column is the ontology term ID
  • The domain_id column is the SCOP domain ID
  • The ascore column is the annotation score [1-100]
  • The auto column is an increasing sequence of unique numbers, collectively acting as a primary key

pfam_info


  A table about Pfam family domains    Table_pfam_info.txt.gz
mysql> DESC pfam_info;
+-------------+--------------+------+-----+---------+-------+
| Field       | Type         | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| id          | varchar(20)  | NO   | PRI | NULL    |       |
| description | varchar(255) | NO   |     | NULL    |       |
| level       | enum('pfam') | NO   | MUL | NULL    |       |
+-------------+--------------+------+-----+---------+-------+
    
  • The id column is the Pfam domain ID
  • The description column is the description for Pfam family domains
  • The level column is the domain level ('pfam')

scop_hie


  A table about SCOP hierarchy    Table_scop_hie.txt.gz
mysql> DESC scop_hie;
+--------+--------------------+------+-----+---------+----------------+
| Field  | Type               | Null | Key | Default | Extra          |
+--------+--------------------+------+-----+---------+----------------+
| parent | mediumint unsigned | NO   | PRI | NULL    |                |
| child  | mediumint unsigned | NO   | PRI | NULL    |                |
| auto   | int                | NO   | UNI | NULL    | auto_increment |
+--------+--------------------+------+-----+---------+----------------+
    
  • The parent column is the parent SCOP domain ID (i.e. superfamily)
  • The child column is the children SCOP domain ID (i.e. family)
  • The auto column is an increasing sequence of unique numbers, collectively acting as a primary key

scop_info


  A table about SCOP superfamily and family domains    Table_scop_info.txt.gz
mysql> DESC scop_info;
+-------------+--------------------+------+-----+---------+-------+
| Field       | Type               | Null | Key | Default | Extra |
+-------------+--------------------+------+-----+---------+-------+
| id          | mediumint unsigned | NO   | PRI | NULL    |       |
| description | varchar(255)       | NO   |     | NULL    |       |
| level       | enum('sf','fa')    | NO   | MUL | NULL    |       |
+-------------+--------------------+------+-----+---------+-------+
    
  • The id column is the SCOP domain ID
  • The description column is the description for SCOP domains
  • The level column is the domain level (i.e., either 'sf' for superfamily or 'fa' for family)

term_hie


  A table about ontology hierarchy    Table_term_hie.txt.gz
mysql> DESC term_hie;
+----------+------------------------+------+-----+---------+----------------+
| Field    | Type                   | Null | Key | Default | Extra          |
+----------+------------------------+------+-----+---------+----------------+
| obo      | char(8)                | NO   | PRI | NULL    |                |
| parent   | varchar(20)            | NO   | PRI | NULL    |                |
| child    | varchar(20)            | NO   | PRI | NULL    |                |
| relation | enum('is_a','part_of') | NO   |     | NULL    |                |
| root     | enum('yes','no')       | NO   |     | NULL    |                |
| auto     | int                    | NO   | UNI | NULL    | auto_increment |
+----------+------------------------+------+-----+---------+----------------+
    
  • The obo column is the ontology
  • The parent column is the parent ontology term ID
  • The child column is the children ontology term ID
  • The relation column is the relation between terms (i.e., either 'is_a' or 'part_of')
  • The root column is to whether or not the parent terms are at root (i.e., either 'yes' or 'no')
  • The auto column is an increasing sequence of unique numbers, collectively acting as a primary key

term_info


  A table about ontology terms    Table_term_info.txt.gz
mysql> DESC term_info;
+----------+--------------------+------+-----+---------+----------------+
| Field    | Type               | Null | Key | Default | Extra          |
+----------+--------------------+------+-----+---------+----------------+
| id       | varchar(50)        | NO   | MUL | NULL    |                |
| name     | varchar(255)       | NO   | PRI | NULL    |                |
| obo      | char(8)            | NO   | PRI | NULL    |                |
| distance | mediumint unsigned | NO   |     | NULL    |                |
| auto     | int                | NO   | UNI | NULL    | auto_increment |
+----------+--------------------+------+-----+---------+----------------+
    
  • The id column is the ontology term ID
  • The name column is the ontology term name
  • The obo column is the ontology
  • The distance column is the distance to the root term
  • The auto column is an increasing sequence of unique numbers, collectively acting as a primary key

USE


  Restore the dcGO MySQL database
mysql -u root -p -e "create database dcGOdb;"
gunzip < dcGOdb_202301.sql.gz | mysql -u root -p dcGOdb
    

  Show tables
mysql> USE dcGOdb;
mysql> SHOW tables;
+------------------+
| interpro_info    |
| mapping_interpro |
| mapping_pfam     |
| mapping_scop     |
| pfam_info        |
| scop_hie         |
| scop_info        |
| term_hie         |
| term_info        |
+------------------+
    

  Extract Pfam domains annotated by Experimental Factor Ontology (EFO) term 'EFO:0000540' ('immune system disease')
mysql> SELECT b.id, b.description 
FROM mapping_pfam AS a, pfam_info AS b 
WHERE a.domain_id=b.id AND a.obo='EFO' AND a.term_id='EFO:0000540'
ORDER BY b.description ASC;
+---------+-----------------------------------------------------------+
| id      | description                                               |
+---------+-----------------------------------------------------------+
| PF00001 | 7 transmembrane receptor (rhodopsin family)               |
| PF00023 | Ankyrin repeat                                            |
| PF07716 | Basic region leucine zipper                               |
| PF00170 | bZIP transcription factor                                 |
| PF00656 | Caspase domain                                            |
| PF00619 | Caspase recruitment domain                                |
| PF00129 | Class I Histocompatibility antigen, domains alpha 1 and 2 |
| PF00993 | Class II histocompatibility antigen, alpha domain         |
| PF00969 | Class II histocompatibility antigen, beta domain          |
| PF00173 | Cytochrome b5-like Heme/Steroid binding domain            |
| PF00008 | EGF-like domain                                           |
| PF00178 | Ets-domain                                                |
| PF07654 | Immunoglobulin C1-set domain                              |
| PF07686 | Immunoglobulin V-set domain                               |
| PF03770 | Inositol polyphosphate kinase                             |
| PF00605 | Interferon regulatory factor transcription factor         |
| PF09294 | Interferon-alpha/beta receptor, fibronectin type III      |
| PF10401 | Interferon-regulatory factor 3                            |
| PF00169 | PH domain                                                 |
| PF00130 | Phorbol esters/diacylglycerol binding domain (C1 domain)  |
| PF07714 | Protein tyrosine and serine/threonine kinase              |
| PF01023 | S-100/ICaBP type calcium binding domain                   |
| PF00017 | SH2 domain                                                |
| PF00018 | SH3 domain                                                |
| PF00048 | Small cytokines (intecrine/chemokine), interleukin-8 like |
| PF01017 | STAT protein, all-alpha domain                            |
| PF02864 | STAT protein, DNA binding domain                          |
| PF02865 | STAT protein, protein interaction domain                  |
| PF02198 | Sterile alpha motif (SAM)/Pointed domain                  |
| PF01582 | TIR domain                                                |
| PF01108 | Tissue factor                                             |
| PF00229 | TNF(Tumour Necrosis Factor) family                        |
| PF00020 | TNFR/NGFR cysteine-rich region                            |
+---------+-----------------------------------------------------------+
    

VERSIONS


  January 2023 release    dcGOdb_202301.sql.gz
  July 2022 release    dcGOdb_202207.sql.gz
  September 2021 release    dcGOdb_202109.sql.gz
  April 2021 release    dcGOdb_202104.sql.gz

    Hai FANG, PhD

      Professor of Bioinformatics

      [email protected]

      Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China


    National Natural Science Foundation of China: 32170663

    2022 - 2025

    Program for Professor of Special Appointment at Shanghai Institutions of Higher Learning: Eastern Scholar

    2020 - 2022

    BBSRC Tools and Resources Development Fund: BB/L018543/1

    2014 - 2015


    The dcGO domain-centric ontology database in 2023: new website and extended annotations for protein structural domains

    Journal of Molecular Biology 2023

    DOI: 10.1016/j.jmb.2023.168093

    The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

    Genome Biology 2019

    DOI: 10.1186/s13059-019-1835-8

    An expanded evaluation of protein function prediction methods shows an improvement in accuracy

    Genome Biology 2016

    DOI: 10.1186/s13059-016-1037-6

    The SUPERFAMILY 1.75 database in 2014: a doubling of data

    Nucleic Acids Research 2015

    DOI: 10.1093/nar/gku1041

    dcGOR: An R Package for Analysing Ontologies and Protein Domain Annotations

    PLoS Computational Biology 2014

    DOI: 10.1371/journal.pcbi.1003929

    A large-scale evaluation of computational protein function prediction

    Nature Methods 2013

    DOI: 10.1038/nmeth.2340

    A domain-centric solution to functional genomics via dcGO Predictor

    BMC Bioinformatics 2013

    DOI: 10.1186/1471-2105-14-S3-S9

    dcGO: database of domain-centric ontologies on functions, phenotypes, diseases and more

    Nucleic Acids Research 2013

    DOI: 10.1093/nar/gks1080

    SUPERFAMILY 1.75 including a domain-centric gene ontology method

    Nucleic Acids Research 2011

    DOI: 10.1093/nar/gkq1130