User manual for the dcGO database in 2023
2023-02-15
Section 1 Overview
1.1 Motivations
Most available protein sequences lack biological annotation, and protein structural domains are less studied than proteins in terms of ontology annotation. The dcGO database addresses this need, enabling systematic annotations of domains using ontologies.
1.2 Timelines
The method of mapping Gene Ontology (GO) terms onto protein superfamily and family domains was first described in the SUPERFAMILY 2011 Nucleic Acids Research database issue paper. Generalising this method to ontologies of diverse contexts produced the dcGO resource, officially released in the dcGO 2013 Nucleic Acids Research database issue paper. We also developed the accompanying software (dcGOR) and supported updates of the SUPERFAMILY database (see 2015 and 2019).
1.3 Contents
Ten years on, the dcGO resource has evolved to support annotations for protein domains at different levels: not only SCOP superfamilies and families, but also Pfam families and InterPro families. These annotations are available across a wide variety of knowledge contexts:
functions (GO)
pathways (see KEGG, REACTOME, PANTHER, WikiPathways and MitoCarta)
transcriptional regulators (see ENRICHR Consensus TFs and TRRUST TFs)
molecular hallmarks (see MSIGDB hallmarks)
phenotypes in human and mouse, and other model organisms (including worm, fly, zebrafish and Arabidopsis)
diseases (see Disease Ontology and EFO disease traits)
drugs (see DGIdb druggable categories and Open Targets tractability buckets)
1.4 Mining
In parallel with the growth of the knowledgebase (both protein domains and biomedical ontologies), dcGO now provides users with new web interfaces and various mining opportunities, including:
FACETED SEARCH, an entry point for mining the resource by keyword, returning term-specific pages and domain-specific pages;
ONTOLOGY HIERARCHY, for browsing ontology terms alongside their annotated domains and the terms crosslinked to them through shared domain annotations; and
ENRICHMENT ANALYSIS, for identifying the functions, phenotypes, diseases and other knowledge that are enriched (over-represented) within an input list of protein domains; see Example Output.
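To make the idea of over-representation concrete, the following is a minimal sketch of an enrichment test for a list of domains. This overview does not state which statistical test dcGO uses, so the one-sided hypergeometric test here is an assumption (it is a common choice for this kind of analysis). The domain identifiers, term names and the functions `hypergeom_sf` and `enrich` are hypothetical and are not part of dcGO or dcGOR.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric.

    M: number of domains in the background
    n: number of background domains annotated to the term
    N: number of domains in the input list
    k: number of input domains annotated to the term
    """
    upper = min(n, N)
    # Sum exact integer counts first, then divide once.
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)


def enrich(query, background, annotations):
    """Return (term, overlap, p_value) tuples sorted by p-value.

    annotations maps each term to the set of domains annotated to it
    (hypothetical data layout, assumed for illustration only).
    """
    query = set(query) & set(background)
    results = []
    for term, domains in annotations.items():
        annotated = set(domains) & set(background)
        overlap = len(query & annotated)
        p = hypergeom_sf(overlap, len(background), len(annotated), len(query))
        results.append((term, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 20 background domains, two terms.
background = {f"dom{i:02d}" for i in range(1, 21)}
annotations = {
    "term_A": {"dom01", "dom02", "dom03", "dom04", "dom05"},
    "term_B": {"dom06", "dom07"},
}
query = {"dom01", "dom02", "dom03", "dom10"}

results = enrich(query, background, annotations)
for term, overlap, p in results:
    print(term, overlap, round(p, 4))
```

In this toy input, three of the four query domains carry term_A, which only five of the twenty background domains carry, so term_A ranks first with a p-value of 155/4845 (about 0.032). A real analysis would also need to correct for testing many terms at once, which this sketch omits.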