Section 1 Overview

FIGURE 1.1: The logo for the domain-centric Gene Ontology (dcGO) resource.


1.1 Motivations

Most available protein sequences lack biological annotation, and protein structural domains have received less ontology annotation than whole proteins. The dcGO database addresses this gap by enabling systematic annotation of domains using ontologies.

1.2 Timelines

The method of mapping Gene Ontology (GO) terms onto protein domains at the superfamily and family levels was first described in the SUPERFAMILY 2011 Nucleic Acids Research database issue paper. Generalising this method to ontologies from diverse contexts produced the dcGO resource, officially released in the dcGO 2013 Nucleic Acids Research database issue paper. We also developed the accompanying software (dcGOR) and supported updates of the SUPERFAMILY database (see 2015 and 2019).

1.3 Contents

Over time (10 years on), the dcGO resource has evolved to support annotations for protein domains at different levels: not only SCOP superfamilies and families, but also Pfam families and InterPro families. Such annotations are made available in a wide variety of knowledge contexts.

1.4 Mining

In parallel with the growth of its knowledgebase (both protein domains and biomedical ontologies), dcGO now provides users with new web interfaces and various mining opportunities, including:

  • FACETED SEARCH, an entry point for mining the resource, where keyword queries return term-specific pages and domain-specific pages;

  • ONTOLOGY HIERARCHY, for browsing ontology terms alongside their annotated domains and the terms crosslinked to them through shared domain annotations; and

  • ENRICHMENT ANALYSIS, for identifying the functions, phenotypes, diseases and other knowledge that are enriched (over-represented) within an input list of protein domains; see Example Output.
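
To make the notion of "over-represented" concrete, the sketch below scores a single ontology term with a one-sided hypergeometric test, a common choice for this kind of enrichment analysis. It is an illustration only: this section does not state which statistical test dcGO applies, and the function name, the domain identifiers and the toy numbers are all hypothetical.

```python
from math import comb


def enrichment_pvalue(input_domains, term_domains, background_domains):
    """One-sided hypergeometric p-value for a single ontology term.

    ASSUMPTION: a hypergeometric test is used here for illustration;
    it is not confirmed to be the test that dcGO itself applies.

    Returns P(X >= x), where x is the number of input domains that are
    annotated to the term, under random sampling without replacement
    from the background.
    """
    background = set(background_domains)
    # Only domains present in the background are counted.
    selected = set(input_domains) & background
    annotated = set(term_domains) & background

    n_background = len(background)
    n_annotated = len(annotated)
    n_selected = len(selected)
    n_overlap = len(selected & annotated)

    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, min(n_annotated, n_selected) + 1)
    )
    return tail / comb(n_background, n_selected)


# Toy data with hypothetical domain identifiers d1..d20.
background = [f"d{i}" for i in range(1, 21)]
term = ["d1", "d2", "d3", "d4", "d5"]   # domains annotated to the term
query = ["d1", "d2", "d6", "d7"]        # the user's input list

p_value = enrichment_pvalue(query, term, background)
print(f"p-value: {p_value:.4f}")
```

In practice one p-value is computed per ontology term, so the results would normally be corrected for multiple testing before any term is reported as enriched.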