Unison Overview

Unison In a Nutshell

Goals

Unison's primary goal is to lower the barriers to answering both simple and complex questions about protein function and structure. Unison was designed with three key functional goals in mind:
Sequence analysis
Given a sequence, provide a reliable, up-to-date source of features.
Feature-based mining
Given a set of protein characteristics, identify matching proteins. This is the inverse of sequence analysis.
Example: Identify human proteins that contain 1) an immunoglobulin (ig) domain by Pfam HMM or structure prediction, 2) a transmembrane (TM) domain, and 3) an intracellular immunoreceptor tyrosine-based inhibitory motif (ITIM), in that order, and 4) that have a mouse ortholog with similar feature composition.
Hypothesis generation
Analyze the function of a set of proteins in terms of their constituent features.
Example: In a set of known or putative ITIM-containing proteins, which extracellular domains occur, and how frequently? The same question can then be asked of their orthologs.
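The feature-based mining example above can be sketched in a few lines of Python. This is only an illustration of the idea, not Unison's schema or API: the `Feature` record, the `has_ordered_features` helper, the feature labels, and the protein names are all hypothetical, and the species and ortholog criteria are left out. "In that order" is interpreted here as ordering by feature start position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A predicted feature on a sequence (hypothetical record, not Unison's schema)."""
    kind: str   # illustrative label, e.g. "ig", "TM", "ITIM"
    start: int  # 1-based start position on the sequence
    stop: int   # inclusive end position


def has_ordered_features(features, wanted):
    """Return True if the kinds in `wanted` occur in that order along the sequence.

    Features are ordered by start position, and `wanted` must be a
    subsequence of the resulting list of kinds. Other features may be
    interleaved between the wanted ones.
    """
    kinds = iter(f.kind for f in sorted(features, key=lambda f: f.start))
    # `kind in kinds` advances the iterator, so each match must come
    # after the previous one.
    return all(kind in kinds for kind in wanted)


# Made-up proteins with made-up coordinates.
proteins = {
    "protA": [Feature("ig", 30, 120), Feature("TM", 200, 222), Feature("ITIM", 260, 265)],
    "protB": [Feature("TM", 10, 32), Feature("ig", 60, 150)],
}

matches = [name for name, feats in proteins.items()
           if has_ordered_features(feats, ["ig", "TM", "ITIM"])]
print(matches)
```

With real data, the same test would run over every sequence in the database, which is why the results need to be stored in a queryable form rather than recomputed per question.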

Results Sliced to Order
Conceptually, prediction results are stored in a sparse cube. The axes of the cube are 1) distinct sequences, 2) feature types/models, and 3) parameters. The elements of the cube store structured data appropriate for the prediction.
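One way to picture the sparse cube is as a dictionary keyed by a (sequence, model, parameters) triple, where only the cells that have actually been computed take up space. The sketch below is a conceptual model only. Unison itself is a PostgreSQL database, and the `ResultsCube` class, its method names, and the identifiers used here are assumptions made for illustration.

```python
class ResultsCube:
    """Sparse 3-D store: only computed (sequence, model, params) cells exist."""

    def __init__(self):
        # (seq_id, model_id, params_id) -> structured prediction result
        self._cells = {}

    def store(self, seq_id, model_id, params_id, result):
        self._cells[(seq_id, model_id, params_id)] = result

    def get(self, seq_id, model_id, params_id):
        """Return the stored result, or None if that cell was never computed."""
        return self._cells.get((seq_id, model_id, params_id))

    def slice(self, seq_id=None, model_id=None, params_id=None):
        """Return all cells matching the fixed axes; None leaves an axis free."""
        wanted = (seq_id, model_id, params_id)
        return {
            key: result
            for key, result in self._cells.items()
            if all(w is None or w == k for w, k in zip(wanted, key))
        }


# Hypothetical identifiers and result payloads.
cube = ResultsCube()
cube.store(1, "pfam:ig", "hmm-default", {"start": 30, "stop": 120, "score": 45.2})
cube.store(1, "tm", "default", {"start": 200, "stop": 222})
cube.store(2, "pfam:ig", "hmm-default", {"start": 15, "stop": 101, "score": 38.7})

features_of_seq_1 = cube.slice(seq_id=1)          # sequence analysis
seqs_with_ig = cube.slice(model_id="pfam:ig")     # feature-based mining
```

Fixing the sequence axis corresponds to sequence analysis, while fixing the model axis corresponds to feature-based mining. Both are slices of the same stored results.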

Unique Features

A few of Unison's distinguishing features are:
Unison is comprehensive.
Unison contains a superset of all database sources from all species. This is required for confidence in the completeness of query results. Sequences are stored non-redundantly, so a query never returns the exact same sequence twice.
Unison integrates diverse protein characteristics.
Sequence properties, functional regions, homology, structure prediction, and many other predictions are available. Furthermore, Unison allows multiple predictions of the same type with different runtime parameters. Because Unison stores digests of the prediction results, querying is much more sophisticated and accurate than keyword searching.
Unison is easy to maintain and update.
Unison's release "flow" is almost fully automated, and updates are incremental: only new sequences and new features are computed.
Unison incorporates auxiliary data to enable expressive queries and rich interpretation of results.
These data include Gene Ontology, Structural Classification of Proteins, NCBI HomoloGene, NCBI GeneRIF, the Protein Data Bank (PDB), and others.
Unison is freely available to use and download.
The database schema, tools, web pages, and non-proprietary data are released under the Academic Free License, an Open Source Initiative (OSI) approved license. The database and web interface are available for public access (query times are limited).
* The public version of Unison contains only public sequences and results of non-proprietary methods. The entire schema and loading tools are included with the public release; institutions may load their own proprietary data if they wish.
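Two of the properties listed above, non-redundant sequence storage and incremental updates, can be illustrated with a short sketch. It assumes that sequence identity is decided by a digest of the normalized sequence and that completed runs are tracked as (sequence, model) pairs. Both are assumptions made for this example, and `SequenceStore` and `pending_runs` are hypothetical names rather than part of Unison.

```python
import hashlib


class SequenceStore:
    """Stores each distinct sequence once, keyed by a digest (illustrative only)."""

    def __init__(self):
        self._id_by_digest = {}
        self._sequences = []

    def add(self, sequence):
        """Return the id for `sequence`, adding it only if it is new."""
        # Normalize so that case and whitespace do not create duplicates.
        normalized = "".join(sequence.split()).upper()
        digest = hashlib.sha256(normalized.encode("ascii")).hexdigest()
        if digest not in self._id_by_digest:
            self._sequences.append(normalized)
            self._id_by_digest[digest] = len(self._sequences) - 1
        return self._id_by_digest[digest]

    def __len__(self):
        return len(self._sequences)


def pending_runs(seq_ids, models, done):
    """List the (sequence, model) pairs that still need to be computed."""
    return [(s, m) for s in seq_ids for m in models if (s, m) not in done]


store = SequenceStore()
a = store.add("MKTAYIAK")
b = store.add("mktayiak\n")   # same sequence, different case and whitespace
c = store.add("MKTAYIAKQR")   # genuinely new sequence

done = {(a, "tm")}            # pretend this run finished in an earlier release
todo = pending_runs([a, c], ["tm", "ig"], done)
print(len(store), todo)
```

Because identical sequences collapse to one id, a prediction computed for that id is shared by every source record carrying the sequence, and an update only has to fill in the pairs that are still missing.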

Take the Tour

Much more sophisticated queries are possible using the Perl API and the PostgreSQL interactive SQL interpreter. Please see the Unison tour for real-life examples and a demonstration of some of Unison's features.