Unison's primary goal is to lower the barriers to answering
both simple and complex questions about protein function and structure.
Unison was designed with three key functional goals in mind:
Sequence analysis
Given a sequence, provide a reliable, up-to-date source of
features.
Feature-based mining
Given a set of protein characteristics, identify matching
proteins. This is the inverse of sequence analysis.
Example: Identify human proteins that contain, in this order, 1) an
immunoglobulin (Ig) domain found by Pfam HMM or structure prediction, 2) a
transmembrane (TM) domain, and 3) an intracellular immunoreceptor
tyrosine-based inhibitory motif (ITIM), and 4) that have a mouse
ortholog with similar feature composition.
Hypothesis generation
Analyze the function of a set of proteins in terms of their
constituent features.
Example: In a set of known or putative ITIM-containing
proteins, which extracellular domains occur, and how frequently? Do the
same domains occur in their orthologs?
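The two query styles above can be illustrated with a small sketch. It is not Unison's schema or API: the Feature record, the function names, and the toy proteins P1 to P3 are all hypothetical, the ortholog criterion is omitted, and "extracellular" is approximated as "ends before the first TM domain", which only holds for proteins whose N-terminus is outside the cell.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical feature annotation on one protein sequence."""
    name: str   # e.g. "ig", "TM", "ITIM"
    start: int  # 1-based start position
    stop: int   # inclusive end position


def has_ordered_features(features, required):
    """True if the names in `required` occur in N- to C-terminal order.

    Features are ordered by start position only; overlaps are not checked.
    """
    idx = 0
    for feat in sorted(features, key=lambda f: f.start):
        if idx < len(required) and feat.name == required[idx]:
            idx += 1
    return idx == len(required)


def mine(proteins, required):
    """Feature-based mining: ids of proteins matching the ordered pattern."""
    return [pid for pid, feats in proteins.items()
            if has_ordered_features(feats, required)]


def extracellular_domain_counts(proteins, ids):
    """Hypothesis generation: how often each domain precedes the first TM.

    Assumption: anything ending before the first TM is extracellular.
    Each domain name is counted at most once per protein.
    """
    counts = Counter()
    for pid in ids:
        feats = proteins[pid]
        tm_starts = [f.start for f in feats if f.name == "TM"]
        if not tm_starts:
            continue
        first_tm = min(tm_starts)
        counts.update({f.name for f in feats if f.stop < first_tm})
    return counts


# Toy data, invented for illustration only.
proteins = {
    "P1": [Feature("ig", 30, 120), Feature("TM", 240, 262),
           Feature("ITIM", 300, 305)],
    "P2": [Feature("ITIM", 10, 15), Feature("TM", 100, 122),
           Feature("ig", 150, 240)],
    "P3": [Feature("ig", 25, 110), Feature("fn3", 130, 215),
           Feature("TM", 250, 272), Feature("ITIM", 310, 315)],
}

hits = mine(proteins, ["ig", "TM", "ITIM"])
freq = extracellular_domain_counts(proteins, hits)
print(hits)
print(dict(freq))
```

P2 carries the same three features as P1 but in the reverse order, so it is excluded from the hits. The frequency count is then taken over the matching proteins only.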
Unique Features
A few of Unison's distinguishing features are:
Unison is comprehensive.
Unison contains the union of sequences from its source databases,
across all species. This breadth is required to be confident that query
results are complete. Sequences are stored non-redundantly, so a query
never returns the same exact sequence twice.
Unison integrates diverse protein characteristics.
Sequence properties, functional regions, homology, structure
prediction, and many other predictions are available. Furthermore,
Unison allows multiple predictions of the same type with different
runtime parameters. Because Unison stores digests of the prediction
results, querying is much more sophisticated and accurate than keyword
searching.
Unison is easy to maintain and update.
Unison's release "flow" is nearly fully automated and updates are
incremental -- only new sequences and new features are computed.
Unison incorporates auxiliary data to enable expressive queries
and rich interpretation of results.
These data include Gene Ontology, Structural Classification of
Proteins, NCBI HomoloGene, NCBI GeneRIF, the Protein Data Bank (PDB),
and others.
Unison is freely available to use and download.
The database schema, tools, web pages, and non-proprietary data
are released under the Academic Free License, an Open Source
Initiative (OSI) approved license. The database and web interface are
available for public access (query times are limited).
* The public version of Unison
contains only public sequences and results of non-proprietary methods.
The entire schema and loading tools are included with the public release,
so institutions may load their own proprietary data if they wish.
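As a rough illustration of three properties described above (non-redundant sequence storage, multiple predictions of one type under different runtime parameters, and incremental updates), here is a minimal in-memory sketch. The SequenceStore class, its method names, the choice of SHA-256 as the sequence digest, and the sample accessions are assumptions made for this example; they do not describe Unison's actual tables or loading tools.

```python
import hashlib


class SequenceStore:
    """Hypothetical non-redundant sequence store with incremental results."""

    def __init__(self):
        self._id_by_digest = {}  # sequence digest -> internal sequence id
        self._aliases = {}       # sequence id -> {(origin, accession), ...}
        self._results = {}       # (sequence id, method, params) -> result

    @staticmethod
    def _normalize(seq):
        # Drop whitespace and fold case so identical sequences compare equal.
        return "".join(seq.split()).upper()

    def add(self, seq, origin, accession):
        """Store a sequence once; later copies only add another alias."""
        digest = hashlib.sha256(self._normalize(seq).encode("ascii")).hexdigest()
        seq_id = self._id_by_digest.setdefault(
            digest, len(self._id_by_digest) + 1)
        self._aliases.setdefault(seq_id, set()).add((origin, accession))
        return seq_id

    def record(self, seq_id, method, params, result):
        """Save a result keyed by method AND parameters, so several
        parameter sets for the same method can coexist."""
        self._results[(seq_id, method, params)] = result

    def pending(self, method, params):
        """Incremental update: only sequences lacking this exact result."""
        return [sid for sid in sorted(self._aliases)
                if (sid, method, params) not in self._results]


# Invented sample data for illustration only.
store = SequenceStore()
a = store.add("MKTLLV", "SourceA", "A001")
b = store.add("mktllv\n", "SourceB", "B777")  # same sequence, new alias
c = store.add("MSTNPK", "SourceA", "A002")

store.record(a, "tm_predict", "default", ["TM"])
print(store.pending("tm_predict", "default"))  # sequences still to compute
print(store.pending("tm_predict", "strict"))   # other parameters: all pending
```

Keying results on the (sequence, method, parameters) triple is what lets an update run skip work that has already been done while still allowing a new parameter set to be computed for every sequence.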
Take the Tour
Much more sophisticated queries are possible using the Perl API
and the PostgreSQL interactive SQL interpreter. Please see
the Unison tour for real-life examples
and a demonstration of some of Unison's features.