Standardized Representations of the Literature:
Combining Diverse Sources of Ribosomal Data
Russ B. Altman, Neil F. Abernethy & Richard O. Chen
Stanford Section on Medical Informatics
SUMC, MSOB X-215, Stanford, CA, USA, 94305-5479
(415) 725-3394, fax: (415) 725-7944, {rba, nfa, rchen}@smi.stanford.edu
Abstract
We are building a knowledge base (KB) of published structural data on the 30S ribosomal subunit in prokaryotes. Our KB is distinguished by a standardized representation of biological experiments and their results, in a reusable format. It can be accessed by computer programs that exploit the rich interconnections within the data. The KB is designed to support the construction of 3D models of the 30S subunit, as well as the analysis and extension of relevant functional and phylogenetic information. Most published information about the structure of the ubiquitous ribosome focuses on E. coli as a model system. At the same time, thousands of ribosomal RNA sequences have been gathered and cataloged. The volume and complexity of these data can complicate attempts to separate structural data peculiar to E. coli from data of universal relevance. We have written an application that dynamically queries the KB and the Ribosomal Database Project (RDP), a repository of ribosomal RNA sequences from other organisms, in order to assess the relevance of structural data to particular organisms. The application uses the RDP alignment to determine whether a set of data refers primarily to conserved, mismatched, or gapped positions. For a set of 16 representative articles evaluated over 211 sequences, 84% of observations have unambiguous translations from E. coli to the other organisms, 12% have somewhat ambiguous translations, and 4% have no translation. There is wide variation in these numbers across articles and organisms, confirming that some articles report structural information specific to E. coli while others report information that is quite general.
This page is under construction. It will contain the detailed results of our analysis in an Excel spreadsheet.