Article: Mountains of new data are challenging old views



Genome 2.0
Mountains of new data are challenging old views
Patrick Barry

When scientists unveiled a draft of the human genome in early 2001, many
cautioned that sequencing the genome was only the beginning. The long list
of the four chemical components that make up all the strands of human DNA
would not be a finished book of life, but a road map of an undiscovered
country that would take decades to explore.

Only 6 years later, the landscape of the genome is already proving to be
dramatically different than most scientists had expected.

The established view of the genome began to take shape in 1958, just 5 years
after Francis Crick and James D. Watson worked out the structure of DNA. In
that year, Crick expounded what he called the "central dogma" of molecular
biology: DNA's genetic information flows strictly one way, from a gene
through a series of steps that ends in the creation of a protein. That
principle developed into a modern orthodoxy, according to which a genome is
a collection of discrete genes located at specific spots along a strand of
DNA. This old view got the basics right: that genes encode proteins and that
proteins do the myriad work necessary to keep an organism alive.

Researchers slowly realized, however, that genes occupy only about 1.5
percent of the genome. The other 98.5 percent, dubbed "junk DNA," was
regarded as useless scraps left over from billions of years of random
genetic mutations. As geneticists' knowledge progressed, this basic picture
remained largely unquestioned. "At one time, people said, 'Why even bother
to sequence the whole genome? Why not just sequence the [protein-coding
part]?'" says Anindya Dutta, a geneticist at the University of Virginia in
Charlottesville.

Closer examination of the full human genome is now causing scientists to
return to some questions they thought they had settled. For one, they're
revisiting the very notion of what a gene is. Rather than being distinct
segments of code amid otherwise empty stretches of DNA-like houses along a
barren country road-single genes are proving to be fragmented, intertwined
with other genes, and scattered across the whole genome.

Even more surprisingly, the junk DNA may not be junk after all. Most of this
supposedly useless DNA now appears to produce transcriptions of its genetic
code, boosting the raw information output of the genome to about 62 times
what genes alone would produce. If these active nongene regions don't carry
code for making proteins, just what does their activity accomplish?

"What we thought was important before was really just the tip of the
iceberg," says Hui Ge of the Whitehead Institute for Biomedical Research in
Cambridge, Mass.

With the genome sequence in hand, exploration has moved at a brisk pace
during the past 6 years. A milestone was reached in June, when a project
called the Encyclopedia of DNA Elements (ENCODE) thoroughly mapped the
functional regions in 1 percent of the human genome. The effort involved was
staggering: Thirty-five teams of scientists from around the world worked for
4 years and compiled more than 600 million data points, the consortium
reported in the June 14 Nature.

From the accumulating mountains of data, scientists are building a new
picture of how the genome works as a whole. They have found mutations in
nongene regions of DNA that are linked to common diseases such as diabetes
and forms of cancer. And some researchers propose that DNA once labeled junk
could have spawned the complex bodies of higher organisms-even the
complexities of the human brain.

SECOND FIDDLE TO A SUPERSTAR

In the emerging picture of the genome's functioning, many of the key
elements identified so far are molecules of RNA, a chemical cousin of DNA.

In the old central dogma, RNA had a strictly subservient role in the
all-important task of making proteins. An RNA molecule is made from units of
genetic code strung together, much like DNA. But while DNA has two strands
twisted together into a double helix, RNA usually has only a single strand.

Protein synthesis begins when the two strands of a section of DNA unzip.
Units of RNA then pair up with their counterparts on one of the DNA strands,
forming a complementary messenger RNA (mRNA) molecule. The mRNA detaches and
floats off to other parts of the cell, where it hooks up with machinery that
transcribes its coded message into a protein.

If RNA's only job were making proteins, then nearly all the RNAs produced in
cells should be transcripts of protein-coding genes. (A small fraction of
RNAs serve in the protein-transcription machinery.) But in 2005, Jill Cheng
and her colleagues at Affymetrix, a genomics company in Santa Clara, Calif.,
showed that less than half of the RNA produced by 10 of the chromosomes in
human cells represented transcripts of traditional genes. In the team's
experiments, 57 percent of the RNA was transcribed from noncoding, "junk"
regions.

The results from ENCODE were even more striking. In the slice of DNA studied
in that project, between 74 percent and 93 percent of the genome produced
RNA transcripts. What becomes of this tremendous output is uncertain. John
M. Greally of the Albert Einstein College of Medicine in New York says it's
likely that some portion of it is made accidentally and simply discarded.
But the discovery that so much of the genome is being transcribed into RNA
underscores how out-of-date the central dogma has become.

Indeed, the closer researchers look, the more functions they find that RNA
transcripts perform. An alphabet soup of new acronyms describes the newfound
roles of RNAs. First there were short nuclear RNAs (snRNAs) and short
nucleolar RNAs (snoRNAs), both of which reside inside the nucleus and help
control production of other RNAs. These were joined by microRNAs (miRNAs)
and short interfering RNAs (siRNAs), which can modulate the activity of
protein-coding genes. In mice, about 34,000 of the RNA transcripts produced
by the genome are nonprotein-coding, outnumbering the roughly 32,000
transcripts that code for proteins, according to a 2005 study by an
international group of scientists called the Functional Annotation of Mouse
Consortium.

These new families of RNAs add a layer of regulation that fine-tunes the
production of proteins. While scientists already knew that some proteins
influence the activity of other genes, "there are many more RNAs than
proteins that play a regulatory role," Ge says.

Gene regulation may not sound sexy, but it's a powerful way for a cell to
evolve complex behaviors using the tools-proteins-that it already has.
Consider the difference between a one-bedroom bungalow and an ornate,
three-story McMansion. Both are made from roughly the same materials-lumber,
drywall, wiring, plumbing-and are put together with the same tools-hammers,
saws, nails, and screws. What makes the mansion more complex is the way that
its construction is orchestrated by rules that specify when and where each
tool and material must be used.

In cells, regulation controls when and where proteins spring into action. If
the traditional genome is a set of blueprints for an organism, RNA
regulatory networks are the assembly instructions. In fact, some scientists
think that these additional layers of complexity in genome regulation could
be the answer to a long-standing puzzle.

GENOME AS NETWORK

The biggest surprise in the first sequence of the human genome was how few
protein-coding genes it contained.

"We humans do not have that many more genes than simpler organisms like
flies or mice," Ge says. Earlier guesses of the number of genes in humans
ran as high as 100,000, but the published sequence in fact contained only
about 23,000. That's not much more than the roughly 21,000 genes possessed
by the roundworm, a microscopic creature without a brain. If protein-coding
genes are the only functional elements in an organism's DNA, where does the
extra information come from that's needed to assemble and operate the
complex bodies and brains of people, as compared with the simplicity of
roundworms? "If we just look at the number of genes, it doesn't make sense,"
Ge says.

While the number of genes isn't much different in roundworms and people, the
human genome is 30 times the size of the roundworms'. People have a much
larger quantity of DNA beyond what codes for proteins. Since much of this
"junk" DNA is being transcribed into RNA, perhaps it's responsible for much
of the complexity of human bodies and brains. In fact, organisms simpler
than roundworms, such as single-celled bacteria, carry little noncoding DNA
and may have no regulatory RNA at all.

"Scientists have been suspecting that it is the regulatory networks that
lead to this amazing complexity" in higher organisms, Ge says.

John S. Mattick of the University of Queensland in Brisbane, Australia,
points to a known example of the importance of regulatory RNAs: their
crucial role in fetal development. For example, most multicellular animals
possess a gene called Notch that helps guide neural development. While the
gene itself has much the same form in both simple and complex animals, its
activity is regulated by miRNAs that are highly variable from one animal to
another. Such miRNAs also influence a gene called Hox, which acts in many
animals to define a fetus' body axis and the placement of its limbs.

What's more, the changes that distinguish human brains from those of
chimpanzees and other apes could be due in part to evolutionary changes in
RNAs that don't encode proteins. A group led by Katherine S. Pollard of the
University of California, Davis identified DNA sequences shared by people
and chimpanzees, but with large differences, meaning that they have evolved
rapidly since the two species shared a common ancestor.

The researchers found that one of these sequences is a noncoding region of
DNA that's related to brain function, they reported in the Sept. 14, 2006
Nature. Pollard and her colleagues speculate that this region produces a
regulatory RNA and that changes in this RNA contributed to the evolution of
the human brain.

With regulatory RNAs appearing to play such an instrumental role in animal
development, it's no surprise that scientists are finding disease-associated
mutations in regions of the genome formerly regarded as junk.

David Altshuler of the Broad Institute in Cambridge, Mass., and his
colleagues looked for DNA mutations in 1,464 patients with type 2 diabetes.
Three of the mutations that correlated with the disease were in DNA segments
that don't code for proteins, the team reported in the June 1 Science. Other
scientists have found mutations in noncoding DNA that link to diseases such
as autism, breast cancer, lung cancer, prostate cancer, and schizophrenia.

To be sure, the specific functions of most of the noncoding DNA remain
unknown. Projects such as ENCODE have focused on identifying the broad
functional categories for active regions of the genome without working out
the specific cellular function of each transcript, a task that will take
biologists years, if not decades.

In fact, scientists debate whether some fraction of the genome's copious RNA
output might do nothing at all. It may simply be that once the cellular
machinery that transcribes DNA into RNA gets started, it sometimes doesn't
know when to stop. On the other hand, making lots of RNA that does nothing
would be a waste of a cell's energy. That's something that natural systems
tend to avoid, so the fact of its production argues for at least some of
this RNA being biologically active.

THE GENE IS DEAD

In the old view, each gene sat in splendid isolation on its segment of the
genome. Other genes might be nearby, but scientists assumed that they didn't
overlap each other.

Now it's clear that a single length of DNA can be transcribed in multiple
ways to produce many different RNAs, some coding for proteins and others
constituting regulatory RNAs. By starting and stopping in different places,
the transcription machinery can generate a regulatory RNA from a length of
DNA that overlaps a protein-coding gene. Moreover, the code for another
regulatory RNA might run in the opposite direction on the facing strand of
DNA. According to the ENCODE project results, up to 72 percent of known
genes have transcripts on the facing DNA strand as well as the main strand.

"The same sequences are being used for multiple functions," says Thomas R.
Gingeras of Affymetrix. That introduces complications into the evolution of
the genome, which had until recently been assumed to act through single DNA
mutations affecting single genes. Now, "a mutation in one of those sequences
has to be interpreted not only in terms of [one gene], but [of] all the
other transcripts going through the region," Gingeras explains.

The implications of this single mutation-multiple consequence model are
still a matter of debate. In some cases, the RNA transcripts from DNA that
overlaps a protein-coding gene regulate that same gene, so a mutation could
affect both the structure and the regulation of a protein. But often, those
transcripts regulate genes that are far away, or even on different
chromosomes. This complex interweaving of genes, transcripts, and regulation
makes the net effect of a single mutation on an organism much more difficult
to predict, Gingeras says.

More fundamentally, it muddies scientists' conception of just what
constitutes a gene. In the established definition, a gene is a discrete
region of DNA that produces a single, identifiable protein in a cell. But
the functioning of a protein often depends on a host of RNAs that control
its activity. If a stretch of DNA known to be a protein-coding gene also
produces regulatory RNAs essential for several other genes, is it somehow a
part of all those other genes as well?

To make things even messier, the genetic code for a protein can be scattered
far and wide around the genome. The ENCODE project revealed that about 90
percent of protein-coding genes possessed previously unknown coding
fragments that were located far from the main gene, sometimes on other
chromosomes. Many scientists now argue that this overlapping and dispersal
of genes, along with the swelling ranks of functional RNAs, renders the
standard gene concept of the central dogma obsolete.

LONG LIVE THE GENE

Offering a radical new conception of the genome, Gingeras proposes shifting
the focus away from protein-coding genes. Instead, he suggests that the
fundamental units of the genome could be defined as functional RNA
transcripts.

Since some of these transcripts ferry code for proteins as dutiful mRNAs,
this new perspective would encompass traditional genes. But it would also
accommodate new classes of functional RNAs as they're discovered, while
avoiding the confusion caused by several overlapping genes laying claim to a
single stretch of DNA. The emerging picture of the genome "definitely shifts
the emphasis from genes to transcripts," agrees Mark B. Gerstein, a
bioinformaticist at Yale University.

Scientists' definition of a gene has evolved several times since Gregor
Mendel first deduced the idea in the 1860s from his work with pea plants.
Now, about 50 years after its last major revision, the gene concept is once
again being called into question.

Source: ScienceNews (I've sniped references and sources)
http://www.sciencenews.org/articles/20070908/bob9.asp

Posted by
Robert Karl Stonjek


.



Relevant Pages

  • News: Junk DNA proves functional
    ... 'Junk' DNA proves functional ... In a paper published in Genome Research on Nov. 4, ... Institute of Singapore (GIS) report that what was previously believed to be ... other genes, bind specific repeat elements. ...
    (sci.bio.evolution)
  • Re: natural transgenic gene movement
    ... definite occurenceof low-frequency movement of genes between species ... DNA caneven travel via the placenta into the unborn."So ... the rest of the genesin the fruit fly's genome have fought ... DNA of humans issourced from mobile DNA."Not to mention the fruit ...
    (talk.origins)
  • Re: Cold water on micro-RNAs
    ... There is a cost for replicating the genome, ... There are postulated, sequence-independent, uses for DNA. ... same genes in chickens as they are in humans. ... reduce the size of the transcript to half a million nucleotides. ...
    (talk.origins)
  • Re: DNA, RNA and Protein questions
    ... means DNA came first. ... the first replicators most likely used RNA as ... a RNA genome has never been successfully proven to exist. ...
    (talk.origins)
  • Re: Some more Bull****! Whats the 1.2-5% dna Diff! (Chimp to man)
    ... DNA by only 1.2-5%? ... the expression of various genes. ... Forces Shaping the Fastest Evolving Regions in the Human Genome ... Arbiza et al on selection amongst 13,198 genes in chimps and humans ...
    (talk.origins)

Loading