News: Human Gene Count Tumbles Again
- From: "Robert Karl Stonjek" <rstonjek@xxxxxxxxxxxxxx>
- Date: Thu, 17 Jan 2008 13:34:35 -0500 (EST)
Human Gene Count Tumbles Again
ScienceDaily (Jan. 15, 2008) - Estimates of the number of genes in the human
genome have ranged wildly over the past two decades, from 20,000 all the way
up to 150,000. By the time the working draft of the human genome was
published in 2001, the best approximation stood at 35,000, yet even that
number has fallen. A new analysis, one that harnesses the power of comparing
genome sequences of various organisms, now reveals that the true number of
human genes is about 20,500, thousands fewer than what is currently listed
in human gene catalogs.
The work, led by researchers at the Broad Institute of MIT and Harvard, has
implications beyond merely settling the debate over how many genes are in
the human genome. An accurate gene count can help identify the locations of
genes and their functions, an important step in translating genomic
information into biomedical advances.
Ironically, the way genes are recognized has triggered much of the confusion
over the human gene count. Scientists on the hunt for typical genes - that
is, the ones that encode proteins - have traditionally set their sights on
so-called open reading frames, which are long stretches of 300 or more
nucleotides, or "letters" of DNA, bookended by genetic start and stop
signals. This method produced the most recent gene count of roughly 25,000,
but the number came under scrutiny after the 2002 publication of the mouse
genome revealed that many human genes lacked mouse counterparts and vice
versa.
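The open-reading-frame criterion described above can be illustrated with a short sketch. It is a deliberate simplification and not the pipeline used to build any gene catalog: it scans only the forward strand, treats ATG as the sole start signal, and ignores introns. The function name `find_orfs` and the `min_length` parameter are illustrative assumptions; the 300-nucleotide default comes from the article.

```python
# Simplified ORF scan: forward strand only, ATG start, standard stop codons.
# Names and structure are illustrative assumptions, not a catalog's method.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_length=300):
    """Return (start, end) index pairs for ORFs of at least min_length nucleotides.

    `end` is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == START_CODON:
                    start = i  # first start signal opens a candidate ORF
            elif codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_length:
                    orfs.append((start, end))
                start = None  # look for the next start signal
    return orfs
```

Any sufficiently long stretch that happens to sit between a start and a stop signal passes this test, which is why a length-based scan alone can admit sequences that do not encode proteins.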
Such a discrepancy seemed suspicious in part because evolution tends to
preserve gene sequences - genes, by virtue of the proteins they encode,
usually serve crucial biological roles. But like it or not, the 25,000 DNA
sequences were already listed in the catalogs of human protein-coding genes,
and skeptics had no systematic way to remove them. "At that point, no one
had gone through the gene catalogs with a fine-toothed comb to find evidence
that they weren't valid," said Michele Clamp, first author of the study and
senior computational biologist at the Broad Institute.
Far from blatant mistakes, non-gene sequences can masquerade as true genes
if they are long enough and happen by chance to fall between start and stop
signals. Despite having gene-like characteristics, these open reading frames
may not encode proteins. Instead, they might have other functions or
possibly none at all.
To distinguish such misidentified genes from true ones, the research team,
led by Clamp and Broad Institute director Eric Lander, developed a method
that takes advantage of another hallmark of protein-coding genes:
conservation by evolution. The researchers considered genes to be valid if
and only if similar sequences could be found in other mammals - namely,
mouse and dog. Applied to nearly 22,000 genes in the Ensembl gene catalog,
the technique revealed 1,177 "orphan" DNA sequences. These orphans looked
like protein-coding genes because of their open reading frames, but they had
no counterparts in either the mouse or dog genomes.
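The conservation filter can be sketched as a simple partition of a catalog. The sketch assumes that similarity searches have already been run and summarized as a mapping from gene identifier to the set of species with a similar sequence; that data layout, the function name `split_orphans`, and the gene identifiers in the usage note are all hypothetical. It also assumes a gene counts as conserved when it has a counterpart in at least one reference species, which matches the article's description of orphans as absent from both mouse and dog.

```python
# Partition catalog entries into conserved genes and "orphans".
# Input layout and names are assumptions made for illustration.
def split_orphans(hits_by_gene, reference_species=("mouse", "dog")):
    """Return (conserved, orphans) lists of gene identifiers.

    hits_by_gene maps a gene identifier to the set of species in which
    a similar sequence was found.
    """
    conserved, orphans = [], []
    for gene_id, species_with_hit in hits_by_gene.items():
        if any(sp in species_with_hit for sp in reference_species):
            conserved.append(gene_id)
        else:
            orphans.append(gene_id)  # no counterpart in any reference species
    return conserved, orphans
```

For example, calling `split_orphans({"geneA": {"mouse", "dog"}, "geneB": set()})` places the hypothetical `geneA` in the conserved list and `geneB` among the orphans.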
Although this was strong evidence that the sequences were not true
protein-coding genes, it was not quite convincing enough to justify their
removal from the human gene catalogs. Two other scenarios could, in fact,
explain their absence from other mammalian genomes. For instance, the genes
could be unique among primates, new inventions that appeared after the
divergence of mouse and dog ancestors from primate ancestors. Alternatively,
the genes could have been more ancient creations - present in a common
mammalian ancestor - that were lost in mouse and dog lineages yet retained
in humans.
If either of these possibilities were true, then the orphan genes should
appear in other primate genomes, in addition to our own. To explore this,
the researchers compared the orphan sequences to the DNA of two primate
cousins, chimpanzees and macaques. After careful genomic comparisons, the
orphan genes were found to be true to their name - they were absent from
both primate genomes. This evidence strengthened the case for stripping
these orphans of the title "gene."
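The primate check amounts to one more decision applied to each orphan. The sketch below is only an illustration of that logic under stated assumptions: the function name `classify_orphan`, the species labels, and the returned strings are hypothetical, and the input is assumed to be the set of species in which a similar sequence was found.

```python
# Second-stage check for a sequence already flagged as an orphan
# (no counterpart in mouse or dog). Names and labels are illustrative.
def classify_orphan(species_with_hit, primates=("chimpanzee", "macaque")):
    """Decide whether an orphan is rescued by evidence from other primates."""
    if any(p in species_with_hit for p in primates):
        # Consistent with either scenario in the article: a primate-specific
        # invention, or an ancient gene lost in the mouse and dog lineages.
        return "keep: present in another primate"
    return "reject: likely not protein-coding"
```

An orphan with no primate counterpart falls through to the rejection branch, which is the outcome the article reports for the sequences examined.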
After extending the analysis to two more gene catalogs and accounting for
other misclassified genes, the team's work invalidated a total of nearly
5,000 DNA sequences that had been incorrectly added to the lists of
protein-coding genes, reducing the current estimate to roughly 20,500.
In addition to suggesting a major revision to the human gene count, this
work provides a set of rules for evaluating any future proposed additions to
the human gene catalog. It also underscores the benefit of genome sequencing
projects. "Without several primate genomes, we wouldn't have been able to
put the final nail in the coffin of these putative genes," said Clamp.
More broadly, the research reveals that little invention of genes has
occurred since mammalian ancestors diverged from the non-mammalian lineage.
"There's no real creativity going on in the mammalian genome," explained
Clamp. That means that the number, structure, and function of protein-coding
genes are not expected to differ very much from mammal to mammal, so what
makes humans different from mice and dogs likely lies outside this realm of
the genome. Clamp and her Broad Institute colleagues are now peering into
the genomes of many other mammals, in an attempt to explain what parts of
our genome truly make us human.
Journal reference: Clamp M et al. Distinguishing protein-coding and
noncoding genes in the human genome. Proc. Natl. Acad. Sci. USA. DOI:
10.1073/pnas.0709013104
Adapted from materials provided by Broad Institute of MIT and Harvard.
Broad Institute of MIT and Harvard (2008, January 15). Human Gene Count
Tumbles Again. ScienceDaily. Retrieved January 16, 2008, from
http://www.sciencedaily.com/releases/2008/01/080113161406.htm
Posted by
Robert Karl Stonjek