Results from the Encyclopedia of DNA Elements consortium have provided the first systematic and comprehensive look at how gene expression is regulated in humans. Although the data provide the most detailed picture yet of the human genome since its complete sequencing over a decade ago, the new information, much like the initial sequencing of the genome, has no immediate application to drug discovery. Rather, the data provide researchers with a more focused starting point for formulating new hypotheses about which targets to pursue.

A decade ago, the Encyclopedia of DNA Elements (ENCODE) consortium set out to catalog the function of the entire genome, not just the protein-coding portions previously thought to be the most important parts. The consortium used a variety of DNA- and RNA-profiling techniques on 147 human cell and tissue types to construct a database of complex genetic interactions that regulate the activity of genes.

The consortium's first-pass analysis of the genome, published last week in 32 papers across 6 journals, garnered headlines in the press for explaining the function of noncoding or 'junk' DNA in coordinating gene expression. Although regulatory functions of noncoding DNA have been observed previously, the extent to which these sequences influence gene expression was a surprise.

However, gauging the true significance of noncoding DNA in disease will require years of old-fashioned experimental validation. In fact, the consortium uncovered considerably more information about genome interactions than just the junk DNA data that grabbed headlines.

A case in point is a study by University of Washington researchers on the regulatory functions of genomic regions previously implicated in disease.1 The study, arguably the most disease-relevant of the 32 papers published, showed that many noncoding DNA markers flagged in previous genomic studies of disease may control the expression of distant genes rather than that of their nearest neighbors, as had generally been assumed.

For academics, the entire set of findings provides a window into the complex structure of the genome. For industry, the findings are likely to spark a re-examination of which genes are truly regulated by noncoding regions identified in genomewide association (GWA) studies of common cardiovascular, metabolic, neurological and autoimmune diseases.

Collectively, the ENCODE studies paint a picture of how the genome's noncoding DNA, up to 80% of total DNA, coordinates the production of protein-coding mRNA.

The ENCODE data "rewrites the way we think about the genome," said Philip Gregory, CSO and VP of research at Sangamo BioSciences Inc. "Even the people who were into transcriptional regulation are surprised by the extent to which the genome controls its own gene expression profile."

Companies that stand to gain the most from the discoveries are those with knockdown technologies that can rapidly screen for biological effects of modulating the genes that the ENCODE consortium identified as key disease players.

Long-distance runaround

The University of Washington team, led by ENCODE consortium member John Stamatoyannopoulos, combined data from prior GWA studies and ENCODE's new information about the genome's physical interactions to predict which genes are central to disease.

The ENCODE study "gives us a framework for comprehensively analyzing the epigenetic basis of disease traits," said Stamatoyannopoulos, associate professor of genome sciences and medicine.

His team used an in vitro chromosome-mapping technique to identify binding sites for transcriptional regulators throughout the genomes of 349 cell and tissue types from healthy individuals.

The group then compared the map of these regulatory sequences with 5,654 SNPs in noncoding regions drawn from 207 GWA studies. These SNPs were previously identified as hereditary factors in at least one disease.

Finally, the researchers used a physical mapping technique to identify gene promoters that were most likely to be activated by the proteins that bound to the SNP sites.

When these three data sets were superimposed, the resulting map pointed to the regions of the genome likely to be regulated by the disease-associated SNPs.
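The overlay described above can be pictured with a minimal sketch. This is not the team's actual pipeline: the coordinates, SNP identifiers, site names and gene names are all made-up toy values, and the function `link_snps_to_genes` is a hypothetical helper written for illustration. It assumes the three inputs are a list of disease-associated SNPs, a map of regulatory sites, and a table of physical interactions linking each site to gene promoters.

```python
from dataclasses import dataclass

# All coordinates, identifiers and gene names below are toy values
# invented for illustration; none come from the ENCODE data.


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # inclusive
    end: int    # exclusive
    name: str


def contains(region, chrom, pos):
    """Return True if a position falls inside a regulatory region."""
    return region.chrom == chrom and region.start <= pos < region.end


def link_snps_to_genes(snps, regulatory_regions, interactions):
    """Overlay three data sets (hypothetical helper).

    snps: list of (snp_id, chrom, pos) from association studies
    regulatory_regions: list of Region (mapped regulatory sites)
    interactions: dict of region name -> genes whose promoters
                  physically contact that region
    Returns a dict of snp_id -> sorted candidate target genes.
    """
    targets = {}
    for snp_id, chrom, pos in snps:
        genes = set()
        for region in regulatory_regions:
            if contains(region, chrom, pos):
                genes.update(interactions.get(region.name, []))
        if genes:
            targets[snp_id] = sorted(genes)
    return targets


snps = [
    ("snpA", "chr1", 1_050),
    ("snpB", "chr1", 5_500),  # falls outside every regulatory site
    ("snpC", "chr2", 300),
]
regulatory_regions = [
    Region("chr1", 1_000, 1_200, "site1"),
    Region("chr1", 9_000, 9_300, "site2"),
    Region("chr2", 250, 400, "site3"),
]
interactions = {
    "site1": ["GENE_FAR"],
    "site3": ["GENE_X", "GENE_Y"],
}

result = link_snps_to_genes(snps, regulatory_regions, interactions)
print(result)
```

In this toy example only the SNPs that sit inside a mapped regulatory site, and whose site has a recorded promoter contact, end up with candidate target genes. A real analysis would need indexed interval lookups and statistical controls that this sketch leaves out.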

The surprise came from comparing the locations of disease-linked SNPs and their target genes. Previously, the assumption was that most regulatory DNA sequences affected the expression of nearby genes. Instead, the Stamatoyannopoulos team found that many disease-linked SNPs directly affected the expression of distant genes (see "Rethinking disease markers").

Stamatoyannopoulos suspects this long-range regulation occurs because of physical interactions between the SNPs and their target genes across the complex 3D structure of chromatin.

"People had assumed that if a SNP is near a gene, maybe it's affecting a nearby gene," said Stamatoyannopoulos. "But we found that regulatory DNA is controlling genes that are located 10-12 genes away."

Results were published in Science. The raw data from the study are freely available, and Stamatoyannopoulos has filed patents on some of the analytical methods used in the study. The patents are available for licensing.

Other analyses by the ENCODE consortium were published simultaneously last week in 31 additional papers in Nature, Genome Research, Genome Biology, BMC Genetics, Cell and Science.

SNP off the old block

The findings suggest a slew of new potential players in many disease categories, but proving those proteins are bona fide targets will require independent experimental validation of the team's findings in cell culture and animal models.

Meanwhile, the results are forcing a rethink of previous GWA study findings.

For example, Stamatoyannopoulos' team compiled lists of genes that could be pivotal in various cancers and autoimmune, metabolic and neurodegenerative diseases. Many of these genes had been overlooked by previous GWA analyses because their immediate chromosomal environment did not have SNPs associated with disease.

Eric Schadt, professor and chair of genetics and genomic sciences and director of the Institute for Genomics and Multiscale Biology at Mount Sinai School of Medicine, said the findings provide a mechanism to explain prior observations about the complex regulation of gene expression.

Schadt is cofounder of Sage Bionetworks, a not-for-profit systems biology institute that spun out of Merck & Co. Inc.'s shuttered Rosetta Inpharmatics unit in 2009.

In 2008, Schadt's team reported results from Rosetta's comprehensive analysis of gene expression that pointed to central players in metabolic disease.2 Schadt said Stamatoyannopoulos' findings identify the likely regulatory sites and the transcription factors that control expression of those genes.

"As opposed to just relying on inferred causal connections from gene expression, these data help us to identify the specific proteins that are responsible for the biological effects that we observed," said Schadt. "These data will help us to identify causal variants and identify the proteins involved in the changes in gene expression that we identified as being important in disease."

The new findings will also help winnow the results of GWA studies down to the most critical players in disease. GWA studies are designed to find relatively common genetic variants that each contribute modestly to disease risk.

The goal of GWA studies is to gain insights into disease mechanisms, but this has proven difficult to do because so many GWA hits are in noncoding DNA with nonobvious biological effects.3

Stamatoyannopoulos said industry has largely steered clear of GWA studies because it has been hard to understand what disease-associated SNPs actually do.

"The traditional disease target approach is to see what's upregulated and downregulated and go after that," said Stamatoyannopoulos. "Now that there's a real way to connect these SNPs to targets in the genome," it should be possible to uncover the downstream genes that are the true drivers of disease.

From hit to target

The challenge now is to show that hitting the regulatory elements and their target genes implicated by Stamatoyannopoulos' study can affect disease.

Gregory thinks Sangamo's zinc finger nuclease technology could be useful for studying the effect of tweaking the regulatory sites identified in Stamatoyannopoulos' study.

The company's technology is "capable of introducing mutations that change regulatory elements" identified by Stamatoyannopoulos' team, he said. "This would allow us to determine which of these are truly causative in disease."

Sangamo is collaborating with Stamatoyannopoulos to study how distant regulatory elements influence transcription of globin genes, which are misregulated in thalassemias, a common class of blood disorders.

Targeting the genes controlled by the disease-associated regulatory regions is a greater challenge because it is not immediately clear which genes to focus on and how to modulate their activity.

Moreover, many of the key genes identified by the Washington team are transcription factors, which are hard to drug.

Tim Harris, SVP of translational medicine and biochemistry at Biogen Idec Inc., said companies will be poring over the data from the ENCODE consortium and from Stamatoyannopoulos' study to identify targetable candidate regions.

"The most immediate effect on drug discovery will be for companies that can look at these control regions to see how direct interference with them might affect disease," said Harris. He noted that companies using RNA interference technologies are well positioned to rapidly validate new targets suggested by the studies.

"Companies like Alnylam Pharmaceuticals Inc. and Isis Pharmaceuticals Inc., which are interested in antisense approaches, will be able to make more of the data more immediately than most other people," said Harris.

Biogen Idec and Isis are developing preclinical antisense candidates for a range of neurological diseases.

Biogen Idec and Alnylam have a collaborative research agreement to discover RNA interference-based therapeutics for progressive multifocal leukoencephalopathy (PML). Last month, Biogen Idec partnered with Regulus Therapeutics Inc. to identify microRNA biomarkers in blood from patients with multiple sclerosis (MS). Regulus is a joint venture between Alnylam and Isis.

Osherovich, L. SciBX 5(36); doi:10.1038/scibx.2012.945
Published online Sept. 13, 2012


1. Maurano, M.T. et al. Science; published online Sept. 5, 2012; doi:10.1126/science.1222794
Contact: John A. Stamatoyannopoulos, University of Washington, Seattle, Wash.

2. Chen, Y. et al. Nature 452, 429-435 (2008)

3. Edelson, S. & Osherovich, L. SciBX 2(16); doi:10.1038/scibx.2009.64


Alnylam Pharmaceuticals Inc. (NASDAQ:ALNY), Cambridge, Mass.

Biogen Idec Inc. (NASDAQ:BIIB), Weston, Mass.

Isis Pharmaceuticals Inc. (NASDAQ:ISIS), Carlsbad, Calif.

Merck & Co. Inc. (NYSE:MRK), Whitehouse Station, N.J.

Mount Sinai School of Medicine, New York, N.Y.

Regulus Therapeutics Inc., San Diego, Calif.

Sage Bionetworks, Seattle, Wash.

Sangamo BioSciences Inc. (NASDAQ:SGMO), Richmond, Calif.

University of Washington, Seattle, Wash.