"In vivo enhancer analysis of human conserved non-coding sequences",
Len A. Pennacchio 1, 2, Nadav Ahituv 2, Alan M. Moses 2, Shyam Prabhakar 2, Marcelo A. Nobrega 2, 5, Malak Shoukry 2, Simon Minovitsky 2, Inna Dubchak 1, 2, Amy Holt 2, Keith D. Lewis 2, Ingrid Plajzer-Frick 2, Jennifer Akiyama 2, Sarah De Val 4, Veena Afzal 2, Brian L. Black 4, Olivier Couronne 1, 2, Michael B. Eisen 2, 3, Axel Visel 2, and Edward M. Rubin 1, 2
1 US Department of Energy Joint Genome Institute, Walnut
Creek, California 94598, USA
2 Genomics Division, MS 84-171, Lawrence Berkeley National
Laboratory, Berkeley, California 94720, USA
3 Molecular and Cellular Biology Department, University
of California-Berkeley, California 94720, USA
4 Cardiovascular Research Institute, University of California,
San Francisco, California 94143-2240, USA
5 Present address: Department of Human Genetics, University
of Chicago, Chicago, Illinois 60637, USA.
Correspondence and requests for materials should be addressed to L.A.P. (Email: LAPennacchio@lbl.gov).
Identifying the sequences that direct the spatial and temporal expression
of genes and defining their function in vivo remains a significant
challenge in the annotation of vertebrate genomes. One major obstacle is
the lack of experimentally validated training sets. In this study, we made
use of extreme evolutionary sequence conservation as a filter to identify
putative gene regulatory elements, and characterized the in vivo
enhancer activity of a large group of non-coding elements in the human
genome that are conserved in human–pufferfish, Takifugu (Fugu) rubripes,
or ultraconserved [1] in human–mouse–rat. We tested
167 of these extremely conserved sequences in a transgenic mouse enhancer
assay. Here we report that 45% of these sequences functioned reproducibly
as tissue-specific enhancers of gene expression at embryonic day
11.5. While directing expression in a broad range of anatomical structures
in the embryo, the majority of the 75 enhancers directed expression to
various regions of the developing nervous system. We identified sequence
signatures enriched in a subset of these elements that targeted forebrain
expression, and used these features to rank all ~3,100 non-coding elements
in the human genome that are conserved between human and Fugu. The
testing of the top predictions in transgenic mice resulted in a threefold
enrichment for sequences with forebrain enhancer activity. These data dramatically
expand the catalogue of human gene enhancers that have been characterized
in vivo, and illustrate the utility of such training sets for a variety
of biological applications, including decoding the regulatory vocabulary
of the human genome.
Editor's Summary:
23 November 2006
"Gene regulators unmasked".
Identifying the non-coding DNA sequences that act at a distance to regulate patterns of gene expression is not a simple matter; one useful pointer is evolutionary sequence conservation. An in vivo analysis of 167 non-coding elements in the human genome that are extremely conserved based on comparisons with pufferfish, rat and mouse genomes, has identified 75 previously unknown tissue-specific enhancers. These are active in embryos on day 11, most of them directing expression in the developing nervous system. The success of this method suggests that the further 5,500 non-coding sequences conserved between humans and pufferfish may yield another new batch of gene enhancers.
Significant progress has been made in the identification of core promoter elements based on their defined position immediately upstream of each gene and their nearly universal activation by RNA polymerase II [2, 3]. However, the identification of distant acting gene regulatory sequences that direct precise spatial and temporal patterns of expression has been limited, despite their established roles in development [4], phenotypic diversity [5] and human disease [6, 7, 8]. Comparative genomic-based approaches have proved to be useful in identifying gene regulatory sequences, primarily on a gene-by-gene basis. These studies involved sequence comparisons of human (or other vertebrate) genomic intervals to orthologous regions from organisms separated by varying evolutionary distances, ranging from primates to fish [9, 10, 11, 12]. From this work it has been implied that ancient conservation (such as between human and fish) as well as 'ultra'-conservation among mammals (sequences at least 200 base pairs (bp) in length that are 100% identical among human/mouse/rat) [1] may be useful indicators of sequences with an increased likelihood of demonstrating gene regulatory activity. These gene-centric investigations, however, have identified only a relatively small number of distant-acting enhancer sequences.
As one of the goals of this work was to assess the validity of a genome-based approach, rather than a gene-centric one, we chose non-coding target sequences based on one of two 'extreme' comparative genomic criteria: ancient conservation between human and Fugu (separated by 450 million years of evolution) or ultra-conservation among human/mouse/rat [1]. In total, 167 human DNA fragments were assessed for spatial enhancer activity in a well-established transgenic mouse enhancer assay that links the human conserved fragment to a minimal mouse heat shock promoter fused to a lacZ reporter gene [10, 13, 14, 15, 16]. We chose to determine tissue-specific reporter gene expression at embryonic day 11.5 (e11.5), as this developmental stage allows for whole-mount staining and whole-embryo visualization. Moreover, at this time-point many of the major tissues and organs have been specified. We also expected this stage to be particularly informative because 'extreme' conserved non-coding elements tend to be enriched and clustered near genes expressed during embryonic development [1, 12, 17, 18].
Overall, we found that 29% (24/83) of human–Fugu elements alone and 61% (33/54) of human–Fugu elements that are also ultraconserved were positive enhancers in this in vivo assay (Fig. 1; Supplementary Table 1; the entire data set including the sequence coordinates, conservation, and whole-mount embryo digital imagery can be accessed and queried at the VISTA Enhancer Browser, http://enhancer.lbl.gov).
FIGURE 1. A summary of all sequences tested for enhancer activity
in transgenic mice.
a, A breakdown of the assayed non-coding sequences by human–Fugu conservation and/or human–rodent ultraconservation: Human–Fugu only, human–Fugu and human–rodent, or human–rodent only.
b, The total percentage of positive human enhancers broken down by the same parameters as described in a.
The total number of elements tested is indicated within a, while the number of positives is found above the bars of the graph in b.
As an example of these data, we present 23 elements meeting our selection criteria that were located in a gene-poor 2.5 Mb stretch bracketing SALL1, a gene encoding a transcription factor expressed in early development and mutated in Townes-Brocks syndrome [19] (Fig. 2).
FIGURE 2. A 3 Mb region of human chromosome 16 enriched for human–Fugu
non-coding conservation flanking the SALL1 gene.
The coordinates and gene annotations located at the top of the diagram are based on the hg17 assembly at the UCSC Genome Browser (http://genome.ucsc.edu). The middle tracks depict human fragments that were tested in the transgenic mouse enhancer assay, and their classification as either 'negative' or 'positive' refers to their enhancer activity at e11.5. All human elements tested were conserved in the Fugu genome, and two of these elements were also defined as ultraconserved (denoted by arrowheads). The bottom panel indicates the positive enhancer activities captured through transgenic mouse testing of human–Fugu conserved non-coding fragments in this interval.
Seven of the elements flanking SALL1 directed tissue-specific reporter gene expression in the transgenic in vivo assay, recapitulating aspects of SALL1's endogenous expression characteristics at e11.5 [20] and further supporting the postulated modular nature of distant-acting gene enhancers [21, 22]. In addition, we tested 30 ultraconserved non-coding sequences that lacked identifiable conservation with Fugu, of which 18 (60%) functioned as enhancers, similar to the success rate observed for ultraconserved elements that also have Fugu conservation (Fig. 1). Whereas the average size of the human fragments tested was 1,270 bp, the positive enhancers overlapped longer human–rodent conserved regions (average length 1,630 bp versus 966 bp; t-test P-value=0.0087; see Supplementary Methods) and were more conserved among mammals (human–rodent conservation score, t-test P-value=0.0004; see Supplementary Methods) relative to negatives in the assay.
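The per-class counts reported above can be checked against the overall figures quoted elsewhere in the text (167 elements tested, 75 enhancers, 45% positive). The short sketch below only redoes that arithmetic; the dictionary labels are ours and are not taken from the paper.

```python
# (positives, tested) for each conservation class, as reported in the text.
# The class labels are illustrative names, not the authors' identifiers.
classes = {
    "human-Fugu only": (24, 83),
    "human-Fugu and ultraconserved": (33, 54),
    "ultraconserved only": (18, 30),
}

total_positive = sum(pos for pos, _ in classes.values())
total_tested = sum(tested for _, tested in classes.values())

# Success rate per class and overall, rounded to whole percentages.
rates = {name: round(100 * pos / tested) for name, (pos, tested) in classes.items()}
overall_rate = round(100 * total_positive / total_tested)

for name, rate in rates.items():
    print(f"{name}: {rate}%")
print(f"overall: {total_positive}/{total_tested} = {overall_rate}%")
```

The three classes sum to the 167 tested elements and 75 positives, so the class-level and overall percentages are mutually consistent.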
These experimental results reveal the high propensity of extremely conserved human non-coding sequences to behave as transcriptional enhancers in vivo, and support both ancient human–fish conservation and human–rodent ultraconservation as highly effective filters to identify such functional elements. The large percentage of elements positive for enhancer activity is particularly surprising, considering the single time-point of investigation and the likely possibility that a fraction of the negatives may be enhancers active either earlier or later in development. An important question arising from the significant fraction of ultra and Fugu conserved elements functioning as enhancers is whether the tissue-specific enhancer activity that we assess completely explains why these sequences are so constrained. Overlaying our data set with results from a recent ChIP-Chip study [23] indicates that at least seven of the elements reported here (including four that are enhancers at e11.5) presumably function as gene silencers in embryonic stem cells. Such data imply that functions in addition to tissue-specific transcriptional activation are embedded in some fraction of extremely conserved non-coding elements, thus potentially contributing to their extreme level of constraint. However, the high efficiency of enhancer identification through this approach nonetheless suggests that tissue-specific transcriptional enhancer activity may be one of the predominant functions of non-coding genomic regions under extreme constraint throughout vertebrate evolution.
We categorized all 75 identified enhancers by their general anatomical
patterns of expression using an existing standardized nomenclature [24]
(Fig. 3).
FIGURE 3. Grouping of positive expression patterns captured in
the transgenic mouse enhancer assay.
The total number of elements displaying a given anatomical pattern is depicted by the height of the bars in the chart. A representative transgenic embryo is provided for each expression pattern. Elements with reproducible staining in more than one structure are included in each respective category.
All positive enhancer annotations are based on a minimum of three independent transgenic F0 embryos carrying the same construct and demonstrating the same expression pattern, though the majority (83%) had four or more supporting embryos. We observed reporter gene expression in a variety of anatomical regions, including embryonic structures that are subject to major morphogenetic and remodelling events at e11.5, such as the developing limb, the somites, the heart and the branchial arches (Fig. 3). Of the 16 distinct anatomical structures where expression was noted, it was most frequently observed in the central and peripheral nervous system, with the most prevalent patterns corresponding to forebrain, midbrain, neural tube, and hindbrain (Fig. 3). This bias may be partially explained by the intrinsic complexity of the genetic cascades underlying vertebrate nervous system development [25] as well as the high percentage of all genes that are expressed in the nervous system.
The majority of the enhancers (50 elements, 66%) directed reproducible expression only to a single anatomical structure at the resolution of whole-mounts. This is consistent with the notion that complex endogenous messenger RNA expression patterns commonly result from the combined effects of several independent cis-regulatory sequences. The remaining one-third (25/75) of the enhancers directed expression to two or more anatomical structures. We speculate that these enhancer elements may be composed of two or more adjacent functional modules that are too tightly linked to each other to be resolved by our comparative approach, or that several tissue-specific enhancer activities overlap within a single enhancer element that is used in more than one developmental process. Importantly, the enhancer data set reported here provides a sizeable sequence-based substrate to begin to dissect these possible regulatory mechanisms, as well as reagents for further in-depth biological investigation.
To explore if our in vivo enhancer data set could be used
to identify sequence features associated with elements driving reporter
gene expression in specific anatomical structures, we focused on
the forebrain as a test case and selected as a training set four
of the strongest enhancers identified early on in our survey. Using a motif
finding strategy, we identified six motifs significantly over-represented
in these enhancers (see Supplementary Methods).
We then scored and ranked all 3,100 human–Fugu conserved non-coding
elements in the human genome (Supplementary
Table 2) for the statistical over-representation of these putative
forebrain motifs (Supplementary Methods).
The 30 highest-ranking elements included the four known forebrain enhancers
that constituted the training set as well as 26 additional elements
(Supplementary Table 3). Of these 26 elements,
23 were successfully cloned and tested for in vivo enhancer activity
in transgenic mice. We observed robust forebrain enhancer activity for
4 of the 23 elements (17%) tested in the transgenic assay system. By comparison,
only 4 (5%) of the 77 otherwise uncharacterized human–Fugu conserved
elements used to identify the training set were forebrain enhancers
(see Supplementary Methods; Fig. 4).
FIGURE 4. Application of a forebrain enhancer training set to
identify forebrain-specific enhancer sequences elsewhere in the human genome.
a, The four human–Fugu chromosome 16 forebrain enhancers used in the training set.
b, The four positive forebrain enhancers from the 23 human–Fugu genome-wide elements predicted to direct forebrain expression based on the training set in a. The UCSC human genome coordinates of the tested fragments (May 2004) and the flanking gene(s) are provided, as well as three representative embryos. Numbers indicate the ratio of forebrain-positive to the total number of stained embryos.
This preliminary result, although based on a small training set,
indicates that a combined comparative and motif-based strategy provided
a greater than threefold enrichment (P = 0.08, see Supplementary
Methods) over comparative-only approaches for the identification of
enhancers active in a particular tissue of interest. This initial computational
investigation also highlights the need for larger characterized enhancer
training sets, the annotation of tissue specificities at high spatial resolution
and the development of improved computational methods, which will probably
provide a substrate to conclusively establish the predictive power of such
approaches.
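The "greater than threefold" figure follows directly from the two success rates given above (4 of 23 predicted elements versus 4 of 77 elements in the conservation-only set). A minimal sketch of that arithmetic, with variable names of our own choosing:

```python
# Forebrain enhancers found among motif-ranked predictions vs. the
# conservation-only chromosome 16 set (counts as reported in the text).
predicted_hits, predicted_tested = 4, 23
baseline_hits, baseline_tested = 4, 77

predicted_rate = predicted_hits / predicted_tested   # about 17%
baseline_rate = baseline_hits / baseline_tested      # about 5%
fold_enrichment = predicted_rate / baseline_rate

print(f"predicted: {predicted_rate:.1%}, baseline: {baseline_rate:.1%}")
print(f"fold enrichment: {fold_enrichment:.2f}")
```

Because the number of hits is the same in both sets, the ratio reduces to 77/23. With only four positives on each side the estimate is imprecise, which is consistent with the modest significance the authors report.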
This study provides quantitative support for previous anecdotal
observations that 'extreme' evolutionary non-coding conservation
is a powerful predictor of mammalian tissue-specific enhancers. Of note,
there are at least an additional 5,500 human–fish conserved non-coding
sequences in the human genome with similar levels of constraint that are
strong candidates for acting as gene enhancers [11].
The efficiency of enhancer identification coupled with the relatively high
throughput transgenic assay used here represents a feasible approach for
the generation of a genome-wide experimentally validated enhancer data
set. Such collections are expected to define functional candidate regions
as medical sequencing efforts escalate, as well as provide a foundation
for inferring the network of regulatory interactions among key developmental
genes during vertebrate development, analogous to the well-developed efforts
in non-vertebrate model systems [21, 22]. Regulatory
insights derived from these analyses should also enable the creation of
modules driving pre-determined expression patterns for various biological
applications, as well as contribute to an understanding of the vocabulary
and grammar of DNA sequences dictating gene expression.
Methods
Identification of conserved elements and transgenic enhancer assay
Human–Fugu conserved non-coding elements with 70% identity, a score of match-mismatch ≥60, and lacking evidence of encoding a protein or being transcribed in mRNA were derived from whole-genome alignments (see Supplementary Methods). The coordinates of ultraconserved elements were retrieved from ref. 1. Conserved elements were amplified from human genomic DNA by polymerase chain reaction (PCR), sequence-validated and transferred into an Hsp68-LacZ reporter vector. Generation of transgenic mice and embryo staining was done as previously described [26] in accordance with protocols approved by the Lawrence Berkeley National Laboratory. For each enhancer fragment, all transgenic embryos exhibiting LacZ-staining were scored and annotated independently by multiple curators.
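To make the two conservation thresholds concrete, the sketch below applies them to a single pairwise alignment block. It is an illustration under our own assumptions, not the authors' pipeline: the function names are hypothetical, the input is a pair of equal-length aligned strings, and gap columns are counted as mismatches, which the text does not specify.

```python
def conservation_stats(human, fugu):
    """Return (fraction identity, matches minus mismatches) for two aligned strings.

    Assumption: gap characters ('-') count as mismatches.
    """
    if len(human) != len(fugu) or not human:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(human, fugu) if a == b and a != "-")
    mismatches = len(human) - matches
    return matches / len(human), matches - mismatches


def passes_filter(human, fugu, min_identity=0.70, min_score=60):
    """Apply the thresholds quoted in the text: 70% identity and score >= 60."""
    identity, score = conservation_stats(human, fugu)
    return identity >= min_identity and score >= min_score


# Toy example: a 100-column block whose first 10 columns all differ.
human_block = "ACGT" * 25
fugu_block = "TGCATGCATG" + human_block[10:]
print(conservation_stats(human_block, fugu_block))
print(passes_filter(human_block, fugu_block))
```

Note that the score threshold also acts as a length filter: even a perfectly identical block shorter than 60 columns cannot reach a match-minus-mismatch score of 60.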
Motif identification and prediction of forebrain enhancers
To find sequence motifs that were associated with forebrain expression,
we used a discrete, enumerative motif-finding approach [27].
We identified motifs enriched in the training set of forebrain enhancers
relative to three sets of background sequences:
(1) random sequences from chromosome 16 (ATTAA and GATTA,
which we note are motifs present in previously characterized embryonic
forebrain enhancers [28, 29] ),
(2) a chromosome 16 set of human–Fugu conserved elements
(TTNNAAA, CANNGGC and TANNTGA), and
(3) a chromosome 16 set of human–Fugu sequences that displayed
enhancer
activity (TTNNTTT) (see Supplementary Methods
for details). We then combined information from all the motifs for the
prediction of new forebrain enhancers in the genome by scoring each of
3,124 human/mouse/Fugu non-coding alignments [18]
for the number of conserved (found aligned in human/mouse/Fugu)
matches to each of the 6 significant 5-mers (see Supplementary
Methods for details of scoring procedure). The top 30 fragments
are available in
Supplementary Table 3.
Acknowledgements
Research was conducted at the E. O. Lawrence Berkeley National Laboratory, under the Programs for Genomic Application, funded by the National Heart, Lung, and Blood Institute, USA as well as the National Human Genome Research Institute, USA, and performed under a Department of Energy Contract with the University of California.
Competing interests statement: The authors declared no competing interests.
http://www.nature.com/nature/journal/v444/n7118/suppinfo/nature05295.html
A summary of all the human conserved noncoding fragments tested
for enhancer activity at embryonic day 11.5. Enhancer ID refers
to a unique identifier defined at: http://enhancer.lbl.gov
Supplementary Table 1. - Download Excel file (37KB)
A compilation of human-fugu conserved noncoding elements
in the human genome.
Supplementary Table 2. - Download Excel file (208KB)
The top 30 forebrain enhancer predictions in the human genome.
The strategy to generate this list can be found in the Supplementary
Methods.
Supplementary Table 3. - Download Excel file (18KB)
An expanded version of the Materials and Methods.
Identification of Conserved Noncoding Elements.
The set of human-fugu conserved noncoding elements tested in vivo was derived from a computational pipeline described previously [1]. Using genomic DNA alignments, we constructed a syntenic map defining homology between the human and fugu genomes. We then identified discrete fragments showing conservation (70% identity with a match minus mismatch score [2] ≥60). Transcribed sequences in the conserved set were filtered out using known genes, spliced ESTs and mRNA annotations obtained from the UCSC genome browser (intronic conservation was allowed). We then manually curated the data set to remove any additional false-positives by visual examination of UCSC genomic data. Whole-genome human-mouse-rat and human-mouse-fugu conserved noncoding elements were subsequently identified and assigned p-values using Gumby [3] in combination with a more recent version of the program used to construct synteny maps [4]. In the course of our study, we initially focused on human chromosome 16 and tested 79 elements (73 were human-fugu conserved, 4 were human-fugu-ultra conserved, and 2 were ultraconserved alone) from this chromosome but subsequently expanded the study to include 88 elements on other chromosomes (10 were human-fugu conserved, 50 were human-fugu-ultra conserved, and 28 were ultraconserved alone). The primary rationale for this expansion was to compare the success rate of enhancer identification of human-fish versus human-ultra conservation.
Mouse transgenic enhancer assay.
Primers were designed to flank the conserved element by several hundred base pairs using primer3 [5] and can be found for each element at http://enhancer.lbl.gov/. Enhancer element constructs were PCR amplified from human genomic DNA (BD Biosciences) and directionally cloned into the pENTR/D-TOPO vector (Invitrogen). All inserts were sequence-validated and transferred into an Hsp68-LacZ vector [6] encompassing a Gateway cassette using LR recombination (Invitrogen). Generation of transgenic mice and embryo staining was performed as previously described [7] in accordance with protocols approved by the Lawrence Berkeley National Laboratory. Transgenic mouse DNA was prepared from yolk sacs that were carefully dissected from embryos, boiled for 5 minutes in lysis solution (50 mM Tris HCl pH 8.0, 20 mM NaCl, 1 mM EDTA pH 8.0, 1% SDS), and then screened by PCR with LacZ primers (LacZ-fwd 5'-TTTCCATGTTGCCACTCGC; LacZ-Rev 5'-AACGGCTTGCCGTTCAGCA) for positive transgenic animals. Images were obtained using a Leica MZ16 microscope and DC480 camera, cropped and level adjusted with Adobe Photoshop. High-resolution images of single embryos were deposited into our internal database.
Positive enhancer scoring and annotation.
For each enhancer fragment, all transgenic embryos exhibiting LacZ-staining were scored and annotated independently by multiple curators. Positive enhancers required a minimum of 3 independent transgenic embryos showing a consistent expression pattern (though 83% had 4 or more, and 67% had 5 or more supporting embryos) while negatives required no obvious consistent reporter gene expression and/or at least 3 non-staining mice that were also positive for the transgene as determined by PCR [7]. Nomenclature standards were obtained from Bard et al [8] and, in general, used a low-resolution vocabulary based on whole embryo microscopy visualization.
Length and Conservation of Positive versus Negative Enhancer Sequences.
To investigate possible predictive features of positive enhancers, we mapped 160 of the 167 tested sequence elements to the whole-genome set of syntenic human-mouse-rat conserved noncoding elements [3,4]. We selected the human-rodent dataset to enable assessment of both the human-rodent-ultra- as well as human-fugu- tested fragments, with 7 of the assayed elements eliminated due to limited synteny or missing sequence in the rat genome. We found that positive enhancers overlapped human-rodent conserved elements with a mean length of 1,630 bp, many of which extended beyond the boundaries of the tested sequence, whereas the negative enhancers mapped to significantly shorter (t-test p value=0.0087) human-rodent elements (mean: 966 bp). Similarly, the positive enhancers mapped to human-rodent elements with significantly higher (t-test p value: 0.0004) evolutionary conservation scores (mean Gumby -log(p value): 67.1) than the negatives (mean -log(p value): 43.5) [4], indicating that the degree of conservation between humans and rodents can be used to further prioritize human-fish and ultra-conserved elements for functional activity under this experimental design. It is worth noting that a previous study [9] also indicated that of 15 less conserved sequences tested in this assay, only one functioned as a developmental enhancer.
Motif-finding in a preliminary set of human enhancers.
To find sequence motifs that were associated with particular expression patterns, we used a discrete, enumerative motif-finding approach [10]. We focused our training set on all tested human-fugu elements from chromosome 16 which comprised our first available dataset, where 4 of the 77 tested fragments yielded strong forebrain enhancer activity. Because the training set was small (4 fragments, totaling 1,090 bp of conserved sequence), we chose to search for words of length 5 to retain statistical power. We tested all the 5-mers (allowing a spacer of up to 2 before the 3rd base) against the null hypothesis that they appeared in the 4 robust forebrain enhancers as frequently as in a background set (see below). We assigned significance using the binomial distribution:
$$P(X \ge x \mid n, f) = \sum_{k=x}^{n} \binom{n}{k} f^{k} (1-f)^{n-k}$$

where P is the probability of observing x or more given n tries and frequency in the background set, f. We chose a significance threshold of:
which we expect to produce less than one motif by chance since we treat each 5-mer and its reverse complement as one motif (there are only 2560 motifs tested) and the motifs are not truly independent. When we identified a 5-mer that exceeded our threshold, we removed it from the training set, and repeated the procedure until there were no 5-mers that exceeded the significance threshold. Using this procedure we searched for motifs that were enriched in the forebrain enhancers relative to three sets of background sequences:
1) random sequences from chromosome 16 (which yielded ATTAA and GATTA, which we note are motifs present in previously characterized embryonic forebrain enhancers [11,12] ),
2) the chromosome 16 set of human-fugu fragments (which yielded TTNNAAA, CANNGGC and TANNTGA) and
3) the chromosome 16 set of human-fugu sequences that displayed enhancer activity (which yielded TTNNTTT). Because the latter two comparisons are between sets of sequences for which we have alignments with mouse and fugu, in those cases, rather than counting the 5-mers in the human sequence, we counted conserved 5mers (defined as a match in the same position in each species in the alignment of human, mouse and fugu). We note that motifs identified in each of these comparisons have slightly different interpretations, and we decided that all might be important for de novo forebrain enhancer prediction.
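The enumeration and significance test described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are ours, the spacer is written as a run of N characters after the second base (matching motifs such as TTNNAAA), and each motif is merged with its reverse complement by keeping the alphabetically smaller string. Under that particular counting convention the number of candidates comes out at 2,560, the figure quoted above. The iterative removal of significant motifs and the threshold itself are not reproduced here.

```python
from itertools import product
from math import comb

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(motif):
    """Reverse complement of a motif; N spacer positions stay N."""
    return motif.translate(COMPLEMENT)[::-1]


def candidate_motifs(max_spacer=2):
    """All 5-mers with a spacer of 0..max_spacer before the third base.

    A motif and its reverse complement are treated as one candidate.
    """
    motifs = set()
    for bases in product("ACGT", repeat=5):
        word = "".join(bases)
        for gap in range(max_spacer + 1):
            motif = word[:2] + "N" * gap + word[2:]
            motifs.add(min(motif, reverse_complement(motif)))
    return motifs


def binomial_tail(x, n, f):
    """P(X >= x) for X ~ Binomial(n, f): x or more occurrences in n tries."""
    return sum(comb(n, k) * f**k * (1 - f) ** (n - k) for k in range(x, n + 1))


print(len(candidate_motifs()))
print(binomial_tail(1, 2, 0.5))
```

In this convention the ungapped 5-mers collapse to 512 strand-merged candidates, while each gapped class keeps 1,024 because the reverse complement of a gapped motif carries its spacer in a different position.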
Predicting forebrain enhancers.
Because tissue-specific enhancers often contain multiple binding sites for multiple transcription factors, we sought to combine information from all the motifs for the prediction of new forebrain enhancers in the genome. We scored each of 3,124 human-mouse-fugu noncoding alignments for the number of conserved (found aligned in human-mouse-fugu) matches to each of the 6 significant 5-mers [4]. We ranked the fragments using a score that compares the frequency of conserved motif matches (as defined above) in the fragment to the expectation based on the background frequencies (f) over all the fragments. The score, S, for the ith fragment is given by:
where the sum is over each of the motifs, m, x is the number of times a particular motif occurred in a particular fragment and n is the length of a fragment. The top 30 fragments are available in Supplementary Table 3.
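The counting step that feeds the score, the number of motif matches found aligned in all three species, can be illustrated as below. This sketch covers only the counting, not the score S. Its conventions are our assumptions: the alignment is given as equal-length uppercase strings with '-' for gaps, a match must occupy the same columns in every row, N matches any base but not a gap, and either strand counts.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(motif):
    """Reverse complement of a motif; N spacer positions stay N."""
    return motif.translate(COMPLEMENT)[::-1]


def matches_at(sequence, start, motif):
    """True if motif matches sequence at column start (N = any base, never a gap)."""
    window = sequence[start:start + len(motif)]
    if len(window) < len(motif):
        return False
    return all((m == "N" and b in "ACGT") or m == b for m, b in zip(motif, window))


def count_conserved_matches(alignment, motif):
    """Count columns where the motif (either strand) matches in every aligned row."""
    patterns = (motif, reverse_complement(motif))
    width = len(alignment[0])
    count = 0
    for start in range(width - len(motif) + 1):
        if any(all(matches_at(row, start, p) for row in alignment) for p in patterns):
            count += 1
    return count


# Hypothetical human / mouse / Fugu rows, invented for illustration only.
toy_alignment = ["TTGATTAACG", "ACGATTAGCG", "CCGATTATTT"]
print(count_conserved_matches(toy_alignment, "GATTA"))  # conserved in all three rows
print(count_conserved_matches(toy_alignment, "ATTAA"))  # present in the first row only
```

Requiring the match in the same alignment columns is stricter than requiring the motif somewhere in each orthologous sequence, so a motif that has shifted position between species would not be counted.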
Assessing significance of predictions.
We assessed whether our motif enrichment and conservation-based prediction method was more effective than using conservation alone by attempting to reject the null hypothesis that k1 successes out of n1 tests (in the training data) and k2 successes out of n2 tests (in the test set) were obtained from the same distribution. We calculated the probability of observing k2 or more successes out of n2 draws from a binomial distribution, integrating over all possible values of the binomial probability p weighted by the posterior probability of observing k1 successes out of n1 tests for each value of p (the integrals were estimated numerically):

$$P = \int_{0}^{1} \left[ \sum_{k=k_2}^{n_2} \binom{n_2}{k} p^{k} (1-p)^{n_2-k} \right] P(p \mid k_1, n_1)\, dp$$

where P(p | k1, n1) denotes the posterior probability of p given k1 successes out of n1 tests.
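The calculation described here can be approximated with a simple numerical integration. The sketch below is our reading of the procedure rather than the authors' implementation: it assumes a flat prior on p (the text does not state the prior), uses a midpoint rule, and all function names are ours. The inputs are the counts given in the main text: 4 forebrain enhancers out of 77 in the chromosome 16 set and 4 out of 23 among the predictions.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment_p_value(k1, n1, k2, n2, steps=2000):
    """Probability of k2 or more successes in n2 tests, averaged over p.

    Each p is weighted by its posterior given k1 successes in n1 tests,
    assuming a flat prior; the integral uses a midpoint rule on (0, 1).
    """
    weighted_tail = 0.0
    normaliser = 0.0
    for j in range(steps):
        p = (j + 0.5) / steps
        weight = p**k1 * (1 - p) ** (n1 - k1)  # unnormalised posterior density
        normaliser += weight
        weighted_tail += weight * binomial_tail(k2, n2, p)
    return weighted_tail / normaliser


p_value = enrichment_p_value(k1=4, n1=77, k2=4, n2=23)
print(f"{p_value:.3f}")
```

Under these assumptions the result is close to the P = 0.08 reported in the main text, which suggests a flat prior is a reasonable reading of the method, although other priors would shift the value slightly.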
Potential scoring bias.
Because our motif conservation score is based on the number of conserved motifs, the top predictions tended to be more conserved and longer than the average. Since we had found that longer, more conserved fragments are more likely to function as enhancers in our assay, we considered the possibility that the enrichment of forebrain enhancers was simply due to an overall increase in enhancer prediction. We found, however, that the fraction of fragments that showed expression patterns other than forebrain in the chromosome 16 set (17/77) was not different from the fraction observed in the predictions (7/23; p value=0.28), suggesting that the frequency of enhancer discovery for patterns other than forebrain had not changed.
Supplementary Methods References
1. Grimwood, J. et al. The DNA sequence and biology of human chromosome
19. Nature 428, 529-35 (2004).
2. Waterston, R. H. et al. Initial sequencing and comparative analysis
of the mouse genome. Nature 420, 520-62 (2002).
3. Prabhakar, S. et al. Close sequence comparisons are sufficient
to identify human cis-regulatory elements. Genome Res (2006).
4. Ahituv, N., Prabhakar, S., Poulin, F., Rubin, E. M. & Couronne,
O. Mapping cis-regulatory domains in the human genome using multi-species
conservation of synteny. Hum Mol Genet 14, 3057-63 (2005).
5. Rozen, S. & Skaletsky, H. Primer3 on the WWW for general
users and for biologist programmers. Methods Mol Biol 132, 365-86 (2000).
6. Kothary, R. et al. Inducible expression of an hsp68-lacZ hybrid
gene in transgenic mice. Development 105, 707-14 (1989).
7. Poulin, F. et al. In vivo characterization of a vertebrate
ultraconserved enhancer. Genomics 85, 774-81 (2005).
8. Bard, J. L. et al. An internet-accessible database of mouse developmental
anatomy based on a systematic nomenclature. Mech Dev 74, 111-20 (1998).
9. Nobrega, M. A., Zhu, Y., Plajzer-Frick, I., Afzal, V. & Rubin,
E. M. Megabase deletions of gene deserts result in viable mice. Nature
431, 988-93 (2004).
10. van Helden, J., Andre, B. & Collado-Vides, J. Extracting
regulatory sites from the upstream region of yeast genes by computational
analysis of oligonucleotide frequencies. J Mol Biol 281, 827-42 (1998).
11. Kurokawa, D. et al. Regulation of Otx2 expression and
its functions in mouse forebrain and midbrain. Development 131, 3319-31
(2004).
12. Zhou, J., Zwicker, J., Szymanski, P., Levine, M. & Tjian,
R. TAFII mutations disrupt Dorsal activation in the Drosophila embryo.
Proc Natl Acad Sci U S A 95, 13483-8 (1998).