Miao Sun 1, Laurence D. Hurst 2, 4, Gordon G. Carmichael 3, and Jianjun Chen 1, 4
1 Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, Illinois 60637, USA;
2 Department of Biology and Biochemistry, University of Bath, Somerset, BA2 7AY, United Kingdom;
3 Department of Genetics and Developmental Biology, University of Connecticut Health Center, Farmington, Connecticut 06030-3301, USA
4 Corresponding authors.
E-mail: jchen@medicine.bsd.uchicago.edu; fax: +1 (773) 702-3002.
E-mail: l.d.hurst@bath.ac.uk; fax: +44 (0)1225-386779.
Given that humans have about the same number of genes as mice and not many more than worms, what makes us more complex? Antisense transcripts are implicated in many aspects of gene regulation. Is there a functional connection between antisense transcription and organismic complexity; that is, is antisense regulation especially prevalent in humans? We used the same robust protocol to identify antisense transcripts in humans and five other metazoan genomes (mouse, rat, chicken, fruit fly, and nematode), and found that the estimated proportions of genes involved in antisense transcription are highly sensitive to the number of transcripts included in the analysis. By controlling for transcript abundance, we find that the probability that any given transcript is putatively involved in sense–antisense regulation is no higher in humans than in other vertebrates, but appears unusually high in flies and especially low in nematodes. Similarly, there is no evidence that the proportion of sense–antisense transcripts is higher in humans than in other vertebrates within any given subset of transcript sequences, such as mRNAs, coding sequences, or conserved or nonconserved transcripts. Although antisense transcription might be enriched in mammalian brains compared with nonbrain tissues, it is no more enriched in human brain than in mouse brain. Overall, therefore, while we see striking variation between multicellular animals in the abundance of antisense transcripts, there is no evidence for a link between antisense transcription and organismic complexity. More particularly, we see no evidence that humans are in any way unusual among the vertebrates in this regard. Instead, our results suggest that antisense transcription might be prevalent in almost all metazoan genomes, nematodes being an unexplained exception.
Supplemental material is available online at www.genome.org (http://www.genome.org/cgi/content/full/gr.5210006/DC1).
While it appears intuitively reasonable to suppose that organisms differ in their "complexity," this apparently simple assertion raises numerous further questions. One issue is definitional: what is complexity, and how might it be measured? It has been argued that organismic complexity is a compound term, with at least four types being distinguished: nonhierarchical morphological, nonhierarchical developmental, hierarchical morphological, and hierarchical developmental (McShea 1996). Judged by criteria such as the number of differentiated cell, tissue, and organ types, the presence or absence of developed limbs and nervous systems, and language ability, it is commonly held that humans are the most complex species, that mammals are more complex than primitive vertebrates, and that vertebrates are more complex than invertebrates.
Assuming that humans are, in some sense, more complex than mice and flies, the second issue is then biological. What factors underlie the differences in complexity? Following the discovery of the remarkably small number of protein-coding genes in the human genome (Lander et al. 2001; Venter et al. 2001), it was suggested that complexity might arise from alternative splicing (Lander et al. 2001; Venter et al. 2001; Modrek and Lee 2002; Kim et al. 2004b). While no doubt this is true in part, it is remarkable that across a wide span of taxa, there is little difference in the abundance of alternative splicing (Brett et al. 2002; Harrington et al. 2004). What else might underpin the differences in complexity? It has been suggested that the basis of eukaryotic complexity and phenotypic variation may lie primarily in a control architecture composed of a highly parallel system of trans-acting RNAs that relay state information required for the coordination and modulation of gene expression (Mattick 2001, 2004, 2005) and that organismal complexity arises from progressively more elaborate regulation of gene expression (Levine and Tjian 2003). Antisense regulation might be a good candidate for such gene control (Mattick 2004, 2005). Indeed, antisense regulation has been suggested to be important to all species from bacteria to humans (Merino et al. 1994; Kumar and Carmichael 1998; Vanhee-Brossollet and Vaquero 1998; Carmichael 2003; Kramer et al. 2003; Lavorgna et al. 2004; Wang and Carmichael 2004).
The majority of natural antisense transcripts are cis-encoded, that is, transcribed from the opposite strand of the same genomic locus as their sense counterparts (Vanhee-Brossollet and Vaquero 1998). Besides cis-encoded antisense transcripts, there are trans-encoded antisense RNAs, such as microRNAs (Ambros 2004; Bartel 2004), which are transcribed from genomic loci different from those of their target genes (Kumar and Carmichael 1998; Vanhee-Brossollet and Vaquero 1998; Chen et al. 2004). To date, 250–300 microRNAs have been identified or predicted in humans and mice, respectively (http://microrna.sanger.ac.uk/sequences/index.shtml; December 2005), and most of them are conserved between these two species; in addition, their putative target sites might also be conserved between mammals (Lewis et al. 2005). Although this conservation might partly reflect the fact that cross-species conservation is often the very reason microRNAs are detected by computational methods, there is as yet no evidence that microRNA regulation is much more prevalent in humans than in mice, and we therefore focus our study on cis-encoded antisense transcription alone. All antisense transcription/transcripts and sense–antisense pairs mentioned in this study hence refer to cis-encoded ones.
Antisense regulation (mediated by cis-encoded antisense transcripts) has been implicated in many aspects of gene regulation, including translational regulation, genomic imprinting, RNA interference, alternative splicing, X chromosome inactivation, RNA editing, and heterochromatic gene silencing (for reviews, see Kumar and Carmichael 1998; Vanhee-Brossollet and Vaquero 1998; Lavorgna et al. 2004). In addition, our recent studies of antisense intron size and coordinated sense–antisense expression have suggested that antisense regulation is likely to be a common and important gene-regulation mechanism in humans (Chen et al. 2005a,b,c).
Several recent genome-wide analyses have suggested that antisense regulation might be common in many eukaryotic genomes, albeit at different abundances. Notably, very different proportions of sense–antisense (SA) transcripts have been reported in different genomes, for example, ~5% in rice (Oryza sativa) (Osato et al. 2003), 15% in the fruit fly (Drosophila melanogaster) (Misra et al. 2002), 15% in the mouse (Mus musculus) (Kiyosawa et al. 2003), and from 22% (Chen et al. 2004) to >40% (Cheng et al. 2005) in the human (Homo sapiens) genome. If this variation is real, it might suggest relatively rapid loss and gain of antisense regulation during evolution. In this regard, the disparate figures for mouse and human are especially interesting, given that these two species are relatively close evolutionarily (they diverged ~75 million years ago [Mya]) and their genomes are well annotated. This would fit with the suggestion, albeit not one commonly considered, that there may be more antisense regulation in humans and that this may be related to the greater complexity of humans, particularly to the higher demands of human brain learning, memory, speech, and cognitive capabilities (J.S. Mattick, pers. comm.).
Here we ask whether the differences between humans and other animals are likely to be real. In the aforementioned studies, different protocols were used. If we apply the same protocol to humans and other animals, do we still observe large differences in the estimated proportion of SA transcripts? In this study, we used the same robust protocol, which has been demonstrated to be very specific and efficient (Chen et al. 2004), to identify antisense transcripts in the human and five other metazoan genomes. Although putative SA pairs appear much more common in humans than in mice, rats (Rattus norvegicus), chickens (Gallus gallus), fruit flies, and nematodes (Caenorhabditis elegans), the estimated proportions of genes involved in antisense transcription are strongly affected by the number of transcripts available for analysis. By controlling for transcript abundance, we show that the estimated proportion of SA genes is approximately constant in vertebrates.
Results
Differences in frequencies of putative SA gene pairs are not owing to differences in methods to identify SA pairs
We used the same robust protocol (Chen et al. 2004) to identify putative sense–antisense (SA) transcripts in the human, mouse, rat, chicken, fruit fly, and nematode genomes (see Methods). In an experimental validation of SA pairs by orientation-specific RT-PCR, we observed that the specificity of this protocol is >92%; in addition, this protocol has a higher sensitivity than other previously established protocols (Chen et al. 2004). As shown in Figure 1 and Table 1, putative SA transcripts appear to be much more common in humans (accounting for 22.7% of the whole set of genes) than in mice (11.6%), rats (4.8%), chickens (4.8%), fruit flies (17.2%), and nematodes (0.5%). Thus, the differences reported previously (Misra et al. 2002; Kiyosawa et al. 2003; Chen et al. 2004; Cheng et al. 2005) appear not to be the result of differences in the protocols used. A similar pattern was observed for the proportion of qualified transcript sequences (i.e., those whose orientation is ensured by our stringent criteria; Chen et al. 2004; see also Methods) estimated to be involved in antisense transcription (Fig. 1): 27.0%, 12.2%, 6.5%, 4.4%, 18.6%, and 0.9% of the qualified transcripts in the human, mouse, rat, chicken, fruit fly, and nematode genomes, respectively, may form SA pairs (Table 1). The overall proportion of antisense transcripts therefore varies strikingly between the organisms. Importantly, this pattern appears to support the possibility that antisense regulation is especially prevalent among human genes. However, important problems remain. Notably, the total number of qualified transcript sequences also differs dramatically among the genomes, with threefold to 18-fold more sequences in humans than in the others (Table 1). Could this explain the apparent enrichment of sense–antisense transcripts in humans?
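The basic test behind "forming a putative SA pair" can be illustrated with a short Python sketch. This is a deliberately simplified illustration and not the Chen et al. (2004) protocol: it assumes each transcript already has a trusted strand orientation (as the "qualified" sequences do) and genomic exon coordinates, and it calls two transcripts an SA pair when they lie on opposite strands of the same chromosome and at least one exon of each overlaps. The `Transcript` record and all function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record for one qualified transcript sequence."""
    name: str
    chrom: str
    strand: str   # '+' or '-', assumed already verified
    exons: tuple  # ((start, end), ...) as half-open genomic intervals


def exons_overlap(a: Transcript, b: Transcript) -> bool:
    """True if any exon of `a` shares at least one base with any exon of `b`."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in a.exons for s2, e2 in b.exons)


def find_sa_pairs(transcripts):
    """Return (name, name) pairs on opposite strands with exonic overlap.

    Quadratic all-pairs scan: fine for a sketch, too slow for a whole genome,
    where one would sort by coordinate or use an interval tree instead.
    """
    pairs = []
    for a, b in combinations(transcripts, 2):
        if a.chrom == b.chrom and a.strand != b.strand and exons_overlap(a, b):
            pairs.append((a.name, b.name))
    return pairs


def sa_sequence_fraction(transcripts):
    """Fraction of transcripts that belong to at least one putative SA pair."""
    in_pairs = {name for pair in find_sa_pairs(transcripts) for name in pair}
    return len(in_pairs) / len(transcripts) if transcripts else 0.0


if __name__ == "__main__":
    ts = [
        Transcript("t1", "chr1", "+", ((100, 200), (300, 400))),
        Transcript("t2", "chr1", "-", ((350, 450),)),  # overlaps t1 exon 2
        Transcript("t3", "chr1", "-", ((210, 290),)),  # lies only in t1's intron
        Transcript("t4", "chr2", "-", ((100, 200),)),  # other chromosome
    ]
    print(find_sa_pairs(ts))         # [('t1', 't2')]
    print(sa_sequence_fraction(ts))  # 0.5
```

Requiring exonic (rather than merely genomic) overlap is why t3, which sits entirely within an intron of t1, is not paired with it.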
Figure 1. Proportion of sense–antisense (SA) genes or sequences in the whole gene or sequence data set in each genome.
One sequence represents one (qualified) transcript sequence, while one gene may contain several (qualified) transcript sequences. For example, as shown in Table 1, a total of 100,444 human SA transcript sequences belong to 6194 SA genes (i.e., ~16 transcript sequences per sense/antisense gene). Humans have the highest proportion of SA genes and sequences, while nematodes have the lowest proportion. See Table 1 for more details.
The higher proportion of SA transcripts in humans is owing to greater availability of transcript sequences
To evaluate the effect of transcript abundance on the estimated proportion of SA genes, we randomly selected a set number of sequences from the whole qualified transcript sequence data set for each genome (see Supplemental Table 1) and then followed the same protocol (see Methods) to estimate the proportion of SA genes. For each number of transcripts, we repeated the analysis independently 1000 times and calculated the mean SA proportion. As expected, the estimated proportion of SA genes increases as the number of transcripts increases (Fig. 2A). Notably, with the same number of transcript sequences, the estimated proportion of SA genes in humans is about the same as that in mice, rats, and chickens (and perhaps even a little lower). Unexpectedly, the estimated proportions of SA genes are much higher in flies and much lower in nematodes than in the vertebrates. For example, with 20,000 transcript sequences, the estimated proportion of SA genes is 2.05%, 2.90%, 2.74%, 4.22%, 12.09%, and 0.52% in the human, mouse, rat, chicken, fly, and nematode genomes, respectively (Fig. 2A; Supplemental Table 1). A similar pattern was observed for the effect of transcript abundance on the proportion of sequences (rather than genes) estimated to be involved in antisense transcription (Fig. 2B; Supplemental Table 1).
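The subsampling experiment described above can be sketched in Python to show why the estimated SA fraction depends on how many transcripts are sampled. The sketch makes a simplifying assumption that is ours, not the authors' pipeline: each transcript maps to a known gene, and the opposite-strand overlaps between genes are known in advance (`sa_partners`). A gene then counts as an SA gene in a subsample only if it and at least one of its partners are both represented. All names are hypothetical.

```python
import random
import statistics


def sa_gene_fraction(sampled_genes, sa_partners):
    """Fraction of represented genes that have at least one represented SA partner."""
    genes = set(sampled_genes)
    if not genes:
        return 0.0
    sa = {g for g in genes if sa_partners.get(g, set()) & genes}
    return len(sa) / len(genes)


def subsample_sa_fraction(transcript_genes, sa_partners, n, reps=1000, seed=0):
    """Mean and SD of the SA-gene fraction over `reps` random draws of n transcripts.

    transcript_genes: one gene ID per qualified transcript (a gene appears
    several times when it has several transcripts). Sampling is without
    replacement, and each replicate is drawn independently.
    """
    rng = random.Random(seed)
    values = [
        sa_gene_fraction(rng.sample(transcript_genes, n), sa_partners)
        for _ in range(reps)
    ]
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return statistics.mean(values), sd


if __name__ == "__main__":
    # Toy data: genes A and B overlap on opposite strands; C and D do not.
    transcript_genes = ["A", "A", "A", "B", "C", "D", "D"]
    partners = {"A": {"B"}, "B": {"A"}}
    for n in (1, 3, 5, 7):
        mean, sd = subsample_sa_fraction(transcript_genes, partners, n)
        print(n, round(mean, 3), round(sd, 3))
```

Because an SA gene is detectable only when both partners happen to be sampled, small samples systematically underestimate the SA fraction. This is the reason curves from different genomes are compared at the same number of transcripts rather than at their full, unequal data-set sizes.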
Figure 2. Relationship between the number of qualified transcripts and the estimated proportion of SA genes or sequences.
We randomly selected a set number of sequences from the whole qualified transcript sequence data set for each genome (see Supplemental Table 1) and then followed the same procedure (see Methods) to estimate the SA gene/sequence proportion. For each number of transcripts, we repeated the analysis independently 1000 times. The mean values with their standard deviation (mean ± SD) are shown. "% of SA genes/sequences" refers to the proportion of genes/sequences involved in putative antisense transcription (i.e., forming putative SA pairs) in a given gene/sequence data set.
(A) Relationship between the number of qualified transcripts and the estimated proportion of SA genes.
(B) Relationship between the number of qualified transcripts and the estimated proportion of SA sequences.
As the protocols and sources used to produce cDNAs/ESTs vary greatly between organisms (Harrington et al. 2004), the transcript data sets of different species are likely to be heterogeneous. As shown in Table 1, the proportion of mRNA sequences among the total qualified transcript sequences varies dramatically, from 31.2% in humans to 88.7% in nematodes; the proportion of protein-coding sequences likewise varies from 25.0% in humans to 88.6% in nematodes. Could these heterogeneities in sequence sources bias our observations? To address this issue, we used the same procedures as above to analyze the effect of the abundance of mRNA sequences alone, or of protein-coding sequences alone, on the estimated SA proportions. We observe patterns similar to those reported above (Supplemental Fig. 1a–d), indicating that the differing abundance of different sorts of sequence data among the organisms (Table 1) does not bias the observations.
If humans have more SA pairs than mice, how did the novel SA pairs come about? One possibility is that genes orthologous between mice and humans came to overlap in humans but not in mice. An alternative is that a gene with orthologs in both mice and humans came to be overlapped by a newly generated (de novo) transcript in humans only (hence there would be no mouse ortholog for one of the two genes in the SA pair). A third possibility is that both the S and A genes arose de novo in humans. To address these possibilities, we identified all the human–mouse ortholog gene pairs in our data set (see Methods). In some cases, one human gene might have several different ortholog genes in mice, and vice versa. To simplify the analysis, we excluded all one-to-multiple and multiple-to-multiple ortholog pairs (including all the transcripts that belong to these ortholog genes). After this filtering, we identified 11,931 one-to-one human–mouse ortholog gene pairs in our data set. To avoid bias in the SA proportion due to differences in transcript number between the paired human and mouse ortholog gene clusters, we further selected the single longest transcript sequence from each ortholog gene cluster to represent that gene. We thus obtained 11,931 one-to-one human–mouse ortholog transcript sequence pairs, so that humans and mice have the same number of unique ortholog transcripts. We then analyzed the SA pairs formed among these 11,931 ortholog transcripts in each species. We found that 314 (2.6%) of the 11,931 ortholog transcripts form 157 SA pairs in humans, fewer than in mice (388, i.e., 3.3%, of the 11,931 ortholog transcripts form 194 SA pairs). Therefore, regarding the first possibility, humans do not have more SA pairs formed between ortholog genes than mice.
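The two filtering steps above, keeping only one-to-one orthologs and then representing each gene by its longest transcript, can be sketched as follows. This is an illustrative sketch only: the input formats (a list of (human_gene, mouse_gene) pairs and a per-gene list of (transcript_name, length) tuples) and the function names are hypothetical, and the paper's actual ortholog source is described in its Methods.

```python
from collections import Counter


def one_to_one_orthologs(pairs):
    """Keep (human_gene, mouse_gene) pairs in which neither gene has another ortholog.

    Any gene involved in a one-to-multiple or multiple-to-multiple relation
    is dropped, together with all of its partners.
    """
    pairs = set(pairs)  # ignore duplicate listings of the same pair
    h_count = Counter(h for h, _ in pairs)
    m_count = Counter(m for _, m in pairs)
    return sorted((h, m) for h, m in pairs if h_count[h] == 1 and m_count[m] == 1)


def longest_representative(transcripts_by_gene):
    """Map each gene to the name of its longest transcript.

    transcripts_by_gene: {gene: [(transcript_name, length), ...]}
    Ties go to the first transcript listed (max() keeps the first maximum).
    """
    return {
        gene: max(ts, key=lambda t: t[1])[0]
        for gene, ts in transcripts_by_gene.items()
    }


if __name__ == "__main__":
    pairs = [("H1", "M1"), ("H2", "M2"), ("H2", "M3"), ("H4", "M4"), ("H5", "M4")]
    print(one_to_one_orthologs(pairs))  # [('H1', 'M1')]
    print(longest_representative({"H1": [("h1a", 900), ("h1b", 1500)]}))  # {'H1': 'h1b'}
```

Picking one transcript per gene in both species is what equalizes the two data sets, so the 2.6% vs. 3.3% comparison is not confounded by humans simply having more transcripts per gene.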
To test whether the one-to-one human–mouse ortholog genes form more SA pairs with nonortholog transcripts in humans than in mice, we randomly selected a set number of nonortholog transcripts (those that do not belong to any one-to-one, one-to-multiple, or multiple-to-multiple ortholog pair) and combined them with the fixed set of 11,931 one-to-one human–mouse ortholog transcripts in each species; we then clustered them and detected the SA pairs formed between ortholog and nonortholog transcripts. As shown in Figure 3A, with the same numbers of nonortholog transcripts, the percentages of the 11,931 ortholog transcripts that form SA pairs with nonortholog transcripts are almost the same in humans and mice. Thus, regarding the second possibility, humans do not have more SA pairs formed between ortholog and nonortholog transcripts than mice.
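The design of this test, a fixed ortholog set plus a growing random draw of nonortholog transcripts, can be sketched as below. It assumes, for illustration only, that the opposite-strand overlaps between each ortholog transcript and the nonortholog transcripts are already known (`sa_partners`); in the paper these are found by clustering the combined set. The function name and inputs are hypothetical.

```python
import random
import statistics


def ortholog_sa_fraction(orthologs, nonortholog_pool, sa_partners, n, reps=1000, seed=0):
    """Mean and SD of the fraction of a fixed ortholog set that forms an SA pair
    with n nonortholog transcripts drawn at random (without replacement).

    sa_partners: {ortholog: set of nonortholog transcripts it overlaps on the
    opposite strand}, assumed precomputed from genome coordinates.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        drawn = set(rng.sample(nonortholog_pool, n))
        hits = sum(1 for o in orthologs if sa_partners.get(o, set()) & drawn)
        values.append(hits / len(orthologs))
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return statistics.mean(values), sd


if __name__ == "__main__":
    orthologs = ["o1", "o2", "o3", "o4"]
    pool = ["n1", "n2", "n3", "n4", "n5", "n6"]
    partners = {"o1": {"n1"}, "o2": {"n2", "n5"}}
    for n in (0, 2, 4, 6):
        print(n, ortholog_sa_fraction(orthologs, pool, partners, n))
```

Running the same function on human and mouse data at matched values of n, and comparing the resulting curves, corresponds to the comparison plotted in Figure 3A.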
Figure 3. Relationship between the number of nonortholog transcripts and the percentage of human–mouse ortholog transcripts that form SA pairs with nonortholog transcripts, or between the number of nonortholog transcripts and the proportion of SA transcripts within the selected nonortholog transcripts in humans and mice, respectively.
(A) We randomly selected a set number of sequences from the whole qualified nonortholog transcript sequence data set for each genome, and then combined them with the 11,931 one-to-one human–mouse ortholog transcripts and determined the SA pairs formed between ortholog and nonortholog transcripts. Thus, the percentages of the 11,931 one-to-one human–mouse ortholog transcripts that form SA pairs with nonortholog transcripts were determined in each species.
(B) We randomly selected a set number of qualified nonortholog transcript sequences and determined the SA pairs formed between nonortholog transcripts. Thus, the percentages of SA transcripts within the selected nonortholog transcripts were determined in each species. For each point, we repeated the analysis independently 1000 times. The mean values with their standard deviation (mean ± SD) are shown.
To test whether nonortholog transcripts form more SA pairs among themselves in humans than in mice, we randomly selected a set number of nonortholog transcripts as described above, then clustered them and detected the SA pairs formed between the selected nonortholog transcripts. As shown in Figure 3B, with the same numbers of nonortholog transcripts, the proportions of SA transcripts within the selected nonortholog transcripts are even slightly higher in mice than in humans. A similar pattern was observed for the proportions of SA genes (rather than sequences) (data not shown). Thus, regarding the third possibility, humans do not have more SA pairs formed between nonortholog transcripts than mice either.
Similarly, we identified 3537 one-to-one human–rat ortholog transcript pairs in our data set (see Methods). We found that 14 ortholog transcripts form seven SA pairs in humans, while 10 ortholog transcripts form five SA pairs in rats, and there is no significant difference between them (14/3537 vs. 10/3537; χ² test, P = 0.5396). Moreover, we also randomly selected a set number of nonortholog transcripts to detect SA pairs formed among themselves or with the one-to-one ortholog transcripts. As shown in Supplemental Figure 2, a and b, with the same number of nonortholog transcripts, the SA proportions are not higher in humans than in rats. In addition, we identified 905 one-to-one ortholog transcripts between humans and chickens. We found no SA pairs formed between the ortholog transcripts, while the same small number of SA pairs (nine pairs) formed between ortholog and nonortholog transcripts in both genomes. As expected, random-subsampling analysis indicates that humans do not have a higher proportion of SA pairs formed within nonortholog transcripts than chickens either (data not shown). Note that these tests additionally control for differences in the data sources in terms of the relative completeness of coverage.
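The proportion comparisons in this section (here, 14/3537 vs. 10/3537) are 2 × 2 χ² tests. The text does not state whether a continuity correction was applied; our inference is that Yates's correction was used, because with it the human–rat comparison gives P ≈ 0.54, matching the reported value, whereas the uncorrected test gives a noticeably smaller P. The sketch below is a minimal, standard-library implementation under that assumption; the function name is hypothetical.

```python
import math


def yates_chi2_2x2(k1, n1, k2, n2):
    """Chi-square test (1 df) with Yates's continuity correction comparing the
    proportions k1/n1 and k2/n2. Returns (statistic, P value).

    Assumes every expected cell count is positive.
    """
    observed = [[k1, n1 - k1], [k2, n2 - k2]]
    rows = [n1, n2]
    cols = [k1 + k2, (n1 - k1) + (n2 - k2)]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            chi2 += diff * diff / expected
    # Upper tail of the chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Human vs. rat ortholog transcripts forming SA pairs (counts from the text).
    stat, p = yates_chi2_2x2(14, 3537, 10, 3537)
    print(f"chi2 = {stat:.3f}, P = {p:.4f}")
```

The same function applies to the later tissue comparisons, for example conserved SA genes in brain (148/4951) vs. liver (35/2465).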
Taken together, the higher overall proportion of SA transcripts in humans is owing to greater availability of transcript sequences (Table 1). After controlling for transcript abundance, although the proportion of SA transcripts (in a given size of transcript set) still varies between the organisms, it is not specifically higher in humans compared with other organisms in either the whole transcript data set (Fig. 2A,B), or in a given specific subset of transcript sequences such as mRNAs (Supplemental Fig. 1a,b), protein-coding sequences (Supplemental Fig. 1c,d), or conserved or nonconserved transcripts (Fig. 3A,B; Supplemental Fig. 2a,b).
Human brain appears to be no more enriched for antisense transcription than mouse brain
The human brain is the most complex organ in the human body and is much more complex than the brains of other mammals. It has been hypothesized that an increase in noncoding regulatory capacity reflects the higher demands of human brain learning, memory, speech, and cognitive capabilities (J.S. Mattick, pers. comm.).
This raises two questions:
(1) Is antisense transcription more prevalent in brain tissue than in other tissues?
(2) Is antisense transcription more prevalent in human brain than in the brains of other mammals?
To address these two questions, we compared the prevalence of antisense transcription between human and mouse in three tissue/cell types: brain, liver, and embryonic stem cells. The difference in complexity of liver and embryonic stem cells between human and mouse is thought to be very limited, so these two tissue/cell types can serve as controls for the brain. Only transcripts whose expression in a given tissue is supported by SAGE data were used to assemble transcript clusters (i.e., genes) and to analyze the SA proportion in that tissue (see Methods). We found that the overall proportion of SA genes/sequences is much higher in brain than in liver and embryonic stem cells in both species, and that in each tissue/cell type the overall proportion of SA genes/sequences is much higher in humans than in mice. After controlling for transcript abundance, the proportion of SA genes/sequences in a given number of transcripts is still higher in brain than in liver and embryonic stem cells, and it is very similar between liver and embryonic stem cells in both species (Fig. 4A,B). To understand what underlies the enrichment of antisense expression in brain, we further determined, in each of these three tissues, the number of SA genes that are conserved between human and mouse (and hence likely to be functional). We identified 4951, 2465, and 2948 one-to-one human–mouse ortholog genes that are expressed in both species' brain, liver, and embryonic stem cells, respectively; of these, 3.0% (148/4951), 1.4% (35/2465), and 2.1% (63/2948), respectively, form SA pairs in both species in the relevant tissue. The fraction of conserved SA transcripts in brain (3.0%) is significantly greater than that in liver (1.4%; χ² test, P < 0.0001) and that in embryonic stem cells (2.1%; χ² test, P < 0.05). Thus, the excess of antisense transcripts in brain seems not to be spurious, and the brain might tolerate or need a higher level of antisense expression, although further studies will be required to clarify whether this also reflects differences in transcription pattern between the brain and other tissues. Importantly, however, in each tissue/cell type, brain included, the proportion is not higher, and may even be lower, in humans than in mice (Fig. 4A,B). Thus, although antisense transcription appears to be more prevalent in mammalian brain tissue than in nonbrain tissues/cells, the human brain appears to be no more enriched for antisense transcription than the mouse brain (Fig. 4A,B). In fact, the similarity between human and mouse in SA proportion within each tissue or cell type (Fig. 4A,B) mirrors that in the whole data set (Fig. 2A,B).
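The "conserved SA gene" fraction used above can be sketched as a simple set test. The sketch assumes, as a hypothetical input format, that the ortholog pairs have already been restricted to genes with SAGE-supported expression in the tissue in both species, and that the SA genes called in that tissue are available as one set per species. A pair counts as conserved only if both members are SA genes in their own species.

```python
def conserved_sa_fraction(ortholog_pairs, human_sa, mouse_sa):
    """Count and fraction of expressed one-to-one ortholog pairs whose members
    are SA genes in BOTH species, within one tissue.

    ortholog_pairs: [(human_gene, mouse_gene), ...] expressed in the tissue in
    both species; human_sa / mouse_sa: sets of genes called SA in that tissue.
    """
    conserved = sum(1 for h, m in ortholog_pairs if h in human_sa and m in mouse_sa)
    return conserved, len(ortholog_pairs), conserved / len(ortholog_pairs)


if __name__ == "__main__":
    toy_pairs = [("h1", "m1"), ("h2", "m2"), ("h3", "m3"), ("h4", "m4")]
    print(conserved_sa_fraction(toy_pairs, {"h1", "h2"}, {"m1", "m3"}))  # (1, 4, 0.25)
    # Percentages implied by the counts quoted in the text.
    for tissue, k, n in [("brain", 148, 4951), ("liver", 35, 2465), ("ESC", 63, 2948)]:
        print(f"{tissue}: {100 * k / n:.1f}%")
```

Requiring SA status in both species is what makes the measure a proxy for function: a pair that is SA in only one species may reflect sampling noise in that species' transcript data.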
Figure 4. Relationship between the number of qualified transcripts and the estimated proportion of SA genes or sequences in human and mouse brain, liver, and embryonic stem cells.
We randomly selected a set number of sequences from the whole set of qualified transcript sequences expressed in each type of tissue/cell in each genome and then estimated the SA proportion. For each number of transcripts, we repeated the analysis independently 1000 times. The mean values with their standard deviation (mean ± SD) are shown.
(A) Relationship between the number of qualified transcripts and the estimated proportion of SA genes.
(B) Relationship between the number of qualified transcripts and the estimated proportion of SA sequences. (Hs_) Human (Homo sapiens); (Mm_) mouse (Mus musculus); (ESC) embryonic stem cells.
One might also argue that it is necessary to control for differences in cDNA cloning methods in each tissue (e.g., random EST cloning vs. full-length cDNA cloning; total RNA vs. poly(A)+ RNA samples). Unfortunately, it is presently not possible to obtain enough transcript sequences sequenced from the same tissue with the same method in each genome for such an analysis. However, as shown above, the dramatically different proportions of mRNA sequences among the genomes (Table 1), which largely reflect differences in the cDNA cloning methods preferentially used in different species (e.g., random EST cloning vs. full-length cDNA cloning), do not significantly bias our findings (Supplemental Fig. 1a,b). Differences in cDNA cloning methods are therefore unlikely to do so. In addition, under our procedure (see Methods), mRNA sequences with a CDS are selected as qualified transcript sequences regardless of whether they have poly(A) tails/signals, and, in fact, >95% of all selected protein-coding sequences are mRNAs with a CDS. Differences in RNA-sample selection [total RNA vs. poly(A)+ RNA] are therefore already controlled for in the analysis of protein-coding sequences among the genomes, and our results imply that such differences are unlikely to significantly bias our findings either (Supplemental Fig. 1c,d).
Discussion
Taken together, our results suggest that it is the number of qualified transcript sequences, not the heterogeneity of sequence sources, that has a major effect on the estimated proportions of SA genes and sequences. While it is likely that many more, and more diverse, transcripts are required to detect the vast majority of antisense transcripts in each species, we find no evidence that humans are in any way unusual among the vertebrates regarding antisense transcription. Why the fly is especially rich in antisense transcription while the nematode is especially poor is far from clear. Genomic compaction seems not to be the primary reason for the enrichment of antisense transcripts in the fly, because the nematode genome is even more compact than the fly genome (Table 1; see also Levine and Tjian 2003). One might speculate that the low proportion in the worm is partly related to the abundance of operons in its genome (Blumenthal et al. 2002): genes in operons are typically on the same strand, and the individual transcripts are made by splicing the polycistronic RNA, so neighboring genes in operons are less likely to be SA regulated in cis. Nonetheless, although only 0.34% (8/2326) of the genes incorporated into operons are putative SA genes, lower than the proportion (0.53%; 76/14,406) in the whole gene set, the difference is not significant (P > 0.05). In addition, because only 15%–16% of worm genes are incorporated into operons (Blumenthal 1998; Blumenthal et al. 2002; this study), removing these genes only slightly increases the overall SA gene proportion, from 0.53% to 0.56%. Other explanations therefore need to be explored. One might speculate, for example, that the dearth in the worm is associated with the highly programmed set of cell divisions seen in this taxon.
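The operon percentages quoted above follow directly from the counts in the text, and a few lines of Python make the arithmetic explicit. The only assumption (ours) is that the 8 operon SA genes are among the 76 SA genes of the whole set, so removing operon genes subtracts them from both the numerator and the denominator.

```python
def percent(k, n):
    """Percentage k/n, as reported in the text."""
    return 100 * k / n


# Nematode counts quoted in the text.
operon_sa, operon_genes = 8, 2326
all_sa, all_genes = 76, 14406

in_operons = percent(operon_sa, operon_genes)
overall = percent(all_sa, all_genes)
operon_share = percent(operon_genes, all_genes)
# Assumes the 8 operon SA genes are a subset of the 76 SA genes overall.
after_removal = percent(all_sa - operon_sa, all_genes - operon_genes)

print(f"SA genes among operon genes:          {in_operons:.2f}%")
print(f"SA genes in whole gene set:           {overall:.2f}%")
print(f"Operon genes as share of all genes:   {operon_share:.1f}%")
print(f"SA genes after removing operon genes: {after_removal:.2f}%")
```

Because operon genes make up only about a sixth of the gene set, even excluding them entirely can move the overall SA fraction very little, which is why operons alone cannot account for the worm's low value.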
Might antisense regulation be more prevalent in humans?
As our study identifies putative SA pairs, rather than experimentally confirmed ones, we cannot exclude the possibility that antisense regulation is more prevalent in humans than in other vertebrate genomes. Two parameters are important here. First, it might be that in humans, for a given number of predicted putative SA pairs, a higher proportion actually function in antisense regulation. If so, even if humans have a proportion of putative SA transcripts similar to that of other vertebrates, antisense regulation would be more prevalent in humans. Second, humans may have a larger and more diverse transcriptome than other vertebrates. Were this the case, we would predict a higher absolute number of putative SA pairs in the genome with the larger transcriptome, even if the proportion of predicted SA pairs that are functional does not differ between taxa. Owing to the combinatorial nature of transcriptional regulation, differences in the absolute number of antisense transcripts could produce a dramatic expansion in regulatory complexity (Levine and Tjian 2003). Is either of these possibilities reasonable?
Might human putative SA transcripts function more frequently in antisense regulation?
We first ask whether human SA genes are more frequently subjected to A-to-I editing. Cis-encoded antisense RNAs may function in gene regulation by forming long, perfect double-stranded RNAs (dsRNAs) with their target sense transcripts in the nucleus (Kumar and Carmichael 1998; Vanhee-Brossollet and Vaquero 1998; Carmichael 2003; Chen et al. 2004, 2005a; Lavorgna et al. 2004; Wang and Carmichael 2004). These, in turn, might be A-to-I edited by adenosine deaminases that act on dsRNA (ADARs) (Bass 2002; Tonkin and Bass 2003). Recent genome-wide computational surveys revealed that 1%–10% of human genes might be subject to A-to-I editing (Athanasiadis et al. 2004; Blow et al. 2004; Kim et al. 2004a; Levanon et al. 2004; Eisenberg et al. 2005), which is at least an order of magnitude more frequent than in the mouse, rat, chicken, or fly genomes (Kim et al. 2004a; Eisenberg et al. 2005). If the majority of the genes subjected to A-to-I editing were SA genes, these findings would suggest that functional antisense transcription is much more prevalent in humans (J.S. Mattick, pers. comm.). However, in an analysis of human transcript sequences that have been identified as undergoing RNA editing (Kim et al. 2004a), we see no significant enrichment of putative SA transcripts. Of the 2674 full-length human cDNAs in which Kim et al. (2004a) observed A-to-I editing, 2104 were present in our human transcript data set; of these, only 30.6% (645) are SA transcripts, close to the overall proportion (27.0%) of SA transcripts in the whole human transcript data set. Moreover, the majority of the editing sites are unlikely to lie in the putative SA (exonic) overlapping regions, as the sites are predominantly in intronic and intergenic RNAs (Athanasiadis et al. 2004; Blow et al. 2004). In addition, while most human A-to-I editing sites reside in primate-specific Alu elements (Athanasiadis et al. 2004; Blow et al. 2004; Kim et al. 2004a; Levanon et al. 2004; Eisenberg et al. 2005), Athanasiadis et al. (2004) found that Alu-mediated RNA duplexes targeted by RNA editing are formed mainly intramolecularly. Indeed, a recent study (Neeman et al. 2005) demonstrated that the level of RNA editing in sense–antisense overlapping areas (apart from the Alu regions within them) is negligible in both human and mouse genomes. If so, it is not clear how and why the potential SA dsRNAs escape A-to-I editing. One possibility is that sense–antisense pairing might actually occur within the cell, but the resulting RNA duplexes, edited or unedited, are either retained in the nucleus (Carmichael 2003; Wang and Carmichael 2004) or degraded, and are thus not represented in expressed sequence data (Neeman et al. 2005).
Second, antisense transcripts have unusually short introns, which may reflect selection on antisense transcripts for rapid gene regulation (Chen et al. 2005b, c): transcription is a slow process (Castillo-Davis et al. 2002), so long introns would interfere with the need for rapid expression and regulation (Gerhart et al. 1998; Altuvia and Wagner 2000). Here we ask whether human putative antisense transcripts are unusual in this regard. If they were, it would suggest that human antisense transcripts function more frequently in antisense regulation. Because the numbers of putative SA genes are too small in rats, chickens, and nematodes, we performed this analysis only in humans, mice, and flies. As shown in Figure 5, in all three genomes antisense genes have significantly shorter introns than any other category of genes (S, AL, SL, and NBD; see Methods for their classification). Notably, the difference in average intron length between antisense and other genes is even more evident in mice and flies than in humans (Fig. 5). Moreover, as expected, in both the human and mouse genomes, the antisense genes in the 347 human–mouse (HM)-conserved SA pairs, in which at least one member has an ortholog in both species (see Methods), have significantly shorter introns than nonconserved antisense genes and other genes (Fig. 5). Thus, unusually short introns in antisense transcripts may be a common feature of all three genomes.
Figure 5. Comparison of the average intron lengths between antisense and other categories of genes in the human, mouse, and fly genomes.
The mean values of the log-transformed intron lengths are shown with their 95% confidence intervals. We used an independent-samples t-test on the log-transformed intron lengths to assess the significance (P-value) of differences in intron length between antisense and other genes. In all three genomes, antisense (A) genes have significantly shorter introns (P < 10^-4 in the mouse and fly genomes; P < 0.01 in the human genome) than any other category of genes (S, AL, SL, and NBD; see Methods for their classification). Notably, the difference in average intron length between antisense and other genes is even more significant in mice and flies than in humans. Moreover, as expected, in both the human and mouse genomes, the average intron length of antisense genes in the 347 HM-conserved SA pairs (HMC_A) is significantly shorter than that of the remaining antisense genes (Non_HMC_A) and than those of the other categories of genes. Similar patterns were observed in the analysis of the total length of intron sequences (data not shown).
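The comparison in the Figure 5 caption can be sketched as below. This is a minimal illustration, not the authors' code: it assumes a base-10 logarithm (the base is not stated), the pooled-variance (Student) form of the independent-samples t-test, and a normal approximation for both the 95% confidence interval and the two-sided P-value, which is reasonable for categories with hundreds to thousands of genes. The function names are hypothetical; with SciPy available, `scipy.stats.ttest_ind` would give the exact t-distribution P-value.

```python
import math
import statistics


def log_summary(lengths):
    """Mean of log10 intron lengths with a normal-approximation 95% CI."""
    logs = [math.log10(x) for x in lengths]
    mean = statistics.fmean(logs)
    se = statistics.stdev(logs) / math.sqrt(len(logs))
    return mean, (mean - 1.96 * se, mean + 1.96 * se)


def pooled_t_test(a, b):
    """Student's independent-samples t-test on log10 lengths (pooled variance).

    Returns (t, degrees of freedom, two-sided P); P uses a normal
    approximation to the t distribution, adequate only for large samples.
    """
    la = [math.log10(x) for x in a]
    lb = [math.log10(x) for x in b]
    na, nb = len(la), len(lb)
    pooled_var = ((na - 1) * statistics.variance(la)
                  + (nb - 1) * statistics.variance(lb)) / (na + nb - 2)
    t = (statistics.fmean(la) - statistics.fmean(lb)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, na + nb - 2, p
```

A negative t here means the first group (for example, antisense genes) has shorter introns on the log scale than the second.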
Taken together, our results show no evidence that SA transcripts in humans are functionally unusual compared with those in other genomes. Thus, potentially functional SA transcripts might be prevalent in almost all the metazoan genomes.
Do humans have a richer transcriptome than other mammals?
The subsampling simulations suggest that the diversity of the mouse and human transcriptomes is comparable. We therefore need to ask whether humans have more transcripts, as, if they do, they would likely also have a greater number of SA pairs. That more human transcript sequences than those of other mammals have been deposited in public databases is not especially informative, as this must in part reflect the fact that the human transcriptome is the best studied. Indeed, recent studies suggest that the human, mouse, and rat genomes encode similar numbers of genes (Lander et al. 2001; Venter et al. 2001; Waterston et al. 2002; Gibbs et al. 2004; International Human Genome Sequencing Consortium 2004) and have similar numbers of alternative transcripts (Brett et al. 2002; Harrington et al. 2004); thus, a priori we should expect transcriptomes of similar size. However, any such conclusion must be somewhat provisional, as it remains unknown just how large each transcriptome is, and recent studies suggest that many transcripts have been missed in each genome. Even in the well-studied human genome, tiling-array studies indicate that there are approximately twice as many nucleotides in poly(A)+ transcripts as are currently annotated in transcript databases (Kapranov et al. 2002; Bertone et al. 2004; Cawley et al. 2004; Kampa et al. 2004; Schadt et al. 2004; Cheng et al. 2005; Johnson et al. 2005; Kapranov et al. 2005). Similarly, a large effort to isolate mouse full-length cDNAs (Okazaki et al. 2002) found that >65% of the cloned full-length cDNAs (39,694 of 60,770) were novel.
While there is no evidence that humans have a greater number of different transcripts, might they yet have more antisense transcripts?
Because most antisense transcripts are noncoding RNAs (ncRNAs) (Cawley et al. 2004; Chen et al. 2004), are ncRNAs much more abundant in humans than in mice? The best current evidence suggests not. Okazaki et al. (2002) found that >47% (15,815) of a nonredundant set of 33,409 full-length mouse cDNAs are apparently noncoding. This is comparable with the 50% (5481) of 10,897 clusters of human full-length cDNA sequences that appear to be noncoding (Ota et al. 2004). Might human ncRNAs nonetheless be more enriched in putative SA transcripts? As shown in Table 1, 23.7% of human ncRNA genes appear to be putative SA transcripts, close to the proportion (22.0%) for mouse ncRNA genes.
Taken together, we see no reason to suppose that the human transcriptome is larger than that of mice, with regard to either protein-coding or noncoding transcripts. Therefore, there is no evidence that humans have a much greater number of SA pairs than any other mammal. It is reasonable to suggest that, with a similarly extensive large-scale cDNA sequencing effort, a similarly high abundance of SA transcripts would be detected in nonhuman genomes.
Indeed, based on 104,876 FANTOM2 mouse cDNA and public mRNA sequences (Okazaki
et al. 2002), Kiyosawa et al. (2003) observed
that 15.1% of the assembled mouse transcriptional units (TUs) might form
SA pairs; two years later, based on 158,807 FANTOM3 mouse cDNA and public
mRNA sequences (Carninci et al. 2005), Katayama
et al. (2005) very recently reported that 28.7% of the assembled mouse
TUs (with overlapping cDNA evidence) might form SA pairs.
Conclusions
In summary, although antisense regulation could contribute to organismic complexity, and we did observe that the abundance of antisense transcripts varies between multicellular animals, there is no prima facie evidence that sense–antisense regulation in cis is any more common in humans than in other vertebrates, or that it is consistently more common in vertebrates than in invertebrates. Moreover, while brain tissue appears to be enriched for SA transcripts in both mouse and human, the human brain shows no greater enrichment than the mouse brain. We therefore propose that the difference in complexity between mouse and human is unlikely to be due to different rates of sense–antisense regulation, assuming, as seems parsimonious, that the human transcriptome is no larger than that of the mouse. The apparent abundance of antisense transcription in flies and its dearth in worms remain unexplained.
Methods
Identification of transcript clusters in the six genomes
We used the same protocol described in our previous study (Chen et al. 2004) to identify transcript clusters (i.e., genes) in the human (Homo sapiens; an updated version) (Sun et al. 2005), mouse (Mus musculus) (Sun et al. 2005), rat (Rattus norvegicus), chicken (Gallus gallus), fruit fly (Drosophila melanogaster), and nematode (Caenorhabditis elegans) genomes, based on recent versions of the relevant databases. In brief, transcript clusters were created from alignments of mRNA and EST sequences downloaded from the UniGene database (Schuler et al. 1996) (human Build #175; mouse Build #141; rat Build #139; chicken Build #26; fly Build #35; nematode Build #20) to the relevant genome (human Build 35.1; mouse Build 33.1; rat Build 3.1; chicken Build 1.1; fly Build 4.0; nematode Wormbase Release WS138). CAP3 (Huang and Madan 1999) and BLAT (Kent 2002) were used for transcript assembly and genome mapping, respectively (Chen et al. 2004; Sun et al. 2005). The transcript sequences and alignments were filtered stringently to ensure the correct orientation: (1) Only transcripts whose orientation could be determined were selected for the study. mRNA sequences had to have at least one of the following: an annotated protein-coding region (CDS), a poly(A) tail (i.e., a stretch of at least 10 As at the 3' end of the sequence), or a poly(A) signal (i.e., one of the six polyadenylation signals AATAAA, ATTAAA, AATTAA, AATAAT, CATAAA, and AGTAAA [see Caron et al. 2001] within the last 50 bp of the 3' end of the sequence). ESTs and all other sequences had to have a poly(A) tail and/or a poly(A) signal if they had a CDS, or both a poly(A) tail and a poly(A) signal if they lacked a CDS. (2) All transcript sequences with suspicious splice sites (e.g., CT-AC, CT-GC, and GT-AT, which are the reverse complements of the typical splice donor and acceptor sites GT-AG, GC-AG, and AT-AC, respectively) were discarded. The conditions for genome alignment were: Identity ≥96%, Coverage ≥70%, and Alignment ≥97%. Transcript sequences representing highly abundant and tandemly duplicated genes, such as immunoglobulins and T-cell receptors, were excluded. All transcript sequences aligned to the same genomic locus were assembled into one transcript cluster. After assembly, all clusters that contained only a single sequence that did not span an intron were excluded.
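The orientation and splice-site filters described above can be sketched as a few predicates. This is an illustrative reading of the rules, not the authors' pipeline: it assumes the poly(A) tail must sit at the very 3' terminus of the submitted sequence, that the splice dinucleotides of each intron are already available from the genome alignment as (donor, acceptor) pairs, and all function names are hypothetical. Alignment thresholds and assembly are not modeled.

```python
import re

# The six polyadenylation signals listed in the Methods (Caron et al. 2001).
POLYA_SIGNALS = ("AATAAA", "ATTAAA", "AATTAA", "AATAAT", "CATAAA", "AGTAAA")
# Reverse complements of the canonical GT-AG, GC-AG and AT-AC splice pairs.
SUSPICIOUS_SPLICE = {("CT", "AC"), ("CT", "GC"), ("GT", "AT")}


def has_polya_tail(seq, min_run=10):
    """True if the sequence ends in a run of at least `min_run` As."""
    return re.search("A{%d,}$" % min_run, seq.upper()) is not None


def has_polya_signal(seq, window=50):
    """True if any listed signal occurs within the last `window` bases."""
    tail = seq.upper()[-window:]
    return any(sig in tail for sig in POLYA_SIGNALS)


def orientation_determinable(seq, is_mrna, has_cds):
    """Apply the orientation rules for mRNAs versus ESTs/other sequences."""
    tail, signal = has_polya_tail(seq), has_polya_signal(seq)
    if is_mrna:
        return has_cds or tail or signal
    if has_cds:
        return tail or signal
    return tail and signal


def has_suspicious_splice(introns):
    """introns: list of (donor, acceptor) dinucleotides taken from the alignment."""
    return any((d.upper(), a.upper()) in SUSPICIOUS_SPLICE for d, a in introns)
```

For example, an EST without a CDS that ends in twelve As but carries no listed hexamer in its last 50 bp fails `orientation_determinable`, whereas an mRNA with an annotated CDS passes regardless of its 3' end.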
Classification of bidirectional transcript cluster pairs
As in our previous studies (Chen et al. 2004, 2005a, b, c; Sun et al. 2005), the transcript clusters were classified according to their transcription pattern in the genome. Clusters containing at least one pair of transcript sequences transcribed from opposite strands of the same genomic locus were called "bidirectional (BD) clusters," while the remaining clusters, containing only unidirectional transcripts, were called "nonbidirectional (NBD) clusters." We further separated each BD cluster into two new clusters (a cluster pair) based on their overlap pattern: sense (S) and antisense (A) clusters form putative sense–antisense (SA) pairs with exon overlaps (identity ≥94%), whereas sense-like (SL) and antisense-like (AL) clusters form non-exon-overlapping bidirectional (NOB) pairs without exon overlaps.
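The SA versus NOB distinction can be illustrated with interval overlaps. This sketch rests on stated assumptions: "same genomic locus" is approximated as overlapping genomic spans, exons are 0-based half-open intervals, and the ≥94% identity requirement for the overlapping exons is not modeled. The `Cluster` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    chrom: str
    strand: str  # "+" or "-"
    exons: list  # list of (start, end) tuples, 0-based half-open

    @property
    def span(self):
        return min(s for s, _ in self.exons), max(e for _, e in self.exons)


def _overlap(a, b):
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_pair(x, y):
    """Return "SA", "NOB", or None when the two clusters are not bidirectional."""
    if x.chrom != y.chrom or x.strand == y.strand or not _overlap(x.span, y.span):
        return None
    if any(_overlap(ex, ey) for ex in x.exons for ey in y.exons):
        return "SA"  # opposite strands with exon overlap
    return "NOB"  # opposite strands, same locus, no exon overlap
```

A minus-strand cluster lying entirely within an intron of a plus-strand cluster would be classified as NOB under these assumptions.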
In our previous studies (Chen et al. 2004, 2005a, b, c), we defined the S and A (or SL and AL) genes in each BD gene pair mainly based on the conventional concept (e.g., Lipman 1997) that the S (or SL) gene should be present in more tissues and/or expressed at a higher level, and thus would have been detected more frequently (i.e., would have more transcript sequences deposited in the expressed sequence databases) than its A (or AL) partner. However, another, even more common, notion holds that almost all sense genes are protein-coding genes, whereas antisense genes might be coding or noncoding (Kumar and Carmichael 1998; Vanhee-Brossollet and Vaquero 1998; Kiyosawa et al. 2003). That >90% of the S (SL) genes defined in our previous study (Chen et al. 2004) are protein-coding genes (i.e., with annotated CDS regions) is in accord with this notion. However, in a few pairs, the defined S (or SL) gene lacks a CDS, while the corresponding A (or AL) partner has one. We therefore recently revised the previous rules as follows (Sun et al. 2005):
(1) For the SA (or NOB) pairs in which one member has CDS while
the other lacks CDS, define the one with CDS as the S (or SL) and the other
as the A (or AL).
(2) For the remaining SA (NOB) pairs, the previous rules (Chen et al. 2004) are applied:
(i) define the one containing more transcript sequences as the S
or SL cluster, the other as the A or AL cluster;
(ii) if the numbers of sequences are equal, define the one with more mRNA sequences as the S or SL cluster, the other as the A or AL cluster;
(iii) if the numbers of mRNA sequences are also equal, define the one with intron-spanning sequence(s) as the S or SL cluster, and the other, without such intron-spanning sequence(s), as the A or AL cluster.
If none of the above conditions distinguishes the two clusters, define the one mapped to the sense strand of the chromosome as the S or SL cluster and the other as the A or AL cluster.
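The revised assignment rules amount to an ordered series of tie-breakers, sketched below. The record type and its field names are hypothetical; `on_sense_strand` stands for "mapped to the sense strand of the chromosome" as worded in the rules. The same function applies to NOB pairs, where the first returned member is SL and the second AL.

```python
from dataclasses import dataclass


@dataclass
class PairMember:
    name: str
    has_cds: bool
    n_sequences: int
    n_mrnas: int
    has_intron_spanning: bool
    on_sense_strand: bool  # mapped to the sense strand of the chromosome


def assign_sense(x, y):
    """Return (S or SL member, A or AL member) following the revised rules."""
    # Rule (1): if exactly one member has a CDS, it is the sense member.
    if x.has_cds != y.has_cds:
        return (x, y) if x.has_cds else (y, x)
    # Rule (2)(i)-(iii): more sequences, then more mRNAs, then intron-spanning.
    for key in ("n_sequences", "n_mrnas", "has_intron_spanning"):
        vx, vy = getattr(x, key), getattr(y, key)
        if vx != vy:
            return (x, y) if vx > vy else (y, x)
    # Fallback: the member on the chromosome's sense strand is the sense member.
    return (x, y) if x.on_sense_strand else (y, x)
```

Comparing booleans with `>` treats `True` as larger, so rule (iii) favors the member that has intron-spanning sequences.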
After this separation, five categories of unique gene clusters were obtained: S, A, SL, AL, and NBD. In total, 27,333 human, 19,100 mouse, 11,332 rat, 7390 chicken, 10,542 fly, and 14,406 nematode unique genes were identified, each representing a single protein- or RNA-coding gene; of these, 22.7% (6194) of human, 11.6% (2212) of mouse, 4.8% (548) of rat, 4.8% (356) of chicken, 17.2% (1814) of fly, and 0.5% (76) of nematode unique genes form 3097, 1106, 274, 178, 907, and 38 putative SA pairs, respectively. The full list of the putative SA gene pairs in each genome, with information on genomic loci and their representative transcripts, is available in Supplemental Tables 2–7.
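The percentages above follow from the pair counts: each reported number of SA genes is exactly twice the number of SA pairs, so each SA gene belongs to a single pair. A small worked check of that arithmetic, using only the counts stated in the text (the function name is illustrative):

```python
# (unique genes, putative SA pairs) as reported in the text.
COUNTS = {
    "human": (27333, 3097),
    "mouse": (19100, 1106),
    "rat": (11332, 274),
    "chicken": (7390, 178),
    "fly": (10542, 907),
    "nematode": (14406, 38),
}


def sa_gene_percentage(n_genes, n_pairs):
    """Percentage of unique genes in SA pairs; each pair contributes two genes."""
    return 100.0 * 2 * n_pairs / n_genes


for species, (genes, pairs) in COUNTS.items():
    print(f"{species}: {2 * pairs} SA genes, {sa_gene_percentage(genes, pairs):.1f}%")
```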
Analysis of evolutionary conservation of putative SA pairs in the human, mouse, rat, and chicken genomes
As described previously (Sun et al. 2005), we examined ortholog pairs between mouse and human that were reciprocal best "hits" (matches) between the two genomes. We combined the ortholog pairs from the Mouse Genome Informatics Web site (ftp://ftp.informatics.jax.org/pub/reports/HMD_HumanSequence.rpt; December 2004) and Ensembl MartView (http://www.ensembl.org/Multi/martview; December 2004). By comparing sequence IDs in our mouse and human gene sets with those in the combined ortholog data set, we obtained 11,931 one-to-one human–mouse ortholog pairs in our data sets. Among them, 2681 human genes and 1210 mouse genes belong to putative SA pairs. Of these, 347 putative SA pairs, in which at least one member has an ortholog in both the human and mouse genomes, are conserved in putative SA form in both genomes; we call these HM-conserved putative SA pairs (Sun et al. 2005). Because (1) the number of putative SA pairs in the mouse genome (and even in the human genome) is substantially underestimated owing to the limited number of qualified transcript sequences, and (2) many antisense transcripts are ncRNAs that are not included in the human–mouse ortholog databases, the number of HM-conserved putative SA pairs might be seriously underestimated. Note that in the 347 HM-conserved putative SA pairs, both members need not be ortholog transcripts. As described in the text, we also selected 11,931 one-to-one human–mouse ortholog transcript (unique) sequence pairs and found that 142 of them form 71 SA pairs in both species.
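The stricter ortholog-transcript analysis mentioned at the end of this paragraph (142 transcripts forming 71 SA pairs in both species) can be sketched as a set lookup. This does not reproduce the broader HM-conserved definition of Sun et al. (2005), which allows one member to lack an ortholog. Assumptions: SA pairs are stored as two-element `frozenset`s of transcript IDs, and `h2m` is a hypothetical one-to-one human-to-mouse ortholog mapping.

```python
def conserved_sa_pairs(human_sa, mouse_sa, h2m):
    """Human SA pairs whose two members both have one-to-one mouse orthologs
    that also form an SA pair in mouse.

    human_sa, mouse_sa: sets of frozenset({id1, id2}) with two distinct IDs.
    h2m: dict mapping human transcript IDs to mouse transcript IDs.
    """
    conserved = set()
    for pair in human_sa:
        a, b = tuple(pair)
        if a in h2m and b in h2m and frozenset((h2m[a], h2m[b])) in mouse_sa:
            conserved.add(pair)
    return conserved
```

The number of ortholog transcripts involved is then twice the number of conserved pairs.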
Similarly, we identified all human–rat ortholog gene pairs in our data set based on the human–rat ortholog pairs from Ensembl MartView (http://www.ensembl.org/Multi/martview; March 2006). In some cases, one human gene has several different ortholog genes in rats, and vice versa. To simplify the analysis, we excluded all one-to-many and many-to-many ortholog pairs (including all transcripts belonging to these ortholog genes). After this filtering, we identified 3537 one-to-one human–rat ortholog gene pairs in our data set. To avoid potential bias in the analysis of SA proportions due to differences in transcript number between the paired human and rat ortholog gene clusters, we further selected the longest ortholog transcript sequence from each ortholog gene cluster to represent that ortholog gene. We thus obtained 3537 one-to-one human–rat ortholog transcript sequence pairs, so that humans and rats have the same number of unique ortholog transcripts. We found that 14 ortholog transcripts form seven SA pairs in humans, while 10 ortholog transcripts form five SA pairs in rats.
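The two filtering steps above (keep only one-to-one ortholog gene pairs, then represent each gene by its longest transcript) can be sketched as follows. The input shapes and function names are assumptions for illustration; ties in transcript length go to whichever ID `max` sees first.

```python
from collections import Counter


def one_to_one(pairs):
    """Keep (human_gene, rat_gene) pairs in which each gene occurs exactly once."""
    human_counts = Counter(h for h, _ in pairs)
    rat_counts = Counter(r for _, r in pairs)
    return [(h, r) for h, r in pairs if human_counts[h] == 1 and rat_counts[r] == 1]


def longest_transcript(transcripts):
    """transcripts: dict transcript_id -> sequence; return the longest one's ID."""
    return max(transcripts, key=lambda tid: len(transcripts[tid]))
```

Pairs involving a human gene with two rat orthologs (or the reverse) are dropped entirely, matching the exclusion of one-to-many and many-to-many cases.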
In addition, we identified 905 one-to-one ortholog transcripts between humans and chickens from Ensembl MartView (http://www.ensembl.org/Multi/martview; March 2005). No SA pairs were formed among the ortholog transcripts themselves, and the same small number of SA pairs (nine) was formed between ortholog and nonortholog transcripts in both genomes.
Investigation of coexpression and inverse expression patterns of putative SA pairs in the human and mouse genomes
As described in our previous study (Chen et al. 2005a), we evaluated the coexpression and inverse expression of SA pairs at the whole-genome level based on expression profiles obtained from SAGE (serial analysis of gene expression) data (Velculescu et al. 1995). In a recent study (Sun et al. 2005), we made some modifications to our established procedures (Chen et al. 2005a). We downloaded SAGE expression data (NlaIII SAGE libraries) from the NCBI GEO platform (http://www.ncbi.nlm.nih.gov/projects/geo; December 2004), including 245 human and 76 mouse SAGE libraries. For both human and mouse, we constructed 16 tissue/cell-type SAGE library combinations (including brain, liver, and embryonic stem cells, which are available for both species) to determine coexpression of gene pairs, and 50 comparison cases, each consisting of two states (two different unique SAGE libraries) representing different developmental, differentiation, physiological, or pathological stages/conditions of the same tissue, to determine inverse expression of gene pairs (Sun et al. 2005). Tag counts were converted to counts per million (cpm), and the expression data were linked to our genes by extracting the 3'-most NlaIII SAGE tag of each transcript in the genes (i.e., transcript clusters). Only tags that matched a single gene were considered. All SAGE tags mapped to the same gene were then combined, and the sum of their cpm in a tissue/cell type represented the expression level of that gene in that tissue/cell type. To eliminate potential sequencing errors in low-count SAGE tags (Chen et al. 2005a), we kept only genes with a total expression level of at least 3 cpm across all 16 tissues (i.e., the sum of expression levels in the 16 tissues).
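The tag-to-gene linkage and cpm normalization described above can be sketched as below. Assumptions, none of which are specified in the text: transcripts are in sense orientation; the tag is the 10 bp immediately 3' of the last CATG (the short-SAGE convention; LongSAGE tags are longer); raw counts are pooled per tissue combination before conversion to cpm. All function names are hypothetical.

```python
from collections import defaultdict

NLAIII_SITE = "CATG"


def three_prime_tag(seq, tag_len=10):
    """Tag following the 3'-most CATG (NlaIII) site, or None if absent/too short."""
    s = seq.upper()
    i = s.rfind(NLAIII_SITE)
    if i < 0:
        return None
    start = i + len(NLAIII_SITE)
    tag = s[start:start + tag_len]
    return tag if len(tag) == tag_len else None


def unique_tag_map(transcript_gene, transcript_seq):
    """Map each gene to the set of its extracted tags that hit no other gene."""
    tag_genes = defaultdict(set)
    for tid, gene in transcript_gene.items():
        tag = three_prime_tag(transcript_seq[tid])
        if tag is not None:
            tag_genes[tag].add(gene)
    gene_tags = defaultdict(set)
    for tag, genes in tag_genes.items():
        if len(genes) == 1:  # discard tags shared by several genes
            gene_tags[next(iter(genes))].add(tag)
    return dict(gene_tags)


def to_cpm(tag_counts):
    """Convert raw tag counts of one (pooled) library to counts per million."""
    total = sum(tag_counts.values())
    return {tag: 1e6 * n / total for tag, n in tag_counts.items()}


def gene_expression(gene_tags, library_cpm):
    """Sum the cpm of all tags assigned to each gene in one tissue."""
    return {g: sum(library_cpm.get(t, 0.0) for t in tags)
            for g, tags in gene_tags.items()}


def passes_low_count_filter(gene, tissue_expressions, min_total_cpm=3.0):
    """Keep a gene only if its expression summed over all tissues is >= 3 cpm."""
    return sum(expr.get(gene, 0.0) for expr in tissue_expressions) >= min_total_cpm
```

Dropping tags shared by several genes before summing keeps a single ambiguous tag from inflating two genes at once, mirroring the rule that only tags matching a single gene were counted.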
To evaluate the coexpression of an SA pair, we adopted the index of coexpression between two genes a and b, ICE(a,b), defined by Lercher et al. (2002) as the number of tissues with common positive expression, weighted by (i.e., divided by) the geometric mean of the two expression breadths (the number of tissues in which each gene is expressed). Note that, unlike the conventional Pearson correlation coefficient (r), coexpression in this context refers not to the extent to which transcript levels are correlated, but rather to the coupled presence or absence of the transcripts across different tissues or cells (Chen et al. 2005a). ICE(a,b) ranges from 0 (no coexpression) to 1 (perfect coexpression). We found that ICE(a,b) exceeds the 99% confidence interval (i.e., P < 0.01) of the average ICE(a,b) of all possible gene pairs when ICE(a,b) ≥ 0.6 in humans or ≥ 0.5 in mice. We therefore define two genes (a and b; e.g., the sense and antisense members of a putative SA pair) as coexpressed if ICE(a,b) ≥ 0.6 in humans or ≥ 0.5 in mice (Sun et al. 2005).
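The coexpression index can be written in a few lines. This sketch assumes each gene's profile is a list of expression levels over the same ordered tissues, and treats "positive expression" as a level above zero; the function names are hypothetical.

```python
import math


def ice(expr_a, expr_b):
    """Index of coexpression: shared expressed tissues / sqrt(breadth_a * breadth_b)."""
    present_a = [x > 0 for x in expr_a]
    present_b = [x > 0 for x in expr_b]
    breadth_a, breadth_b = sum(present_a), sum(present_b)
    if breadth_a == 0 or breadth_b == 0:
        return 0.0
    common = sum(pa and pb for pa, pb in zip(present_a, present_b))
    return common / math.sqrt(breadth_a * breadth_b)


def coexpressed(expr_a, expr_b, threshold):
    """Apply a species-specific cutoff (0.6 for human, 0.5 for mouse in the text)."""
    return ice(expr_a, expr_b) >= threshold
```

Because only presence or absence enters the index, two genes expressed in the same tissues at very different levels still reach the maximum value of 1.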
To measure the inverse-expression pattern more quantitatively than previously (Chen et al. 2005a), we defined (Sun et al. 2005) a new index of inverse expression between two genes a and b, IIE(a,b): the number of comparison cases in which the two partners exhibit an inverse expression pattern between the two states (i.e., one member is expressed at a higher level than its partner in state 1 but at a lower level in state 2, or vice versa) together with a significantly greater change in the relative expression ratio of gene a to gene b between the two states than expected by chance (i.e., exceeding the 99% confidence interval of the mean change across all randomly formed gene pairs), weighted by (i.e., divided by) the geometric mean of the two presence breadths. A gene with positive expression in at least one of the two states of a comparison case is considered present in that case, and the presence breadth of each gene is the number of cases in which it is present. IIE(a,b) ranges from 0 (no inverse expression) to 1 (perfect inverse expression). Similarly, we define two genes (a and b; e.g., the sense and antisense members of a putative SA pair) as inversely expressed if IIE(a,b) exceeds the 99% confidence interval (i.e., P < 0.01) of the average IIE(a,b) of all randomly formed gene pairs (Sun et al. 2005).
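A sketch of the inverse-expression index under explicit assumptions: each gene's data are a list of (state 1, state 2) expression levels per comparison case; the "change of the relative expression ratio" is taken as the absolute change in log2((a + c)/(b + c)) between states with a pseudocount c; and `ratio_change_cutoff` is a hypothetical parameter standing in for the empirically derived 99% threshold from random gene pairs. None of these specifics come from the text.

```python
import math


def iie(cases_a, cases_b, ratio_change_cutoff, pseudocount=1.0):
    """Index of inverse expression over paired comparison cases."""
    present_a = [s1 > 0 or s2 > 0 for s1, s2 in cases_a]
    present_b = [s1 > 0 or s2 > 0 for s1, s2 in cases_b]
    breadth_a, breadth_b = sum(present_a), sum(present_b)
    if breadth_a == 0 or breadth_b == 0:
        return 0.0
    hits = 0
    for (a1, a2), (b1, b2) in zip(cases_a, cases_b):
        # Inverse pattern: the higher-expressed partner switches between states.
        inverse = (a1 > b1 and a2 < b2) or (a1 < b1 and a2 > b2)
        # Change of the log2 a/b ratio between the two states (pseudocounted).
        change = abs(math.log2((a2 + pseudocount) / (b2 + pseudocount))
                     - math.log2((a1 + pseudocount) / (b1 + pseudocount)))
        if inverse and change > ratio_change_cutoff:
            hits += 1
    return hits / math.sqrt(breadth_a * breadth_b)
```

The pseudocount keeps the ratio defined when a partner is absent in one state, which is common for low-abundance antisense transcripts.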
To examine whether coexpression and inverse expression of human and mouse SA pairs are more frequent than expected by chance, we generated control data sets (i.e., "non-expression-level-dependent randomly-replaced" [NEDRR] pseudo SA pair sets) (see Chen et al. 2005a) by replacing each gene in the natural SA set with a randomly picked gene from non-SA genes regardless of its expression level. We compared the coexpression or inverse expression rate of the natural SA set with those from 100,000 pseudo SA sets.
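The NEDRR comparison can be sketched as a resampling test. Assumptions for illustration: each gene of each natural SA pair is replaced by a uniformly random non-SA gene drawn with replacement; `is_positive(a, b)` is a hypothetical callback (for example, the ICE threshold test); and the P-value uses the common add-one empirical convention. The paper used 100,000 pseudo sets; the default here is smaller only to keep the example quick.

```python
import random


def nedrr_p_value(sa_pairs, non_sa_genes, is_positive, n_sets=1000, seed=0):
    """Observed positive rate in SA pairs and an empirical one-sided P-value.

    P is the fraction of pseudo SA sets (with add-one correction) whose rate
    is at least the observed rate.
    """
    rng = random.Random(seed)
    observed = sum(is_positive(a, b) for a, b in sa_pairs) / len(sa_pairs)
    at_least = 0
    for _ in range(n_sets):
        pseudo = [(rng.choice(non_sa_genes), rng.choice(non_sa_genes))
                  for _ in sa_pairs]
        rate = sum(is_positive(a, b) for a, b in pseudo) / len(pseudo)
        if rate >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_sets + 1)
```

Replacing genes "regardless of expression level" means the control does not match expression breadth, so it tests enrichment relative to typical non-SA genes rather than to expression-matched ones.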
The detailed list of the 3097 human and 1106 mouse putative SA gene pairs with information of evolutionary conservation, coexpression, and inverse expression is available in Supplemental Tables 2 and 3, respectively.
Analysis of the proportion of SA genes/sequences in human and mouse brain, liver, and embryonic stem cells
The 3'-most NlaIII SAGE tag was extracted from each qualified transcript in human (371,528 transcripts in total) (Table 1) and in mouse (110,076 transcripts in total) (Table 1), respectively. After removing those tags that matched to more than one gene, the remaining extracted tags were aligned to the real experimental SAGE tags collected in each of the three tissue/cell-type SAGE library combinations (brain, liver, and embryonic stem cells) in human and mouse, respectively. There are a total of 219,752, 143,616, 135,203, 60,684, 43,570, and 54,183 qualified transcripts that have SAGE expression data to support their expression in human brain, human liver, human embryonic stem cell, mouse brain, mouse liver, and mouse embryonic stem cell, respectively. The transcripts that have SAGE tags detected in a given tissue/cell-type were used to assemble transcript clusters (i.e., genes) as described above for the given tissue or cell type. A total of 21,142, 10,253, 9533, 11,632, 7712, and 10,284 transcript clusters (i.e., genes) were assembled in human brain, human liver, human embryonic stem cell, mouse brain, mouse liver, and mouse embryonic stem cell, respectively, of which 2902, 628, 544, 738, 308, and 488 form SA pairs in the given tissue/cell type, respectively. In addition, we detected 4951, 2465, and 2948 one-to-one human–mouse ortholog genes that are expressed in both species’ brain, liver, and embryonic stem cells, respectively; of them, 3.0% (148/4951), 1.4% (35/2465), and 2.1% (63/2948) form SA pairs in both species in the relevant tissue, respectively.
Analysis of the average intron lengths of five gene categories in the human, mouse, and fly genomes
As described previously (Chen et al. 2005b, c), to avoid including non-intron-spanning EST transcripts that might skew the results of the intron-length analysis, we included only intron-spanning genes. In total, 1757 A, 2929 S, 864 AL, 1630 SL, and 14,851 NBD genes were collected in humans; 642 A, 1046 S, 304 AL, 632 SL, and 14,325 NBD genes in mice; and 743 A, 865 S, 429 AL, 503 SL, and 6919 NBD genes in flies.
Acknowledgments
We thank Janet D. Rowley for her support on this study. We also thank
John S. Mattick for constructive discussion on the relationship between
the prevalence of antisense transcription and organismic complexity, W.
James Kent and Xiaoqiu Huang for their help in genome BLAT analysis and
CAP3 assembly, and three anonymous referees for constructive comments.
This work was supported by the G. Harold and Leila Y. Mathers Charitable
Foundation (J.C.), Cancer Research Foundation Young Investigator Award
(J.C.), NIH grant CA84405 (Janet D. Rowley), and the Spastic Paralysis
Foundation of the Illinois, Eastern Iowa Branch of Kiwanis International
(Janet D. Rowley). L.D.H. was supported by the UK Biotechnology and Biological
Sciences Research Council. G.G.C. was supported by NIH grant GM066816.
Footnotes
Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5210006
(a) The proportion of genes involved in putative antisense transcription in a given mRNA transcript dataset.
(b) The proportion of transcript sequences involved in putative antisense transcription in a given mRNA transcript dataset.
(c) The proportion of genes involved in putative antisense transcription in a given protein-coding (i.e., with CDS) transcript dataset.
(d) The proportion of transcript sequences involved in putative antisense transcription in a given protein-coding transcript dataset.
Supplementary Figure 2. Relationship between the number of non-ortholog transcripts and the percentage of human–rat ortholog transcripts that form SA pairs with non-ortholog transcripts, or between the number of non-ortholog transcripts and the proportion of SA transcripts within the selected non-ortholog transcripts, in humans and rats, respectively.
http://www.genome.org/cgi/content/full/gr.5210006/DC1
(a) We randomly selected a set number of sequences from the whole qualified non-ortholog transcript sequence dataset for each genome, combined them with the 3537 one-to-one human–rat ortholog transcripts, and determined the SA pairs formed between ortholog and non-ortholog transcripts. We thus determined, in each species, the percentage of the 3537 one-to-one human–rat ortholog transcripts that form SA pairs with non-ortholog transcripts.
(b) We randomly selected a set number of qualified non-ortholog transcript sequences and determined the SA pairs formed between non-ortholog transcripts. We thus determined the percentage of SA transcripts within the selected non-ortholog transcripts in each species. For each point, we repeated the analysis independently 1000 times. Mean values with their standard deviations (mean ± SD) are shown. A similar pattern was observed for the proportions of SA genes (rather than sequences; data not shown).
Supplementary Table 1. Estimated proportions (%) of SA genes and sequences based on different numbers of qualified transcript sequences in the six individual genomes (see note a).
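Estimated SA gene proportion (%)

| Sequence number | Human | Mouse | Rat | Chicken | Fly | Nematode |
|---|---|---|---|---|---|---|
| 10,000 | 1.06 ± 0.17 | 1.51 ± 0.20 | 1.47 ± 0.19 | 2.14 ± 0.22 | 6.98 ± 0.40 | 0.32 ± 0.07 |
| 15,000 | 1.57 ± 0.18 | 2.22 ± 0.19 | 2.12 ± 0.19 | 3.17 ± 0.19 | 9.76 ± 0.36 | 0.43 ± 0.05 |
| 20,000 | 2.05 ± 0.18 | 2.90 ± 0.20 | 2.74 ± 0.17 | 4.22 ± 0.13 | 12.09 ± 0.31 | 0.52 ± 0.01 |
| 20,194 | | | | | | 0.53 |
| 22,845 | | | | 4.82 | | |
| 25,000 | 2.52 ± 0.18 | 3.54 ± 0.19 | 3.33 ± 0.15 | | 13.97 ± 0.26 | |
| 30,000 | 2.98 ± 0.18 | 4.15 ± 0.20 | 3.91 ± 0.12 | | 15.44 ± 0.20 | |
| 35,000 | 3.43 ± 0.19 | 4.74 ± 0.19 | 4.44 ± 0.09 | | 16.51 ± 0.13 | |
| 38,909 | | | 4.84 | | | |
| 39,612 | | | | | 17.21 | |
| 40,000 | 3.87 ± 0.19 | 5.30 ± 0.19 | | | | |
| 50,000 | 4.71 ± 0.19 | 6.39 ± 0.18 | | | | |
| 75,000 | 6.70 ± 0.20 | 8.79 ± 0.15 | | | | |
| 100,000 | 8.54 ± 0.20 | 10.83 ± 0.09 | | | | |
| 110,076 | | 11.58 | | | | |
| 200,000 | 14.81 ± 0.18 | | | | | |
| 300,000 | 19.71 ± 0.13 | | | | | |
| 371,528 | 22.66 | | | | | |

Estimated SA sequence proportion (%)

| Sequence number | Human | Mouse | Rat | Chicken | Fly | Nematode |
|---|---|---|---|---|---|---|
| 10,000 | 1.09 ± 0.18 | 1.52 ± 0.20 | 1.76 ± 0.81 | 2.06 ± 0.25 | 6.82 ± 0.69 | 0.47 ± 0.11 |
| 15,000 | 1.65 ± 0.20 | 2.25 ± 0.25 | 2.85 ± 1.01 | 3.01 ± 0.26 | 9.75 ± 0.72 | 0.69 ± 0.08 |
| 20,000 | 2.17 ± 0.22 | 2.96 ± 0.22 | 3.64 ± 1.03 | 3.92 ± 0.16 | 12.38 ± 0.65 | 0.89 ± 0.02 |
| 20,194 | | | | | | 0.90 |
| 22,845 | | | | 4.43 | | |
| 25,000 | 2.70 ± 0.24 | 3.64 ± 0.23 | 4.38 ± 1.01 | | 14.52 ± 0.61 | |
| 30,000 | 3.22 ± 0.26 | 4.34 ± 0.29 | 5.25 ± 0.86 | | 16.23 ± 0.48 | |
| 35,000 | 3.72 ± 0.28 | 4.99 ± 0.31 | 5.98 ± 0.58 | | 17.59 ± 0.36 | |
| 38,909 | | | 6.46 | | | |
| 39,612 | | | | | 18.61 | |
| 40,000 | 4.23 ± 0.31 | 5.60 ± 0.31 | | | | |
| 50,000 | 5.26 ± 0.33 | 6.80 ± 0.37 | | | | |
| 75,000 | 7.74 ± 0.33 | 9.42 ± 0.40 | | | | |
| 100,000 | 10.02 ± 0.35 | 11.53 ± 0.23 | | | | |
| 110,076 | | 12.24 | | | | |
| 200,000 | 17.63 ± 0.39 | | | | | |
| 300,000 | 23.56 ± 0.28 | | | | | |
| 371,528 | 27.04 | | | | | |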
Note a: We randomly selected a set number of sequences from the whole qualified transcript sequence dataset for each genome and then followed the same procedure (see Methods) to estimate the proportions of SA genes and of SA sequences. For each number of transcripts, we repeated the analysis independently 1000 times and calculated the mean SA proportion. Mean values with their standard deviations (mean ± SD) are shown. Single values without an SD refer to the whole sequence set and the corresponding SA proportions for each genome.
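The subsampling scheme in this note can be sketched generically. The full clustering and SA-detection pipeline is represented by a hypothetical callable, `estimate_sa_proportion(sample)`, assumed to return a percentage for a given list of transcripts; only the resampling and summary (mean ± SD over independent replicates) are shown. Sampling is without replacement within each replicate, and `n_reps` must be at least 2 for a standard deviation.

```python
import random
import statistics


def subsample_sa_proportion(transcripts, sample_size, estimate_sa_proportion,
                            n_reps=1000, seed=0):
    """Mean and SD of the SA proportion over repeated random subsamples.

    estimate_sa_proportion: hypothetical stand-in for the full pipeline,
    mapping a list of transcripts to an SA percentage.
    """
    rng = random.Random(seed)
    values = [estimate_sa_proportion(rng.sample(transcripts, sample_size))
              for _ in range(n_reps)]
    return statistics.fmean(values), statistics.stdev(values)
```

Evaluating every genome at the same `sample_size` is what allows the proportions in the table to be compared across species despite their very different numbers of available transcripts.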
Identification of transcript clusters in the six genomes
We employed the same protocol described in our previous study (Chen et al. 2004) to identify transcript clusters (i.e., genes) in the human (Homo sapiens; an updated version; Sun et al. 2005), mouse (Mus musculus; Sun et al. 2005), rat (Rattus norvegicus), chicken (Gallus gallus), fruit fly (Drosophila melanogaster) and nematode (Caenorhabditis elegans) genomes, based on recent versions of the relevant databases. In brief, transcript clusters were created from alignments of mRNA and EST sequences, downloaded from the UniGene database (Schuler et al. 1996; human Build #175; mouse Build #141; rat Build #139; chicken Build #26; fly Build #35; nematode Build #20), to the relevant genome (human Build 35.1; mouse Build 33.1; rat Build 3.1; chicken Build 1.1; fly Build 4.0; nematode Wormbase Release WS138). CAP3 (Huang and Madan 1999) and Blat (Kent 2002) were used for transcript assembly and genome mapping, respectively (Chen et al. 2004; Sun et al. 2005). The transcript sequences and alignments were filtered stringently to ensure correct orientation:
(1) Only transcripts whose orientation could be determined were selected for the study. mRNA sequences had to have at least one of the following: an annotated protein-coding region (CDS), a poly(A) tail (i.e., a stretch of at least 10 As at the 3' end of the sequence), or a poly(A) signal (i.e., one of the six polyadenylation signals AATAAA, ATTAAA, AATTAA, AATAAT, CATAAA and AGTAAA (see Caron et al. 2001) within the last 50 bp of the 3' end of the sequence). ESTs and all other sequences had to have a poly(A) tail and/or a poly(A) signal if they contained a CDS, or both a poly(A) tail and a poly(A) signal if they lacked a CDS;
(2) All transcript sequences with suspicious splice sites (e.g., CT-AC, CT-GC and GT-AT, which are the reverse complements of the typical splice donor and acceptor sites GT-AG, GC-AG and AT-AC, respectively) were discarded. The conditions for genome alignment were: identity ≥96%, coverage ≥70% and alignment ≥97%. Transcript sequences representing highly abundant and tandemly duplicated genes, such as immunoglobulins and T-cell receptors, were excluded. All transcript sequences aligned to the same genomic locus were assembled into one transcript cluster. After assembly, all clusters that contained only a single sequence not spanning an intron were excluded.
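The two orientation filters can be sketched in Python as follows. This is a hypothetical illustration: the function names, the plain-string sequence representation, and the choice to search for the poly(A) signal within the last 50 bases of the submitted sequence (including any poly(A) tail) are assumptions made here. The alignment thresholds and the exclusion of immunoglobulin and T-cell receptor genes are not modeled.

```python
POLYA_SIGNALS = ("AATAAA", "ATTAAA", "AATTAA", "AATAAT", "CATAAA", "AGTAAA")
CANONICAL_SITES = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}  # (donor, acceptor)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_polya_tail(seq, min_run=10):
    """True if the sequence ends in a run of at least `min_run` As."""
    return seq.upper().endswith("A" * min_run)


def has_polya_signal(seq, window=50):
    """True if any of the six signal hexamers occurs in the last `window` bases."""
    tail = seq.upper()[-window:]
    return any(sig in tail for sig in POLYA_SIGNALS)


def orientation_determined(seq, is_mrna, has_cds):
    """Filter (1): can the transcript's orientation be trusted?"""
    tail, signal = has_polya_tail(seq), has_polya_signal(seq)
    if is_mrna:
        return has_cds or tail or signal
    if has_cds:
        return tail or signal
    return tail and signal


# Filter (2): an intron GT...AG read from the wrong strand appears as CT...AC, etc.
SUSPICIOUS_SITES = {(revcomp(acc), revcomp(don)) for don, acc in CANONICAL_SITES}


def has_suspicious_splice_site(intron_seqs):
    """True if any intron starts/ends with a reverse-complemented canonical pair."""
    return any((i[:2].upper(), i[-2:].upper()) in SUSPICIOUS_SITES for i in intron_seqs)
```

Deriving the suspicious pairs from the canonical ones, rather than listing them, makes the reverse-complement relationship stated in filter (2) explicit.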
Classification of bidirectional transcript cluster pairs
As in our previous studies (Chen et al. 2004, 2005a,b,c; Sun et al. 2005), the transcript clusters were classified according to their transcription pattern in the genome. Clusters containing at least one pair of transcript sequences transcribed from opposite strands of the same genomic locus were called ‘bidirectional (BD) clusters’, while the remaining clusters, containing only unidirectional transcripts, were called ‘non-bidirectional (NBD) clusters’. We further separated each BD cluster into two new clusters (a cluster pair) based on its overlapping pattern: sense (S) and antisense (A) clusters form putative sense-antisense (SA) pairs with exon overlaps (identity ≥94%), whereas sense-like (SL) and antisense-like (AL) clusters form non-exon-overlapping bidirectional (NOB) pairs without exon overlaps.
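The BD/NBD and SA/NOB distinctions can be illustrated with a minimal Python sketch, assuming each transcript in a cluster is given as a strand plus a list of half-open exon intervals in genomic coordinates. The 94% identity requirement on the exon overlap is not modeled, and `classify_cluster` is a hypothetical helper name.

```python
def intervals_overlap(a, b):
    """True if half-open intervals [a0, a1) and [b0, b1) share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_cluster(transcripts):
    """Classify one cluster from a list of (strand, [(exon_start, exon_end), ...])."""
    plus = [exon for strand, exons in transcripts if strand == "+" for exon in exons]
    minus = [exon for strand, exons in transcripts if strand == "-" for exon in exons]
    if not plus or not minus:
        return "NBD"  # transcripts from one strand only
    if any(intervals_overlap(p, m) for p in plus for m in minus):
        return "SA"   # bidirectional with exon overlap
    return "NOB"      # bidirectional, but exons do not overlap
```

In the second case below, the minus-strand exon sits entirely within a plus-strand intron, so the pair is NOB rather than SA.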
In our previous studies (Chen et al. 2004, 2005a,b,c), we defined the S and A (or SL and AL) genes in each BD gene pair mainly on the basis of a conventional concept (e.g., Lipman 1997): the S (or SL) gene should be expressed in more tissues and/or at a higher level, and thus would have been detected more frequently (i.e., would have more transcript sequences deposited in the expressed sequence databases) than its A (or AL) partner. However, another, even more common, notion is that almost all sense genes are protein-coding genes, whereas antisense genes may be coding or noncoding RNAs (Kumar and Carmichael 1998; Vanhee-Brossollet and Vaquero 1998; Kiyosawa et al. 2003). The fact that over 90% of the S (SL) genes defined in our previous study (Chen et al. 2004) are protein-coding genes (i.e., have annotated CDS regions) is consistent with this notion. In a few pairs, however, the defined S (or SL) member lacks a CDS while its A (or AL) partner has one. We therefore recently revised the previous rules as follows (Sun et al. 2005):
(1) For SA (or NOB) pairs in which one member has a CDS and the other lacks one, define the member with the CDS as the S (or SL) cluster and the other as the A (or AL) cluster;
(2) For the remaining SA (or NOB) pairs, the previous rules (Chen et al. 2004) are applied:
(i) define the member containing more transcript sequences as the S or SL cluster, and the other as the A or AL cluster;
(ii) if the sequence numbers are the same, define the member with more mRNA sequences as the S or SL cluster, and the other as the A or AL cluster;
(iii) if the mRNA sequence numbers are also the same, define the member with intron-spanning sequence(s) as the S or SL cluster, and the member without such sequences as the A or AL cluster.
If none of the above conditions is satisfied, define the member mapped to the sense strand of the chromosome as the S or SL cluster and the other as the A or AL cluster.
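The revised assignment rules translate into a short decision procedure. In this Python sketch, the `Cluster` fields are hypothetical stand-ins for the cluster attributes named in the rules, and "the sense strand of the chromosome" is assumed to mean the plus strand.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    has_cds: bool              # contains an annotated protein-coding region
    n_sequences: int           # number of transcript sequences in the cluster
    n_mrna: int                # number of mRNA sequences among them
    has_intron_spanning: bool  # at least one intron-spanning sequence
    on_plus_strand: bool       # assumed meaning of "sense strand of the chromosome"


def assign_sense(a, b):
    """Return (S or SL cluster, A or AL cluster) for one SA or NOB pair."""
    # Rule (1): a member with a CDS is the sense partner of one without.
    if a.has_cds != b.has_cds:
        return (a, b) if a.has_cds else (b, a)
    # Rule (2)(i)-(iii): compare sequence count, mRNA count, intron spanning, in order.
    for attr in ("n_sequences", "n_mrna", "has_intron_spanning"):
        va, vb = getattr(a, attr), getattr(b, attr)
        if va != vb:
            return (a, b) if va > vb else (b, a)
    # Fallback: the member mapped to the chromosome's sense (+) strand.
    return (a, b) if a.on_plus_strand else (b, a)
```

Ordering the tie-breakers in a single loop mirrors the rule list and makes it easy to see that rule (1) overrides all count-based criteria.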
After such separation, five categories of unique gene clusters were obtained: S, A, SL, AL and NBD. A total of 27,333 human, 19,100 mouse, 11,332 rat, 7390 chicken, 10,542 fly and 14,406 nematode unique genes were identified, each of which represents a single protein- or RNA-coding gene. Of these, 22.7% (6194) of human, 11.6% (2212) of mouse, 4.8% (548) of rat, 4.8% (356) of chicken, 17.2% (1814) of fly and 0.5% (76) of nematode unique genes form 3097, 1106, 274, 178, 907 and 38 putative SA pairs, respectively.
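Because each putative SA pair contributes two genes, the SA gene counts are twice the pair counts, and the percentages follow directly from the totals. The short Python check below recomputes them from the counts in the text. Rounded to two decimals, they also match the whole-dataset SA gene proportions listed in the table above (e.g., 22.66% for human and 0.53% for nematode).

```python
# (unique genes, genes in putative SA pairs, putative SA pairs), as reported in the text
COUNTS = {
    "human": (27333, 6194, 3097),
    "mouse": (19100, 2212, 1106),
    "rat": (11332, 548, 274),
    "chicken": (7390, 356, 178),
    "fly": (10542, 1814, 907),
    "nematode": (14406, 76, 38),
}


def sa_gene_percentages(counts, decimals=1):
    """Percentage of unique genes that belong to putative SA pairs, per genome."""
    result = {}
    for species, (n_genes, n_sa_genes, n_pairs) in counts.items():
        assert n_sa_genes == 2 * n_pairs  # each SA pair contains exactly two genes
        result[species] = round(100 * n_sa_genes / n_genes, decimals)
    return result


print(sa_gene_percentages(COUNTS))
```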
The full list of putative SA gene pairs in each genome, with information on their genomic loci and representative transcripts, is available in online Supplementary Tables 2-7.
Analysis of evolutionary conservation of putative SA pairs in the human, mouse, rat and chicken genomes
As described previously (Sun et al. 2005), we examined ortholog pairs between mouse and human that were reciprocal best “hits” (matches) between the two genomes. We combined the ortholog pairs from the Mouse Genome Informatics Web site (ftp://ftp.informatics.jax.org/pub/reports/HMD_HumanSequence.rpt; December 2004) and Ensembl MartView (http://www.ensembl.org/Multi/martview; December 2004). By comparing sequence IDs in our mouse and human gene sets with those in the combined ortholog dataset, we obtained 11,931 one-to-one human-mouse ortholog pairs in our datasets. Among them, 2681 genes are members of putative SA pairs in the human genome, as are 1210 genes in the mouse genome. Of these, 347 putative SA pairs, in which at least one member has an ortholog in both the human and mouse genomes, are conserved in putative SA form in both genomes; these were called HM-conserved putative SA pairs (Sun et al. 2005). Because (a) the number of putative SA pairs in the mouse genome (and even in the human genome) is substantially underestimated owing to the limited number of qualified transcript sequences, and (b) many antisense transcripts are ncRNAs that are not included in the human-mouse ortholog databases, the number of HM-conserved putative SA pairs may be seriously underestimated. Note that both members of an HM-conserved putative SA pair need not be ortholog transcripts.
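The HM-conserved criterion can be illustrated with a small Python sketch. It encodes one reading of the definition above: a human SA pair counts as conserved when at least one of its members has a one-to-one mouse ortholog that is itself a member of a mouse SA pair, with no requirement that the two partners are orthologs of each other. The gene IDs and the ortholog dictionary are hypothetical inputs.

```python
def hm_conserved_pairs(human_pairs, mouse_pairs, human_to_mouse):
    """Return human SA pairs with at least one member whose mouse ortholog is in a mouse SA pair.

    human_pairs, mouse_pairs: lists of (gene_id, gene_id) SA pairs.
    human_to_mouse: one-to-one ortholog mapping {human_gene_id: mouse_gene_id}.
    """
    mouse_sa_genes = {gene for pair in mouse_pairs for gene in pair}
    return [
        pair for pair in human_pairs
        # .get() returns None for genes without an ortholog, which is never in the set
        if any(human_to_mouse.get(gene) in mouse_sa_genes for gene in pair)
    ]
```

In the usage example below, the mouse partner "m9" has no human ortholog, yet the first pair still qualifies, matching the note that both members need not be orthologs.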
As described in the text, we selected 11,931 one-to-one human-mouse ortholog transcript (unique) sequence pairs and found that 142 of them form 71 SA pairs in both species.
Similarly, we identified all human-rat ortholog gene pairs in our dataset based on the human-rat ortholog pairs from Ensembl MartView (http://www.ensembl.org/Multi/martview; March 2006). In some cases, one human gene may have several different ortholog genes in rat, and vice versa. To simplify the analysis, we excluded all one-to-many and many-to-many ortholog pairs (including all transcripts belonging to these ortholog genes). After this filtering, we identified 3537 one-to-one human-rat ortholog gene pairs in our dataset. To avoid potential bias in the analysis of SA proportion due to differences in transcript number between the paired human and rat ortholog gene clusters, we further selected the single longest ortholog transcript sequence from each ortholog gene cluster to represent that ortholog gene. We thus obtained 3537 one-to-one human-rat ortholog transcript sequence pairs, so that human and rat have the same number of unique ortholog transcripts. We found that 14 ortholog transcripts form 7 SA pairs in human, whereas 10 ortholog transcripts form 5 SA pairs in rat.
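The ortholog filtering and representative-transcript selection can be sketched in Python as follows. The sketch assumes ortholog pairs arrive as (human_gene, rat_gene) ID tuples without duplicates and that each gene cluster is a hypothetical mapping from transcript ID to sequence length.

```python
from collections import Counter


def one_to_one(pairs):
    """Keep only ortholog pairs whose genes occur in exactly one pair on each side."""
    human_counts = Counter(h for h, _ in pairs)
    rat_counts = Counter(r for _, r in pairs)
    return [(h, r) for h, r in pairs if human_counts[h] == 1 and rat_counts[r] == 1]


def longest_transcript(cluster):
    """Return the ID of the longest transcript in a {transcript_id: length} mapping."""
    return max(cluster, key=cluster.get)
```

Here "H2" (one-to-many) and "R4" (many-to-one) are dropped, leaving only the one-to-one pair.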
In addition, we identified 905 one-to-one human-chicken ortholog transcripts from Ensembl MartView (http://www.ensembl.org/Multi/martview; March 2005). We found no SA pairs formed between the ortholog transcripts themselves, whereas the same small number of SA pairs (9 pairs) formed between ortholog and non-ortholog transcripts in both genomes.
Investigation of co-expression and inverse expression patterns of putative SA pairs in the human and mouse genomes
As described in our previous study (Chen et al. 2005a), we evaluated the co-expression and inverse expression of SA pairs at the whole-genome level based on their expression profiles obtained from SAGE (serial analysis of gene expression) data (Velculescu et al. 1995). In our recent study (Sun et al. 2005), we made some modifications to our established procedures (Chen et al. 2005a). We downloaded SAGE expression data (NlaIII SAGE libraries) from the NCBI GEO platform (http://www.ncbi.nlm.nih.gov/projects/geo; December 2004), including 245 human and 76 mouse SAGE libraries. For both human and mouse, we constructed 16 tissue/cell-type SAGE library combinations (including brain, liver and embryonic stem cells, which are available for both species) to determine co-expression of gene pairs, and constructed 50 comparison cases to determine inverse expression of gene pairs (Sun et al. 2005); each comparison case is a pair of two states (two different unique SAGE libraries) representing different developmental, differentiation, physiological or pathological stages/conditions of the same tissue. Tag counts were converted to counts per million (cpm), and the expression data were cross-linked to our genes by extracting the 3'-most NlaIII SAGE tag of each transcript in the genes (i.e., transcript clusters). Only tags that matched a single gene were taken into account. All SAGE tags mapped to the same gene were then combined, and the sum of their cpm in a tissue/cell type represented the expression level of that gene in that tissue/cell type. To eliminate potential sequencing errors in low-count SAGE tags (Chen et al. 2005a), we kept only genes with an expression level of at least 3 cpm across all 16 tissues (i.e., the sum of expression levels in the 16 tissues).
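The tag-to-gene mapping and expression filter can be sketched in Python as below. The sketch rests on several assumptions: a SAGE tag is taken as the 10 bases immediately 3' of the 3'-most CATG (NlaIII) site, each library is a dictionary of raw tag counts, and the helper names are hypothetical. Real SAGE processing also involves library merging and error handling that are omitted here.

```python
from collections import defaultdict


def sage_tag(seq, tag_len=10):
    """Return the `tag_len` bases 3' of the 3'-most CATG (NlaIII site), or None."""
    seq = seq.upper()
    pos = seq.rfind("CATG")
    if pos == -1 or pos + 4 + tag_len > len(seq):
        return None
    return seq[pos + 4:pos + 4 + tag_len]


def to_cpm(tag_counts):
    """Convert raw tag counts of one library to counts per million."""
    total = sum(tag_counts.values())
    return {tag: 1e6 * n / total for tag, n in tag_counts.items()}


def gene_levels(gene_transcripts, library_counts):
    """Sum the cpm of all single-gene tags for each gene ({gene: [transcript seqs]})."""
    tag_genes = defaultdict(set)
    for gene, seqs in gene_transcripts.items():
        for seq in seqs:
            tag = sage_tag(seq)
            if tag is not None:
                tag_genes[tag].add(gene)
    cpm = to_cpm(library_counts)
    levels = defaultdict(float)
    for tag, genes in tag_genes.items():
        if len(genes) == 1:  # discard tags matching more than one gene
            (gene,) = genes
            levels[gene] += cpm.get(tag, 0.0)
    return dict(levels)


def expressed_genes(levels_per_tissue, min_total_cpm=3.0):
    """Genes whose summed level over all tissues reaches `min_total_cpm`."""
    totals = defaultdict(float)
    for levels in levels_per_tissue:
        for gene, value in levels.items():
            totals[gene] += value
    return {gene for gene, total in totals.items() if total >= min_total_cpm}
```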
To evaluate the co-expression of an SA pair, we adopted the index of co-expression between two genes a and b (ICEa,b) defined by Lercher et al. (2002): the number of tissues with common positive expression, weighted by the geometric mean of the two expression breadths. Note that, unlike the conventional Pearson correlation coefficient (r), co-expression in this context refers not to the extent to which transcript levels are correlated, but to the coupled presence or absence of the transcripts across different tissues or cells (Chen et al. 2005a). ICEa,b ranges from 0 (no co-expression) to 1 (perfect co-expression). We found that ICEa,b values ≥0.6 in humans or ≥0.5 in mice exceed the 99% confidence interval (i.e., P < 0.01) of the average ICEa,b of all possible gene pairs. Thus, we define two genes (a and b; e.g., the sense and antisense members of a putative SA pair) to be co-expressed if ICEa,b ≥0.6 in humans or ≥0.5 in mice (Sun et al. 2005).
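Written out, the index is ICEa,b = Nab / sqrt(Na × Nb), where Na and Nb are the numbers of tissues in which genes a and b are expressed (their breadths) and Nab is the number in which both are. A minimal Python sketch follows, assuming "positive expression" means a level above zero in the filtered data and using the species cutoffs stated above; the function names are hypothetical.

```python
import math


def ice(expr_a, expr_b):
    """Index of co-expression for two genes measured in the same ordered tissues."""
    present_a = [x > 0 for x in expr_a]
    present_b = [x > 0 for x in expr_b]
    breadth_a, breadth_b = sum(present_a), sum(present_b)
    if breadth_a == 0 or breadth_b == 0:
        return 0.0
    common = sum(pa and pb for pa, pb in zip(present_a, present_b))
    return common / math.sqrt(breadth_a * breadth_b)


CO_EXPRESSION_CUTOFF = {"human": 0.6, "mouse": 0.5}


def co_expressed(expr_a, expr_b, species):
    """Apply the species-specific ICE threshold."""
    return ice(expr_a, expr_b) >= CO_EXPRESSION_CUTOFF[species]
```

Only presence or absence enters the index, which is why, unlike Pearson's r, it ignores how closely the expression levels themselves track each other.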
To measure the inverse-expression pattern more quantitatively than previously (Chen et al. 2005a), we defined (Sun et al. 2005) a new index of inverse expression between two genes a and b (IIEa,b): the number of comparison cases in which the two partners exhibit both an inverse expression pattern between the two states (i.e., one member is expressed at a higher level than its partner in state 1 but at a lower level in state 2, or vice versa) and a change in the relative expression ratio of gene a to gene b between the two states that is significantly greater than expected by chance (i.e., exceeding the 99% confidence interval of the mean changes of all randomly formed gene pairs), weighted by the geometric mean of the two presence breadths. A gene with positive expression in at least one of the two states of a comparison case is considered present in that case, and the presence breadth of each gene is the number of cases in which it is present. IIEa,b ranges from 0 (no inverse expression) to 1 (perfect inverse expression). Similarly, we define two genes (a and b; e.g., the sense and antisense members of a putative SA pair) to be inversely expressed if IIEa,b is higher than the 99% confidence interval (i.e., P < 0.01) of the average IIEa,b of all randomly formed gene pairs (Sun et al. 2005).
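The index can be sketched in Python as follows. The text does not specify how the change in the relative expression ratio is measured, so this sketch assumes the absolute log2 fold change of the ratio a/b between the two states, with a pseudocount of 1 to avoid division by zero. The parameter `cutoff` stands in for the 99% confidence bound derived from randomly formed gene pairs, which would have to be estimated separately; all names are hypothetical.

```python
import math


def inverse_in_case(a, b, cutoff, pseudocount=1.0):
    """a, b: (level in state 1, level in state 2) for genes a and b in one comparison case."""
    (a1, a2), (b1, b2) = a, b
    flipped = (a1 > b1 and a2 < b2) or (a1 < b1 and a2 > b2)
    ratio1 = (a1 + pseudocount) / (b1 + pseudocount)
    ratio2 = (a2 + pseudocount) / (b2 + pseudocount)
    change = abs(math.log2(ratio1 / ratio2))  # assumed measure of ratio change
    return flipped and change > cutoff


def iie(cases_a, cases_b, cutoff):
    """Inverse-expression cases weighted by the geometric mean of presence breadths."""
    present_a = sum(1 for x1, x2 in cases_a if x1 > 0 or x2 > 0)
    present_b = sum(1 for x1, x2 in cases_b if x1 > 0 or x2 > 0)
    if present_a == 0 or present_b == 0:
        return 0.0
    n_inverse = sum(inverse_in_case(a, b, cutoff) for a, b in zip(cases_a, cases_b))
    return n_inverse / math.sqrt(present_a * present_b)
```

An inverse pattern with non-negative levels implies both genes are present in that case, so the count never exceeds either breadth and the index stays within 0 to 1.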
To examine whether co-expression and inverse expression of human and mouse SA pairs are more frequent than expected by chance, we generated control datasets (i.e., ‘non-expression-level-dependent randomly-replaced’ (NEDRR) pseudo SA pair sets; see Chen et al. 2005a) by replacing each gene in the natural SA set with a gene randomly picked from the non-SA genes, regardless of its expression level. We compared the co-expression or inverse-expression rate of the natural SA set with those of 100,000 pseudo SA sets.
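The NEDRR comparison amounts to a Monte Carlo test. In the Python sketch below, `is_positive` is a hypothetical predicate for whether a pair is co-expressed (or inversely expressed). The empirical P value uses the common (k + 1)/(N + 1) estimator; the text does not state which estimator was used.

```python
import random


def rate(pairs, is_positive):
    """Fraction of pairs for which the predicate holds."""
    return sum(is_positive(a, b) for a, b in pairs) / len(pairs)


def nedrr_test(sa_pairs, non_sa_genes, is_positive, n_sets=100_000, seed=0):
    """Compare the natural SA rate with `n_sets` randomly replaced pseudo SA sets."""
    rng = random.Random(seed)
    observed = rate(sa_pairs, is_positive)
    n_as_extreme = 0
    for _ in range(n_sets):
        # Replace every gene independently, ignoring expression level.
        pseudo = [(rng.choice(non_sa_genes), rng.choice(non_sa_genes)) for _ in sa_pairs]
        if rate(pseudo, is_positive) >= observed:
            n_as_extreme += 1
    p_value = (n_as_extreme + 1) / (n_sets + 1)
    return observed, p_value
```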
The detailed lists of the 3097 human and 1106 mouse putative SA gene pairs, with information on evolutionary conservation, co-expression and inverse expression, are available in online Supplementary Tables 2 and 3, respectively.
Analysis of the proportion of SA genes/sequences in human and mouse brain, liver and embryonic stem cells
The 3'-most NlaIII SAGE tag was extracted from each qualified transcript in human (371,528 transcripts in total; Table 1) and in mouse (110,076 transcripts in total; Table 1). After removing tags that matched more than one gene, the remaining extracted tags were aligned to the experimental SAGE tags collected in each of the three tissue/cell-type SAGE library combinations (brain, liver and embryonic stem cells) in human and mouse, respectively. In total, 219,752, 143,616, 135,203, 60,684, 43,570 and 54,183 qualified transcripts had SAGE expression data supporting their expression in human brain, human liver, human embryonic stem cells, mouse brain, mouse liver and mouse embryonic stem cells, respectively. The transcripts with SAGE tags detected in a given tissue/cell type were used to assemble transcript clusters (i.e., genes) for that tissue or cell type, as described above. A total of 21,142, 10,253, 9533, 11,632, 7712 and 10,284 transcript clusters (i.e., genes) were assembled in human brain, human liver, human embryonic stem cells, mouse brain, mouse liver and mouse embryonic stem cells, respectively, of which 2902, 628, 544, 738, 308 and 488 form SA pairs in the given tissue/cell type. In addition, we detected 4951, 2465 and 2948 one-to-one human-mouse ortholog genes that are expressed in the brain, liver and embryonic stem cells of both species, respectively; of these, 3.0% (148/4951), 1.4% (35/2465) and 2.1% (63/2948) form SA pairs in both species in the relevant tissue.
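The ortholog percentages in the last sentence follow directly from the reported counts. A quick Python check:

```python
# (orthologs forming SA pairs in both species, one-to-one orthologs expressed in both), from the text
TISSUE_COUNTS = {
    "brain": (148, 4951),
    "liver": (35, 2465),
    "embryonic stem cells": (63, 2948),
}


def formatted_percentages(counts):
    """Format each tissue's SA fraction as a one-decimal percentage string."""
    return {tissue: f"{100 * sa / total:.1f}%" for tissue, (sa, total) in counts.items()}


for tissue, pct in formatted_percentages(TISSUE_COUNTS).items():
    print(tissue, pct)
```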
Analysis of the average intron lengths of five gene categories in the human, mouse and fly genomes
As described before (Chen et al. 2005b,c), to avoid non-intron-spanning EST transcripts that might skew the results of the intron-length analysis, we included only intron-spanning genes in this study. In humans, 1757 A, 2929 S, 864 AL, 1630 SL and 14,851 NBD genes were collected; in mice, 642 A, 1046 S, 304 AL, 632 SL and 14,325 NBD genes; and in flies, 743 A, 865 S, 429 AL, 503 SL and 6919 NBD genes.
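The per-category averages can be sketched in Python as follows. The sketch assumes each gene is represented by its category label and a list of half-open, non-overlapping exon intervals, and that the category average is taken over per-gene mean intron lengths; the text does not say whether introns were pooled instead, and the function names are hypothetical.

```python
from statistics import mean


def intron_lengths(exons):
    """Gaps between consecutive half-open exon intervals of one gene."""
    exons = sorted(exons)
    return [nxt[0] - cur[1] for cur, nxt in zip(exons, exons[1:]) if nxt[0] > cur[1]]


def average_intron_length(genes):
    """genes: {gene_id: (category, exons)}; returns {category: mean intron length}."""
    per_category = {}
    for category, exons in genes.values():
        introns = intron_lengths(exons)
        if not introns:
            continue  # skip non-intron-spanning genes, as in the text
        per_category.setdefault(category, []).append(mean(introns))
    return {category: mean(values) for category, values in per_category.items()}
```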
NetworkEditor's Perspective: "Antisense RNA Transcripts are Ubiquitous and Regulatory".
In this new study by Miao Sun, Laurence Hurst, Gordon Carmichael, and Jianjun Chen, antisense RNA transcripts are widely dispersed throughout the genomes of the animal kingdom, and are found in regulatory roles for gene transcription, especially in the brains of humans, rats, mice, chickens, and flies. Possibly because of the operon structure of the worm genome, antisense transcripts are sparsely found in this family.