Published in Science, vol. 308, no. 5725, pp. 1149-1154 (May 20, 2005).
Published online 24 March 2005, Science 10.1126/science.1108625
[DOI: 10.1126/science.1108625]
http://www.sciencemag.org/cgi/content/full/308/5725/1149
http://www.sciencemag.org/cgi/content/abstract/1108625v1

"Transcriptional Maps of 10 Human Chromosomes at 5-Nucleotide Resolution".

Jill Cheng 1, Philipp Kapranov 1, Jorg Drenkow 1, Sujit Dike 1, Shane Brubaker 1, Sandeep Patel 1, Jeffrey Long 1, David Stern 1, Hari Tammana 1, Gregg Helt 1, Victor Sementchenko 1, Antonio Piccolboni 1, Stefan Bekiranov 1, Dione K. Bailey 1, Madhavan Ganesh 1, Srinka Ghosh 1, Ian Bell 1, Daniela S. Gerhard 2, Thomas R. Gingeras 1, *

1 Affymetrix Inc., Santa Clara, CA 95051, USA.
2 Office of Cancer Genomics, National Cancer Institute Bethesda, MD 20892, USA.

* To whom correspondence should be addressed.
Thomas R. Gingeras , E-mail: tom_gingeras@affymetrix.com




Abstract:

Sites of transcription of polyadenylated and nonpolyadenylated RNAs for 10 human chromosomes were mapped at 5-bp resolution in eight cell lines. Unannotated, nonpolyadenylated transcripts comprise the major proportion of the transcriptional output of the human genome. 19.4%, 43.7%, and 36.9% of all transcribed sequences were observed to be polyadenylated, nonpolyadenylated, and bimorphic, respectively. Half of all transcribed sequences are found only in the nucleus and for the most part are unannotated. Overall, the transcribed portions of the human genome are predominately composed of interlaced networks of both poly A+ and A- annotated transcripts and unannotated transcripts of unknown function. This organization has important implications for interpreting genotype-phenotype associations, regulation of gene expression, and the definition of a gene.



The current classification of protein-coding and noncoding genomic regions is based on intron-exon structures of well-characterized protein-coding genes. Noncoding genomic regions, which account for 98 to 99% of the human genome, consist of introns found within protein-coding transcripts and the intergenic regions between them (1, 2). Recent observations indicate that noncoding regions are transcribed into polyadenylated, stable RNAs that are transported into the cytosol during development (3–8). The ENCODE consortium has suggested transcripts of unknown function (TUFs) as an unofficial name for these unannotated transcribed regions (9).
Transcribed fragments (transfrags) are used to denote array-detected regions of transcription both well-characterized protein-coding genes and TUFs.

While our understanding of poly A+ cytosolic TUFs has increased, much less is known about the synthesis sites of transcripts lacking 3´ polyadenylation (poly A–). Replication-dependent histone genes are currently considered to be the only transcripts synthesized exclusively as poly A– transcripts (10). However, 30 years ago, Milcarek et al. reported that approximately 30% of rapidly labeled polysomal associated RNA in actinomycin D inhibited HeLa cells was poly A– (11). Similarly, Salditt-Georgieff et al. reported that there were three times as many transcripts with 5´ cap structures as poly A+ containing transcripts localized with polysomes of Chinese hamster cells (12). Later studies revealed that many genes are transcribed as poly A+ RNAs, which under specific conditions are processed to reduce or totally remove the 3´ poly A sequences. Such RNAs are called “bimorphic” transcripts (13). The distribution of poly A+ and poly A– transcripts between the nucleus and cytosol is also relatively unexplored.

In this report, we examined approximately 30% of the human genome encoded in 10 human chromosomes (6, 7, 13, 14, 19, 20, 21, 22, X and Y) and mapped the sites of transcription for poly A+ cytosolic RNA derived from 8 cell lines. For one cell line (HepG2), maps were constructed for cytosolic and nuclear poly A– and poly A+ transcripts. The full-length structures of many TUFs have been determined by employing a rapid amplification of cDNA ends (RACE) technique and resolving the RACE products using high-density arrays. These studies indicate that previously considered “junk” genomic regions encode multiple overlapping poly A+ and poly A– coding transcripts and TUFs.

Overview of sites of transcription of cytosolic poly A+ RNAs along 10 human chromosomes.

High-density arrays using 25-mer oligonucleotides spaced every 5 bp on average (i.e. 20-bp overlap) provided a seven-fold increase in interrogation resolution over previous studies (3–6, 14). The consequences of conducting array-based interrogations every 5 bp include: (1) increased likelihood of detecting exons of shorter length; (2) increased statistical confidence in determining whether a region is transcribed; and (3) identification of specific hybridization patterns characteristic of 3´ ends of transcripts (fig. S1).

Five male and three female cell lines were selected as sources for mature (i.e. post-spliced) cytosolic poly A+ RNA. Maps were constructed with the lowest likelihood of signals being derived from cross-hybridization. Transfrag sequences that overlapped with pseudogene sequences or contained low complexity repeat sequences were removed (see supplementary material).

Approximately 9% of 74,180,611 total probe pairs detected transcription per cell line and per chromosome.
Average positive probe percentiles for individual chromosomes ranged from 7.1% [chromosome (chr) 13] to
14.6% (chr 19) (table S1A). This number increases to 16.5% for a cumulative map, referred to as “1 of 8 map”, in which a positive probe need appear in at least 1 of 8 cell lines. This is consistent with our previous results from chromosomes 21 and 22 (3). The average number of transfrags found per cell line and per chromosome was observed to be 16,864 (table S1B). The average and median lengths of observed transfrags were 115 and 78 nucleotides, respectively (table S1C). The number of transfrags increases to 31,443 in the 1 of 8 map,
yet the average length of a transfrag, 124 bp, remains approximately the same.

On average, 18,694,360 nucleotides (4.9% of interrogated genomic nucleotides) are transcribed as cytosolic poly A+ RNA derived from 10 chromosomes of eight cell lines. In the 1 of 8 map, the number of transcribed cytosolic poly A+ nucleotides increases to 38,656,627 (10.1%). The 2.1-fold difference (4.9% vs. 10.1%) indicates that a considerable proportion of the detected transcription is cell line specific. This observation is consistent with earlier findings (4).

Correlation of detected sites of transcription with current genome-wide annotations.

Maps created using poly A+ cytosolic RNA from 8 cell lines were compared to annotations from the UCSC genome browser database (15, 16). 56.7% of the detected cytosolic poly A+ sequences from the 1 of 8 map do not overlap with any well-characterized exon, mRNA, or EST annotation (Fig. 1).

Fig. 1. The correlation of detected transcription in 1 of 8 cell lines to annotations along each of the 10 chromosomes

Fig. 1. The correlation of detected transcription in 1 of 8 cell lines to annotations along each of the 10 chromosomes is shown for each chromosome individually and as a collective of all chromosomes. The detected transcription was determined using poly A+ cytosolic RNA from each of the 8 cell lines. The annotations used in this correlation are defined in the Supplementary Material section. The pattern code used in the central pie chart is utilized in all other pie charts.



With the exception of chromosomes 13, 19 and Y, the remaining seven chromosomes have similar proportions of assignment of transcribed nucleotides to the unannotated category. Annotation dense chromosome 19 has a lower proportion of detected unannotated (33.4%) transcripts, with the smallest amount of detected transcribed unannotated sequences in the intergenic regions (13%). Chromosomes 13 and Y have higher proportions (69.5% and 80.3%, respectively) of unannotated transcribed sequences, most of which originate from intergenic regions. In general, the genomic positions of transcribed regions track with gene and exon densities for
each of the chromosomes (fig. S2). With the exception of chromosomes Y and 19, 56.8% to 77.7% of detected
transcribed regions (1 of 8 composite analysis) are derived from within genes (exons and introns). On average, 31.8% of detected transcription originates from unannotated intergenic regions. The remainder of the 42% to 49% of the observed unannotated cytosolic poly A+ transcription (26%) is derived from the intronic regions within genes (Fig. 1).

Estimates of the fraction of positive probes interrogating well-characterized exons reveal a tendency towards a bimodal distribution with all probes being “on” (>90%) or “off” (<10%) (fig. S3). Approximately 68% of well-characterized exons (58,984) fall within one of the two peaks. Exons with partial positive probe coverage (i.e. >10% and <90%) may represent alternative exon structures compared to those described in the current annotation collections. Alternatively, the lack of correspondence between some of the exon annotations and detected transfrags may be attributed to inaccuracies in transfrag generation, sequencing errors or misassembly of the human genome (17). Fig. S4 estimates the degree of cell line specific transcription by plotting the percentage of total nucleotides within known or novel transfrags against the number of cell lines expressing that transfrag. Two dominant populations of transfrags emerge: those expressed in 1 or 2 cell lines and those expressed in all cell lines.

Characterization and structure of transcripts containing unannotated transfrags.

A combination of rapid amplification of cDNA ends (RACE), high-density arrays and cloning/sequencing techniques was used to characterize transcripts containing unannotated transfrags (see supplementary materials and fig. S5). Of 768 randomly selected unannotated transfrags, 634 (82.6%) yielded a set of
5´ and/or 3´ RACE products (table S2). 438 (57.0%) of the 768 regions yielded successful 5´ and 3´ RACE products on at least one genomic strand, and 467 (60.8%) show evidence of transcription on both strands. Thus, approximately 61% of surveyed loci show evidence of overlapping transcription on the positive and negative strands of the genome.

Among the 438 transfrags where 5´ and 3´ RACE were successful, 86 reside in intergenic regions, 145 reside in
intronic regions and 207 are adjacent to exons on either strand. To better understand the structure of putative novel transcripts found by RACE, 661 strand-specific RACE groups derived from the 438 index transfrags were analyzed against annotations of known genes and ESTs (Fig. 2 and supplementary material).

Fig. 2. A hierarchical tree describing the relationship among the RACE/array profiles

Fig. 2. A hierarchical tree describing the relationship among the RACE/array profiles derived from unannotated, array-detected regions and the annotations. A combined 5´ and 3´ RACE profile from each strand was treated as a separate RACE group. A RACE/array profile for each group is scored “intergenic” if it never overlaps the bounds of a known gene or any other annotation including an EST; “overlapping” if it overlaps any annotation or “intronic” if it is confined to the bounds of a known gene, but does not overlap an annotation
(see Supplementary Material). “Paired” is defined as overlapping transcripts based on the RACE/array analysis. An “overlapping” group is further classified into “isoform” or “nonisoform” based on the precision of RACE/array alignment with the exon-intron boundaries of known genes. If a RACE/array profile resembles at least one annotated exon boundary, it is considered an isoform of a known gene.


Of these, 547 RACE groups contain transfrags which overlap annotations on the sense or antisense
strand, 51 groups reside entirely within the intergenic regions, and 63 groups reside entirely within introns of known genes on the sense or antisense strand. Of the 547 RACE groups overlapping annotations, 118 groups contain transfrags which are nearly identical to annotated exons and, thus, are potential novel isoforms of known genes. The other 429 RACE groups contain transfrags which partially intersect annotations,
representing transcripts which overlap with the well-characterized coding transcripts on either the sense or
antisense strand. For RACE groups which overlap exons or reside within introns, approximately equivalent numbers were found to be sense or antisense to annotations. Overall, 44% of detected RACE group transcripts are paired with at least one transcript present on the opposite strand. These results, combined with the entire RACE analyses, provide a consistent picture of overlapping transcription in the human
genome (fig. S6).

RT-PCR reactions were conducted on 250 (57%) of the 438 genomic loci which produced 5´ and 3´ RACE products from at least one genomic strand. A total of 217 (87%) regions yielded successful RT-PCR products, and 178 cDNA clones were isolated from 107 of the 217 regions. An example of one intergenic unannotated TUF is depicted in fig. S7.

The average length of the isolated transcripts is 680 nucleotides (range 173 to 4650 nucleotides) which are
distributed over a range of 173 to 115,020 nucleotides in the genome. 114 of the 178 cloned transcripts are spliced (64%), with an average of 3.2 exons per transcript and an average exon length of 238 nucleotides. Of the 178 characterized transcripts, 65% have a coding capacity of less than 100 amino acids.

54% of the spliced transcripts use canonical splice sites (GT/AG). Many of the noncanonical splice sites are
previously characterized alternative splicing signals. However, a total of 26 (14.6%) of the spliced cDNAs were
obtained from antisense transcripts which are exact reverse complements of sense transcripts. Fig. S8 illustrates three transcribed regions with mirror complementary sense and antisense transcripts pairings. Such complementary transcript pairs have been observed for well-characterized coding genes (18).

Poly A+ and A– transcripts and their distribution in the nucleus and cytosol.

From the HepG2 cells, 58,874,113 (15.4%) nonredundant nucleotides were detected as transcribed and stable poly A+, poly A– or bimorphic RNAs isolated from nuclear or cytosolic compartments. All analyses describing the proportions of poly A+, A– and bimorphic transcribed sequences are based on this figure (58,874,113 bp) as a denominator, unless otherwise described. This percentage (15.4%) is nearly an order of magnitude greater than expected from the annotated exons and gene prediction. Fig. S9A illustrates the overlapping and nonoverlapping relationships among the four RNA samples. The comparison of the four RNA samples results in 15 such relationships, of which 6 represent exclusive nuclear or cytoplasmic and poly A+ and poly A– groupings (Table 1A and fig. S9).

Table 1. Number and percent of transcribed nucleotides detected in nucleus and cytoplasm as Poly A+ and A– RNA.

The number of transcribed nucleotides and percent of the total transcribed sequence of the nonrepeat sequences of 10 chromosomes is shown for each of the unique and overlapping poly A+ and A– categories of the relationships (Table 1A). Several of the overlapping relationships (2, 3, 6, 7, 8, 11, 12, 14, and 15) signify that the same detected sequences appear to be bimorphic with respect to the presence of poly A+ and A– sequences (Table 1B). The detected transcribed nucleotides present in poly A+ RNA samples (Fig. S9B) and poly A– samples (fig. S9C) and the transfrag sequences found exclusively in the nucleus (fig. S9D) and cytosol (fig. S9E) reveal several characteristics of the composition and compartmentalization of the human transcriptome.

(i) Overall, there is approximately 2.2-fold more uniquely poly A– (43.7%) transcribed sequences than uniquely poly A+ (19.4%). Thus, 63.1% of the detected transcribed nucleotides are uniquely poly A+ or A– (Table 1B) with 36.9% comprising the bimorphic class of transcripts.

(ii) A large proportion of the sequences found in the nuclear and cytosolic compartments appear to be exclusive to these compartments. The amount of poly A+ sequences (9.7%) exclusively detected in the nucleus is almost three-fold lower than the amount of poly A– (31.0%) sequences (Table 1A). Bimorphic detected sequences found exclusively in the nucleus amount to 10.6%. Approximately 25% and 34% of poly A+ nuclear sequences (9.7%) are associated with well-annotated exons and introns, respectively (Fig. 3).

Fig. 3. Distribution of poly A+ and poly A– transcription in the nucleus and cytosol

Fig. 3. Distribution of poly A+ and poly A– transcription in the nucleus and cytosol with respect to genome annotations. A four circle Venn diagram representing proportions of transcribed base pairs in cytosolic poly A+ (cyan), cytosolic poly A– (black), nuclear poly A+ (red) and nuclear poly A– (dark blue). Numbers indicate percent total transcription detected in each unique compartment (fig. S9, Table 1). Pie charts illustrate the distribution of transcribed base pairs detected in each indicated unique compartment among various classes of annotations. The annotations used in this correlation are described in the Supplementary Material
section.


The remaining 41% are associated with unannotated intergenic regions of the genome. In total, 75% of the exclusively nuclear detected poly A+ sequences (9.7%) are unannotated. Similarly, 18% and 57% of poly A– exclusive nuclear sequences (31.0%) are associated with well-characterized exons and introns, respectively, while the remaining 25% are located in unannotated intergenic regions of the genome. These data indicate that 82.0% of the exclusive nuclear poly A– sequences are unannotated.

Poly A+ (3.1%) sequences exclusively detected in the cytosol are almost two-fold less abundant than detected poly A– (6.5%) sequences (Table 1A). Bimorphic detected sequences found exclusively in the cytosol amount to 0.6%. Approximately 43% and 22% of poly A+ cytosolic sequences (3.1%) are associated with well-annotated exons and introns, respectively (Fig. 3). The remaining 34% are associated with unannotated intergenic regions of the genome. In total, 56% of the exclusively cytosolic detected poly A+ sequences (3.1%) are unannotated. We find 16% and 36% of poly A– exclusive cytosolic sequences (6.5%) are associated with well-characterized exons and introns, respectively, while the remaining 48% are located in unannotated intergenic regions of the genome. A total of 84.0% of the exclusive cytosolic poly A– sequences are unannotated.

(iii) A comparison of exclusively nuclear or cytosolic transcribed nucleotides shows a five-fold difference in
sequence complexity detected in the nucleus (51.3%) compared to the cytoplasm (10.3%) of HepG2 cells (fig. S9, D and E). Such a difference is not unexpected given the enrichment of transcribed intron sequences that appear to remain in the nucleus. The Warner (syndrome) helicase interacting protein 1 (WHIP1) (19) on chromosome 6 illustrates how the intronic and exonic sequences of this gene are enriched in the nucleus and cytoplasm, respectively (fig. S10A). Transcribed intronic sequences, however, are not always found to be enriched in the nucleus. Serine (or cysteine) proteinase inhibitor, clade D (heparin cofactor) (20), member 1 on chromosome 22 is an example of a gene in which the intronic transcription detected in intron 4 is enriched in the cytosol although other intron sequences for this gene are enriched in the nucleus (fig. S10B).

On a chromosomal scale, maps identifying the locations of transcription of poly A+ and A– transcripts found in the nucleus and cytosol provide a set of interesting contrasts (Fig. 4).

Fig. 4. Density distributions of transcription observed in cytosolic poly A+, poly A–, nuclear poly A+ and poly A–RNA fractions

Fig. 4. Density distributions of transcription observed in cytosolic poly A+, poly A–, nuclear poly A+ and poly A–RNA fractions in the 10 human chromosomes: (A) chromosomes 6, 7, 13, 14; (B) chromosomes 19, 20, 21, 22, X, Y. The fraction of base pairs found in transfrags are calculated every 6 kb in an overlapping 60-kb window for cytosolic poly A+ (red), poly A– (blue), nuclear poly A+ (mauve) and poly A– RNA (green) and plotted for 10 human chromosomes alongside the base pair density of exons (black) from Ref Seq, UCSC Known Genes and GenBank mRNAs. The densities of cytosolic poly A+ versus cytosolic poly A– and nuclear poly A+ versus nuclear poly A– are compared in the top panels and the densities of cytosolic poly A+ versus
nuclear poly A+ and cytosolic poly A– versus nuclear poly A– are compared in the bottom panels for each chromosome.

Fig. 4B:



Paired density plots were computed using a 60-kb sliding window for: (1) nuclear poly A+ and A– versus cytosolic poly A+ and A– transcribed regions, (2) cytosolic and nuclear poly A+ versus cytosolic and nuclear poly A– and (3) well-characterized annotations. These maps provide two overriding impressions. First, at the resolution of 60,000 bp, the density of the synthesis sites of poly A+ and A– transcripts found in the nucleus and cytosol generally reflects the annotation density for each of the ten chromosomes. However, several annotation dense regions on chromosomes 6, 7, 13 and 21 appear to be more sparsely transcribed in the HepG2 cell line. Second, the annotation densities and detected transcribed regions differ in many positions along each chromosome, which indicates that additional regions of transcription are observed in annotation dense locations. The chromosomal map positions for poly A+ and A– designated transfrags are available in table S3.

(iv) The exclusively poly A– and a portion of the bimorphic transcripts found in the nucleus and cytosol would
most likely not be identified using customary cDNA cloning approaches. Interestingly, almost half of the exclusive cytosolic poly A– detected transcripts (6.5% of the total detected) and a quarter of the exclusive nuclear poly A–transcripts (31.0% of the total detected) appear to be derived from intergenic regions of the genome. Thus, intergenic noncoding regions of the genome are a rich source of transcripts which are predominantly unannotated and underrepresented in our understanding of the composition of the transcriptome. Evaluation of the protein coding potential of poly A– transcribed sequences awaits efficient methods to
copy and clone these types of transcripts.

Conclusions.

Recent empirical experiments have provided consistent evidence that a larger percentage of the human,
mouse, fly and Arabidopsis genomes are being transcribed than can be accounted for by the current state of genome annotations. These observations were first described in tiling array-based studies which searched large parts or entire genomes for sites of transcription (3, 4, 6, 21–23) and then by approaches aimed at isolation and characterization of full-length cDNAs (24–28) and of shorter cDNAs (ESTs and SAGE tags) (18, 29, 30). These studies used primary tissues and cell lines as RNA sources. Collectively, these empirical and computational observations point to several underappreciated characteristics of the human transcriptome:

(i) The human transcriptome is composed of an interlaced network of overlapping transcripts.

The use of arrays in combination with 5´ and 3´ RACE reactions indicate that transcripts encoded on both strands often utilize the same sequences. Such overlapping transcription is observed in almost 50% of the investigated cases (Fig. 2 and figs. S6 and S8). This estimate we believe to be an under representation. Striking examples of this class are pairs of complementary RNA molecules that appear to utilize both canonical (GT/AG) and complementary to canonical signals (CT/AC) at their splice junctions. The possibility that cDNA clones derived from the complementary transcripts came only from the sense strand was shown to be unlikely, since subsequent strand specific RT-PCR reactions have produced cDNA products of expected lengths from the noncoding strand (data not shown).

The existence of such complementary transcripts raises the question of how such transcripts are produced. One
possibility is that the human cell splicing machinery uses complementary sequences as alternative signals. This seems unlikely given the extent to which the consensus signals are missing from the same transcripts. A second possibility is that these transcripts are cRNA copies synthesized by an RNA-dependent RNA polymerase (RdRP). Such activity has been associated with the synthesis of siRNAs which act as trans-acting regulatory molecules in Arabidopsis and C. elegans (31, 32). Thus, this second possibility predicts that RdRP activities are likely to be found in human cells.

A second implication of the extensive transcription observed in unannotated genomic regions relates to the
genotype-phenotype correlations. Such correlation experiments will require extensive analysis of the
transcriptional activity of regions mapped as possible loci for genetic mutations.

(ii) Poly A– RNAs potentially comprise almost half of the human transcriptome.

A variety of radiolabeling and sequence complexity studies have indicated that, in addition to histone mRNA transcripts, a large class of poly A–transcripts exists in human cells (11–13, 33, 34). The majority of studies using in vitro translation approaches, however, have not supported the idea of a separate set of protein
products derived from the poly A– RNA fraction (35). Our results indicate that transcribed sequences exclusively associated with poly A– transcripts are two-fold larger than sequences transcribed exclusively as poly A+.

Of the exclusive poly A– sequences found in nuclear and cytosolic compartments (43.7% of all transcription), more than half of nuclear poly A– sequences are derived from intronic regions (Table 1). Clearly, some of these poly A– sequences are introns of spliced coding gene transcripts and may or may not have further biological function once removed from the primary transcripts. However, in the cytosol, the amount of exclusively poly A– sequences is still two-fold greater than poly A+ sequences (Table 1A) indicating that there are processed mature poly A– transcripts.

Finally, a total of 36.9% of transcribed sequences are detected as poly A– and poly A+ (Table 1B). These
bimorphic sequences are distributed between the two subcellular compartments. It is important to note that detected bimorphic transcribed sequences may be two different transcripts, since transfrags do not identify the strand or specific full-length transcript. However, the presence of such a large proportion of bimorphic transcribed sequences suggests that novel regulatory mechanisms may be involved in the identification of transcripts whose polyadenylation states are altered as a means of regulation. Many of the detected bimorphic sequences are well-characterized coding genes found on the 10 analyzed chromosomes (table S3).

The observations derived from these studies provide some pause as to the state of our understanding concerning where and how the information from the human genome is organized. Many of these and other published observations indicate that our current understanding of the repertoire of transcripts made by the human genome is still evolving. A critical question which applies to both poly A– and poly A+ TUFs centers on the biological functions of these transcripts. Biochemical and genetic experimentatal approaches are
currently being utilized to answer this question. Until these experiments are completed, systematic identification, mapping and characterization of as many types of TUFs as possible will assist in understanding and appreciating the complexity of the human transcriptome.

References and Notes

1. E. S. Lander et al., Nature 409, 860 (2001).

2. J. C. Venter et al., Science 291, 1304 (2001).

3. P. Kapranov, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SPA, and Gingeras TR, "Large-Scale Transcriptional Activity in Chromosomes 21 and 22"., Science 296, 916 (2002).


4. D. Kampa, Cheng J, Kapranov P, Yamanaka M, Brubaker S, Cawley S, Drenkow J, Piccolboni A, Bekiranov S, Helt G, Tammana H and Gingeras TR, "Novel RNAs Identified From an In-Depth Analysis of the Transcriptome of Human Chromosomes 21 and 22"., Genome Res. 14, 331 (2004).


5. S. Cawley, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ, Wheeler R, Wong B, Drenkow J, Yamanaka M, Patel S, Brubaker S, Tammana H, Helt G, Struhl K, and Gingeras TR, "Unbiased Mapping of Transcription Factor Binding Sites along Human Chromosomes 21 and 22 Points to Widespread Regulation of Noncoding RNAs"., Cell 116, 499 (2004).


6. J. L. Rinn et al., Genes Dev. 17, 529 (2003).

7. R. Yelin et al., Nat. Biotechnol. 21, 379 (2003).

8. R. Martone et al., Proc. Natl. Acad. Sci. U.S.A. 100, 12247 (2003).

9. I. H. G. Consortium, Science 306, 636 (2004).

10. M. L. Birnstiel, M. Busslinger, K. Strub, Cell 41, 349 (1985).

11. C. Milcarek, R. Price, S. Penman, Cell 3, 1 (1974).

12. M. Salditt-Georgieff, M. M. Harpold, M. C. Wilsone, J. E. Darnell, Mol. Cell Biol. 1, 177 (1981).

13. P. K. Katinakis, A. Slater, R. H. Burdon, FEBS Letters 116, 1 (1980).

14. P. Bertone et al., Science 306, 2242 (2004).

15. D. Karolchik et al., Nucleic Acids Res. 31, 51 (2003).

16. W. J. Kent et al., Genome Res. 12, 996 (2002).

17. I. H. G. S. Consortium, Nature 431, 931 (2004).

18. J. Chen et al., Proc. Natl. Acad. Sci. U.S.A. 99, 12257 (2002).

19. Y. Kawabe et al., J. Biol. Chem. 276, 20364 (2001).

20. R. C. Inhorn, D. M. Tollefsen, Biochem. Biophys. Res. Commun. 137, 431 (1986).

21. D. D. Shoemaker et al., Nature 409, 922 (2001).

22. K. Yamada et al., Science 302, 842 (2003).

23. V. Stolc et al., Science 306, 655 (2004).

24. Y. Okazaki et al., Nature 420, 563 (2002).

25. T. Ota et al., Nat. Genet. 36, 40 (2004).

26. Genome Res. 14, 2121 (2004).

27. T. Imanishi et al., PLoS Biol. 2, 856 (2004).

28. M. Seki et al., J. Exp. Bot. 55, 213 (2004).

29. S. Saha et al., Nat. Biotechnol. 20, 508 (2002).

30. H. Bono et al., Genome Res. 13, 1318 (2003).

31. A. Peragine, M. Yoshikawa, G. Wu, H. L. Albrecht, R. S. Poethig, Genes Dev. 18, 2368 (2004).

32. F. Vazquez et al., Mol. Cell 16, 69 (2004).

33. M. Edmonds, M. G. Caramela, J. Biol. Chem. 244, 1314 (1969).

34. B. J. Snider, M. Morrison-Bogorad, Brain Res. Brain Res. Rev. 17, 263 (1992).

35. T. E. Geoghegan, G. E. Sonenshein, Brawerman, Biochemistry 17, 4200 (1978).

36. The authors would like to thank S. Cawley, C. Schaefer, and J. Manak for helpful discussions, M. Mittmann, and D. Le for design of photolithographic masks, D. Bartell for software, R. Wheeler for assistance on annotation database, H. Caley, H. Gorrell, and B. Wong for database support, J. Stevens for administrative support, and K. Kong for manuscript editing and management assistance.

All sequenced transcripts have been submitted to GenBank (accession numbers: AY927468-AY927642).

The supplemental materials, CEL file, graph file, transfrag, and RACE data are available at:
http://cgap.nci.nih.gov/Info/2005.1


Visual representations of the graph and transfrag data are available at:
http://genome.ucsc.edu/cgi-bin/hgTracks.

This project has been funded in part with federal funds from the National
Cancer Institute, National Institutes of Health, under
Contract No. N01-C0-12400, and by Affymetrix, Inc. The
content of this publication does not necessarily reflect the
views or policies of the Department of Health and Human
Services, nor does mention of trade names, commercial
products, or organization imply endorsement by the U.S.
government.

Supporting Online Material
http://www.sciencemag.org/cgi/content/full/1108625/DC1

Materials and Methods
Figs. S1 to S10
Tables S1 and S2
References

13 December 2004; accepted 15 March 2005
Published online 24 March 2005; 10.1126/science.1108625
Include this information when citing this paper.




Additional References:

1. Kiyosawa H, Mis N, Iwase S, Hayashizaki Y,  and Abe K, "Disclosing hidden transcripts: Mouse natural sense-antisense transcripts tend to be poly(A) negative and nuclear localized".

2. Storz G, Altuvia S, and Wassarman KM, "An Abundance of RNA Regulators".

3. Buskirk AR, Landrigan A, and Liu DR, "Engineering a Ligand-Dependent RNA Transcriptional Activator".

4. Love R, "Distribution of Ribonucleic Acid in Tumor Cells during Mitosis", Nature vol. 180, no. 4598, pp. 1338-1339 (December 14, 1957).

5. De Carvalho S, "Effect of RNA from Normal Human Marrow on Leukaemic Marrow In-Vivo", Nature, vol. 197, no. 4872, pp. 1077-1080, (March 16, 1963).


Links to RNA and Biological Causality:



Further Topics in:  Euchromatin,  active DNA, and  RNA  ribo-regulators:

Links to Euchromatin Activator RNA Reviews:
Links to Euchromatin Activator RNA Research:
Links to Ultrastructural Probes of DNase I-Sensitive Sites:
Links to RNA as a Therapeutic Agent:
Links to Hodgkin Lymphoma Immuno-Pathology:
Links to Activated T-Lymphocyte Immunotherapy:
Links to Medical Systems Biology:
Links to Selective Gene Transcription:
Links to RNA-Induced Epigenetics:
Links to RNA-Induced Embryogenesis:
Links to RNA and Biological Causality:
Links to Reprogramming and Neoplasia:

A Brief History of Activator RNA:

"Ultrastructural Probes of Active DNA Sites, and the RNA Activators of DNA". (PowerPoint Presentation).




Top of Page - Euchromatin Network - Current Research - Forums - Other Sites - Future Events -


For Further Information and Feedback:
E-mail: frenster@euchromatin.net
Phone:  +1 650 367 6483
Fax:  +1 650 364 1773


euchromatin: "the most active portion of the genome within the cell nucleus".