Published in: Nature Genetics vol. 38, no. 6, pp. 608 - 609 (June, 2006)
doi:10.1038/ng0606-608
http://www.nature.com/ng/journal/v38/n6/full/ng0606-608.html

"The multitasking genome".

Thomas R Gingeras

Affymetrix, Inc., 3380 Central Expressway, Santa Clara, California 95051, USA
tom_gingeras@affymetrix.com


The identification of hundreds of thousands of clusters of transcriptional start sites, many located within internal exons of protein-coding genes, indicates that promoter sites are common and that transcriptional organization is complex. This transcriptional architecture implies that most genomic regions serve multiple functions.

When is an exon not an exon?

Isolating and characterizing the full complement of transcripts made by each differentiated and undifferentiated cell of an organism has been a long-standing goal of both individual laboratories and various consortia. In humans, these efforts have conservatively yielded an estimated 1.5 full-length transcripts, on average, per gene locus, and the vast majority of these loci are estimated to code for proteins [1]. Data from the Encyclopedia of DNA Elements (ENCODE) project [2], however, indicate that the figure of 1.5 transcripts per locus is a very conservative estimate (unpublished data). More recently, interest in noncoding transcripts has risen sharply. Several independent groups have noted that there may be as much as an order of magnitude more transcription than can be accounted for by current protein-coding annotations [3, 4, 5]. Most of these newly identified transcripts seem to have greatly reduced protein-coding potential. Thus, much more of the genome is now known to be transcribed than was previously appreciated. On page 626 of this issue, Carninci and colleagues [6] in the FANTOM consortium provide additional insights into the properties of protein-coding and noncoding transcription.

When it's a promoter

Using data primarily from cap analysis of gene expression (CAGE), but also from extensive cDNA sequencing and paired-end ditag (PET) analysis, the authors report the location and properties of the transcriptional start sites (TSSs) of transcripts derived from 145 mouse libraries and 41 human libraries. They then used clusters of overlapping 20- to 21-nucleotide CAGE tags ('tag clusters') as markers to study the organization of transcriptional units across the genomes, as well as the sequence characteristics and properties of the candidate promoter regions lying immediately 5' of each tag cluster. A total of 159,075 mouse and 177,563 human tag clusters containing two or more tags were confidently identified and used for these studies.
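The idea of a 'tag cluster' can be illustrated with a minimal sketch. It assumes that CAGE tags have already been mapped to genomic coordinates (0-based, half-open intervals) and that a tag cluster is simply a maximal run of overlapping tags on the same strand, retained only if it contains at least two tags. The `Tag` and `TagCluster` types and the `cluster_tags` function are hypothetical illustrations, not the FANTOM pipeline, whose exact clustering rules may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tag:
    chrom: str
    strand: str  # '+' or '-'
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive


@dataclass
class TagCluster:
    chrom: str
    strand: str
    start: int
    end: int
    n_tags: int


def cluster_tags(tags: List[Tag], min_tags: int = 2) -> List[TagCluster]:
    """Merge overlapping same-strand tags into clusters; keep clusters with >= min_tags tags."""
    clusters: List[TagCluster] = []
    current: Optional[TagCluster] = None
    # Sorting by chromosome, strand and start lets overlapping tags be merged in one pass.
    for tag in sorted(tags, key=lambda t: (t.chrom, t.strand, t.start, t.end)):
        if (current is not None and tag.chrom == current.chrom
                and tag.strand == current.strand and tag.start < current.end):
            # Tag overlaps the growing cluster: extend it and count the tag.
            current.end = max(current.end, tag.end)
            current.n_tags += 1
        else:
            if current is not None:
                clusters.append(current)
            current = TagCluster(tag.chrom, tag.strand, tag.start, tag.end, 1)
    if current is not None:
        clusters.append(current)
    # Single-tag "clusters" are discarded, mirroring the two-or-more-tags criterion.
    return [c for c in clusters if c.n_tags >= min_tags]
```

Two overlapping plus-strand tags at positions 100-121 and 110-130 would yield one cluster spanning 100-130 with two tags, whereas an isolated tag would be dropped unless `min_tags` is lowered to 1.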

Locating the TSSs across the human and mouse genomes revealed several striking features of both annotated and previously unannotated transcripts. For example, 67% of the mouse protein-coding transcription units have a tag cluster within 20 nucleotides of the reported 5' end of the full-length cDNA, and, interestingly, most of these genes also have a second smaller tag cluster located at or near the 3' UTR. The authors also found that internal coding exons are often sites for other CAGE tags. These observations might be considered unexpected, but, in fact, they confirm and help clarify previous findings that protein-coding regions of genomes house additional overlapping protein-coding and noncoding transcripts on both strands as well as binding sites for several transcription factors and chromatin modifications found at the 5' ends of transcripts [7, 8, 9, 10]. That said, the observations by Carninci et al. [6] raise several additional questions: for example, how is activation of the promoters within a genic region regulated and coordinated with activation of promoters at the 5' end of the protein-coding transcript? Are both promoters operating at the same time? If not, how is coordination achieved between the two sets of promoters? If they are active in the same cells at the same time, is there any transcriptional interference?

The lay of the land

Arguably, the most striking set of observations in this manuscript concerns the characteristics of the identified promoter regions. In tag clusters with more than 100 CAGE tags, four patterns of tag distribution are observed across the proximal 5' sequences. In addition to the expected pattern of a single dense peak of tags, there are three patterns in which the tags are distributed over a wider genomic region. Two of these resemble the single-peak class in that the tags are densely concentrated in one (PB) or multiple (MU) peaks, but these peaks lie within a wider genomic region. In the third category (BR), the tags occupy a generally broad region without an apparent dense peak. These patterns of tag clustering led the authors to several astute correlations involving (i) the promoter sequences and the sequences of the TSSs proximal to each of the tag clusters, (ii) the use of bidirectional promoters and (iii) tissue-specific expression. One of these correlations stands out because of its implications for the large amount of overlapping transcription characteristic of mammalian genic regions. Although promoter regions containing CpG islands have previously been reported to be associated with bidirectional transcription [11], the data from Carninci et al. indicate that there are in fact two patterns of bidirectional transcription. In one, there are two separate tag clusters with no overlap; in the other, bidirectional initiation is characterized by overlapping transcription, with two promoters apparently firing in opposite directions and thus generating overlapping tag clusters.
Because bidirectional transcription is not thought to be a rare property of mammalian promoters, the prevalence of overlapping promoters prompts several questions about the regulation of these promoters and the RNAs they produce. As the authors indicate, full-length cDNAs have been isolated from such loci, so it would seem that transcriptional interference is somehow mitigated, assuming that both promoters are active in the same cell. How and why is this strategy of convergent transcription used?
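The distinction between the two bidirectional patterns can be made concrete with a small sketch. It assumes two tag clusters on opposite strands of the same chromosome, in 0-based half-open coordinates, and labels the pair 'overlapping' if the intervals intersect, or 'separate' if they do not overlap but lie head to head (the minus-strand cluster upstream of the plus-strand cluster) within a gap threshold. The `Cluster` type, the `classify_bidirectional` function and the `max_gap` default of 1,000 bp are hypothetical choices for illustration; the source does not specify how far apart separate bidirectional clusters may be.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cluster:
    chrom: str
    strand: str  # '+' or '-'
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive


def classify_bidirectional(a: Cluster, b: Cluster, max_gap: int = 1000) -> Optional[str]:
    """Return 'overlapping', 'separate' or None if the pair is not a bidirectional candidate.

    max_gap is a hypothetical threshold (in bp) for head-to-head, nonoverlapping clusters.
    """
    # A bidirectional pair needs one plus-strand and one minus-strand cluster on the same chromosome.
    if a.chrom != b.chrom or {a.strand, b.strand} != {"+", "-"}:
        return None
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # Half-open intervals overlap when each starts before the other ends.
    if plus.start < minus.end and minus.start < plus.end:
        return "overlapping"
    # Head to head: the minus-strand cluster lies upstream (to the left) of the plus-strand cluster.
    gap = plus.start - minus.end
    if 0 <= gap <= max_gap:
        return "separate"
    return None
```

A plus-strand cluster at 1,000-1,050 and a minus-strand cluster at 1,020-1,080 would be classed as overlapping; the same plus-strand cluster paired with a minus-strand cluster at 800-900 would be classed as separate, and one at 1,200-1,300 (transcribed toward, rather than away from, the plus-strand cluster) would not be treated as a bidirectional pair by this sketch.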

Curiouser and curiouser

Although the transcriptional network described by the authors is complex and, at first blush, daunting, its true complexity is likely to be even greater than indicated, for two reasons. First, each of the 159,075 mouse and 177,563 human tag clusters that were principally studied contains at least two TSSs, and the authors report that hundreds of thousands of additional single TSSs have been identified. Such single TSSs are often viewed with lower confidence, but a significant percentage of the sampled sites have been validated and also coincide with other independent lines of evidence. These single-site TSSs are likely to represent low–copy number transcripts or 5' termini that are poorly recovered by CAGE analysis. Second, because CAGE tags are only 20–21 nucleotides long, they carry no inherent information about the structure of the transcripts from which they derive. An important assumption used throughout the study is that the transcript associated with each tag cluster is the nearest annotated transcript. Additional independent lines of evidence [12, 13, 14] indicate that the use of multiple tissue-specific TSSs, which the authors describe within MU and BR promoters, may be taken to an extreme. Using TSSs to initiate transcription of genes other than the most proximal ones does not seem to be an uncommon strategy for most protein-coding and noncoding genes, and such tissue-specific alternative TSSs can be separated from the parent transcript by tens to hundreds of thousands of base pairs or more. This raises the likelihood that a proportion of the mapped tag clusters serve as start sites for other, distal protein-coding and noncoding transcripts, further complicating the transcriptional architecture of mammalian cells.
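The nearest-annotation assumption, and the way distal alternative TSSs can defeat it, can be sketched as follows. The sketch assumes each annotated transcript is reduced to its strand and annotated 5' end, and assigns a tag cluster position to the same-strand transcript whose 5' end is closest. The `Transcript` type and the `nearest_annotation` function are hypothetical, not the authors' procedure. A tag cluster that actually serves as a distal start site for a gene tens of kilobases away would still be assigned to whichever annotated 5' end happens to be nearer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    strand: str  # '+' or '-'
    tss: int     # annotated 5' end, 0-based


def nearest_annotation(chrom: str, strand: str, pos: int,
                       transcripts: List[Transcript]) -> Optional[Tuple[str, int]]:
    """Return (name, distance) of the same-strand transcript whose 5' end is closest to pos.

    Returns None when no transcript is annotated on that chromosome and strand.
    Ties are resolved in favour of the transcript listed first.
    """
    candidates = [t for t in transcripts if t.chrom == chrom and t.strand == strand]
    if not candidates:
        return None
    best = min(candidates, key=lambda t: abs(t.tss - pos))
    return best.name, abs(best.tss - pos)
```

For example, with hypothetical genes geneA (plus strand, 5' end at 1,000) and geneB (plus strand, 5' end at 50,000), a plus-strand tag cluster at 1,200 is assigned to geneA at a distance of 200 bp, even if in some tissue it actually initiates a transcript belonging to geneB; only independent evidence of transcript structure, such as full-length cDNAs, could reveal such a distal connection.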

The data generated by Carninci et al. paint a picture of the mammalian transcriptome as a highly complex, overlapping network of transcription. An important implication is that a larger proportion of genome sequence than previously thought has multiple functional roles. Exons and introns can harbor promoters, introns can contain alternative exons for overlapping transcripts on the same or opposite strand, and intergenic regions are likely to constitute a smaller percentage of the genome than is currently assumed. As such, correlating phenotypes with genotypes will require a greater understanding of all of the functional roles that a genomic locus may serve (Fig. 1).

Figure 1: Transcript-centric view of a gene (versus the protein-centric view).

The conventional (protein-centric) view of the genome is that protein-coding regions of the genome have few elements that serve more than one function (exon, intron, promoter, enhancer regions).

Mutation reference points (1, 2, 3, 4) are marked in red in the figure.
In a more transcript-centric view, mutations in distal 5' genomic regions (1) may affect a distal 5' exon or alternative promoter; mutations in intronic regions (2) may affect exons or promoters of overlapping antisense or same-strand transcripts; and mutations in exonic regions (3) or 3' intergenic regions (4) may affect promoters and exons of antisense transcripts. Importantly, there may be multiple overlapping transcripts within a genic region, resulting in many multifunctional genomic regions within a single gene. The phenotypes produced by mutations in each of these multifunctional regions may be similar (as shown) or varied.




References:

   1. International Human Genome Sequencing Consortium. Nature 431, 931–945 (2004).

   2. ENCODE Project Consortium. Science 306, 636–640 (2004). http://genome.gov/encode/

   3. Bertone, P. et al. Science 306, 2242–2246 (2004).

   4. Carninci, P. et al. Science 309, 1559–1563 (2005).

   5. Kapranov, P. et al. Science 296, 916–919 (2002).

   6. Carninci, P. et al. Nat. Genet. 38, 626–635 (2006). http://www.nature.com/ng/journal/v38/n6/abs/ng1789.html

   7. Bernstein, B.E. et al. Cell 120, 169–181 (2005).

   8. Cawley, S. et al. Cell 116, 499–509 (2004).

   9. Cheng, J. et al. Science 308, 1149–1154 (2005).

  10. Kim, T.H. et al. Nature 436, 876–880 (2005).

  11. Trinklein, N.D. et al. Genome Res. 14, 62–66 (2004).

  12. Akiva, P. et al. Genome Res. 16, 30–36 (2006).

  13. Kapranov, P. et al. Genome Res. 15, 987–997 (2005).

  14. Parra, G. et al. Genome Res. 16, 37–44 (2006).
