"Reverse Serial Analysis of Gene Expression (SAGE) Characterization of Orphan SAGE Tags from Human Embryonic Stem Cells Identifies the Presence of Novel Transcripts and Antisense Transcription of Key Pluripotency Genes".
Mark Richards a, Siew-Peng Tan a , Woon-Khiong Chan b , and Ariff Bongso a
a Department of Obstetrics and Gynaecology, National University
of Singapore, National University Hospital, Singapore;
b Department of Biological Sciences, National University
of Singapore, Singapore
Correspondence:
Woon-Khiong Chan, Ph.D., Department of Biological Sciences, National
University of Singapore, 14 Science Drive 4, Singapore 117543. Telephone:
65-6516-8096; Fax: 65-6779-2486;
e-mail: dbscwk@nus.edu.sg
Ariff Bongso, Ph.D., D.Sc., Department of Obstetrics and Gynaecology, National University of Singapore, National University Hospital, Singapore 119074. Telephone: 65-6772-4129; Fax: 65-6779-4753; e-mail: obgbongs@nus.edu.sg
Serial analysis of gene expression (SAGE) is a powerful technique for the analysis of gene expression. A significant portion of SAGE tags, designated as orphan tags, however, cannot be reliably assigned to known transcripts. We used an improved reverse SAGE (rSAGE) strategy to convert human embryonic stem cell (hESC)-specific orphan SAGE tags into longer 3' cDNAs. We show that the systematic analysis of these 3' cDNAs permitted the discovery of hESC-specific novel transcripts and cis-natural antisense transcripts (cis-NATs) and improved the assignment of SAGE tags that resulted from splice variants, insertion/deletion, and single-nucleotide polymorphisms. More importantly, this is the first description of cis-NATs for several key pluripotency markers in hESCs and mouse embryonic stem cells, suggesting that the formation of short interfering RNA could be an important regulatory mechanism. A systematic large-scale analysis of the remaining orphan SAGE tags in the hESC SAGE libraries by rSAGE or other 3' cDNA extension strategies should unravel additional novel transcripts and cis-NATs that are specifically expressed in hESCs. Besides contributing to the complete catalog of human transcripts, many of them should prove to be a valuable resource for the elucidation of the molecular pathways involved in the self-renewal and lineage commitment of hESCs.
Pluripotent human embryonic stem cell (hESC) lines are derived on fibroblast feeder layers via the isolation and extended serial propagation of the inner cell mass from supernumerary 5-day-old blastocysts [1–3]. They have offered much hope by promising to revolutionize the future of regenerative medicine through the provision of novel cell replacement therapies to treat a variety of debilitating diseases, such as myocardial infarcts, diabetes, and Parkinson's disease [4, 5]. The molecular mechanisms controlling pluripotency and self-renewal in hESCs are presently not well understood [6]. Transcriptome profiling studies using DNA microarrays [7–11], serial analysis of gene expression (SAGE) [12], expressed sequence tag (EST) enumeration [13], and massively parallel signature sequencing (MPSS) [14, 15] have elucidated gene networks and putative signaling pathways that are believed to be essential in the maintenance of the hESC phenotype. Recent studies have implicated the WNT and transforming growth factor-β/activin/nodal pathways in the maintenance of pluripotency in hESCs [16, 17]. Transcriptome studies have shown that key components of these two pathways are active or highly expressed in hESCs. In addition, SAGE and other gene expression profiling studies have suggested that stem cells, in particular hESCs, express numerous uncharacterized or novel transcripts, many of which are likely to represent novel genes [12–15, 18, 19].
SAGE is a sequence-based transcriptome profiling approach that provides qualitative and quantitative assessment of gene expression [20]. The underlying principle assumes that a short nucleotide sequence, or SAGE tag, located at the last anchoring enzyme (C-most) site contains sufficient information to represent a specific transcript. Often the NlaIII restriction enzyme is used, and the length of the SAGE tag can range from 14 (SAGE) to 21 (LongSAGE) or 26 base pairs (bp) (SuperSAGE), depending on the tagging enzymes used [20–22]. The digital nature of SAGE tags means that cumulative SAGE data can easily be merged, allowing large-scale comparisons between independent libraries. The sequencing of concatemerized SAGE tags also permits a high-throughput determination of the transcriptome compared with EST sequencing. Besides being a robust method that reflects accurately the actual relative levels of mRNA transcripts, SAGE also allows transcripts that are expressed at low levels to be efficiently detected [23, 24]. However, the reliance on short sequence tags for gene identification imposes limitations on the precision and accuracy of gene identification. For instance, a SAGE tag may match multiple mRNA transcripts, making gene assignment difficult, although with the advent of LongSAGE and SuperSAGE, this problem has been largely solved. A more daunting problem is that many SAGE tags do not appear to match known mRNA transcripts or genes. In poorly characterized transcriptomes, such as those from hESCs [12] and hematopoietic stem cells [18, 19], such orphan SAGE tags could reach as much as 40%. A recent study has shown that approximately 70% of orphan SAGE tags are indeed derived from bona fide transcripts [24], reinforcing the view that SAGE is indeed a powerful method for novel gene discovery.
This suggests that a large number of the orphan SAGE tags that we have uncovered in the hESC transcriptome are true representatives of novel genes, transcripts, or splice variants [12], although the total number of genes present in the human genome is estimated at a conservative 30,000–40,000 [25, 26].
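The tagging principle described above can be illustrated with a short sketch. It is not part of the published protocol: it assumes that the quoted tag lengths (14, 21, and 26 bp) include the four-base NlaIII site (CATG), that the input is the sense-strand transcript sequence without its poly(A) tail, and the function name `extract_tag` is hypothetical.

```python
from typing import Optional

ANCHOR_SITE = "CATG"  # NlaIII recognition sequence

# Total tag lengths quoted in the text; assumed here to include the CATG site.
TAG_LENGTHS = {"SAGE": 14, "LongSAGE": 21, "SuperSAGE": 26}


def extract_tag(transcript: str, method: str = "SAGE") -> Optional[str]:
    """Return the tag that starts at the 3'-most CATG site, or None.

    None is returned when the transcript has no NlaIII site or when the
    site lies too close to the 3' end to yield a full-length tag.
    """
    seq = transcript.upper()
    length = TAG_LENGTHS[method]
    site = seq.rfind(ANCHOR_SITE)  # last (3'-most) anchoring enzyme site
    if site == -1:
        return None
    tag = seq[site:site + length]
    return tag if len(tag) == length else None


# Toy transcript with two NlaIII sites; only the 3'-most one defines the tag.
toy = "AAACATGCCCCCCCCCCGGGCATGTGTTCTGGAGTTTTTTTTTTTTAAA"
print(extract_tag(toy))              # CATGTGTTCTGGAG
print(extract_tag(toy, "LongSAGE"))  # CATGTGTTCTGGAGTTTTTTT
```

A transcript without any CATG site returns `None`, which is one reason some transcripts are not represented in a SAGE library built with a single anchoring enzyme.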
Another major source of uncertainty in SAGE tag-to-transcript assignment lies in the widespread presence of single-nucleotide polymorphisms (SNPs) within the human genome; SNPs occur as frequently as once every 100–300 bases [27, 28]. Occurrence of SNPs within the SAGE tag sequence or within the tagging restriction enzyme site will result in the assignment of an alternative SAGE tag. In a recent large-scale study of the SAGE database, at least one SNP-associated alternative SAGE tag was observed for 8.6% of all known human genes when the influence of SNPs and small insertion/deletion polymorphisms on SAGE tags was taken into consideration [29]. Indeed, the presence of this class of alternative SAGE tags has led to an underestimation of the expression of certain genes (e.g., GAL) and erroneously identified others (e.g., BTF3) as being specific to hESCs [12].
Naturally occurring antisense transcripts (NATs) have been recently reported in a variety of metazoan species [30, 31], and it is likely that a significant portion of the hESC orphan SAGE tags are derived from NATs. There are two main classes of NATs. The cis-encoded NAT (cis-NAT) is transcribed from the opposite strand of the same genomic locus and has the potential to form a long complementary duplex with the sense RNA transcript. In contrast, the trans-encoded NAT (trans-NAT) is transcribed from another distinct genomic locus, possibly a pseudogene [31], and is generally short and forms an imperfect duplex with its sense transcript. The human genome has been shown to express NATs widely [32–34], with as many as 20% of human genes forming sense-antisense (SA) transcript pairs [35]. For instance, hESCs have been reported to express a unique set of microRNAs, which belong to a class of trans-NATs [36]. A recent large-scale EST project has provided an important resource of full-length cDNAs for hESCs [13]. But like the >5 million ESTs that are available [37], they are difficult to use to verify the expression of NATs because many ESTs have not been directionally cloned [31, 32]. In contrast, SAGE tags are directionally reliable, as they are generated from well-defined restriction sites at the 3' end of each RNA transcript. Thus, large SAGE datasets contain latent information on both sense and antisense transcription [38]. Interestingly, tags matching mRNAs or ESTs in antisense orientation were first observed in SAGE libraries constructed from Plasmodium falciparum [39, 40].
Without additional sequence information, it is difficult to characterize
orphan SAGE tags from hESCs and identify the transcripts they represent.
Several polymerase chain reaction (PCR)-based strategies have been
developed, including reverse SAGE (rSAGE) [41, 42],
generation of longer cDNA fragments from SAGE tags for gene identification
(GLGI) [43, 44], and rapid analysis of unknown
SAGE-tag-PCR [45]. In this report, we have modified
the original rSAGE protocol [41, 42], which is also
similar to the GLGI [43, 44], and used it to obtain
additional 3' cDNA sequence information for a select group of orphan SAGE
tags that are expressed specifically in hESCs. Our results identified novel
transcripts unique in their expression to hESCs, transcripts that displayed
alternative polyadenylation, and novel splice variants of known genes.
More importantly, we found NATs for several pluripotency genes,
including POU5F1 and NANOG. Collectively, the unique 3' ESTs
derived from orphan hESC SAGE tags (HESTs) will be an important resource
in downstream functional analyses and the concerted dissection of molecular
pathways critical to the pluripotent phenotype of hESCs.
MATERIALS AND METHODS
Culture of hESCs
hESCs (HES3 line, passages 19–25; ES Cell International, Singapore, http://www.escellinternational.com) were cultured on a feeder layer of mitomycin-C inactivated mouse embryonic fibroblasts (MEFs) as described previously [2]. HES3 cell colonies were passaged by mechanically cutting small clumps of undifferentiated HES3 (UD-HES3) cells and transferring these fragments to fresh MEF feeders at 7–8-day intervals [2, 46]. Differentiated HES3 (D-HES3) cells were obtained by prolonged (20-day) high-density culture on MEFs [12].
Total RNA Isolation
Total RNA was extracted from hESCs using TRIZOL (Invitrogen, Carlsbad, CA, http://www.invitrogen.com), whereas total RNA from the various somatic and fetal tissues was obtained commercially (Clontech, Palo Alto, CA, http://www.clontech.com). Prior to rSAGE library construction or reverse transcription (RT)-PCR, total RNA was treated with DNase I (Ambion, Austin, TX, http://www.ambion.com) to remove any residual genomic DNA contamination, and PCR using β-actin primers (forward, 5'-GATGCAGAAGGAGATCACTGC-3'; reverse, 5'-CACCTTCACCGTTCCAGTTT-3'), designed to span the last intron-exon boundary of the gene, was carried out to confirm the absence of genomic DNA.
cDNA Synthesis, NlaIII Digestion, and Linker Ligation
A schematic for the rSAGE library construction with all primer and
linker sequences is depicted in Figure 1. cDNA synthesis
was carried out using the Superscript II double-stranded cDNA synthesis
kit (Invitrogen) with 10 µg of total RNA from HES3 cells and a biotinylated
primer (5'-biotin-ATTGGCGCGCCGCGAGCACTGAGTCAATACGAT30VN-3'; Integrated
DNA Technologies, Coralville, IA, http://www.idtdna.com).
Double-stranded cDNA was digested with NlaIII (New England Biolabs, Ipswich,
MA, http://www.neb.com) to generate 3'
overhangs. The biotinylated cDNAs were immobilized on streptavidin-magnetic
beads (Invitrogen). Annealed linkers, A1 (5'-AAGCAGTGGTATCAACGCAGAGTCATG-3')
and A2 (5'-phosphate-ACTCTGCGTTGATACCACGCTT-aminoC7-3'), were ligated
to the 5' end of NlaIII-digested cDNA before AscI (New England Biolabs)
digestion was performed to release the 3' cDNA fragments from the
streptavidin-magnetic beads.
Figure 1. Schematic diagram of the modified rSAGE protocol.
Briefly, mRNA was isolated, and cDNA synthesis was performed with an anchored biotin-labeled RT primer. cDNAs were digested with NlaIII to reduce complexity of the library. An rSAGE linker was next ligated to cleaved 3' cDNAs bound to streptavidin beads, following which AscI digestion was performed to release the cDNAs. rSAGE library scale-up amplification was performed with the rSAGEF1 and rSAGER1 primers. An aliquot of the amplified rSAGE library was used in rSAGE amplifications with a serial analysis of gene expression tag-specific primer and the common Rev1 reverse primer. Abbreviations: HEST, human embryonic stem cell serial analysis of gene expression tag; PCR, polymerase chain reaction; rSAGE, reverse serial analysis of gene expression; RT, reverse transcription; TSP, tag-specific primer.
PCR Scale-Up of rSAGE Library
Amplification of the primary rSAGE library was performed with 1 µl of the NlaIII-digested cDNAs, 5 U of Platinum Taq Polymerase (Invitrogen), and the rSAGEF1 (5'-AAGCAGTGGTATCAACGCAGAGT-3') and rSAGER1 (5'-GCGAGCACTGAGTCAATACGC-3') primers (350 ng each). After an initial denaturation at 94°C for 2 minutes, PCR was carried out for 25 cycles at 94°C for 45 seconds, 57°C for 1 minute, and 72°C for 1 minute, with a final extension at 72°C for 5 minutes.
Selection of Orphan SAGE Tags and Design of Tag-Specific rSAGE Primers
The 200 orphan SAGE tags selected for rSAGE were identified through a pairwise comparison of HES3 SAGE data against pooled data from 21 human SAGE libraries [12]. The SAGE tag-to-gene database used for gene identification was based on UniGene Build 160 (http://www.ncbi.nih.gov/SAGE/). The majority of the orphan SAGE tags selected were upregulated in HES3 compared with the pooled human SAGE libraries (p < .001; fold difference >4). A table describing the SAGE tags, sequences of the SAGE tag-specific rSAGE primers (TSRPs), and their respective frequencies in tags per million (tpm) in the pooled human, HES3, and HES4 SAGE libraries is provided as supplemental online Table 1. For those HES SAGE tags where LongSAGE tags were available, which were obtained through comparison with a HES3 LongSAGE library, the TSRPs were designed using the Primer3 software (http://frodo.wi.mit.edu) [46]. Typically, they included the entire 21 bases of the LongSAGE tag or they included an additional four to eight bases of the common linker (CGCAGAGT) and up to 19 bp of the LongSAGE tag. If no appropriate LongSAGE tag was available (Tag IDs 1–77), the TSRPs were designed with seven bases of the common linker sequence (GCAGAGT) and the entire 14 bases of the SAGE tag, with the exception of Tag IDs 30 and 72.
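The primer layout described above can be sketched as a simple string operation. This is an illustration only: it assumes that the 14-base SAGE tag includes the CATG site and that the linker bases are taken from the 3' end of linker A1 immediately upstream of CATG. The function name `design_tsrp` is hypothetical, and the sketch does not reproduce the Primer3-based selection or any melting temperature check.

```python
from typing import Optional

# Linker A1 from the Methods, without its terminal CATG (assumed layout).
LINKER_BODY = "AAGCAGTGGTATCAACGCAGAGT"


def design_tsrp(tag: str, linker_bases: int = 7,
                max_tag_bases: Optional[int] = None) -> str:
    """Build a tag-specific rSAGE primer from linker bases plus the tag.

    tag           -- SAGE or LongSAGE tag, assumed to start with CATG
    linker_bases  -- number of 3'-terminal linker bases to prepend (0 for none)
    max_tag_bases -- optional cap on how much of the tag is used
    """
    tag = tag.upper()
    if not tag.startswith("CATG"):
        raise ValueError("tag is expected to begin with the NlaIII site CATG")
    if max_tag_bases is not None:
        tag = tag[:max_tag_bases]
    prefix = LINKER_BODY[-linker_bases:] if linker_bases > 0 else ""
    return prefix + tag


# Orphan tag TGTTTTGGAG from the Results, written with its CATG site.
print(design_tsrp("CATGTGTTTTGGAG"))  # GCAGAGTCATGTGTTTTGGAG
```

With the default of seven linker bases, a 14-base tag gives a 21-nucleotide primer; passing `linker_bases=8, max_tag_bases=19` mimics the longest LongSAGE layout mentioned in the text.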
rSAGE Amplification Reaction and Characterization of 3' rSAGE Fragments
Touchdown PCRs were performed using an initial denaturation cycle at 94°C for 2 minutes, followed by four cycles at 94°C for 45 seconds, 63°C for 1 minute, and 72°C for 1 minute; four cycles at 94°C for 45 seconds, 60°C for 1 minute, and 72°C for 1 minute; 25 cycles at 94°C for 45 seconds, 58°C for 1 minute, and 72°C for 1 minute; and a final extension step at 72°C for 5 minutes. The reaction setup for rSAGE PCR was as follows: 1 µl of amplified rSAGE library, 1 U of Platinum Taq Polymerase, 350 ng of TSRP and rSAGER1 primer. The PCR products were run on a 1.2% TAE agarose gel, and the bands were excised and purified using the QIAquick Gel Extraction Kit (Qiagen, Valencia, CA, http://www.qiagen.com). Purified PCR products (2–4 µl) were ligated into the pGEM-T Easy Vector (0.5 µl) (Promega, Madison, WI, http://www.promega.com) using T4 DNA ligase. The ligation reaction was incubated overnight at 16°C and resuspended in 8 µl of sterile water. Electroporation was performed using 1 µl of the ligated products and 25 µl of pTOP10 cells (Invitrogen). The transformants were plated on selective media, and two to four clones were picked for each rSAGE PCR product. Plasmid DNA was extracted using the QIAprep Spin Miniprep Kit (Qiagen). Sequencing reactions were carried out with Big Dye v3.1 (Applied BioSystems, Foster City, CA, http://www.appliedbiosystems.com) and the M13 Forward primer. The sequenced products were analyzed on an ABI 3100 DNA Sequencer (Applied BioSystems).
Sequence Analysis and Identification of Genuine rSAGE PCR Products
A bona fide 3' rSAGE product was defined as possessing the entire SAGE tag sequence, the rSAGER1 primer sequence and a poly(A) tract of >10 adenine residues. Sequences that lacked any one of the three were considered nonspecific amplification artifacts and omitted from further analysis. The rSAGE 3' EST sequences were searched against the GenBank Database (NR, dbEST, and human genome) using BLASTN (http://www.ncbi.nlm.nih.gov/BLAST/), the University of California Santa Cruz human genome browser database (May 2004 build) using the BLAT program (http://genome.ucsc.edu/cgi-bin/hgBlat) and the EMBL database using a web interface-based batch BLAST program (http://biomedicum.csc.fi:8010/cgi-bin/batchblast.cgi) [20].
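The three-part definition of a bona fide product lends itself to a simple sequence filter. The sketch below is a hedged illustration rather than the authors' pipeline: the function names are hypothetical, the read is assumed to be in sense orientation, the primer is accepted in either orientation, and "more than 10 adenine residues" is interpreted as an uninterrupted run of at least 11 A's.

```python
import re

# rSAGER1 as printed in the Methods (line-break hyphens removed).
RSAGER1 = "GCGAGCACTGAGTCAATACGC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_bona_fide(read: str, tag: str, primer: str = RSAGER1,
                 min_poly_a: int = 11) -> bool:
    """Check the three criteria: SAGE tag, rSAGER1 primer, and poly(A) tract."""
    read = read.upper()
    has_tag = tag.upper() in read
    has_primer = primer in read or revcomp(primer) in read
    has_poly_a = re.search("A{%d,}" % min_poly_a, read) is not None
    return has_tag and has_primer and has_poly_a


# Synthetic read: tag, filler, 12-base poly(A) tract, primer-derived end.
read = "CATGTGTTTTGGAG" + "ACGT" * 5 + "A" * 12 + revcomp(RSAGER1)
print(is_bona_fide(read, "CATGTGTTTTGGAG"))  # True
```

A read missing any one of the three features is rejected, mirroring the rule that such sequences were treated as nonspecific amplification artifacts.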
An rSAGE sequence was classified as novel if no matches to a transcript sequence (known gene, mRNA, or EST) were found. A sequence was considered to represent a known gene if it matched a full-length transcript sequence with >95% similarity in the same orientation. A sequence was classified as known EST if it matched an EST or open reading frame (ORF) with >95% similarity in the same orientation. A sequence was classified as an SNP alternative tag if it contained a single-bp mismatch within the SAGE tag sequence or NlaIII site. A sequence was classified as an insertion/deletion if it contained an insertion or deletion of fewer than three nucleotides within the SAGE tag sequence. A sequence was classified as an antisense transcript if it matched with high similarity to known transcripts in the opposite orientation. A sequence was classified as poly(A) if it was near the end of the poly(A) tract. Finally, a sequence was considered an alternative isoform if it matched the middle of known full-length transcripts in the same orientation and contained a poly(A) tract immediately downstream of the matched region. Genomic coordinates of the 3' SAGE ESTs were annotated based on the University of California Santa Cruz genome browser annotation database (http://genome.ucsc.edu/).
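The classification rules above can be summarized as a small decision function. This is a sketch under stated assumptions, not the procedure actually used: the `Hit` record and its field names are hypothetical, the order in which the rules are tested is an assumption, and the 95% threshold is also applied to antisense matches, for which the text says only "high similarity".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Hypothetical summary of the best BLAST/BLAT match for one sequence."""
    kind: str                     # "gene" (full-length transcript) or "est"
    identity: float               # percent similarity of the match
    same_strand: bool             # True if matched in the same orientation
    tag_mismatches: int = 0       # single-bp mismatches in tag or NlaIII site
    tag_indel: int = 0            # inserted/deleted nucleotides within the tag
    internal_polya: bool = False  # mid-transcript match, poly(A) just downstream


def classify(hit: Optional[Hit], polya_only: bool = False,
             min_identity: float = 95.0) -> str:
    """Assign one category following the rules in the text (assumed order)."""
    if polya_only:
        return "poly(A)"
    if hit is None:
        return "novel"
    if hit.identity <= min_identity:
        return "unclassified"  # weak matches are not covered by the stated rules
    if not hit.same_strand:
        return "antisense transcript"
    if hit.tag_mismatches == 1:
        return "SNP alternative tag"
    if 0 < hit.tag_indel < 3:
        return "insertion/deletion"
    if hit.internal_polya:
        return "alternative isoform"
    return "known gene" if hit.kind == "gene" else "known EST"


print(classify(None))                                   # novel
print(classify(Hit("gene", 99.0, same_strand=False)))   # antisense transcript
```

Testing orientation before the tag-level rules means that a sequence matching a known transcript on the opposite strand is always reported as antisense; a different ordering would be equally defensible.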
RT-PCR Confirmation of Novel 3' cDNAs
First-strand synthesis was performed using the SuperScript first-strand synthesis system (Invitrogen). One µl of first-strand reaction was used for each PCR together with 50 pmol of forward and reverse primers. Initial denaturation was carried out at 94°C for 2 minutes, followed by 30 cycles of PCR (94°C for 30 seconds, 55°C for 30 seconds, 72°C for 1 minute), and a final extension cycle at 72°C for 5 minutes. PCRs were loaded on a 1.5% agarose gel and size fractionated. In instances where the 3' cDNA sequence obtained was short and no suitable primer pairs could be found, additional 5' genomic sequences were used to anchor the forward primers. In all cases, the reverse primer primed from the rSAGE 3' cDNA sequence. Primers used were as follows. ACTB: product 400 bp, 5'-TGGCACCACACCTTTCTACAATGAGC-3', 5'-GCACAGCTTCTCCTTAATGTCACGC-3'; POU5F1: product 247 bp, 5'-CGRGAAGCTGGAGAAGGAGAAGCTG-3', 5'-CAAGGGCCGCAGCTTACACATGTTC-3'; HEST97: product 160 bp, 5'-CCTTTGTCATGAGCCCTTGT-3', 5'-GGAATGAAAGAATGGTTGCTC-3'; HEST101: product 119 bp, 5'-AAGAGCCTGCTACGGAACTG-3', 5'-TCACTAGAGGTTTCCAACACACTT-3'; HEST120: product 159 bp, 5'-AAATTTGGTGCTGTGACTCG-3', 5'-GCGGGCTGAGTCGGATTT-3'; HEST123: product 200 bp, 5'-GGGTTATGTGTAGAAACCAAGTGA-3', 5'-TCTTAGAACTTATGATACACCCAGTTG-3'; HEST127: product 218 bp, 5'-GGGAAAAGATGGCAAGGTTA-3', 5'-AATATATTCGAGTCACATCATGACA-3'; HEST146: product 171 bp, 5'-GATGCCATCACTCAAACTAGACC-3', 5'-GACGTCCTATGCAGGCATTT-3'; HEST147: product 205 bp, 5'-GGGGATTCGAGGTTCCTGTA-3', 5'-CATTTCAAGGCACAATTTTAATAGC-3'; HEST149: product 196 bp, 5'-CCCAGGCTGAAGTGTAGTGA-3', 5'-CATTTACAATGGTACAAGGAGCA-3'. The universal reference RNA sample was obtained from Stratagene (La Jolla, CA, http://www.stratagene.com), and somatic tissue RNA samples were obtained from Clontech.
Orientation-Specific RT-PCR
To detect the NATs for POU5F1, NANOG, LIN28, TALE, TERF1, and TERA,
orientation-specific first-strand cDNA synthesis was carried out with the appropriate
sense primers. Thereafter, Superscript II RT was heat-inactivated at 95°C
for 15 minutes. PCR was performed with 3 µl of the 20-µl first-strand
mix as described. Control experiments without reverse transcription
(RT controls) for each of the three antisense primers were performed to
detect genomic DNA contamination. The primers used were as follows. POU5F1
NAT: product 184 bp, 5'-AGTTTGTGCCAGGGTTTTTG-3', 5'-TGTGTCCCAGGCTTCTTTATTT-3';
NANOG NAT: product 278 bp, 5'-TCGGTATTGTTTGGGATTGG-3', 5'-TCATCGAAACACTCGGTGAA-3';
LIN28 NAT: product 178 bp, 5'-GGAGGCCAAGAAAGGGAATA-3', 5'-CCGCCCCATAAATTCAAGAT-3';
TALE NAT: product 80 bp, 5'-TTTTCAGACTGTGCAATACTTAGAGAA-3',
5'-TTAGACAGTATGTGGGCATCC-3'; TERF1 NAT: product 169 bp, 5'-TGCGGAGTAGATGAGATGGA-3',
5'-AAGGCAATGGAAAACAGGTAAA-3'; TERA NAT: product 131 bp, 5'-TTTTGGCTGCAGTATTGGTG-3',
5'-CATCCTACAGGCAAAGAGAGG-3'.
RESULTS
rSAGE Amplification, Specificity, Efficiency, and Size Distribution of 3' cDNAs
The original rSAGE (Kinzler/Vogelstein laboratories) [41, 42], the GLGI-SAGE protocol [43, 44], and our modified rSAGE strategy share several key features (Fig. 1). However, we have made several modifications to increase the efficiency of 3' cDNA conversion. For instance, changes in the design of the universal primers allowed the rSAGE library scale-up and the subsequent TSRP PCR amplification to be carried out at an increased melting temperature (Tm). The introduction of a longer poly(T) tract (T30) and the inclusion of VN dinucleotides in first strand RT-PCR primer allowed a better trapping and synthesis of full length mRNAs at their 3' ends, compared with a shorter poly(T) tract (T10) as used in the GLGI strategy that might result in the primer binding to internal poly(A) residues within mRNA transcripts. Finally, increasing the Mg2+ concentration when no distinct rSAGE band was observed in the first round of PCR could occasionally enhance the specificity of the rSAGE amplification reaction.
Of the 200 HES3 orphan SAGE tags that were selected for rSAGE conversion
(supplemental online Table 1), 168 (84.0%) yielded PCR
amplification products (Fig. 2A). The conversion rate
of orphan LongSAGE tags into longer 3' cDNA fragments was much higher (93.4%)
than that of the SAGE tags (69.2%). We attributed these improvements to
the availability of additional sequences from the LongSAGE tags for the
design of TSRPs, as well as better-designed universal primers (rSAGEF1
and rSAGER1) in our strategy (Fig. 1). In particular,
we found the universal M13 primer used as the antisense primer in the original
rSAGE strategy [41, 42] was unsatisfactory for rSAGE
because of its low Tm.
Figure 2. Results of reverse serial analysis of gene expression
(rSAGE) amplification for 200 orphan serial analysis of gene expression
(SAGE) tags.
(A): Pie chart shows the distribution of rSAGE products (3' EST: 131; Non-Specific: 37; No PCR band: 32).
(B): rSAGE reactions were carried out using the tag-specific rSAGE and rSAGER1 primers, the products were analyzed on an agarose gel, and the bands were visualized with ethidium bromide. Most lanes show a single distinct amplified rSAGE band. A 100-bp ladder (M) was used as a molecular weight marker. The numbers at the top of the gel represent the SAGE Tag ID. Abbreviations: EST, expressed sequence tag; M, molecular weight marker; PCR, polymerase chain reaction.
A representative agarose gel showing the rSAGE products is shown in Figure 2B. It is noteworthy that the majority (~90%) of the TSRPs yielded only a single distinct rSAGE band. Our results also support the notion that there is no strict correlation between the efficiency of target template amplification and the abundance of the SAGE tag [29], unlike earlier reports on GLGI-SAGE [44, 45]. Other variables, such as SAGE tag length and primer sequence, may be equally important parameters influencing the efficiency of target amplification. As shown in Figure 2B, the rSAGE amplification generally generated intense bands that were easily gel-purified, although amplification of SAGE tags with a lower copy number (<20 tpm) yielded less PCR product and in some cases (Tag IDs 156 and 169) produced one or multiple faint bands that were difficult to gel purify; these bands were not analyzed. When two or more distinct rSAGE bands were obtained (Tag IDs 126, 141, and 148), they usually turned out to be discrete 3' cDNA fragments. In most GLGI reports, conversion to 3' cDNAs is usually attempted for SAGE tags with a high copy number [18, 19]. In contrast, a large proportion (68%) of the orphan SAGE tags we attempted to convert to 3' cDNAs were present at lower frequencies (50 tpm). We also managed to obtain genuine rSAGE products for SAGE tags with frequencies of as low as 5 tpm, which is equivalent to the detection of a singleton in the HES3 SAGE library (HESTs 79, 147, and 174; supplemental online Table 1). In conclusion, it appears that our modified rSAGE protocol has some improvements over the original rSAGE protocol [41, 42] and was as efficient as GLGI-SAGE [43, 44] and GLGI-MPSS [47].
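The statement that 5 tpm corresponds to a singleton can be unpacked with the tags-per-million normalization. The sketch below is a worked arithmetic example only; the implied library size is an inference from the two numbers quoted in the text (one tag, 5 tpm) and is therefore approximate, since the reported tpm value may have been rounded. The function names are hypothetical.

```python
def tags_per_million(count: int, library_size: int) -> float:
    """Normalize a raw tag count to tags per million (tpm)."""
    return count * 1_000_000 / library_size


def implied_library_size(count: int, tpm: float) -> float:
    """Library size implied by a raw count and its reported tpm value."""
    return count * 1_000_000 / tpm


# A singleton reported at 5 tpm implies a library of roughly 200,000 tags.
size = implied_library_size(count=1, tpm=5)
print(round(size))                       # 200000
print(tags_per_million(1, round(size)))  # 5.0
```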
From the 168 SAGE tags that yielded PCR amplification products, a total of 196 rSAGE products were cloned and sequenced. Of these, 148 (75.5%) were confirmed as specific rSAGE products following DNA sequencing, BLAST and BLAT confirmation (supplemental online Table 2). These 148 rSAGE 3' cDNA fragments have been submitted to GenBank (accession numbers DN604327–DN604453), and we will refer to these cDNA sequences hereafter as HESTs. When TSRPs were designed using the LongSAGE tags, the overall amplification specificity reached 80.5% compared with GLGI-SAGE specificities that varied between 60% for low-copy SAGE tags and 80% for high-copy SAGE tags [43, 44]. Many of the nonspecific rSAGE fragments lacked a poly(A) tract and the rSAGER1 primer and were generated mainly because of mispriming at the 3' ends (supplemental online Table 3). Finally, although the hESC lines used in our earlier SAGE study [12] and for the present rSAGE library construction were grown on MEF feeders, we did not find contaminating murine RNA transcripts a significant problem in our 3' rSAGE conversion attempts.
Overall, 16.0% of rSAGE reactions failed to give distinct amplification products. Taken together with the nonspecific rSAGE results, our main conclusion is that a SAGE tag does not always provide an ideal sequence for the design of thermodynamically favorable TSRPs for the efficient amplification of 3' cDNA by rSAGE. Thus, orphan SAGE tags that were AT-rich or contained sequences that were self-complementary often failed to generate specific rSAGE 3' cDNA fragments. Although it is possible that when the expression level of targeted templates is very low, partial annealing of the TSRPs with other highly expressed templates may result in nonspecific amplification [44], the availability of additional sequences through the generation of LongSAGE or even SuperSAGE tags [22] would allow most of the remaining orphan SAGE tags to be converted into longer 3' cDNA fragments for gene identification.
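The yield figures quoted in this section can be checked against the counts given in Figure 2A and in the text. The sketch below only restates that arithmetic; the variable names are illustrative, and the tag-level counts (200 tags) and the clone-level counts (196 sequenced products) are kept separate because they describe different units.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


# Tag-level outcome of the 200 rSAGE reactions (Figure 2A).
outcome = {"3' EST": 131, "non-specific": 37, "no PCR band": 32}
total_tags = sum(outcome.values())
amplified = total_tags - outcome["no PCR band"]

print(total_tags, amplified)                        # 200 168
print(percent(amplified, total_tags))               # 84.0
print(percent(outcome["no PCR band"], total_tags))  # 16.0

# Clone-level specificity: 148 confirmed out of 196 sequenced products.
print(percent(148, 196))                            # 75.5
```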
Analysis of 3' HESTs Generated from HES3 Orphan SAGE Tags
The size distribution of the 148 HESTs ranged from 36 to 538 bp,
with 56.7% of them longer than 100 bp, which matched well to the reported
data from GLGI-SAGE studies [18, 19, 43,
44]. A small number of the TSRPs [14] gave two or
more distinct rSAGE bands. The majority of them were mapped to distinct
transcripts (HEST31, 52, 53, 65, 98, 99, 148, and 170; supplemental
online Table 2), whereas those for HEST126 and 141 were the result
of alternative polyadenylation sites. Previous GLGI-SAGE reports have relied
on BLAST searches to determine the identity of the 3' cDNA fragments [18,
19, 43, 44]. We used both BLAT and BLAST searches
to establish the identity of rSAGE cDNA sequences (Fig. 3A).
Indeed, the BLAT transcript viewer made it easier to visualize and quickly
identify NATs, novel introns, and new splice variants of known transcripts
and to confirm SNPs within the SAGE tags. For several SAGE tags, rSAGE
extension resulted only in poly(A) sequences, as a result of the NlaIII
site occurring just adjacent to the poly(A) tract, and would require the
use of a different tagging enzyme to reveal their true identity. More importantly,
our rSAGE results have clearly identified 59 of these rSAGE 3' cDNA fragments
as novel rSAGE 3'ESTs and 30 NATs, all of which are identified for the
first time (Fig. 3A).
Figure 3. Identity of the 148 rSAGE 3' cDNA fragments.
(A): The distribution of the various categories of rSAGE products is summarized as a pie chart.
(B): Human embryonic stem cell (hESC)-specific expression of eight HESTs was verified with semiquantitative reverse transcription-polymerase chain reaction (PCR) using total RNAs prepared from several peripheral adult tissues and fetal brain, universal reference RNA (Stratagene), undifferentiated HES3 and HES4 hESC lines, and D-HES3 cells.
(C): Quantitative real-time PCR results for GJA1 SNP analysis. Abbreviations: bp, base pairs; CT, threshold cycle; EST, expressed sequence tag; FAM, 6-carboxyfluorescein; INDEL, insertion/deletion; rSAGE, reverse serial analysis of gene expression; SNP, single-nucleotide polymorphism.
The majority of the novel rSAGE 3'ESTs that mapped to specific chromosomal
locations also contained the canonical polyadenylation signal, AATAAA or
its functional variant [48], and are likely to represent
bona fide transcripts from previously undescribed human genes. As shown
in Table 1, the majority of these 18 novel rSAGE 3'ESTs
are underrepresented in the nonhuman embryonic stem (ES) SAGE libraries
and are found mainly in SAGE libraries constructed from cancer cell lines
or carcinomas. They are likely to represent transcripts that are expressed
specifically in hESCs. For instance, HEST94 and 147 are represented only
in hESCs and could turn out to be excellent markers for the "stemness"
phenotype of hESCs. To confirm the validity of these rSAGE 3' cDNAs and
whether they were indeed restricted only to undifferentiated hESCs, RT-PCR
was performed for several selected HESTs (97, 146, 147, and 149) across
a selected tissue panel (testis, brain, heart, skeletal muscle, fetal brain,
and stomach), undifferentiated hESCs (HES3 and HES4), and differentiated
HES3 cells (Fig. 3B). Like the well-established hESC
marker POU5F1, the RT-PCR products for these four novel rSAGE 3'
ESTs were detected only in the hESC lines and were absent in the other
somatic tissues examined. The expression of HEST149 was also completely
abrogated in differentiated HES3 cells (Fig. 3B) and
could, like Oco90 [12], prove to be a reliable
marker for monitoring the early differentiation of hESCs. Indeed, HEST149
expression was undetectable in the universal reference RNA sample, which
is an RNA pool from several cancer tissues (Fig. 3B)
and absent in several embryonal carcinoma lines such as GCT-27C4, GCT-27X1,
and GCT-44 (unpublished results).
Table 1. Chromosomal location and SAGE library representation
of 18 novel 3' reverse SAGE expressed sequence tags with authentic polyadenylation
signal.
In addition, a number of the novel 3' rSAGE cDNA fragments (e.g., HESTs 73, 92, 102, and 126) could not be matched reliably to the human genome and were also not the products of contaminating MEF cDNAs. Perhaps these HESTs represent transcripts from novel hybrid RNAs with a regulatory function or as yet undiscovered genes. The presence of consensus polyadenylation sites on several of these HESTs (e.g., 92, 102, and 126) is a good indication that these are authentic transcripts.
Interestingly, four HESTs (112, 120, 128, and 170) showed high sequence similarity to the WiCell hESC ESTs [13]. HEST2 and 146, classified as novel sequences, did not overlap with known hESC ESTs but mapped to genomic regions proximal to chromosomal sites from which several WiCell hESC ESTs appear to be transcribed. Obtaining 3' cDNA sequences that matched WiCell ESTs [13] indicated that our modified rSAGE protocol was working well. In addition, our RT-PCR data also confirmed that the expression of HEST120, 127, and 146 was confined to hESCs, although HEST120 (and to a lesser extent HEST127) was also detected in the fetal brain (Fig. 3B). Unfortunately, although these ESTs are highly restricted in their expression to hESCs, as demonstrated either by RT-PCR or by their representation in hESC SAGE libraries [12], their exact functional role is unknown.
The impact of SNPs on the correct assignment of SAGE tags to specific transcripts [29] is also illustrated by our rSAGE results. For instance, HEST49 matched CHD8 with almost 100% sequence similarity and is the result of an SNP that created a new NlaIII restriction site upstream of the AATAAA polyadenylation site. The full-length cDNA sequence of CHD8 is 8,160 bp long, and this SNP would generate a new 3'-most SAGE tag. The original 3'-most SAGE tag for CHD8 is GGCCCCATTG (nts 7311–7320), which is also represented in the HES3 SAGE library (5 tpm). We also detected an SNP within the 3'-most SAGE tag of GJA1, which encodes the gap junction protein connexin 43. The putative 3'-most SAGE tag is TGTTCTGGAG (nts 2916–2925). The rSAGE conversion of the orphan SAGE tag, TGTTTTGGAG, resulted in HEST113, which displayed 97% sequence similarity to the 3' terminal region of the GJA1 coding region. Careful examination of the corresponding EST and genomic DNA sequences indicated that this orphan tag most likely represented an SNP in the canonical GJA1 SAGE tag and not the hypothetical protein FLJ10407, as suggested by the predicted tag-to-gene mapping of SAGEGenie. The GJA1 SNP was verified using 6-carboxyfluorescein (FAM)- and VIC-labeled Taqman probes that were specific to the polymorphism (Fig. 3C).
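The two ways in which an SNP can disturb tag assignment can be illustrated with a minimal Python sketch. It assumes the usual short-SAGE convention that a tag is the 10 nt immediately downstream of the 3'-most NlaIII (CATG) site. The function names and the toy sequences are hypothetical and are not taken from CHD8 or GJA1; only the two GJA1 tags compared at the end come from the text.

```python
def three_prime_most_tag(seq, anchor="CATG", tag_len=10):
    """Return the tag_len bases following the 3'-most anchor site, or None."""
    seq = seq.upper()
    pos = seq.rfind(anchor)
    while pos != -1:
        start = pos + len(anchor)
        if start + tag_len <= len(seq):
            return seq[start:start + tag_len]
        # Too close to the 3' end for a full tag: try the previous site.
        pos = seq.rfind(anchor, 0, pos + len(anchor) - 1)
    return None


def mismatches(tag_a, tag_b):
    """0-based positions at which two equal-length tags differ."""
    if len(tag_a) != len(tag_b):
        raise ValueError("tags must have the same length")
    return [i for i, (a, b) in enumerate(zip(tag_a, tag_b)) if a != b]


# Hypothetical toy 3' end: one CATG site, then a CACG that an SNP can
# convert into a second, more 3' CATG site upstream of AATAAA.
REF = "TTCATG" + "ACGTTGCAAC" + "ACCACG" + "TTTGGAGCCA" + "AATAAA"
VAR = REF[:20] + "T" + REF[21:]  # C->T turns CACG into CATG

ref_tag = three_prime_most_tag(REF)  # tag from the original site
var_tag = three_prime_most_tag(VAR)  # tag from the SNP-created site

# Tags quoted in the text: canonical GJA1 tag versus the orphan tag.
gja1_diff = mismatches("TGTTCTGGAG", "TGTTTTGGAG")
```

In the toy example the SNP moves the 3'-most site, so the variant allele yields an entirely different tag (the HEST49 situation), whereas the GJA1 tags differ at a single position within the tag itself (the HEST113 situation).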
The generation of longer 3' cDNA sequences by rSAGE has also helped to resolve some of the ambiguities in tag-to-gene assignments, at least in HES3 cells. For example, HEST119 (AGTGAGGATA) matched the hypothetical protein FLJ35155 (C3orf21), which is restricted in expression to hESC lines and tissues of cancerous origin. In addition, the SAGE tag for HEST114 (CATCCAAAAA) was incorrectly assigned to NPY and CEP2 by SAGEGenie and SAGEMap, respectively. Instead, rSAGE conversion showed that HEST114 matched FLJ10884, a hypothetical protein restricted in its expression to the testis, placenta, and hESC lines.
Antisense Transcription in hESCs
BLAT and BLAST searches revealed that many of the HESTs were the
products of antisense transcription. Interestingly, cis-NATs for several
important ES-specific genes, such as NANOG (HEST16), POU5F1
(HEST88), and LIN28 (HEST168), were identified by our rSAGE results
(supplemental online Table 2). Analyzing
the chromosomal location of these cis-NATs and the corresponding sense
tags from the HES3 library revealed the presence of sense-antisense
(SA) gene pairs [34, 35, 38].
Table 2 lists 18 SA SAGE tag pairs and the corresponding antisense
HESTs that were experimentally obtained with rSAGE. Although several SA
SAGE tag pairs can be mapped in trans to remote genomic loci, other
pairs mapped in cis on contiguous, oppositely oriented DNA strands (Fig.
4A). Besides POU5F1, NANOG, and LIN28, a number
of other highly expressed hESC-specific genes, like TGIF/TALE (HEST109),
ERH (HEST151), TERA (HEST155), and TERF1 (HEST193.2), also expressed
cis-NATs. Furthermore, the representation of many of these co-expressed
SA SAGE tag pairs decreased upon differentiation of the hESCs (Table 2).
The SAGE tags for the NANOG (TCATTACGAT) and POU5F1 (ATGTGGGATT)
cis-NATs were found only in hESC SAGE libraries, indicating that the
expression pattern of the cis-NATs for NANOG and POU5F1 is even
more restricted than that of their sense transcript counterparts.
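The cis/trans distinction drawn above can be expressed as a simple rule on genomic placements. The following Python sketch is illustrative only: the class, the function names, the `max_gap` parameter, and all coordinates are hypothetical, and it assumes that the two tags have already been established as a sense-antisense pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagHit:
    """Genomic placement of the transcript a SAGE tag was assigned to."""
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def gap_between(a, b):
    """Bases separating two intervals: 0 if adjacent, negative if overlapping."""
    return max(a.start, b.start) - min(a.end, b.end) - 1


def classify_sa_pair(sense, antisense, max_gap=0):
    """Label a pair 'cis', 'trans', or 'same strand' from its placements."""
    same_locus = (
        sense.chrom == antisense.chrom
        and gap_between(sense, antisense) <= max_gap
    )
    if not same_locus:
        return "trans"        # partner maps to a remote locus
    if sense.strand != antisense.strand:
        return "cis"          # contiguous, oppositely oriented strands
    return "same strand"      # overlapping but not antisense


# Hypothetical placements, for illustration only.
sense = TagHit("chrA", "+", 1000, 3000)
cis_partner = TagHit("chrA", "-", 2500, 3500)
remote_partner = TagHit("chrB", "-", 500, 900)
distant_partner = TagHit("chrA", "-", 90000, 91000)

labels = [
    classify_sa_pair(sense, cis_partner),
    classify_sa_pair(sense, remote_partner),
    classify_sa_pair(sense, distant_partner),
]
```

Setting `max_gap` above zero would also accept closely neighbouring, non-overlapping transcripts as cis pairs; whether that is appropriate depends on how "contiguous" is defined for a given analysis.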
Table 2. Sense-antisense SAGE tag pairs of antisense HESTs.
Figure 4. Confirmation of natural antisense transcription in HES3 cells.
(A): Illustration of the cis- and trans-serial analysis of gene expression AS tag pair concept.
(B): Expression of POU5F1, NANOG, LIN28, TALE, TERA, and TERF1 cis-natural antisense transcripts (NATs). For amplification of cis-NATs, sense-specific primers were used for reverse transcription (RT) instead of an oligo(dT) primer. During the subsequent polymerase chain reaction amplification, sense and antisense primers were used. Total RNA that had not been reverse-transcribed was used as a template control for genomic DNA contamination (-RT). Abbreviations: AS, antisense; bp, base pairs.
To validate that the cis-NATs for POU5F1, NANOG, LIN28, TALE, TERF1, and TERA were specifically expressed in hESCs, orientation-specific RT-PCR [33, 49] was carried out using total RNA isolated from HES3, a universal reference RNA sample (Stratagene), testis, and stomach (Fig. 4B). First-strand cDNAs were prepared using primers specific to POU5F1, NANOG, LIN28, TALE, TERF1, and TERA, respectively. Specific RT-PCR products for all three cis-NATs were detected only when RT was included, thus confirming that these cis-NATs were specifically expressed in hESCs and not due to spurious PCR amplification.
HEST115 and 168 appeared to represent spliced SA transcripts from
ILF2 and LIN28, respectively. Nucleotides (nts) 1–42 of HEST115 matched
the ILF2 coding region in the antisense orientation (Chr1[+]:
150447872–150447913), whereas nts 24–222 matched the sense orientation
(Chr1[-]: 150447587–150447785). Likewise, nts 1–133 of HEST168 matched
the LIN28 coding region in the antisense orientation (Chr1[-]:
26439918–26440050), whereas nts 131–171 matched the sense orientation (Chr1[+]:
26440310–26440350). This novel sense-antisense RNA hybrid structure
was originally reported for the cardiac troponin I gene in rat hearts [50].
The structure of the cardiac troponin I "hybrid RNA," which the authors themselves
have tentatively concluded to be formed from the transcription of the troponin
mRNA in the cytoplasm, is very similar to what we have described for ILF2
and LIN28. The functional significance of these hybrid RNAs is currently
unknown.
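The segment coordinates reported for HEST115 and HEST168 can be checked for internal consistency with simple interval arithmetic. The Python sketch below assumes that each coordinate string is read as a 1-based, inclusive start–end pair; the variable and function names are illustrative.

```python
# (HEST, orientation, cDNA interval, genomic interval on chromosome 1),
# with every interval read as a 1-based, inclusive (start, end) pair.
SEGMENTS = [
    ("HEST115", "antisense", (1, 42), (150447872, 150447913)),
    ("HEST115", "sense", (24, 222), (150447587, 150447785)),
    ("HEST168", "antisense", (1, 133), (26439918, 26440050)),
    ("HEST168", "sense", (131, 171), (26440310, 26440350)),
]


def length(interval):
    """Number of bases in an inclusive interval."""
    start, end = interval
    return end - start + 1


def overlap(a, b):
    """Number of bases shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# A cDNA segment and its ungapped genomic match should be equally long.
lengths = {
    (name, orientation): (length(cdna), length(genomic))
    for name, orientation, cdna, genomic in SEGMENTS
}

# cDNA bases assigned to both the antisense and the sense segment.
shared = {
    "HEST115": overlap((1, 42), (24, 222)),
    "HEST168": overlap((1, 133), (131, 171)),
}
```

Under this reading, each cDNA segment has the same length as its genomic interval (42, 199, 133, and 41 nt), and the antisense and sense segments share 19 nt of the cDNA in HEST115 and 3 nt in HEST168.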
DISCUSSION
Unlike DNA microarrays, SAGE does not require prior knowledge of the sequences to be analyzed. Hence, SAGE libraries provide discrete and unbiased directional gene expression data that are ideally suited for gene discovery and SA expression analysis [35, 38]. Although MPSS [51] is capable of deeper coverage of the gene expression profile, it requires specialized reagents and equipment, and this has restricted the availability of MPSS libraries for various human tissues and cell types, including those for hESCs. On the other hand, SAGE comprises several standard molecular biology techniques and can be adapted for microanalysis [52, 53]. This has resulted in the construction of SAGE libraries from a large variety of human cell types and tissues, and they are an important resource for the discovery of novel genes and NATs [38, 54, 55].
Although the human transcriptome is necessarily less complex than the human genome, it is quite apparent that transcriptome complexity has been underestimated [34, 35, 38, 44]. Noncoding RNA, regulatory RNA, NATs, and novel splice variants add to the multifaceted nature of the transcriptome. In the present study, we have used a modified rSAGE strategy to convert selected orphan SAGE tags from hESCs into longer 3' cDNAs. This has facilitated the identification of isoforms due to splicing, alternative polyadenylation, and SNPs. A large number of novel hESC-specific genes have also been identified, indicating that the hESC transcriptome is indeed poorly characterized [12]. This is also the first description of cis-NATs from several key pluripotency genes that are involved in the maintenance of hESC self-renewal, suggesting that SA transcript pairing might be a key regulatory mechanism [31].
A recent study reported that 41.5% of SA transcript overlaps occurred in the last exon or untranslated region (UTR) of the coding sequence [34]. We have found that overlaps between the cis-NATs of LIN28, NANOG, and POU5F1 and their corresponding sense transcripts occurred in the 3' UTR of the coding sequence as well. Although the exact significance of this positional overlap is unknown, UTRs are believed to contribute toward the localization, stability, and translational control of mRNA transcripts. Indeed, the finding that >30% of vertebrate mRNAs show orthologue-specific conservation of 3' UTRs suggests a possible functional or regulatory role for UTR sequences [56]. The recent finding that many of the human SA gene pairs are also detected in mouse, rat, and fugu and are probably conserved throughout the course of vertebrate evolution [57] lends some support to the notion that cis-NATs are not due to a "leakage" of the transcriptional apparatus but rather that their abundance is the result of active transcription. For POU5F1 and NANOG, we have ruled out the possibility that their cis-NATs are due to the insertion of an L1 retrotransposon [58]. However, because there are several pseudogenes for POU5F1 and NANOG, the possibility of trans-NATs from these genomic loci remains to be determined.
Several reports have hinted that the contribution of NATs in the human genome has been underestimated [34, 35] and that up to 25% of human transcripts might form natural SA pairs. Although initial studies indicated that there was no correlation between NATs and their function or localization [34], a more recent survey of SA pairs confirmed that they are predominant for genes involved in translation regulator activity, DNA damage response, and cell growth, whereas non-SA transcripts were found to have a significantly different functional distribution [35]. Several of the human ES NATs and SA gene pairs we have identified are representative of genes that code for transcription factors and RNA-binding proteins, whereas SA gene pairs for ubiquitously expressed genes, such as glyceraldehyde-3-phosphate dehydrogenase and ACTB, were not present in the HES3 SAGE library. The fact that SA transcripts have a significantly higher probability of involvement in translation regulator activity and are more frequently located in both the nucleus and cytoplasm [35] is compatible with a role in antisense-mediated gene regulation occurring in both the nucleus and cytoplasm and at the transcription and translation levels [31].
Although certain human miRNAs (miR-1 and miR-124) have been recently
demonstrated to influence and define tissue-specific gene expression profiles
in HeLa cells [59], the functional roles of cis-NATs
in a similar context have not been previously reported. Since cis-NATs
are also capable of regulating gene expression through RNA masking,
transcriptional interference, or RNA interference [31, 32], the identification
of cis-NATs for POU5F1 and NANOG prompted us to determine whether cis-NATs
might be commonly expressed for other key regulators that are involved
in the maintenance of pluripotency in hESCs. Both the mouse and the human
SAGE libraries were searched for the presence of SAGE tags representing
the cis-NATs for ES-specific genes [12, 60].
We failed to find SAGE tags representing UTF1, REX1, LEFTB,
and GDF3 cis-NATs in human and mouse SAGE libraries. However,
we detected cis-NATs for a number of key ES-specific genes (e.g.,
FGFR1, FGFR2, TDGF1, SOX2)
in HES3 and SAGE libraries constructed from other hESC lines (Table 3).
In addition, SAGE tags representing pou5f1, nanog, tera,
and lin28 were also detected in mouse embryonic stem cells (mESCs).
In summary, cis-NATs for a number of ES-specific genes, such as POU5F1
and NANOG, were shown to be expressed in both hESCs and mESCs, and
it is possible that some of these cis-NATs might have a role in
maintaining the "stemness" phenotype of ES cells.
Table 3. Occurrence of SAGE tags of cis-natural
antisense transcripts of selected embryonic stem-specific genes in human
and mouse embryonic stem cell SAGE libraries.
Our study further underscores the importance of obtaining longer
3' cDNAs from orphan SAGE tags and the versatility of rSAGE as a powerful
complementary tool to SAGE expression libraries for gene discovery. Lastly,
the hESC-specific transcripts that we have described are clear targets
for further study, and the conversion of the remaining orphan SAGE tags
from HES3 and other hESCs would likely provide additional valuable resources,
mainly in terms of novel transcripts, and uncover additional cis-NATs
for the in-depth functional dissection of the molecular pathways involved
in the self-renewal of pluripotent hESCs and their subsequent lineage commitment
to their differentiated progenies.
Supplemental Data:
Supplementary Table 1. List of SAGE orphan tags and primers used
for reverseSAGE.
http://stemcells.alphamedpress.org/cgi/content/full/24/5/1162/DC1
This study was supported by Embryonic Stem Cell International Pte.
Ltd. grant R-174-000-081-592 and National University of Singapore Academic
Research Fund grant R-154-000-179-112.
DISCLOSURES
The authors indicate no potential conflicts of interest.
REFERENCES
1. Thomson JA, Itskovitz-Eldor J, Shapiro SS et al. Embryonic stem cell lines from human blastocysts. Science 1998;282:1145–1147.
2. Reubinoff BE, Pera MF, Fong CY et al. Embryonic stem cell lines from human blastocysts: Somatic differentiation in vitro. Nat Biotechnol 2000;18:399–404.
3. Richards M, Fong CY, Chan WK et al. Human feeders support prolonged undifferentiated growth of human inner cell masses and embryonic stem cells. Nat Biotechnol 2002;20:933–936.
4. Mayhall EA, Lugassy N, Zon LI. The clinical potential of stem cells. Curr Opin Cell Biol 2004;16:713–720.
5. Edwards RG. Stem cells today: A. Origin and potential of embryo stem cells. Reprod Biomed Online 2004;8:275–306.
6. Rao M. Conserved and divergent paths that regulate self-renewal in mouse and human embryonic stem cells. Dev Biol 2004;275:269–286.
7. Loring JF, Porter JG, Seilhammer J et al. A gene expression profile of embryonic stem cells and embryonic stem cell-derived neurons. Restor Neurol Neurosci 2001;18:81–88.
8. Sato N, Sanjuan IM, Heke M et al. Molecular signature of human embryonic stem cells and its comparison with the mouse. Dev Biol 2003;260:404–413.
9. Sperger JM, Chen X, Draper JS et al. Gene expression patterns in human embryonic stem cells and human pluripotent germ cell tumors. Proc Natl Acad Sci U S A 2003;100:13350–13355.
10. Abeyta MJ, Clark AT, Rodriguez RT et al. Unique gene expression signatures of independently-derived human embryonic stem cell lines. Hum Mol Genet 2004;13:601–608.
11. Bhattacharya B, Miura T, Brandenberger R et al. Gene expression in human embryonic stem cell lines: Unique molecular signature. Blood 2004;103:2956–2964.
12. Richards M, Tan SP, Tan JH et al. The transcriptome profile of human embryonic stem cells as defined by SAGE. STEM CELLS 2004;22:51–64.
13. Brandenberger R, Wei H, Zhang S et al. Transcriptome characterization elucidates signaling networks that control human ES cell growth and differentiation. Nat Biotechnol 2004;22:707–716.
14. Brandenberger R, Khrebtukova I, Thies RS et al. MPSS profiling of human embryonic stem cells. BMC Dev Biol 2004;4:10.
15. Wei CL, Miura T, Robson P et al. Transcriptome profiling of human and murine ESCs identifies divergent paths required to maintain the stem cell state. STEM CELLS 2005;23:166–185.
16. Sato N, Meijer L, Skaltsounis L et al. Maintenance of pluripotency in human and mouse embryonic stem cells through activation of Wnt signaling by a pharmacological GSK-3-specific inhibitor. Nat Med 2004;10:55–63.
17. James D, Levine AJ, Besser D et al. TGFβ/activin/nodal signaling is necessary for the maintenance of pluripotency in human embryonic stem cells. Development 2005;132:1273–1282.
18. Lee S, Zhou G, Clark T et al. The pattern of gene expression in human CD15+ myeloid progenitor cells. Proc Natl Acad Sci U S A 2001;98:3340–3345.
19. Zhou G, Chen J, Lee S et al. The pattern of gene expression in human CD34(+) stem/progenitor cells. Proc Natl Acad Sci U S A 2001;98:13966–13971.
20. Velculescu VE, Zhang L, Vogelstein B et al. Serial analysis of gene expression. Science 1995;270:484–487.
21. Saha S, Sparks AB, Rago C et al. Using the transcriptome to annotate the genome. Nat Biotechnol 2002;20:508–512.
22. Matsumura H, Reich S, Ito A et al. Gene expression analysis of plant host-pathogen interactions by SuperSAGE. Proc Natl Acad Sci U S A 2003;100:15718–15723.
23. Boon K, Osorio EC, Greenhut SF et al. An anatomy of normal and malignant gene expression. Proc Natl Acad Sci U S A 2002;99:11287–11292.
24. Chen J, Sun M, Lee S et al. Identifying novel transcripts and novel genes in the human genome by using novel SAGE tags. Proc Natl Acad Sci U S A 2002;99:12257–12262.
25. Venter JC, Adams MD, Myers EW et al. The sequence of the human genome. Science 2001;291:1304–1351.
26. Lander ES, Linton LM, Birren B et al. Initial sequencing and analysis of the human genome. Nature 2001;409:860–921.
27. Wang DG, Fan JB, Siao CJ et al. Large scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 1998;280:1077–1082.
28. Sachidanandam R, Weissman D, Schmidt SC et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001;409:928–933.
29. Silva AP, de Souza JE, Galante PA et al. The impact of SNPs on the interpretation of SAGE and MPSS experimental data. Nucleic Acids Res 2004;32:6104–6110.
30. Kumar M, Carmichael GG. Antisense RNA: Function and fate of duplex RNA in cells of higher eukaryotes. Microbiol Mol Biol Rev 1998;62:1415–1434.
31. Lavorgna G, Dahary D, Lehner B et al. In search of antisense. Trends Biochem Sci 2004;29:88–94.
32. Lehner B, Williams G, Campbell RD et al. Antisense transcripts in the human genome. Trends Genet 2002;18:63–65.
33. Shendure J, Church GM. Computational discovery of sense-antisense transcription in the human and mouse genomes. Genome Biol 2002;research:0044.1–0044.14.
34. Yelin R, Dahary D, Sorek R et al. Widespread occurrence of antisense transcription in the human genome. Nat Biotechnol 2003;21:379–386.
35. Chen J, Sun M, Kent WJ et al. Over 20% of human transcripts might form sense-antisense pairs. Nucleic Acids Res 2004;32:4812–4820.
36. Suh MR, Lee Y, Kim JY et al. Human embryonic stem cells express a unique set of microRNAs. Dev Biol 2004;270:488–498.
37. Schuler GD, Boguski MS, Stewart EA et al. A gene map of the human genome. Science 1996;274:540–546.
38. Quere R, Manchon L, Lejeune M et al. Mining SAGE data allows large-scale, sensitive screening of antisense transcript expression. Nucleic Acids Res 2004;32:e163.
39. Patankar S, Munasinghe A, Shoaibi A et al. Serial analysis of gene expression in Plasmodium falciparum reveals the global expression profile of erythrocytic stages and the presence of anti-sense transcripts in the malarial parasite. Mol Biol Cell 2001;12:3114–3125.
40. Gunasekera AM, Patankar S, Schug J et al. Widespread distribution of antisense transcripts in the Plasmodium falciparum genome. Mol Biochem Parasitol 2004;136:35–42.
41. Polyak K, Xia Y, Zweier JL et al. A model for p53-induced apoptosis. Nature 1997;389:300–305.
42. Yu J, Zhang L, Hwang PM et al. Identification and classification of p53-regulated genes. Proc Natl Acad Sci U S A 1999;96:14517–14522.
43. Chen JJ, Rowley JD, Wang SM. Generation of longer cDNA fragments from serial analysis of gene expression tags for gene identification. Proc Natl Acad Sci U S A 2000;97:349–353.
44. Chen J, Lee S, Zhou G et al. High-throughput GLGI procedure for converting a large number of serial analysis of gene expression tag sequences into 3' complementary DNAs. Genes Chromosomes Cancer 2002;33:252–261.
45. van den Berg A, van der Leij J, Poppema S. Serial analysis of gene expression: Rapid RT-PCR analysis of unknown SAGE tags. Nucleic Acids Res 1999;27:e17.
46. Richards M, Tan S, Fong CY et al. Comparative evaluation of various human feeders for prolonged undifferentiated growth of human embryonic stem cells. STEM CELLS 2003;21:546–556.
47. Silva AP, Chen J, Carraro DM et al. Generation of longer 3' cDNA fragments from massively parallel signature sequencing tags. Nucleic Acids Res 2004;32:e94.
48. Tian B, Hu J, Zhang H et al. A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res 2005;33:201–212.
49. Rosok O, Sioud M. Systematic identification of sense-antisense transcripts in mammalian cells. Nat Biotechnol 2004;22:104–108.
50. Bartsch H, Voigtsberger S, Baumann G et al. Detection of a novel sense-antisense RNA-hybrid structure by RACE experiments on endogenous troponin I antisense RNA. RNA 2004;10:1215–1224.
51. Brenner S, Johnson M, Bridgham J et al. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol 2000;18:630–634.
52. Datson NA, van der Perk-de Jong J, van den Berg MP et al. MicroSAGE: A modified procedure for serial analysis of gene expression in limited amounts of tissue. Nucleic Acids Res 1999;27:1300–1307.
53. Vilain C, Libert F, Venet D et al. Small amplified RNA-SAGE: An alternative approach to study transcriptome from limiting amount of mRNA. Nucleic Acids Res 2003;31:e24.
54. Boheler KR, Stern MD. The new role of SAGE in gene discovery. Trends Biotechnol 2003;21:55–57.
55. Dinel S, Bolduc C, Belleau P et al. Reproducibility, bioinformatic analysis and power of the SAGE method to evaluate changes in transcriptome. Nucleic Acids Res 2005;33:e26.
56. Lipman DJ. Making (anti)sense of non-coding sequence conservation. Nucleic Acids Res 1997;25:3580–3583.
57. Dahary D, Elory-Stein O, Sorek R. Naturally occurring antisense: Transcriptional leakage or real overlap. Genome Res 2005;15:364–368.
58. Han JS, Szak ST, Boeke JD. Transcriptional disruption by the L1 retrotransposon and implications for mammalian transcriptomes. Nature 2004;429:268–274.
59. Lim LP, Lau NC, Garrett-Engele P et al. Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 2005;433:769–773.
60. Pera MF, Trounson AO. Human embryonic stem cells: Prospects for development. Development 2004;131:5515–5525.
In this innovative study by Mark Richards, Siew-Peng Tan, Woon-Khiong Chan, and Ariff Bongso, a new technique of reverse-SAGE analysis has revealed the extensive synthesis of antisense RNA species by several key genes in human and mouse embryonic stem cells.
1. Sun M, Hurst LD, Carmichael GG, and Chen J, "Evidence for variation in abundance of antisense transcripts between multicellular animals, but no relationship between antisense transcription and organismic complexity".
2. Frenster JH, and Hovsepian JA, "Kissing Chromosomes and Paired Sense-Antisense RNA Synthesis".
3. Ivanova N, Dobrin R, Lu R, Kotenko I, Levorse J, DeCoste C, Schafer X, Lun Y, and Lemischka IR, "Dissecting self-renewal in stem cells with RNA interference".
4. Mollica LR, Crawley JTB, Liu K, Rance JB, Cockerill PN, Follows GA, Landry J-R, Wells DJ, and Lane DA, "Role of a 5'-enhancer in the transcriptional regulation of the human endothelial cell protein C receptor gene".