
More active human L1 retrotransposons produce longer insertions

The vast majority of L1 insertions are 5′ truncated and thus inactive. Yet, the mechanism of 5′ truncation is unknown. To examine whether the frequency of L1 retrotransposition is directly correlated with the length of genomic L1 insertions, we used a cell culture assay to measure retrotransposition frequency and a PCR-based assay to measure L1 insertion length. We tested five full-length human L1 elements that retrotranspose at different frequencies: LRE3, L1RP, L1.3, L1.2A and L1.2B. Our data suggest that L1 insertion length correlates with L1 retrotransposition frequency for insertions >1 kb in length. For two elements, L1RP and L1.2A, we found that swapping the reverse transcriptase domains had little effect. Instead, we found that genomic insertion length and retrotransposition frequency are substantially affected by amino acid substitutions at positions 363, 1220 and 1259 in ORF2. We suggest that the region containing residues 1220 and 1259 may be important in the binding of ORF2p to L1 RNA to facilitate reverse transcription.

The plasticity and expansion of the human genome are due, in part, to the replicative activity of mobile genetic elements (1), the most active of which is the L1 retrotransposon (2,3). L1 (LINE-1 or long interspersed element) is an autonomous non-long terminal repeat (non-LTR) retrotransposon that makes up 17% of the genome, distributing DNA copies of itself to new genomic locations (2,4,5). A functional human L1 element is ∼6 kb in length and has two open reading frames (ORF1 and ORF2) that encode proteins required for its mobilization (6–9). Retrotransposition involves transcription, reverse transcription and integration into a new genomic location. Reverse transcription and integration are coupled in a target-primed reverse transcription reaction that takes place on genomic DNA (10,11). In addition to their own replication, L1s may shuffle exons by carrying genomic flanking sequences with them when they move (12–14). They can also provide the machinery for processed pseudogene formation and mobilization of Alu elements, further diversifying the genome (15–18).

A study of 5′ truncation is relevant to L1 biology and important for our understanding of genome evolution. Over 95% of genomic L1 sequences and approximately two-thirds of recent L1 insertions are 5′ truncated (19–24). The mechanistic basis for 5′ truncation is unknown. One commonly offered explanation is that the L1 reverse transcriptase (RT) enzyme disengages from the L1 RNA template before completing the full-length cDNA sequence (21). However, premature termination of reverse transcription cannot in and of itself account for the fact that genomic L1s form a bimodal distribution of insertion lengths with short (<1 kb) and full-length (6 kb) insertions encountered most often (22–26).

The extent to which an L1 element truncates during retrotransposition limits its ability to successfully colonize the host genome. Because L1 proteins act preferentially to mobilize the RNA that encoded them (cis-preference), a truncated genomic L1 insertion will not efficiently be mobilized in trans (5,15,16,27,28). Since essentially only full-length L1 insertions are capable of retrotransposition (16,27–29), L1 elements that produce a higher number of full-length copies are less susceptible to extinction. Should a parental L1 be subjected to a lethal, inactivating mutation, its previously disseminated, full-length progeny can continue its genetic legacy. Conversely, full-length insertions may be counter-selected over evolutionary time for their deleterious effects on the expression of neighboring genes (30). By studying de novo L1 insertions in cultured cells, we can characterize insertions without the confounding influence of negative selection over evolutionary time. Lastly, an improved understanding of why 5′ truncation occurs may facilitate the design of more active L1 elements for biomedical research. Highly active L1 elements could be harnessed as insertional mutagens, cell lineage markers or gene delivery vehicles (31–33).

In this study, we characterized the retrotransposition activity and insertion lengths of a series of cloned human L1 elements. We found that human L1 elements with higher activity in a cultured cell assay of retrotransposition produce longer genomic insertions. Surprisingly, we also found that three amino acid changes within the ORF2 protein, outside of the RT domain, contribute significantly to insertion length.

All of the human L1 elements that were used in these assays were swapped into the RJD99-RP-Neo plasmid using Not1 (which cuts just upstream of the L1 5′UTR) and BstZ17i (which cuts in the L1 3′UTR just upstream of the neomycin resistance gene cassette). RJD99 is a derivative of pCEP4 (Invitrogen) that lacks the CMV promoter-containing BglII fragment. RJD99-RP-Neo was constructed as described previously (29). The RT domains were excised from L1.2A and L1RP using EcoRV and swapped by subcloning the elements into pBluescript (Stratagene). The derivations of the various mutants are as follows: L1.2A I1220/S1259: L1RP Spe1-BstZ17i fragment plus L1.2A BstZ17i-Spe1; L1.2A I1220: L1RP Spe1-Nco1 fragment plus L1.2A Nco1-Spe1; L1.2A S1259: L1RP Nco1-BstZ17i fragment plus L1.2A BstZ17i-Nco1; L1RP M1220/L1259: L1.2A Spe1-BstZ17i fragment plus L1RP BstZ17i-Spe1; L1RP M1220: L1.2A Spe1-Nco1 fragment plus L1RP Nco1-Spe1; L1RP L1259: L1.2A Nco1-BstZ17i fragment plus L1RP BstZ17i-Nco1.

LRE3 was cloned as detailed in Brouha et al. (34) and sub-cloned into RJD99.

ORF2 residue 363 in L1RP and L1.2 was mutated using the QuikChange mutagenesis kit (Stratagene) and the following primer sets: L1RP G363 sense, 5′-AAT CAA TGA ATC CGG GAG CTG GTT TTT TGA AAG G-3′; anti-sense, 5′-AAC CAG CTC CCG GAT TCA TTG ATT TTT TGA AGG G-3′; L1.2 R363 sense, 5′-AAT CAA TGA ATC CAG GAG CTG GTT TTT TGA AAG G-3′; anti-sense, 5′-AAC CAG CTC CTG GAT TCA TTG ATT TTT TGA AGG G-3′. Sequence analysis of a ∼700 bp region flanking the insertion confirmed the presence of the desired mutation. Four independent clones of each mutant were run in the cultured cell assay to compensate for any potential PCR errors outside of the sequenced region.
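As a quick consistency check on the primer design, the two sense primers listed above can be compared base by base. The sketch below is illustrative only: it assumes that the single mismatched base is the first base of codon 363 (the reading frame is not stated in the text), and the helper name `mismatches` is hypothetical.

```python
# Sense-strand mutagenic primers copied from the text above (spaces removed).
L1RP_G363_SENSE = "AATCAATGAATCCGGGAGCTGGTTTTTTGAAAGG"
L12_R363_SENSE = "AATCAATGAATCCAGGAGCTGGTTTTTTGAAAGG"

# Only the two codons needed here, taken from the standard genetic code.
CODON_TABLE = {"GGG": "Gly", "AGG": "Arg"}


def mismatches(a: str, b: str) -> list[int]:
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


diff = mismatches(L1RP_G363_SENSE, L12_R363_SENSE)
pos = diff[0]

# Assumption: the mismatched base opens codon 363 (frame not given in the text).
codon_g363 = L1RP_G363_SENSE[pos:pos + 3]
codon_r363 = L12_R363_SENSE[pos:pos + 3]

print("mismatch positions:", diff)
print("L1RP G363 primer codon:", codon_g363, CODON_TABLE[codon_g363])
print("L1.2 R363 primer codon:", codon_r363, CODON_TABLE[codon_r363])
```

Under that frame assumption, the primers differ at a single base and encode glycine and arginine, respectively, which matches the names of the two mutants.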

HeLa cells (human cervical carcinoma cell line) were cultured and transfected with L1 constructs as described previously (6,35). Details of the cell culture conditions are given in Figure 1a.

G418 resistant clones were expanded in duplicate 24-well plates in 400 µg/ml G418 (Gibco/LTI) and genomic DNA was prepared as described previously (36). AmpliTaq-Gold polymerase (Roche) was used for reactions under 1 kb. For all other amplifications, FailSafe PCR enzyme mix was used (Epicentre Technologies). Each reaction contained 250 ng of genomic DNA, 100–200 ng of each primer, 1× reaction buffer (either buffer A with 15 mM MgCl2 from Roche for amplifications with Taq Gold or buffers D, E or F for FailSafe amplifications), 0.2 mM dNTPs (when using the Taq buffer; dNTPs are included in the FailSafe buffer) and 1.5–2.0 U of polymerase in a volume of 50 µl. Typical amplification conditions were: 94°C, 15 min; (94°C, 30 s; 54–62°C, 30 s; 72°C, 1 min per kilobase) × 35–40 cycles; 72°C, 10 min; 4°C hold on a Peltier thermocycler (Hybaid or MJ Research). Oligonucleotides used for PCR included the following: com (Neo anti-sense anchor, 7819), 5′-ATT GAA CAA GAT GGA TTG CAC GC-3′; 1 kb sense (nucleotide 5892), 5′-ATA GCA TTG GGA GAT ATA CC-3′; 2 kb sense (nucleotide 4819), 5′-AGA AAG CTG AAA CTG GAT CCC-3′; 3 kb sense (nucleotide 3961), 5′-CAG GGA TGC CCT CTC TCA CCG-3′.

Nucleotide positions of the oligonucleotides are based on the sequence of L1RP (GenBank accession no. AF148856) and the sequence of the L1RP-mneoI cassette (29). PCR assays were validated by sequence analysis of the spliced and unspliced L1-mneoI amplicons.

Ten micrograms of genomic DNA from selected NeoR clones was digested overnight with AseI or EcoRI (New England BioLabs) and prepared for Southern hybridization using standard methods. The probe for hybridization was a 1370 bp BamH1–EcoR1 fragment containing the neomycin phosphotransferase exon (gift of John Moran).

To determine if there is a correlation between the activity of an L1 element and the length of the insertions that it generates, we relied on two parallel assays (Fig. 1a). In one, we performed a retrotransposition assay in which cells were selected for stable expression of the hygromycin resistance gene on the pCEP4 plasmid that contains the L1-mneoI retrotransposition cassette (6). Then, hygromycin resistant cells were subjected to G418 selection. The fraction of G418 resistant (G418R) cells divided by the number of hygromycin resistant cells gave an estimate of retrotransposition frequency. In the second assay, we performed a transient retrotransposition assay (without hygromycin selection) (37). The transient assay was used to obtain clones with L1 insertions rapidly.

The ruler PCR assay (Fig. 1b) measures the minimum length of the L1-mneo insertion. (Hereafter we refer to the intron-containing retrotransposition construct as L1-mneoI and the insertion as L1-mneo.) The assay uses a reverse primer anchored in the neomycin phosphotransferase gene (mneoI, beyond the intron) and a series of forward primers that reside in different locations within the L1 element. For an insertion to be positive in this assay, it must be at least as long as the neomycin resistance gene. We include the neomycin marker in our estimates of the insertion length. The 1 kb sense L1 primer resides at nucleotide position 5892 (using L1RP as a reference) (29) in the L1 3′UTR, within 50 bp of the mneoI cassette. In combination with the neomycin cassette, this PCR amplifies a ∼1 kb product. Greater than 95% of the G418R clones with sufficient DNA were positive for the L1-mneo (spliced intron) product in this 1 kb ruler PCR assay (n = 816 clones). All of the clones typed by this ruler PCR assay for 2 or 3 kb insertions are positive for the 1 kb insertion.

If a clone is positive in a ruler PCR assay, we conclude that it has an insertion that truncates upstream of the L1 sense strand primer. In some cases it is possible to further ‘size’ the insertion. For example, an insertion that truncates between the first and the second kilobase of L1-mneo sequence will be positive in the 1 kb PCR, but negative in the 2 kb PCR. On the other hand, an insertion that is positive in all of the PCR assays is at least 3 kb in length. A gel of the 3 kb ruler PCR assay is shown in Figure 1c. This gel shows that 9/10 L1RP-mneo clones and 2/6 L1.2A-mneo clones have insertions that are at least 3 kb in length. It also shows that several of the clones have a weak band corresponding to the plasmid template (which contains L1-mneoI). In multiple experiments, the fraction of clones that had a plasmid band was not correlated with the fraction of clones that had an insert band. For example, in one 3 kb ruler PCR experiment, L1.2A had 5/64 positive clones and 38 of these 64 (59%) clones had a plasmid band, while L1RP had 19/22 positive clones and 14 of these clones (64%) had the plasmid band (data not shown). In general, the plasmid amplicon is disfavored in these assays because (i) antibiotic selection was not maintained for the plasmid and (ii) the extension time was kept short in order to reduce amplification of the plasmid relative to the insertion. Since all of the clones shown in Figure 1c were positive in the 2 kb ruler PCR assay (data not shown), the lack of amplification in clones 31, 99, 100 and 102 is not due to poor DNA quality.
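The sizing logic described above can be sketched as a small function. This is an illustrative reading of the ruler PCR rules rather than the authors' analysis code; the function name `size_insertion` and the input format (a mapping from ruler size in kb to a positive/negative result) are assumptions.

```python
from typing import Mapping

# Ruler PCR assays described in the text, ordered by increasing insertion length.
RULER_KB = (1, 2, 3)


def size_insertion(results: Mapping[int, bool]) -> str:
    """Return the length interval implied by a clone's ruler PCR results.

    `results` maps each ruler size (kb) to True (insert band present) or False.
    A missing assay is treated as negative. The interval is a minimum-length
    estimate that includes the neomycin marker, as in the text.
    """
    longest = 0
    for kb in RULER_KB:
        if not results.get(kb, False):
            break  # the insertion truncates before this primer site
        longest = kb
    if longest == 0:
        return "not detected by ruler PCR"
    if longest == RULER_KB[-1]:
        return f">= {longest} kb"
    return f"{longest}-{longest + 1} kb"


print(size_insertion({1: True, 2: False, 3: False}))
print(size_insertion({1: True, 2: True, 3: True}))
```

A clone that is positive only in the 1 kb assay is placed in the 1–2 kb interval, while a clone positive in all three assays is reported only as at least 3 kb, because no longer ruler primer is available to bound it further.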

Figure 1c suggests that G418R clones from L1RP have longer insertions than G418R clones from L1.2A. Since L1RP is approximately 40–60 times more active in the cultured cell assay than L1.2A (29), these results suggest that L1 insertion length and retrotransposition activity are positively correlated. However, an alternative explanation is a copy number bias towards more insertions per clone with L1RP. If a single HeLa clone had multiple L1 insertions, the ruler PCR assay would preferentially detect the longest. Thus, an L1RP clone with a large number of insertions would be more likely to have a long insertion than a L1.2A clone with one or two insertions, even if the frequency distributions of insertion lengths for L1RP and L1.2A were identical. We examined the possibility of a copy number bias by Southern blot (see Materials and Methods). The Southern blots revealed that L1RP and L1.2A clones had approximately equal numbers of insertions (between one and two insertions per clone on average; data not shown). Using the same transient cultured cell assay, Wei and colleagues have also observed that G418R clones often have more than one L1 insertion (37).
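The copy number argument can be made concrete with a short calculation. The sketch below assumes that insertions within a clone are independent and that each has the same probability of being long; the value 0.2 is a purely illustrative placeholder, not a measured frequency.

```python
def prob_clone_scores_long(p_long: float, n_insertions: int) -> float:
    """Probability that a clone is scored as 'long' by ruler PCR.

    Because the assay detects the longest insertion in a clone, the clone is
    scored as long if at least one of its insertions is long. Assumes that
    insertions are independent and share the same probability `p_long`.
    """
    if not 0.0 <= p_long <= 1.0:
        raise ValueError("p_long must be between 0 and 1")
    if n_insertions < 0:
        raise ValueError("n_insertions must be non-negative")
    return 1.0 - (1.0 - p_long) ** n_insertions


# Illustrative placeholder: the same per-insertion probability for both elements.
P_LONG = 0.2
for n in (1, 2, 5):
    print(n, round(prob_clone_scores_long(P_LONG, n), 3))
```

With an identical per-insertion length distribution, a clone carrying more insertions is more likely to be scored as long, which is why similar copy numbers in L1RP and L1.2A clones are needed before the length difference can be attributed to the elements themselves.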

Having preliminary evidence for a correlation between L1 retrotransposition frequency and insertion length, we next sought to determine whether this correlation held for more than two human L1 elements (Fig. 2). Five cloned human L1 elements were chosen for these studies: LRE3 (34), L1.2A (38), L1.2B (38), L1RP (29) and L1.3 (39). Although they had very similar nucleic acid sequences (see Fig. 2), these elements were known to exhibit very different levels of mobility in the cultured cell assay of retrotransposition (LRE3 > L1RP > L1.3 ∼ L1.2B > L1.2A). All five elements are members of the youngest group of the Ta subset (40), Ta-1d (24).

Insertion lengths correlated with retrotransposition frequencies for these five elements (Fig. 3). Because of the inherent variation between retrotransposition assays, we restricted comparisons of the retrotransposition frequencies to elements tested within the same experiment (see Supplementary Material). To allow for comparisons between experiments, we derived a normalized retrotransposition frequency (nRF). The nRF uses the absolute retrotransposition frequency of the least active element in our series, L1.2A, as a basis for comparison. For example, in Figure 3, the absolute frequency for L1.2A is 1/838, which corresponds to a nRF of 1.0. The absolute frequency of L1RP is 1/21, corresponding to a nRF of 40. Other experiments performed with the same L1 elements have produced consistent nRFs (see Figs 4–6 and Supplementary Material).
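The normalization can be reproduced from the two absolute frequencies quoted above. The sketch below is a minimal illustration that assumes the nRF is simply the ratio of an element's absolute frequency to that of L1.2A measured in the same experiment; the function name `normalized_rf` is hypothetical.

```python
from fractions import Fraction


def normalized_rf(absolute: Fraction, reference: Fraction) -> Fraction:
    """Normalize a retrotransposition frequency to the reference element.

    `reference` is the absolute frequency of L1.2A in the same experiment.
    """
    if reference <= 0:
        raise ValueError("reference frequency must be positive")
    return absolute / reference


# Absolute frequencies quoted in the text for Figure 3.
L12A = Fraction(1, 838)
L1RP = Fraction(1, 21)

nrf_l12a = normalized_rf(L12A, L12A)
nrf_l1rp = normalized_rf(L1RP, L12A)

print(float(nrf_l12a))            # 1.0
print(round(float(nrf_l1rp), 1))  # 39.9, reported as 40 in the text
```

Exact fractions are used so that the ratio 838/21 is not affected by floating-point rounding before the final display step.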

Based on proposed models of retrotransposition, we wondered if the difference in L1RP and L1.2A insertion length resided in the RT domain. To test this hypothesis, the RT domains of L1RP and L1.2A were swapped using flanking EcoRV sites (EcoRV sites are found at nucleotides encoding residues 399–400 and 1111–1112 in the L1 ORF2p). Surprisingly, we found that swapping the RT domains between L1RP and L1.2A did not have a significant effect on retrotransposition frequency or insertion length (Fig. 4).

Another potential source of the lower retrotransposition frequency and shorter insertions associated with L1.2A is glycine 363 of ORF2. This highly conserved residue is located between the endonuclease and RT domains. Sequences neighboring residue 363 are conserved among active human L1 elements, and residue 363 is conserved evolutionarily as a basic residue. Human, mouse and rabbit L1s encode an arginine, while rat, medaka and slow loris (a prosimian) have a lysine residue at position 363. We used site-directed mutagenesis to create L1.2 R363 and L1RP G363 (Fig. 5). Residue 363 has a 2–3-fold effect on the retrotransposition frequency of L1.2A and a small effect on insertion length (Fig. 5). However, the data did demonstrate a statistically significant linear regression for the relationship between nRF and the fraction of insertions >3 kb (P < 0.005).

Given the relatively small contributions of the RT domain and residue 363 of ORF2p, we searched for other sequence differences among the elements that could account for their differences in retrotransposition frequencies and insertion lengths. Since L1.2A and L1.2B differ by only two amino acids (at positions 1220 and 1259 at the COOH terminus of ORF2p) yet vary by ∼15-fold in retrotransposition frequency, we sought to determine the individual contributions of these amino acids. As with the more active L1.2 element (L1.2B), all other highly active elements in our panel (L1RP, L1.3 and LRE3) have an isoleucine at position 1220 and a serine at position 1259 (Fig. 2). To assess the individual contributions of these two amino acid residues to L1 retrotransposition frequency and insertion length, an allelic series of mutants was generated and tested in the cell culture assay and by ruler PCR. We found that both amino acids 1220 and 1259 contribute to retrotransposition frequency and insertion length, with reciprocal effects on L1RP and L1.2A (Fig. 6). When serine 1259 of L1RP is mutated to leucine (S1259L), the retrotransposition frequency drops to ∼40% of L1RP levels. S1259L accounts for two-thirds of the effect of the double mutant (I1220M/S1259L) on insertion length (Fig. 6). Conversely, changing the leucine at 1259 in L1.2A to a serine increases retrotransposition frequency and insertion length. The L1259S mutation is at least twice as effective in increasing both the nRF and insertion length as the M1220I substitution (Fig. 6). Thus, both residues affect retrotransposition frequency and insertion length, their effects are additive, and approximately two-thirds of their effect appears to be due to residue 1259.

We then carried out a linear regression analysis of the combined data of Figures 5 and 6 (Supplementary Material, experiments 2 and 3). The direct relationship of nRF with the fraction of insertions >3 kb was statistically significant (P < 0.0001, r² = 0.80; Fig. 7).
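For readers who wish to repeat this kind of analysis on the supplementary data, an ordinary least-squares fit with its coefficient of determination can be sketched as follows. The function name `linear_fit` is hypothetical, the input values are placeholders that are not the paper's data, and the sketch does not compute a P-value.

```python
from typing import Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Placeholder values for illustration only; NOT the data behind Figure 7.
toy_nrf = [1.0, 5.0, 15.0, 40.0]
toy_fraction_over_3kb = [0.10, 0.25, 0.45, 0.90]

slope, intercept, r_squared = linear_fit(toy_nrf, toy_fraction_over_3kb)
print(f"slope={slope:.4f} intercept={intercept:.4f} r^2={r_squared:.3f}")
```

Here x would be the nRF of each construct and y the fraction of its insertions longer than 3 kb, taken from the values reported in the Supplementary Material.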

We analyzed five active, young human L1 retrotransposons for their retrotransposition frequency and insertion lengths in a cultured cell assay (Fig. 1). There is a positive correlation between insertion length and retrotransposition frequency (Fig. 3). Over 85% of insertions of highly active elements (with retrotransposition frequencies greater than one event in every 50 transfected cells) are over 3 kb in length. On the other hand, fewer than 20% of insertions of elements with lower retrotransposition frequencies (approximately 1 in 500–2000 transfected cells) are >3 kb in length. All told, five natural L1s and nine mutant constructs show a general correlation between retrotransposition frequency and insertion length. One exception is L1.2B, which has a higher retrotransposition frequency than L1.3, but approximately the same proportion of clones with 2 and 3 kb insertions (Fig. 3).

We considered four potential sources of bias in our data on L1 insertion length. First, the basis for the correlation is not simply a higher copy number of L1 insertions in cells with the more active element. Rather, the copy numbers of insertions in cells containing a highly active element, L1RP, versus a less active element, L1.2A, were similar.

Secondly, our assay system minimizes the detection of insertions with 3′ transduction events. 3′ Transduction arises due to the use of downstream polyadenylation signals in flanking DNA sequences. Since the ruler PCR is rooted in the neomycin resistance gene, the lengths of any insertions that carry downstream flanking sequences due to 3′ transduction will be underestimated. However, we believe that very few, if any, of the insertions we characterized contained 3′ transduced sequences because all of the L1 elements studied were cloned into a mammalian expression vector (pCEP4) that has a very strong polyadenylation signal (SV40 late poly A) just downstream of L1. Indeed, no 3′ transductions beyond the SV40 poly A have been observed in over 80 characterized insertions in cultured cells when the SV40 poly A signal was present (6,41,42).

Thirdly, our assay is biased against the detection of inversions. Approximately 20% of L1 Ta insertions have inversions, the majority of which begin in the 3′-most kilobase of L1 sequence (43). If inversions arise in a similar location in L1-mneo insertions, they will likely disrupt the neomycin resistance gene. We looked for inversions in G418R clones that failed to amplify in the 1 kb ruler PCR (n = 12). No evidence of inversion was obtained using the reverse complementary primer sequence to the 1, 2 and 3 kb primers and the same anti-sense neomycin primer (data not shown). These assays do not address the possibility of inversions in the vast majority of clones that do amplify in the 1 kb ruler PCR.

Fourthly, we noted that our measurement of L1 insertion length is biased against very short insertions (<1 kb in length) because they do not confer resistance to G418 (Fig. 1b). Yet, in one survey of the human genome, L1 Ta insertions <1 kb were common, accounting for 29% of insertions (24). These findings are consistent with data on 5′ truncation of insertions in a cultured cell assay of retrotransposition in which over half were more than 80% truncated (42). Thus, by imposing G418 selection, we are missing 30–50% of the potential insertions. Since this study focuses on the distribution of insertions that exceed 1 kb, we have no data on whether the distributions of very short insertions are different in active versus less active L1 elements.

Our data are consistent with other studies of L1 insertions in cultured cells (6,41,42). Previously, we characterized four insertions of L1.2A in cell culture that were G418R and found that all were under 2.5 kb in length (6). Gilbert et al. found that, using L1.3, 40% of insertions of at least 2329 bp (due to the size of their retrotransposition marker) were longer than 3 kb (41). Furthermore, Symer et al. found that of L1.3 integrants longer than 1 kb, 60% were longer than 2 kb, and that 45% of the latter were longer than 3 kb (42). These data are comparable with our data for L1.3 in which ∼50% of inserts longer than 2 kb are also longer than 3 kb.

Several recent studies have documented an unexpectedly high fraction (∼30%) of full-length L1 insertions among the Ta subfamily of L1 elements (23–26). The proportion of full-length elements decreases with increasing age of the L1 subfamily (22–24). This is suggestive of a more general correlation between L1 retrotransposition frequency and insertion length. Although substitution of L for S at 1259 of ORF2p reduces both retrotransposition frequency and insertion length, it is unlikely that this substitution accounts in a simple way for the 5′ truncation typical of L1 elements in the genome. This is because nearly all of the truncated insertions over the past 40 million years were generated by L1s with S1259 (44). We speculate that a more active L1 element can generate a higher fraction of full-length copies, and thereby have a greater chance of colonizing the genome. However, it is also possible that highly active L1s are counter-selected due to the increased risk that their insertions will cause damaging mutations (30). One could also argue that there is a bias favoring the detection of full-length Ta-1 sequences over the (shorter) period of time that Ta-1s have existed compared with Ta-0s.

To explore the basis for the correlation between insertion length and retrotransposition frequency, we adopted a molecular approach. First, we swapped the RT domains of L1RP and L1.2A, but found little effect on retrotransposition frequency or insertion length (Fig. 4). In support of this, previous studies with Ty1-L1ORF2 fusion constructs in yeast did not reveal an obvious correlation between RT activity and retrotransposition frequency (19,45).

Since most of the difference in insertion length and retrotransposition frequency between L1RP and L1.2A did not appear to reside in the RT domain, we focused on conserved residues because their conservation among active elements may indicate functional importance. This approach is simplistic because it ignores differences in more variable residues (which may, nevertheless, be functionally important) and nucleic acid sequences (which may be important for RNA secondary structure). We narrowed our search to three candidate residues in ORF2p: 363, 1220 and 1259 (Fig. 2).

The R363G substitution in ORF2p contributes to retrotransposition activity and insertion length (Fig. 5). This previously uncharacterized residue and the adjacent amino acid sequence may constitute a novel structural domain in the human L1 element. When the L1 sequences are analyzed in secondary structure prediction programs, specifically COILS (46) and PHD (47), the output consistently indicates the presence of a helically rich region spanning amino acids 313–365. Interestingly, the region from amino acids 313 to 365 is 35% identical to HoxB1 and homology modeling using a HoxB1 template and Modeller (48) has a root mean squared deviation of <1 Å.

The contributions of residues M1220 and S1259 of ORF2p to both retrotransposition frequency and insertion length are both striking and unexpected (Fig. 6). Although both residues lie outside of the RT domain, it is possible that they affect RT function. We speculate that these residues facilitate the interaction of the C-terminal region of ORF2p with L1 RNA during reverse transcription. In order to affect both retrotransposition frequency and insertion length, these residues may promote both the initiation of binding and the continued interaction or anchoring of ORF2p with L1 RNA. Alternatively or in addition, they may stabilize the L1 RNA or possibly an RNA–DNA hybrid structure in conjunction with the zinc knuckle domain of ORF2p (49). Finally, this region could be important in protecting the ORF2p from degradation during the reverse transcription process.

Supplementary Material is available at NAR Online.