
Sequence permutations in the molecular evolution of DNA methyltransferases

DNA methyltransferases (MTases), unlike MTases acting on other substrates, exhibit sequence permutation. Based on the sequential order of the cofactor-binding subdomain, the catalytic subdomain, and the target recognition domain (TRD), several classes of permutants have been proposed. The majority of known DNA MTases fall into the α, β, and γ classes. There is only one member of the ζ class known and no members of the δ and ε classes have been identified to date. Two mechanisms of permutation have been proposed: one involving gene duplication and in-frame fusion, and the other involving inter- and intragenic shuffling of gene segments.

Two novel cases of sequence permutation in DNA MTases implicated in restriction-modification systems have been identified, which suggest that members of the δ and ζ classes (M.MwoI and M.TvoORF1413P, respectively) evolved from β-class MTases. This is the first identification of a δ-class MTase and the second known ζ-class MTase (the first ζ-class member among DNA:m4C and m6A-MTases).

Fragmentation of a DNA MTase gene may result from nuclease attack, for instance when the RM system invades a new cell. Its reassembly into a functional form, the order of motifs notwithstanding, may be strongly selected for if the cognate ENase gene remains active and poses a threat to the host's chromosome. The "cut-and-paste" mechanism is proposed for β-δ permutation, which is non-circular and involves relocation of one segment of a gene. The circular β-ζ permutation may be explained either by gene duplication or by shuffling of gene fragments. These two mechanisms are not mutually exclusive and probably both played a role in the evolution of permuted DNA MTases.

DNA of prokaryotic and eukaryotic cells and their viruses is often modified by methylation, carried out by S-adenosyl-L-methionine (AdoMet)-dependent DNA methyltransferases (MTases). Since a particular nucleotide sequence may exist in its methylated or unmethylated form, methylation can be regarded as an increase of the information content of DNA, which serves a wide variety of biological functions. In Eukaryota, DNA methylation plays a role in crucial regulatory processes, such as regulation of gene expression, embryonic development, genomic imprinting, and carcinogenesis (reviewed in ref. [1]). In Prokaryota, DNA methylation can be involved in DNA mismatch repair, regulation of gene expression, and control of timing of DNA replication (reviewed in ref. [2]). However, the majority of prokaryotic MTases are paired with a restriction endonuclease of cognate sequence specificity, together forming restriction-modification (RM) systems. RM systems are thought to serve as defense mechanisms that protect the cell against invasion of foreign genetic elements such as phages and plasmids [3]. It has also been suggested that RM systems are maintained in evolution because they participate in generating bacterial diversity by promoting homologous recombination [4] or because they act as "selfish" genetic elements that undergo extensive horizontal transfer [5]. These three hypotheses are contrasting, but not mutually exclusive.

MTases can be divided into three different groups on the basis of the chemical reactions they catalyze: generating N6-methyladenine (m6A), N4-methylcytosine (m4C), and C5-methylcytosine (m5C). It has been suggested that m4C and m6A MTases (collectively termed "N-MTases") may be more closely related to each other than to m5C MTases [6]. Nevertheless, subsequent analyses showed that the relationships between these groups of proteins are quite complicated and their evolution may have involved several independent conversions of the reaction specificity [7-9]. Amino acid sequence alignments of DNA MTases revealed several conserved motifs, of which I-VIII and X are common to most subfamilies, and a region of essentially higher variability [6,10]. Based on the results of X-ray crystallography of members of all three groups and structure-based multiple sequence alignment, motifs IV-VIII were assigned to the active-site subdomain, motifs X and I-III to the AdoMet-binding subdomain, and the variable region was recognized as a separate domain, implicated in recognition of the target sequence (and termed TRD for target-recognition domain) (reviewed in refs. [11,12]). Structural studies on m5C, m4C, and m6A MTases demonstrated that the TRDs of these proteins are structurally dissimilar and most likely were acquired in independent gene fusion events [11].

DNA MTases have been subdivided into six classes (α, β, γ, ζ, and the hypothetical δ and ε; Figure 1a) according to the possible linear arrangements of three modules: the AdoMet-binding subdomain, the active site subdomain, and the variable TRD [6]. All of the enzymes preserve the same spatial arrangement of the motifs, and α, δ, and ε form a circularly permuted set, as do β, γ, and ζ (Figure 1b). The majority of DNA N-MTases fall into the α, β, and γ classes, with no bona fide γ-m4C MTases known. M.NgoMXV [13] and its close homolog M.LmoA118I [14] are the only experimentally characterized m4C MTases whose architecture is very similar to that of γ-m6A MTases. Nevertheless, these remarkably small MTases (153 aa), as well as their uncharacterized homologs identified in sequence databases, lack both the classical TRD and the region corresponding to motif X and therefore can be regarded as "minimal" members of the family [13]. Another group of small (~175 aa) MTases, which are similar to γ-m6A MTases but lack motif X and the TRD, are the atypical Dam (m6A) MTases encoded by several phages, including HP1, VT-2, and T1 [15].
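The relationship between the six classes can be made concrete with a small sketch. It encodes each class as an N-to-C string of three modules and groups classes whose module orders are rotations of one another. The single-letter encoding and the function names are illustrative assumptions, not part of the original analysis; the module orders for β, γ, δ, and ζ follow the descriptions given in this article, while the α and ε orders are assumed from the classification in ref. [6] and from the statement that α, δ, and ε form one circular set.

```python
# Hypothetical single-letter encoding of the three modules:
#   A = AdoMet-binding subdomain, C = catalytic (active-site) subdomain,
#   T = target recognition domain (TRD).
# ASSUMPTION: the alpha and epsilon orders are taken from the classification
# in ref. [6]; the others follow the descriptions in the text.
CLASS_ORDER = {
    "alpha": "ATC",
    "beta": "CTA",
    "gamma": "ACT",
    "delta": "CAT",
    "epsilon": "TCA",
    "zeta": "TAC",
}


def rotations(order):
    """Return every circular rotation of a module-order string."""
    return {order[i:] + order[:i] for i in range(len(order))}


def is_circular_permutation(class_a, class_b):
    """True if the two classes differ only by a circular permutation."""
    return CLASS_ORDER[class_b] in rotations(CLASS_ORDER[class_a])


def circular_sets(class_order):
    """Group class names whose module orders are rotations of one another."""
    groups = {}
    for name, order in class_order.items():
        key = min(rotations(order))  # canonical representative of the set
        groups.setdefault(key, set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    for group in circular_sets(CLASS_ORDER):
        print(sorted(group))
```

Under this encoding, a β-to-ζ change is a rotation, whereas a β-to-δ change is not, which is why the latter cannot be produced by circular permutation alone.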

Most m5C MTases resemble the N-MTases of the γ class, with the only difference being their motif X localized at the C-terminus instead of the N-terminus. However, a few m5C MTases have been described with unusual permutations (Figure 1). To my knowledge, M.BssHII is the only DNA MTase for which the ζ architecture has been convincingly confirmed by experiment [16] and homology modeling [17]. In addition, a novel class of permuted m5C MTases typified by DRM2 has been recently identified in plants; this prediction has also been supported by threading of the permuted sequence onto the common fold [18]. Some DNA MTases contain terminal extensions and various insertions [6], but it has never been demonstrated that they are related to known or predicted TRDs in other proteins.

DNA MTases are only one of the numerous families of remotely related enzymes that exhibit a common fold [11,19]. Other members of the superfamily methylate a variety of chemically diverse molecules, including various RNAs, proteins, lipids, small molecules, etc. While all members of the superfamily share the same structural core, only DNA MTases vary in the order of conserved motifs. The rest show the order X, I, II, III, IV, V, VI, VII, VIII and only protein-arginine MTases lack the last three motifs [19]. The question thus arises, why do DNA MTases, but not MTases in general, exhibit sequence permutations within the same structural framework?

Two models have been proposed to explain sequence permutations arising during the evolution of N-MTases [9,20]. Jeltsch argued that the process of domain permutation needs duplication and in-frame fusion of the MTase gene, producing one enzyme with two catalytic domains [20]. Subsequent introduction of a new start codon in the middle of the first gene copy and a stop codon at the equivalent position in the second gene copy would then result in a circularly permuted variant. For instance, the ζ- or β-like permutants could arise from a hypothetical tandem γγ-class MTase. This model corresponds to the widely accepted concept that a permuted protein may arise naturally from tandem repeats by extraction of the C-terminal portion of one repeat together with the N-terminal portion of the subsequent repeat, so long as the protein's N and C termini are in close spatial proximity (reviewed in ref. [21]). Although the idea itself offers a plausible explanation for the origin of permutants within many protein families, the only known duplicated m6A MTases are the type IIS enzymes of the αα-class, whose permutation would eventually produce enzymes of the δ or ε class that have not been identified to date. The TRDs of known MTases from β and γ classes are unrelated [22,23], hence it is unlikely that simple conversions "from γγ to β" or "from ββ to γ" have occurred in nature. Furthermore, the N- and C-termini of M.TaqI, the only γ-m6A MTase whose 3D structure is known, are quite distant in space [22]. Still, this scenario may be valid for enzymes that have not been identified yet, or whose sequences have not been studied in enough detail.
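The duplication model can be illustrated as a sliding window over a tandem repeat: fusing two copies of a gene and then moving the start and stop codons by the same number of modules keeps one full-length window. The sketch below is only a schematic at the level of whole modules; the single-letter encoding (A = AdoMet-binding subdomain, C = catalytic subdomain, T = TRD) and the function name are assumptions made for illustration.

```python
def permutants_from_tandem(order):
    """List module orders obtainable from a tandem duplicate of `order`.

    `order` is an N-to-C string of single-letter modules (hypothetical
    encoding: A = AdoMet-binding, C = catalytic, T = TRD). The in-frame
    fusion is modeled as order + order; a new start codon at module offset i
    in the first copy and a stop codon at the equivalent position in the
    second copy keep one window of the original length.
    """
    tandem = order + order
    n = len(order)
    # Offset 0 would simply regenerate the parental order, so it is skipped.
    return [tandem[i:i + n] for i in range(1, n)]


GAMMA = "ACT"  # assumed gamma-class order: AdoMet-binding, catalytic, TRD
ALPHA = "ATC"  # assumed alpha-class order: AdoMet-binding, TRD, catalytic

if __name__ == "__main__":
    print("gamma-gamma tandem:", permutants_from_tandem(GAMMA))
    print("alpha-alpha tandem:", permutants_from_tandem(ALPHA))
```

With these assumed orders, the γγ tandem yields the β-like (CTA) and ζ-like (TAC) arrangements, and the αα tandem yields the ε-like (TCA) and δ-like (CAT) arrangements, in agreement with the statements above.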

It has been hypothesized (ref. [9]) that the permuted DNA MTase variants were generated by intra- or intergenic rearrangements of gene fragments (i.e. "module shuffling"; reviewed in refs. [24,25]) that left no evidence of duplication intermediates. However, only in one case (M.BssHII) has it been possible to reconstruct a possible evolutionary history of shuffled fragments [17]. Moreover, no examples are known of N-MTases in different classes whose TRDs are markedly similar. Hence, no convincing examples of permutations of an entire N-MTase molecule have been identified to date. The reported permutations in N-MTases concern only the segments within the catalytic domain, while the unrelated TRDs were acquired or evolved independently in distinct classes [20]. Identification of closely related DNA MTases with homologous TRDs that nevertheless lie in different classes might help decide between the two given hypotheses and shed light on how the different classes of DNA MTases arose.

Candidates for N-MTases in the midst of the process of permutation were sought amongst all DNA MTases whose sequences were available from REBASE [26]. Those sequences exhibiting deviations from the typical spacing between the conserved motifs or unusual extensions at the termini were chosen for detailed analysis. Within some extensions, several known or predicted DNA-binding domains were identified (to be published elsewhere); however, only two pairs of MTase sequences were found that fit the criteria of similarity between the AdoMet-binding subdomain, the active site subdomain, and the TRD, without conservation of their co-linearity.

One MTase that emerged from the analysis was the M.MwoI protein, which recognizes an interrupted palindrome GCNNNNNNNGC and methylates one of the cytosines in each strand, generating m4C [27]. M.MwoI (GenBank record 2961238) was earlier classified as a member of the β-class; however, it exhibits a quite unusual length of 668 aa, which is approximately twice the length of a typical β-class MTase [6], and it lacks the variable insertion between motifs VIII and X, corresponding to the TRD in β-class MTases [9].

BLAST searches and threading analysis (see Methods) revealed that the N-terminal part of M.MwoI (aa 1–270) aligns very well with the catalytic and the AdoMet-binding subdomains of β-class MTases. However, in β-class MTases these subdomains are separated by the TRD, which in M.MwoI is replaced by only a few residues (Figures 2a, 3a). Comprehensive results of the threading analysis of the M.MwoI sequence are available at the URL http://bioinfo.pl/meta/target.pl?id=4139; the alignments of the N-terminal region are essentially identical to those reported previously [9]. Remarkably, the C-terminal region of M.MwoI revealed no similarities to other sequences, with one prominent exception, namely the predicted TRD of the m4C MTase M.SfiI (GenBank entry 2761010), with a BLAST expectation (e) value of 10⁻³. Further database searches using the sequence of M.SfiI and the isolated fragments of putative TRDs confirmed that the three major subdomains of M.MwoI and M.SfiI exhibit significant sequence similarity, but the linear order of these elements differs between them. If this prediction is correct, M.MwoI should be classified as the first member of the δ-class, rather than the β-class (Figures 1, 2a). It is noteworthy that, using either PSI-BLAST or threading, no significant sequence or structural similarities could be detected between the TRD of M.MwoI and M.SfiI and the TRDs of other MTases or, more generally, the sequences of other proteins. The additional sequence region of M.MwoI (aa 434–497), which may be regarded as a linker between the N-terminal catalytic domain and the newly identified C-terminal TRD, also showed no matches to any sequences in the database. Hence, the determination of which arrangement of subdomains, that of M.MwoI or M.SfiI, corresponds to the ancestral state must await discovery of their homologs.

M.SfiI recognizes the sequence GGCCNNNNNGGCC, which belongs to a broader set of sequences recognized by the GCNNNNNNNGC-specific M.MwoI. It is not known how these enzymes recognize such a lengthy sequence with a non-specific spacer. Nonetheless, it can be imagined that the TRD of M.SfiI evolved from the TRD of M.MwoI by acquisition of new contacts to bases outside and inside the GC pair, (N->G)GC(N->C)NNNNN(N->G)GC(N->C), or conversely, that the stringent DNA-recognition specificity of the M.SfiI-like TRD was relaxed to give rise to the less specific M.MwoI. In the absence of protein-DNA co-crystal structures for the β-class of MTases and of suitable structural templates for modeling the TRD structure in M.SfiI and M.MwoI, prediction of the detailed protein-DNA contacts is unfeasible. However, I hope that the finding reported herein will prompt mutagenesis experiments – it is tempting to speculate that swapping the predicted TRDs between M.MwoI and M.SfiI will result in an exchange of specificities.
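The statement that every M.SfiI site is also an M.MwoI site can be checked position by position. The following sketch slides the more general pattern along the more specific one and reports the offsets at which each position of the specific pattern allows only bases that the general pattern also allows. The function name and the reduced IUPAC table (only A, C, G, T, and N) are illustrative assumptions.

```python
# Minimal IUPAC table: only the codes that occur in the two recognition sites.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def covered_offsets(specific, general):
    """Return offsets at which every instance of `specific` contains `general`.

    At a reported offset, each position of the `specific` window permits only
    bases that the aligned `general` position permits as well.
    """
    hits = []
    for offset in range(len(specific) - len(general) + 1):
        window = specific[offset:offset + len(general)]
        if all(set(IUPAC[s]) <= set(IUPAC[g]) for s, g in zip(window, general)):
            hits.append(offset)
    return hits


SFII_SITE = "GGCCNNNNNGGCC"
MWOI_SITE = "GCNNNNNNNGC"

if __name__ == "__main__":
    print(covered_offsets(SFII_SITE, MWOI_SITE))
```

The M.MwoI pattern fits inside the M.SfiI pattern at a single offset (one base in from the 5' end), corresponding to the inner GC dinucleotides of the two GGCC half-sites; the reverse containment does not hold because the M.SfiI site is longer and more specific.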

A second MTase that came from the initial screen was M.TvoORF1413P, interpreted as a member of the γ-class in REBASE [26] (http://rebase.neb.com/rebase/enz/M.TvoORF1413P.html), but exhibiting an extension of over 150 aa located N-terminally to motif X instead of the C-terminal extension after motif VIII required by the structure of the γ-class. In a BLAST search initiated with the M.TvoORF1413P sequence, M.ThaI was reported as the best hit, with a highly scored alignment of the AdoMet-binding region (e-value 5 × 10⁻¹⁴) and a quite poorly scored (e-value 0.049) alignment of the catalytic region. However, M.ThaI is a member of the β-class and these two regions of similarity are swapped in the primary sequences of the two MTases (Figures 2b, 3b). BLAST searches initiated with the N- and C-terminal parts of M.TvoORF1413P showed that its C-terminal region scores better (e-value 6 × 10⁻⁴) when aligned with the catalytic domain of another CGCG-specific β-m4C MTase, M.TmaI (GenBank record 4980829), a close relative of M.ThaI (data not shown).

Threading analysis of the M.ThaI sequence revealed its perfect compatibility with the M.RsrI and M.PvuII structures (results are available at http://bioinfo.pl/meta/target.pl?id=4134), allowing homology modeling of the M.ThaI structure (Figures 3b, 4). These results reveal that the TRD of M.ThaI is shorter than the TRDs of M.RsrI and M.PvuII, which may be an indication that in this enzyme some DNA-binding residues migrated to other loops [9]. On the other hand, threading of M.TvoORF1413P revealed that its C-terminus corresponds to the last β-strand of the common MTase core (motif VIII; Figure 1), strongly arguing against its assignment to the γ-class, which requires the presence of the TRD C-terminal to this element (threading results are available at http://bioinfo.pl/meta/target.pl?id=4133, http://bioinfo.pl/meta/target.pl?id=4144 and http://bioinfo.pl/meta/target.pl?id=4145, with the three entries corresponding to the full-length sequence, the N-terminal part, and the C-terminal part, respectively). Instead, all threading algorithms reported that the N-terminus of M.TvoORF1413P matches perfectly the additional β-strand (motif IX-N) in M.RsrI and M.PvuII, and this region, along with the predicted TRD, aligns quite well with motif IX-N and the TRD of M.ThaI (Figure 3b). It is noteworthy that for the core regions, the predicted secondary structure agreed very well both between M.ThaI and M.TvoORF1413P, and between these MTases and the experimentally determined structures of M.RsrI and M.PvuII (for details see the above-mentioned links to MetaServer results). For the 42 N-terminal residues of M.TvoORF1413P no similarity to known sequences or structures could be demonstrated, and modeling based on N-terminally extended threading alignments resulted in misfolded structures; it is therefore possible that this region forms an elaboration of the common fold that is unique to M.TvoORF1413P.

The MTase activity of M.TvoORF1413P remains to be demonstrated. However, a close homolog that exhibits genuine DNA:m4C MTase activity has recently been identified (Drs. M.A. Abdurashitov and S.K. Degtyarev, personal communication). M.BstF5I-4, whose sequence remains unpublished, is evidently homologous to M.TvoORF1413P over the region including the predicted N-terminal TRD, as well as motifs IX-N, X, and I-VIII (BLAST e-value 3 × 10⁻²⁰, 26% identical plus 23% conservatively substituted residues; with 60% identical residues in the predicted TRD, i.e. the 10 aa loop preceding motif IX-N; data not shown). It cannot be ruled out that the small TRD of M.ThaI, M.BstF5I-4, and M.TvoORF1413P harbors only a fraction of specificity determinants and that other loops on the catalytic face of the protein contribute to specific DNA recognition. Nevertheless, according to the classical definition of the TRD (the variable region between motifs VIII and X [6,28]), the presented results of sequence analysis and structure prediction suggest that the common ancestor of ζ-class MTases M.BstF5I-4 and M.TvoORF1413P evolved from M.ThaI (β-class member) by sequence permutation.

Sequence analysis resulted in identification of two novel cases of sequence permutation in DNA MTases, and in demonstration, for the very first time, that DNA:m4C MTases of different classes may exhibit significant sequence similarity not only in the catalytic domain, but also in the TRD. This finding suggests that the analyzed gene pairs diverged relatively recently, permitting a test of the hypothesis that the observed rearrangements occurred according to the "permutation-by-duplication" model [20] or to the alternative model, involving intragenic relocation of gene segments. If sequences resembling fragments of one of the DNA MTases analyzed herein were identified in its own neighborhood, this would provide strong evidence that gene duplication occurred. It would also suggest that this particular MTase is a permuted version of its homolog, whose neighborhood is free from duplicated fragments, rather than the opposite.

Regrettably, the neighborhood of the SfiI and MwoI RM systems is unknown; however, the context of the ThaI RM system and M.TvoORF1413P can be analyzed using the complete genome sequences of Thermoplasma acidophilum [29] and T. volcanium [30], respectively. The genome sequences of both Thermoplasma species flanking M.ThaI and M.TvoORF1413P (10 000 base pairs in each direction) were compared using the BLAST-family programs at the level of DNA and putative translations in all open reading frames (with stop codons translated as missing characters, e.g. "X"). No evidence of sequences similar to the genes encoding these two MTases was found, except for another putative DNA:m4C MTase, M.TvoORF1416P, located 2640 bp 5' to M.TvoORF1413P. M.TvoORF1416P is a typical α-class member and exhibits significantly higher similarity to α-MTases such as M.PspGI (BLAST e-value 2 × 10⁻⁷³) or M.MvaI (e-value 3 × 10⁻⁶⁷) than to M.TvoORF1413P (insignificant e-value 5.3). Therefore, the two MTases should be regarded as remote homologs of each other and as members of different phylogenetic lineages [9] that met rather accidentally in the T. volcanium chromosome, rather than as products of recent duplication of one gene.
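The idea of translating a genomic region in all reading frames, with stop codons rendered as "X" so that fragmented coding sequences can still be aligned, can be sketched as follows. This is a minimal illustration and not the procedure actually used: the function names are hypothetical, the standard genetic code is assumed, and any codon containing a non-ACGT character is also rendered as "X".

```python
from itertools import product

# Standard genetic code in TCAG order (assumed appropriate for these genomes).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna, stop_char="X"):
    """Translate one reading frame; stops and unknown codons become stop_char."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "*")  # unknown codon treated as "*"
        protein.append(stop_char if aa == "*" else aa)
    return "".join(protein)


def six_frame(dna):
    """Return the three forward-frame and three reverse-frame translations."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    return [translate(strand[f:]) for strand in (forward, reverse) for f in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(six_frame("ATGTAAGGG")):
        print(frame, protein)
```

The resulting strings can then be supplied as subjects or queries to a protein-level similarity search, so that a homologous segment interrupted by a stop codon is not split into separate short open reading frames.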

The lack of evidence supporting the gene duplication mechanism in the case of MTases from Thermoplasma is not entirely convincing, especially since no direct evidence supporting the alternative "cut-and-paste" mechanism can be provided by sequence analysis. Hence, the events leading to the M.SfiI(β)-M.MwoI(δ) and M.ThaI(β)-M.TvoORF1413P(ζ) permutations were reconstructed based on both mechanisms (Figures 5, 6, 7, and 8). For the sake of simplicity, it was assumed that the unique δ- and ζ-class members were in these cases generated by permutation of common β-class members; however, an analogous reconstruction could be carried out assuming the opposite directionality of rearrangements, leading to similar conclusions.

Figures 5 and 6 show possible histories leading to both permutations according to the gene duplication mechanism. The order of the sequence motifs remains conserved between M.SfiI and M.MwoI, but these two MTases differ in that a large segment bearing the TRD appears in the middle of the former but at the C-terminus of the latter; hence, permutation of these two proteins is not circular (Figure 5). To produce M.MwoI, duplication of M.SfiI would have to be followed by both creation of a novel stop codon to eliminate the region encoding the "new" catalytic subdomain of the C-terminal repeat and deletion of the regions corresponding to the "old" TRD and the "new" AdoMet-binding subdomain. It is quite unlikely that all these changes occurred in a single event, and their occurrence in a series of steps would seem inevitably to produce a nonfunctional intermediate, which would require two steps to regain activity. If these changes did occur gradually, the product of gene duplication, in which one repeat retained the AdoMet-binding subdomain but lost the catalytic subdomain (or conversely), would expose the hydrophobic core of the remaining nonfunctional subdomain to the solvent. Folding and enzymatic function of such a "1 & ½" mutant would probably be heavily compromised. The function of M.SfiI and M.MwoI is to protect the chromosome from being cleaved by the cognate ENase. Hence, it seems rather unlikely that the host cell would survive such a series of improbable events passing through functionally compromised intermediates.

Compared to M.SfiI and M.MwoI, evolution of M.TvoORF1413P from M.ThaI seems more likely (Figure 6), since this "classical" case of circular permutation requires only removal of the terminal regions by formation of new start and stop codons. However, in this case deletion of terminal subdomains, or their large parts, would also have to be concurrent; otherwise nonfunctional intermediates could arise, leading to cell death due to insufficient protection against the cognate ENase. If only one of the repeats in the original tandem ββ fusion protein is damaged, deletion of the remaining nonfunctional part would most likely restore the highly active, single-copy version of the parent MTase. It is noteworthy that the tandem duplication mechanism offers no stage at which evolutionary pressure would result in optimization of a poorly active intermediate specifically towards the permuted version.

Another problem with the scenario involving tandem ββ MTase fusion is that such fusions have never been reported to occur. In the X-ray structure of the β-class member M.RsrI, the two identical subunits make quite extensive contacts (a loss of 1799.3 Å² of solvent accessible surface area per chain upon complex formation; see URL: http://pdb-browsers.ebi.ac.uk/pdb-bin/macmol.pl?filename=1eg2), suggesting that the dimeric structure of this MTase is biologically relevant [23]. The TRD and the active site reside on opposite sides of the M.RsrI monomer, but the unique dimeric configuration brings the TRD of one subunit near the active site of the other, indicating that dimerization may be required for recognition and methylation to occur. The N- and C-termini of M.RsrI are located close to each other in the monomer (8.8 Å), but the C-terminus of one monomer is located on the opposite side of the dimer with respect to the N-terminus of the other monomer, separated by a distance of 74 Å in a straight line. If the configuration observed in the crystal structure of M.RsrI is representative of other members of the β-class that use two cooperating MTase domains, covalent joining of the termini of these domains would require a very long linker peptide, looping around the dimer. Hence, tandem fusion would seem to disrupt the cooperation of two β-class MTases within the dimer.
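A rough lower bound on the length of such a linker can be obtained from the straight-line distances quoted above. The sketch below assumes that a fully stretched polypeptide spans at most about 3.8 Å per residue (the spacing of consecutive Cα atoms); this value and the function name are assumptions introduced for illustration and do not come from the original analysis.

```python
import math

# ASSUMPTION: a fully stretched polypeptide chain spans at most ~3.8 Å per
# residue (distance between consecutive C-alpha atoms).
MAX_SPAN_PER_RESIDUE = 3.8  # Å


def min_linker_residues(distance, span=MAX_SPAN_PER_RESIDUE):
    """Smallest number of residues that could bridge `distance` (in Å)."""
    return math.ceil(distance / span)


if __name__ == "__main__":
    print("within monomer (8.8 Å):", min_linker_residues(8.8))
    print("across dimer (74 Å):", min_linker_residues(74.0))
```

These numbers are strict lower bounds for a straight, fully extended chain. Because the direct path between the termini across the dimer passes through the protein itself, a real linker looping around the dimer would have to be considerably longer than the bound computed for 74 Å.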

An alternative "cut-and-paste" mechanism (Figures 7, 8), inspired by the observed genomic rearrangements associated with the presence of restriction endonucleases [5,31], involves generation of a functional gene from fragments. In one scenario (Figures 7 and 8, left panels), the fragments may be generated by the combined action of various endo- and exonucleases that partially degrade the DNA fragment encoding the MTase gene, thereby producing recombinogenic ends. If degradation occurs, at least two copies of the MTase gene must be present in the cell in order to reconstruct all important regions. Another scenario (Figures 7 and 8, right panels) involves precise action of a sequence-specific endonuclease, which fortuitously cleaves the MTase gene in the regions corresponding to linkers between the TRD and the two subdomains of the catalytic domain. In this case, one or more copies of the MTase gene may be present in the cell. These two scenarios are not mutually exclusive, provided that the fragments resulting from any type of cleavage or degradation span all regions necessary for the MTase activity, and are able to recombine with each other or use "sticky ends" to ensure ligation.

The "cut-and-paste" mechanism (Figures 7, 8) differs from the "permutation-by-duplication" mechanism (Figures 5, 6) in that it involves a momentary stage at which there are no intact, active MTase gene copies in the cell. The accompanying ENase gene might be fragmented as well, leading to elimination of the RM system from the cell. However, if the ENase remains active for a certain period of time, only those cells survive in which the MTase gene is restored from fragments. Such an MTase may exhibit various deletions, duplications of certain regions, and rearrangements, as long as these modifications allow the protein to provide protection against the ENase. The selection pressure will result in rapid optimization of the MTase function, most likely to the nearest maximum in the fitness landscape. With a certain probability, the permuted gene copy will arise, and under such "all or nothing" conditions, its sequence will be optimized towards the modification activity sufficient to protect the host's chromosome. It is worth mentioning that in the short term the ENase may remain active and provide selective pressure for restoring expression of the MTase even if its own gene has been destroyed.

A hybrid mechanism can be envisaged in which the complete, fully functional MTase gene undergoes recombination with a fragment of an MTase gene (not shown). This mechanism has limitations similar to those of the "permutation-by-duplication" mechanism in that the newly fused fragment must not compromise the function of the original protein. However, there is no specific reason why a part of the original domain should be deleted with a higher frequency than the new fragment and why the latter scenario should be selected for, unless the alternative fragments are not identical and the new fragment encodes a function that may increase the fitness of the protein.

The scenarios shown in Figure 5a require that the TRD can function autonomously, outside of its original structural environment. For some TRDs, at least, this is the case (reviews: [2,12]). Not only is the movement of TRDs within an MTase plausible, but the exchange of TRDs between MTases can provide different specificities and thereby functional advantage [32]. Indeed, DNA:m5C MTases that methylate more than one specific DNA target owing to the presence of several TRDs at various locations of the enzyme have been identified [33,34], and shuffling of TRDs has been suggested to occur among both mono- and multispecific DNA:m5C MTases [35]. However, while the unrelated TRDs of many structurally characterized MTases form structurally autonomous domains, the TRDs of two members of the β-class, M.RsrI [23] and M.PvuII [36], form only an amendment of the common fold, which is quite unlikely to behave as an independently folded and functionally autonomous unit (review: [12]).

The putative TRD of M.TvoORF1413P and M.ThaI is most likely too small and too poorly structured to be regarded as an independent domain (Figures 3b, 4). However, the putative TRD of M.SfiI and M.MwoI is much longer and comprises at least four predicted helices (Figure 3a); therefore, it cannot be excluded that it may form an independently folded, functional unit. It has been demonstrated that the DNA:m5C MTase M.AquI comprises two independent polypeptides corresponding to the catalytic domain and the TRD, which associate in solution to form a functional enzyme [37]. Correspondingly, one of the scenarios of evolution of the M.MwoI enzyme involves temporary separation of the gene fragments encoding the catalytic domain and the TRD (Figure 7). According to this scenario, the linker region between the catalytic domain and the TRD originated from the non-coding sequence that initially separated the two functional units. It seems likely that covalent linkage of the two domains by the newly established linker increased the fitness of the rearranged MTase.

The "cut-and-paste" scenario offers an explanation for why sequence permutations are observed in DNA MTases and not among ENases. This results from the asymmetry of selection for restoring methylation function and restriction function after the corresponding genes are fragmented. In other words, the newly permuted proteins are probably poor enzymes, but the ENase provides strong selective pressure for optimization of the MTase function, while in the second case selective pressure is relatively weak (functional ENases probably provide only a minor selective advantage and they are not required for protection of the host "against" the MTase). However, it cannot be excluded that ENases are simply not amenable to any sequence permutation for structural reasons, since these proteins exhibit a different fold from that of MTases.

Interestingly, sequence permutations have been observed only amongst DNA MTases [11], but not in other MTase families (enzymes acting on RNA, proteins, small molecules, etc.), despite their common structure. To date, no explanation has been offered for this peculiarity, even though it has raised considerable interest in the field [11]. It is tempting to speculate that the rearrangements observed amongst DNA MTases but not other MTases are induced [5] by the increased exposure of their genes to the repertoire of various nucleases encoded by different hosts during horizontal transfer events common among the RM systems (J. Elhai, personal communication). A high frequency of such events was inferred from sequence analyses [5,38]. If this hypothesis is correct, interaction with the mechanisms of defense against alien genetic elements encoded by various Prokaryotes may also be responsible for permutations of entire domains within type I and type III RM systems and a plethora of combinations of various domains in many "non-classical" RM system subtypes (review: [12]).

Analyses of the evolutionary scenarios presented herein favor the "cut-and-paste" mechanism or the hybrid mechanism (fusion of an intact MTase with the TRD) for the M.SfiI(β)-M.MwoI(δ) rearrangement and the original "permutation-by-duplication" mechanism or the "cut-and-paste" mechanism (rather than their hybrid) for the M.ThaI(β)-M.TvoORF1413P(ζ) rearrangement. Even though in these and probably other cases certain scenarios may seem more likely than others, none of them can be ruled out completely. The presented mechanisms are not mutually exclusive, and all have probably played significant roles in the generation of permuted MTases.

The PSI-BLAST algorithm [39] was used to search the non-redundant version of current sequence databases (nr) and the publicly available complete and incomplete genome sequences at the NCBI website http://www.ncbi.nlm.nih.gov. All genuine and putative N-MTase sequences available from REBASE [26] were submitted as queries with default parameters. Protein structure prediction was carried out using the MetaServer available at http://bioinfo.pl/meta/, which combines several secondary structure prediction and threading methods (ref. [40] and references therein). These threading methods compare the query sequence (the target) with a library of structures (templates) and return the 10 alignments that scored best according to the implemented criterion of compatibility. The results are evaluated by the Pcons server [41], which compares the models and the associated scores and produces a ranking of potentially best predictions (target-template alignments). Based on the results produced by the MetaServer, homology modeling was carried out using the SWISS-MODEL/PROMOD II server [42]. Model evaluation was carried out using the PROSA II [43] program integrated with PROMOD II, suggesting that the stereochemistry and energetic parameters of the models were acceptable.