Pyrosequencing positions nucleosomes precisely

Determining where nucleosomes are positioned across eukaryotic genomes, and why, has long been one of the major aims of chromatin research. A recent study in Nature [1] has advanced the field greatly by providing the most accurate map of nucleosomes to date. Historically, a range of different strategies has been used to map their positions. Many relied on the ability of nucleosomes to protect DNA from digestion by nucleases. Indeed, digestion with micrococcal nuclease (MNase) was responsible for originally defining the nucleosome as containing 146 bp of DNA [2]. Combining nuclease digestion with Southern blotting proved a powerful way of mapping regions that are hypersensitive to nuclease digestion. Improvements to the resolution of this approach eventually enabled the positions of individual nucleosomes to be mapped [3]. Nucleosome positions could be assigned with even greater accuracy by using the nucleosomal DNA as a substrate for primer extension [4].

More recently, it has become possible to assess nucleosome positioning over large regions using high-density tiled microarrays to identify the undigested protected DNA. This approach has enabled some general features of the nucleosomal landscape to be identified; for example, in budding yeast the region in the vicinity of the transcription start site is depleted of nucleosomes [5,6]. A limitation of this approach has been that the spacing between oligonucleotides on the arrays and the length of the oligonucleotides themselves limit the accuracy with which nucleosome positions can be assigned. All these approaches rely on characterizing chromatin following treatment of the cells to make them permeable to nucleases, with the risk that alterations to chromatin structure could occur during this procedure. In an attempt to avoid this, a range of strategies has been used to map chromatin structure in living cells [7].

An alternative approach is to use bioinformatics to predict nucleosome positions on a genome-wide scale. On the assumption that nucleosome positions are, at least in part, determined by the interactions with the underlying DNA, a relatively small collection of nucleosomes can be used to build a model of nucleotide sequences preferentially occupied by nucleosomes. This approach has the potential to assign nucleosome positions with 1-bp resolution. It has proved particularly informative to look at the distribution of individual dinucleotide steps around the nucleosome. It turns out that in nucleosomes isolated both from native chromatin and from in vitro reconstitution assays, characteristic dinucleotide steps occur with an approximately 10-bp periodicity at specific positions of the nucleosome core particle (NCP) DNA [8,9]. On the basis of such a model it has been possible to predict nucleosome organization over the entire yeast genome, and the predictions in many instances appear to coincide with experimentally determined positions. In certain regions of the genome, however, nucleosomes are actively being displaced from sequences with the most favorable nucleosome-DNA interactions [10]. Such changes cannot be predicted by current models, meaning that there is still a requirement to measure nucleosome occupancy experimentally.
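The kind of periodicity analysis described above can be illustrated with a minimal sketch. It assumes a set of equal-length nucleosomal sequences already aligned on a common reference point; the function names `dinucleotide_profile` and `autocorrelation` are hypothetical and are not taken from the cited studies [8,9]. The first function tabulates how often a chosen dinucleotide step starts at each position, and the second checks whether that profile repeats at a given lag.

```python
def dinucleotide_profile(sequences, dinucleotides=("AA", "TT")):
    """Fraction of aligned sequences in which one of the chosen
    dinucleotide steps starts at each position.

    Assumption: all sequences have the same length and are aligned
    on a common reference point (for example the nucleosome center).
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    targets = {d.upper() for d in dinucleotides}
    counts = [0] * (length - 1)
    for seq in sequences:
        seq = seq.upper()
        for i in range(length - 1):
            if seq[i:i + 2] in targets:
                counts[i] += 1
    return [count / len(sequences) for count in counts]


def autocorrelation(profile, lag):
    """Mean-centred autocorrelation of a profile at the given lag,
    normalised by the total variance of the profile."""
    n = len(profile)
    mean = sum(profile) / n
    centred = [value - mean for value in profile]
    variance = sum(c * c for c in centred)
    if variance == 0:
        return 0.0
    return sum(centred[i] * centred[i + lag] for i in range(n - lag)) / variance


# Toy input: an AA step every 10 bp in otherwise GC-only sequence.
toy_sequences = [("AA" + "CGCGCGCG") * 5] * 3
toy_profile = dinucleotide_profile(toy_sequences)
print(autocorrelation(toy_profile, 10) > autocorrelation(toy_profile, 5))
```

On real data the signal is far noisier than in this toy input, so a peak in the autocorrelation near a lag of 10 would be suggestive rather than conclusive.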

In a paper recently published in Nature, Albert et al. [1] have overcome the loss of resolution attributable to the length and spacing of oligonucleotides on arrays by directly sequencing the products of a MNase digestion of Saccharomyces cerevisiae chromatin, which enables the ends of each MNase-trimmed fragment to be identified with base-pair precision. This approach also involves sequencing very large numbers of nucleosomal fragments, which is greatly facilitated by the use of the recently developed parallel pyrosequencing technology [11] (Figure 1). To concentrate the search on functionally important regions of the genome, the authors determined the positions of nucleosomes containing the histone variant H2A.Z, a histone typically enriched in nucleosomes at promoter regions.

Albert et al. [1] prepared nucleosomes from a yeast strain in which the gene htz1, which encodes H2A.Z, had been epitope-tagged. Chromatin fragments containing H2A.Z nucleosomes were purified by chromatin immunoprecipitation, and mononucleosomal DNA was then isolated from the H2A.Z-enriched chromatin by gel purification. Altogether, 322,000 sequence reads were obtained from the DNA of nucleosomes enriched in the histone variant H2A.Z.

As H2A.Z-enriched nucleosomes represent a small proportion of all nucleosomes, it proved possible to obtain many reads for most of the highly enriched nucleosomes [1]. This would be expected to result in a series of sequence reads for each nucleosome with a normal distribution about a predominant location. Sequence reads of the most heavily H2A.Z-enriched nucleosomes do indeed fit well to a normal distribution. Further confidence in the assignment of these positions can be gained from the observation that the centers of these distributions match up well when the data obtained from both DNA strands are compared: the observed median error is 4 bp.
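The strand comparison can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Albert et al. [1]: it assumes that the 5' end of each read marks one edge of an MNase-trimmed fragment, that forward-strand reads mark the left edge and reverse-strand reads the right edge, and that the footprint is the canonical 146 bp mentioned earlier. The names `center_from_reads`, `strand_discrepancy` and `median_strand_error` are hypothetical.

```python
from statistics import mean, median

NUCLEOSOME_BP = 146            # canonical footprint quoted in the text
HALF_FOOTPRINT = NUCLEOSOME_BP // 2


def center_from_reads(read_starts, strand):
    """Estimate a nucleosome center from the 5' ends of reads on one strand.

    Assumption: '+' reads start at the left edge of the protected
    fragment and '-' reads start at the right edge.
    """
    if strand == "+":
        shifted = [start + HALF_FOOTPRINT for start in read_starts]
    elif strand == "-":
        shifted = [start - HALF_FOOTPRINT for start in read_starts]
    else:
        raise ValueError("strand must be '+' or '-'")
    return mean(shifted)


def strand_discrepancy(forward_starts, reverse_starts):
    """Absolute difference between the two per-strand center estimates."""
    return abs(center_from_reads(forward_starts, "+")
               - center_from_reads(reverse_starts, "-"))


def median_strand_error(nucleosomes):
    """Median discrepancy over (forward_starts, reverse_starts) pairs."""
    return median(strand_discrepancy(fwd, rev) for fwd, rev in nucleosomes)


# Toy data for two nucleosomes (coordinates are invented).
toy = [([98, 100, 102], [246, 248, 250]),
       ([500, 504], [650, 654])]
print(median_strand_error(toy))
```

A small median discrepancy between the two independent estimates is what gives confidence in the assigned positions; a large one would suggest that the fixed-footprint assumption does not hold for that nucleosome.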

Another important point to bear in mind is that even in a homogeneous population of yeast cells, the positions of nucleosomes are likely to vary from cell to cell. Although a single type of nucleosome organization may dominate in some parts of the genome, in others, several mutually exclusive organizations with equally favorable nucleosome-DNA interactions may be possible. Albert et al. [1] used the enhanced resolution of their technique to address this variability in nucleosome position. Decomposing their raw data with narrower normal distributions, they obtained evidence for several translational settings (that is, positions along the DNA) of an individual H2A.Z nucleosome that typically occur with a 10-bp periodicity, which corresponds to a full turn of the DNA double helix. This is in accord with the concept of rotational phasing, whereby the same face of the DNA double helix makes contact with binding sites that are regularly spaced around the surface of histone octamers.
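To make the idea of resolving several translational settings concrete, the sketch below smooths a set of inferred nucleosome centers with a narrow Gaussian kernel and reports the local maxima. This is a deliberately simple stand-in for the decomposition described above, not the authors' method; the kernel width, the height threshold and the function names are all assumptions chosen for illustration.

```python
import math


def smoothed_density(positions, sigma, lo, hi):
    """Sum of Gaussian kernels of width sigma centred on each position,
    evaluated at every integer coordinate from lo to hi inclusive."""
    return [
        sum(math.exp(-0.5 * ((x - p) / sigma) ** 2) for p in positions)
        for x in range(lo, hi + 1)
    ]


def local_maxima(density, lo, min_height):
    """Coordinates at which the density is a local maximum of at least min_height."""
    peaks = []
    for i in range(1, len(density) - 1):
        if (density[i] >= min_height
                and density[i] > density[i - 1]
                and density[i] >= density[i + 1]):
            peaks.append(lo + i)
    return peaks


# Toy centers (invented) clustered around three settings 10 bp apart.
centers = [99, 100, 100, 101, 109, 110, 110, 111, 119, 120, 120, 121]
density = smoothed_density(centers, sigma=2.0, lo=90, hi=130)
peaks = local_maxima(density, lo=90, min_height=1.0)
spacings = [b - a for a, b in zip(peaks, peaks[1:])]
print(peaks, spacings)
```

With a broad kernel the three clusters would merge into a single peak, which is why narrower distributions are needed to reveal alternative settings. A kernel that is too narrow would instead split the noise into spurious peaks.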

Even the new mapping may have reached a limit here: the resolution of this approach depends not only on the accuracy of the sequencing step but also on the sensitivity with which MNase detects the edges of nucleosomes. MNase has some sequence specificity itself, and to reduce the influence of artifacts attributable to this, an algorithm can be applied. A more substantial problem, however, is that the boundaries of nucleosomes are inherently dynamic. The outer turns of DNA are prone to transient dissociation via a process known as site exposure [12,13]. In addition, nucleosome-binding proteins may associate with nucleosomes to increase the amount of DNA protected to more than the 146 bp of the canonical nucleosome. These factors are likely to contribute to the relatively broad distribution of fragment lengths that result from genomic MNase digestion (see Supplementary Figure 2b of [1]), which in turn makes it difficult to assign the location of the central base pair of the nucleosomal DNA, especially with the relatively short sequence reads used in this study.

The availability of nucleosome-positioning information at this resolution provides an opportunity to search for DNA sequences that are preferentially incorporated into H2A.Z-containing nucleosomes. Albert et al. [1] observed that AA/TT and GC dinucleotides are distributed around the 8,000 best-positioned nucleosomes with a striking 10-bp periodicity. This distribution is similar to that previously observed both in vitro and in vivo [8,9]. A difference from previously observed patterns is that a deficiency of AT dinucleotides was detected 3-4 bp from the nucleosome border [1]. Further studies will be required to determine whether this has been picked up as a result of the increased resolution of this approach or whether it is a unique feature of H2A.Z-containing nucleosomes. It is likely, however, that the high resolution of this new approach will provide an opportunity to gain new insights into the rules that determine which DNA sequences favor nucleosome occupancy.

The limitations notwithstanding, the power of the current sequencing technology is breathtaking. The genome-wide map of H2A.Z-enriched nucleosomes generated by Albert et al. [1] is available in a highly accessible format via the authors' website [14]. It has been possible to assign trends in the location of H2A.Z-containing nucleosomes at a number of different classes of genes [1]. At the majority of genes transcribed by RNA polymerase II, H2A.Z-enriched nucleosomes were detected immediately downstream of the start site for transcription, beyond the nucleosome-depleted promoter region, with lower levels in the coding region [1]. A similar arrangement was observed at genes transcribed by RNA polymerase I, but the arrangement differed subtly at TATA-containing genes transcribed by polymerase II and at genes transcribed by polymerase III. Some of these observations can be considered extensions of previous studies of H2A.Z nucleosome localization [15-19], but with superior spatial resolution. In future, it may be equally important to assess the duration of H2A.Z nucleosome occupancy. Recent observations suggest that a subset of nucleosomes undergoes rapid cycles of assembly and removal and that this subset includes nucleosomes enriched in H2A.Z [20,21]. Indeed, the dynamic nature of H2A.Z nucleosomes may underlie some of the discrepancies among previous studies.

While the sequencing technology will no doubt be pushed further, even at its current level it offers an exciting new way to study the distribution of DNA sequences across genomes. As it becomes cheaper and more widely available, there seems no reason why it should not offer an alternative to high-density microarrays. While Albert et al. [1] used parallel sequencing for nucleosome mapping, the same approach could be used to monitor mRNA levels, to sequence small RNAs [22] or to sequence the products of chromatin immunoprecipitations in higher eukaryotes as well as in yeast. The dawn of the pyrosequencing era could represent the beginning of the end for high-density DNA arrays.