
Linking gene regulation and the exo-metabolome: A comparative transcriptomics approach to identify genes that impact on the production of volatile aroma compounds in yeast

For this purpose, the gene expression levels of five different industrial wine yeast strains that produce divergent aroma profiles were established at three different time points of alcoholic fermentation in synthetic wine must. A matrix of gene expression data was generated and integrated with the concentrations of volatile aroma compounds measured at the same time points. This relatively unbiased approach to the study of volatile aroma compounds enabled us to identify candidate genes for aroma profile modification. Five of these genes, namely YMR210W, BAT1, AAD10, AAD14 and ACS1, were selected for overexpression in the commercial wine yeast strain VIN13. Analysis of the data shows a statistically significant correlation between the changes in the exo-metabolome of the overexpressing strains and the changes that were predicted based on the unbiased alignment of transcriptomic and exo-metabolomic data.

The data suggest that a comparative transcriptomics and metabolomics approach can be used to identify the metabolic impacts of the expression of individual genes in complex systems, and demonstrate the amenability of transcriptomic data to direct applications of biotechnological relevance.

Commercial wine yeast strains have been selected to meet specific requirements of wine producers with regard to phenotypical traits such as fermentation performance, general stress resistance, the profile of aromatic compounds produced, the ability to release enzymes or mannoproteins of oenological relevance and many more [1]. As a result, more than 200 different yeast strains, almost exclusively of the species Saccharomyces cerevisiae, are currently produced and sold in the global industry. Many research and development programs have focused on improving specific aspects of wine yeast strains [1]. However, many of the relevant traits are of a polygenic nature, and our understanding of the genetic and molecular regulation of complex, commercially relevant phenotypes is limited [2]. In this paper, we investigate the possibility of using a holistic systems biology approach to identify genes that impact on volatile aroma compound production during fermentation. The approach is based on combining comparative transcriptomics and aroma metabolomics of five commercial wine yeast strains that produce significantly different aroma profiles.

During alcoholic fermentation, Saccharomyces cerevisiae strains convert sugars to ethanol, but also produce a large number of volatile aroma compounds, including fatty acids, higher alcohols and esters (table 1). Many of these compounds are important flavor and aroma compounds in wine and beer, and different strains of S. cerevisiae are well known to impart significantly different aroma profiles to the final product.

The metabolic pathways responsible for the production of these compounds are responsive to many factors including the availability of precursors, different types of stress, the cellular redox potential and the energy status of the cell [3-11]. These pathways are not linear, but rather form a network of interlinked reactions converging and diverging from shared intermediates (figure 1). Moreover, intermediates are not only shared between the different 'branches' of aroma compound production, but also with other pathways related to fatty acid metabolism, glycolysis, stress tolerance and detoxification to name a few.

Most of the genes encoding the enzyme activities of the aroma network are also co-regulated by transcription factors that are related to total nitrogen and amino acid availability [12]. Thus the nutritional status of the cell as well as the nutrient composition of the growth media throughout fermentation plays a vital role in determining the aroma profile produced by the fermenting yeast. A further complication is due to the fact that very little is known about the kinetics of individual enzymes involved in these pathways. What is clear is that a number of these enzymes are capable of catalyzing both the forward and reverse reactions, depending on the ratios of substrates to end products, as well as the prevailing redox balance of the cell [13-15]. The various dehydrogenase-catalyzed reactions which are integral to most branches of aroma production are particularly sensitive to the ratios of enzyme co-factors such as NAD and NADH, with obvious ramifications regarding the directionality of various key reactions [16]. This intricate lattice of chemical and biological interactions makes interpretation of individual gene and enzyme contributions problematic in the context of aroma compound production as a whole (figure 1). Indeed, individual parts of the system can combine and interact in unexpected ways, giving rise to emergent properties or functions that would not be anticipated by studying a single part of the system. Such systems are thus irreducible, and cannot be understood by dissection and analysis of a single part at a time. In recognition of the complex and intricate nature of this process we have sought to follow an 'omic' approach in the study of aroma compound production.

In the present study our goal was to compare the aroma-relevant exo-metabolomes of five industrial yeast strains at three different stages of fermentation, and to align these data with gene expression data obtained through microarray-based genome-wide transcription analysis. This enabled the incorporation of gene expression levels and aroma compound production into multivariate statistical models. By using these models as a predictive tool various genes were identified as potential candidates for overexpression in order to increase/decrease the levels of key aroma compounds during fermentation. To verify whether genes whose differential regulation appeared most strongly linked to the differences observed in the aroma profiles of different strains were indeed impacting on aroma compound metabolism, five of these genes were individually overexpressed in one of the industrial strains. The data indicate that these genes indeed impacted significantly on the aroma profiles produced by the modified strains. Moreover, the pattern of changes observed was significantly correlated to the pattern predicted through the comparative analysis of transcriptome and metabolome. The data therefore clearly support our hypothesis that direct comparative analysis of transcriptomes and metabolomes can be used for the identification of genes that affect specific metabolic networks and for predicting the impact of the expression of such genes on these networks.

Fermentation behaviour of all five strains in our conditions followed typical wine fermentation patterns. All five strains fermented the synthetic must to dryness within the monitored period, broadly followed similar growth patterns (figure 2) and showed similar rates of fructose and glucose utilization as well as ethanol and glycerol production (figure 3). This is to be expected, as all five strains are widely used in the wine industry and are optimized for fermentation performance.

On the other hand, the strains did show significant variability regarding the volatile organoleptic compounds produced during fermentation (tables 2, 3, 4), suggesting that these 'secondary' pathways of higher alcohol and ester production are less conserved between different strains.

In general, the aroma compounds produced all showed a steady increase in concentration in the synthetic must over time, although the most active period of aroma compound accumulation appears to be in the earlier stages of fermentation. For the most part, compounds such as methanol, isoamyl alcohol, butanol and ethyl caprylate are only detectable in the fermentation media by day 5 of fermentation (table 3), whereas others such as diethyl succinate can only be detected at the end of fermentation (table 4). In general, the higher alcohols and their corresponding esters are present throughout fermentation at the highest concentration in the medium (tables 2, 3, 4). The aroma profiles of the DV10 and EC1118 strains are very similar, while the BM45 and 285 strains also produce similar exometabolomic signatures. The aroma compounds that are proportionally the most variable between strains are propanol, isobutanol, ethyl caprylate, acetic acid, propionic acid, butyric acid, ethyl caprate, diethyl succinate, valeric acid, 2-phenylethyl acetate, octanoic and decanoic acid, as well as ethyl lactate, which is completely absent in the BM45 and 285 strains (table 4).

The divergent aroma profiles of the different strains were mirrored by variable gene expression patterns. Since the Affymetrix DNA chips used for the analysis were designed based on the sequence of the laboratory yeast BY4742, a primary concern related to the quality of the microarray data. Both the internal controls and the expression of housekeeping genes were in keeping with international MIAME compliance standards. Most notably, variation between independent biological repeats was negligible, giving us confidence in the reliability and reproducibility of our microarray analysis. Furthermore, changes in gene expression during the course of fermentation matched up well with data from related microarray analyses for the EC1118 [17] and VIN13 strains [18].

Between different time points approximately 1000–1500 genes significantly increased or decreased in expression (within the criteria specified in the materials and methods section) for the five yeast strains in our study. At the time points considered, the variation in gene expression between the different strains was in the range of about 50–400 transcripts. Strains that appear to be most similar to one another on a gene expression level were the EC1118 and DV10 strains, as well as the BM45 and 285 strains. The VIN13 strain was least similar to any of the other four strains. This pattern is in line with the differences observed in aroma production for all of these strains.

Numerous and substantial changes in the expression of genes involved in pathways that lead to the production of volatile aroma compounds were evident both between strains at comparable stages of fermentation and for individual strains at different fermentative stages. To identify relevant transcriptional variation in the context of aroma compound production, principal component analysis (PCA) was performed and PLS1 and PLS2 models were constructed for the compounds in tables 2, 3, 4 using the transcriptomic data as X variables. Transcriptomic data from days 2 and 5 were used for modeling purposes as these time points represent the period when the accumulation rate of most aroma compounds is at a maximum. From these models, transcripts with a strong positive or negative loading were selected for further in-depth statistical analysis. The corresponding ORFs, together with a brief annotation, are listed in the additional data files [see Additional data file 1].
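To make the notion of a PLS1 loading weight concrete, the following is a minimal sketch of the first component of a PLS1 model, written in plain Python. It is an illustration only and not the software or settings used in this study: the function names, the single-component restriction and the toy numbers are all assumptions. Each row of `X` stands for one sample (strain and time point), each column for one transcript, and `y` for the concentration of one aroma compound.

```python
import math


def center(column):
    """Subtract the mean from a list of numbers."""
    mean = sum(column) / len(column)
    return [value - mean for value in column]


def pls1_first_component(X, y):
    """Return (loading weights w, scores t, y-loading q) of one PLS1 component.

    X: list of samples, each a list of expression values (one per gene).
    y: list of concentrations of a single aroma compound (one per sample).
    """
    n_samples, n_genes = len(X), len(X[0])
    # Mean-centre every gene (column of X) and the response.
    cols = [center([X[i][j] for i in range(n_samples)]) for j in range(n_genes)]
    yc = center(y)
    # Loading weights: direction of maximal covariance between X and y.
    w = [sum(col[i] * yc[i] for i in range(n_samples)) for col in cols]
    norm = math.sqrt(sum(value * value for value in w))
    if norm == 0:
        raise ValueError("no covariance between X and y")
    w = [value / norm for value in w]
    # Scores: projection of each centred sample onto w.
    t = [sum(cols[j][i] * w[j] for j in range(n_genes)) for i in range(n_samples)]
    # y-loading: regression of the centred response on the scores.
    q = sum(t[i] * yc[i] for i in range(n_samples)) / sum(v * v for v in t)
    return w, t, q


# Hypothetical data: 4 samples x 3 genes, and one compound.
X_example = [[1.0, 5.0, 2.0], [2.0, 4.0, 2.0], [3.0, 3.0, 2.0], [4.0, 2.0, 2.0]]
y_example = [10.0, 20.0, 30.0, 40.0]
weights, scores, y_loading = pls1_first_component(X_example, y_example)
```

In this toy case the first gene rises with the compound and receives a positive weight, the second falls and receives a negative weight, and the third is constant and receives a weight of zero. Transcripts with large absolute weights are the ones a selection step of the kind described above would retain.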

The general intrastrain trend revealed a decrease in the transcript levels of enzymes involved in the synthesis of aromatic and branched-chain amino acids, while transcript levels encoding aldehyde and alcohol dehydrogenases, as well as certain acetyltransferases were generally increased. Fold changes for differentially expressed transcripts, both between different strains at either day 2 or day 5 of fermentation and between day 2 and day 5 in individual strains, can be viewed as additional material [see Additional data file 2].
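As an illustration of how such fold changes can be tabulated, the sketch below compares two sets of expression values. It is not the authors' pipeline: it assumes linear-scale, normalized signal values per gene, the example values are invented, and the twofold cut-off is an assumption standing in for the criteria given in the materials and methods section.

```python
def fold_changes(reference, condition):
    """Ratio condition/reference for every gene present in both dictionaries."""
    return {
        gene: condition[gene] / reference[gene]
        for gene in reference
        if gene in condition
    }


def changed_genes(reference, condition, threshold=2.0):
    """Genes whose expression rose or fell by at least `threshold`-fold."""
    ratios = fold_changes(reference, condition)
    return {
        gene: ratio
        for gene, ratio in ratios.items()
        if ratio >= threshold or ratio <= 1.0 / threshold
    }


# Hypothetical normalized signals for one strain at day 2 and day 5.
day2 = {"AAD10": 100.0, "ALD6": 400.0, "BAT2": 50.0}
day5 = {"AAD10": 250.0, "ALD6": 100.0, "BAT2": 45.0}
changed = changed_genes(day2, day5)
```

The same two functions serve for interstrain comparisons at a single time point if the two dictionaries hold values from two strains instead of two days.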

Figure 4 shows a PLS2 plot which depicts the variation/relationships between all the measured aroma compounds as well as the 70 genes selected for multivariate modeling purposes. These genes were selected due to their varying expression levels between different strains as well as different time points during fermentation. Also, we selected genes whose annotation suggested that they may have a role in aroma compound production, such as enzymes whose sequence suggests a role in redox reactions, central carbon metabolism, and amino acid uptake and metabolism (GO and MIPS classification).

The X-Y scores and loading plots (figure 4) are useful in representing the overall 'structure' of the entire dataset, and point to possible connections between specific compounds/groups of compounds and certain genes. Likewise, scores plots proved a convenient way of validating the general design and data generated by our experimental setup/process (figure 5). The samples of independent biological repeats for each of the five strains group together closely at both time points. All five strains also clearly segregate into two clusters based on the stage (time point) of fermentation. For example, in the first frame it is clear that the stage of fermentation is the major source of variation (PC1) and strain identity is the source of the second-greatest explained variation (PC2), while this pattern is reversed in frame B.

Of the 22 volatile aroma compounds measured in this study, 13 were amenable to PLS1 modeling (using transcriptome data) based on our selected criteria for model validation (slope > 0.8; Y-var explained > 75%). The details of these models are summarized in a table that can be viewed as additional material [see Additional data file 3].
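The two validation criteria can be expressed as a simple filter on predicted versus measured concentrations. The sketch below is a hedged illustration: it assumes that 'slope' means the least-squares slope of predicted on measured values and that 'Y-var explained' means 100 × (1 − residual sum of squares / total sum of squares). The modeling software may define both quantities slightly differently, and the function names and numbers are hypothetical.

```python
def validation_metrics(measured, predicted):
    """Return (slope of predicted vs. measured, % Y-variance explained)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    ss_total = sum((m - mean_m) ** 2 for m in measured)
    cross = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    slope = cross / ss_total
    ss_residual = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    explained = 100.0 * (1.0 - ss_residual / ss_total)
    return slope, explained


def passes_criteria(measured, predicted, min_slope=0.8, min_explained=75.0):
    """Apply the thresholds quoted in the text (slope > 0.8, Y-var > 75%)."""
    slope, explained = validation_metrics(measured, predicted)
    return slope > min_slope and explained > min_explained


# Hypothetical measured and model-predicted concentrations of one compound.
measured_example = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_example = [1.1, 1.9, 3.2, 3.8, 5.1]
model_is_acceptable = passes_criteria(measured_example, predicted_example)
```

A model that predicts the same value for every sample has a slope of zero and explains none of the variance, so it fails both criteria regardless of how close that single value is to the mean.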

Of the genes listed in the tables presented in the supplementary material, five were chosen for in-depth analysis due to their significant contributions to the respective prediction models for several of the important higher alcohols and esters, as well as their amenability to easy cloning and vector construction. These genes were BAT1, AAD10, AAD14, ACS1 and YMR210W. AAD10 and AAD14 encode aryl alcohol dehydrogenases, which are putatively responsible for degrading the complex aromatic compounds in grape must into their corresponding higher alcohols [7]. BAT1 encodes a mitochondrial branched-chain amino acid aminotransferase that catalyzes the first transamination step of the catabolic formation of fusel alcohols via the Ehrlich pathway [19]. The YMR210W gene codes for a putative acyltransferase enzyme (similar to EEB1 and EHT1) and is believed to play a role in medium-chain fatty acid ethyl ester biosynthesis. Lastly, the ACS1 gene encodes an acetyl-CoA synthetase isoform, the enzyme responsible for the conversion of acetate to acetyl-CoA, which is an intermediate or reactant in several of the aroma compound producing pathways [20].

An in-house BAT1 overexpressing strain was already available for use [21]. For the other 4 genes, a multi-copy overexpression plasmid-based cloning strategy was employed to allow for maximum gene expression and rapid characterization of the transformed VIN13 strains.

Fermentations were carried out as before with the five transformed cell lines and a VIN13 control. Samples for HPLC and GC-FID analysis were taken at the same time points, namely days 2, 5 and 14 of fermentation. No significant differences were observed regarding the glucose and fructose utilization of the overexpression strains during fermentation (data not shown). Slight differences were found for ethanol production, while some changes in glycerol production were evident for the different strains (figure 6).

Figure 7 depicts the aroma compound concentrations at the end of fermentation (day 14) only, as this is the most important time point from an enological perspective. Aroma profiles for days 2 and 5 can be viewed as additional data [see Additional data file 4].

Four of the five overexpressing strains showed significant changes in the aroma profiles produced at the end of fermentation. Only the YMR210W overexpressing strain did not show any changes, and is therefore not included in the figures below. We did not further investigate whether this absence of changes in aroma production is due to problems with the expression construct or reflects the absence of aroma-related activity of the gene product.

Significant differences were evident in the aroma profiles of the four transformed yeast strains under consideration. We investigated whether the observed changes in aroma compound concentrations at the end of fermentation can be reconciled with the anticipated changes based on multivariate prediction models. Figure 8 represents the qualitative alignment of real vs. predicted changes in aroma compound concentrations. Only aroma compounds with statistically reliable PLS models (test-set validation; slope >0.88; % RMSEP < 20) were taken into consideration. The dashed lines indicate the relative loading weights of each of the four genes (for each of the aroma compound models represented by the plot axes). The solid lines in the figures represent the log ratios of the actual aroma compound concentrations normalized to the VIN13 concentrations of the particular compound.

To clarify, the predicted influence of a given gene on a particular compound is represented on a scale from -1 to +1, based on statistical projections related to PLS loading weights. On this scale a value of -1 suggests a strong probability of significant concentration decreases of a given compound (for overexpression of the gene), while a value of +1 is indicative of a strong positive correlation between the expression levels of the gene of interest and the compound in question. A value of zero indicates no expected influence of gene expression on the relevant aroma compound.

Likewise, log-normalization was carried out on the actual metabolite concentrations measured in the overexpression strains to represent these values on a scale from -1 to 1, relative to the corresponding concentrations of the control fermentations. Figure 8 clearly shows that predicted and real changes overlapped significantly.
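The qualitative comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the text does not specify how the log ratios were brought onto the -1 to 1 scale, so the sketch divides each log2 ratio by the largest absolute log2 ratio in the profile. The function names, compound values and predicted weights are all hypothetical.

```python
import math


def scaled_log_ratios(strain, control):
    """log2(strain/control) per compound, rescaled to lie within [-1, 1].

    The rescaling (division by the largest absolute log ratio) is an
    assumption; the source only states that values were log-normalized.
    """
    ratios = {c: math.log2(strain[c] / control[c]) for c in strain}
    largest = max(abs(value) for value in ratios.values())
    if largest == 0:
        return ratios
    return {c: value / largest for c, value in ratios.items()}


def direction_agreement(predicted, observed):
    """Return (number of compounds with matching sign, number compared)."""
    shared = [c for c in predicted if c in observed]
    matches = sum(1 for c in shared if predicted[c] * observed[c] > 0)
    return matches, len(shared)


# Hypothetical end-point concentrations (mg/L) and predicted influences.
control = {"isoamyl alcohol": 100.0, "ethyl acetate": 20.0, "decanoic acid": 2.0}
overexpressor = {"isoamyl alcohol": 200.0, "ethyl acetate": 10.0, "decanoic acid": 2.0}
predicted = {"isoamyl alcohol": 0.6, "ethyl acetate": -0.3, "decanoic acid": 0.4}

observed = scaled_log_ratios(overexpressor, control)
matches, total = direction_agreement(predicted, observed)
```

Counting sign agreement in this way compares only the direction of change and ignores the magnitude of either the predicted or the observed value.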

The aim of this study was to determine whether the transcription profiles of the various strains during fermentation could be reconciled with the volatile aroma compound production of these strains, and whether this comparative analysis could be used to predict the impact of individual gene expression levels on aroma compounds and profiles.

The data generated by the overexpression of four of the genes whose expression was statistically most significantly linked to the production of aroma profiles suggest that this approach has been successful. Indeed, overexpression of the selected genes had a far-reaching impact on the aroma profiles produced by the fermenting yeast, and this impact was generally well aligned with the impact predicted from the comparative omics analysis. In fact, the data aligned better than we had expected, considering the significant challenges of approaching complex systems. Our data show that the metabolic changes observed upon overexpression of three of the four genes, AAD10, AAD14 and BAT1, were very significantly aligned with the changes that were predicted from the alignment of transcriptome and metabolome data alone. The predictions, as can be seen from the qualitative alignment of predicted vs. observed changes in metabolite levels, proved fairly reliable. The model was able to assign positive and negative influences on a particular compound with relative accuracy. Although the extent/magnitude of the increase/decrease is not always well aligned with model values, the direction of the change holds true in most cases. An absolute alignment would not be expected, since the level of expression in a plasmid-based system cannot be adjusted to the differences of expression observed between the different strains. In the case of AAD10, only the influence of the overexpression on decanoic acid was not in line with the projection. Predictions for AAD14 and BAT1 were well matched with the observed changes in metabolite profiles. Predicted and real changes did not match satisfactorily in only one case, ACS1. Nevertheless, even in this case, eight out of the thirteen compounds evolved in the predicted direction. It should also be noted that the expression of this gene generally had a less severe impact on the aroma profile than that of the other three genes.

Considering the complexity of the system, the rate of success achieved in this study can be considered as highly significant. To our knowledge, this is the first report to exploit such an intra- and interstrain comparative approach to identify genes that play a significant role in a complex metabolic network.

While we were clearly able to identify genes with significant impact on aroma compound production in a specific industrial environment, and which in some cases had not been previously directly linked to these pathways, the data do not allow a firm conclusion on the exact metabolic role of these genes. Indeed, the vast number of significant changes to metabolite levels makes it difficult to identify the specific 'point of influence' of any overexpressed gene in a given pathway.

The increases/decreases in specific volatile compounds seen for the VIN13(pBAT1-s) strain are in keeping with the results reported in Colombar fermentations [21]. The two AAD gene overexpressing strains also showed interesting trends: both strains produced higher levels (at comparable concentrations) of isoamyl alcohol, ethyl acetate, butanol, ethyl caprylate, ethyl caprate and hexanoic acid. However, noticeable differences can be seen in the levels of isobutanol, 2-phenyl ethanol, propionic acid, isoamyl acetate, ethyl hexanoate, isobutyric acid and isovaleric acid, relative to the control and to one another. This is indicative of the potential for the AAD genes to have overlapping yet distinct functional roles in the pathways leading to higher alcohol and ester production.

Overexpression of the ACS1 gene did not lead to such numerous and substantial increases/decreases in volatile production as was the case for the other three genes. Interestingly, valeric and isovaleric acid were below detection levels in these fermentations. Concentrations of isoamyl acetate, ethyl acetate, butanol and butyric acid were significantly higher, and ethyl caprate lower relative to control fermentations.

On the whole though, our analysis shows that the cross-comparison of gene expression data with metabolite levels has the potential to identify points of interest on a genomic scale. This also opens new possibilities to design improved yeast enhancement strategies for optimized aroma production and fermentation performance.

Many other genes showed significant variation in expression between different strains and/or time points, as well as high loadings on PLS models and strong negative or positive correlations with specific aroma compounds. These genes encode enzymes that either are known to participate in aroma compound production, or have activities (either experimentally proven or suggested through sequence alignments) that could suggest such roles. Here we discuss some of the most relevant of these enzymes, which fall into several categories, either according to their place in a specific metabolic pathway such as the metabolisms of branched chain amino acids or of aromatic amino acids, or based on their specific activity such as dehydrogenases (in particular aldehyde and alcohol dehydrogenases) and acetyl transferases.

Of the enzymes involved in branched chain amino acid metabolism, BAT1 has been discussed above. Other genes that encode enzymes in this pathway and that were identified in our study for their strong statistical link between expression levels and the production of specific compounds include LEU2, encoding a beta-isopropylmalate dehydrogenase that catalyzes the third step in the leucine biosynthesis pathway, and, to a lesser degree, LEU1, which encodes an isopropylmalate isomerase [22,23]. Both of these genes showed a significant statistical correlation with compounds such as isobutanol. Of the genes involved in the metabolism of isoleucine and valine (Ilv), only ILV5, which encodes an acetohydroxyacid reductoisomerase involved in branched-chain amino acid biosynthesis [24], showed a very strong positive correlation with almost all of the compounds analysed here, and, interestingly, a negative correlation with ethanol, suggesting that this gene could be an interesting target for metabolic engineering.

While BAT1 expression showed a significant positive correlation with a large number of the volatile compounds measured in our study, the cytosolic isoform (BAT2) of this enzyme showed no significant correlations with any of these aroma compounds. Although this isoform is supposedly highly expressed during stationary phase and repressed during the logarithmic phase, BAT2 expression levels in our study were found to stay constant, if not to decrease slightly upon entry into stationary phase in comparison to the exponential phase at day 2. In addition, BAT2 expression levels were generally considerably lower throughout fermentation when compared to BAT1.

Of the genes involved in aromatic amino acid metabolism, three showed statistically significant correlations between expression levels and metabolite production: ARO1, which encodes a pentafunctional arom protein; ARO7, which encodes a chorismate mutase responsible for the conversion of chorismate to prephenate; and ARO8, which codes for an aromatic aminotransferase [25,26]. All three genes showed a modest positive correlation (R2 = 0.7) with 2-phenyl ethanol and mild negative correlations with all the other compounds. Only octanoic acid showed a very strong (R2 = 0.82) negative correlation with ARO8 expression at day 2 of fermentation. Despite its seemingly crucial role, ARO10, which encodes a phenylpyruvate decarboxylase corresponding to the first specific step in the Ehrlich pathway, did not show any noteworthy correlations between its expression and any of the volatile compounds in our study [27]. Of course, the possibility of translational or post-translational control of activity cannot be excluded.

Several specific enzyme activities were also overrepresented in our list. Such enzymes include many dehydrogenases. Aldehyde and alcohol dehydrogenases such as those encoded by ALD5, ALD6, ADH6 and ADH7 showed a substantial decline in expression levels between days 2 and 5 of fermentation, while others (such as ALD3, ALD4, ADH2 and ADH5) increased during this time. The distinct expression patterns during fermentation reflect the different regulatory mechanisms governing the expression of these genes (e.g. expression of ALD3 is glucose-repressed and stress-induced) and suggest that the different ALD gene products have specific roles during different stages of fermentation [28].

ALD4 and ALD5 (mitochondrial), and ALD3 and ALD6 (cytoplasmic) encode aldehyde dehydrogenases involved in the conversion of acetaldehyde to acetate [29].

ALD4 encodes a mitochondrial aldehyde dehydrogenase (utilizing NADP+ or NAD+) that is required for growth on ethanol and conversion of acetaldehyde to acetate [29]. Expression of ALD4 is also glucose repressed, and increases 2–4-fold from day 2 to day 5 of fermentation. ALD4 expression shows a very strong correlation with the amount of hexyl acetate (R2 = 0.82) produced by the fermenting yeast, as well as with ethyl acetate (0.77), isoamyl alcohol (0.91) and isoamyl acetate (0.85).

ALD6 encodes a constitutively expressed cytosolic aldehyde dehydrogenase (utilizes NADP+ as the preferred coenzyme) and is required for conversion of acetaldehyde to acetate [30]. Not surprisingly, ALD6 expression showed a very strong positive correlation to the levels of acetic acid produced by the fermenting cells (0.92). Also, expression was very strongly inversely correlated to ethanol production (R2 = 0.81). Interestingly, fairly strong positive correlations were also evident for 2-phenyl ethanol (R2 = 0.79) and 2-phenyl ethyl acetate (R2 = 0.67).
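Because R2 is non-negative by construction, the direction of a relationship (positive or inverse) has to be read from the sign of the underlying correlation coefficient. The sketch below illustrates this with a plain Pearson correlation. It is an assumption that the reported values correspond to the square of a Pearson coefficient, and the expression and concentration values shown are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def describe_correlation(x, y):
    """Return (R2, 'positive' or 'inverse') for a gene/compound pair."""
    r = pearson_r(x, y)
    direction = "positive" if r > 0 else "inverse"
    return r * r, direction


# Hypothetical expression levels of one gene and ethanol concentrations
# across four samples; higher expression coincides with lower ethanol.
expression = [1.0, 2.0, 3.0, 4.0]
ethanol = [8.0, 6.0, 4.0, 2.0]
r_squared, direction = describe_correlation(expression, ethanol)
```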

ADH6 encodes an NADPH-dependent cinnamyl alcohol dehydrogenase family member with broad substrate specificity [31]. Expression was correlated very strongly with isobutanol levels (0.81), isobutyric acid (0.86), propionic acid (0.81), acetic acid (0.87) and 2-phenyl ethanol (0.92). ADH4, ADH5 and ADH7 on the other hand showed only modest correlations with the above-mentioned, or any other aroma compounds for that matter.

With respect to the aryl alcohol dehydrogenase family of genes, the transcripts for AAD3, AAD10 and AAD14 showed the greatest variation in expression, both on an intra- and interstrain level. Expression of AAD10 and AAD14, for example, was increased more than twofold in most of the strains at day 5 relative to day 2 of fermentation. No distinct physiological role has been established for the products of these genes [7], but it is reasonable to suspect that the consistent increase in their respective transcript levels during the course of fermentation could be associated with the increase in one or several of the long chain alcohols or their acid counterparts as fermentation progresses (tables 2, 3).

This hypothesis is supported by the data generated through the overexpression of these genes. Indeed, overexpression yielded changes to the aroma profile that were very similar to those predicted from the alignment of transcriptome and metabolome data sets. The expression of AAD10 showed weak yet significant positive correlations with a number of the aroma compounds. Expression of AAD14 between different strains and time points was also highly variable. Highest expression levels were noted for the DV10 strain, and significant positive correlations with ethyl acetate (0.67) and ethyl caprate (0.74) were observed for this gene.

Acetyl transferases are another family of enzymes of relevance to aroma compound metabolism [32]. However, neither ATF1 nor ATF2, the two most prominent alcohol acetyl transferases, showed statistically strong correlations between expression levels and metabolite production. EEB1, on the other hand, which encodes an acyl-coenzymeA:ethanol O-acyltransferase and is responsible for the major part of medium-chain fatty acid ethyl ester biosynthesis during fermentation [33], showed weak negative correlations with ethanol and other higher alcohols, and a strong positive correlation for 2-phenylethyl acetate (0.9) as well as octanoic acid (0.78). It is tempting to speculate that Eeb1p may thus be largely responsible for the acetylation of 2-phenyl ethanol to produce 2-phenylethyl acetate.

EHT1 encodes an acyl-coenzymeA:ethanol O-acyltransferase that plays a role in medium-chain fatty acid ethyl ester biosynthesis, but also possesses a known esterase activity [33]. EHT1 expression increased somewhat as fermentation progressed and inter-strain expression at both day 2 and day 5 of fermentation varied significantly. Interestingly, EHT1 expression showed a fairly strong inverse correlation with 2-phenylethyl acetate (R2 = 0.74) and octanoic acid (R2 = 0.75), as well as a weaker yet significant inverse correlation with decanoic acid (R2 = 0.59). This could indicate that the esterase activity of Eht1p may predominate under certain conditions.

YMR210W encodes a putative acyltransferase with similarity to both Eeb1p and Eht1p, and may have a minor role in medium-chain fatty acid ethyl ester biosynthesis [33]. Expression was positively correlated with ethyl acetate (0.74), ethyl caprylate (0.85) and isoamyl acetate (0.78).

In addition to these relatively well studied acetyltransferases, the mRNA levels of the AYT1 gene, encoding a transferase of unknown substrate specificity, also showed considerable variation at different fermentative stages [34].

The impact of these individual genes on aroma compound metabolism has to be assessed individually. However, from the data presented here, it is clear that an analysis based on the comparison of transcriptome and metabolome data derived from different commercial yeast strains can help to identify genes that most significantly impact a metabolic network in specific environmental and industrial conditions. Our over-expression analysis of five genes that were randomly selected from the list of ORFs identified for their statistically significant impact on aroma production also clearly suggests that the method has significant predictive power regarding the reorientation of metabolic flux through the network in response to changes in gene expression levels. Indeed, for four out of five selected genes, BAT1, AAD10, AAD14 and ACS1, the match between predicted and real changes is highly significant. This is the first study linking metabolic networks to transcriptome analysis through the comparative analysis of different wine yeast strains.

The yeast strains used in this study are listed in table 5. All are diploid Saccharomyces cerevisiae strains used in industrial wine fermentations. Yeast cells were cultivated at 30°C in YPD medium (1% yeast extract (Biolab, South Africa), 2% peptone (Fluka, Germany) and 2% glucose (Sigma, Germany)). Solid medium was supplemented with 2% agar (Biolab, South Africa).

Fermentation experiments were carried out with synthetic must MS300 which approximates to a natural grape must as previously described [35]. The medium contained 125 g/L glucose and 125 g/L fructose, and the pH was buffered at 3.3 with NaOH.

All fermentations were carried out under microaerophilic conditions in 100 ml glass bottles (containing 80 ml of the medium) sealed with rubber stoppers with a CO2 outlet. The fermentation temperature was approximately 22°C and no continuous stirring was performed during the course of the fermentation. Fermentation bottles were inoculated with YPD cultures in the logarithmic growth phase (around OD600 = 1) to an OD600 of 0.1 (i.e. a final cell density of approximately 10⁶ cfu.ml⁻¹). The cells from the YPD pre-cultures were briefly centrifuged and resuspended in MS300 to avoid carryover of YPD to the fermentation media. The fermentations followed a time course of 14 days and the bottles were weighed daily to assess CO2 release and the progress of fermentation. Samples of the fermentation media and cells were taken at days 2, 5 and 14 as representative of the exponential, early stationary and late stationary growth phases respectively. It should be stressed that early stationary phase in these conditions is metabolically active, since growth arrest is due to ethanol toxicity. Sugar levels and fermentative activity are still high at this stage.
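The inoculation step and the daily weighing both reduce to simple arithmetic. The sketch below assumes that OD600 scales linearly with cell density in this range and that all weight lost from a bottle is released CO2. The function names and the bottle weights are hypothetical.

```python
def preculture_volume_ml(od_preculture, od_target, final_volume_ml):
    """Volume of pre-culture to harvest so that the resuspended cells give
    od_target in final_volume_ml (C1 * V1 = C2 * V2)."""
    if od_preculture <= 0:
        raise ValueError("pre-culture OD must be positive")
    return od_target * final_volume_ml / od_preculture

def cumulative_co2_loss_g(daily_weights_g):
    """Cumulative weight loss relative to the first weighing, taken as CO2."""
    start = daily_weights_g[0]
    return [round(start - w, 3) for w in daily_weights_g]

# Pre-culture at OD600 = 1, target OD600 = 0.1 in 80 ml of medium.
volume = preculture_volume_ml(od_preculture=1.0, od_target=0.1, final_volume_ml=80)
print(f"harvest about {volume:.1f} ml of pre-culture per bottle")

# Hypothetical bottle weights (g) on consecutive days.
weights = [250.0, 249.2, 247.5, 245.9]
print(cumulative_co2_loss_g(weights))
```

Plotting the cumulative loss against time gives a fermentation progress curve that can be compared between strains.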

Cell proliferation (i.e. growth) was determined spectrophotometrically (PowerwaveX, Bio-Tek Instruments) by measuring the optical density (at 600 nm) of 200 μl samples of the suspensions over the 14 day experimental period.

Culture supernatants were obtained from the cell-free upper layers of the fermentation media. For the purposes of glucose determination and carbon recovery, culture supernatants and starting media were analyzed by high performance liquid chromatography (HPLC) on an AMINEX HPX-87H ion exchange column using 5 mM H2SO4 as the mobile phase. Agilent RID and UV detectors were used in tandem for peak detection and quantification. Analysis was carried out using the HPChemstation software package.

Each 5 ml sample of synthetic must taken during fermentation was spiked with an internal standard of 4-methyl-2-pentanol to a final concentration of 10 mg.l⁻¹. To each of these samples 1 ml of solvent (diethyl ether) was added and the tubes sonicated for 5 minutes. The top layer in each tube was separated by centrifugation at 3000 rpm for 5 minutes and the extract analyzed. After mixing, 3 μl of each sample was injected into the gas chromatograph (GC). All extractions were done in triplicate.

The analysis of volatile compounds was carried out on a Hewlett Packard 5890 Series II GC coupled to an HP 7673 auto-sampler and injector and an HP 3396A integrator. The column used was a Lab Alliance organic-coated, fused silica capillary with dimensions of 60 m × 0.32 mm internal diameter with a 0.5 μm coating thickness. The injector temperature was set to 200°C, the split ratio to 20:1 and the flow rate to 15 ml.min⁻¹, with hydrogen used as the carrier gas for a flame ionisation detector held at 250°C. The oven temperature was increased from 35°C to 230°C at a ramp of 3°C.min⁻¹.

Internal standards (Merck, Cape Town) were used to calibrate the machine for each of the compounds measured.
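One common way to turn GC peak areas into concentrations with an internal standard is sketched below. It is an assumption about the calculation, not a description of the integrator's actual routine: the relative response factor is taken to come from the calibration with pure standards, and the peak areas and function name are hypothetical.

```python
def quantify_mg_per_l(area_analyte, area_istd, istd_conc_mg_per_l=10.0,
                      relative_response_factor=1.0):
    """Internal-standard quantification:
    C_analyte = (A_analyte / A_istd) * C_istd / RRF,
    where RRF is the analyte response relative to the internal standard."""
    if area_istd <= 0 or relative_response_factor <= 0:
        raise ValueError("internal standard area and RRF must be positive")
    ratio = area_analyte / area_istd
    return ratio * istd_conc_mg_per_l / relative_response_factor

# Hypothetical (analyte area, internal standard area) pairs for one
# compound in a triplicate extraction.
triplicate_areas = [(1520.0, 1000.0), (1480.0, 1000.0), (1500.0, 1000.0)]
concentrations = [quantify_mg_per_l(a, s) for a, s in triplicate_areas]
mean_concentration = sum(concentrations) / len(concentrations)
print(f"mean concentration: {mean_concentration:.2f} mg/L")
```

Normalising to the internal standard compensates for variation in extraction efficiency and injection volume between samples.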

t-tests and ANOVA were conducted using Statistica (version 7). HCL and KMC clustering were carried out using TIGR MeV v2.2 [36].

Sampling of cells from fermentations and total RNA extraction were performed as described [37]. Probe preparation and hybridization to Affymetrix Genechip® microarrays were performed according to Affymetrix instructions, starting with 6 μg of total RNA. Results for each strain and time point were derived from three independent culture replicates. The quality of total RNA, cDNA, cRNA and fragmented cRNA was confirmed using the Agilent Bioanalyzer 2100.

Acquisition and quantification of array images and data filtering were performed using Affymetrix GeneChip® Operating Software (GCOS) version 1.4. All arrays were scaled to a target value of 500 using the average signal from all gene features using GCOS. Genes with expression values below 12 were set to 12 + the expression value as previously described in order to eliminate insignificant variations [38].
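The scaling and low-signal adjustment described above can be sketched as follows. This is a simplified illustration that uses a plain arithmetic mean. The averaging that GCOS applies internally may differ, and the function names and signal values are hypothetical.

```python
def scale_to_target(signals, target=500.0):
    """Multiply all signals on one array by a common factor so that their
    average equals the target value."""
    if not signals:
        raise ValueError("no signals given")
    mean_signal = sum(signals) / len(signals)
    if mean_signal <= 0:
        raise ValueError("mean signal must be positive")
    factor = target / mean_signal
    return [s * factor for s in signals]

def offset_low_signals(signals, threshold=12.0):
    """Replace values below the threshold by threshold + value, following
    the rule as stated in the text."""
    return [threshold + s if s < threshold else s for s in signals]

# Hypothetical raw signals for four probe sets on one array.
raw = [2.0, 20.0, 200.0, 578.0]
scaled = scale_to_target(raw)
adjusted = offset_low_signals(scaled)
print(scaled)
print(adjusted)
```

Scaling every array to the same average makes signals comparable across arrays before strains or time points are compared.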

Variable (gene) selection is important for the successful analysis of gene expression data, since most genes are unchanged and irrelevant to the prediction and analysis of phenotypic measurements. These non-informative genes should be removed before further analysis. One approach is significance analysis of microarrays (SAM) [39]. Differential gene expression between experimental parameters was determined using SAM version 2. The two-class, unpaired setting was used and genes with a Q value less than 0.5 were considered differentially expressed. Only genes with a fold change greater than 2 (positive or negative) for inter- or intra-strain comparisons were taken into consideration.
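The two selection criteria can be combined in a simple filter, sketched below. The sketch does not reproduce SAM itself: the Q values are treated as given inputs, the cutoffs are taken from the text as written, and the gene labels and numbers are hypothetical.

```python
import math

def passes_filter(mean_a, mean_b, q_value, q_cutoff=0.5, min_fold=2.0):
    """True if the Q value is below the cutoff and the fold change between
    the two conditions exceeds min_fold in either direction."""
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("mean signals must be positive")
    log2_fold = math.log2(mean_a / mean_b)
    return q_value < q_cutoff and abs(log2_fold) > math.log2(min_fold)

# Hypothetical entries: (mean signal in condition A, in condition B, Q value).
candidates = {
    "GENE_A": (900.0, 300.0, 0.1),  # 3-fold higher, low Q value
    "GENE_B": (200.0, 500.0, 0.2),  # 2.5-fold lower, low Q value
    "GENE_C": (450.0, 300.0, 0.1),  # only 1.5-fold higher
    "GENE_D": (900.0, 300.0, 0.8),  # 3-fold higher, but Q value too high
}
selected = [gene for gene, (a, b, q) in candidates.items()
            if passes_filter(a, b, q)]
print(selected)
```

Working on the log2 scale treats increases and decreases symmetrically, so a 2-fold decrease is handled the same way as a 2-fold increase.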

In terms of design, the samples represent the different fermentations (three independent replicates for each of the five strains) at different time points. The variables considered are the expression levels of the pre-selected genes (genes with a potential and established role in aroma compound metabolism according to GO and MIPS functional classification) as well as aroma compound concentrations in the synthetic must. The patterns within the different sets of data were investigated by principal-component analysis (PCA), while the correlations between different sets of data were determined by using partial least-squares (PLS) regression (The Unscrambler; Camo Inc., Corvallis, Oreg.). PCA is a bilinear modeling method which gives a visually interpretable overview of the main information in large, multidimensional datasets. By plotting the principal components it is possible to view statistical relationships between different variables in complex datasets and detect and interpret sample groupings, similarities or differences, as well as the relationships between the different variables [40].
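To make the idea of a principal component concrete, the sketch below extracts the first component of a small data matrix by power iteration on its covariance matrix. This is a generic textbook procedure and not the algorithm used by The Unscrambler. The function names and the data are hypothetical.

```python
import math

def center_columns(rows):
    """Subtract the column mean from every value."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def first_principal_component(rows, iterations=500):
    """Return (loadings, scores, explained variance fraction) for PC1."""
    x = center_columns(rows)
    n, p = len(x), len(x[0])
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            raise ValueError("no variance along the start direction")
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    return v, scores, eigenvalue / total_variance

# Hypothetical matrix: rows are samples, columns are variables
# (for example two gene expression levels and one aroma compound).
data = [[1.0, 2.1, 0.5], [2.0, 3.9, 0.7], [3.0, 6.2, 0.4], [4.0, 7.8, 0.6]]
loadings, scores, explained = first_principal_component(data)
print(loadings, explained)
```

The scores place each sample along the component, while the loadings show how strongly each variable contributes to it. These are the quantities displayed in score and loading plots.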

PLS regression is a bilinear modeling method for identifying the variations in a data matrix for explanatory or predictive purposes [41]. By plotting the first PLS components one can view main associations between X variables and Y variables and also relationships within X data and within Y data. PLS2 analysis was conducted using all X and Y variables considered in our study. For predictive purposes, PLS1 models were constructed for individual Y variables to increase model-specificity and reliability.
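A PLS1 model with a single component can be written in a few lines, which may help to clarify what the method estimates. The sketch below is a minimal illustration on mean-centered data and is not the implementation used in The Unscrambler. The function names and data are hypothetical, and real models would typically use several components.

```python
import math

def fit_pls1_one_component(x_rows, y):
    """Fit a one-component PLS1 model; returns (x_mean, y_mean, coefficients)."""
    n, p = len(x_rows), len(x_rows[0])
    x_mean = [sum(col) / n for col in zip(*x_rows)]
    y_mean = sum(y) / n
    xc = [[v - m for v, m in zip(row, x_mean)] for row in x_rows]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in X space with maximal covariance with y.
    w = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(c * c for c in w))
    if norm == 0:
        raise ValueError("X and y show no covariance")
    w = [c / norm for c in w]
    # Scores: projection of each centered sample onto the weight vector.
    t = [sum(a * b for a, b in zip(row, w)) for row in xc]
    # Regression of y on the scores.
    q = sum(a * b for a, b in zip(yc, t)) / sum(a * a for a in t)
    # With one component, the regression coefficients are w * q.
    return x_mean, y_mean, [c * q for c in w]

def predict(model, x_row):
    """Predict y for one new sample from a fitted model."""
    x_mean, y_mean, coefficients = model
    return y_mean + sum((v - m) * b
                        for v, m, b in zip(x_row, x_mean, coefficients))

# Hypothetical data: X = expression of two genes, y = one aroma compound.
x_data = [[100.0, 40.0], [150.0, 55.0], [200.0, 90.0], [250.0, 100.0]]
y_data = [1.0, 1.6, 2.1, 2.4]
model = fit_pls1_one_component(x_data, y_data)
print(predict(model, [180.0, 70.0]))
```

Each PLS1 model has one Y variable (here one compound), which matches the use of separate models per compound for predictive purposes.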

The data were analyzed by using test-set validation with centered data and the variables were weighted according to their standard deviations. One strain was used as the test segment at each of the time points. Day 2 and 5 data were considered together as representative of the full scope of fermentation variability as the period from the start of fermentation until day 5 represents the period of maximum aroma compound production.
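The pre-processing and validation scheme can be sketched as follows. The sketch assumes that weighting by standard deviation means dividing each centered variable by its standard deviation, and that a test segment consists of all samples from one strain. The strain labels and function names are hypothetical.

```python
import statistics

def autoscale(columns):
    """Center each variable and weight it by 1 / standard deviation."""
    scaled = {}
    for name, values in columns.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        if sd == 0:
            raise ValueError(f"variable {name} has zero variance")
        scaled[name] = [(v - mean) / sd for v in values]
    return scaled

def leave_one_strain_out(samples):
    """Yield (held-out strain, training samples, test samples) per strain."""
    strains = sorted({s["strain"] for s in samples})
    for held_out in strains:
        train = [s for s in samples if s["strain"] != held_out]
        test = [s for s in samples if s["strain"] == held_out]
        yield held_out, train, test

# Five strains, days 2 and 5, three replicates each (labels are placeholders).
samples = [{"strain": f"strain_{i}", "day": day, "replicate": rep}
           for i in range(1, 6) for day in (2, 5) for rep in (1, 2, 3)]
splits = list(leave_one_strain_out(samples))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))

scaled = autoscale({"gene_x": [100.0, 150.0, 200.0, 250.0]})
```

Holding out an entire strain tests whether a model built on the remaining strains generalises to a genetic background it has not seen.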

The Y variables were the respective aroma compounds measured and the X variables were the gene expression levels of the gene set that was pre-selected for analysis [42]. Genes were selected based on known or putative functions related to amino acid transport, metabolism, regulation etc, as well as other enzymatic or regulatory activity in pathways leading to the production of higher alcohols and esters. The same set of genes (X variables) was used for each of the different PLS1 models.

All plasmids used in this study are listed in table 6. Standard procedures for the isolation, cloning and modification of DNA were used throughout this study [43,44]. All enzymes for cloning, restriction digest and ligation reactions were obtained from Roche Diagnostics (Randburg, South Africa) and used according to supplier specifications.

The primers listed in table 7 were used to amplify the coding regions of the various genes by PCR. Genomic DNA from the DV10 strain was used as the template. Escherichia coli DH5α (GIBCO-BRL/Life Technologies) was used as the host for the construction and propagation of the plasmids listed in table 6. Sequencing of all plasmids was carried out on an ABI PRISM automated sequencer. All plasmids contain the dominant marker PhR, conferring phleomycin resistance, and were transformed into host VIN13 cells via electroporation [21,45].

DR carried out the experimental work, contributed to the experimental design and data analysis, and drafted the manuscript. TN contributed to the statistical analysis of the data. FFB conceived of the study and participated in its design and coordination. All authors read and approved the final manuscript.