dict.md logo

Information-Theoretic Inference of Large Transcriptional Regulatory Networks

The paper presents MRNET, an original method for inferring genetic networks from microarray data. The method is based on maximum relevance/minimum redundancy (MRMR), an effective information-theoretic technique for feature selection in supervised learning. The MRMR principle consists in selecting among the least redundant variables the ones that have the highest mutual information with the target. MRNET extends this feature selection principle to networks in order to infer gene-dependence relationships from microarray data. The paper assesses MRNET by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large (up to several thousands of genes) network inference. Experimental results on thirty synthetically generated microarray datasets show that MRNET is competitive with these methods.

Two important issues in computational biology are the extent to which it is possible to model transcriptional interactions by large networks of interacting elements and how these interactions can be effectively learned from measured expression data [1]. The reverse engineering of transcriptional regulatory networks (TRNs) from expression data alone is far from trivial because of the combinatorial nature of the problem and the poor information content of the data [1]. An additional problem is that by focusing only on transcript data, the inferred network should not be considered as a biochemical regulatory network but as a gene-to-gene network, where many physical connections between macromolecules might be hidden by shortcuts.

In spite of these evident limitations, the bioinformatics community made important advances in this domain over the last few years. Examples are methods like Boolean networks, Bayesian networks, and Association networks [2].

This paper will focus on information-theoretic approaches [36] which typically rely on the estimation of mutual information from expression data in order to measure the statistical dependence between variables (the termsvariable” and “feature” are used interchangeably in this paper). Such methods have recently held the attention of the bioinformatics community for the inference of very large networks [46].

The adoption of mutual information in probabilistic model design can be traced back to Chow-Liu tree algorithm [3] and its extensions proposed by [7, 8]. Later [9, 10] suggested to improve network inference by using another information-theoretic quantity, namely multi-information.

This paper introduces an original information-theoretic method, called MRNET, inspired by a recently proposed feature selection technique, the maximum relevance/minimum redundancy (MRMR) algorithm [11, 12]. This algorithm has been used with success in supervised classification problems to select a set of nonredundant genes which are explicative of the targeted phenotype [12, 13]. The MRMR selection strategy consists in selecting a set of variables that has a high mutual information with the target variable (maximum relevance) and at the same time are mutually maximally independent (minimum redundancy between relevant variables). The advantage of this approach is that redundancy among selected variables is avoided and that the trade-off between relevance and redundancy is properly taken into account.

Our proposed MRNET strategy, preliminarily sketched in [14], consists of (i) formulating the network inference problem as a series of input/output supervised gene selection procedures, where one gene at the time plays the role of the target output, and (ii) adopting the MRMR principle to perform the gene selection for each supervised gene selection procedure.

The paper benchmarks MRNET against three state-of-the-art information-theoretic network inference methods, namely relevance networks (RELNET), CLR, and ARACNE. The comparison relies on thirty artificial microarray datasets synthesized by two public-domain generators. The extensive simulation setting allows us to study the effect of the number of samples, the number of genes, and the noise intensity on the inferred network accuracy. Also, the sensitivity of the performance to two alternative entropy estimators is assessed.

The outline of the paper is as follows. Section2 reviews the state-of-the-art network inference techniques based on information theory. Section3 introduces our original approach based on MRMR. The experimental framework and the results obtained on artificially generated datasets are presented in Sections 4and5, respectively. Section6 concludes the paper.

:: ::

This section reviews some state-of-the-art methods for network inference which are based on information-theoretic notions.

These methods require at first the computation of the mutual information matrix (MIM), a square matrix whose i, j element (1)MIMij=I(Xi;Xj)=∑xi∈X∑xj∈Xp(xi,xj)log⁡(p(xi,xj)p(xi)p(xj))is the mutual information between Xi and Xj, where XiX, i = 1, …, n, is a discrete random variable denoting the expression level of the ith gene.

The Chow and Liu approach consists in finding the maximum spanning tree (MST) of a complete graph, where the weights of the edges are the mutual information quantities between the connected nodes [3]. The construction of the MST with Kruskal's algorithm has an O(n2log⁡n) cost. The main drawbacks of this method are: (i) the minimum spanning tree has typically a low number of edges also for non sparse target networks and (ii) no parameter is provided to calibrate the size of the inferred network.

The relevance network approach [4] has been introduced in gene clustering problems and successfully applied to infer relationships between RNA expression and chemotherapeutic susceptibility [15]. The approach consists in inferring a genetic network, where a pair of genes {Xi, Xj} is linked by an edge if the mutual information I(Xi; Xj) is larger than a given threshold I0. The complexity of the method is O(n2) since all pairwise interactions are considered.

Note that this method is prone to infer false positives in the case of indirect interactions between genes. For example, if gene X1 regulates both gene X2 and gene X3, a high mutual information between the pairs {X1, X2}, {X1, X3}, and {X2, X3} would be present. As a consequence, the algorithm would infer an edge between X2 and X3 although these two genes interact only through gene X1.

The CLR algorithm [6] is an extension of RELNET. This algorithm computes the mutual information (MI) for each pair of genes and derives a score related to the empirical distribution of these MI values. In particular, instead of considering the information I(Xi; Xj) between genes Xi and Xj, it takes into account the score zij=zi2+zj2, where (2)zi=max⁡(0,I(Xi;Xj)−μiσi)and μi and σi are, respectively, the mean and the standard deviation of the empirical distribution of the mutual information values I(Xi, Xk), k = 1, …, n. The CLR algorithm was successfully applied to decipher the E. coli TRN [6]. Note that, like RELNET, CLR demands an O(n2) cost to infer the network from a given MIM.

The algorithm for the reconstruction of accurate cellular networks (ARACNE) [5] is based on the data processing inequality [16]. This inequality states that if gene X1 interacts with gene X3 through gene X2, then (3)I(X1;X3)≤min⁡(I(X1;X2),I(X2;X3)).The ARACNE procedure starts by assigning to each pair of nodes a weight equal to their mutual information. Then, as in RELNET, all edges for which I(Xi; Xj) < I0 are removed, where I0 is a given threshold. Eventually, the weakest edge of each triplet is interpreted as an indirect interaction and is removed if the difference between the two lowest weights is above a threshold W0. Note that by increasing I0, we decrease the number of inferred edges while we obtain the opposite effect by increasing W0.

If the network is a tree and only pairwise interactions are present, the method guarantees the reconstruction of the original network, once it is provided with the exact MIM. The ARACNE's complexity for inferring the network is O(n3) since the algorithm considers all triplets of genes. In [5], the method has been able to recover components of the TRN in mammalian cells and appeared to outperform Bayesian networks and relevance networks on several inference tasks [5].

:: ::

We propose to infer a network using the maximum relevance/minimum redundancy (MRMR) feature selection method. The idea consists in performing a series of supervised MRMR gene selection procedures, where each gene in turn plays the role of the target output.

The MRMR method has been introduced in [11, 12] together with a best-first search strategy for performing filter selection in supervised learning problems. Consider a supervised learning task, where the output is denoted by Y and V is the set of input variables. The method ranks the set V of inputs according to a score that is the difference between the mutual information with the output variable Y (maximum relevance) and the average mutual information with the previously ranked variables (minimum redundancy). The rationale is that direct interactions (i.e., the most informative variables to the target Y) should be well ranked, whereas indirect interactions (i.e., the ones with redundant information with the direct ones) should be badly ranked by the method. The greedy search starts by selecting the variable Xi having the highest mutual information to the target Y. The second selected variable Xj will be the one with a high information I(Xj; Y) to the target and at the same time a low information I(Xj; Xi) to the previously selected variable. In the following steps, given a set S of selected variables, the criterion updates S by choosing the variable (4)XjMRMR=arg⁡max⁡Xj∈V\S(uj−rj)that maximizes the score (5)sj=uj−rj,where uj is a relevance term and rj is a redundancy term. More precisely, (6)uj=I(Xj;Y)is the mutual information of Xj with the target variable Y, and (7)rj=1|S|∑Xk∈SI(Xj;Xk)measures the average redundancy of Xj to each already selected variable XkS. At each step of the algorithm, the selected variable is expected to allow an efficient trade-off between relevance and redundancy. It has been shown in [12] that the MRMR criterion is an optimal “pairwise” approximation of the conditional mutual information between any two genes Xj and Y given the set S of selected variables I(Xj; Y | S).

The MRNET approach consists in repeating this selection procedure for each target gene by putting Y = Xi and V = X \ {Xi}, i = 1, …, n, where X is the set of the expression levels of all genes. For each pair {Xi, Xj}, MRMR returns two (not necessarily equal) scores si and sj according to (5). The score of the pair {Xi,Xj} is then computed by taking the maximum of si and sj. A specific network can then be inferred by deleting all the edges whose score lies below a given threshold I0 (as in RELNET, CLR, and ARACNE). Thus, the algorithm infers an edge between Xi and Xj either when Xi is a well-ranked predictor of Xj (si> I0) or when Xj is a well-ranked predictor of Xi (sj > I0).

An effective implementation of the MRMR best-first search is available in [17]. This implementation demands an O(f × n) complexity for selecting f features using a best-first search strategy. It follows that MRNET has an O(f × n2) complexity since the feature selection step is repeated for each of the n genes. In other terms, the complexity ranges between O(n2) and O(n3) according to the value of f. Note that the lower the f value, the lower the number of incoming edges per node to infer and consequently the lower the resulting complexity.

Note that since mutual information is a symmetric measure, it is not possible to derive the direction of the edge from its weight. This limitation is common to all the methods presented so far. However, this information could be provided by edge orientation algorithms (e.g., IC) commonly used in Bayesian networks [7].

The experimental framework consists of four steps (see Figure 1): the artificial network and data generation, the computation of the mutual information matrix, the inference of the network, and the validation of the results. This section details each step of the approach.

In order to assess the results returned by our algorithm and compare it to other methods, we created a set of benchmarks on the basis of artificially generated microarray datasets. In spite of the evident limitations of using synthetic data, this makes possible a quantitative assessment of the accuracy, thanks to the availability of the true network underlying the microarray dataset (see Figure 1).

We used two different generators of artificial gene expression data: the data generator described in [18] (hereafter referred to as the sRogers generator) and the SynTReN generator [19]. The two generators, whose implementations are freely available on the World Wide Web, are sketched in the following paragraphs.

In order to benchmark MRNET versus RELNET, CLR, and ARACNE, the same MIM is used for the four inference approaches. Several estimators of mutual information have been proposed in literature [5, 6, 20, 21]. Here, we test the Miller-Madow entropy estimator [20] and a parametric Gaussian density estimator. Since the Miller-Madow method requires quantized values, we pretreated the data with the equal-sized intervals algorithm [22], where the size l=N. The parametric Gaussian estimator is directly computed by I(Xi, Xj) = (1/2)log⁡(σiiσjj/|C|), where |C| is the determinant of the covariance matrix. Note that the complexity of both estimators is O(N), where N is the number of samples. This means that since the whole MIM cost is O(N × n2), the MIM computation could be the bottleneck of the whole network inference procedure for a large number of samples (Nn). We deem, however, that at the current state of the technology, this should not be considered as a major issue since the number of samples is typically much smaller than the number of measured features.

A network inference problem can be seen as a binary decision problem, where the inference algorithm plays the role of a classifier: for each pair of nodes, the algorithm either adds an edge or does not. Each pair of nodes is thus assigned a positive label (an edge) or a negative one (no edge).

A positive label (an edge) predicted by the algorithm is considered as a true positive (TP) or as a false positive (FP) depending on the presence or not of the corresponding edge in the underlying true network, respectively. Analogously, a negative label is considered as a true negative (TN) or a false negative (FN) depending on whether the corresponding edge is present or not in the underlying true network, respectively.

The decision made by the algorithm can be summarized by a confusion matrix (see Table 2).

It is generally recommended [23] to use receiver operator characteristic (ROC) curves when evaluating binary decision problems in order to avoid effects related to the chosen threshold. However, ROC curves can present an overly optimistic view of algorithm's performance if there is a large skew in the class distribution, as typically encountered in TRN inference because of sparseness.

To tackle this problem, precision-recall (PR) curves have been cited as an alternative to ROC curves [24]. Let the precision quantity (8)p=TPTP+FP,measure the fraction of real edges among the ones classified as positive and the recall quantity (9)r=TPTP+FN,also know as true positive rate, denote the fraction of real edges that are correctly inferred. These quantities depend on the threshold chosen to return a binary decision. The PR curve is a diagram which plots the precision (p) versus recall (r) for different values of the threshold on a two-dimensional coordinate system.

Note that a compact representation of the PR diagram is returned by the maximum of the F -score quantity (10)F=2prr+p,which is a weighted harmonic average of precision and recall. The following section will present the results by means of PR curves and F -scores.

Also in order to asses the significance of the results, a McNemar test can be performed. The McNemar test [25] states that if two algorithms A and B have the same error rate, then (11)P((|NABNBA|−1)2NAB+NBA>3.841459)<0.05,where NAB is the number of incorrect edges of the network inferred from algorithm A that are correct in the network inferred from algorithm B, and NBA is the counterpart.

A thorough comparison would require the display of the PR-curves (Figure 2) for each dataset. For reason of space, we decided to summarize the PR-curve information by the maximum F -score in Table 3. Note that for each dataset, the accuracy of the best methods (i.e., those whose score is not significantly lower than the highest one according to McNemar test) is typed in boldface.

We may summarize the results as follows.

As shown experimentally in the previous section, MRNET is competitive with the state-of-the-art techniques. Furthermore, MRNET benefits from some additional properties which are common to all the feature selection strategies for network inference [26, 27], as follows.

Feature selection algorithms can often deal with thousands of variables in a reasonable amount of time. This makes inference scalable to large networks.

Feature selection algorithms may be easily made parallel, since each of the n selections tasks is independent.

Feature selection algorithms may be made faster by a priori knowledge. For example, knowing the list of regulator genes of an organism improves the selection speed and the inference quality by limiting the search space of the feature selection step to this small list of genes. The knowledge of existing edges can also improve the inference. For example, in a sequential selection process, as in the forward selection used with MRMR, the next variable is selected given the already selected features. As a result, the performance of the selection can be strongly improved by conditioning on known relationships.

However, there is a disadvantage in using a feature selection technique for network inference. The objective of feature selection is selecting, among a set of input variables, the ones that will lead to the best predictive model. It has been proved in [28] that the minimum set that achieves optimal classification accuracy under certain general conditions is the Markov blanket of a target variable. The Markov blanket of a target variable is composed of the variable's parents, the variable's children, and the variable's children's parents [7]. The latter are indirect relationships. In other words, these variables have a conditional mutual information to the target variable Y higher than their mutual information. Let us consider the following example. Let Y and Xi be independent random variables, and Xj = Xi + Y (see Figure 7). Since the variables are independent, I(Xi; Y) = 0, and the conditional mutual information is higher than the mutual information, that is, I(Xi; Y | Xj) > 0. It follows that Xi has some information to Y given Xj but no information to Y taken alone. This behavior is colloquially referred to as explaining-away effect in the Bayesian network literature [7]. Selecting variables, like Xi, that take part into indirect interactions reduce the accuracy of the network inference task. However, since MRMR relies only on pairwise interactions, it does not take into account the gain in information due to conditioning. In our example, the MRMR algorithm, after having selected Xj, computes the score si = I(Xi; Y) − I(Xi; Xj), where I(Xi; Y) = 0 and I(Xi; Xj) > 0. This score is negative and is likely to be badly ranked. As a result, the MRMR feature selection criterion is less exposed to the inconvenient of most feature selection techniques while sharing their interesting properties. Further experiments will focus on this aspect.

A new network inference method, MRNET, has been proposed. This method relies on an effective method of information-theoretic feature selection called MRMR. Similarly to other network inference methods, MRNET relies on pairwise interactions between genes, making possible the inference of large networks (up to several thousands of genes).

Another advantage of MRNET, which could be exploited in future work, is its ability to benefit explicitly from a priori knowledge.

MRNET was compared experimentally to three state-of-the-art information-theoretic network inference methods, namely RELNET, CLR, and ARACNE, on thirty inference tasks. The microarray datasets were generated artificially with two different generators in order to effectively assess their inference power. Also, two different mutual information estimation methods were used. The experimental results showed that MRNET is competitive with the benchmarked information-theoretic methods.

Future work will focus on three main axes: (i) the assessment of additional mutual information estimators, (ii) the validation of the techniques on the basis of real microarray data, (iii) a theoretical analysis of which conditions should be met for MRNET to reconstruct the true network.