Data-Driven Gene Regulatory Network Inference

GXN: Generalizable Gene Self-Expressive Networks

GXN paper

Self-expressiveness is a mathematical property that characterizes the relationship between instances in a dataset. It has been applied widely and successfully in computer vision and time-series analysis, and to infer underlying network structures in domains such as protein signaling interactions and social-network activity. Nevertheless, despite its potential, self-expressiveness has not been explicitly used to infer gene networks. In this article, we present Generalizable Gene Self-Expressive Networks, a new, interpretable, and generalization-aware formalism to model gene networks. We also propose two methods, GXN•EN and GXN•OMP, based respectively on ElasticNet and OMP (Orthogonal Matching Pursuit), to infer and assess Generalizable Gene Self-Expressive Networks. We evaluate these methods on four microarray datasets from the DREAM5 benchmark, using both internal and external metrics. Both methods achieve results comparable to those of state-of-the-art tools. They are also fast to train and produce highly sparse networks, which makes them easier to interpret. Moreover, we applied these methods to three complex datasets containing RNA-seq information from different mammalian tissues and cell types. Lastly, we applied our methodology to compare a normal and a disease condition (Alzheimer's disease), which allowed us to detect differential expression of gene sub-networks between these two biological conditions. Overall, the gene networks obtained exhibit a sparse and modular structure. Their inner communities of genes present statistically significant over- or under-expression in specific cell types, as well as significant enrichment for some anatomical GO terms, suggesting that such communities may also play important functional roles.
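To make the self-expressive formulation concrete, here is a minimal sketch that is not the GXN package's API. Each gene's expression profile is approximated as a linear combination of the other genes' profiles (X ≈ XW, with a zero diagonal on W). The fit is done gene by gene with scikit-learn's ElasticNet or OrthogonalMatchingPursuit. The function name self_expressive_network and its default hyperparameters are hypothetical illustrations, and GXN's generalization-aware assessment is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, OrthogonalMatchingPursuit


def self_expressive_network(X, method="en", alpha=0.1, l1_ratio=0.5, n_nonzero=3):
    """Hypothetical sketch: regress each gene on all other genes.

    X: array of shape (n_samples, n_genes), an expression matrix.
    Returns W of shape (n_genes, n_genes), where W[j, i] is the weight of
    gene j in the reconstruction of gene i. The diagonal stays at zero.
    """
    _, n_genes = X.shape
    W = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        # Exclude gene i itself to rule out the trivial solution X = X * I.
        others = np.delete(np.arange(n_genes), i)
        if method == "en":
            # ElasticNet: combined L1/L2 penalty; alpha and l1_ratio set sparsity.
            model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10000)
        else:
            # OMP: greedy selection of at most n_nonzero regulators per gene.
            model = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero)
        model.fit(X[:, others], X[:, i])
        # coef_ is ordered like the columns in `others`.
        W[others, i] = model.coef_
    return W
```

With method="omp", n_nonzero directly bounds how many genes may contribute to each reconstruction, which is one way to control sparsity. With ElasticNet, sparsity instead comes from alpha and l1_ratio.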

GReNaDIne: Gene Regulatory Network Data-driven Inference

GReNaDIne paper

Context: Inferring gene regulatory networks (GRNs) from high-throughput gene expression data is a challenging task for which different strategies have been developed. Nevertheless, no single method consistently outperforms the others, and each method has its own advantages, intrinsic biases, and application domains. Thus, to analyze a dataset, users should be able to test different techniques and choose the most appropriate one. This step can be particularly difficult and time-consuming, since most methods' implementations are distributed independently, possibly in different programming languages. An open-source library gathering different inference methods within a common framework is therefore expected to be a valuable toolkit for the systems biology community.

Results: In this work, we introduce GReNaDIne (Gene Regulatory Network Data-driven Inference), a Python package that implements 18 machine learning data-driven gene regulatory network inference methods. It also includes eight generalist preprocessing techniques, suitable for both RNA-seq and microarray dataset analysis, as well as four normalization techniques dedicated to RNA-seq. In addition, the package makes it possible to combine the results of different inference tools to form robust and efficient ensembles. The package has been successfully assessed on the DREAM5 challenge benchmark dataset. The open-source GReNaDIne Python package is freely available in a dedicated GitLab repository, as well as on the Python Package Index (PyPI), the official third-party software repository for Python. The latest documentation of the GReNaDIne library is also available on Read the Docs, an open-source software documentation hosting platform.

Contribution: The GReNaDIne tool represents a technological contribution to the field of systems biology. The package can be used to infer gene regulatory networks from high-throughput gene expression data using different algorithms within the same framework. To analyze their datasets, users can apply a battery of preprocessing and postprocessing tools, choose the most suitable inference method from the GReNaDIne library, and even combine the outputs of different methods to obtain more robust results. The output format of GReNaDIne is compatible with well-known complementary refinement tools such as pySCENIC.
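The following GENIE3-style sketch illustrates the regression-based family of data-driven methods that a library like GReNaDIne gathers. Each target gene is regressed on the candidate transcription factors with a random forest, and the forest's feature importances serve as link scores. The function score_links_rf and its parameters are hypothetical and do not reflect GReNaDIne's own interface.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def score_links_rf(X, genes, tfs, n_estimators=100, random_state=0):
    """Hypothetical sketch: score TF -> target links via feature importances.

    X: (n_samples, n_genes) expression matrix; genes: names matching X's
    columns; tfs: candidate regulators (a subset of genes).
    Returns (tf, target, score) tuples sorted by decreasing score.
    """
    idx = {g: k for k, g in enumerate(genes)}
    links = []
    for target in genes:
        # A gene is not allowed to regulate itself.
        regulators = [tf for tf in tfs if tf != target]
        if not regulators:
            continue
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
        model.fit(X[:, [idx[r] for r in regulators]], X[:, idx[target]])
        # Importances are normalized per target, so they sum to 1 for each fit.
        for tf, imp in zip(regulators, model.feature_importances_):
            links.append((tf, target, float(imp)))
    return sorted(links, key=lambda link: link[2], reverse=True)
```

Because importances are normalized within each target, scores are most comparable within a target. For example, a target with a single candidate regulator will typically receive a score of 1.0.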

ICTAI 2020 paper | Gene Regulatory Network Inference Using Ensembles of Predictors: In the machine learning field, the technique known as ensemble learning combines different base learners to increase the quality and robustness of predictions. This approach has been widely and successfully applied to real-world problems in different domains, including computational biology. Nevertheless, despite the potential of this technique, ensembles that combine results from different kinds of algorithms have been understudied in the context of gene regulatory network inference. In this paper, we used a genetic algorithm and frequent itemset mining to study and design effective ensembles for reverse-engineering gene regulatory networks from high-throughput data. The proposed methods were evaluated and compared with well-established single and ensemble methods on real and synthetic datasets. The results demonstrate the efficiency and robustness of these new methods, supporting their use as gene regulatory network inference tools.
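The paper selects ensemble members with a genetic algorithm and frequent itemset mining, and that search is not reproduced here. The following standard-library sketch shows only one simple way to merge the outputs of several inference methods once they have been chosen: rank averaging, a Borda-count-style combination. The function name rank_average and the rule that a missing link gets a method's worst rank are assumptions made for illustration.

```python
from collections import defaultdict


def rank_average(score_lists):
    """Hypothetical sketch: combine link scores from several methods by mean rank.

    score_lists: list of dicts mapping (regulator, target) -> score, one dict
    per method, where a higher score means more confidence. A link missing
    from a method gets that method's worst rank (number of its links + 1).
    Returns links sorted from best (lowest mean rank) to worst.
    """
    all_links = set().union(*score_lists)
    total = defaultdict(float)
    for scores in score_lists:
        # Rank 1 is the highest-scoring link for this method.
        ordered = sorted(scores, key=scores.get, reverse=True)
        rank = {link: r for r, link in enumerate(ordered, start=1)}
        worst = len(ordered) + 1
        for link in all_links:
            total[link] += rank.get(link, worst)
    mean_rank = {link: total[link] / len(score_lists) for link in all_links}
    # Ties in mean rank are broken by the link tuple for deterministic output.
    return sorted(all_links, key=lambda link: (mean_rank[link], link))
```

Working on ranks rather than raw scores sidesteps the fact that different methods (correlation, mutual information, feature importances) produce scores on incomparable scales.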

ICTAI 2019 paper | Data-driven Gene Regulatory Network Inference based on Classification Algorithms: Different paradigms of gene regulatory network inference have been proposed in the literature. The data-driven family is an important inference paradigm that scores potential regulatory links between transcription factors and target genes by analyzing gene expression datasets. Three major approaches have been proposed to score such links, relying respectively on correlation measures, mutual information metrics, and regression algorithms. In this paper, we present a new family of data-driven inference approaches, inspired by the regression-based family and built on classification algorithms. This paper advocates the use of this paradigm as a promising new approach to infer gene regulatory networks. Indeed, the implementation and testing of five new inference methods based on well-known classification algorithms show that this approach produces good-quality results compared with well-established paradigms.
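The following minimal sketch illustrates the classification-based idea under stated assumptions. Each target gene's expression is discretized into quantile classes (by default two: below the median, and at or above it). A random forest classifier then predicts that class from the transcription factors' expression, and its feature importances score the TF-to-target links. The binning scheme, the choice of classifier, and the function name score_links_classifier are hypothetical. The paper's five methods may discretize and score differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def score_links_classifier(X, genes, tfs, n_bins=2, n_estimators=100, random_state=0):
    """Hypothetical sketch: score TF -> target links with a classifier.

    Each target's expression is binned into n_bins (>= 2) quantile classes;
    a random forest classifier predicts the class from the TFs' expression,
    and its feature importances are returned as {(tf, target): score}.
    """
    idx = {g: k for k, g in enumerate(genes)}
    scores = {}
    for target in genes:
        regulators = [tf for tf in tfs if tf != target]
        if not regulators:
            continue
        y = X[:, idx[target]]
        # Interior quantile edges; with n_bins=2 this is just the median.
        edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
        labels = np.digitize(y, edges)  # integer classes 0 .. n_bins - 1
        clf = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
        clf.fit(X[:, [idx[r] for r in regulators]], labels)
        for tf, imp in zip(regulators, clf.feature_importances_):
            scores[(tf, target)] = float(imp)
    return scores
```

Discretizing the target turns link scoring into "which regulators best separate low from high expression states", which is the main conceptual difference from the regression-based family.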

GReNaDIne can be installed with the command pip install GReNaDIne, and it is also available on GitLab.

GXN can be installed with the command pip install GXN, and it is also available on GitLab.