Feature selection is an important step in KDD, since it reduces the complexity of a dataset.

According to a recent report on antibiotic research released Sept. 17 by the London School of Economics and Political Science (LSE), 175,000 deaths are attributed to hospital-acquired infections each year in Europe alone.

R. Agrawal and R. Srikant, Fast algorithms for mining association rules, Proc. 20th Int. Conf. Very Large Data Bases, VLDB, vol.1215, pp.487-499, 1994.

M. Ailem, F. Role, and M. Nadif, Graph modularity maximization as an effective method for co-clustering text data, Knowledge-Based Systems, vol.109, pp.160-173, 2016.

M. Alam, A. Buzmakov, and A. Napoli, Exploratory knowledge discovery over web of data, Discrete Applied Mathematics, vol.249, pp.2-17, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01673439

M. Alam, T. N. Le, and A. Napoli, Latviz: A new practical tool for performing interactive exploration over concept lattices, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01420751

A. A. Alizadeh, M. B. Eisen, R. E. Davis, C. Ma, I. S. Lossos et al., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature, vol.403, issue.6769, p.503, 2000.

K. Allab, L. Labiod, and M. Nadif, Multi-manifold matrix decomposition for data co-clustering, Pattern Recognition, vol.64, pp.386-398, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01408092

S. Andrews, In-Close, a fast algorithm for computing formal concepts, International Conference on Conceptual Structures (ICCS), 2009.

S. Andrews, In-Close2, a high performance formal concept miner, International Conference on Conceptual Structures, vol.5062, 2011.

Y. Asses, A. Buzmakov, T. Bourquard, S. O. Kuznetsov, and A. Napoli, A Hybrid Classification Approach based on FCA and Emerging Patterns - An application for the classification of biological inhibitors, Proceedings of CLA, CEUR Workshop Proceedings, vol.972, pp.211-222, 2012.

J. Ayres, J. Flannick, J. Gehrke, and T. Yiu, Sequential pattern mining using a bitmap representation, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.429-435, 2002.

A. Ben-dor, B. Chor, R. Karp, and Z. Yakhini, Discovering local structure in gene expression data: the order-preserving submatrix problem, Journal of Computational Biology, vol.10, issue.3-4, pp.373-384, 2003.

A. Berry and R. Pogorelcnik, A simple algorithm to generate the minimal separators and the maximal cliques of a chordal graph, Information Processing Letters, vol.111, issue.11, pp.508-511, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00678694

L. Breiman, Random forests, Machine Learning, vol.45, issue.1, pp.5-32, 2001.

C. D. Brown and H. T. Davis, Receiver operating characteristics curves and related decision measures: A tutorial, Chemometrics and Intelligent Laboratory Systems, vol.80, issue.1, pp.24-38, 2006.

A. Buzmakov, E. Egho, N. Jay, S. O. Kuznetsov, A. Napoli et al., On mining complex sequential data by means of FCA and pattern structures, International Journal of General Systems, vol.45, issue.2, pp.135-159, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01186715

G. Cano, J. Garcia-rodriguez, A. Garcia-garcia, H. Perez-sanchez, J. A. Benediktsson et al., Automatic selection of molecular descriptors using random forest: Application to drug discovery, Expert Systems with Applications, vol.72, pp.151-159, 2017.

C. C. Chang and C. J. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology (TIST), vol.2, issue.3, p.27, 2011.

Y. Cheng and G. M. Church, Biclustering of expression data, vol.8, pp.93-103, 2000.

V. Codocedo, G. Bosc, M. Kaytoue, J. F. Boulicaut, and A. Napoli, A proposition for sequence mining using pattern structures, Proceedings of ICFCA, pp.106-121, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01549107

V. Codocedo and A. Napoli, Lattice-based biclustering using partition pattern structures, Proceedings of the Twenty-first European Conference on Artificial Intelligence, pp.213-218, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01095865

V. Codocedo-henríquez, Contributions à l'indexation et à la récupération d'information utilisant l'analyse formelle de concepts, 2015.

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, vol.20, issue.3, pp.273-297, 1995.

M. Couceiro and A. Napoli, Elements about exploratory, knowledge-based, hybrid, and explainable knowledge discovery, International Conference on Formal Concept Analysis, pp.3-16, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02195480

A. Davoudi, S. S. Ghidary, and K. Sadatnejad, Dimensionality reduction based on distance preservation to local mean for symmetric positive definite matrices and its application in brain-computer interfaces, Journal of Neural Engineering, vol.14, issue.3, p.36019, 2017.

L. Di-jorio, A. Laurent, and M. Teisseire, Mining frequent gradual itemsets from large databases, International Symposium on Intelligent Data Analysis, 2009.

C. Ding, X. He, and H. D. Simon, On the equivalence of nonnegative matrix factorization and spectral clustering, Proceedings of the 2005 SIAM International Conference on Data Mining, pp.606-610, 2005.

C. Ding, T. Li, W. Peng, and H. Park, Orthogonal nonnegative matrix t-factorizations for clustering, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.126-135, 2006.

G. Dong and J. Li, Efficient mining of emerging patterns: Discovering trends and differences, Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.43-52, 1999.

E. Egho, C. Raïssi, T. Calders, N. Jay, and A. Napoli, On measuring similarity for sequences of itemsets, Data Mining and Knowledge Discovery, vol.29, issue.3, pp.732-764, 2015.
URL : https://hal.archives-ouvertes.fr/hal-00740231

M. Ester, H. P. Kriegel, J. Sander, and X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, vol.96, pp.226-231, 1996.

V. Estivill-castro, Why so many clustering algorithms: A position paper, SIGKDD Explorations, vol.4, issue.1, pp.65-75, 2002.

U. Fayyad, G. Piatetsky-shapiro, and P. Smyth, From data mining to knowledge discovery in databases, AI Magazine, vol.17, issue.3, p.37, 1996.

P. Fernandes, The global challenge of new classes of antibacterial agents: an industry perspective, Current Opinion in Pharmacology, vol.24, pp.7-11, 2015.

A. A. Freitas, Advances in evolutionary computing, 2003.

B. Ganter and S. O. Kuznetsov, Pattern structures and their projections, International Conference on Conceptual Structures, pp.129-142, 2001.

B. Ganter and R. Wille, Formal concept analysis: mathematical foundations, Springer Science & Business Media, 2012.

I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, Gene selection for cancer classification using support vector machines, Machine Learning, vol.46, issue.1-3, pp.389-422, 2002.

S. J. Haberman, The analysis of frequency data: Statistical research monographs, 1977.

J. Han, J. Pei, B. Mortazavi-asl, H. Pinto, Q. Chen et al., PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth, Proceedings of the 17th International Conference on Data Engineering, pp.215-224, 2001.

J. A. Hartigan, Direct clustering of a data matrix, Journal of the American Statistical Association, vol.67, issue.337, pp.123-129, 1972.

I. Hedenfalk, D. Duggan, Y. Chen, M. Radmacher, M. Bittner et al., Gene-expression profiles in hereditary breast cancer, New England Journal of Medicine, vol.344, issue.8, pp.539-548, 2001.

R. Henriques, F. L. Ferreira, and S. C. Madeira, BicPAMS: software for biological data analysis with pattern-based biclustering, BMC Bioinformatics, vol.18, issue.1, p.82, 2017.

R. Henriques and S. C. Madeira, BicPAM: Pattern-based biclustering for biomedical data analysis, Algorithms for Molecular Biology, vol.9, issue.1, p.27, 2014.

R. Henriques and S. C. Madeira, BicSPAM: flexible biclustering using sequential patterns, BMC Bioinformatics, vol.15, issue.1, p.130, 2014.

R. Henriques and S. C. Madeira, BiC2PAM: constraint-guided biclustering for biological data analysis with domain knowledge, Algorithms for Molecular Biology, vol.11, issue.1, p.23, 2016.

R. Henriques and S. C. Madeira, BicNET: Flexible module discovery in large-scale biological networks using biclustering, Algorithms for Molecular Biology, vol.11, issue.1, p.14, 2016.

R. Henriques, S. C. Madeira, and C. Antunes, F2G: Efficient discovery of full-patterns, ECML/PKDD, p.19, 2013.

S. Hochreiter, U. Bodenhofer, M. Heusel, A. Mayr, A. Mitterecker et al., FABIA: Factor analysis for bicluster acquisition, Bioinformatics, vol.26, issue.12, pp.1520-1527, 2010.

J. Hung, An experiment about the classification of antibacterial molecules, Orpailleur team, 2015.

S. F. Hussain and M. Ramazan, Biclustering of human cancer microarray data using co-similarity based co-clustering, Expert Systems with Applications, vol.55, pp.520-531, 2016.

D. I. Ignatov, S. O. Kuznetsov, and J. Poelmans, Concept-based biclustering for internet advertisement, Data Mining Workshops (ICDMW), pp.123-130, 2012.

D. I. Ignatov, J. Poelmans, and V. Zaharchuk, Recommender system based on algorithm of bicluster analysis RecBi, 2012.

D. I. Ignatov and B. W. Watson, Towards a unified taxonomy of biclustering methods, 2017.

Y. A. Ivanenkov, N. P. Savchuk, S. Ekins, and K. V. Balakin, Computational mapping tools for drug discovery, Drug Discovery Today, vol.14, pp.767-775, 2009.

G. H. John, R. Kohavi, and K. Pfleger, Irrelevant features and the subset selection problem, Machine Learning Proceedings, pp.121-129, 1994.

S. C. Johnson, Hierarchical clustering schemes, Psychometrika, vol.32, issue.3, pp.241-254, 1967.

G. Karypis and V. Kumar, A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM Journal on Scientific Computing, vol.20, issue.1, pp.359-392, 1998.

M. Kaytoue, Z. Assaghir, A. Napoli, and S. O. Kuznetsov, Embedding tolerance relations in formal concept analysis: an application in information fusion, Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp.1689-1692.
URL : https://hal.archives-ouvertes.fr/inria-00600205

M. Kaytoue, S. O. Kuznetsov, and A. Napoli, Biclustering numerical data in formal concept analysis, International Conference on Formal Concept Analysis, pp.135-150, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00600203

M. Kaytoue, S. O. Kuznetsov, A. Napoli, and S. Duplessis, Mining gene expression data with pattern structures in formal concept analysis, Information Sciences, vol.181, issue.10, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00541100

S. Korkmaz, G. Zararsiz, and D. Goksuluk, Drug/nondrug classification using support vector machines with various feature selection strategies, Computer Methods and Programs in Biomedicine, vol.117, issue.2, pp.51-60, 2014.

T. Kuflik, Z. Boger, and M. Zancanaro, Analysis and prediction of museum visitors' behavioral pattern types, Ubiquitous Display Environments, pp.161-176, 2012.

S. O. Kuznetsov, A fast algorithm for computing all intersections of objects from an arbitrary semilattice, pp.17-20, 1993.

S. O. Kuznetsov and D. I. Ignatov, Concept stability for constructing taxonomies of web-site users, 2009.

S. O. Kuznetsov and S. A. Obiedkov, Comparing performance of algorithms for generating concept lattices, Journal of Experimental & Theoretical Artificial Intelligence, vol.14, issue.2-3, pp.189-216, 2002.

C. Laclau and M. Nadif, Hard and fuzzy diagonal co-clustering for document-term partitioning, Neurocomputing, vol.193, pp.133-147, 2016.

J. Lanir, T. Kuflik, E. Dim, A. J. Wecker, and O. Stock, The influence of a location-aware mobile guide on museum visitors' behavior, Interacting with Computers, vol.25, issue.6, pp.443-460, 2013.

S. Lee, D. Son, W. Yu, and T. Park, Gene-gene interaction analysis for the accelerated failure time model using a unified model-based multifactor dimensionality reduction method, Genomics & Informatics, vol.14, issue.4, p.166, 2016.

F. Li and Y. Yang, Using recursive classification to discover predictive features, Proceedings of the 2005 ACM Symposium on Applied Computing, pp.1054-1058, 2005.

L. Liu, L. Chen, Y. H. Zhang, L. Wei, S. Cheng et al., Analysis and prediction of drug-drug interaction by minimum redundancy maximum relevance and incremental feature selection, Journal of Biomolecular Structure and Dynamics, vol.35, issue.2, pp.312-329, 2017.

Y. Liu, A comparative study on feature selection methods for drug discovery, Journal of Chemical Information and Computer Sciences, vol.44, issue.5, pp.1823-1828, 2004.

S. C. Madeira and A. L. Oliveira, Biclustering algorithms for biological data analysis: a survey, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), vol.1, issue.1, p.45, 2004.

D. Van-der-merwe, S. Obiedkov, and D. Kourie, AddIntent: A new incremental algorithm for constructing concept lattices, International Conference on Formal Concept Analysis, pp.372-385, 2004.

V. A. Padilha and R. J. Campello, A systematic comparative evaluation of biclustering techniques, BMC Bioinformatics, vol.18, issue.1, p.55, 2017.

F. Petitjean, L. Allison, and G. I. Webb, A statistically efficient and scalable method for log-linear analysis of high-dimensional data, 2014 IEEE International Conference on Data Mining, pp.480-489, 2014.

F. Petitjean and G. I. Webb, Scaling log-linear analysis to datasets with thousands of variables, Proceedings of the 2015 SIAM International Conference on Data Mining, pp.469-477, 2015.

F. Petitjean, G. I. Webb, and A. E. Nicholson, Scaling log-linear analysis to high-dimensional data, 2013 IEEE International Conference on Data Mining, pp.597-606, 2013.

G. Pio, M. Ceci, D. D'elia, C. Loglisci, and D. Malerba, A novel biclustering algorithm for the discovery of meaningful biological correlations between microRNAs and their target genes, BMC Bioinformatics, vol.14, issue.7, p.8, 2013.

G. Pio, M. Ceci, C. Loglisci, D. D'elia, and D. Malerba, Hierarchical and overlapping co-clustering of mRNA:miRNA interactions, ECAI, pp.654-659, Citeseer, 2012.

G. Pio, M. Ceci, D. Malerba, and D. D'elia, ComiRNet: a web-based system for the analysis of miRNA-gene regulatory networks, BMC Bioinformatics, vol.16, issue.9, p.7, 2015.

B. Pontes, R. Giráldez, and J. S. Aguilar-ruiz, Biclustering on expression data: A review, Journal of Biomedical Informatics, vol.57, pp.163-180, 2015.

A. M. Prescott and S. M. Abel, Combining in silico evolution and nonlinear dimensionality reduction to redesign responses of signaling networks, Physical Biology, vol.13, issue.6, p.66015, 2017.

R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, 2014.

M. Reutlinger and G. Schneider, Nonlinear dimensionality reduction and mapping of compound libraries for drug discovery, Journal of Molecular Graphics and Modelling, vol.34, p.117, 2012.

R. Rocci and M. Vichi, Two-mode multi-partitioning, Computational Statistics & Data Analysis, vol.52, issue.4, pp.1984-2003, 2008.

J. Sadowski, J. Gasteiger, and G. Klebe, Comparison of automatic three-dimensional model builders using 639 x-ray structures, Journal of Chemical Information and Computer Sciences, vol.34, issue.4, pp.1000-1008, 1994.

A. Salah, M. Ailem, and M. Nadif, Word co-occurrence regularized non-negative matrix tri-factorization for text data co-clustering, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

B. G. Tabachnick, L. S. Fidell, and J. B. Ullman, Using Multivariate Statistics, vol.5, 2007.

M. A. Tahir, A. Bouridane, and F. Kurugollu, Simultaneous feature selection and feature weighting using hybrid tabu search/k-nearest neighbor classifier, Pattern Recognition Letters, vol.28, issue.4, pp.438-446, 2007.

A. Tanay, R. Sharan, and R. Shamir, Discovering statistically significant biclusters in gene expression data, Bioinformatics, vol.18, issue.suppl_1, pp.136-144, 2002.

H. Tang, Z. D. Su, H. H. Wei, W. Chen, and H. Lin, Prediction of cell-penetrating peptides with feature selection techniques, Biochemical and Biophysical Research Communications, vol.477, issue.1, pp.150-154, 2016.

J. Tang, S. Alelyani, and H. Liu, Feature selection for classification: A review, Data Classification: Algorithms and Applications, p.37, 2014.

R. Todeschini and V. Consonni, Molecular Descriptors for Chemoinformatics, vol.41, 2009.

R. Veroneze, A. Banerjee, and F. J. Von-zuben, Enumerating all maximal biclusters in numerical datasets, Information Sciences, vol.379, pp.288-309, 2017.

M. Vichi, Double k-means clustering for simultaneous classification of objects and variables, Advances in Classification and Data Analysis, pp.43-52, 2001.

E. Véron and M. Levasseur, Ethnographie de l'exposition. Bibliothèque Publique d'Information, Centre Georges Pompidou, 1983.

G. I. Webb, Layered critical values: a powerful direct-adjustment approach to discovering significant patterns, Machine Learning, vol.71, issue.2-3, pp.307-323, 2008.

X. Xu, A. Li, and M. Wang, Prediction of human disease-associated phosphorylation sites with combined feature selection approach and support vector machine, IET Systems Biology, vol.9, issue.4, pp.155-163, 2015.

Y. Xue, Z. R. Li, C. W. Yap, L. Z. Sun, X. Chen et al., Effect of molecular descriptor feature selection in support vector machine classification of pharmacokinetic and toxicological properties of chemical agents, Journal of Chemical Information and Computer Sciences, vol.44, issue.5, pp.1630-1638, 2004.

Y. Xue, M. Li, Z. Liao, J. Luo, T. Li et al., A biclustering algorithm with coherent evolution on the contiguous columns facing time-series gene data, 2014 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pp.328-333, 2014.

Y. Xue, T. Li, Z. Liu, C. Pang, M. Li et al., A new approach for the deep order preserving submatrix problem based on sequential pattern mining, International Journal of Machine Learning and Cybernetics, vol.9, issue.2, pp.263-279, 2018.

J. Yang, H. Wang, H. Ding, N. An, and G. Alterovitz, Nonlinear dimensionality reduction methods for synthetic biology biobricks' visualization, BMC Bioinformatics, vol.18, issue.1, p.47, 2017.

M. Yang, J. Chen, X. Shi, L. Xu, Z. Xi et al., Development of in silico models for predicting P-glycoprotein inhibitors based on a two-step approach for feature selection and its application to Chinese herbal medicine screening, Molecular Pharmaceutics, vol.12, issue.10, pp.3691-3713, 2015.

M. J. Zaki and C. J. Hsiao, Efficient algorithms for mining closed itemsets and their lattice structure, IEEE Transactions on Knowledge and Data Engineering, vol.17, issue.4, pp.462-478, 2005.

M. Zancanaro, T. Kuflik, Z. Boger, D. Goren-bar, and D. Goldwasser, Analyzing museum visitors' behavior patterns, International Conference on User Modeling, pp.238-246, Springer, 2007.

H. Zhang and G. Sun, Feature selection using tabu search method, Pattern Recognition, vol.35, issue.3, pp.701-711, 2002.

X. Zhao, F. Nie, S. Wang, J. Guo, P. Xu et al., Unsupervised 2D dimensionality reduction with adaptive structure learning, Neural Computation, vol.29, issue.5, pp.1352-1374, 2017.