Optimization and Data Mining

Research focus

The peer review has evaluated this group as Average.


The MOLD research group (mathematical modeling, optimization, learning from data) develops and applies mathematical models for optimization, data mining and knowledge discovery. The group is coordinated by Carlo Vercellis and its website is at www.mold.polimi.it. The research activities of the MOLD group mainly focus on the following themes:
- optimization models;
- mathematical models for inductive learning;
- data mining and business intelligence;
- marketing models;
- logistics and production systems optimization;
- performance analysis and benchmarking;
- models for biolife sciences.

Optimization models and methods

Many decision-making processes arising within companies and the public administration can be cast in the form of optimization models: the decision maker identifies a number of feasible courses of action and defines a criterion for comparing the alternative decisions, such as the total cost or the total gain. Optimization methods then determine the best choice among the alternative decisions, namely the one that minimizes the cost or maximizes the gain. Optimization models have been applied in a large number of situations in which a set of scarce resources has to be allocated among different activities in the most effective way. Resources can represent people, production processes, raw materials, components or money. The MOLD research group is mainly active in developing and analyzing approximate algorithms for hard problems in mixed integer programming, combinatorial optimization, linear and convex optimization, and stochastic optimization. A forthcoming book will collect research achievements in this area: C. Vercellis. Optimization. McGraw-Hill, 2007.

Mathematical models for inductive learning

Inductive learning aims at understanding the mechanisms behind intelligence, interpreted as the capability of extracting knowledge from past experience and applying it to predict future outcomes or events.
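As a minimal illustration of extracting a rule from past examples and applying it to new cases, the Python sketch below learns a decision stump, i.e. a classification tree with a single split, by choosing the feature and threshold that misclassify the fewest training examples. The function names `learn_stump` and `predict` and the toy data are hypothetical and chosen only for illustration; this is not one of the MOLD group's methods.

```python
from typing import List, Tuple

# Hypothetical rule format: (feature index, threshold, label predicted above the threshold)
Rule = Tuple[int, float, int]


def learn_stump(examples: List[Tuple[List[float], int]]) -> Rule:
    """Learn the single-split rule with the fewest training errors (labels are +1/-1)."""
    n_features = len(examples[0][0])
    best = None  # (errors, feature, threshold, sign)
    for j in range(n_features):
        values = sorted({x[j] for x, _ in examples})
        # Candidate thresholds: one below every value, then midpoints between neighbours.
        candidates = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in candidates:
            for sign in (1, -1):
                errors = sum(1 for x, y in examples if (sign if x[j] > t else -sign) != y)
                # Keep the first rule found with the fewest errors.
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    _, j, t, sign = best
    return j, t, sign


def predict(rule: Rule, x: List[float]) -> int:
    """Apply a learned rule to a new example."""
    j, t, sign = rule
    return sign if x[j] > t else -sign


# Hypothetical past examples: (feature vector, observed outcome).
past = [([1.0, 5.0], -1), ([2.0, 3.0], -1), ([3.0, 4.0], 1), ([4.0, 1.0], 1)]
rule = learn_stump(past)
print(rule)                       # (0, 2.5, 1)
print(predict(rule, [3.2, 0.0]))  # 1
```

Deeper classification trees can be grown by applying the same search recursively to the examples falling on each side of the split.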
Mathematical models for learning are based on algorithms capable of learning from the available examples in order to extract a set of rules, reproducing the sophisticated ability to learn that the human mind has acquired through evolution. Besides their intrinsic theoretical interest, mathematical models for learning have many applications in different fields, including image, sound and text recognition; biolife sciences, molecular genetics and medical diagnosis; relational marketing; production systems; and fraud and anomaly detection. The MOLD research group mainly focuses on methods for supervised learning, based on continuous and discrete support vector machines (SVM), kernels, classification trees and neural networks, and on unsupervised learning models for clustering and the identification of association rules. Apart from previous publications, further recent achievements in this area are appearing as:
- C. Vercellis. Business intelligence. Data mining and optimization. Wiley, 2007.
- G. Felici, C. Vercellis. Mathematical methods for knowledge discovery and data mining. Idea Group, 2007.
- C. Orsenigo, C. Vercellis. Multicategory classification via discrete support vector machines. Computational Management Science, 2007.
- C. Orsenigo, C. Vercellis. Accurately learning from few examples with a polyhedral classifier. Computational Optimization and Applications, 2007.

Data mining and optimization models for biolife and healthcare problems

At present, different fields of the biolife sciences (genomics and proteomics, medicine and health care) are increasingly characterized by the availability of very large warehouses of experimental data. Hence, these fields are becoming progressively more dependent on mathematical methods aimed at learning rules and deriving accurate predictions from such data, with great opportunities to achieve relevant improvements in human performance and health.
On the methodological side, the research group will focus on developing new models and algorithms in two main areas, bringing together the expertise of its different research units. On one side, the group addresses mathematical predictive methods for supervised and unsupervised learning from data, among which: hierarchical and ensemble classifiers based on discrete variants of support vector machines, which have been shown to achieve significant improvements in accuracy in several application domains; predictive kernel models for the treatment of heterogeneous datasets; and classification methods for labeled time series. On the applied side, the research involves the analysis of publicly available datasets, which are used as benchmarks for validating the proposed methods on several relevant biolife tasks. In particular, it includes the study of mathematical models of the granule cell, framed in a realistic network of synapses of the cerebellum. Recent advances in this area are appearing as:
- C. Orsenigo, C. Vercellis. Protein folding classification through multicategory discrete SVM. In: G. Felici, C. Vercellis (eds.), Mathematical methods for knowledge discovery and data mining. Idea Group, 2007.
- C. Orsenigo, C. Vercellis. Evaluating membership functions for fuzzy discrete SVM. Lecture Notes in Computer Science. Springer, 2007.
- C. Orsenigo, C. Vercellis. Predicting HIV protease-cleavable peptides by discrete support vector machines. Lecture Notes in Computer Science. Springer, 2007. Best paper at the international conference EvoBio 2007, Valencia.
- C. Orsenigo, C. Vercellis. Softening the margin in discrete SVM. Lecture Notes in Computer Science. Springer, 2007.
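As we understand them, the discrete SVM variants mentioned above differ from continuous SVMs mainly in how training errors are penalized: a count of misclassified examples takes the place of the sum of slack variables. The Python sketch below is a simplified illustration under that assumption, not the formulation used in the cited papers. For one fixed separating hyperplane it evaluates the standard soft-margin objective, where each margin violation costs its hinge slack, and a counting variant, where each violation costs a fixed amount. The function names and the data are hypothetical.

```python
from typing import List, Sequence


def margins(w: Sequence[float], b: float, X: List[Sequence[float]], y: List[int]) -> List[float]:
    """Functional margin y_i * (w . x_i + b) of each example."""
    return [yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi, yi in zip(X, y)]


def hinge_objective(w, b, X, y, C: float) -> float:
    """Continuous soft-margin SVM: 0.5*||w||^2 + C * sum of hinge slacks."""
    reg = 0.5 * sum(wj * wj for wj in w)
    return reg + C * sum(max(0.0, 1.0 - m) for m in margins(w, b, X, y))


def counting_objective(w, b, X, y, C: float) -> float:
    """Counting variant (an assumed simplification of a discrete SVM):
    0.5*||w||^2 + C * number of examples violating y_i * (w . x_i + b) >= 1."""
    reg = 0.5 * sum(wj * wj for wj in w)
    return reg + C * sum(1 for m in margins(w, b, X, y) if m < 1.0)


# Hypothetical data and the fixed hyperplane x_1 = 0; the last point
# is labeled +1 but lies on the negative side, at two different distances.
w, b, C = [1.0, 0.0], 0.0, 1.0
y = [1, -1, 1, 1]
near = [[2.0, 0.0], [-2.0, 0.0], [3.0, 0.0], [-1.0, 0.0]]
far = [[2.0, 0.0], [-2.0, 0.0], [3.0, 0.0], [-10.0, 0.0]]

for name, X in (("near", near), ("far", far)):
    print(name, hinge_objective(w, b, X, y, C), counting_objective(w, b, X, y, C))
# near 2.5 1.5
# far 11.5 1.5
```

Moving the misclassified point further from the hyperplane raises the hinge objective but leaves the counting objective unchanged. Because the count is not continuous in w and b, minimizing it over the hyperplane leads to a mixed-integer program rather than a convex quadratic one; that solution step is not shown here.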

Affiliated department

Dipartimento di Ingegneria Gestionale (DIG)

Affiliated faculty

Full Professors
Carlo Vercellis
Assistant Professors
Francesca Fumero