by Prof. Michel Verleysen
Professor - Honorary Research Director FNRS
Université catholique de Louvain
Machine Learning Group
ICTEAM Institute
Louvain School of Engineering
3 place du Levant, B-1348 Louvain-la-Neuve, Belgium
Tel: +32 10 47 25 51 - Fax: +32 10 47 25 98
E-mail: michel.verleysen@uclouvain.be
Homepage: http://www.dice.ucl.ac.be/~verleyse
Abstract
Machine learning methods are used to build models for classification and regression tasks, among others.
Models are built on the basis of the information contained in a set of samples, with little or no information
about the underlying process.
The more information there is in the set of samples, the better the model should
be. However, this natural assumption does not always hold, since most machine learning paradigms suffer
from the “curse of dimensionality”. The curse of dimensionality means that strange phenomena appear when
data are represented in a high-dimensional space. These phenomena are most often counter-intuitive: the
conventional geometrical interpretation of data analysis in 2- or 3-dimensional spaces cannot be extended to
much higher dimensions.
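One classic illustration of this counter-intuition: the ball inscribed in a hypercube occupies a rapidly vanishing fraction of the cube's volume as the dimension grows, so "most" of a high-dimensional cube lies in its corners. A minimal Monte-Carlo sketch in Python (the sample size and dimensions are arbitrary choices for illustration):

import numpy as np

rng = np.random.default_rng(0)

# Monte-Carlo estimate of the fraction of the hypercube [-1, 1]^dim
# occupied by the inscribed unit ball: it vanishes quickly with dim.
for dim in (2, 3, 5, 10, 20):
    samples = rng.uniform(-1.0, 1.0, size=(100_000, dim))
    inside = (np.linalg.norm(samples, axis=1) <= 1.0).mean()
    print(f"dim={dim:2d}  estimated fraction inside ball: {inside:.5f}")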
Among the problems related to the curse of dimensionality, feature redundancy and the concentration of the
norm are probably those with the largest impact on data analysis tools. Feature redundancy means that
models lose the identifiability property (for example, they oscillate between equivalent solutions), become
difficult to interpret, etc.; although redundancy is an advantage from the point of view of the information
content of the data, it makes learning the model more difficult. The concentration of the norm is an
unfortunate property specific to high-dimensional vectors: as the dimension of the space increases, norms and
distances concentrate, making the discrimination between data points more difficult. Most data analysis tools
are not robust to these phenomena: their performance collapses when the dimension of the data space increases,
in particular when the number of data available for learning is limited.
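The concentration effect is easy to observe numerically. In the following sketch (using NumPy; the 1000-point sample and the chosen dimensions are arbitrary illustrations), the relative contrast between the nearest and farthest neighbour of a query point shrinks as the dimension grows:

import numpy as np

rng = np.random.default_rng(0)

# As the dimension grows, the relative contrast between the nearest and
# farthest point, (d_max - d_min) / d_min, shrinks towards zero:
# all distances become nearly indistinguishable.
for dim in (2, 10, 100, 1000):
    points = rng.random((1000, dim))   # 1000 uniform points in [0, 1]^dim
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"dim={dim:5d}  relative contrast={contrast:.3f}")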
This tutorial will start with a presentation of the phenomena related to the curse of dimensionality. Feature
selection and nonlinear dimensionality reduction will then be discussed as possible remedies to this curse.
Feature selection consists of selecting some of the variables/features among those available in the dataset,
according to a relevance criterion. The goal is twofold: to avoid redundancy between features, and to discard
irrelevant ones. State-of-the-art feature selection methods based on information-theoretic criteria will be
presented, together with the respective advantages of filter, wrapper and embedded methods.
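As a concrete illustration of the filter approach, one can rank features by their estimated mutual information with the class label. A minimal sketch using scikit-learn on a synthetic dataset (the specific criteria covered in the tutorial may differ):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Toy dataset: 20 features, of which only 5 are informative
# and 5 more are redundant combinations of informative ones.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=5, random_state=0)

# Filter method: estimate the mutual information between each feature
# and the class label, then keep the k highest-ranked features.
mi = mutual_info_classif(X, y, random_state=0)
k = 5
selected = np.argsort(mi)[::-1][:k]
print("selected features:", selected)
print("MI estimates:    ", np.round(mi[selected], 3))

Note that such a univariate ranking ignores redundancy among the selected features themselves; multivariate criteria (e.g., greedy forward selection on the joint mutual information) address precisely the redundancy issue mentioned above.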
Nonlinear dimensionality reduction, or manifold learning, consists of mapping the high-dimensional data to a
lower-dimensional representation, while preserving some topology, distance or information criterion.
Such nonlinear projection methods may be used both for dimensionality reduction (thereby fighting the
curse of dimensionality), and for the visualization of data when the manifold dimension is restricted to 2 or 3.
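As a minimal sketch of one such method, the following applies Isomap (a geodesic-distance-preserving manifold learner available in scikit-learn) to the standard Swiss-roll toy dataset; the parameter values are illustrative only:

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# The Swiss roll: a 2-D manifold rolled up nonlinearly in 3-D space.
X, color = make_swiss_roll(n_samples=1000, random_state=0)

# Isomap approximates geodesic (along-the-manifold) distances and
# preserves them in a 2-D embedding, effectively "unrolling" the data.
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(X.shape, "->", embedding.shape)   # (1000, 3) -> (1000, 2)
# 'color' (the position along the roll) can be plotted against the
# embedding to check visually that the manifold structure is preserved.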
The tutorial will conclude with open challenges and questions in the field of feature selection and
dimensionality reduction.
Speaker
Michel Verleysen received the M.S. and Ph.D. degrees in electrical engineering from the Université
catholique de Louvain (Belgium) in 1987 and 1992, respectively. He was an invited professor at the
Swiss E.P.F.L. (Ecole Polytechnique Fédérale de Lausanne, Switzerland) in 1992, at the
Université d'Evry Val d'Essonne (France) in 2001, and at the Université Paris I Panthéon-Sorbonne
from 2002 to 2009. He is now a Full Professor at the Université catholique de Louvain,
and Honorary Research Director of the Belgian F.N.R.S. (National Fund for Scientific Research).
He is editor-in-chief of the Neural Processing Letters journal, chairman of the annual
ESANN conference (European Symposium on Artificial Neural Networks, Computational Intelligence
and Machine Learning), a past associate editor of the IEEE Transactions on Neural Networks journal, and
a member of the editorial boards and program committees of several journals and conferences on
neural networks and learning. He is the author or co-author of more than 200 scientific papers in
international journals and books, and in peer-reviewed conference proceedings.
He is the co-author of a popular-science book on artificial neural networks in the French series “Que Sais-Je?”,
and of the book “Nonlinear Dimensionality Reduction” published by Springer in 2007.
His research interests include machine learning, artificial neural networks, self-organization,
time-series forecasting, nonlinear statistics, adaptive signal processing, and high-dimensional
data analysis.