2013. Optimal detection of sparse principal components in high dimension
With Quentin Berthet
Ann. Statist. (to appear) arXiv:1202.5070
We perform a finite sample analysis of the detection levels for sparse principal components of a high-dimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NP-complete in general and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near optimal detection levels and performs very well on simulated datasets.
@misc {BerRig12,
AUTHOR = {Berthet, Q. and Rigollet, P.},
TITLE = {Optimal detection of sparse principal components in high dimension},
YEAR = {2013},
MONTH = {February},
NOTE = {Ann. Statist. (to appear). arXiv:1202.5070},
}
2013. The multi-armed bandit problem with covariates
With Vianney Perchet
Ann. Statist., 41(2), 693-721
We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewards are smooth functions of the covariate and where the hardness of the problem is captured by a margin parameter. To maximize the expected cumulative reward, we introduce a policy called Adaptively Binned Successive Elimination (ABSE) that adaptively decomposes the global problem into suitably localized static bandit problems. This policy constructs an adaptive partition using a variant of the Successive Elimination (SE) policy. Our results include sharper regret bounds for the SE policy in a static bandit problem and minimax optimal regret bounds for the ABSE policy in the dynamic problem.
@article {PerRig13,
AUTHOR = {Perchet, V. and Rigollet, P.},
TITLE = {The multi-armed bandit problem with covariates},
YEAR = {2013},
VOLUME = {43},
NUMBER = {2},
Pages = {693-721},
YEAR = {2013},
JOURNAL = {Ann. Statist.},
FJOURNAL = {The Annals of Statistics},
NOTE = {arXiv:1110.6084},
}
2012. Sparse estimation by exponential weighting
With Alexandre Tsybakov
Statist. Sci., 27(4), 558-575
Consider a regression model with fixed design and Gaussian noise where the regression function can potentially be well approximated by a function that admits a sparse representation in a given dictionary. This paper resorts to exponential weights to exploit this underlying sparsity by implementing the principle of sparsity pattern aggregation. This model selection take on sparse estimation allows us to derive sparsity oracle inequalities in several popular frameworks including ordinary sparsity, fused sparsity and group sparsity. One striking aspect of these theoretical results is that they hold under no condition on the dictionary. Moreover, we describe an efficient implementation of the sparsity pattern aggregation principle that compares favorably to state-of-the-art procedures on some basic numerical examples.
@article{RigTsy12,
Author = {Rigollet, P. and Tsybakov, A.},
Journal = {Statistical Science},
Number = {4},
Pages = {558-575},
Title = {Sparse estimation by exponential weighting},
Volume = {27},
Year = {2012},
}
2012. Estimation of Covariance Matrices under Sparsity Constraints
With Alexandre Tsybakov
Statist. Sinica, 22(4), 1319-1378.
Discussion of ``Minimax Estimation of Large Covariance Matrices under L1-Norm" by Tony Cai and Harrison Zhou.
@article {RigTsy12a,
AUTHOR = {Rigollet, P. and Tsybakov, A.},
TITLE = {Estimation of Covariance Matrices under Sparsity Constraints},
YEAR = {2012},
VOLUME = {22},
NUMBER = {4},
JOURNAL = {Statist. Sinica},
FJOURNAL = {Statistica Sinica},
NOTE = {Discussion of ``Minimax Estimation of Large Covariance Matrices under L1-Norm" by T. Tony Cai and Harrison H. Zhou},
PAGES = {1319-1378}
}
2012. Deviation Optimal Learning using Greedy Q-aggregation
With Dong Dai and Tong Zhang
Ann. Statist., 40(3), 1878-1905
Given a finite family of functions, the goal of model selection is to construct a procedure that mimics the function from this family that is the closest to an unknown regression function. More precisely, we consider a general regression model with fixed design and measure the distance between functions by the mean squared error at the design points. While procedures based on exponential weights are known to solve the problem of model selection in expectation, they are, surprisingly, sub-optimal in deviation. We propose a new formulation called Q-aggregation that addresses this limitation; namely, its solution leads to sharp oracle inequalities that are optimal in a minimax sense. Moreover, based on the new formulation, we design greedy Q-aggregation procedures that produce sparse aggregation models achieving the optimal rate. The convergence and performance of these greedy procedures are illustrated and compared with other standard methods on simulated examples.
@article {DaiRigZha12,
AUTHOR = {Dai, D. and Rigollet, P. and Zhang, T.},
TITLE = {Deviation Optimal Learning using Greedy Q-aggregation},
YEAR = {2012},
MONTH = {March},
JOURNAL = {Ann. Statist.},
FJOURNAL = {The Annals of Statistics},
VOLUME = {40},
NUMBER = {3},
PAGES = {1878-1905},
NOTE = {arXiv:1203.2507},
}
2012. Kullback-Leibler aggregation and misspecified generalized linear models
Ann. Statist., 40(2), 639-665.
In a regression setup with deterministic design, we study the pure aggregation problem and introduce a natural extension from the Gaussian distribution to distributions in the exponential family. While this extension bears strong connections with generalized linear models, it does not require identifiability of the parameter or even that the model on the systematic component is true. It is shown that this problem can be solved by constrained and/or penalized likelihood maximization and we derive sharp oracle inequalities that hold both in expectation and with high probability. Finally all the bounds are proved to be optimal in a minimax sense.
@article{Rig12,
Author = {Philippe Rigollet},
Doi = {10.1214/11-AOS961},
Fjournal = {Annals of Statistics},
Issn = {0090-5364},
Journal = {Ann. Statist.},
Number = {2},
Pages = {639-665},
Title = {Kullback--Leibler aggregation and misspecified generalized linear models},
Volume = {40},
Year = {2012},
}
2011. Neyman-Pearson classification, convexity and stochastic constraints
With Xin Tong
J. Mach. Learn. Res., 12(Oct):2831-2855
Motivated by problems of anomaly detection, this paper implements the Neyman-Pearson paradigm to deal with asymmetric errors in binary classification with a convex loss. Given a finite collection of classifiers, we combine them and obtain a new classifier that satisfies simultaneously the two following properties with high probability: (i) its probability of type I error is below a pre-specified level and (ii), it has probability of type II error close to the minimum possible. The proposed classifier is obtained by solving an optimization problem with an empirical objective and an empirical constraint. New techniques to handle such problems are developed and have consequences on chance constrained programming.
@article {RigTon11,
AUTHOR = {Rigollet, P. and Tong, X.},
TITLE = {Neyman-Pearson classification, convexity and stochastic constraints},
JOURNAL = {J. Mach. Learn. Res.},
FJOURNAL = {Journal of Machine Learning Research (JMLR)},
VOLUME = {12},
YEAR = {2011},
PAGES = {2831-2855 (electronic)},
}
2011. Exponential Screening and optimal rates of sparse
estimation
With Alexandre Tsybakov
Ann. Statist., 39(2), 731-771.
In high-dimensional linear regression, the goal pursued here is to estimate an
unknown regression function using linear combinations of a suitable
set of covariates. One of the key assumptions for the success of any
statistical procedure in this setup is to assume that the linear
combination is sparse in some sense, for example, that it involves
only few covariates. We consider a general, non necessarily linear,
regression with Gaussian noise and study a related question that is
to find a linear combination of approximating functions, which is at
the same time sparse and has small mean squared error (MSE). We
introduce a new estimation procedure, called \emph{Exponential
Screening} that shows remarkable adaptation properties. It adapts to
the linear combination that optimally balances MSE and sparsity,
whether the latter is measured in terms of the number of non-zero
entries in the combination ($\ell_0$ norm) or in terms of the global
weight of the combination ($\ell_1$ norm). The power of this
adaptation result is illustrated by showing that Exponential
Screening solves optimally and simultaneously all the problems of
aggregation in Gaussian regression that have been discussed in the
literature. Moreover, we show that the performance of the
Exponential Screening estimator cannot be improved in a minimax
sense, even if the optimal sparsity is known in advance. The
theoretical and numerical superiority of Exponential Screening
compared to state-of-the-art sparse procedures is also discussed.
@article {RigTsy11,
AUTHOR = {Rigollet, P. and Tsybakov, A.},
TITLE = {{E}xponential {S}creening and optimal rates of sparse
estimation},
JOURNAL = {Ann. Statist.},
FJOURNAL = {The Annals of Statistics},
VOLUME = {39},
YEAR = {2011},
NUMBER = {2},
PAGES = {731--771},
}
2011. Neyman-Pearson classification under a strict constraint
With Xin Tong
Proceedings of the 24th Annual Conference on Learning Theory
June 9-11, 2011, Budapest, Hungary. J. Mach. Learn. Res., W&CP, 19:595-614.
Motivated by problems of anomaly detection, this paper implements the Neyman-Pearson paradigm to deal with asymmetric errors in binary classification with a convex loss. Given a finite collection of classifiers, we combine them and obtain a new classifier that satisfies simultaneously the two following properties with high probability: (i), its probability of type~I error is below a pre-specified level and (ii), it has probability of type ~II error close to the minimum possible. The proposed classifier is obtained by minimizing an empirical objective subject to an empirical constraint. The novelty of the method is that the classifier output by this problem is shown to satisfy the original constraint on type~I error. This strict enforcement of the constraint has interesting consequences on the control of the type~II error and we develop new techniques to handle this situation. Finally, connections with chance constrained optimization are evident and are investigated.
@article {RigTon11colt,
AUTHOR = {Rigollet, P. and Tong, X.},
TITLE = {Neyman-Pearson classification under a strict constraint},
JOURNAL = {J. Mach. Learn. Res.- W&CP},
FJOURNAL = {Journal of Machine Learning Research (JMLR) - Workshops and Conference Proceedings},
VOLUME = {19},
YEAR = {2011},
PAGES = {595-614 (electronic)},
EE = {http://jmlr.csail.mit.edu/proceedings/papers/v19/rigollet11a.html},
}
2010. Optimal rates of sparse estimation and universal aggregation
With Alexandre Tsybakov
Oberwolfach reports, 7(1), 924-927
In: Modern Nonparametric Statistics: Going Beyond Asymptotic Minimax, Mar.-Apr. 2010
@inproceedings{RigTsy10,
author = {Rigollet, P. and Tsybakov, A.},
title = { Optimal rates of sparse estimation and universal aggregation},
BOOKTITLE = {Modern Nonparametric Statistics: Going Beyond Asymptotic Minimax},
EDITOR = {Birg{\'e}, Lucien and Johanstone, Iain and Spokoiny, Vladimir},
PUBLISHER = {Mathematisches Forschungsinstitut Oberwolfach},
PAGES = {924--927},
SERIES = {Oberwolfach Reports},
VOLUME = {7},
NUMBER = 14},
YEAR = {2010},
}
2010. Nonparametric Bandits with Covariates
With Assaf Zeevi
In COLT (A. T. Kalai and M. Mohri, eds.). Omnipress, 54-66.
We consider a bandit problem which involves sequential sampling
from two populations (arms). Each arm produces a noisy reward
realization which depends on an observable random covariate.
The goal is to maximize cumulative expected reward. We derive
general lower bounds on the performance of any admissible policy,
and develop an algorithm whose performance achieves the order of
said lower bound up to logarithmic terms. This is done by decomposing
the global problem into suitably ``localized'' bandit
problems. Proofs blend ideas from nonparametric statistics and traditional
methods used in the bandit literature.
@inproceedings{RigZee10,
author = {Rigollet, P. and Zeevi, A.},
title = {Nonparametric Bandits with Covariates},
booktitle = {COLT},
year = {2010},
pages = {54-66},
crossref = {colt10},
}
@proceedings{colt10,
editor = {Adam Tauman Kalai and Mehryar Mohri},
title = {23rd Annual Conference on Learning Theory - COLT 2010, Haifa,
Israel, June 27-29, 2010},
booktitle = {COLT},
publisher = {Omnipress},
year = {2010},
}
2009. Fast rates for plug-in estimators of density level sets
With Régis Vert
Bernoulli, 15(4), 1154-1178.
In the context of density level set estimation, we study the convergence of general plug-in
methods under two main assumptions on the density for a given level $\lambda$. More precisely, it
is assumed that the density (i) is smooth in a neighborhood of $\lambda$ and (ii) has $\gamma$-exponent at
level $\lambda$. Condition (i) ensures that the density can be estimated at a standard nonparametric
rate and condition (ii) is similar to Tsybakov’s margin assumption which is stated for the
classification framework. Under these assumptions, we derive optimal rates of convergence for
plug-in estimators. Explicit convergence rates are given for plug-in estimators based on kernel
density estimators when the underlying measure is the Lebesgue measure. Lower bounds proving
optimality of the rates in a minimax sense when the density is Holder smooth are also provided.
@article {RigVer09,
AUTHOR = {Rigollet, P. and Vert, R.},
TITLE = {Fast rates for plug-in estimators of density level sets},
JOURNAL = {Bernoulli},
FJOURNAL = {Bernoulli. Official Journal of the Bernoulli Society for Mathematical Statistics and Probability},
VOLUME = {15},
YEAR = {2009},
NUMBER = {4},
PAGES = {1154--1178},
}
2009. Learning by mirror averaging
With Anatoli Juditsky and Alexandre Tsybakov
Ann. Statist., 36(5), 2183-2206.
Given a finite collection of estimators or classifiers, we study the problem
of model selection type aggregation, that is, we construct a new estimator or
classifier, called aggregate, which is nearly as good as the best among them
with respect to a given risk criterion. We define our aggregate by a simple
recursive procedure which solves an auxiliary stochastic linear programming
problem related to the original nonlinear one and constitutes a special case
of the mirror averaging algorithm. We show that the aggregate satisfies sharp
oracle inequalities under some general assumptions. The results are applied to
several problems including regression, classification and density estimation.
@article {jrt09,
AUTHOR = {Juditsky, A. and Rigollet, P. and Tsybakov, A. B.},
TITLE = {Learning by mirror averaging},
JOURNAL = {Ann. Statist.},
FJOURNAL = {The Annals of Statistics},
VOLUME = {36},
YEAR = {2008},
NUMBER = {5},
PAGES = {2183--2206},
}
2007. Generalization error bounds in semi-supervised classification under the cluster assumption
J. Mach. Learn. Res., 8(Jul), 1369-1392
We consider semi-supervised classification when part of the available data is unlabeled. These unlabeled data can be useful for the classification problem when we make an assumption relating the behavior of the regression function to that of the marginal distribution. Seeger (2000) proposed the well-known cluster assumption as a reasonable one. We propose a mathematical formulation of this assumption and a method based on density level sets estimation that takes advantage of it to achieve fast rates of convergence both in the number of unlabeled examples and the number of labeled examples.
@article {Rig07,
AUTHOR = {Rigollet, Philippe},
TITLE = {Generalized error bounds in semi-supervised classification
under the cluster assumption},
JOURNAL = {J. Mach. Learn. Res.},
FJOURNAL = {Journal of Machine Learning Research (JMLR)},
VOLUME = {8},
YEAR = {2007},
PAGES = {1369--1392 (electronic)},
}
2007. Linear and convex aggregation of density estimators
With Alexandre Tsybakov
Math. Methods of Statist., 15(3), 260-280
In this article, the authors propose some aggregation procedures for finding the best linear or convex combinations of several density estimates in order to achieve lower $L_2$ risk. They prove sharp oracle inequalities, which show the asymptotic convergence of the risk of their proposed methods to that of the best linear or convex combinations of density estimates. They also derive the order of this convergence and prove its optimality. In practice, one normally uses one part of the sample for training (finding density estimates) and the other part for aggregation (finding the linear/convex combination). So, both the initial and the aggregated density estimates depend on the sample splitting. In order to avoid this arbitrariness, the authors propose to take the average of the aggregates obtained for different splits, and they prove similar asymptotic results for these averaged aggregates. They apply these results for aggregation of several kernel density estimates indexed by their bandwidths and use some simulated results to demonstrate the performance of their proposed methods.
@article {rt07,
AUTHOR = {Rigollet, Ph. and Tsybakov, A. B.},
TITLE = {Linear and convex aggregation of density estimators},
JOURNAL = {Math. Methods Statist.},
FJOURNAL = {Mathematical Methods of Statistics},
VOLUME = {16},
YEAR = {2007},
NUMBER = {3},
PAGES = {260--280},
}
2006. Adaptive density estimation using the blockwise Stein method
Bernoulli, 12(2), 351-370
The article considers nonparametric estimation of densities of unknown smoothness for i.i.d. data. A new modified Stein estimator is proposed. The performance of the density estimator is measured by the mean integrated squared error (MISE). The MISE, when written in the Fourier domain, is related to the mean squared error in the Gaussian sequence model. This model has been studied by many authors in order to derive estimators which mimic best estimators within certain classes, i.e. oracles. Clearly, the difficulty in adapting such results to density estimation is the nonnormality of the data. The main result of this paper is the oracle inequality given in Theorem 2.4 for a modified version of Stein's blockwise method. Due to this result it is possible to show that the proposed estimator is sharp minimax adaptive on a range of Sobolev classes of estimators. Finally its MISE is compared with that of kernel density estimators in a large class covering several standard estimators: it is shown that the estimators in that class are never asymptotically better than the proposed estimator.
@article {Rig06,
AUTHOR = {Rigollet, Philippe},
TITLE = {Adaptive density estimation using the blockwise {S}tein
method},
JOURNAL = {Bernoulli},
FJOURNAL = {Bernoulli. Official Journal of the Bernoulli Society for
Mathematical Statistics and Probability},
VOLUME = {12},
YEAR = {2006},
NUMBER = {2},
PAGES = {351--370},
}
2005. Mirror averaging, aggregation and model selection
With Anatoli Juditsky and Alexandre Tsybakov
Oberwolfach reports, 2(4), 2688-2691
In: Meeting on Statistical and Probabilistic Methods of Model Selection, October 2005
@inproceedings{jrt05ow,
AUTHOR = {Juditsky, Anatoli and Rigollet, Philippe and Tsybakov, Alexandre},
TITLE = {Mirror averaging, aggregation and model selection},
BOOKTITLE = {Statistische und Probabilistische Methoden der Modellwahl},
EDITOR = {Berger, James and Dette, Holder and Lugosi, Gabor and Munk, Alex},
PUBLISHER = {Mathematisches Forschungsinstitut Oberwolfach},
PAGES = {2688--2691},
SERIES = {Oberwolfach Reports},
VOLUME = {2},
NUMBER = {4},
YEAR = {2005},
}
2005. Oracle inequalities for probability density estimations - French
C. R. Math. Acad. Sci. Paris, 340(1), 59-62
We study the problem of the nonparametric estimation of a probability density in $L_2(R)$. Expressing the mean integrated squared error in the Fourier domain, we show that it is close to the mean squared error in the Gaussian sequence model. Then, applying a modified version of Stein's blockwise method, we obtain a linear monotone oracle inequality and a kernel oracle inequality. As a consequence, the proposed estimator is sharp minimax adaptive (i.e. up to a constant) on a scale of Sobolev classes of densities.
@article {Rig05,
AUTHOR = {Rigollet, Philippe},
TITLE = {In\'egalit\'es d'oracle pour l'estimation d'une densit\'e de
probabilit\'e},
JOURNAL = {C. R. Math. Acad. Sci. Paris},
FJOURNAL = {Comptes Rendus Math\'ematique. Acad\'emie des Sciences. Paris},
VOLUME = {340},
YEAR = {2005},
NUMBER = {1},
PAGES = {59--62},
}