Two-sample hypothesis testing, i.e. determining whether two sets of data are drawn from the same distribution, is a fundamental problem in statistics and machine learning with broad scientific applications. In the context of nonparametric testing, the maximum mean discrepancy (MMD) has gained popularity as a test statistic due to its flexibility and strong theoretical foundations. However, its use in large-scale scenarios is plagued by high computational costs. In this work, we use a Nyström approximation of the MMD to design a computationally efficient and practical testing algorithm while preserving statistical guarantees. Our main result is a finite-sample bound on the power of the proposed test for distributions that are sufficiently separated with respect to the MMD. The derived separation rate matches the known minimax optimal rate in this setting. We support our findings with a series of numerical experiments, emphasizing realistic scientific data.
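To make the quantities concrete, here is a toy numerical sketch, not the paper's algorithm or its test calibration: it compares a plain V-statistic estimate of the squared MMD with a Nyström estimate built from uniformly sampled landmarks. The Gaussian kernel, the uniform landmark sampling, and all variable names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_full(X, Y, sigma=1.0):
    # Biased (V-statistic) estimate of the squared MMD between samples X and Y.
    return (gaussian_kernel(X, X, sigma).mean()
            - 2 * gaussian_kernel(X, Y, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean())

def mmd2_nystrom(X, Y, m=20, sigma=1.0, rng=None):
    # Nyström approximation: map both samples to features induced by m
    # landmarks sampled uniformly from the pooled data, then compare the
    # means of the two feature clouds.
    rng = np.random.default_rng(rng)
    Z = np.vstack([X, Y])
    L = Z[rng.choice(len(Z), size=m, replace=False)]    # landmarks
    Kll = gaussian_kernel(L, L, sigma)
    # Whitening via a pseudo-inverse square root of the landmark Gram matrix.
    w, V = np.linalg.eigh(Kll)
    w = np.clip(w, 1e-12, None)
    W = V @ np.diag(w ** -0.5) @ V.T
    phi = lambda A: gaussian_kernel(A, L, sigma) @ W    # explicit feature map
    diff = phi(X).mean(0) - phi(Y).mean(0)
    return diff @ diff

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))   # sample from P
Y = rng.normal(0.5, 1.0, size=(200, 2))   # sample from a shifted Q
v_full = mmd2_full(X, Y)
v_nys = mmd2_nystrom(X, Y, m=40, rng=1)
```

With far fewer kernel evaluations than the full statistic, the Nyström estimate stays close to it on this toy example.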
Published papers
Greedy Nyström approximation by trace minimization
We introduce a greedy algorithm for the construction of kernel Nyström approximations. The targeted objective is the residual trace of the data kernel matrix, which is approximated using random Fourier features and sketching. We motivate theoretically the choice of such a criterion when the induced features are used for kernel k-means clustering, and show empirically that the proposed algorithm indeed outperforms other landmark selection methods classically used in the literature on this task.
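As a minimal illustration of the selection criterion, the sketch below greedily picks landmarks to minimize the exact residual trace via rank-one (pivoted-Cholesky-style) downdates. The paper approximates this criterion with random Fourier features and sketching; this exact version and all names are assumptions for exposition.

```python
import numpy as np

def greedy_nystrom_landmarks(K, m):
    # Greedily select m landmark indices to minimize the trace of the
    # Nyström residual K - K[:, S] pinv(K[S, S]) K[S, :].
    R = K.astype(float).copy()     # residual kernel matrix
    selected = []
    for _ in range(m):
        # Trace reduction when adding column i: sum_j R[j, i]^2 / R[i, i].
        diag = np.clip(np.diag(R), 1e-12, None)
        gains = (R ** 2).sum(axis=0) / diag
        gains[selected] = -np.inf              # never re-pick a landmark
        i = int(np.argmax(gains))
        selected.append(i)
        # Rank-one downdate of the residual (one pivoted Cholesky step).
        c = R[:, i] / np.sqrt(diag[i])
        R = R - np.outer(c, c)
    return selected

# Toy data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-d2)
S = greedy_nystrom_landmarks(K, 5)
```

On such clustered data the greedy criterion quickly covers both blobs, which is the behaviour motivating its use for kernel k-means.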
Linear quadratic control of nonlinear systems with Koopman operator learning and the Nyström method
Edoardo Caldarelli, Antoine Chatalic, Adrià Colomé, Cesare Molinari, Carlos Ocampo-Martinez, Carme Torras, Lorenzo Rosasco. Automatica
In this paper, we study how the Koopman operator framework can be combined with kernel methods to effectively control nonlinear dynamical systems. While kernel methods typically have large computational requirements, we show how random subspaces (the Nyström approximation) can be used to achieve huge computational savings while preserving accuracy. Our main technical contribution is deriving theoretical guarantees on the effect of the Nyström approximation. More precisely, we study the linear quadratic regulator problem, showing that the approximated Riccati operator converges at the rate m^(-1/2), and that the regulator objective, for the associated solution of the optimal control problem, converges at the rate m^(-1), where m is the random subspace size. Theoretical findings are complemented by numerical experiments corroborating our results.
Efficient Numerical Integration in Reproducing Kernel Hilbert Spaces via Leverage Scores Sampling
Antoine Chatalic, Nicolas Schreuder, Ernesto De Vito, Lorenzo Rosasco. Journal of Machine Learning Research
Heteroscedastic Gaussian processes (HGPs) are kernel-based, non-parametric models that can be used to infer nonlinear functions with time-varying noise. In robotics, they can be employed for learning from demonstration as motion primitives, i.e. as a model of the trajectories to be executed by the robot. HGPs provide variance estimates around the reference signal modeling the trajectory, capturing both the predictive uncertainty and the motion variability. This uncertainty can be leveraged for more complex learning tasks, such as inferring the variable impedance profile required from a robotic manipulator. However, similarly to standard Gaussian processes, HGPs suffer from a cubic complexity in the number of training points, due to the inversion of the kernel matrix, and suitable approximations are needed to make them scalable, at the price of potentially worsening the posterior mean and variance profiles. Motivated by these observations, we study the combination of HGPs and random features, a popular, data-independent approximation strategy for kernel functions. In a theoretical analysis, we provide novel guarantees on the approximation error of the HGP posterior due to random features. Moreover, we validate this scalable motion primitive on real robot data related to the problem of variable impedance learning. In this way, we show that random features offer a viable and theoretically sound alternative for speeding up trajectory processing without sacrificing accuracy.
Estimating Koopman operators with sketching to provably learn large scale dynamical systems
Giacomo Meanti, Antoine Chatalic, Vladimir Kostić, Pietro Novelli, Massimiliano Pontil, Lorenzo Rosasco. NeurIPS 2023
The theory of Koopman operators makes it possible to deploy non-parametric machine learning algorithms to predict and analyze complex dynamical systems. Estimators such as principal component regression (PCR) or reduced rank regression (RRR) in kernel spaces can be shown to provably learn Koopman operators from finite empirical observations of the system's time evolution. Scaling these approaches to very long trajectories is a challenge and requires introducing suitable approximations to make computations feasible. In this paper, we boost the efficiency of different kernel-based Koopman operator estimators using random projections (sketching). We derive, implement and test the new "sketched" estimators with extensive experiments on synthetic and large-scale molecular dynamics datasets. Further, we establish non-asymptotic error bounds giving a sharp characterization of the trade-offs between statistical learning rates and computational efficiency. Our empirical and theoretical analysis shows that the proposed estimators provide a sound and efficient way to learn large-scale dynamical systems. In particular, our experiments indicate that the proposed estimators retain the same accuracy as PCR or RRR, while being much faster.
M2M: A general method to perform various data analysis tasks from a differentially private sketch
Florimond Houssiau, Vincent Schellekens, Antoine Chatalic, Shreyas Kumar Annamraju, Yves-Alexandre de Montjoye. 18th International Workshop on Security and Trust Management (STM 2022)
Differential privacy is the standard privacy definition for performing analyses over sensitive data. Yet, its privacy budget bounds the number of tasks an analyst can perform with reasonable accuracy, which makes it challenging to deploy in practice. This can be alleviated by private sketching, where the dataset is compressed into a single noisy sketch vector which can be shared with the analysts and used to perform arbitrarily many analyses. However, the algorithms to perform specific tasks from sketches must be developed on a case-by-case basis, which is a major impediment to their use. In this paper, we introduce the generic moment-to-moment (M²M) method to perform a wide range of data exploration tasks from a single private sketch. Among other things, this method can be used to estimate empirical moments of attributes, the covariance matrix, counting queries (including histograms), and regression models. Our method treats the sketching mechanism as a black-box operation, and can thus be applied to a wide variety of sketches from the literature, widening their range of applications without further engineering or privacy loss, and removing some of the technical barriers to the wider adoption of sketches for data exploration under differential privacy. We validate our method with data exploration tasks on artificial and real-world data, and show that it can be used to reliably estimate statistics and train classification models from private sketches.
Nyström Kernel Mean Embeddings
Antoine Chatalic, Nicolas Schreuder, Lorenzo Rosasco, Alessandro Rudi. Proceedings of the 39th International Conference on Machine Learning
Kernel mean embeddings are a powerful tool to represent probability distributions over arbitrary spaces as single points in a Hilbert space. Yet, the cost of computing and storing such embeddings prohibits their direct use in large-scale settings. We propose an efficient approximation procedure based on the Nyström method, which exploits a small random subset of the dataset. Our main result is an upper bound on the approximation error of this procedure. It yields sufficient conditions on the subsample size to obtain the standard 1/√n rate while reducing computational costs. We discuss applications of this result for the approximation of the maximum mean discrepancy and quadrature rules, and we illustrate our theoretical findings with numerical experiments.
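The procedure can be sketched in a few lines: instead of evaluating the full empirical mean embedding, which needs one kernel evaluation per data point, one projects the embedding onto the span of a small uniform subsample. This toy version, with a Gaussian kernel and made-up sizes, is an illustrative assumption, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))   # dataset whose distribution we embed
T = rng.normal(size=(10, 2))     # test points where the embedding is evaluated
k = lambda A, B: np.exp(-((A[:, None] - B[None, :]) ** 2).sum(-1) / 2)

# Full empirical mean embedding at the test points:
# mu(t) = (1/n) sum_i k(x_i, t), costing O(n) kernel evaluations per point.
mu_full = k(X, T).mean(0)

# Nyström approximation: represent the embedding in the span of m landmark
# points, so only the n x m matrix k(X, L) is ever formed.
m = 50
L = X[rng.choice(len(X), m, replace=False)]
alpha = np.linalg.lstsq(k(L, L), k(L, X).mean(1), rcond=None)[0]
mu_nys = k(T, L) @ alpha
```

Here `alpha` are the coefficients of the approximate embedding in the landmark basis; with m = 50 out of n = 1000 points, the two evaluations agree closely on this smooth kernel.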
Mean Nyström Embeddings for Adaptive Compressive Learning
Antoine Chatalic, Luigi Carratino, Ernesto De Vito, Lorenzo Rosasco. International Conference on Artificial Intelligence and Statistics
Compressive learning is an approach to efficient large-scale learning based on sketching an entire dataset to a single mean embedding (the sketch), i.e. a vector of generalized moments. The learning task is then approximately solved as an inverse problem using an adapted parametric model. Previous works in this context have focused on sketches obtained by averaging random features, which, while universal, can be poorly adapted to the problem at hand. In this paper, we propose and study the idea of performing sketching based on a data-dependent Nyström approximation. From a theoretical perspective, we prove that the excess risk can be controlled under a geometric assumption relating the parametric model used to learn from the sketch and the covariance operator associated to the task at hand. Empirically, we show for k-means clustering and Gaussian modeling that, for a fixed sketch size, Nyström sketches indeed outperform those built with random features.
Sketching Data Sets for Large-Scale Learning: Keeping only what you need
Rémi Gribonval, Antoine Chatalic, Nicolas Keriven, Vincent Schellekens, Laurent Jacques, Philip Schniter. IEEE Signal Processing Magazine
Big data can be a blessing: with very large training data sets it becomes possible to perform complex learning tasks with unprecedented accuracy. Yet, this improved performance comes at the price of enormous computational challenges. Thus, one may wonder: Is it possible to leverage the information content of huge data sets while keeping computational resources under control? Can this also help solve some of the privacy issues raised by large-scale learning? This is the ambition of compressive learning, where the data set is massively compressed before learning. Here, a "sketch" is first constructed by computing carefully chosen nonlinear random features [e.g., random Fourier (RF) features] and averaging them over the whole data set. Parameters are then learned from the sketch, without access to the original data set. This article surveys the current state of the art in compressive learning, including the main concepts and algorithms, their connections with established signal processing methods, existing theoretical guarantees on both information preservation and privacy preservation, and important open problems. For an extended version of this article that contains additional references and more in-depth discussions on a variety of topics, see [1].
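The sketching step described above fits in a few lines of code. The toy example below averages complex random Fourier features over a dataset, yielding a single vector that subsumes it for subsequent learning; the frequency distribution, sizes, and names are illustrative assumptions rather than a prescribed configuration.

```python
import numpy as np

def rff_sketch(X, W):
    # The sketch: empirical average of complex random Fourier features,
    # i.e. a random sampling of the empirical characteristic function of X.
    return np.exp(1j * X @ W.T).mean(axis=0)

rng = np.random.default_rng(0)
d, m = 2, 100                        # data dimension, sketch size
W = rng.normal(size=(m, d))          # random frequencies (Gaussian kernel case)

X = rng.normal(size=(10000, d))      # full dataset ...
z = rff_sketch(X, W)                 # ... compressed to m complex numbers
```

Parameters are then learned from `z` alone, e.g. by fitting a parametric model whose sketch matches `z`, without revisiting `X`. Two samples from the same distribution yield nearly identical sketches, while a shifted sample does not.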
Compressive learning with privacy guarantees
Antoine Chatalic, Vincent Schellekens, Florimond Houssiau, Yves-Alexandre de Montjoye, Laurent Jacques, Rémi Gribonval. Information and Inference: A Journal of the IMA
This work addresses the problem of learning from large collections of data with privacy guarantees. The compressive learning framework proposes to deal with the large scale of datasets by compressing them into a single vector of generalized random moments, called a sketch vector, from which the learning task is then performed. We provide sharp bounds on the so-called sensitivity of this sketching mechanism. This allows us to leverage standard techniques to ensure differential privacy, a well-established formalism for defining and quantifying the privacy of a random mechanism, by adding Laplace or Gaussian noise to the sketch. We combine these standard mechanisms with a new feature subsampling mechanism, which reduces the computational cost without damaging privacy. The overall framework is applied to the tasks of Gaussian modeling, k-means clustering and principal component analysis, for which sharp privacy bounds are derived. Empirically, the quality (for subsequent learning) of the compressed representation produced by our mechanism is strongly related to the induced noise level, for which we give analytical expressions.
Efficient and Privacy-Preserving Compressive Learning
Clustering of large-scale collections can be performed efficiently based on low-dimensional sketches obtained by averaging random Fourier features of the items in the training collection. However, some prior knowledge about the data distribution is required to design the sketching mechanism and optimize its performance. We show empirically the importance of estimating the inter-cluster separation, and give a proof of concept of how to learn it.
Random projections for compressive learning
Antoine Chatalic, Nicolas Keriven, Rémi Gribonval. Gretsi
Compressive learning aims to drastically reduce the volume of large training collections through a form of random projection, while preserving the information needed for learning. Drawing on a few examples from unsupervised learning, we give an overview of the tools used and of the theoretical guarantees, both in terms of information preservation and of privacy, and conclude with a few open problems.
Compressive k-Means with Differential Privacy
Vincent Schellekens, Antoine Chatalic, Florimond Houssiau, Yves-Alexandre de Montjoye, Laurent Jacques, Rémi Gribonval. SPARS workshop
Vincent Schellekens, Antoine Chatalic, Florimond Houssiau, Yves-Alexandre de Montjoye, Laurent Jacques, Rémi Gribonval. 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
This work addresses the problem of learning from large collections of data with privacy guarantees. The sketched learning framework proposes to deal with the large scale of datasets by compressing them into a single vector of generalized random moments, from which the learning task is then performed. We modify the standard sketching mechanism to provide differential privacy, by adding Laplace noise combined with a subsampling mechanism (each moment is computed from a subset of the dataset). The data can be divided between several sensors, each applying the privacy-preserving mechanism locally, yielding a differentially private sketch of the whole dataset once the contributions are combined. We apply this framework to the k-means clustering problem, for which a measure of utility of the mechanism in terms of a signal-to-noise ratio is provided, and discuss the obtained privacy-utility tradeoff.
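To illustrate only the noise-addition step (omitting the subsampling and distributed aspects), the toy sketch below perturbs an averaged feature vector with Laplace noise calibrated to a deliberately crude L1-sensitivity bound; the paper derives sharper bounds, and all names here are assumptions.

```python
import numpy as np

def private_sketch(X, W, epsilon, rng):
    # Average of bounded complex features, released with Laplace noise.
    n, m = len(X), len(W)
    s = np.exp(1j * X @ W.T).mean(axis=0)
    # Each complex feature has modulus at most 1, so replacing one of the n
    # records moves each real coordinate of the average by at most 2/n.
    # Over the 2m real coordinates this gives a crude L1 sensitivity of 4m/n.
    sensitivity = 4 * m / n
    noise = rng.laplace(scale=sensitivity / epsilon, size=(2, m))
    return s + noise[0] + 1j * noise[1]

rng = np.random.default_rng(0)
W = rng.normal(size=(20, 2))                     # random frequencies
X = rng.normal(size=(10000, 2))                  # sensitive dataset
z_clean = np.exp(1j * X @ W.T).mean(axis=0)      # non-private sketch
z_priv = private_sketch(X, W, epsilon=1.0, rng=rng)
```

Because the sensitivity scales as 1/n, large datasets admit strong privacy at little cost in sketch accuracy, which is the privacy-utility tradeoff discussed above.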
Sketched Clustering via Hybrid Approximate Message Passing
Evan Byrne, Antoine Chatalic, Rémi Gribonval, Philip Schniter. IEEE Transactions on Signal Processing
In sketched clustering, a dataset of T samples is first sketched down to a vector of modest size, from which the centroids are subsequently extracted. Its advantages include 1) reduced storage complexity and 2) centroid extraction complexity independent of T. For the sketching methodology recently proposed by Keriven et al., which can be interpreted as a random sampling of the empirical characteristic function, we propose a sketched clustering algorithm based on approximate message passing. Numerical experiments suggest that our approach is more efficient than the state-of-the-art sketched clustering algorithm “CL-OMPR” (in both computational and sample complexity) and more efficient than k-means++ when T is large.
Large-Scale High-Dimensional Clustering with Fast Sketching
Antoine Chatalic, Rémi Gribonval, Nicolas Keriven. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)