
Haim Kaplan
I work on data structures, algorithms, computational geometry and machine learning. Right now my main focus is on online learning, reinforcement learning, and clustering. Typically my research is theoretical. I am also a faculty at the school of computer science of Tel Aviv University.
Authored Publications
Sort By
Preview abstract
We investigate Learning from Label Proportions (LLP), a partial information setting where examples in a training set are grouped into bags, and only aggregate label values in each bag are available. Despite the partial observability, the goal is still to achieve small regret at the level of individual examples. We give results on the sample complexity of LLP under square loss, showing that our sample complexity is essentially optimal. From an algorithmic viewpoint, we rely on carefully designed variants of Empirical Risk Minimization, and Stochastic Gradient Descent algorithms, combined with ad hoc variance reduction techniques. On one hand, our theoretical results improve in important ways on the existing literature on LLP, specifically in the way the sample complexity depends on the bag size. On the other hand, we validate our algorithmic solutions on several datasets, demonstrating improved empirical performance (better accuracy for less samples) against recent baselines.
View details
Preview abstract
We introduce efficient differentially private (DP) algorithms for several linear algebraic tasks, including solving linear equalities over arbitrary fields, linear inequalities over the reals, and computing affine spans and convex hulls. As an application, we obtain efficient DP algorithms for learning halfspaces and affine subspaces. Our algorithms addressing equalities are strongly polynomial, whereas those addressing inequalities are weakly polynomial. Furthermore, this distinction is inevitable: no DP algorithm for linear programming can be strongly polynomial-time efficient.
View details
Preview abstract
We revisit the fundamental question of formally defining what constitutes a reconstruction attack. While often clear from the context, our exploration reveals that a precise definition is much more nuanced than it appears, to the extent that a single all-encompassing definition may not exist. Thus, we employ a different strategy and aim to "sandwich" the concept of reconstruction attacks by addressing two complementing questions: (i) What conditions guarantee that a given system is protected against such attacks? (ii) Under what circumstances does a given attack clearly indicate that a system is not protected? More specifically,
* We introduce a new definitional paradigm -- Narcissus Resiliency -- to formulate a security definition for protection against reconstruction attacks. This paradigm has a self-referential nature that enables it to circumvent shortcomings of previously studied notions of security. Furthermore, as a side-effect, we demonstrate that Narcissus resiliency captures as special cases multiple well-studied concepts including differential privacy and other security notions of one-way functions and encryption schemes.
* We formulate a link between reconstruction attacks and Kolmogorov complexity. This allows us to put forward a criterion for evaluating when such attacks are convincingly successful.
View details
Preview abstract
In this work we revisit an interactive variant of joint differential privacy, recently introduced by Naor et al. [2023], and generalize it towards handling online processes in which existing privacy definitions seem too restrictive. We study basic properties of this definition and demonstrate that it satisfies (suitable variants) of group privacy, composition, and post processing.
In order to demonstrate the advantages of this privacy definition compared to traditional forms of differential privacy, we consider the basic setting of online classification. We show that any (possibly non-private) learning rule can be effectively transformed to a private learning rule with only a polynomial overhead in the mistake bound. This demonstrates a stark difference with traditional forms of differential privacy, such as the one studied by Golowich and Livni [2021], where only a double exponential overhead in the mistake bound is known (via an information theoretic upper bound).
View details
Preview abstract
We introduce the concurrent shuffle model of differential privacy. In this model we have multiple concurrent shufflers permuting messages from different, possibly overlapping, batches of users. Similarly to the standard (single) shuffle model, the privacy requirement is that the concatenation of all shuffled messages should be differentially private. We study the private continual summation problem (a.k.a. the counter problem) and show that
the concurrent shuffle model allows for significantly improved error compared to a standard (single) shuffle model. Specifically, we give a summation algorithm with error $\Tilde{O}(n^{1/(2k+1)})$ with $k$ concurrent shufflers on a sequence of length $n$. Furthermore, we prove that this bound is tight for any $k$, even if the algorithm can choose the sizes of the batches adaptively. For $k=\log n$ shufflers, the resulting error is polylogarithmic, much better than $\Tilde{\Theta}(n^{1/3})$ which we show is the smallest possible with a single shuffler.
We use our online summation algorithm to get algorithms with improved regret bounds for the contextual linear bandit problem. In particular we get optimal $\Tilde{O}(\sqrt{n})$ regret with $k= \Tilde{\Omega}(\log n)$ concurrent shufflers.
View details
Differentially-Private Bayes Consistency
Olivier Bousquet
Aryeh Kontorovich
Shay Moran
Menachem Sadigurschi
Archive, Archive, Archive
Preview abstract
We construct a universally Bayes consistent learning rule
which satisfies differential privacy (DP).
We first handle the setting of binary classification
and then extend our rule to the more
general setting of density estimation (with respect to the total variation metric).
The existence of a universally consistent DP learner
reveals a stark difference with the distribution-free PAC model.
Indeed, in the latter DP learning is extremely limited:
even one-dimensional linear classifiers
are not privately learnable in this stringent model.
Our result thus demonstrates that by allowing
the learning rate to depend on the target distribution,
one can circumvent the above-mentioned impossibility result
and in fact learn \emph{arbitrary} distributions by a single DP algorithm.
As an application, we prove that any VC class can be privately learned in a semi-supervised
setting with a near-optimal \emph{labelled} sample complexity of $\tilde O(d/\eps)$ labeled examples
(and with an unlabeled sample complexity that can depend on the target distribution).
View details
Preview abstract
The amount of training-data is one of the key factors which determines the generalization capacity of learning algorithms. Intuitively, one expects the error rate to decrease as the amount of training-data increases. Perhaps surprisingly, natural attempts to formalize this intuition give rise to interesting and challenging mathematical questions. For example, in their classical book on pattern recognition, Devroye, Gyorfi, and Lugosi (1996) ask whether there exists a monotone Bayes-consistent algorithm. This question remained open for over 25 years, until recently Pestov (2021) resolved it for binary classification, using an intricate construction of a monotone Bayes-consistent algorithm.
We derive a general result in multiclass classification, showing that every learning algorithm A can be transformed to a monotone one with similar performance. Further, the transformation is efficient and only uses a black-box oracle access to A. This demonstrates that one can provably avoid non-monotonic behaviour without compromising performance, thus answering questions asked by Devroye, Gyorfi, and Lugosi (1996), Viering, Mey, and Loog (2019), Viering and Loog (2021), and by Mhammedi (2021).
Our general transformation readily implies monotone learners in a variety of contexts: for example, Pestov’s result follows by applying it on any Bayes-consistent algorithm (e.g., k-Nearest-Neighbours). In fact, our transformation extends Pestov’s result to classification tasks with an arbitrary number of labels. This is in contrast with Pestov’s work which is tailored to binary classification.
In addition, we provide uniform bounds on the error of the monotone algorithm. This makes our transformation applicable in distribution-free settings. For example, in PAC learning it implies that every learnable class admits a monotone PAC learner. This resolves questions asked by Viering, Mey, and Loog (2019); Viering and Loog (2021); Mhammedi (2021).
View details
Preview abstract
We present differentially private efficient algorithms for learning polygons in the plane (which are not necessarily convex). Our algorithm achieves $(\alpha,\beta)$-PAC learning and $(\eps,\delta)$-differential privacy using a sample of size $O\left(\frac{k}{\alpha\eps}\log\left(\frac{|X|}{\beta\delta}\right)\right)$, where the domain is $X\times X$ and $k$ is the number of edges in the (potentially non-convex) polygon.
View details
Preview abstract
In this work we study the problem of differentially private (DP) quantiles, in which given data $X$ set and quantiles $q_1, ..., q_m \in [0,1]$, we want to output $m$ quantile estimations such that the estimation is as close as possible to the optimal solution and preserves DP. In this work we provide \algoname~(AQ), an algorithm and implementation for the DP-quantiels problem. We analyze our algorithm and provide a mathematical proof of its error bounds for the general case and for the concrete case of uniform quantiles utility. We also experimentally evaluate \algoref~and conclude that it obtains higher accuracy than the existing baselines while having lower run time. We reduce the problem of DP-data-sanitization to the DP-uniform-quantiles problem and analyze the resulting mathematical bounds for the error in this case. We analyze our algorithm under the definition of zero Concentrated Differential Privacy (zCDP), and supply the error guarantees of our \algoref~in this case. Finally, we show the empirical benefit our algorithm gains under the zCDP definition.
View details
Preview abstract
Differentially private algorithms for common metric aggregation tasks, such as clustering or averaging, often have limited practicality due to their complexity or to the large number of data points that is required for accurate results.
We propose a simple and practical tool $\mathsf{FriendlyCore}$ that takes a set of points $\cD$ from an unrestricted (pseudo) metric space as input. When $\cD$ has effective diameter $r$, $\mathsf{FriendlyCore}$ returns a ``stable'' subset $\cC \subseteq \cD$ that includes all points, except possibly few outliers, and is {\em guaranteed} to have diameter $r$. $\mathsf{FriendlyCore}$ can be used to preprocess the input before privately aggregating it, potentially simplifying the aggregation or boosting its accuracy. Surprisingly, $\mathsf{FriendlyCore}$ is light-weight with no dependence on the dimension. We empirically demonstrate its advantages in boosting the accuracy of mean estimation and clustering tasks such as $k$-means and $k$-GMM, outperforming tailored methods.
View details