
Silvio Lattanzi
Silvio received his bachelor's (2005), master's (2007), and PhD (2011) degrees from the Computer Science department of Sapienza University of Rome, under the supervision of Alessandro Panconesi. He joined Google Research in the New York office in January 2011 and moved to Google Research Zurich in April 2017.
Authored Publications
Deletion Robust Non-Monotone Submodular Maximization over Matroids
Paul Duetting
Federico Fusco
Ashkan Norouzi Fard
Journal of Machine Learning Research, 26 (2025), pp. 1-28
Maximizing a submodular function is a fundamental task in machine learning, and in this paper we study the deletion-robust version of the problem under the classic matroid constraint. Here the goal is to extract a small summary of the dataset that contains a high-value independent set even after an adversary deletes some elements. We present constant-factor approximation algorithms whose space complexity depends on the rank $k$ of the matroid and the number $d$ of deleted elements. In the centralized setting we present a $(4.597+O(\epsilon))$-approximation algorithm with summary size $O(\frac{k+d}{\epsilon^2}\log \frac{k}{\epsilon})$, which improves to a $(3.582+O(\epsilon))$-approximation with summary size $O(k + \frac{d}{\epsilon^2}\log \frac{k}{\epsilon})$ when the objective is monotone. In the streaming setting we provide a $(9.435+O(\epsilon))$-approximation algorithm with summary size and memory $O(k + \frac{d}{\epsilon^2}\log \frac{k}{\epsilon})$; the approximation factor improves to $(5.582+O(\epsilon))$ in the monotone case.
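For orientation, below is a minimal sketch of the classical greedy baseline for monotone submodular maximization under a matroid constraint (a 1/2-approximation due to Fisher, Nemhauser, and Wolsey), not the deletion-robust algorithms of the paper; the coverage function, rank-2 uniform matroid, and all names are illustrative assumptions.

# Illustrative sketch of the classical greedy for monotone submodular
# maximization under a matroid constraint; NOT the paper's deletion-robust
# algorithm. `f` maps a frozenset of elements to a value; `is_independent`
# is the matroid's independence oracle.

def matroid_greedy(elements, f, is_independent):
    solution = set()
    remaining = set(elements)
    while remaining:
        base = f(frozenset(solution))
        best, best_gain = None, 0.0
        # Pick the feasible element with the largest marginal gain.
        for e in remaining:
            if not is_independent(solution | {e}):
                continue
            gain = f(frozenset(solution | {e})) - base
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:
            break
        solution.add(best)
        remaining.discard(best)
    return solution

# Toy instance: a coverage objective with a uniform matroid of rank 2.
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
f = lambda S: float(len(set().union(*(sets[i] for i in S)) if S else set()))
print(matroid_greedy(sets, f, lambda S: len(S) <= 2))  # e.g. {1, 2}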
The Cost of Consistency: Submodular Maximization with Constant Recourse
Paul Duetting
Federico Fusco
Ashkan Norouzi Fard
Ola Svensson
Proceedings of the 57th Annual ACM Symposium on Theory of Computing (2025), pp. 1406-1417
In this work, we study online submodular maximization and how the requirement of maintaining a stable solution impacts the approximation. In particular, we seek bounds on the best-possible approximation ratio attainable when the algorithm is allowed to make at most a constant number of updates per step. We show a tight information-theoretic bound of $2/3$ for general monotone submodular functions and an improved (also tight) bound of $3/4$ for coverage functions. Since both bounds are attained by non-polynomial-time algorithms, we also give a polynomial-time randomized algorithm that achieves a $0.51$-approximation. Combined with an information-theoretic hardness of $1/2$ for deterministic algorithms from prior work, our results show a separation between deterministic and randomized algorithms, both information-theoretically and for polynomial-time algorithms.
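To make the objects concrete: a coverage function evaluates the size (or weight) of the union of the sets selected, and the per-step recourse of an update is the size of the symmetric difference between consecutive solutions. The toy sketch below, with made-up sets and names, illustrates both notions; it is not code from the paper.

# Illustrative only: a coverage function (the class for which the tight
# 3/4 bound is shown) and the recourse of a solution update, i.e. the
# number of element changes a consistent algorithm must keep constant.

def coverage(chosen, sets):
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)

def recourse(old, new):
    return len(old ^ new)  # size of the symmetric difference

sets = {1: {"u", "v"}, 2: {"v", "w"}, 3: {"x"}}
S_t, S_next = {1, 2}, {1, 3}
print(coverage(S_t, sets), coverage(S_next, sets))  # 3 3
print(recourse(S_t, S_next))                        # 2: element 2 out, 3 in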
Consistent Submodular Maximization
Paul Duetting
Federico Fusco
Ashkan Norouzi Fard
Proceedings of the 41st International Conference on Machine Learning, PMLR (2024), pp. 11979-11991
Maximizing monotone submodular functions under cardinality constraints is a classic algorithmic problem with several applications in data mining and machine learning. In this paper we study this problem in a dynamic setting with consistency constraints. In this setting, elements arrive in a streaming fashion, and one is interested in maintaining a constant approximation to the optimal solution while keeping the solution stable (i.e., the number of changes between two consecutive solutions is bounded). We provide several algorithms in this setting with different trade-offs between consistency and approximation quality. We also complement our theoretical results with an experimental analysis showing the effectiveness of our algorithms on real-world instances.
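As a hedged illustration of what a consistent streaming algorithm can look like, the sketch below uses a standard swap-based rule in the spirit of classical streaming submodular maximization: each arrival triggers at most one insertion or one swap, so the recourse per step is trivially constant. It is a sketch under stated assumptions, not one of the paper's algorithms, and its threshold parameter alpha is illustrative.

# Swap-based streaming sketch for monotone submodular maximization under
# a cardinality constraint k. Each arrival changes the solution by at most
# one insertion (recourse 1) or one swap (recourse 2); illustrative only.

def process_stream(stream, f, k, alpha=1.0):
    S = []
    for e in stream:
        gain = f(S + [e]) - f(S)
        if len(S) < k:
            if gain > 0:
                S = S + [e]  # one insertion
        else:
            # Find the element whose removal hurts least.
            worst = min(S, key=lambda x: f(S) - f([y for y in S if y != x]))
            loss = f(S) - f([y for y in S if y != worst])
            if gain > (1 + alpha) * loss:  # swap only for a clear improvement
                S = [y for y in S if y != worst] + [e]
    return S

# Toy run with a coverage objective over overlapping interval sets.
sets = {i: set(range(i, i + 3)) for i in range(10)}
f = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
print(process_stream(range(10), f, k=3))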
Learning graph cluster structure using few queries is a classical question in property testing; its fundamental special case, expansion testing, was considered in the seminal work of Goldreich and Ron [STOC'96].
The most recent result in this line of work, due to Gluch et al. [SODA'21], designs clustering oracles for $(k, \epsilon)$-clusterable graphs. These oracles, given a graph whose vertex set can be partitioned into a disjoint union of $k$ clusters (i.e., good expanders) with outer conductances bounded by $\epsilon \ll 1$, provide query access to an $O(\epsilon \log k)$-approximation to this ground-truth clustering in time $\approx 2^{\mathrm{poly}(k/\epsilon)} n^{1/2+O(\epsilon)}$ per query.
In this paper we show that it is possible to learn the hierarchical cluster structure of $(k, \epsilon)$-clusterable graphs in sublinear time. First, we show how to simulate the hierarchical clustering algorithm of Charikar and Chatziafratis [SODA'17] to approximate the Dasgupta cost of a $k$-clusterable graph to within a factor of $O(\sqrt{\log k})$ in $\approx \mathrm{poly}(k)\cdot n^{1/2+O(\epsilon)}$ time, assuming oracle access to the clustering. Second, we introduce a natural hierarchical model of clusterable graphs and give a bona fide clustering oracle for this model, i.e., a small-space data structure that can answer hierarchical clustering queries in $\approx \mathrm{poly}(k) \cdot n^{1/2+O(\epsilon)}$ time per query. Notably, in both cases the query time depends polynomially on the number $k$ of clusters. The second result is the main technical contribution of the paper and relies on several structural properties of hierarchically clusterable graphs that we hope will be of independent interest in sublinear-time spectral graph algorithms.
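To pin down the objective of the first result: the Dasgupta cost of a hierarchy $T$ over a weighted graph is $\sum_{(u,v)} w(u,v)\cdot|\mathrm{leaves}(\mathrm{lca}_T(u,v))|$. The sketch below computes it by brute force on an explicit tree (nested tuples, an assumed representation), which of course reads the whole graph; the point of the paper is to approximate this quantity in sublinear time.

# Brute-force Dasgupta cost: for each edge (u, v) of weight w, charge
# w times the number of leaves under the lowest common ancestor of u, v.
# Trees are nested tuples; anything that is not a tuple is a leaf.

def leaf_sets(tree):
    # Return (leaves under tree, leaf sets of all internal nodes below it).
    if not isinstance(tree, tuple):
        return {tree}, []
    leaves, internal = set(), []
    for child in tree:
        c_leaves, c_internal = leaf_sets(child)
        leaves |= c_leaves
        internal += c_internal
    internal.append(leaves)
    return leaves, internal

def dasgupta_cost(tree, edges):
    _, internal = leaf_sets(tree)
    cost = 0
    for u, v, w in edges:
        # The LCA of u and v is the smallest internal node containing both.
        cost += w * min(len(s) for s in internal if u in s and v in s)
    return cost

# Two triangles joined by one edge: the balanced split keeps the cost low.
edges = [("a","b",1), ("b","c",1), ("a","c",1),
         ("d","e",1), ("e","f",1), ("d","f",1), ("c","d",1)]
tree = ((("a","b"), "c"), (("d","e"), "f"))
print(dasgupta_cost(tree, edges))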
Active Learning of Classifiers with Label and Seed Queries
Andrea Paudice
Marco Bressan
Maximilian Thiessen
Nicolo Cesa-Bianchi
NeurIPS 2022
We study exact active learning of binary and multiclass classifiers with margin. Given an $n$-point set $X \subset \mathbb{R}^m$, we want to learn any unknown classifier on $X$ whose classes have finite strong convex hull margin, a new notion extending the SVM margin.
Using the more powerful seed queries (a variant of equivalence queries), the target classifier could be learned in $O(m \log n)$ queries via Littlestone's Halving algorithm; however, Halving is computationally inefficient.
In this work we show that, by carefully combining the two types of queries, a binary classifier can be learned in time $\mathrm{poly}(n+m)$ using only $O(m^2 \log n)$ label queries and $O\big(m \log \frac{m}{\gamma}\big)$ seed queries; the result extends to $k$-class classifiers at the price of a $k!k^2$ multiplicative overhead. Similar results hold when the input points have bounded bit complexity, or when only one class has strong convex hull margin against the rest. We complement these upper bounds by showing that, in the worst case, any algorithm needs $\Omega\big(\frac{k m \log(1/\gamma)}{\log m}\big)$ seed and label queries to learn a $k$-class classifier with strong convex hull margin $\gamma$.
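Since the abstract leans on Littlestone's Halving algorithm, here is a generic sketch of it over an explicit finite hypothesis class: propose the majority vote of the version space, and let each counterexample (the role the equivalence-style seed queries play) eliminate at least half of the surviving hypotheses, so roughly $\log_2 |H|$ queries suffice. Enumerating $H$ explicitly is exactly the computational inefficiency noted above; the threshold class and all names are illustrative.

# Generic Halving sketch; labels are 0/1 and ties are broken by round().

def halving(hypotheses, points, counterexample):
    # Version space of hypotheses still consistent with all answers so far.
    version_space = list(hypotheses)
    queries = 0
    while True:
        # Predict with the majority vote of the surviving hypotheses.
        majority = {x: round(sum(h(x) for h in version_space)
                             / len(version_space)) for x in points}
        bad = counterexample(majority)  # (x, true label), or None if correct
        if bad is None:
            return majority, queries
        x, label = bad
        queries += 1
        # The majority was wrong at x, so at least half the version space
        # predicted the wrong label there and is discarded.
        version_space = [h for h in version_space if h(x) == label]

# Toy class: thresholds on a line; the oracle plays the role of a seed /
# equivalence query by returning a point where the proposal errs.
points = list(range(16))
H = [lambda x, t=t: int(x >= t) for t in range(17)]
target = H[9]
def counterexample(pred):
    mistakes = [x for x in points if pred[x] != target(x)]
    return (mistakes[0], target(mistakes[0])) if mistakes else None
print(halving(H, points, counterexample)[1])  # few queries vs. |H| = 17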
Near-Optimal Correlation Clustering with Privacy
Ashkan Norouzi Fard
Chenglin Fan
Jakub Tarnawski
Slobodan Mitrović
NeurIPS 2022
Correlation clustering is a central problem in unsupervised learning, with applications spanning community detection, duplicate detection, automated labeling, and many more. In the correlation clustering problem one receives as input a set of nodes and, for each node, a list of co-clustering preferences, and the goal is to output a clustering that minimizes the disagreement with the nodes' specified preferences. In this paper, we introduce a simple and computationally efficient algorithm for the correlation clustering problem with provable privacy guarantees. Our additive error is stronger than that of prior work and is optimal up to polylogarithmic factors for fixed privacy parameters.
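The paper's private algorithm is not reproduced here; for orientation only, the sketch below is the classical randomized Pivot algorithm of Ailon, Charikar, and Newman, a 3-approximation in expectation for correlation clustering on complete graphs with +/- preferences, with a made-up toy instance.

import random

# Classical (non-private) Pivot baseline: repeatedly pick a random pivot,
# cluster it with all remaining nodes that prefer to join it, and recurse
# on the rest. Differential privacy needs more than this; see the paper.

def pivot(nodes, positive):
    """positive[frozenset((u, v))] is True if u, v prefer co-clustering."""
    rest = set(nodes)
    clusters = []
    while rest:
        p = random.choice(sorted(rest))  # random pivot
        cluster = {p} | {u for u in rest
                         if u != p and positive[frozenset((p, u))]}
        clusters.append(cluster)
        rest -= cluster
    return clusters

nodes = ["a", "b", "c", "d"]
pos_pairs = {("a", "b"), ("b", "c"), ("a", "c")}
positive = {frozenset((u, v)): ((u, v) in pos_pairs or (v, u) in pos_pairs)
            for u in nodes for v in nodes if u != v}
print(pivot(nodes, positive))  # typically [{'a','b','c'}, {'d'}]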
Designing algorithms for machine learning problems beyond worst-case analysis and, in particular, analyzing the effect of side information on the complexity of such problems is an important line of research with many practical applications. In this paper we study the classic $k$-means clustering problem in the presence of noisy labels. In this problem we receive as input a set of points and a set of clustering labels generated by either an adversarial or a random perturbation of the optimal solution. Our main goal is to formally study the effect of this extra information on the complexity of the $k$-means problem. In particular, in the context of random perturbations, we give an efficient algorithm that finds a clustering of cost within a factor $1+o(1)$ of optimum even when the label of each point is perturbed with large probability (think 99%). In contrast, we show that side information with adversarial perturbations is as hard as the original problem, even if only a small fraction of the labels are perturbed. We complement this negative result by giving a simple algorithm for the case when the adversary is only allowed to perturb a small fraction of each cluster.
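As a hedged sketch of the natural way to exploit such side information (not the paper's algorithm), one can seed $k$-means with the centroids induced by the noisy labels and then refine with standard Lloyd iterations; under random label noise the seeded centroids are close to the true ones on average, which is the intuition behind the $1+o(1)$ regime. All names and parameters below are illustrative, and the sketch assumes every label value occurs at least once.

import numpy as np

# Seed k-means with label-induced centroids, then run Lloyd iterations.

def seeded_kmeans(X, labels, k, iters=20):
    centers = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recompute means,
        # keeping the old centroid if a cluster empties out.
        assign = np.argmin(
            ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        centers = np.stack([X[assign == j].mean(axis=0) if np.any(assign == j)
                            else centers[j] for j in range(k)])
    return centers, assign

# Two well-separated Gaussian clusters with ~30% of the labels flipped.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(c, 0.2, (50, 2)) for c in ([0, 0], [3, 3])])
labels = np.repeat([0, 1], 50)
labels[rng.random(100) < 0.3] ^= 1  # flip ~30% of the labels
centers, _ = seeded_kmeans(X, labels, k=2)
print(np.round(centers, 2))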