 Combinatorial Optimization: Approximation algorithms for problems like Graph Partitioning, finding Dense Subgraphs.
 Theoretical Machine Learning: Unsupervised learning, and learning probabilistic models using tools like Tensor decompositions.
 Beyond WorstCase Analysis:
Realistic AverageCase instances and Smoothed analysis of algorithms.

AlphaExpansion is Exact on Stable Instances
(With Hunter Lang and David Sontag).
Preprint. [ArXiv Abstract]Approximate algorithms for structured prediction problemssuch as the popular alphaexpansion algorithm (Boykov et al. 2001) in computer visiontypically far exceed their theoretical performance guarantees on realworld instances. These algorithms often find solutions that are very close to optimal. The goal of this paper is to partially explain the performance of alphaexpansion on MAP inference in Ferromagnetic Potts models (FPMs). Our main results use the connection between energy minimization in FPMs and the Uniform Metric Labeling problem to give a stability condition under which the alphaexpansion algorithm provably recovers the optimal MAP solution. This theoretical result complements the numerous empirical observations of alphaexpansion's performance. Additionally, we give a different stability condition under which an LPbased algorithm recovers the optimal solution.

On Learning Mixtures of WellSeparated Gaussians
(With Oded Regev).
FOCS 2017. [ArXivPDF  Abstract]We consider the problem of efficiently learning mixtures of a large number of spherical Gaussians, when the components of the mixture are well separated. In the most basic form of this problem, we are given samples from a uniform mixture of $k$ standard spherical Gaussians with means \mu_1,\ldots,\mu_k in R^d, and the goal is to estimate the means up to accuracy $\delta$ using $poly(k,d, 1/\delta)$ samples. In this work, we study the following question: what is the minimum separation needed between the means for solving this task? The best known algorithm due to Vempala and Wang [JCSS 2004] requires a separation of roughly $\min\{k,d\}^{1/4}$. On the other hand, Moitra and Valiant [FOCS 2010] showed that with separation $o(1)$, exponentially many samples are required. We address the significant gap between these two bounds, by showing the following results.
1. We show that with separation $o(\sqrt{\log k})$, superpolynomially many samples are required. In fact, this holds even when the $k$ means of the Gaussians are picked at random in $d=O(\log k)$ dimensions.
2. We show that with separation $\Omega(\sqrt{\log k})$, $\poly(k,d,1/\delta)$ samples suffice. Notice that the bound on the separation is independent of $\delta$. This result is based on a new and efficient ``accuracy boosting'' algorithm that takes as input coarse estimates of the true means and in time (and samples) $\poly(k,d, 1/\delta)$ outputs estimates of the means up to arbitrarily good accuracy $\delta$ assuming the separation between the means is $\Omega(\min\{\sqrt{\log k},\sqrt{d}\})$ (independently of $\delta$). The idea of the algorithm is to iteratively solve a ``diagonally dominant'' system of nonlinear equations.
We also (1) present a computationally efficient algorithm in $d=O(1)$ dimensions with only $\Omega(\sqrt{d})$ separation, and (2) extend our results to the case that components might have different weights and variances. These results together essentially characterize the optimal order of separation between components that is needed to learn a mixture of $k$ spherical Gaussians with polynomial samples.

Clustering Stable Instances of Euclidean kmeans
(With Abhratanu Dutta and Alex Wang).
NIPS 2017. 
Approximation Algorithms for Label Cover and the LogDensity Threshold
(With Eden Chlamtac, Pasin Manurangsi and Dana Moshkovitz).
SODA 2017. [ArXivSODA  Abstract]Many known optimal NPhardness of approximation results are reductions from a problem called \textsc{LabelCover}. The input is a bipartite graph $G=(L,R,E)$ and each edge $e=(x,y)\in E$ carries a projection $\pi_e$ that maps labels to $x$ to labels to $y$. The objective is to find a labeling of the vertices that satisfies as many of the projections as possible. It is believed that the best approximation ratio efficiently achievable for LabelCover is of the form $N^{c}$ where $N=nk$, $n$ is the number of vertices, $k$ is the number of labels, and $c \in (0,1)$ is some constant.
Inspired by a framework originally developed for Densest $k$Subgraph, we propose a ``log density threshold'' for the approximability of LabelCover. Specifically, we suggest the possibility that the LabelCover approximation problem undergoes a computational phase transition at the same threshold at which local algorithms for its random counterpart fail. This threshold is $N^{32\sqrt{2}}\approx N^{0.17}$. We then design, for any $\varepsilon>0$, a polynomialtime approximation algorithm for {\em semirandom} \textsc{LabelCover} whose approximation ratio is $N^{32\sqrt{2}+\varepsilon}$. In our semirandom model, the input graph is random (or even just expanding), and the projections on the edges are arbitrary.
For worstcase LabelCover we show a polynomialtime algorithm whose approximation ratio is roughly $N^{0.233}$. The previous best efficient approximation ratio was $N^{0.25}$. We present some evidence towards an $N^{c}$ threshold by constructing integrality gaps for $N^{\Omega(1)}$ rounds of the Sumofsquares/Lasserre hierarchy of the natural relaxation of Label Cover. For general 2CSP the ``log density threshold'' is $N^{0.25}$, and we give a polynomialtime algorithm in the semirandom model whose approximation ratio is $N^{0.25+\varepsilon}$ for any $\varepsilon>0$.

Learning Communities in the Presence of Errors
(With Konstantin Makarychev and Yury Makarychev).
COLT 2016. [ ArXiv  Abstract]We study the problem of learning communities in the presence of modeling errors and give robust recovery algorithms for the Stochastic Block Model (SBM). This model, which is also known as the Planted Partition Model, is widely used for community detection and graph partitioning in various fields, including machine learning, statistics, and social sciences. Many algorithms exist for learning communities in the Stochastic Block Model, but they do not work well in the presence of errors. In this paper, we initiate the study of robust algorithms for partial recovery in SBM with modeling errors or noise. We consider graphs generated according to the Stochastic Block Model and then modified by an adversary. We allow two types of adversarial errors, FeigeKilian or monotone errors, and edge outlier errors. Mossel, Neeman and Sly (STOC 2015) posed an open question about whether an almost exact recovery is possible when the adversary is allowed to add o(n) edges. Our work answers this question affirmatively even in the case of k>2 communities.
We then show that our algorithms work not only when the instances come from SBM, but also work when the instances come from any distribution of graphs that is \epsilon m close to SBM in the KullbackLeibler divergence. This result also works in the presence of adversarial errors. Finally, we present almost tight lower bounds for two communities.

Beating the Random Assignment on Constraint Satisfaction Problems of Bounded Degree
(With Boaz Barak, Ankur Moitra, Ryan O'Donnell, Prasad Raghavendra, Oded Regev, David Steurer, Luca Trevisan, David Witmer and John Wright).
APPROX 2015. [ ArXiv  Abstract]We show that for any odd k and any instance of the MaxkXOR constraint satisfaction problem, there is an efficient algorithm that finds an assignment satisfying at least a 1/2+ c/\sqrt{D} fraction of constraints, where D is a bound on the number of constraints that each variable occurs in. This improves both qualitatively and quantitatively on the recent work of Farhi, Goldstone, and Gutmann (2014), which gave a quantum algorithm to find an assignment satisfying a 1/2+ D^{−3/4} fraction of the equations. For arbitrary constraint satisfaction problems, we give a similar result for "trianglefree" instances; i.e., an efficient algorithm that finds an assignment satisfying at least a \mu +c/\sqrt{D} fraction of constraints, where \mu is the fraction that would be satisfied by a uniformly random assignment.

Correlation Clustering with Noisy, Partial Information
(With Konstantin Makarychev and Yury Makarychev).
COLT 2015. [ ArXiv  Abstract]In this paper, we propose and study a semirandom model for the Correlation Clustering problem on arbitrary graphs G. We give two approximation algorithms for Correlation Clustering instances from this model. The first algorithm finds a solution of value (1+\delta)optcost+O_\delta(n log^3 n) with high probability, where optcost is the value of the optimal solution (for every \delta>0). The second algorithm finds the ground truth clustering with an arbitrarily small classification error (under some additional assumptions on the instance).

Learning Mixtures of Ranking Models
(With Pranjal Awasthi, Avrim Blum and Or Sheffet).
NIPS 2014 . [ ArXiv Abstract]This work concerns learning probabilistic models for ranking data in a heterogeneous population. The specific problem we study is learning the parameters of a Mallows Mixture Model . Despite being widely studied, current heuristics for this problem do not have theoretical guarantees and can get stuck in bad local optima.
We present the first polynomial time algorithm which provable learns the parameters of a mixture of two Mallows models. A key component of our algorithm is a novel use of tensor decomposition techniques to learn the topk prefix in both the rankings. In particular, our results imply that in order to recover the top krankings from a heterogeneous population, it is enough for each user to just specify their top3 (unordered) preferences.

Constant Factor Approximations for Balanced Cut in the random PIE model
(With Konstantin Makarychev and Yury Makarychev).
STOC 2014. [ ArXiv  Abstract]We propose and study a new averagecase model for Balanced Cut, a planted model with permutationinvariant random edges (PIE). Our model is much more general than previously studied averagecase models like semirandom models, random planted partitioning models and stochastic block models, and can capture realworld network models like preferential attachment models. We give constant factor approximations for Balanced Cut in this model.
The model : Consider a set of vertices V partitioned into two clusters L and R of equal size. Let G be an arbitrary graph on V with no edges between L and R. Let E_random be a set of edges sampled from an arbitrary permutationinvariant distribution (a distribution that is invariant under renaming of vertices in L and in R). Then we say that G + E_random is a graph with permutationinvariant random edges. For example, in a social network formed by the superposition of separate networks representing different types of ties (say geographic and professional), these different networks are independent.
We present an approximation algorithm for the Balanced Cut problem that finds a balanced cut of cost O(E_random) + n.polylog(n) in this model. In the most interesting regime, this is a constant factor approximation with respect to the cost of the planted cut.

Smoothed Analysis of Tensor Decompositions
(With Aditya Bhaskara, Moses Charikar and Ankur Moitra)
STOC 2014. [ PDF  ArXiv  Abstract ]Low rank tensor decompositions are a powerful tool for learning generative models, and uniqueness results give them a significant advantage over matrix decomposition methods. However, tensors pose significant algorithmic challenges and tensors analogs of much of the matrix algebra toolkit are unlikely to exist because of hardness results. Efficient decomposition in the overcomplete case (where rank exceeds dimension) is particularly challenging. We introduce a smoothed analysis model for studying these questions and develop an efficient algorithm for tensor decomposition in the highly overcomplete case (rank polynomial in the dimension). In this setting, we show that our algorithm is robust to inverse polynomial error  a crucial property for applications in learning since we are only allowed a polynomial number of samples. While algorithms are known for exact tensor decomposition in some overcomplete settings, these are not known to be stable to noise.
Our main technical contribution is to show that tensor products of perturbed vectors are linearly independent in a robust sense (i.e. the associated matrix has singular values that are at least an inverse polynomial). This key result paves the way for applying tensor methods to learning problems in the smoothed setting. In particular, we use it to obtain results for learning multiview models and mixtures of axisaligned Gaussians where there are many more "components" than dimensions. The assumption here is that the model is not adversarially chosen, formalized by a perturbation of model parameters. We believe this an appealing way to analyze realistic instances of learning problems, since this framework allows us to overcome many of the usual limitations of using tensor methods.

Uniqueness of Tensor Decompositions and Polynomial Identifiability of Latent Variable Models
(With Aditya Bhaskara and Moses Charikar)
COLT 2014. [PDF  ArXiv  Abstract ]We give a robust version of the celebrated result of Kruskal on the uniqueness of tensor decompositions: we prove that given a tensor whose decomposition satisfies a robust form of Kruskal's rank condition, it is possible to approximately recover the decomposition if the tensor is known up to a sufficiently small (inverse polynomial) error.
Kruskal's theorem has found many applications in proving the identifiability of parameters for variouslatent variable models and mixture models such as Hidden Markov models, topic models etc. Our robust version immediately implies identifiability using only polynomially many samples in many of these settings. This polynomial identifiability is an essential first step towards efficient learning algorithms for these models.
Recently, algorithms based on tensor decompositions have been used to estimate the parameters of various hidden variable models efficiently in special cases as long as they satisfy certain "nondegeneracy" properties. Our methods give a way to go beyond this nondegeneracy barrier, and establish polynomial identifiability of the parameters under much milder conditions. Given the importance of Kruskal's theorem in the tensor literature, we expect that this robust version will have several applications beyond the settings we explore in this work.
 BiluLinial Stable Instances of Max Cut and Minimum Multiway Cut
(With Konstantin Makarychev and Yury Makarychev)
SODA 2014. [PDF  ArXiv  Abstract]We investigate the notion of stability proposed by Bilu and Linial. We obtain an exact polynomialtime algorithm for $\gamma$stable Max Cut instances with $\gamma >= c\sqrt{log n}loglog n$ for some absolute constant $c > 0$. Our algorithm is robust: it never returns an incorrect answer; if the instance is $\gamma$stable, it finds the maximum cut, otherwise, it either finds the maximum cut or certifies that the instance is not $\gamma$stable. We prove that there is no robust polynomialtime algorithm for $\gamma$stable instances of Max Cut when $\gamma < \alpha_{SC}(n/2)$, where $\alpha_{SC}$ is the best approximation factor for Sparsest Cut with nonuniform demands.
Our algorithm is based on semidefinite programming. We show that the standard SDP relaxation for Max Cut (with $\ell_2^2$ triangle inequalities) is integral if $\gamma \geq D_{\ell_2^2\to \ell_1}(n)$, where $D_{\ell_2^2\to \ell_1}(n)$ is the least distortion with which every n point metric space of negative type embeds into $\ell_1$. On the negative side, we show that the SDP relaxation is not integral when $\gamma < D_{\ell_2^2\to \ell_1}(n/2)$. Moreover, there is no tractable convex relaxation for $\gamma$stable instances of Max Cut when $\gamma < \alpha_{SC}(n/2)$. That suggests that solving $\gamma$stable instances with $\gamma =o(\sqrt{\log n})$ might be difficult or impossible.
Our results significantly improve previously known results. The best previously known algorithm for $\gamma$stable instances of Max Cut required that $\gamma \geq c\sqrt{n}$ (for some $c > 0$) [Bilu, Daniely, Linial, and Saks]. No hardness results were known for the problem. Additionally, we present an algorithm for 4stable instances of Minimum Multiway Cut. We also study a relaxed notion of weak stability.
 Sorting noisy data with partial information
(With Konstantin Makarychev and Yury Makarychev)
ITCS 2013. [PDF  Abstract]nd the minimum set of edges (arcs) whose removal makes a given graph a directed acyclic graph (DAG). The interest in FAS stems from its numerous applications in sorting and ranking based on pairwise information, removing cyclic dependencies etc. However, the best known algorithm for this problem only gives a polylogarithmic approximation in the worstcase, and constant factor approximations have been elusive.Since realworld instances are unlikely to behave like worstcase instances, a compelling question is : can we design better algorithms for realistic average case instances? We propose two realistic averagecase models for the problem, and present new approximation algorithms which obtain a PTAS and constant factor approximations in these two models.
 Approximation algorithms for Semirandom Graph Partitioning problems
(With Konstantin Makarychev and Yury Makarychev)
STOC 2012. [PDF  ArXiv  Abstract]We propose and study a new semirandom model for graph partitioning problems. We believe that it captures many properties of real
WThe model is more flexible than the semirandom model of Feige and Kilian and planted random model of Bui, Chaudhuri, Leighton and Sipser.
In our semirandom graph partitioning model, we need just edges across different clusters to be (semi)random. The edges inside clusters can be completely arbitrary (unlike previous models).
We develop a general framework for solving semirandom instances using new SDPbased techniques and apply it to several problems of interest. We present constant factor approximation algorithms for semirandom instances of the Balanced Cut, Multicut, Min Uncut and SmallSet Expansion problems. We also show how to almost recover the optimal solution if the instance satisfies an additional expanding condition. Our algorithms work in a wider range of parameters than algorithms for previously studied random and semirandom models.
 Polynomial Integrality gaps for Strong relaxations of Densest ksubgraph
(With Aditya Bhaskara, Moses Charikar, Venkatesan Guruswami and Yuan Zhou)
SODA 2012. [PDF  ArXiv  Abstract]The densest ksubgraph (DkS) problem (i.e. find a size k subgraph with maximum number of edges), is one of the notorious problems in approximation algorithms. There is a significant gap between known upper and lower bounds for DkS: the current best algorithm gives an ~ $O(n^{1/4})$ approximation, while even showing a small constant factor hardness requires significantly stronger assumptions than P != NP. In addition to interest in designing better algorithms, a number of recent results have exploited the conjectured hardness of densest ksubgraph and its variants. Thus, understanding the approximability of DkS is an important challenge. In this work, we give evidence for the hardness of approximating DkS within polynomial factors. Specifically, we expose the limitations of strong semidefinite programs from SDP hierarchies in solving densest ksubgraph. Our results include:
* A lower bound of $\Omega(n^{1/4}/\log^3 n)$ on the integrality gap for $\Omega(\log n/\log \log n)$ rounds of the SheraliAdams relaxation for DkS. This also holds for the relaxation obtained from SheraliAdams with an added SDP constraint. Our gap instances are in fact ErdosRenyi random graphs.
* For every $\epsilon > 0$, a lower bound of $n^{2/53\epsilon}$ on the integrality gap of $n^{\Omega(\epsilon)}$ rounds of the Lasserre SDP relaxation for DkS, and an $n^{Omega_\epsilon(1)}$ gap for $n^{1\epsilon}$ rounds. Our construction proceeds via a reduction from random instances of a certain MaxCSP over large domains. In the absence of inapproximability results for DkS, our results show that even the most powerful SDPs are unable to beat a factor of $n^{\Omega(1)}$, and in fact even improving the best known $n^{1/4}$ factor is a barrier for current techniques.  Algorithms and Hardness of the kroute cuts problem
(With Julia Chuzhoy, Yury Makarychev and Yuan Zhou)
SODA 2012. [PDF  ArXiv  Abstract]We study the kroute cut problem: given an undirected edgeweighted graph G=(V,E), a collection {(s_1,t_1),(s_2,t_2),...,(s_r,t_r)} of sourcesink pairs, and an integer connectivity requirement k, the goal is to find a minimumweight subset E' of edges to remove, such that the connectivity of every pair (s_i, t_i) falls below k. Specifically, in the edgeconnectivity version, ECkRC, the requirement is that there are at most (k1) edgedisjoint paths connecting s_i to t_i in G \ E', while in the vertexconnectivity version, NCkRC, the same requirement is for vertexdisjoint paths. Prior to our work, polylogarithmic approximation algorithms have been known for the special case where k >= 3, but no nontrivial approximation algorithms were known for any value k>3, except in the singlesource setting. We show an O(k log^{3/2}r)approximation algorithm for ECkRC with uniform edge weights, and several polylogarithmic bicriteria approximation algorithms for ECkRC and NCkRC, where the connectivity requirement k is violated by a constant factor. We complement these upper bounds by proving that NCkRC is hard to approximate to within a factor of k^{eps} for some fixed eps>0.
We then turn to study a simpler version of NCkRC, where only one sourcesink pair is present. We give a simple bicriteria approximation algorithm for this case, and show evidence that even this restricted version of the problem may be hard to approximate. For example, we prove that the single sourcesink pair version of NCkRC has no constantfactor approximation, assuming Feige's Random kAND assumption.
 On Quadratic Programming with a ratio objective
(With Aditya Bhaskara, Moses Charikar and Rajsekar Manokaran)
ICALP 2012. [PDF  ArXiv ]
 Approximating the matrix pnorm
(With Aditya Bhaskara)
SODA 2011. [PDF  Abstract]We consider the problem of computing the q>p norm of a matrix A, which is defined for p,q >= 1, as A_{q>p} = Max_{x} Ax_{p}, where x_{q}=1
This is in general a nonconvex optimization problem, and is a natural generalization of the wellstudied question of computing singular values ( when p=q=2). Different settings of parameters give rise to a variety of known interesting problems (such as the Grothendieck problem when p=1 and q=\infty ). Our first result is an efficient algorithm for computing the q>p norm of matrices with nonnegative entries, when q >= p >= 1. The algorithm we analyze can be seen as an analog of power iteration for computing eigenvalues.
We then present an application of our techniques in the oblivious routing setting: we make constructive a recent existential result of Englert and Racke on O(log n) competitive oblivious routing schemes in the l_p norm.
On the other hand, for general matrices, we show "almost polynomial" factor hardness for 2 < p <= q and p <= q < 2 if NP does not have quasipolynomial time algorithms.  Detecting High LogDensities  an O(n^{1/4}) Approximation for Densest kSubgraph
(With Aditya Bhaskara, Moses Charikar, Eden Chlamtac and Uri Feige)
STOC 2010. [PDF  Abstract]In the Densest kSubgraph problem, given a graph G and a parameter k, one needs to find a subgraph of G induced on k vertices that contains the largest number of edges. There is a significant gap between the best known upper and lower bounds for this problem. It is NPhard, and does not have a PTAS unless NP has subexponential time algorithms. On the other hand, the current best known algorithm of Feige, Kortsarz and Peleg~\cite{FKP}, gives an approximation ratio of $n^{1/3  \epsilon}$ for some specific $\epsilon > 0$ (estimated by those authors at around $\epsilon = 1/60$).
We present an algorithm that for every $\epsilon> 0$ approximates the Densest $k$Subgraph problem within a ratio of $n^{1/4 + \epsilon}$ in time $n^{O(1/\epsilon)}$. If allowed to run for time $n^{O(\log n)}$, our algorithm achieves an approximation ratio of $O(n^{1/4})$. Our algorithm is inspired by studying an averagecase version of the problem where the goal is to distinguish random graphs from random graphs with planted dense subgraphs  the approximation ratio we achieve for the general case matches the ``distinguishing ratio'' we obtain for this planted problem. Achieving a distinguishing ratio of $o(n^{1/4})$ for the planted problem (in polynomial time) is beyond the reach of our current techniques.
At a high level, our algorithms involve cleverly counting appropriately defined trees of constant size in $G$, and using these counts to identify the vertices of the dense subgraph. Our algorithm is based on the following principle. We say that a graph $G(V,E)$ has logdensity $\alpha$ if its average degree is $\Theta(V^{\alpha})$. The algorithmic core of our result is a family of algorithms that output $k$subgraphs of nontrivial density whenever the logdensity of the densest $k$subgraph is larger than the logdensity of the host graph.
Finally, we extend this algorithm to obtain an $O(n^{1/4  \epsilon})$approximation algorithm which runs in time $O(2^{n^{O(\epsilon)}})$ and also explore various approaches to obtain better approximation algorithms in restricted parameter settings for random instances.

Graph Isomorphism: Approximate and Robust
(With Yu Wu, Yuichi Yoshida and Yuan Zhou).
 Noncooperative Bundling games
(With Ravishankar Krishnaswamy, Pandu Rangan Chandrasekaran and Ravi Sundaram)
[Manuscript Abstract]Bundling is a basic problem in commerce. We define and study the Bundling Game: each of n players with a unique product to sell can choose to sell their products in bundles with other players' products so as to maximize payoff; the players have knowledge of the valuations of m (potential) buyers for each of the products. We define two natural classes of payoff functions  Fair Profit Sharing Functions (FPSFs) and Incentive Functions (IFs). For FPSFs we prove the surprising impossibility result that there exists no Fair Profit Sharing Payoff Function that can always guarantee stability. We counterbalance this result by showing that pure Nash equilibria always exist for IFs and obtain tight bounds on the Price of Anarchy. In particular, the revenue maximizing bundling is a pure Nash equilibrium. Motivated by the bundling game, we present hardness results and approximation algorithms for finding revenue maximizing bundlings and bundles and associated variants.
Some older work includes