 Algorithms: Designing efficient algorithms for problems in combinatorial optimization and data analysis.
 Theoretical Machine Learning: Designing efficient and robust algorithms for problems in machine learning.
 Beyond Worst-Case Analysis: Improved algorithmic guarantees using paradigms like realistic average-case models, stable instances, and smoothed analysis.

Efficient Tensor Decomposition.
Book Chapter in Beyond the Worst-Case Analysis of Algorithms,
Edited by Tim Roughgarden, Cambridge University Press, 2020.
[ArXiv | Book Link] Abstract: This chapter studies the problem of decomposing a tensor into a sum of constituent rank-one tensors. While tensor decompositions are very useful in designing learning algorithms and in data analysis, computing them is NP-hard in the worst case. We will see how to design efficient algorithms with provable guarantees under mild assumptions, and using beyond worst-case frameworks like smoothed analysis.
Adversarial Robustness via Robust Low Rank Representations.
(With Pranjal Awasthi, Himanshu Jain and Ankit Singh Rawat).
[ArXiv] Abstract: Adversarial robustness measures the susceptibility of a classifier to imperceptible perturbations made to the inputs at test time. In this work we highlight the benefits of natural low rank representations, which often exist for real data such as images, for training neural networks with certified robustness guarantees.
Our first contribution is for certified robustness to perturbations measured in the ℓ2 norm. We exploit low rank data representations to provide improved guarantees over state-of-the-art randomized smoothing-based approaches on standard benchmark datasets such as CIFAR-10 and CIFAR-100.
Our second contribution is for the more challenging setting of certified robustness to perturbations measured in the ℓ∞ norm. We demonstrate empirically that natural low rank representations have inherent robustness properties that can be leveraged to provide significantly better guarantees for certified robustness to ℓ∞ perturbations in those representations. Our certificate of ℓ∞ robustness relies on a natural quantity involving the ∞→2 matrix operator norm associated with the representation, to translate robustness guarantees from ℓ2 to ℓ∞ perturbations.
A key technical ingredient for our certification guarantees is a fast algorithm, based on the multiplicative weights update method, that provides provable upper bounds on the above matrix norm. Our algorithmic guarantees improve upon the state of the art for this problem and may be of independent interest.
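The quantity in question is $\|M\|_{\infty\to 2} = \max_{\|x\|_\infty \le 1} \|Mx\|_2$, which is intractable to compute exactly in general. As a hedged illustration (plain Python; this is a toy sketch, not the multiplicative-weights algorithm from the paper), brute force over sign vectors is exact for tiny matrices, and a simple relaxation gives the kind of cheap upper bound a certificate needs:

```python
import itertools
import math

def inf_to_2_norm_bruteforce(M):
    """Exact ||M||_{inf->2} = max over x in {-1,+1}^n of ||Mx||_2.

    The function x -> ||Mx||_2 is convex, so its maximum over the cube
    [-1,1]^n is attained at a vertex; enumerating sign vectors is exact
    (but exponential in n, so only usable for tiny matrices).
    """
    n = len(M[0])
    best = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        val = math.sqrt(sum(sum(row[j] * signs[j] for j in range(n)) ** 2
                            for row in M))
        best = max(best, val)
    return best

def inf_to_2_norm_upper_bound(M):
    """Cheap upper bound: sqrt(sum_ij |(M^T M)_ij|) >= ||M||_{inf->2},
    since ||Mx||_2^2 = x^T (M^T M) x <= sum_ij |(M^T M)_ij| for x in {-1,1}^n.
    """
    n = len(M[0])
    G = [[sum(row[i] * row[j] for row in M) for j in range(n)]
         for i in range(n)]
    return math.sqrt(sum(abs(G[i][j]) for i in range(n) for j in range(n)))
```

For example, for the single row $M = [1\ 1]$ both the brute force and the bound give exactly $2$, attained at $x=(1,1)$.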

Scheduling Precedence-Constrained Jobs on Related Machines with Communication Delay.
(With Biswaroop Maiti, Rajmohan Rajaraman, David Stalfa and Zoya Svitkina).
FOCS 2020. [ArXiv] Abstract: We consider the problem of scheduling $n$ precedence-constrained jobs on $m$ uniformly-related machines in the presence of an arbitrary, fixed communication delay $\rho$. We consider a model that allows job duplication, i.e., processing of the same job on multiple machines, which, as we show, can reduce the length of a schedule (i.e., its makespan) by a logarithmic factor. Our main result is an $O(\log m \log \rho/\log\log \rho)$-approximation algorithm for minimizing makespan, assuming the minimum makespan is at least $\rho$. Our algorithm is based on rounding a linear programming relaxation for the problem, which includes carefully designed constraints capturing the interaction among communication delay, precedence requirements, varying speeds, and job duplication. Our result builds on two previous lines of work, one with communication delay but identical machines (Lepere and Rapine, 2002) and the other with uniformly-related machines but no communication delay (Chudak and Shmoys, 1999).
We next show that the integrality gap of our mathematical program is $\Omega(\sqrt{\log \rho})$. Our gap construction employs expander graphs and exploits a property of robust expansion and its generalization to paths of longer length. Finally, we quantify the advantage of duplication in scheduling with communication delay. We show that the best schedule without duplication can have makespan $\Omega(\rho/\log\rho)$, $\Omega(\log m/\log\log m)$, or $\Omega(\log n/\log\log n)$ times that of an optimal schedule allowing duplication. Nevertheless, we present a polynomial time algorithm to transform any schedule into a schedule without duplication at the cost of an $O(\log^2 n\log m)$ factor increase in makespan. Together with our makespan approximation algorithm for schedules allowing duplication, this also yields a polylogarithmic-approximation algorithm for the setting where duplication is not allowed.

Estimating Principal Components under Adversarial Perturbations.
(With Pranjal Awasthi and Xue Chen).
COLT 2020. [ArXiv] Abstract: Robustness is a key requirement for widespread deployment of machine learning algorithms, and has received much attention in both statistics and computer science. We study a natural model of robustness for high-dimensional statistical estimation problems that we call the adversarial perturbation model. An adversary can perturb every sample arbitrarily up to a specified magnitude $\delta$ measured in some $\ell_q$ norm, say $\ell_\infty$. Our model is motivated by emerging paradigms such as low precision machine learning and adversarial training.
We study the classical problem of estimating the top-$r$ principal subspace of the Gaussian covariance matrix in high dimensions, under the adversarial perturbation model. We design a computationally efficient algorithm that, given corrupted data, recovers an estimate of the top-$r$ principal subspace with error that depends on a robustness parameter that we identify. This parameter corresponds to the $q \to 2$ operator norm of the projector onto the principal subspace, and generalizes well-studied analytic notions of sparsity. Additionally, in the absence of corruptions, our algorithmic guarantees recover existing bounds for problems such as sparse PCA and its higher rank analogs. We also prove that the above dependence on the robustness parameter is almost optimal asymptotically, not just in a minimax sense, but remarkably for every instance of the problem. This instance-optimal guarantee shows that the $q \to 2$ operator norm of the subspace essentially characterizes the estimation error under adversarial perturbations.

Adversarially Robust Low Dimensional Representations.
(With Pranjal Awasthi, Vaggos Chatziafratis and Xue Chen).
[ArXiv] Abstract: Adversarial or test-time robustness measures the susceptibility of a machine learning system to small perturbations made to the input at test time. This has attracted much interest on the empirical side, since many existing ML systems perform poorly under imperceptible adversarial perturbations to the test inputs. On the other hand, our theoretical understanding of this phenomenon is limited, and has mostly focused on supervised learning tasks.
In this work we study the problem of computing adversarially robust representations of data. We formulate a natural extension of Principal Component Analysis (PCA) where the goal is to find a low dimensional subspace that represents the given data with minimum projection error, and that is in addition robust to small perturbations measured in the $\ell_q$ norm (say $q=\infty$). Unlike PCA, which is solvable in polynomial time, our formulation is computationally intractable to optimize, as it captures the well-studied sparse PCA objective.
We show the following algorithmic and statistical results.
1. Polynomial time algorithms in the worst case that achieve constant factor approximations to the objective while only violating the robustness constraint by a constant factor.
2. We prove that our formulation (and algorithms) also enjoy significant statistical benefits, in terms of sample complexity over standard PCA, on account of a ``regularization effect'' that is formalized using the well-studied spiked covariance model.
3. Surprisingly, we show that our algorithmic techniques can also be made robust to corruptions in the training data, in addition to yielding representations that are robust at test time! Here an adversary is allowed to corrupt potentially every data point up to a specified amount in the $\ell_q$ norm. We further apply these techniques to mean estimation and clustering under adversarial corruptions to the training data.

On Robustness to Adversarial Examples and Polynomial Optimization.
(With Pranjal Awasthi and Abhratanu Dutta).
NeurIPS 2019. [ArXiv | Slides] Abstract: We study the design of computationally efficient algorithms with provable guarantees that are robust to adversarial (test time) perturbations. While there has been an explosion of recent work on this topic due to its connections to test time robustness of deep networks, there is limited theoretical understanding of several basic questions, including \emph{when and how can one design provably robust learning algorithms?} and \emph{what is the price of achieving robustness to adversarial examples in a computationally efficient manner?}
The main contribution of this work is to exhibit a strong connection between achieving robustness to adversarial examples and a rich class of polynomial optimization problems, thereby making progress on the above questions. In particular, we leverage this connection to (a) design the first computationally efficient robust algorithms with provable guarantees for a large class of hypothesis classes, namely linear classifiers and degree-$2$ polynomial threshold functions~(PTFs), (b) give a precise characterization of the price of achieving robustness in a computationally efficient manner for these classes, and (c) design efficient algorithms to certify robustness and generate adversarial attacks in a principled manner for $2$-layer neural networks. We empirically demonstrate the effectiveness of these attacks on real data.

Smoothed Analysis in Unsupervised Learning via Decoupling
(With Aditya Bhaskara, Aidao Chen and Aidan Perreault).
FOCS 2019. [ArXiv] Abstract: Smoothed analysis is a powerful paradigm for overcoming worst-case intractability in unsupervised learning and high-dimensional data analysis. While polynomial time smoothed analysis guarantees have been obtained for worst-case intractable problems like tensor decompositions and learning mixtures of Gaussians, such guarantees have been hard to obtain for several other important problems in unsupervised learning. A core technical challenge is obtaining lower bounds on the least singular value for random matrix ensembles with dependent entries that are given by low-degree polynomials of a few base underlying random variables.
In this work, we address this challenge by obtaining high-confidence lower bounds on the least singular value of new classes of structured random matrix ensembles of the above kind. We then use these bounds to obtain polynomial time smoothed analysis guarantees for the following three important problems in unsupervised learning:
1. Robust subspace recovery, when the fraction $\alpha$ of inliers in the $d$-dimensional subspace $T \subset \mathbb{R}^n$ is at least $\alpha > (d/n)^{\ell}$ for any constant integer $\ell>0$. This contrasts with the known worst-case intractability when $\alpha < d/n$, and the previous smoothed analysis result, which needed $\alpha > d/n$ (Hardt and Moitra, 2013).
2. Higher order tensor decompositions, where we generalize the so-called FOOBI algorithm of Cardoso to find order-$\ell$ rank-one tensors in a subspace. This allows us to obtain polynomially robust decomposition algorithms for $2\ell$-th order tensors with rank $O(n^{\ell})$.
3. Learning overcomplete hidden Markov models, where the size of the state space is any polynomial in the dimension of the observations. This gives the first polynomial time guarantees for learning overcomplete HMMs in a smoothed analysis model.

Block Stability for MAP Inference
(With Hunter Lang and David Sontag).
AISTATS 2019. [ArXiv] Abstract: To understand the empirical success of approximate MAP inference, recent work (Lang et al., 2018) has shown that some popular approximation algorithms perform very well when the input instance is stable. The simplest stability condition assumes that the MAP solution does not change at all when some of the pairwise potentials are (adversarially) perturbed. Unfortunately, this strong condition does not seem to be satisfied in practice. In this paper, we introduce a significantly more relaxed condition that only requires blocks (portions) of an input instance to be stable. Under this block stability condition, we prove that the pairwise LP relaxation is persistent on the stable blocks. We complement our theoretical results with an empirical evaluation of real-world MAP inference instances from computer vision. We design an algorithm to find stable blocks, and find that these real instances have large stable regions. Our work gives a theoretical explanation for the widespread empirical phenomenon of persistency for this LP relaxation.

Towards Learning Sparsely Used Dictionaries with Arbitrary Supports
(With Pranjal Awasthi).
FOCS 2018. [ArXiv] Abstract: Dictionary learning is a popular approach for inferring a hidden basis or dictionary in which data has a sparse representation. Data generated from the dictionary $A$ (an $n \times m$ matrix, with $m > n$ in the overcomplete setting) is given by $Y = AX$, where $X$ is a matrix whose columns have supports chosen from a distribution over $k$-sparse vectors, and whose nonzero values are chosen from a symmetric distribution. Given $Y$, the goal is to recover $A$ and $X$ in polynomial time. Existing algorithms give polynomial time guarantees for recovering incoherent dictionaries, under strong distributional assumptions both on the supports of the columns of $X$ and on the values of the nonzero entries. In this work, we study the following question: can we design efficient algorithms for recovering dictionaries when the supports of the columns of $X$ are arbitrary? To address this question while circumventing the issue of non-identifiability, we study a natural semi-random model for dictionary learning where there are a large number of samples $y=Ax$ with arbitrary $k$-sparse supports for $x$, along with a few samples where the sparse supports are chosen uniformly at random. While the few samples with random supports ensure identifiability, the support distribution can look almost arbitrary in aggregate. Hence existing algorithmic techniques seem to break down, as they make strong assumptions on the supports. Our main contribution is a new polynomial time algorithm for learning incoherent overcomplete dictionaries that works under the semi-random model. Additionally, the same algorithm provides polynomial time guarantees in new parameter regimes when the supports are fully random. Finally, using these techniques, we also identify a minimal set of conditions on the supports under which the dictionary can be (information-theoretically) recovered from polynomially many samples for almost linear sparsity, i.e., $k=\widetilde{O}(n)$.
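To make the generative model concrete, here is a minimal sketch of how one sample is produced (plain Python; the dimensions and the $\pm 1$ value distribution are hypothetical choices for illustration): each observation is a combination of at most $k$ columns of the dictionary.

```python
import random

def sample_column(A, k, rng):
    """Draw one sample y = A x, where x is k-sparse with symmetric nonzeros.

    A is an n x m dictionary given as a list of n rows. Here the support of
    x is uniformly random; the semi-random model in the paper allows
    arbitrary k-sparse supports for most samples.
    """
    n, m = len(A), len(A[0])
    support = rng.sample(range(m), k)
    x = [0.0] * m
    for j in support:
        x[j] = rng.choice([-1.0, 1.0])  # symmetric distribution over nonzeros
    y = [sum(A[i][j] * x[j] for j in support) for i in range(n)]
    return y, x

rng = random.Random(0)
n, m, k = 4, 8, 2  # hypothetical sizes; overcomplete since m > n
A = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)]
y, x = sample_column(A, k, rng)
```

The learner only sees many such vectors $y$ and must recover both $A$ and the sparse coefficients.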

Clustering Semi-Random Mixtures of Gaussians
(With Pranjal Awasthi).
ICML 2018. [ArXiv] Abstract: Gaussian mixture models (GMMs) are the most widely used statistical model for the k-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural semi-random model for k-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. In our model, a semi-random adversary is allowed to make arbitrary "monotone" or helpful changes to the data generated from the Gaussian mixture model. Our first contribution is a polynomial time algorithm that provably recovers the ground truth up to small classification error w.h.p., assuming certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd's algorithm for k-means clustering, which is the method of choice in practice. Our second result complements the upper bound by giving a nearly matching information-theoretic lower bound on the number of misclassified points incurred by any k-means clustering algorithm on the semi-random model.
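For reference, a minimal sketch of Lloyd's algorithm, the heuristic analyzed above (plain Python on a hypothetical 1-D toy input; the paper's analysis of course concerns the high-dimensional Gaussian setting):

```python
def lloyds(points, centers, iters=20):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster
        # (keeping the old center if a cluster becomes empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Two well-separated 1-D clusters around 0 and 10.
points = [0.1, -0.2, 0.0, 9.8, 10.1, 10.3]
centers = lloyds(points, centers=[1.0, 8.0])
```

With enough separation between the clusters, the centers converge to the two cluster means, which is the phenomenon the semi-random analysis makes rigorous.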

Optimality of Approximate Inference Algorithms on Stable Instances
(With Hunter Lang and David Sontag).
AISTATS 2018. [ArXiv] Abstract: Approximate algorithms for structured prediction problems, such as the popular alpha-expansion algorithm (Boykov et al., 2001) in computer vision, typically far exceed their theoretical performance guarantees on real-world instances. These algorithms often find solutions that are very close to optimal. The goal of this paper is to partially explain the performance of alpha-expansion on MAP inference in Ferromagnetic Potts models (FPMs). Our main results use the connection between energy minimization in FPMs and the Uniform Metric Labeling problem to give a stability condition under which the alpha-expansion algorithm provably recovers the optimal MAP solution. This theoretical result complements the numerous empirical observations of alpha-expansion's performance. Additionally, we give a different stability condition under which an LP-based algorithm recovers the optimal solution.

On Learning Mixtures of Well-Separated Gaussians
(With Oded Regev).
FOCS 2017. [ArXiv | PDF] Abstract: We consider the problem of efficiently learning mixtures of a large number of spherical Gaussians when the components of the mixture are well separated. In the most basic form of this problem, we are given samples from a uniform mixture of $k$ standard spherical Gaussians with means $\mu_1,\ldots,\mu_k \in \mathbb{R}^d$, and the goal is to estimate the means up to accuracy $\delta$ using $\mathrm{poly}(k,d,1/\delta)$ samples. In this work, we study the following question: what is the minimum separation needed between the means for solving this task? The best known algorithm due to Vempala and Wang [JCSS 2004] requires a separation of roughly $\min\{k,d\}^{1/4}$. On the other hand, Moitra and Valiant [FOCS 2010] showed that with separation $o(1)$, exponentially many samples are required. We address the significant gap between these two bounds by showing the following results.
1. We show that with separation $o(\sqrt{\log k})$, super-polynomially many samples are required. In fact, this holds even when the $k$ means of the Gaussians are picked at random in $d=O(\log k)$ dimensions.
2. We show that with separation $\Omega(\sqrt{\log k})$, $\mathrm{poly}(k,d,1/\delta)$ samples suffice. Notice that the bound on the separation is independent of $\delta$. This result is based on a new and efficient ``accuracy boosting'' algorithm that takes as input coarse estimates of the true means and, in time and samples $\mathrm{poly}(k,d,1/\delta)$, outputs estimates of the means up to arbitrarily good accuracy $\delta$, assuming the separation between the means is $\Omega(\min\{\sqrt{\log k},\sqrt{d}\})$ (independently of $\delta$). The idea of the algorithm is to iteratively solve a ``diagonally dominant'' system of nonlinear equations.
We also (1) present a computationally efficient algorithm in $d=O(1)$ dimensions with only $\Omega(\sqrt{d})$ separation, and (2) extend our results to the case where components might have different weights and variances. These results together essentially characterize the optimal order of separation between components that is needed to learn a mixture of $k$ spherical Gaussians with polynomial samples.
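Diagonal dominance is what makes such iterative refinement converge. As a loose analogy (plain Python; this is the classical Jacobi iteration for a *linear* diagonally dominant system, not the paper's nonlinear boosting algorithm), each update contracts the error geometrically:

```python
def jacobi(A, b, iters=50):
    """Jacobi iteration: x[i] <- (b[i] - sum_{j != i} A[i][j] * x[j]) / A[i][i].

    Converges geometrically when A is strictly diagonally dominant,
    i.e. |A[i][i]| > sum_{j != i} |A[i][j]| for every row i.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

A = [[4.0, 1.0], [2.0, 5.0]]  # strictly diagonally dominant
b = [9.0, 16.0]
x = jacobi(A, b)  # converges to the solution of Ax = b
```

The dominant diagonal terms mean each coordinate update is driven mostly by its own equation, so coarse estimates are refined to arbitrary accuracy, which mirrors the boosting step described above.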

Clustering Stable Instances of Euclidean k-means
(With Abhratanu Dutta and Alex Wang).
NIPS 2017. [ArXiv] Abstract: The Euclidean k-means problem is arguably the most widely studied clustering problem in machine learning. While the k-means objective is NP-hard in the worst case, practitioners have enjoyed remarkable success in applying heuristics like Lloyd's algorithm to this problem. To address this disconnect, we study the following question: what properties of real-world instances will enable us to design efficient algorithms and prove guarantees for finding the optimal clustering? We consider a natural notion called additive perturbation stability that we believe captures many practical instances. Stable instances have unique optimal k-means solutions that do not change even when each point is perturbed a little (in Euclidean distance). This captures the property that the k-means optimal solution should be tolerant to measurement errors and uncertainty in the points. We design efficient algorithms that provably recover the optimal clustering for instances that are additive perturbation stable. When the instance has some additional separation, we show an efficient algorithm with provable guarantees that is also robust to outliers. We complement these results by studying the amount of stability in real datasets and demonstrating that our algorithm performs well on these benchmark datasets.

Approximation Algorithms for Label Cover and the Log-Density Threshold
(With Eden Chlamtac, Pasin Manurangsi and Dana Moshkovitz).
SODA 2017. [ArXiv | SODA] Abstract: Many known optimal NP-hardness of approximation results are reductions from a problem called \textsc{LabelCover}. The input is a bipartite graph $G=(L,R,E)$, and each edge $e=(x,y)\in E$ carries a projection $\pi_e$ that maps labels of $x$ to labels of $y$. The objective is to find a labeling of the vertices that satisfies as many of the projections as possible. It is believed that the best approximation ratio efficiently achievable for LabelCover is of the form $N^{c}$ where $N=nk$, $n$ is the number of vertices, $k$ is the number of labels, and $c \in (0,1)$ is some constant.
Inspired by a framework originally developed for Densest $k$-Subgraph, we propose a ``log density threshold'' for the approximability of LabelCover. Specifically, we suggest the possibility that the LabelCover approximation problem undergoes a computational phase transition at the same threshold at which local algorithms for its random counterpart fail. This threshold is $N^{3-2\sqrt{2}}\approx N^{0.17}$. We then design, for any $\varepsilon>0$, a polynomial-time approximation algorithm for {\em semi-random} \textsc{LabelCover} whose approximation ratio is $N^{3-2\sqrt{2}+\varepsilon}$. In our semi-random model, the input graph is random (or even just expanding), and the projections on the edges are arbitrary.
For worst-case LabelCover we show a polynomial-time algorithm whose approximation ratio is roughly $N^{0.233}$. The previous best efficient approximation ratio was $N^{0.25}$. We present some evidence towards an $N^{c}$ threshold by constructing integrality gaps for $N^{\Omega(1)}$ rounds of the Sum-of-Squares/Lasserre hierarchy of the natural relaxation of Label Cover. For general 2-CSP the ``log density threshold'' is $N^{0.25}$, and we give a polynomial-time algorithm in the semi-random model whose approximation ratio is $N^{0.25+\varepsilon}$ for any $\varepsilon>0$.

Learning Communities in the Presence of Errors
(With Konstantin Makarychev and Yury Makarychev).
COLT 2016. [ArXiv] Abstract: We study the problem of learning communities in the presence of modeling errors, and give robust recovery algorithms for the Stochastic Block Model (SBM). This model, which is also known as the Planted Partition Model, is widely used for community detection and graph partitioning in various fields, including machine learning, statistics, and social sciences. Many algorithms exist for learning communities in the Stochastic Block Model, but they do not work well in the presence of errors. In this paper, we initiate the study of robust algorithms for partial recovery in SBM with modeling errors or noise. We consider graphs generated according to the Stochastic Block Model and then modified by an adversary. We allow two types of adversarial errors: Feige-Kilian or monotone errors, and edge outlier errors. Mossel, Neeman and Sly (STOC 2015) posed an open question about whether almost exact recovery is possible when the adversary is allowed to add $o(n)$ edges. Our work answers this question affirmatively even in the case of $k>2$ communities.
We then show that our algorithms work not only when the instances come from the SBM, but also when the instances come from any distribution of graphs that is $\epsilon m$-close to the SBM in Kullback-Leibler divergence. This result also holds in the presence of adversarial errors. Finally, we present almost tight lower bounds for two communities.

Beating the Random Assignment on Constraint Satisfaction Problems of Bounded Degree
(With Boaz Barak, Ankur Moitra, Ryan O'Donnell, Prasad Raghavendra, Oded Regev, David Steurer, Luca Trevisan, David Witmer and John Wright).
APPROX 2015. [ArXiv] Abstract: We show that for any odd $k$ and any instance of the Max-$k$-XOR constraint satisfaction problem, there is an efficient algorithm that finds an assignment satisfying at least a $1/2 + c/\sqrt{D}$ fraction of constraints, where $D$ is a bound on the number of constraints that each variable occurs in. This improves both qualitatively and quantitatively on the recent work of Farhi, Goldstone, and Gutmann (2014), which gave a quantum algorithm to find an assignment satisfying a $1/2 + D^{-3/4}$ fraction of the equations. For arbitrary constraint satisfaction problems, we give a similar result for "triangle-free" instances; i.e., an efficient algorithm that finds an assignment satisfying at least a $\mu + c/\sqrt{D}$ fraction of constraints, where $\mu$ is the fraction that would be satisfied by a uniformly random assignment.

Correlation Clustering with Noisy, Partial Information
(With Konstantin Makarychev and Yury Makarychev).
COLT 2015. [ArXiv] Abstract: In this paper, we propose and study a semi-random model for the Correlation Clustering problem on arbitrary graphs $G$. We give two approximation algorithms for Correlation Clustering instances from this model. The first algorithm finds a solution of value $(1+\delta)\,\mathrm{optcost} + O_{\delta}(n \log^3 n)$ with high probability, where $\mathrm{optcost}$ is the value of the optimal solution (for every $\delta>0$). The second algorithm finds the ground truth clustering with an arbitrarily small classification error (under some additional assumptions on the instance).

Learning Mixtures of Ranking Models
(With Pranjal Awasthi, Avrim Blum and Or Sheffet).
NIPS 2014. [ArXiv] Abstract: This work concerns learning probabilistic models for ranking data in a heterogeneous population. The specific problem we study is learning the parameters of a Mallows mixture model. Despite being widely studied, current heuristics for this problem do not have theoretical guarantees and can get stuck in bad local optima.
We present the first polynomial time algorithm which provably learns the parameters of a mixture of two Mallows models. A key component of our algorithm is a novel use of tensor decomposition techniques to learn the top-$k$ prefix of both the rankings. In particular, our results imply that in order to recover the top-$k$ rankings from a heterogeneous population, it is enough for each user to just specify their top three (unordered) preferences.

Constant Factor Approximations for Balanced Cut in the Random PIE Model
(With Konstantin Makarychev and Yury Makarychev).
STOC 2014. [ArXiv] Abstract: We propose and study a new average-case model for Balanced Cut, a planted model with permutation-invariant random edges (PIE). Our model is much more general than previously studied average-case models like semi-random models, random planted partitioning models, and stochastic block models, and can capture real-world network models like preferential attachment models. We give constant factor approximations for Balanced Cut in this model.
The model: Consider a set of vertices $V$ partitioned into two clusters $L$ and $R$ of equal size. Let $G$ be an arbitrary graph on $V$ with no edges between $L$ and $R$. Let $E_{random}$ be a set of edges sampled from an arbitrary permutation-invariant distribution (a distribution that is invariant under renaming of vertices in $L$ and in $R$). Then we say that $G + E_{random}$ is a graph with permutation-invariant random edges. For example, in a social network formed by the superposition of separate networks representing different types of ties (say geographic and professional), these different networks are independent.
We present an approximation algorithm for the Balanced Cut problem that finds a balanced cut of cost $O(|E_{random}|) + n\,\mathrm{polylog}(n)$ in this model. In the most interesting regime, this is a constant factor approximation with respect to the cost of the planted cut.

Smoothed Analysis of Tensor Decompositions
(With Aditya Bhaskara, Moses Charikar and Ankur Moitra)
STOC 2014. [PDF | ArXiv] Abstract: Low rank tensor decompositions are a powerful tool for learning generative models, and uniqueness results give them a significant advantage over matrix decomposition methods. However, tensors pose significant algorithmic challenges, and tensor analogs of much of the matrix algebra toolkit are unlikely to exist because of hardness results. Efficient decomposition in the overcomplete case (where rank exceeds dimension) is particularly challenging. We introduce a smoothed analysis model for studying these questions and develop an efficient algorithm for tensor decomposition in the highly overcomplete case (rank polynomial in the dimension). In this setting, we show that our algorithm is robust to inverse polynomial error, a crucial property for applications in learning, since we are only allowed a polynomial number of samples. While algorithms are known for exact tensor decomposition in some overcomplete settings, these are not known to be stable to noise.
Our main technical contribution is to show that tensor products of perturbed vectors are linearly independent in a robust sense (i.e., the associated matrix has singular values that are at least an inverse polynomial). This key result paves the way for applying tensor methods to learning problems in the smoothed setting. In particular, we use it to obtain results for learning multi-view models and mixtures of axis-aligned Gaussians where there are many more "components" than dimensions. The assumption here is that the model is not adversarially chosen, formalized by a perturbation of the model parameters. We believe this is an appealing way to analyze realistic instances of learning problems, since this framework allows us to overcome many of the usual limitations of using tensor methods.
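This linear independence phenomenon is easy to observe numerically. A minimal sketch (NumPy; the sizes and perturbation scale are hypothetical choices for illustration, not the paper's parameters): identical base vectors are maximally degenerate, yet after a random perturbation the $n^2 \times r$ matrix whose columns are the flattened rank-one matrices $a_i a_i^\top$ has least singular value bounded away from zero, even though $r > n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 5, 10  # hypothetical sizes: r > n vectors (overcomplete), but r < n^2

# Fully degenerate base vectors (all identical) plus random perturbations,
# as in the smoothed analysis model.
A = np.ones((n, r)) + 0.5 * rng.standard_normal((n, r))

# Column i of K is the flattened rank-one matrix a_i a_i^T.
K = np.stack([np.outer(A[:, i], A[:, i]).ravel() for i in range(r)], axis=1)

# Least singular value of the n^2 x r matrix of tensor products.
sigma_min = np.linalg.svd(K, compute_uv=False)[-1]
```

Without the perturbation every column of `K` would be identical and `sigma_min` would be zero; with it, `sigma_min` is bounded below with high probability, which is what the robust decomposition algorithm needs.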

Uniqueness of Tensor Decompositions and Polynomial Identifiability of Latent Variable Models
(With Aditya Bhaskara and Moses Charikar)
COLT 2014. [PDF | ArXiv] Abstract: We give a robust version of the celebrated result of Kruskal on the uniqueness of tensor decompositions: we prove that given a tensor whose decomposition satisfies a robust form of Kruskal's rank condition, it is possible to approximately recover the decomposition if the tensor is known up to a sufficiently small (inverse polynomial) error.
Kruskal's theorem has found many applications in proving the identifiability of parameters for various latent variable models and mixture models, such as hidden Markov models, topic models, etc. Our robust version immediately implies identifiability using only polynomially many samples in many of these settings. This polynomial identifiability is an essential first step towards efficient learning algorithms for these models.
Recently, algorithms based on tensor decompositions have been used to estimate the parameters of various hidden variable models efficiently in special cases, as long as they satisfy certain "non-degeneracy" properties. Our methods give a way to go beyond this non-degeneracy barrier and establish polynomial identifiability of the parameters under much milder conditions. Given the importance of Kruskal's theorem in the tensor literature, we expect that this robust version will have several applications beyond the settings we explore in this work.

Bilu-Linial Stable Instances of Max Cut and Minimum Multiway Cut
(With Konstantin Makarychev and Yury Makarychev)
SODA 2014. [PDF  ArXiv  Abstract]We investigate the notion of stability proposed by Bilu and Linial. We obtain an exact polynomialtime algorithm for $\gamma$stable Max Cut instances with $\gamma >= c\sqrt{log n}loglog n$ for some absolute constant $c > 0$. Our algorithm is robust: it never returns an incorrect answer; if the instance is $\gamma$stable, it finds the maximum cut, otherwise, it either finds the maximum cut or certifies that the instance is not $\gamma$stable. We prove that there is no robust polynomialtime algorithm for $\gamma$stable instances of Max Cut when $\gamma < \alpha_{SC}(n/2)$, where $\alpha_{SC}$ is the best approximation factor for Sparsest Cut with nonuniform demands.
Our algorithm is based on semidefinite programming. We show that the standard SDP relaxation for Max Cut (with $\ell_2^2$ triangle inequalities) is integral if $\gamma \geq D_{\ell_2^2\to \ell_1}(n)$, where $D_{\ell_2^2\to \ell_1}(n)$ is the least distortion with which every $n$-point metric space of negative type embeds into $\ell_1$. On the negative side, we show that the SDP relaxation is not integral when $\gamma < D_{\ell_2^2\to \ell_1}(n/2)$. Moreover, there is no tractable convex relaxation for $\gamma$-stable instances of Max Cut when $\gamma < \alpha_{SC}(n/2)$. This suggests that solving $\gamma$-stable instances with $\gamma = o(\sqrt{\log n})$ might be difficult or impossible.
Our results significantly improve previously known results. The best previously known algorithm for $\gamma$-stable instances of Max Cut required $\gamma \geq c\sqrt{n}$ (for some $c > 0$) [Bilu, Daniely, Linial, and Saks]. No hardness results were known for the problem. Additionally, we present an algorithm for 4-stable instances of Minimum Multiway Cut. We also study a relaxed notion of weak stability.
 Sorting noisy data with partial information
(With Konstantin Makarychev and Yury Makarychev)
ITCS 2013. [PDF  Abstract]In the Minimum Feedback Arc Set (FAS) problem, the goal is to find the minimum set of edges (arcs) whose removal makes a given graph a directed acyclic graph (DAG). The interest in FAS stems from its numerous applications in sorting and ranking based on pairwise information, removing cyclic dependencies, etc. However, the best known algorithm for this problem only gives a polylogarithmic approximation in the worst case, and constant factor approximations have been elusive. Since real-world instances are unlikely to behave like worst-case instances, a compelling question is: can we design better algorithms for realistic average-case instances? We propose two realistic average-case models for the problem, and present new approximation algorithms which obtain a PTAS and constant factor approximations in these two models.
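As a toy illustration of ranking from pairwise information (not the algorithm of this paper), the classic randomized pivot heuristic for feedback arc set in tournaments orders items quicksort-style using the comparison oracle:

```python
import random

def pivot_rank(items, prefers):
    """KwikSort-style ordering from pairwise comparisons: pick a random
    pivot, place everything the oracle prefers over the pivot before it,
    the rest after it, and recurse. With a consistent total order this
    returns the sorted order; with noisy comparisons it is a classic
    constant-factor heuristic for FAS in tournaments."""
    if len(items) <= 1:
        return list(items)
    pivot = random.choice(items)
    left = [x for x in items if x != pivot and prefers(x, pivot)]
    right = [x for x in items if x != pivot and not prefers(x, pivot)]
    return pivot_rank(left, prefers) + [pivot] + pivot_rank(right, prefers)
```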
 Approximation algorithms for Semi-random Graph Partitioning problems
(With Konstantin Makarychev and Yury Makarychev)
STOC 2012. [PDF  ArXiv  Abstract]We propose and study a new semi-random model for graph partitioning problems. We believe that it captures many properties of real-world instances. The model is more flexible than the semi-random model of Feige and Kilian and the planted random model of Bui, Chaudhuri, Leighton and Sipser.
In our semi-random graph partitioning model, only the edges across different clusters need to be (semi-)random. The edges inside clusters can be completely arbitrary (unlike in previous models).
We develop a general framework for solving semi-random instances using new SDP-based techniques and apply it to several problems of interest. We present constant factor approximation algorithms for semi-random instances of the Balanced Cut, Multicut, Min Uncut and Small-Set Expansion problems. We also show how to almost recover the optimal solution if the instance satisfies an additional expanding condition. Our algorithms work in a wider range of parameters than algorithms for previously studied random and semi-random models.
 Polynomial Integrality gaps for Strong relaxations of Densest k-subgraph
(With Aditya Bhaskara, Moses Charikar, Venkatesan Guruswami and Yuan Zhou)
SODA 2012. [PDF  ArXiv  Abstract]The densest k-subgraph (DkS) problem (i.e., find a size-$k$ subgraph with the maximum number of edges) is one of the notorious problems in approximation algorithms. There is a significant gap between known upper and lower bounds for DkS: the current best algorithm gives a $\tilde{O}(n^{1/4})$ approximation, while even showing a small constant factor hardness requires significantly stronger assumptions than P != NP. In addition to interest in designing better algorithms, a number of recent results have exploited the conjectured hardness of densest k-subgraph and its variants. Thus, understanding the approximability of DkS is an important challenge. In this work, we give evidence for the hardness of approximating DkS within polynomial factors. Specifically, we expose the limitations of strong semidefinite programs from SDP hierarchies in solving densest k-subgraph. Our results include:
* A lower bound of $\Omega(n^{1/4}/\log^3 n)$ on the integrality gap for $\Omega(\log n/\log \log n)$ rounds of the Sherali-Adams relaxation for DkS. This also holds for the relaxation obtained from Sherali-Adams with an added SDP constraint. Our gap instances are in fact Erdos-Renyi random graphs.
* For every $\epsilon > 0$, a lower bound of $n^{2/53-\epsilon}$ on the integrality gap of $n^{\Omega(\epsilon)}$ rounds of the Lasserre SDP relaxation for DkS, and an $n^{\Omega_\epsilon(1)}$ gap for $n^{1-\epsilon}$ rounds. Our construction proceeds via a reduction from random instances of a certain MaxCSP over large domains. In the absence of inapproximability results for DkS, our results show that even the most powerful SDPs are unable to beat a factor of $n^{\Omega(1)}$, and in fact even improving the best known $n^{1/4}$ factor is a barrier for current techniques.
 Algorithms and Hardness of the k-route cut problem
(With Julia Chuzhoy, Yury Makarychev and Yuan Zhou)
SODA 2012. [PDF  ArXiv  Abstract]We study the k-route cut problem: given an undirected edge-weighted graph G=(V,E), a collection {(s_1,t_1),(s_2,t_2),...,(s_r,t_r)} of source-sink pairs, and an integer connectivity requirement k, the goal is to find a minimum-weight subset E' of edges to remove, such that the connectivity of every pair (s_i, t_i) falls below k. Specifically, in the edge-connectivity version, EC-kRC, the requirement is that there are at most (k-1) edge-disjoint paths connecting s_i to t_i in G \ E', while in the vertex-connectivity version, NC-kRC, the same requirement is for vertex-disjoint paths. Prior to our work, polylogarithmic approximation algorithms were known for the special case where k <= 3, but no non-trivial approximation algorithms were known for any value k > 3, except in the single-source setting. We show an O(k log^{3/2} r)-approximation algorithm for EC-kRC with uniform edge weights, and several polylogarithmic bicriteria approximation algorithms for EC-kRC and NC-kRC, where the connectivity requirement k is violated by a constant factor. We complement these upper bounds by proving that NC-kRC is hard to approximate to within a factor of k^{eps} for some fixed eps > 0.
We then turn to study a simpler version of NC-kRC, where only one source-sink pair is present. We give a simple bicriteria approximation algorithm for this case, and show evidence that even this restricted version of the problem may be hard to approximate. For example, we prove that the single source-sink pair version of NC-kRC has no constant-factor approximation, assuming Feige's Random k-AND assumption.
 On Quadratic Programming with a ratio objective
(With Aditya Bhaskara, Moses Charikar and Rajsekar Manokaran)
ICALP 2012. [PDF  ArXiv ]
 Approximating the matrix p-norm
(With Aditya Bhaskara)
SODA 2011. [PDF  Abstract]We consider the problem of computing the $q \to p$ norm of a matrix $A$, defined for $p, q \geq 1$ as $\|A\|_{q \to p} = \max_{\|x\|_q = 1} \|Ax\|_p$.
This is in general a non-convex optimization problem, and is a natural generalization of the well-studied question of computing singular values (when $p = q = 2$). Different settings of the parameters give rise to a variety of known interesting problems (such as the Grothendieck problem when $p=1$ and $q=\infty$). Our first result is an efficient algorithm for computing the $q \to p$ norm of matrices with nonnegative entries, when $q \geq p \geq 1$. The algorithm we analyze can be seen as an analog of power iteration for computing eigenvalues.
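A sketch of such a power-iteration analog (a Boyd-style mixed-norm iteration; this particular implementation is my own illustration and assumes a nonnegative matrix with $q > 1$, not the exact procedure analyzed in the paper):

```python
import numpy as np

def q_to_p_norm(A, q, p, iters=200):
    """Estimate ||A||_{q->p} = max_{||x||_q = 1} ||Ax||_p for a
    nonnegative matrix A via a power-iteration analog: alternate
    between applying A, the gradient map of ||.||_p, A^T, and the
    inverse gradient map of the l_q constraint."""
    n = A.shape[1]
    x = np.full(n, n ** (-1.0 / q))   # uniform start with ||x||_q = 1
    for _ in range(iters):
        y = A @ x
        g = y ** (p - 1)              # (unnormalized) gradient of ||y||_p
        z = A.T @ g
        x = z ** (1.0 / (q - 1))      # map back through the l_q dual
        x /= np.linalg.norm(x, q)     # renormalize so ||x||_q = 1
    return np.linalg.norm(A @ x, p)
```

For $p = q = 2$ this reduces to the usual power iteration for the top singular value.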
We then present an application of our techniques in the oblivious routing setting: we make constructive a recent existential result of Englert and Racke on $O(\log n)$-competitive oblivious routing schemes in the $\ell_p$ norm.
On the other hand, for general matrices, we show "almost polynomial" factor hardness for $2 < p \leq q$ and $p \leq q < 2$, if NP does not have quasi-polynomial time algorithms.
 Detecting High Log-Densities: an O(n^{1/4}) Approximation for Densest k-Subgraph
(With Aditya Bhaskara, Moses Charikar, Eden Chlamtac and Uri Feige)
STOC 2010. [PDF  Abstract]In the Densest k-Subgraph problem, given a graph G and a parameter k, one needs to find a subgraph of G induced on k vertices that contains the largest number of edges. There is a significant gap between the best known upper and lower bounds for this problem. It is NP-hard, and does not have a PTAS unless NP has subexponential time algorithms. On the other hand, the current best known algorithm of Feige, Kortsarz and Peleg gives an approximation ratio of $n^{1/3 - \epsilon}$ for some specific $\epsilon > 0$ (estimated by those authors at around $\epsilon = 1/60$).
We present an algorithm that for every $\epsilon > 0$ approximates the Densest $k$-Subgraph problem within a ratio of $n^{1/4 + \epsilon}$ in time $n^{O(1/\epsilon)}$. If allowed to run for time $n^{O(\log n)}$, our algorithm achieves an approximation ratio of $O(n^{1/4})$. Our algorithm is inspired by studying an average-case version of the problem where the goal is to distinguish random graphs from random graphs with planted dense subgraphs; the approximation ratio we achieve for the general case matches the "distinguishing ratio" we obtain for this planted problem. Achieving a distinguishing ratio of $o(n^{1/4})$ for the planted problem (in polynomial time) is beyond the reach of our current techniques.
At a high level, our algorithms involve cleverly counting appropriately defined trees of constant size in $G$, and using these counts to identify the vertices of the dense subgraph. Our algorithm is based on the following principle. We say that a graph $G(V,E)$ has log-density $\alpha$ if its average degree is $\Theta(|V|^{\alpha})$. The algorithmic core of our result is a family of algorithms that output $k$-subgraphs of non-trivial density whenever the log-density of the densest $k$-subgraph is larger than the log-density of the host graph.
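Read literally, the log-density in this principle can be computed directly (a trivial helper of my own, ignoring the $\Theta$ constants):

```python
import math

def log_density(num_vertices, num_edges):
    """Log-density alpha of a graph with average degree ~ n^alpha,
    read directly off the definition: alpha = log(avg_deg) / log(n)."""
    avg_deg = 2.0 * num_edges / num_vertices
    return math.log(avg_deg) / math.log(num_vertices)
```

For example, a 100-vertex graph with 500 edges has average degree 10 and hence log-density 1/2.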
Finally, we extend this algorithm to obtain an $O(n^{1/4 - \epsilon})$-approximation algorithm which runs in time $O(2^{n^{O(\epsilon)}})$, and also explore various approaches to obtain better approximation algorithms in restricted parameter settings for random instances.

Graph Isomorphism: Approximate and Robust
(With Yu Wu, Yuichi Yoshida and Yuan Zhou).
 Non-cooperative Bundling games
(With Ravishankar Krishnaswamy, Pandu Rangan Chandrasekaran and Ravi Sundaram)
[Manuscript Abstract]Bundling is a basic problem in commerce. We define and study the Bundling Game: each of n players with a unique product to sell can choose to sell their products in bundles with other players' products so as to maximize payoff; the players have knowledge of the valuations of m (potential) buyers for each of the products. We define two natural classes of payoff functions: Fair Profit Sharing Functions (FPSFs) and Incentive Functions (IFs). For FPSFs, we prove the surprising impossibility result that no FPSF can always guarantee stability. We counterbalance this result by showing that pure Nash equilibria always exist for IFs, and obtain tight bounds on the Price of Anarchy. In particular, the revenue-maximizing bundling is a pure Nash equilibrium. Motivated by the bundling game, we present hardness results and approximation algorithms for finding revenue-maximizing bundlings and bundles, and associated variants.
Some older work includes