# Poster Session and Reception

Monday, May 16, 2016 - 5:30pm - 6:30pm

Lind 400

**When are Nonconvex Optimization Problems Not Scary?**

Qing Qu (Columbia University)

We focus on smooth nonconvex optimization problems that obey: (1) all local minimizers are also global; and (2) around any saddle point or local maximizer, the objective has a direction of negative curvature. Concrete applications such as dictionary learning, generalized phase retrieval, and orthogonal tensor decomposition are known to induce such structures. We describe a second-order trust-region algorithm that provably converges to a global minimizer efficiently, without special initializations.

**Distributed Multi-Task Learning**

Jialei Wang (University of Chicago)

We consider the problem of distributed multi-task learning, where each machine learns a separate but related task. Specifically, each machine learns a linear predictor in high-dimensional space, where all tasks share the same small support. We present a communication-efficient estimator based on the debiased lasso and show that it is comparable with the optimal centralized method.

**Harnessing Smoothness to Accelerate Distributed Optimization**

Guannan Qu (Harvard University)

There has been a growing effort in studying the distributed optimization problem over a network. The objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. The literature has developed consensus-based distributed (sub)gradient descent (DGD) methods and has shown that they have the same convergence rate $O(\frac{1}{\sqrt{t}})$ as the centralized (sub)gradient method (CGD) when the function is convex but possibly nonsmooth. However, when the function is convex and smooth, it is unclear how to harness the smoothness within the DGD framework to obtain a faster convergence rate comparable to CGD's. In this paper, we propose a distributed algorithm that, despite using the same amount of communication per iteration as DGD, effectively harnesses the function's smoothness and converges to the optimum at a rate of $O(\frac{1}{t})$. If the objective function is further strongly convex, our algorithm has a linear convergence rate. Both rates match the convergence rates of CGD. The key step in our algorithm is a novel gradient estimation scheme that uses history information to achieve fast and accurate estimation of the average gradient. To motivate the necessity of history information, we also show that it is impossible for a class of distributed algorithms like DGD to achieve a linear convergence rate without using history information, even if the objective function is strongly convex and smooth.

**Estimation with Norm Regularization**

Vidyashankar Sivakumar (University of Minnesota, Twin Cities)

Analysis of non-asymptotic estimation error and structured statistical recovery based on norm-regularized regression, such as Lasso, needs to consider four aspects: the norm, the loss function, the design matrix, and the noise model. This paper presents generalizations of such estimation error analysis on all four aspects compared to the existing literature. We characterize the restricted error set where the estimation error vector lies, establish relations between error sets for the constrained and regularized problems, and present an estimation error bound applicable to any norm. Precise characterizations of the bound are presented for isotropic as well as anisotropic sub-Gaussian design matrices, sub-Gaussian noise models, and convex loss functions, including least squares and generalized linear models. Generic chaining and associated results play an important role in the analysis. A key result from the analysis is that the sample complexity of all such estimators depends on the Gaussian width of a spherical cap corresponding to the restricted error set. Further, once the number of samples $n$ crosses the required sample complexity, the estimation error decreases as $c/\sqrt{n}$, where $c$ depends on the Gaussian width of the unit norm ball.

**Matrix Completion has no Spurious Local Minimum**

Tengyu Ma (Princeton University)

We show that the non-convex objective function of matrix completion has no spurious local minimum. This geometric result, together with [GHJY'15, LSJR'16], implies that gradient descent converges to a global minimum in polynomial time from an arbitrary starting point.

**Distributed Regularized Primal-Dual Method**

Masoud Badiei Khuzani (Harvard University)

We study deterministic and stochastic primal-dual sub-gradient methods for distributed optimization of a separable objective function with global constraints over a network. In both algorithms, the norm of the dual variables is controlled by augmenting the corresponding Lagrangian function with a regularizer on the dual variables. Specifically, for each underlying algorithm we show that as long as its step size satisfies a certain restriction, the norm of the dual variables is modulated by the inverse of the regularizer's curvature. In the deterministic optimization case, we leverage the bound on the dual variables to analyze the consensus terms and subsequently establish the convergence rate of the distributed primal-dual algorithm. In the stochastic optimization case, the upper bound on the dual variables is used to derive a high-probability bound on the convergence rate via the method of bounded martingale differences. For both algorithms, we exhibit a tension between the convergence rate of the underlying algorithm and the decay rate associated with the constraint violation.

**Online Learning in Repeated Auctions**

Jonathan Weed (Massachusetts Institute of Technology)

Motivated by online advertising auctions, we consider repeated Vickrey auctions where goods of unknown value are sold sequentially and bidders only learn (potentially noisy) information about a good's value once it is purchased. We adopt an online learning approach with bandit feedback to model this problem and derive bidding strategies for two models: stochastic and adversarial. In the stochastic model, the observed values of the goods are random variables centered around the true value of the good. In this case, logarithmic regret is achievable when competing against well-behaved adversaries. In the adversarial model, the goods need not be identical. Comparing our performance against that of the best fixed bid in hindsight, we show that sublinear regret is also achievable in this case. For both the stochastic and adversarial models, we prove matching minimax lower bounds showing our strategies to be optimal up to lower-order terms. To our knowledge, this is the first complete set of strategies for bidders participating in auctions of this type.

**Balancing Sample Size, Risk, and Computational Cost**

John Bruer (California Institute of Technology)

We describe a tradeoff between computational time, sample complexity, and statistical accuracy that applies to statistical estimators based on convex optimization. When we have a large amount of data, we can exploit excess samples to decrease statistical risk, to decrease computational cost, or to trade off between the two. We propose to achieve this tradeoff by varying the amount of smoothing applied to the optimization problem. Our work uses regularized linear regression as a case study to argue for the existence of this tradeoff both theoretically and experimentally. We also apply our method to an image interpolation problem, demonstrating the ability to create such a tradeoff with real data.
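The smoothing knob behind this kind of tradeoff can be illustrated with a minimal sketch (hypothetical, not the authors' exact estimator): replace the $\ell_1$ penalty in regularized linear regression with its Huber smoothing. A larger smoothing parameter `mu` lowers the gradient's Lipschitz constant, permitting a larger step size and fewer iterations (less computation), at the price of a coarser surrogate of the penalty (more statistical risk).

```python
import numpy as np

def huber_grad(w, mu):
    # Gradient of the Huber smoothing of |.|: quadratic on [-mu, mu], linear outside.
    return np.clip(w / mu, -1.0, 1.0)

def smoothed_lasso(X, y, lam, mu, steps=500):
    # Gradient descent on 0.5/n * ||Xw - y||^2 + lam * sum_i Huber_mu(w_i).
    # Larger mu => smaller Lipschitz bound L => larger safe step 1/L (cheaper),
    # but a cruder approximation of the l1 penalty (more bias).
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n + lam / mu  # Lipschitz bound on the gradient
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n + lam * huber_grad(w, mu)
        w -= grad / L
    return w

# Synthetic sparse regression instance (illustrative data, not from the paper).
rng = np.random.default_rng(0)
n, d, k = 200, 50, 5
w_true = np.zeros(d)
w_true[:k] = 1.0
X = rng.standard_normal((n, d))
y = X @ w_true + 0.1 * rng.standard_normal(n)

w_hat = smoothed_lasso(X, y, lam=0.05, mu=0.1)
```

With excess samples, one can afford to increase `mu`: the optimization converges in fewer iterations while the extra data absorbs the added smoothing bias.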