# Variations on the alternating direction method of multipliers

Thursday, May 21, 2015 - 4:40pm - 5:30pm

Keller 3-180

Jose Bento (Boston College)

The alternating direction method of multipliers (ADMM) originated in the 1970s as a decomposition-coordination method for solving optimization problems. It has recently experienced a revival due to its highly distributed and parallel nature. It is typically applied when the objective function can be decomposed into a sum of functions over a set of shared variables. ADMM iteratively optimizes each function independently and uses update rules that enforce a consensus among the values of the shared variables. These iterations are specified up to the value of a penalty parameter and are guaranteed to find the global optimum when the problem is convex.

In the first part of this talk I will review ADMM, treating it as a message-passing algorithm on a bipartite graph. In this interpretation, each message has its own penalty parameter, now called a “message-weight”, that can be seen as measuring the reliability or confidence of the messages. Using the intuition gained from this message-passing interpretation, I justify the Three Weight Algorithm (TWA) (Derbinsky et al. 2013), a set of rules for dynamically updating the message-weights. The message-weights are restricted to three different values: infinity for absolute confidence, zero for complete lack of confidence and 1 as a default confidence level.

In the second part of this talk I show how the TWA, a simple change in the ADMM, can have an enormous impact on the speed with which it finds good approximate solutions for non-convex problems. In a first example, TWA is used to solve very large Sudoku puzzles. Here, the infinite weights play a major role and allow the propagation of information through hard constraints. Next I will show how TWA can be used to pack a very large number of hard disks in a box. In this context the zero weights play the major role and speed up convergence by several orders of magnitude compared to standard ADMM. Third, we show that TWA can solve difficult multi-robot trajectory planning problems. Finally, I show how the TWA can be used to do protein folding.

In the third part of this talk I will discuss how ADMM can be implemented on modern Graphics Processing Units (GPUs). Many GPU-based ADMM implementations are problem dependent. They often break the objective function into two or three parts and use the GPU to accelerate linear algebra computations inside the optimization of each of the terms in the objective. I explain another choice for decomposition which consists in breaking the objective into many small terms and having each core in the GPU solve for each term in parallel. This decomposition allows us to build an optimization framework that can exploit parallelism in a problem independent fashion. It also brings to light the importance of developing asynchronous decomposition-coordination algorithms with theoretical convergence guarantees to cope with the synchronization and memory-sharing limitations imposed by most GPUs.

In the first part of this talk I will review ADMM, treating it as a message-passing algorithm on a bipartite graph. In this interpretation, each message has its own penalty parameter, now called a “message-weight”, that can be seen as measuring the reliability or confidence of the messages. Using the intuition gained from this message-passing interpretation, I justify the Three Weight Algorithm (TWA) (Derbinsky et al. 2013), a set of rules for dynamically updating the message-weights. The message-weights are restricted to three different values: infinity for absolute confidence, zero for complete lack of confidence and 1 as a default confidence level.

In the second part of this talk I show how the TWA, a simple change in the ADMM, can have an enormous impact on the speed with which it finds good approximate solutions for non-convex problems. In a first example, TWA is used to solve very large Sudoku puzzles. Here, the infinite weights play a major role and allow the propagation of information through hard constraints. Next I will show how TWA can be used to pack a very large number of hard disks in a box. In this context the zero weights play the major role and speed up convergence by several orders of magnitude compared to standard ADMM. Third, we show that TWA can solve difficult multi-robot trajectory planning problems. Finally, I show how the TWA can be used to do protein folding.

In the third part of this talk I will discuss how ADMM can be implemented on modern Graphics Processing Units (GPUs). Many GPU-based ADMM implementations are problem dependent. They often break the objective function into two or three parts and use the GPU to accelerate linear algebra computations inside the optimization of each of the terms in the objective. I explain another choice for decomposition which consists in breaking the objective into many small terms and having each core in the GPU solve for each term in parallel. This decomposition allows us to build an optimization framework that can exploit parallelism in a problem independent fashion. It also brings to light the importance of developing asynchronous decomposition-coordination algorithms with theoretical convergence guarantees to cope with the synchronization and memory-sharing limitations imposed by most GPUs.

MSC Code:

20C25

Keywords: