On Wasserstein Gradient Flows and the Search of Neural Network Architectures
Neural networks have revolutionized machine learning and artificial intelligence, establishing new performance benchmarks in applications such as image recognition and language processing. This success has motivated researchers and practitioners across many fields to develop further applications, and it has opened several novel research directions. In particular, one crucial question that has received increasing attention concerns the design of good neural architectures through data-driven approaches with minimal human intervention. In this talk I will discuss a framework in which ideas from optimal transport can be used to motivate algorithms for exploring the architecture space. In the first part of the talk I will slightly abstract the problem of neural architecture search and discuss how optimal transport can motivate first-order and second-order gradient descent schemes for the optimization of a semi-discrete objective function. I will then return to the original neural architecture search problem and, using the ideas from the first part of the talk, motivate two algorithms for neural architecture search, NASGD and NASAGD. I will wrap up by discussing the performance of our algorithms when searching for an architecture for a classification problem on the CIFAR-10 data set, and by offering some perspective on future research directions. This talk is based on joint work with Felix Morales and Javier Morales.