# Poster Session

**Optimization Theory for ReLU Neural Networks Trained with Normalization Layers**

Yonatan Dukler (University of California, Los Angeles)

This is joint work with Guido Montufar and Quanquan Gu as presented at ICML 2020.

The success of deep neural networks is in part due to the use of normalization layers. Normalization layers like Batch Normalization, Layer Normalization and Weight Normalization are ubiquitous in practice, as they improve generalization performance and speed up training significantly. Nonetheless, the vast majority of current deep learning theory and non-convex optimization literature focuses on the un-normalized setting, where the functions under consideration do not exhibit the properties of commonly normalized neural networks. In this paper, we bridge this gap by giving the first global convergence result for two-layer neural networks with ReLU activations trained with a normalization layer, namely Weight Normalization. Our analysis shows how the introduction of normalization layers changes the optimization landscape and can enable faster convergence as compared with un-normalized neural networks.

**ANN Modeling of Plasticity of Metals**

Koffi Eankoutsa (University of California, Los Angeles)

There is an increasing demand to apply ANN techniques to finite element analysis of complex engineering problems. While some of these efforts focus on data science, where available data are used to predict the behavior of any model without using constitutive laws, other efforts are based on a mixed data/modeling approach. In this contribution we assess the feasibility of ANNs for modeling complex nonlinear behaviors of materials. First, we present a constitutive model based on an ANN that can capture the complex nonlinear behavior of materials. Then, the ANN is trained on a data set of stress-strain curves from numerical simulations. Finally, the trained ANN is integrated into an Abaqus UMAT subroutine. The proposed framework is applied to several boundary value problems in which the material obeys elastic-plastic behavior.

**Building Towards a Surrogate Model for Nuclear Reaction Systems**

Ty Frazier (University of Minnesota, Twin Cities)

We work toward surrogate models for nonlinear stiff ordinary differential equation (ODE) systems. Our particular target is the nuclear reaction ODE systems that are split from a hydrodynamics PDE; the work shown here addresses a less complex version of that problem. We used neural networks to construct the surrogate models and trained them on numerical simulations of the nuclear reaction system. The simulations were performed with an implicit multistep ODE integrator.

**Dynamic functional connectivity analysis based on time-varying partial correlation with a copula-DCC-GARCH model**

Jong-Min Kim (University of Minnesota, Morris), Namgil Lee (Kangwon (Kangweon) National University)

We suggest a time-varying partial correlation as a statistical measure of dynamic functional connectivity (dFC) in the human brain. Traditional statistical models often assume specific distributions on the measured data such as the Gaussian distribution, which prohibits their application to neuroimaging data analysis. First, we use the copula-based dynamic conditional correlation (DCC), which does not rely on a specific distribution assumption, for estimating time-varying correlation between regions-of-interest (ROIs) of the human brain. Then, we suggest a time-varying partial correlation based on the Gaussian copula-DCC-GARCH model as an effective method for measuring dFC in the human brain. A recursive algorithm is explained for computation of the time-varying partial correlation. Numerical simulation results demonstrate the effectiveness of the partial correlation-based methods compared with pairwise correlation-based methods. In addition, a two-step procedure is described for the inference of sparse dFC structure using functional magnetic resonance imaging (fMRI) data. We illustrate the proposed method by analyzing an fMRI data set of human participants watching a Pixar animated movie. Based on twelve a priori selected brain regions in the cortex, we demonstrate that the proposed method is effective for inferring sparse dFC network structures and robust to noise distribution and a preprocessing step of fMRI data.
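As a rough illustration of how a partial correlation can be computed recursively, the sketch below applies the standard textbook recursion to a single correlation matrix. This is not necessarily the algorithm presented in the poster; the function name `partial_corr` and the toy 3-region matrix are assumptions made for illustration only. For a time-varying version, the same function could be applied to the estimated correlation matrix at each time point.

```python
import math

def partial_corr(corr, i, j, cond):
    """Partial correlation of variables i and j given the variables in `cond`.

    Uses the standard recursion: condition on one variable k at a time,
    combining partial correlations that condition on the remaining set.
    """
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

# Hypothetical correlation matrix for three regions (toy values, not fMRI data).
# Regions 0 and 1 are correlated only through region 2: 0.48 = 0.8 * 0.6.
corr = [
    [1.0, 0.48, 0.8],
    [0.48, 1.0, 0.6],
    [0.8, 0.6, 1.0],
]

# Pairwise correlation between regions 0 and 1 is 0.48, but the partial
# correlation given region 2 is (numerically) zero.
pc_01_given_2 = partial_corr(corr, 0, 1, [2])
```

In this toy case the pairwise correlation suggests a direct link between regions 0 and 1, while the partial correlation removes the part explained by region 2.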

**Graph regularization inspired multitask learning**

Harlin Lee (Carnegie Mellon University)

In multitask learning, there are multiple learning tasks that are related, and the goal is to take advantage of those similarities and differences to improve the performance on all tasks. Inspired by the graph regularization framework, we propose a very simple linear mixing model that takes convex combinations of local estimates to produce a more accurate set of multitask solutions. Theoretically, we show that our multitask solution using the optimal mixing matrix has smaller expected MSE for each task than both the local estimate and the naive graph regularization solution. Algorithmically, we can easily estimate the optimal mixing matrix for linear regression, and produce multitask solutions accordingly. This approach is inherently suitable to a hub-and-spoke distributed model. Experimentally, simulations confirm that this fusion approach is more beneficial as the original tasks get harder. This project can potentially be generalized to multitask PCA, as well as other linear estimators.
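The mixing step described above can be sketched as follows, assuming each task already has a local estimate and a mixing matrix is given. The function name `mix_estimates` and all numbers are hypothetical; in particular, the mixing matrix here is hand-picked for illustration and is not the optimal mixing matrix from the poster.

```python
def mix_estimates(local_estimates, mixing):
    """Return one mixed estimate per task.

    Row i of `mixing` holds the convex-combination weights that task i
    applies to the local estimates of all tasks.
    """
    for row in mixing:
        if any(w < 0 for w in row) or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row must be nonnegative and sum to 1")
    dim = len(local_estimates[0])
    return [
        [sum(w * est[d] for w, est in zip(row, local_estimates)) for d in range(dim)]
        for row in mixing
    ]

# Hypothetical local estimates for three tasks (e.g. regression coefficients).
local_estimates = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]

# Hand-picked mixing matrix: task 2 keeps its own estimate unchanged.
mixing = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.0, 1.0],
]

mixed = mix_estimates(local_estimates, mixing)
```

Because each task only needs the local estimates and its own row of weights, a central node could collect the estimates and send back the mixed results, which matches the hub-and-spoke setting mentioned in the abstract.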

**Haar Graph Pooling**

Guido Montufar (University of California, Los Angeles), Yuguang Wang (Max Planck Institute for Mathematics in the Sciences)

Deep Graph Neural Networks (GNNs) are useful models for graph classification and graph-based regression tasks. In these tasks, graph pooling is a critical ingredient by which GNNs adapt to input graphs of varying size and structure. We propose a new graph pooling operation based on compressive Haar transforms, called HaarPooling. HaarPooling implements a cascade of pooling operations; it is computed by following a sequence of clusterings of the input graph. A HaarPooling layer transforms a given input graph to an output graph with fewer nodes and the same feature dimension; the compressive Haar transform filters out fine detail information in the Haar wavelet domain. In this way, all the HaarPooling layers together synthesize the features of any given input graph into a feature vector of uniform size. Such transforms provide a sparse characterization of the data and preserve the structure information of the input graph. GNNs implemented with standard graph convolution layers and HaarPooling layers achieve state-of-the-art performance on diverse graph classification and regression problems.
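A heavily simplified sketch of one pooling step is given below, assuming a single given clustering of the nodes. It keeps only the coarse coefficient of each cluster (the inner product of the features with the cluster's normalized indicator vector) and discards the detail coefficients. The actual HaarPooling construction builds a Haar basis from a chain of clusterings and is more involved; the function name `haar_pool` and the toy data are assumptions for illustration.

```python
import math

def haar_pool(features, clusters):
    """Pool node features onto clusters.

    Output row c is the sum of the feature rows of the nodes in cluster c,
    scaled by 1/sqrt(cluster size). The feature dimension is unchanged and
    the number of rows drops to the number of clusters.
    """
    dim = len(features[0])
    pooled = []
    for nodes in clusters:
        scale = 1.0 / math.sqrt(len(nodes))
        pooled.append([scale * sum(features[v][d] for v in nodes) for d in range(dim)])
    return pooled

# Hypothetical input: 6 nodes with 2 features each, grouped into 2 clusters.
features = [
    [1.0, 0.0],
    [3.0, 2.0],
    [2.0, 2.0],
    [2.0, 2.0],
    [2.0, 2.0],
    [2.0, 2.0],
]
clusters = [[0, 1], [2, 3, 4, 5]]

# Result has 2 rows (one per cluster) and still 2 feature columns.
pooled = haar_pool(features, clusters)
```

Applying such a step repeatedly along a sequence of coarser clusterings would shrink graphs of different sizes toward a representation of fixed size, which is the role the abstract assigns to the stacked pooling layers.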