# Poster Session and Reception

Thursday, February 22, 2018 - 4:30pm - 6:00pm

Lind 400

**Predictive effect of economic and market variations on structural breaks in credit market**

Haipeng Xing (State University of New York, Stony Brook (SUNY))

The financial crisis of 2007-2008 caused severe economic and political consequences around the world. An interesting question arising from this crisis is whether, and to what extent, such sharp changes or structural breaks in the market can be explained by economic and market fundamentals. To address this issue, we consider a model that extracts information on market structural breaks from firms' credit rating records, and connects the probabilities of market structural breaks to observed and latent economic variables. We also discuss the issue of selecting significant variables when the number of economic covariates is large. We then analyze market structural breaks using U.S. firms' credit rating records and historical data on economic and market fundamentals from 1986 to 2015. We find that the probabilities of structural breaks are positively correlated with changes in S&P 500 returns and volatilities and changes in inflation, and negatively correlated with changes in corporate bond yields. The significance of other variables depends on whether latent variables are included in the study.

**Robust calibration and emulation for imperfect mathematical models**

Mengyang Gu (Johns Hopkins University)

We focus on the problem of calibrating imperfect mathematical models using experimental data. To compensate for the misspecification of the mathematical model, a discrepancy function is usually included and modeled via a Gaussian stochastic process (GaSP), leading to better predictive performance. The calibrated mathematical model itself, however, sometimes fits the experimental data poorly, as the calibration parameters become unidentifiable. In this work, we propose the scaled Gaussian stochastic process (S-GaSP), a novel stochastic process for calibration and prediction. This new approach bridges the gap between two predominant methods, namely L2 calibration and GaSP calibration. A computationally feasible approach is introduced for this new model under the Bayesian paradigm. New robust and computationally efficient statistical models will also be discussed for emulating computationally expensive mathematical models with massive output. The spatio-temporal outputs from TITAN2D, a computer model that simulates volcanic eruptions, and interferometric synthetic aperture radar (InSAR) data will be used to demonstrate the performance of the proposed statistical methods for emulation and calibration.

**Spectral and imaging data analysis on organic matter in shale**

Jing Yang

The recent boom in unconventional oil and gas production from shale has revolutionized the energy landscape in the United States and stimulated considerable research as well as environmental and political concerns. Shale is a fine-grained sedimentary rock, composed of solid organic matter (OM) scattered in a mineral framework. OM plays an essential role in the generation, migration, storage, and production of hydrocarbons from economically important shale rock formations. This work demonstrates the first simultaneous measurement of chemical and mechanical heterogeneity of OM in shale at the nanoscale using microscopy and spectroscopy. We use a combination of Lorentzian–Gaussian curve functions to provide a good statistical reconstruction of the measured IR spectra of complex organic matter and quantify its chemical characteristics using band ratios calculated from the IR spectra. Using this approach, we examine the evolution of OM chemical composition during petroleum generation at the nanoscale. Additionally, we register the mechanical stiffness image with the microscopy image and obtain the distribution of the mechanical stiffness of individual macerals (optically discernible organic matter). The composition of the macerals controls their mechanical properties, with macerals enriched in aromatic carbon (and lean in aliphatic carbon) having relatively high mechanical stiffness. The results document the evolution of individual organic macerals with maturation, providing a microscopic picture of the heterogeneous process of petroleum generation.

**Pruning and Nonparametric Multiple Change Point Detection**
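The band-fitting step described in the shale abstract above can be illustrated with a toy sketch (this is not the authors' code; the two-band model, the band positions, and the synthetic noisy spectrum are all assumptions for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, a, mu, s):
    # Gaussian band: amplitude a, center mu, width s
    return a * np.exp(-((x - mu) ** 2) / (2 * s ** 2))

def lorentz(x, a, mu, g):
    # Lorentzian band: amplitude a, center mu, half-width g
    return a * g ** 2 / ((x - mu) ** 2 + g ** 2)

def two_band(x, a1, mu1, s1, a2, mu2, g2):
    # One Gaussian plus one Lorentzian band
    return gauss(x, a1, mu1, s1) + lorentz(x, a2, mu2, g2)

# Synthetic "spectrum": two overlapping bands plus a little noise
x = np.linspace(0.0, 10.0, 400)
rng = np.random.default_rng(0)
y = two_band(x, 1.0, 3.0, 0.5, 0.8, 6.0, 0.7) + rng.normal(scale=0.01, size=x.size)

# Nonlinear least-squares fit; band ratios come from the fitted amplitudes
popt, _ = curve_fit(two_band, x, y, p0=[1, 3, 0.5, 1, 6, 0.5])
band_ratio = popt[0] / popt[3]  # ratio of the two fitted band amplitudes
```

Band ratios of this kind (here a single amplitude ratio) are what quantify the chemical characteristics in the abstract.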

Wenyu Zhang (Cornell University)

Change point analysis is a statistical tool for partitioning time series data into homogeneous segments. We propose a pruning approach for approximate nonparametric estimation of multiple change points. This general-purpose change point detection procedure ‘cp3o’ applies a pruning routine within a dynamic program to greatly reduce the search space and computational costs. Existing goodness-of-fit change point objectives can immediately be utilized within the framework. We further propose novel change point algorithms by applying cp3o to two popular nonparametric goodness-of-fit measures: ‘e-cp3o’ uses E-statistics, and ‘ks-cp3o’ uses Kolmogorov-Smirnov statistics.

**Testing for Trends in High-dimensional Time Series**
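As a hedged toy illustration of the ks-cp3o objective in the pruning abstract above, a two-sample Kolmogorov-Smirnov statistic can score a single candidate change point by exhaustive scan (the paper's pruning routine and dynamic program are not shown, and the simulated data are an assumption):

```python
import numpy as np

def ks_statistic(left, right):
    # Two-sample Kolmogorov-Smirnov statistic: largest gap between the
    # empirical CDFs of the two segments.
    grid = np.sort(np.concatenate([left, right]))
    cdf_l = np.searchsorted(np.sort(left), grid, side="right") / len(left)
    cdf_r = np.searchsorted(np.sort(right), grid, side="right") / len(right)
    return np.max(np.abs(cdf_l - cdf_r))

def best_single_change_point(x, min_seg=5):
    # Exhaustive scan (no pruning): pick the split maximizing the KS statistic
    scores = {t: ks_statistic(x[:t], x[t:])
              for t in range(min_seg, len(x) - min_seg)}
    return max(scores, key=scores.get)

# Simulated series with a mean shift at t = 100
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
tau = best_single_change_point(x)
```

The cp3o contribution is precisely to avoid this exhaustive scan over all segmentations when multiple change points are sought.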

Likai Chen (University of Chicago)

We consider statistical inference for trends of high-dimensional time series. Based on a modified $L_2$-distance between parametric and nonparametric trend estimators, we propose a de-diagonalized quadratic form test statistic for testing patterns in trends, such as linear, quadratic, or parallel forms. We develop an asymptotic theory for the test statistic. A Gaussian multiplier testing procedure is proposed, which has improved finite-sample performance. Our testing procedure is applied to spatio-temporal temperature data gathered from various locations across America. A simulation study is also presented to illustrate the performance of our testing method.

**Testing Sparsity-Inducing Penalties**

Maryclare Griffin (University of Washington)

Many penalized maximum likelihood estimators correspond to posterior mode estimators under specific prior distributions. The appropriateness of a particular class of penalty functions can therefore be interpreted as the appropriateness of a prior for the parameters. For example, the appropriateness of a lasso penalty for regression coefficients depends on the extent to which the empirical distribution of the regression coefficients resembles a Laplace distribution. We give a procedure for testing whether a Laplace prior is appropriate and, accordingly, whether using a lasso penalized estimate is appropriate. This testing procedure is designed to have power against exponential power priors, which correspond to $\ell_q$ penalties. Via simulations, we show that this testing procedure achieves the desired level and has enough power to detect violations of the Laplace assumption when the numbers of observations and unknown regression coefficients are large. We then introduce an adaptive procedure that chooses a more appropriate prior and corresponding penalty from the class of exponential power priors when the null hypothesis is rejected. We show that this can improve estimation of the regression coefficients both when they are drawn from an exponential power distribution and when they are drawn from a spike-and-slab distribution.

**Poverty Prediction with Satellite Imagery**
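A crude stand-in for the idea in the sparsity-testing abstract above (this is not the paper's test): fit a Laplace distribution to coefficient estimates by maximum likelihood and run a Kolmogorov-Smirnov goodness-of-fit check; the simulated "coefficients" are assumptions:

```python
import numpy as np
from scipy import stats

def laplace_fit_pvalue(beta):
    # Fit Laplace by maximum likelihood, then return a KS goodness-of-fit
    # p-value. (Estimating the parameters first makes the nominal p-value
    # only approximate; the paper's procedure is more careful.)
    loc = np.median(beta)                 # Laplace location MLE
    scale = np.mean(np.abs(beta - loc))   # Laplace scale MLE
    return stats.kstest(beta, "laplace", args=(loc, scale)).pvalue

rng = np.random.default_rng(1)
p_laplace = laplace_fit_pvalue(rng.laplace(0.0, 1.0, 5000))  # well specified
p_normal = laplace_fit_pvalue(rng.normal(0.0, 1.0, 5000))    # misspecified
```

A small p-value suggests the Laplace assumption, and hence the lasso penalty, is inappropriate.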

Binh Tang (Cornell University)

Despite worldwide efforts to reduce poverty, very limited economic data are available to help policy-makers in African countries. Fortunately, satellite images that are cheaply collected and made publicly available can provide a rich source of information on economic conditions at the global scale. In this work, we combine daytime images and vegetation indices to predict poverty indicators for several African countries. Our methods are based on recent advances in machine learning, such as convolutional neural networks, and attain favorable results compared to state-of-the-art methods.

**Topological Data Analysis on Human Red Blood Cells**

Yu-Min Chung (University of North Carolina, Greensboro)

Human red blood cells (RBCs) exhibit spontaneous vibratory motions, referred to as flickering. Previous work using measurements of cell roughness as well as detrended fluctuation analysis and multiscale entropy methods has shown that the short-term flickering motions of RBCs exhibit complex structure and dynamics over multiple spatial and time scales. In addition, these properties (both roughness and temporal complexity) have been shown to degrade with age or disease, such that older or diseased cells show significantly less roughness and temporal complexity than newly formed and healthy cells. Here, using algorithms adapted from the field of computational topology, we quantify spatial and spatio-temporal characteristics of RBC structure and flickering as depicted in phase contrast microscopy images, and of their changes with in vivo aging.

**Forecasting Using Random Subspace Methods**

Didier Nibbering (Erasmus Universiteit Rotterdam)

Random subspace methods are a novel approach to obtaining accurate forecasts in high-dimensional regression settings. Forecasts are constructed from random subsets of predictors or randomly weighted predictors. We provide a theoretical justification for these strategies by deriving bounds on their asymptotic mean squared forecast error, which are highly informative about the scenarios in which the methods work well. Monte Carlo simulations confirm the theoretical findings and show improvements in predictive accuracy relative to widely used benchmarks. The predictive accuracy on monthly macroeconomic FRED-MD data increases substantially, with random subspace methods outperforming all competing methods for at least 66% of the series.

**Partially Specified Spatial Autoregressive Model with Artificial Neural Network**
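The first strategy in the random subspace abstract above (forecasts built from random subsets of predictors) admits a minimal sketch; the subset size, number of draws, equal-weight averaging, and simulated data are all illustrative assumptions, not the paper's specification:

```python
import numpy as np

def random_subset_forecast(X, y, x_new, k, n_draws=200, seed=0):
    # Average OLS forecasts over n_draws random size-k subsets of predictors.
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    preds = []
    for _ in range(n_draws):
        idx = rng.choice(p, size=k, replace=False)        # random subspace
        beta, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
        preds.append(x_new[idx] @ beta)
    return float(np.mean(preds))

# Simulated high-dimensional regression: only 5 of 50 predictors matter
rng = np.random.default_rng(42)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + rng.normal(scale=0.5, size=n)
x_new = rng.normal(size=p)
yhat = random_subset_forecast(X, y, x_new, k=10)
```

Each subset regression is low-dimensional and stable; averaging across draws is what controls the forecast variance in high dimensions.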

Wenqian Wang (Northwestern University)

The spatial autoregressive model has been widely applied in science, in areas such as economics, public finance, political science, agricultural economics, environmental studies, and transportation analyses. The classical spatial autoregressive model is a linear model for describing spatial correlation. In this work, we expand the classical model to include related exogenous variables, possibly non-Gaussian, high-volatility errors, and a nonlinear neural network component. The nonlinear neural network component allows for more model flexibility: the ability to learn and model nonlinear and complex relationships. We use a maximum likelihood approach for model parameter estimation. We establish consistency and asymptotic normality for these estimators under standard conditions on the spatial model and neural network component. We investigate the quality of the asymptotic approximations for finite samples by means of numerical simulation studies. For illustration, we include a real-world application.

**Multi-resolution filters for massive spatio-temporal data**

Matthias Katzfuss (Texas A&M University)

Spatio-temporal datasets are rapidly increasing in size. For example, environmental variables are often measured by automatic high-resolution sensors mounted on satellites and aircraft. However, despite their massive size, the resulting spatio-temporal datasets are noisy and incomplete, and so statistical inference is required to obtain complete maps of a spatio-temporal process of interest, together with proper uncertainty quantification. We focus here on (near-) real-time filtering inference in linear Gaussian state-space models, where the state at each time point is a spatial field evaluated on a very large spatial grid. For these models, exact inference using the Kalman filter is computationally infeasible. Instead, we propose a multi-resolution filter (MRF), which approximates the distribution of the spatial field using a large number of spatial basis functions at multiple resolutions, which are automatically determined to capture the spatial structure at all scales. The MRF is highly scalable in the size of the spatial grid, and it is well-suited for massively distributed computations. An interactive illustration of the MRF can be found at http://spatial.stat.tamu.edu/. This is joint work with Marcin Jurek (Texas A&M).

**A Switching Kalman Filter for Modeling Classical Music Performances**
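The exact filter that the MRF in the abstract above approximates is the classical Kalman filter; one predict/update cycle on a toy three-point "grid" can be sketched as follows (all matrices here are illustrative assumptions, not the MRF itself):

```python
import numpy as np

def kalman_step(m, P, y, A, Q, H, R):
    # One predict/update cycle of the exact Kalman filter.
    # m, P: prior state mean/covariance; y: new observation vector;
    # A, Q: state transition matrix and its noise covariance;
    # H, R: observation matrix and its noise covariance.
    m_pred = A @ m                       # predict mean
    P_pred = A @ P @ A.T + Q             # predict covariance
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    m_new = m_pred + K @ (y - H @ m_pred)
    P_new = (np.eye(len(m)) - K @ H) @ P_pred
    return m_new, P_new

# Toy 3-point spatial "grid", identity dynamics, noisy direct observations
d = 3
A, Q = np.eye(d), 0.01 * np.eye(d)
H, R = np.eye(d), 0.25 * np.eye(d)
m, P = np.zeros(d), np.eye(d)
y = np.array([1.0, 0.5, -0.2])
m, P = kalman_step(m, P, y, A, Q, H, R)
```

The cost bottleneck motivating the MRF is visible here: P is a dense grid-by-grid covariance, which is infeasible to store or invert when the grid is very large.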

Daniel McDonald (Indiana University)

Musical recordings are complex data files that describe the intensity and onset time of every keystroke made by the performer. Matching these data to a musical score, removing incorrect notes, anticipating note onsets for automated accompaniment, comparing diverse performances, and discovering the relationship between performer choice and listener enjoyment all require smoothing to recover low-dimensional structure. Standard statistical techniques like smoothing splines presume small changes in a derivative, but musical performances do not conform to these assumptions because tempo and dynamic interpretations rely on the juxtaposition of local smoothness with sudden changes and emphases to create listener interest. It is exactly the parts of a performance that are poorly described by statistical smoothers that render a performance interesting. Furthermore, many of these inflections are notated by the composer or are implicit in performance practice developed over centuries of musical expressivity. We present a Markov-switching state-space model for classical piano performances. We give an algorithm for greedily solving the NP-hard optimization problem and illustrate our methods on professional recordings of Chopin's Mazurka Op. 68, No. 3.

**Time series forecasting using functional partial least square regression with stochastic volatility, GARCH, and exponential smoothing**

Jong-Min Kim (University of Minnesota, Morris)

We propose a method for improving the predictive ability of standard forecasting models used in financial economics. Our approach is based on the functional partial least squares (FPLS) model, which is capable of avoiding multicollinearity in regression by efficiently extracting information from high-dimensional market data. Exploiting this ability, we can incorporate auxiliary variables that improve predictive accuracy. We provide an empirical application of our proposed methodology in terms of its ability to predict the conditional average log return and the volatility of crude oil prices via exponential smoothing, Bayesian stochastic volatility, and GARCH (generalized autoregressive conditional heteroskedasticity) models, respectively. In particular, what we call functional data analysis (FDA) traces in this article are obtained via FPLS regression from both the crude oil returns and auxiliary variables of the exchange rates of major currencies. For forecast performance evaluation, we compare the out-of-sample forecasting accuracy of the standard models with FDA traces to the accuracy of the same forecasting models with the observed crude oil returns, principal component regression (PCR), and least absolute shrinkage and selection operator (LASSO) models. We find evidence that the standard models with FDA traces significantly outperform our competing models. Finally, the models are also compared using the test for superior predictive ability and the reality check for data snooping. Our empirical results show that our new methodology significantly improves the predictive ability of standard models in forecasting the latent average log return and the volatility of financial time series.

**Joint Structural Break Detection and Parameter Estimation in High-Dimensional Non-Stationary VAR Models**

Abolfazl Safikhani (Columbia University)

Assuming stationarity is unrealistic in many time series applications. A more realistic alternative is to allow for piecewise stationarity, where the model is allowed to change at given time points. We propose a three-stage procedure for consistent estimation of both structural change points and parameters of high-dimensional piecewise vector autoregressive (VAR) models. In the first step, we reformulate the change point detection problem as a high-dimensional variable selection one, and propose a penalized least squares estimator using a total variation penalty. We show that the proposed penalized estimation method over-estimates the number of change points. We then propose a backward selection criterion in conjunction with a penalized least squares estimator to tackle this issue. In the last step of our procedure, we estimate the VAR parameters in each of the segments. We prove that the proposed procedure consistently detects the number of change points and their locations. We also show that the procedure consistently estimates the VAR parameters. The performance of the method is illustrated through several simulation scenarios and real data examples.

**Improved return level estimation via a weighted likelihood latent spatial extremes model**
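A one-dimensional toy version of the first-stage reformulation in the structural break abstract above, with mean shifts standing in for VAR dynamics and an off-the-shelf lasso standing in for the total-variation-penalized least squares (the tuning values here are assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Piecewise-constant toy signal: mean 0, then mean 4 after t = 50
rng = np.random.default_rng(0)
n = 100
y = np.concatenate([rng.normal(0, 1, 50), rng.normal(4, 1, 50)])

# Lower-triangular design: coefficient theta[t] is the jump in the mean at
# time t, so a total variation penalty on the means becomes an l1 penalty on
# theta, turning change point detection into variable selection.
X = np.tril(np.ones((n, n)))
theta = Lasso(alpha=0.1, fit_intercept=False, max_iter=50000).fit(X, y).coef_

# Estimated break locations: times with a non-negligible fitted jump
breaks = [t for t in range(1, n) if abs(theta[t]) > 0.5]
```

As the abstract notes for the VAR case, this stage tends to flag too many candidate jumps; a second-stage selection step is then needed to prune them.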

Joshua Hewitt (Colorado State University)

Uncertainty in return level estimates for rare events, like the intensity of large rainfall events, makes it difficult to develop strategies to mitigate related hazards, like flooding. Latent spatial extremes models reduce uncertainty by exploiting spatial dependence in statistical characteristics of extreme events to borrow strength across locations. However, these estimates can have poor properties due to model misspecification: latent spatial extremes models do not account for tail dependence, which is spatial dependence in the extreme events themselves. We improve estimates from latent spatial extremes models by proposing a weighted likelihood that uses the extremal coefficient to incorporate information about tail dependence during estimation. While max-stable process models directly incorporate tail dependence, latent spatial extremes models are still popular because max-stable process models are intractable for many real datasets. We adopt a hierarchical Bayesian framework to conduct inference, use simulation to evaluate the weighted model, and apply our model to improve return level estimates for Colorado rainfall events with 1% annual exceedance probability.

**Partial Distance Correlation Screening for High Dimensional Time Series**

Kashif Yousuf (Columbia University)

High-dimensional time series datasets are becoming increasingly common in fields such as economics, finance, meteorology, and neuroscience. Given this ubiquity of time series data, it is surprising that very few works on variable screening are directly applicable to time series data, and even fewer methods have been developed that utilize the unique aspects of time series data. This paper introduces several model-free screening methods developed specifically to deal with dependent and/or heavy-tailed response and covariate time series. These methods are based on the distance correlation and the partial distance correlation. Methods are developed both for univariate response models, such as nonlinear autoregressive models with exogenous predictors, and for multivariate response models, such as linear or nonlinear VAR models. Sure screening properties are proved for our methods, which depend on the moment conditions and the strength of dependence in the response and covariate processes, amongst other factors. Dependence is quantified by functional dependence measures (Wu 2005) and $\beta$-mixing coefficients, and the results rely on the use of Nagaev- and Rosenthal-type inequalities for dependent random variables. The finite sample performance of our methods is shown through extensive simulation studies, and we include an application to macroeconomic forecasting.

**Properties and Bayesian fitting of restricted Boltzmann machines**
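A hedged sketch of the plain sample distance correlation underlying the screening methods in the abstract above (univariate i.i.d. data only; the partial and time series versions in the paper are not shown):

```python
import numpy as np

def distance_correlation(x, y):
    # Sample distance correlation (Szekely, Rizzo & Bakirov 2007): zero in
    # the population if and only if x and y are independent, which is what
    # makes it attractive for model-free screening.
    x, y = np.asarray(x, float), np.asarray(y, float)

    def doubly_centered(a):
        D = np.abs(a[:, None] - a[None, :])          # pairwise distances
        return D - D.mean(0) - D.mean(1)[:, None] + D.mean()

    A, B = doubly_centered(x), doubly_centered(y)
    dcov2 = (A * B).mean()                           # squared distance cov.
    dvar = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(max(dcov2, 0.0) / dvar)) if dvar > 0 else 0.0

rng = np.random.default_rng(0)
z = rng.normal(size=500)
dep = distance_correlation(z, z ** 2)               # nonlinear dependence
indep = distance_correlation(z, rng.normal(size=500))
```

Unlike Pearson correlation, which is near zero for `z` versus `z**2`, the distance correlation picks up this purely nonlinear dependence.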

Andee Kaplan (Duke University)

A restricted Boltzmann machine (RBM) is an undirected graphical model constructed for discrete or continuous random variables, with two layers, one hidden and one visible, and no conditional dependency within a layer. In recent years, RBMs have risen to prominence due to their connection to deep learning. By treating the hidden layer of one RBM as the visible layer in a second RBM, a deep architecture can be created. RBMs are thought to thereby have the ability to encode very complex and rich structures in data, making them attractive for supervised learning. However, the generative behavior of RBMs is largely unexplored. In this presentation, we discuss the relationship between RBM parameter specification in the binary case and model properties such as degeneracy, instability, and uninterpretability. We also describe the difficulties that arise in likelihood-based and Bayesian fitting of such (highly flexible) models, especially as Gibbs sampling (quasi-Bayes) methods are often advocated for the RBM model structure.

**A Longitudinal Model for Functional Connectivity Networks Using Resting-State fMRI**
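To fix notation for the binary case discussed in the RBM abstract above: the lack of within-layer dependency makes block Gibbs sampling natural, alternating between the two layers. A minimal sketch with arbitrary (not fitted) parameters:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gibbs_step(v, W, b, c, rng):
    # One block-Gibbs sweep of a binary RBM: sample all hidden units given
    # the visible layer, then all visible units given the hidden layer.
    # Units within a layer are conditionally independent.
    p_h = sigmoid(c + v @ W)            # P(h_j = 1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(b + h @ W.T)          # P(v_i = 1 | h)
    v = (rng.random(p_v.shape) < p_v).astype(float)
    return v, h

# Arbitrary small model: 6 visible units, 4 hidden units, random weights
rng = np.random.default_rng(0)
n_vis, n_hid = 6, 4
W = rng.normal(scale=0.1, size=(n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)
v = rng.integers(0, 2, size=n_vis).astype(float)
for _ in range(100):
    v, h = gibbs_step(v, W, b, c, rng)
```

Long runs of exactly this sampler are what expose the degeneracy and instability issues the abstract studies for certain parameter specifications.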

Brian Hart (University of Minnesota, Twin Cities)

Many neuroimaging studies collect functional magnetic resonance imaging (fMRI) data in a longitudinal manner. However, the current network modeling literature lacks a general framework for analyzing functional connectivity (FC) networks in fMRI data obtained from a longitudinal study. In this work, we build a novel longitudinal FC network model using a variance components approach. First, for all subjects' visits, we account for the autocorrelation inherent in the fMRI time series data using a non-parametric technique. Second, we use a generalized least squares approach to estimate 1) the within-subject variance component shared across the population, 2) the FC network, and 3) the FC network's longitudinal trend. Our novel method for longitudinal FC networks seeks to account for the within-subject dependence across multiple visits, the variability due to the subjects being sampled from a population, and the autocorrelation present in fMRI data, while restricting the number of parameters in order to make the method computationally feasible and stable. We develop a permutation testing procedure to draw valid inference on group differences in the baseline FC network and change in the FC network over longitudinal time between a set of patients and a comparable set of controls. To examine performance, we run a series of simulations and apply the model to longitudinal fMRI data collected from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Overall, we find no difference in the global FC networks between Alzheimer's disease patients and healthy controls, but we do find differing local aging patterns in the FC between the left hippocampus and the right posterior cingulate cortex.

**Nonparametric estimation of the conditional distribution at regression boundary points**

Srinjoy Das (University of California, San Diego)

**Predictive inference for locally stationary time series**

Srinjoy Das (University of California, San Diego)

**A Bayesian Multivariate Functional Dynamic Linear Model**

David S. Matteson (Cornell University)

We present a Bayesian approach for modeling multivariate, dependent functional data. To account for the three dominant structural features in the data (functional, time-dependent, and multivariate components), we extend hierarchical dynamic linear models for multivariate time series to the functional data setting. We also develop Bayesian spline theory in a more general constrained optimization framework. The proposed methods identify a time-invariant functional basis for the functional observations, which is smooth and interpretable, and can be made common across multivariate observations for additional information sharing. The Bayesian framework permits joint estimation of the model parameters, provides exact inference (up to MCMC error) on specific parameters, and allows generalized dependence structures. Sampling from the posterior distribution is accomplished with an efficient Gibbs sampling algorithm. We illustrate the proposed framework with two applications: (1) multi-economy yield curve data from the recent global recession, and (2) local field potential brain signals in rats, for which we develop a multivariate functional time series approach.

**Interpretable Vector AutoRegressions with Exogenous Time Series**

David S. Matteson (Cornell University)

The Vector AutoRegressive (VAR) model is fundamental to the study of multivariate time series. Although VAR models are intensively investigated by many researchers, practitioners often show more interest in analyzing VARX models that incorporate the impact of unmodeled exogenous variables (X) into the VAR. While several proposals have been made to sparsely estimate large VAR models, the estimation of large VARX models is under-explored. Moreover, these sparse proposals typically involve a lasso-type penalty and do not incorporate lag selection into the estimation procedure. As a consequence, the resulting models may be difficult to interpret. We propose a lag-based hierarchically sparse estimator, called “HVARX”, for large VARX models. HVARX provides a highly interpretable model.

**High-dimensional Spectral Density Regularization by Thresholding**

Sumanta Basu (Cornell University)