# Poster session and reception

Tuesday, April 24, 2018 - 4:00pm - 6:00pm

Lind 400

**High-dimensional Spectral Density Regularization by Thresholding**

Yiming Sun (Cornell)

**Detect Hidden Correlation from Large-Scale Police Report Data**

Shixiang Zhu (Georgia Institute of Technology)

The main goal of the project is to develop an efficient algorithm that detects correlations between crime incidents as quickly as possible from streaming police report data, using both the structured fields (e.g., time, location, category) and the unstructured ones (the so-called “free text”).

**Functional Stochastic Volatility**

Phillip Jang (Cornell University)

**A Bayesian Multivariate Functional Dynamic Linear Model**

Daniel Kowal (Rice University)

We present a Bayesian approach for modeling multivariate, dependent functional data. To account for the three dominant structural features in the data--functional, time dependent, and multivariate components--we extend hierarchical dynamic linear models for multivariate time series to the functional data setting. We also develop Bayesian spline theory in a more general constrained optimization framework. The proposed methods identify a time-invariant functional basis for the functional observations, which is smooth and interpretable, and can be made common across multivariate observations for additional information sharing. The Bayesian framework permits joint estimation of the model parameters, provides exact inference (up to MCMC error) on specific parameters, and allows generalized dependence structures. Sampling from the posterior distribution is accomplished with an efficient Gibbs sampling algorithm. We illustrate the proposed framework with two applications: (1) multi-economy yield curve data from the recent global recession, and (2) local field potential brain signals in rats, for which we develop a multivariate functional time series approach for multivariate time–frequency analysis. Supplementary materials, including R code and the multi-economy yield curve data, are available online.

**Simulating the German day-ahead power market: Preserving the within-day correlation structure**

Rune Nielsen (Aalborg University)

This poster is on the simulation of the hour-based German day-ahead power auction, where I apply high-dimensional vector autoregressive (VAR) models to capture the effects of the market infrastructure on the day-ahead auction. This approach ensures that the rich within-day correlation structure of the auction is preserved in the simulation, which is important for valuing power-related real assets with production timing issues (e.g., batteries). To handle the large dimensionality of the data created by the VAR approach, lasso, adaptive lasso and elastic net shrinkage methods are applied. These methods are assessed with a classic forecast quality assessment, combined with an evaluation of the relevant statistical properties of each model. After the model parameters are estimated, simulation from the fitted model is carried out using the residuals.

**Clustering and Similarity for Spatial Time Series Data**

Laura Tupper (Williams College)

We present a comparison of methods for the analysis of spatio-temporal data, examining wind speed over time at a range of locations. Clustering, representative selection, and outlier detection rely on the choice of a measure of similarity or distance between observations. We show four approaches--Euclidean distance, band distance, Dynamic Time Warping, and the variogram--that make use of different levels of the data's inherent structure, from unordered vectors to true spatio-temporal data. We illustrate the effects of using each distance measure by applying the same clustering algorithm to each, and comparing the resulting distance matrices and cluster assignments for each observation. We find that each measure highlights different behavior, and that for the most part there is little agreement between the results, indicating that the selection of a distance or similarity is ultimately problem-dependent.

**Hypothesis Testing in High-dimensional Auto-regressive Models**

Lili Zheng (University of Wisconsin, Madison)

High-dimensional auto-regressive models arise in a number of applications including neuroscience and social networks. There has been substantial development in theory and methodology for parameter estimation in these models, while hypothesis testing methods are less developed. We construct our test statistic based on the decorrelated score function, and prove a uniform convergence rate for the statistic under both the null and alternative hypotheses. The major challenge we tackle is establishing concentration inequalities for time series data; we assume the white noise follows a sub-Gaussian rather than a Gaussian distribution, which makes the analysis trickier.

**Calibrating imperfect mathematical models by Scaled Gaussian Stochastic process and jointly robust prior**

Mengyang Gu (Johns Hopkins University)

In calibrating mathematical models, the calibration parameters are often confounded with the discrepancy function modeled by a Gaussian stochastic process (GaSP). In this work, we propose the scaled Gaussian stochastic process (S-GaSP), a novel stochastic process for calibration and prediction. We provide a computationally feasible approach for this process, and the GaSP model becomes a special case of the S-GaSP. Compared with GaSP calibration, the parameters calibrated by the S-GaSP enable the calibrated computer model itself to predict reality well. A new class of priors, called the jointly robust prior, will be introduced for mathematical model emulation, variable selection and calibration.

**Multisensor Fusion of Remotely Sensed Vegetation Indices using Space-Time Dynamic Linear Models**

Maggie Johnson (Statistical and Applied Mathematical Sciences Institute (SAMSI))

Characterizing growth cycle events in vegetation, such as spring green-up, from massive spatiotemporal remotely sensed vegetation index datasets is desirable for a wide range of applications. For example, the timings of plant life cycle events are very sensitive to weather conditions, and are often used to assess the impacts of changes in weather and climate. Likewise, quantifying and predicting changes in crop greenness can have a large impact on agricultural strategies. However, due to the current limitations of imaging spectrometers, remote sensing datasets of vegetation with high temporal frequency of measurements must have lower spatial resolution, and vice versa. In this research, we propose a space-time dynamic linear model to fuse high temporal frequency data (MODIS) with high spatial resolution data (Landsat) to create daily, 30 meter resolution data products of a vegetation greenness index. The method models spatiotemporal dependence within and across different landcover types with a parsimonious multivariate Matérn latent process and is able to handle the spatial change-of-support problem, as well as the high percentage of missing values present in the data. To handle the massive size of the data, we utilize a fast variogram/crossvariogram estimation procedure, and a moving window Kalman smoother to produce a daily, 30 meter resolution data product with associated uncertainty.

**Approximate factor models with strongly correlated idiosyncratic errors**

Jiahe Lin (University of Michigan)

We consider the estimation of approximate factor models for time series data, where strong serial and cross-sectional correlations within the idiosyncratic component are present. This setting comes up naturally in many applications, but existing approaches in the literature rely on the assumption that such correlations are weak, leading to mis-specification of the number of factors selected and consequently inaccurate inference. In this paper, we explicitly incorporate the dependence structure present in the idiosyncratic component through lagged values of the observed multivariate time series. We formulate a constrained optimization problem to estimate the factor space and the transition matrices of the lagged values *simultaneously*, wherein the constraints reflect the low rank nature of the factor space and the sparsity of the transition matrices. Theoretical properties of the obtained estimates are established and an easy-to-implement computational procedure for empirical work is introduced. The performance of the model has been evaluated on synthetic data, and further illustrated on a dataset involving weekly log-returns of 75 large US financial institutions for the 2001-2016 period.

**A Test for Isotropy on a Sphere using Spherical Harmonic Functions**

Indranil Sahoo (North Carolina State University)

Analysis of geostatistical data is often based on the assumption that the spatial random field is isotropic. This assumption, if erroneous, can adversely affect model predictions and statistical inference. Nowadays many applications consider data over the entire globe and hence it is necessary to check the assumption of isotropy on a sphere. In this paper, a test for spatial isotropy on a sphere is proposed. The data are first projected onto the set of spherical harmonic functions. Under isotropy, the spherical harmonic coefficients are uncorrelated whereas they are correlated if the underlying fields are not isotropic. This motivates a test based on the sample correlation matrix of the spherical harmonic coefficients. In particular, we use the largest eigenvalue of the sample correlation matrix as the test statistic. Extensive simulations are conducted to assess the Type I errors of the test under different scenarios. We show how temporal correlation affects the test and provide a method for handling temporal correlation. We also gauge the power of the test as we move away from isotropy. The method is applied to the near-surface air temperature data that are part of the HadCM3 model output. Although we do not expect global temperature fields to be isotropic, we propose several anisotropic models of increasing complexity, each of which has an isotropic process as a model component. We apply the test to the isotropic component in a sequence of such models to determine how well the models capture the anisotropy in the fields.

**Locally Stationary Processes and their Application to Climate Modeling**

Shreyan Ganguly (The Ohio State University)

In the analysis of climate it is common to build non-stationary spatio-temporal processes, often based on assuming a random walk behavior over time for the error process. Random walk models may be a poor description of the temporal dynamics, leading to inaccurate uncertainty quantification. Likewise, assuming stationarity in time may not be reasonable, especially under climate change. Based on ongoing research, we present a class of time-varying processes that are stationary in space, but locally stationary in time. We demonstrate how to carefully parameterize the time-varying model parameters in terms of a transformation of basis functions. We present some properties of parameter estimates when the process is observed at a finite collection of spatial locations, and apply our methodology to a Bayesian spatio-temporal climate analysis.

**Changepoints Within the Daily Routine**

Simon Taylor (University of Lancaster)

When we are older, health problems often lead to a change in our daily routine. Howz is a smart system of discreet sensors that monitor your daily routine in order to alert you to abnormal patterns. Natural changes in daily routine (e.g., sleep, mealtime and out-of-house) can be framed as a changepoint problem on the repeated daily cycle. This circular timeframe, rather than traditional linear time, poses a number of challenges to changepoint analysis that are discussed in the poster.

**Time series modeling in high-resolution functional magnetic resonance imaging**

Benjamin Risk (Emory University)

In functional magnetic resonance imaging (fMRI), three-dimensional images of the blood-oxygen-level dependent signal are recorded across time. Functional MRI is a useful tool for estimating which parts of the brain are activated by tasks, e.g., a motor task in which a subject moves their fingers. Some of the popular neuroimaging software packages use simple models of time series errors, such as AR(1). Recent developments in acquisition protocols have increased the temporal resolution in fMRI, which has renewed concerns that the conventional imaging software does not adequately model serial correlation. We introduce the application of autoregressive moving-average models (ARMA) with regressors to the massive univariate analysis of single-subject fMRI data, where model order is chosen using Akaike’s information criterion corrected for small sample size. We estimated ARMA models at thirty thousand locations for thirty subjects in a motor task from the Human Connectome Project. Control variables orthogonal to the conventional covariate matrix were introduced to gain insight into type one error rates. We also analyzed the factors affecting type one error rates in simulations. Co-authored with Mingrui Liang.

**Modeling Precipitation Extremes using Log-Histospline**

Whitney Huang (Statistical and Applied Mathematical Sciences Institute (SAMSI))

One of the commonly used approaches to modeling univariate extremes is the peaks-over-threshold (POT) method. The POT method models exceedances over a (sufficiently high/low) threshold as a generalized Pareto distribution (GPD). This method requires the selection of a threshold that might affect the estimates. Here we propose an alternative method, the Log-Histospline (LHSpline), to explore modeling the tail behavior and the remainder of the density in one step using the full range of the data. LHSpline applies a smoothing spline model to a finely binned histogram of the log transformed data to estimate its log density. By construction, an LHSpline estimate is constrained to have polynomial tail behavior, a feature commonly observed in geophysical observations. We illustrate the LHSpline method by analyzing precipitation data collected in Houston, Texas.

**The copula directional dependence by stochastic volatility models**

Jong-Min Kim (University of Minnesota, Morris)

This research proposes copula directional dependence using a bivariate Gaussian copula beta regression with stochastic volatility (SV) models for the marginal distributions. Using an asymmetric copula generated by the composition of two Plackett copulas, we show that our SV copula directional dependence, based on the Gaussian copula beta regression model, is superior to the copula directional dependence of Kim and Hwang (2016), based on an asymmetric GARCH model, in terms of the percent relative efficiency of bias and mean squared error. To validate the proposed method on real data, we use Brent Crude Daily Price (BRENT), West Texas Intermediate Daily Price (WTI), the Standard & Poor’s 500 (SP) and US 10-Year Treasury Constant Maturity Rate (TCM), and find that our copula SV directional dependence is overall superior to the Kim and Hwang (2016) copula directional dependence by an asymmetric GARCH model in terms of precision, as measured by the percent relative efficiency of mean squared error. For forecasting with the real financial data, we also show that the Bayesian SV model of the uniform-transformed data obtained from a copula conditional distribution improves on volatility models such as GARCH and SV.

**Interpretable Vector AutoRegressions with Exogenous Time Series**

Ines Wilms (Katholieke Universiteit Leuven)

The Vector AutoRegressive (VAR) model is fundamental to the study of multivariate time series. Although VAR models are intensively investigated by many researchers, practitioners often show more interest in analyzing VARX models that incorporate the impact of unmodeled exogenous variables (X) into the VAR. While several proposals have been made to sparsely estimate large VAR models, the estimation of large VARX models is under-explored. Moreover, typically these sparse proposals involve a lasso-type penalty and do not incorporate lag selection into the estimation procedure. As a consequence, the resulting models may be difficult to interpret. We propose a lag-based hierarchically sparse estimator, called HVARX, for large VARX models. HVARX provides a highly interpretable model.

**Multi-subject EEG Spectral Density Estimation Using Nested Dirichlet Processes**

Brian Hart (University of Minnesota, Twin Cities)

Electroencephalography (EEG) is a non-invasive neuroimaging modality that captures electrical brain activity many times per second. We seek to estimate power spectra from EEG data for three channels (locations) that were gathered for 580 adolescent twin pairs through the Minnesota Twin Family Study (MTFS). Typically, spectral analysis methods treat time series from each subject separately, and independent spectral densities are fit to each time series. In our EEG data collected on twins, it is reasonable to assume that time series may have similar underlying characteristics, and borrowing information across subjects can significantly improve estimation. We propose a Nested Dependent Bernstein Polynomial Dirichlet Process model to estimate the power spectrum of the EEG signal for each subject while incorporating information from subject characteristics. We then leverage the MTFS twin study design to estimate the heritability of EEG power spectra. Adjusting for covariates and clustering subjects allows for a flexible model with the ability to share information across subjects and detect differences in EEG power spectra for differing subject characteristics. We provide a method to estimate spectral densities through data-driven smoothing of periodograms within and across subjects while requiring minimal user input for tuning parameters.

**Visualization and assessment of spatio-temporal covariance properties**

Huang Huang (Statistical and Applied Mathematical Sciences Institute (SAMSI))

Spatio-temporal covariances are important for describing the spatio-temporal variability of underlying random fields in geostatistical data. For second-order stationary random fields, there exist subclasses of covariance functions that assume a simpler spatio-temporal dependence structure with separability and full symmetry. However, it is challenging to visualize and assess separability and full symmetry from spatio-temporal observations. In this work, we propose a functional data analysis approach that constructs test functions using the cross-covariances from time series observed at each pair of spatial locations. These test functions of temporal lags summarize the properties of separability or symmetry for the given spatial pairs. We use functional boxplots to visualize the functional median and the variability of the test functions, where the extent of departure from zero at all temporal lags indicates the degree of non-separability or asymmetry. We also develop a rank-based nonparametric testing procedure for assessing the significance of the non-separability or asymmetry. Essentially, the proposed methods only require the analysis of temporal covariance functions. Thus, a major advantage over existing approaches is that there is no need to estimate any covariance matrix for selected spatio-temporal lags. The performance of the proposed methods is examined by simulations with various commonly used spatio-temporal covariance models. To illustrate our methods in practical applications, we apply them to real datasets, including weather station data and climate model outputs.

**Pseudo-likelihood Based Consistent approach for High Dimensional Bayesian VAR Models**

Satyajit Ghosh (University of Florida)

Vector autoregressive (VAR) models aim to capture linear temporal interdependencies among multiple time series. They have been widely used in macro and financial econometrics and more recently have found novel applications in functional genomics and neuroscience. These applications have also accentuated the need to investigate the behavior of the VAR model in a high dimensional regime, which will provide novel insights into the role of temporal dependence for regularized estimates of the model parameters. However, hardly anything is known regarding posterior model selection consistency for Bayesian VAR models in such regimes. In this work we develop a pseudo-likelihood based Bayesian approach to variable selection in high dimensional VAR models by considering hierarchical normal priors on the autoregressive coefficients as well as on the model space. We show posterior ratio consistency and strong selection consistency of the proposed method, in the sense that the posterior probability of the true model converges to one even when the dimension $p$ of the VAR system grows nearly exponentially with the sample size $n$. Moreover, posterior model selection consistency holds without imposing any sparsity or diagonal structure assumption on the error covariance matrix $\Sigma$. As long as the maximum eigenvalue of $\Sigma$ remains bounded above by a constant, we can recover the true model as $n \rightarrow \infty$. Most importantly, the strong selection consistency does not require any restriction on the true maximum number of edges. As a by-product of these results, we also establish strong selection consistency for the high-dimensional linear regression model with serially correlated errors.

**Severity Burn Maps: A multi-temporal application of breakpoint estimation**

Inder Tecuapetla-Gomez (CONACYT-CONABIO)

In recent years the interconnection between statistical methodology and remote sensing techniques has helped generate tools to assess structural changes in diverse ecosystems. We have applied a breakpoint estimation method to a data cube of high resolution satellite images taken over La Primavera, Jalisco, Mexico from 2003 to 2016, aiming to identify burned areas and assess their severity automatically. La Primavera is a National Protected Area and is close to the second biggest metropolitan area in Mexico. We propose a series of yearly maps from which burn severity can be monitored.

**Multivariate Spectral Downscaling for Multiple Air Pollutants**

Yawen Guan (Statistical and Applied Mathematical Sciences Institute (SAMSI))

Fine particulate matter (PM2.5) is a mixture of air pollutants that, at a high concentration level, has adverse effects on human health. The components of speciated fine PM have complex spatio-temporal and cross-dependence structures that should be accounted for in estimating the spatio-temporal distribution of each component. Two major sources of air quality data are used: monitoring data and the Community Multiscale Air Quality (CMAQ) model. The monitoring stations provide fairly accurate measurements of the pollutants; however, they are sparse in space and take measurements at a coarse time resolution, typically 1-in-3 or 1-in-6 days. On the other hand, the CMAQ model provides daily concentration levels of each component with complete spatial coverage on a grid; these model outputs, however, need to be evaluated and calibrated to the monitoring data.

In my poster, I will provide a brief introduction to the data and present a statistical method to combine these two data sources for estimating speciated PM2.5 concentration. Our method models the complex relationships between monitoring data and CMAQ output at different spatial resolutions, and we model the spatial dependence and cross dependence among the components of speciated PM2.5. We apply the method to compare CMAQ model output with speciated PM2.5 measurements in the United States in 2011.
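
As a rough illustration of what calibrating gridded model output to monitoring data can involve, the sketch below fits a simple linear calibration on synthetic numbers and applies it to every grid cell. It is not the multivariate spectral method of the poster: the single-pollutant linear form, the function names `fit_linear_calibration` and `calibrate`, and all simulated values are assumptions made only for this example.

```python
import random


def fit_linear_calibration(model_values, monitor_values):
    """Least-squares fit of: monitor = intercept + slope * model."""
    n = len(model_values)
    mean_x = sum(model_values) / n
    mean_y = sum(monitor_values) / n
    sxx = sum((x - mean_x) ** 2 for x in model_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(model_values, monitor_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def calibrate(model_values, intercept, slope):
    """Apply the fitted calibration to every grid cell."""
    return [intercept + slope * x for x in model_values]


# Synthetic data for illustration only (not real CMAQ or monitor data).
rng = random.Random(0)
grid_output = [rng.uniform(5.0, 20.0) for _ in range(500)]  # gridded model output
monitored_cells = rng.sample(range(500), 50)                # sparse monitor sites

# Hypothetical "truth": the model output is biased relative to the monitors.
true_intercept, true_slope = 2.0, 0.8
monitor_obs = [true_intercept + true_slope * grid_output[i] + rng.gauss(0.0, 0.1)
               for i in monitored_cells]

# Fit on the cells that have a monitor, then calibrate the full grid.
intercept, slope = fit_linear_calibration(
    [grid_output[i] for i in monitored_cells], monitor_obs)
calibrated_grid = calibrate(grid_output, intercept, slope)

print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

The design point carried over from the abstract is the division of labor: the sparse but accurate monitors determine the calibration, and the model output supplies complete spatial coverage. A method like the one described would additionally account for spatial dependence, cross dependence among components, and differing spatial resolutions, none of which this sketch attempts.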