September 7 - 9, 2011
Empirical Mode Decomposition (EMD) is a multi-resolution data analysis technique that can break down a signal or image into different time-frequency modes which uniquely reflect its variations. The algorithm has gained much attention lately due to its performance in a number of applications, especially in climate and biomedical data analysis.
Recently, civil infrastructure managers have begun exploring the potential of the algorithm to automate the detection of cracks in infrastructure images. Unfortunately, the adaptive nature of the algorithm increases its computational cost to an extent that limits its wide practical application.
The approach involves four main steps: extrema detection, interpolation, sifting, and reconstruction. Extrema detection and interpolation consume about 70% of the computational time. Hence, we focus on ways to implement these procedures in parallel by taking advantage of the Matlab Parallel Computing Toolbox.
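The first three of the four steps can be made concrete with a small sketch. The following is a minimal, illustrative one-pass sift (the talk's implementation is in Matlab with the Parallel Computing Toolbox; the function name and the scipy-based extrema/interpolation calls here are our own choices):

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x, t):
    """One sifting pass: extrema detection, envelope interpolation,
    and subtraction of the mean envelope."""
    # Step 1: extrema detection
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    # Step 2: cubic-spline envelopes through the extrema
    upper = CubicSpline(t[maxima], x[maxima])(t)
    lower = CubicSpline(t[minima], x[minima])(t)
    # Step 3: sifting -- subtract the local mean of the envelopes
    return x - (upper + lower) / 2.0

t = np.linspace(0, 1, 1000)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
h = sift_once(x, t)   # first candidate intrinsic mode function
```

In a full EMD, the sift is repeated until the candidate satisfies the intrinsic-mode conditions, and the whole loop is repeated on the residue (reconstruction is then the sum of the extracted modes plus the final residue).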
Keywords of the presentation: High dimensional matrices, multivariate response regression, low rank approximation, variable selection, oracle inequalities, sparse PCA, sparse CCA, matrix sparsity, minimax adaptive estimation
Modeling high dimensional data has become a ubiquitous task, and reducing the dimensionality a typical solution. This talk is devoted to optimal dimension reduction in sparse multivariate response regression models in which both the number of responses and the number of predictors may exceed the sample size. Sometimes viewed as complementary, predictor selection and rank reduction are the most popular strategies for obtaining lower dimensional approximations of the parameter matrix in such models. Neither of them alone is tailored to simultaneous selection and rank reduction, and therefore neither can be minimax rate optimal for low rank models corresponding to just a few of the total number of available predictors. To date, no estimators have been proved to have this property. The work presented here attempts to bridge this gap. We point out that, somewhat surprisingly, a procedure consisting of first selecting predictors and then reducing the rank does not always yield estimates that are minimax adaptive. We show that this can be remedied by performing joint rank and predictor selection. The methods we propose are based on penalized least squares, with new penalties designed with the appropriate notions of matrix sparsity in mind. Of special importance is the fact that these penalties are rather robust to data-adaptive choices of the tuning parameters, making them particularly appealing in practice. Our results apply immediately to standard multivariate analyses such as sparse PCA or CCA as particular cases, and can be easily extended to inference in functional data. We support our theoretical results with an extensive simulation study and a concrete data example.
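To make the two notions of matrix sparsity concrete, the toy sketch below combines hard predictor (row) selection with rank truncation of the least-squares estimate. It is not the penalized estimator of the talk, which selects rank and predictors jointly; it only illustrates the two reductions being applied to a row-sparse, low-rank coefficient matrix:

```python
import numpy as np

def row_sparse_low_rank(B, row_keep, rank):
    """Project a coefficient matrix B onto estimates that use only
    `row_keep` predictors (rows) and have rank at most `rank`.
    A hard-thresholding stand-in for the talk's penalized criteria."""
    # Keep the rows (predictors) with largest l2 norm, zero the rest
    norms = np.linalg.norm(B, axis=1)
    keep = np.argsort(norms)[-row_keep:]
    B_sel = np.zeros_like(B)
    B_sel[keep] = B[keep]
    # Truncate the SVD of the restricted matrix to the target rank
    U, s, Vt = np.linalg.svd(B_sel, full_matrices=False)
    s[rank:] = 0.0
    return U @ np.diag(s) @ Vt

rng = np.random.default_rng(0)
# True coefficient matrix: 5 active predictors out of 50, rank 2
B_true = np.zeros((50, 20))
B_true[:5] = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 20))
X = rng.standard_normal((100, 50))
Y = X @ B_true + 0.1 * rng.standard_normal((100, 20))
B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
B_hat = row_sparse_low_rank(B_ols, row_keep=5, rank=2)
```

When the truth is row-sparse and low-rank, this projection reduces the estimation error substantially relative to unrestricted least squares, which is the gain the minimax theory of the talk quantifies.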
Please see Lecture Slides
Locally stationary processes are models for nonstationary time series whose behaviour can locally be approximated by a stationary process. In this situation the classical characteristics of the process, such as the covariance function at some lag k, the spectral density at some frequency lambda, or, e.g., the parameters of an AR(p) process, are curves which change slowly over time. The theory of locally stationary processes allows for a rigorous asymptotic treatment of various inference problems for such processes. Although technically more difficult, many of these problems are related to classical curve estimation problems.
We give an overview of different methods of nonparametric curve estimation for locally stationary processes. We discuss stationary methods on segments, wavelet expansions, local likelihood methods, and nonparametric maximum likelihood estimates.
Furthermore, we discuss the estimation of instantaneous frequencies for processes with a nonlinear phase.
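The first of these methods, stationary methods on segments, can be sketched by estimating a time-varying spectrum from windowed periodograms computed on short segments. The synthetic time-varying AR(1) process, segment length, and window below are illustrative choices only:

```python
import numpy as np

def segment_periodograms(x, seg_len):
    """Estimate a time-varying spectrum by treating the process as
    stationary on short segments and taking a periodogram on each."""
    n_seg = len(x) // seg_len
    segs = x[: n_seg * seg_len].reshape(n_seg, seg_len)
    window = np.hanning(seg_len)
    X = np.fft.rfft(segs * window, axis=1)
    return np.abs(X) ** 2 / seg_len   # one row of spectral estimates per segment

# Time-varying AR(1): the coefficient drifts from 0.1 (nearly flat
# spectrum) to 0.9 (low-frequency power), so late segments should
# concentrate their power near frequency zero
rng = np.random.default_rng(1)
n = 4096
a = np.linspace(0.1, 0.9, n)
x = np.zeros(n)
for i in range(1, n):
    x[i] = a[i] * x[i - 1] + rng.standard_normal()
P = segment_periodograms(x, seg_len=256)
```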
We will consider the problem of modeling a class of non-stationary time series with outliers using piecewise autoregressive (AR) processes. The number and locations of the piecewise AR segments, as well as the orders of the respective AR processes, are assumed to be unknown, and each piece may be contaminated with an unknown number of innovational and/or additive outliers. The minimum description length (MDL) principle is applied to compare various segmented AR fits to the data. The goal is to find the “best” combination of the number of segments, the lengths of the segments, the orders of the piecewise AR processes, and the number and types of outliers. Such a “best” combination is implicitly defined as the optimizer of an MDL criterion. Since the optimization is carried out over a large number of configurations of segments and positions of outliers, a genetic algorithm is used to find optimal or near-optimal solutions.
Strategies for accelerating the procedure will also be described. Numerical results from simulation experiments and real data analyses show that the procedure enjoys excellent empirical properties. (This is joint work with Thomas Lee and Gabriel Rodriguez-Yam.)
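A stripped-down version of the MDL comparison can be sketched as follows: fixed AR order, no outlier modeling, and code lengths in bits, with the genetic search replaced by a direct comparison of two candidate segmentations. The exact form of the criterion here is a simplification, not the one from the talk:

```python
import numpy as np

def ar_rss(x, p):
    """Least-squares AR(p) fit; returns the residual sum of squares."""
    Y = x[p:]
    X = np.column_stack([x[p - k : len(x) - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return float(np.sum((Y - X @ coef) ** 2))

def mdl(x, breaks, p=1):
    """Two-part MDL score of a piecewise AR(p) fit with the given
    break points: cost of the segmentation and parameters, plus the
    code length of the residuals of each segment."""
    pts = [0] + list(breaks) + [len(x)]
    score = np.log2(len(x)) * len(breaks)              # cost of encoding breaks
    for a, b in zip(pts[:-1], pts[1:]):
        seg = x[a:b]
        n = len(seg) - p
        score += (p + 1) / 2 * np.log2(len(seg))       # cost of AR parameters
        score += n / 2 * np.log2(ar_rss(seg, p) / n)   # code length of residuals
    return score

# Series with a structural break at t = 500: AR coefficient 0.9 -> -0.9
rng = np.random.default_rng(2)
x = np.zeros(1000)
for i in range(1, 1000):
    a = 0.9 if i < 500 else -0.9
    x[i] = a * x[i - 1] + rng.standard_normal()
```

Evaluating `mdl(x, [500])` against `mdl(x, [])` or a misplaced break shows the criterion favoring the correct segmentation; the genetic algorithm of the talk automates this search over the (much larger) space of segmentations and outlier positions.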
Keywords of the presentation: Sampling, multiscale methods, mesh refinement
Simulations of multiscale solutions to differential equations often require a reduction in the number of unknowns compared to those in a standard discretization. This is in order to limit the memory requirement and the computational complexity. We will discuss common reduction techniques based on mesh refinements in the light of classical and novel sampling theorems from information theory.
Keywords of the presentation: sparse representation of multiscale data, compressed sensing, nonlinear optimization
We introduce a sparse time-frequency analysis method for analyzing nonlinear and non-stationary data. This method is inspired by the Empirical Mode Decomposition method (EMD) and the recently developed compressed sensing theory. The main idea is to look for the sparsest representation of multiscale data within the largest possible dictionary consisting of intrinsic mode functions. We formulate this as a nonlinear optimization problem. Further, we propose an iterative algorithm to solve this nonlinear optimization problem recursively. Numerical examples will be given to demonstrate the robustness of our method and comparison will be made with the EMD method. One advantage of performing such decomposition is to preserve some intrinsic physical properties of the signal, such as trend and instantaneous frequency. Our method provides a mathematical foundation for the EMD method and can be considered, to some extent, as a nonlinear version of compressed sensing.
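The nonlinear optimization over a dictionary of intrinsic mode functions is specific to the talk, but the underlying idea, seeking the sparsest representation of a signal within a large redundant dictionary, can be illustrated with a greedy pursuit over a linear dictionary. This is a standard matching-pursuit sketch, not the authors' algorithm:

```python
import numpy as np

def matching_pursuit(x, D, n_atoms):
    """Greedy sparse approximation of x over a dictionary D whose
    columns are unit-norm atoms: a linear-dictionary analogue of
    searching for the sparsest representation."""
    r = x.copy()
    coefs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ r                 # match residual against all atoms
        j = np.argmax(np.abs(corr))    # pick the best-matching atom
        coefs[j] += corr[j]
        r -= corr[j] * D[:, j]         # update the residual
    return coefs, r

# Dictionary of sampled cosines; the signal uses only two of them
n = 256
freqs = np.arange(1, 129)
t = np.arange(n) / n
D = np.cos(2 * np.pi * np.outer(t, freqs))
D /= np.linalg.norm(D, axis=0)
x = 3.0 * D[:, 9] + 1.5 * D[:, 40]
coefs, resid = matching_pursuit(x, D, n_atoms=2)
```

The pursuit recovers the two active atoms exactly; the hard part addressed by the talk is that IMF dictionaries are nonlinear, so the analogous search cannot be reduced to inner products.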
In the data analysis community, many recent methods are based on the so-called Empirical Mode Decomposition (EMD), and different methods have been proposed to decompose non-linear and non-stationary signals sampled on non-uniform grids effectively. The traditional EMD employs a cubic spline method to interpolate the envelope based on the extrema of the data. This method may cause overshoots and undershoots, a fundamental drawback, since it may destroy some physical properties of the intrinsic mode functions. In order to generate a strictly mathematically defined envelope, we propose an optimization-based empirical mode decomposition (OEMD).
We demonstrate how to extend our OEMD method from one-dimensional signals to multi-dimensional data. We illustrate with several numerical examples that our method is superior to others with respect to different criteria, such as relative errors or the extraction of texture. Furthermore, we employ our optimization-based interpolation in normalization-based instantaneous frequency analyses, which show its potential especially for non-uniformly sampled data.
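The abstract does not spell out the OEMD optimization itself. As one illustration of how an envelope can be defined by an optimization rather than by spline interpolation, the sketch below computes an upper envelope by minimizing the squared second differences subject to lying on or above the data; by construction it cannot undershoot, unlike an unconstrained cubic spline. This formulation is our own illustrative choice, not necessarily the authors':

```python
import numpy as np
from scipy.optimize import lsq_linear

def upper_envelope(x):
    """Upper envelope e = x + s with slack s >= 0, chosen to minimize
    the roughness ||D2 e||^2 (sum of squared second differences).
    The constraint e >= x rules out undershoot by construction."""
    n = len(x)
    D2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]
    # min_s ||D2 (x + s)||^2  subject to  s >= 0, a bounded least squares problem
    res = lsq_linear(D2, -D2 @ x, bounds=(0.0, np.inf))
    return x + res.x

t = np.linspace(0.0, 1.0, 200)
x = np.sin(2 * np.pi * 8 * t) * (1.2 + np.cos(2 * np.pi * t))
e = upper_envelope(x)
```

The resulting envelope is never below the data and is no rougher than the signal itself, which is the kind of mathematically guaranteed behavior the abstract contrasts with spline envelopes.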
We are using the 2D-EEMD to increase our understanding of the relationships between the African Easterly Waves and the initiation and development of the tropical storms/hurricanes over the Northern Atlantic Ocean. We are using large-scale parameters including zonal and meridional wind, sea surface temperature, atmospheric stability parameters, ocean heat capacity, relative humidity, low-level vorticity, and vertical wind shear to carry out our studies. We will focus on case studies during July, August, and September of 2005 and 2006.
by Man-Li C. Wu (1), Siegfried D. Schubert (2), and
Norden E. Huang (3)
(1) and (2) NASA/GSFC/GMAO
(3) NCU, Taiwan.
Keywords of the presentation: Empirical Mode Decomposition, Instantaneous frequency, Trend
As scientific research becomes increasingly sophisticated, the inadequacy of the traditional data analysis methods is becoming glaringly obvious. The only alternative is to break away from these limitations: we should let the data speak for themselves, so that the results reveal the full range of consequences of nonlinearity and nonstationarity. To do so, we need a new paradigm of data analysis methodology, without a priori bases, that fully accommodates the variations of the underlying driving mechanisms. That is an adaptive data analysis method, which will be introduced in this talk. The emphases will be on the Empirical Mode Decomposition method and its applications in determining the trend and instantaneous frequency, and the implications for quantifying the degree of nonstationarity and nonlinearity.
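A standard way to make "instantaneous frequency" concrete for a single oscillatory mode (such as one IMF) is the Hilbert transform: form the analytic signal and differentiate its unwrapped phase. A minimal sketch, with an arbitrary sampling rate and test chirp:

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_frequency(x, fs):
    """Instantaneous frequency of a (mono-component) signal via the
    Hilbert transform: differentiate the unwrapped analytic phase."""
    analytic = hilbert(x)
    phase = np.unwrap(np.angle(analytic))
    return np.diff(phase) * fs / (2 * np.pi)

# A chirp sweeping 5 Hz -> 15 Hz; its instantaneous frequency
# is 5 + 5t, since the phase is 2*pi*(5t + 2.5t^2)
fs = 1000
t = np.arange(0, 2, 1 / fs)
x = np.sin(2 * np.pi * (5 * t + 2.5 * t ** 2))
f_inst = instantaneous_frequency(x, fs)
```

In the adaptive framework of the talk, the signal is first decomposed by EMD so that each component is well enough behaved for such a pointwise frequency to be physically meaningful.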
Recent advances in sensor and data acquisition technologies have brought to light new classes of signals containing typically several data channels. Currently, such signals are almost invariably processed channel-wise and, thus, their full potential is not exploited. It is, therefore, imperative to design multivariate extensions of the existing nonlinear and nonstationary analysis algorithms, as they are expected to give more insight into the dynamics and the interdependence between the multiple channels of the signal at hand. To this end, multivariate extensions of the empirical mode decomposition algorithm and their advantages with regard to multivariate nonstationary data analysis are presented. Some important properties of such extensions are also explored, including their ability to exhibit wavelet-like dyadic filter bank structures for white Gaussian noise (WGN) and their capacity to align similar oscillatory modes from multiple channels. Owing to the generality of the proposed methods, an improved multivariate EMD-based algorithm is introduced which solves some inherent problems in the original EMD algorithm. Finally, to demonstrate the potential of the proposed methods, simulations on real-world signals (wind, inertial motion data, and RGB images) are presented to support the analysis.
Keywords of the presentation: Time series analysis, empirical mode decomposition, trend filtering, interpolation
In this talk, we will address two fundamental problems in time series analysis: the problem of filtering (or extracting) low-frequency trend, and the problem of interpolating missing data. We propose nonparametric techniques to solve these two problems. These techniques are based on the empirical mode decomposition (EMD), and accordingly they are named EMD trend filtering and EMD interpolation. The EMD is an algorithm which decomposes a time series into an additive superposition of oscillatory components. These components are known as the intrinsic mode functions (IMFs) of the time series.
The basic observation behind EMD trend filtering is that higher-order IMFs exhibit slower oscillations. Since low-frequency trend is comprised of slow oscillations relative to the residual time series, in many situations it should be captured by one or more of the higher-order IMFs. It remains to answer the question "How many higher-order IMFs are needed?" We propose a method to answer this question automatically. This method is based on empirical evidence, which indicates that certain changes in the IMFs' energies and zero crossing numbers demarcate the trend and residual time series. To illustrate the performance of EMD trend filtering, we apply it to artificial time series containing different types of trend, as well as several real-world time series.
The latter group includes Standard & Poor's 500 index data, environmental data, sunspot numbers, and data gathered from an urban bicycle rental system.
On the other hand, EMD interpolation is based on the following basic observation: If a time series has missing data, then the IMFs of the time series have missing data as well. However, interpolating the missing data of each IMF individually should be easier than interpolating the missing data of the original time series. This is because each IMF varies much more slowly than the original time series, and also because the IMFs have regularity properties which can be exploited. The performance of EMD interpolation is illustrated by its application to artificial time series, as well as speech data and pollutant data.
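The per-component idea behind EMD interpolation can be sketched as follows, assuming the IMFs are already in hand. Here two synthetic "IMF-like" sinusoids stand in for them, and a cubic spline through the observed samples is an illustrative choice of interpolant for each slowly varying component:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fill_gap(component, missing):
    """Interpolate the missing samples of one slowly varying
    component with a cubic spline through the observed samples."""
    t = np.arange(len(component))
    obs = ~missing
    return CubicSpline(t[obs], component[obs])(t)

# Two synthetic IMF-like components with a common gap of 40 samples
t = np.arange(1000)
imfs = [np.sin(2 * np.pi * t / 200), 0.5 * np.sin(2 * np.pi * t / 900)]
missing = np.zeros(1000, dtype=bool)
missing[480:520] = True
# Interpolate each component separately, then recombine
filled = sum(fill_gap(c, missing) for c in imfs)
truth = sum(imfs)
```

Because each component oscillates slowly relative to the gap, the spline recovers the missing stretch accurately, which is exactly the regularity the talk exploits.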
Keywords of the presentation: nonstationary process, harmonizable process
Most analysis methods for nonstationary processes are developed using the local Fourier transform of the process. Such methods have the theoretical underpinning developed for a number of (overlapping) classes of processes, such as oscillatory processes (Priestley (1965)) and locally stationary processes (Dahlhaus (1997), Silverman (1957), and Grenier (1983)). These processes have strongly related inference mechanisms that naturally tie in with the model specification. Unfortunately, all of these methods rely on implicit strong smoothing of the data, removing much of the observed bandwidth of the process.
The class of harmonizable processes is considerably larger than that of locally stationary processes. The representation of a harmonizable process in terms of the Loève spectrum does not naturally suggest any given inference procedure. We shall discuss possible subsets of harmonizable processes for which inference is possible, and discuss the natural specification of such inference methods. We shall also treat practical examples in neuroscience and oceanography, showing how viewing a process as harmonizable may yield important insights into the data.
This is joint work with Hernando Ombao and Jonathan Lilly, sponsored by the EPSRC.
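To make the inference question concrete: the Loève spectrum is the covariance of the spectral increments, and given repeated realizations it can be estimated by averaging outer products of discrete Fourier transforms. For a stationary process the mass concentrates on the diagonal, which gives one diagnostic for departures from stationarity. This is a sketch under the strong (and often unrealistic) assumption that independent realizations are available:

```python
import numpy as np

def loeve_spectrum(realizations):
    """Estimate the Loeve spectrum E[X(f1) X(f2)*] by averaging
    outer products of DFTs over independent realizations."""
    S = np.zeros((realizations.shape[1],) * 2, dtype=complex)
    for x in realizations:
        X = np.fft.fft(x)
        S += np.outer(X, X.conj())
    return S / len(realizations)

# White noise is stationary, so its estimated Loeve spectrum
# should be (nearly) diagonal
rng = np.random.default_rng(3)
R = rng.standard_normal((500, 64))
S = loeve_spectrum(R)
```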
Keywords of the presentation: Time series analysis, Empirical Mode Decomposition
In recent years, we have developed a framework to study fluctuating signals generated by complex systems, specifically biological systems. We have demonstrated that it is possible to gain significant understanding of a complex biological system by studying its spontaneous fluctuations in time. This framework has tremendous utility for biomedical problems. However, a major technical challenge is that the fluctuating time series generated by biological systems are often nonlinear and nonstationary, and thus require novel analysis techniques that can handle nonstationary trends and quantify instantaneous frequencies.
The Fourth Assessment Report of the Intergovernmental Panel on Climate Change reported that “warming of the climate system is unequivocal” and also that it is “very likely” (probability greater than ninety percent) that “most of the observed warming is due to the observed increase in anthropogenic greenhouse gas concentrations.” The choice of wording implies not only the existence of a statistically significant trend in temperature averages, but also that it is possible to distinguish between trends due to greenhouse gases and those due to other causes, including natural variation. In this talk, I shall describe some of the statistical methods that have been used to justify such statements. Some key points include determining the statistical significance of trends in time series subject to various kinds of autocorrelation assumptions, comparisons between trends in observed data and in climate models, and extensions from temperature averages to other forms of meteorological data, such as extreme precipitation or counts of tropical cyclones, where the statistical conclusions are not so clear-cut.
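One of the simplest versions of testing trend significance under autocorrelation is an OLS trend test whose standard error is inflated by an AR(1) effective-sample-size factor. The sketch below is illustrative only (series length, trend size, and AR coefficient are arbitrary), not a reconstruction of any specific IPCC analysis:

```python
import numpy as np

def trend_test(y):
    """OLS trend slope with a standard error inflated for lag-1
    autocorrelation of the residuals; returns the slope and an
    approximate z-score."""
    n = len(y)
    t = np.arange(n, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (slope * t + intercept)
    rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    rho = max(min(rho, 0.99), -0.99)
    # Slope variance under iid errors, inflated by the standard
    # AR(1) effective-sample-size factor (1 + rho) / (1 - rho)
    s2 = resid.var(ddof=2) / np.sum((t - t.mean()) ** 2)
    s2 *= (1 + rho) / (1 - rho)
    return slope, slope / np.sqrt(s2)

# Warming-like series: linear trend plus AR(1) "natural variability"
rng = np.random.default_rng(4)
n = 200
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.5 * noise[i - 1] + rng.standard_normal()
y = 0.05 * np.arange(n) + noise
slope, z = trend_test(y)
```

Ignoring the inflation factor would overstate significance whenever the residuals are positively autocorrelated, which is the central point of the autocorrelation assumptions discussed in the talk.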
The purpose of this tutorial is to describe the intellectual apparatus that supports some modern techniques in statistics, machine learning, signal processing, and related areas. The main ingredient is the observation that many types of data admit parsimonious representations, i.e., there are far fewer degrees of freedom in the data than the ambient dimension would suggest. The second ingredient is a collection of tractable algorithms that can effectively search for a parsimonious solution to a data analysis problem, even though these types of constraints tend to be nonconvex. Together, the theory of sparsity and sparse regularization can be viewed as a framework for treating a huge variety of computational problems in data analysis. We conclude with some applications where these two ideas play a dominant role.
Several recent works have suggested that dynamic MRI series reconstructions can be significantly improved by promoting low-rank (LR) structure in the estimated image series when it is reshaped into Casorati form (e.g., for a 2D acquisition, an NxNxT series becomes an N^2xT matrix). When T << N^2, the rank of the (reshaped) true underlying image may actually be not much less than T. For such cases, aggressive rank reduction will result in temporal/parametric blurring, while only modest rank reduction will fail to remove noise and/or undersampling artifacts. In this work, we propose that a restriction to spatially localized operations can potentially overcome some of the challenges faced by global LR-promoting methods when the row and column dimensions of the Casorati matrix differ significantly. This generalization of the LR-promoting image series reconstruction paradigm, which we call Locally Low Rank (LLR) image recovery, spatially decomposes an image series estimate into a (redundant) collection of overlapping blocks and promotes that each block, when put into Casorati form, be independently LR. As demonstrated for dynamic cardiac MRI, LLR-based image reconstruction can simultaneously provide improvements in noise reduction and spatiotemporal resolution relative to global LR-based methods, and can be practically realized using efficient and highly parallelizable computational strategies.
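The core operation can be sketched as block-wise SVD truncation of local Casorati matrices. For brevity this sketch uses non-overlapping blocks (the method described above uses an overlapping, redundant collection) and applies it as a denoiser rather than inside an iterative reconstruction:

```python
import numpy as np

def llr_denoise(series, block, rank):
    """Locally-low-rank sketch: split an NxNxT image series into
    spatial blocks, reshape each into Casorati form (pixels x time),
    and truncate its SVD to the target rank."""
    N, _, T = series.shape
    out = np.zeros_like(series)
    for i in range(0, N, block):
        for j in range(0, N, block):
            patch = series[i:i + block, j:j + block, :]
            C = patch.reshape(-1, T)          # local Casorati matrix
            U, s, Vt = np.linalg.svd(C, full_matrices=False)
            s[rank:] = 0.0                    # keep only `rank` temporal modes
            out[i:i + block, j:j + block, :] = (U @ np.diag(s) @ Vt).reshape(patch.shape)
    return out

# Synthetic series: each 8x8 block evolves with 2 temporal modes + noise
rng = np.random.default_rng(6)
N, T, r = 32, 40, 2
clean = np.zeros((N, N, T))
for i in range(0, N, 8):
    for j in range(0, N, 8):
        clean[i:i + 8, j:j + 8, :] = (rng.standard_normal((64, r)) @
                                      rng.standard_normal((r, T))).reshape(8, 8, T)
noisy = clean + 0.3 * rng.standard_normal((N, N, T))
denoised = llr_denoise(noisy, block=8, rank=2)
```

Because each local Casorati matrix is small, the per-block SVDs are independent and cheap, which is what makes the approach highly parallelizable.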
Keywords of the presentation: trend, detrending, nonlinear nonstationary time series, empirical mode decomposition
Determining trend and implementing detrending operations are important steps in data analysis. Traditionally, various extrinsic methods have been used to determine the trend, and to facilitate a detrending operation. In this talk, a simple and logical definition of trend is given for any nonlinear and non-stationary time series as an intrinsically determined monotonic function within a certain temporal span (most often that of the data span), or a function in which there can be at most one extremum within that temporal span. Being intrinsic, the method to derive the trend has to be adaptive. This definition of trend also presumes the existence of a natural timescale. All these requirements suggest the Empirical Mode Decomposition method (EMD) as the logical choice of algorithm for extracting various trends from a data set. Once the trend is determined, the corresponding detrending operation can be implemented. With this definition of trend, the variability of the data on various timescales can also be derived naturally. Climate data are used to illustrate the determination of the intrinsic trend and natural variability.
Computations involving oscillatory kernels arise in many computational problems associated with high frequency wave phenomena. In this talk, we will discuss recent progress on developing fast linear complexity algorithms for several problems of this type. Two common ingredients of these algorithms are discovering new structures with low-rank property and developing new hierarchical decompositions based on these structures. Examples will include N-body problems of the Helmholtz kernel, sparse Fourier transforms, Fourier integral operators, and fast Helmholtz solvers.
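As a small illustration of the low-rank property such algorithms exploit, the interaction block of a Helmholtz-type kernel between two well-separated 1D clusters is numerically low-rank even though it is dense. (In 1D the oscillatory factor exp(ik|x-y|) separates exactly, so most of the numerical rank here comes from the smooth 1/|x-y| factor; the kernel, wavenumber, and geometry are illustrative choices only.)

```python
import numpy as np

def numerical_rank(M, tol=1e-8):
    """Number of singular values above tol relative to the largest."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# Helmholtz-type kernel exp(ik|x - y|) / |x - y| between two
# well-separated point clusters on the line
k = 20.0
x = np.linspace(0.0, 1.0, 100)      # source cluster
y = np.linspace(5.0, 6.0, 100)      # well-separated target cluster
d = np.abs(x[:, None] - y[None, :])
G = np.exp(1j * k * d) / d
```

The 100x100 block compresses to a rank far below its size, which is the structure that hierarchical decompositions recursively exploit to reach linear complexity.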