| Nii Attoh-Okine (University of Delaware) |
Toward a Real-Time
Implementation of Adaptive and Automated Digital Image Analysis |
Abstract: Empirical Mode Decomposition is a multi-resolution data analysis technique that can break down a signal or image into different time-frequency modes which uniquely reflect the variations in the signal or image. The algorithm has gained much attention lately due to its performance in a number of applications (especially in climate and biomedical data analysis).
Recently, civil infrastructure managers have begun exploring the potential application of the algorithm to automate the process of detecting cracks in infrastructure images. Unfortunately, the adaptive nature of the algorithm increases its computational cost to an extent that limits its wide practical application.
The approach involves four main steps: extrema detection, interpolation, sifting, and reconstruction. Extrema detection and interpolation consume about 70% of the computational time. Hence, we focus on ways to implement these procedures in parallel by taking advantage of the Matlab Parallel Computing Toolbox. |
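For orientation, here is a minimal serial sketch of the two steps the abstract identifies as dominating the cost, extrema detection and envelope interpolation. The sketch is in Python (the talk itself targets the Matlab Parallel Computing Toolbox), and all names are illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def envelopes(x):
    """Locate local extrema of a 1-D signal and interpolate upper/lower
    envelopes with cubic splines (the two steps dominating EMD's cost)."""
    t = np.arange(len(x))
    maxima = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
    minima = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
    upper = CubicSpline(t[maxima], x[maxima])(t)
    lower = CubicSpline(t[minima], x[minima])(t)
    return upper, lower

# One sifting pass subtracts the mean envelope from the signal.
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 20, 500)) + 0.3 * rng.standard_normal(500)
upper, lower = envelopes(x)
candidate_imf = x - 0.5 * (upper + lower)
```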
| Francis Bach (Institut National de Recherche en Informatique Automatique (INRIA)) |
Tutorial - Structured sparsity-inducing norms through submodular functions |
| Abstract: Sparse methods for supervised learning aim at finding good
linear predictors from as few variables as possible, i.e., with small
cardinality of their supports. This combinatorial selection problem is
often turned into a convex optimization problem by replacing the
cardinality function by its convex envelope (tightest convex lower bound),
in this case the L1-norm. In this work, we investigate set-functions more general than the cardinality that may incorporate prior knowledge
or structural constraints which are common in many applications:
namely, we show that for a nondecreasing submodular set-function, the
corresponding convex envelope can be obtained from its Lovasz extension,
a common tool in submodular analysis. This defines a family of polyhedral
norms, for which we provide generic algorithmic tools (subgradients
and proximal operators) and theoretical results (conditions for support
recovery or high-dimensional inference). By selecting specific submodular
functions, we can give a new interpretation to known norms, such as those
based on rank-statistics or grouped norms with potentially overlapping
groups; we also define new norms, in particular ones that can be used
as non-factorial priors for supervised learning. |
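The construction above runs through the Lovász extension. A minimal sketch of its standard sorting formula, checking that the cardinality function recovers the ℓ1 norm on nonnegative vectors (function names illustrative):

```python
import numpy as np

def lovasz_extension(F, w):
    """Lovasz extension of a set-function F (called on Python sets),
    evaluated at w >= 0, via the standard sorting/greedy formula."""
    order = np.argsort(-w)                 # coordinates in decreasing order
    value, prev = 0.0, set()
    for k in order:
        cur = prev | {int(k)}
        value += w[k] * (F(cur) - F(prev)) # marginal gain weighted by w
        prev = cur
    return value

card = lambda A: float(len(A))             # nondecreasing submodular: cardinality
w = np.array([0.5, 2.0, 0.0, 1.2])
print(lovasz_extension(card, w), np.sum(np.abs(w)))   # both equal 3.7
```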
| Florentina Bunea (Cornell University) |
Simultaneous variable and rank selection for optimal estimation
of high dimensional matrices |
| Abstract: Modeling high dimensional data has become a ubiquitous task,
and reducing the dimensionality a typical solution. This talk is devoted
to optimal dimension reduction in sparse multivariate response regression models in which both the number of responses and that of the predictors may exceed the sample size. Sometimes viewed as complementary, predictor selection and rank reduction are the most popular strategies for obtaining lower dimensional approximations of the parameter matrix in such models. Neither of them alone is tailored to simultaneous selection and rank reduction; therefore, neither can be minimax rate optimal for low-rank models corresponding to
just a few of the total number of available predictors. No estimators have, to date, been proved to have this property. The work presented here attempts to bridge this gap. We point out that, somewhat surprisingly, a procedure consisting of first selecting predictors and then reducing the rank does not always yield estimates that are minimax adaptive. We show that this can be remedied by performing joint rank and predictor selection. The methods we propose are based on penalized least squares, with new penalties that are designed with the appropriate notions of matrix sparsity in mind. Of special importance is the fact that these penalties are rather robust to data adaptive choices of the tuning parameters, making them particularly appealing in practice. Our results can be immediately applied to standard multivariate analyses such as sparse PCA or CCA, as particular cases, or can be easily extended to inference in functional data. We support our theoretical results with an extensive simulation study and offer a concrete data example. |
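For concreteness, here is a hedged sketch of the naive two-step pipeline the abstract contrasts with (row-sparse predictor selection followed by rank truncation of the restricted least-squares fit); this is not the authors' joint penalty, and the use of scikit-learn's MultiTaskLasso for the selection step is an illustrative choice.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

def select_then_reduce(X, Y, alpha=0.1, rank=2):
    """Two-step baseline: (1) row-sparse predictor selection via a
    multi-task (group) lasso, (2) rank truncation of the least-squares
    fit restricted to the selected predictors."""
    sel = np.any(MultiTaskLasso(alpha=alpha).fit(X, Y).coef_ != 0, axis=0)
    B = np.zeros((X.shape[1], Y.shape[1]))
    if sel.any():
        B_sel, *_ = np.linalg.lstsq(X[:, sel], Y, rcond=None)
        U, s, Vt = np.linalg.svd(B_sel, full_matrices=False)
        s[rank:] = 0.0                     # keep only the leading 'rank' factors
        B[sel] = (U * s) @ Vt
    return B
```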
| Kamalika Chaudhuri (University of California, San Diego) |
Spectral Methods for Learning Multivariate Latent Tree Structure |
Abstract: This talk considers the problem of learning the structure of a broad
class of multivariate latent variable tree models, which include a
variety of continuous and discrete models (including the widely used
linear-Gaussian models, hidden Markov models, and Markov
evolutionary trees). The setting is one where we only have samples
from certain observed variables in the tree and our goal is to
estimate the tree structure (i.e., the graph of how the underlying
hidden variables are connected to the observed variables). We
provide the Spectral Recursive Grouping algorithm, an efficient and
simple bottom-up procedure for recovering the tree structure from
independent samples of the observed variables. Our finite sample
size bounds for exact recovery of the tree structure elucidate
certain natural dependencies on statistical and
structural properties of the underlying joint
distribution. Furthermore, our sample complexity guarantees have no
explicit dependence on the dimensionality of the observed variables,
making the algorithm applicable to many high-dimensional settings.
At the heart of our algorithm is a spectral quartet test for
determining the relative topology of a quartet of variables, which
only utilizes certain second order statistics and is based
on the determinants of certain cross-covariance matrices.
This talk is based on joint work with A. Anandkumar, D. Hsu, S. Kakade, L. Song and T. Zhang. |
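For scalar observed variables the quartet test reduces to comparing products of absolute correlations across the three possible pairings; the talk's spectral version generalizes this to vector-valued variables via determinants/singular values of cross-covariance matrices. A hedged sketch of the scalar case:

```python
import numpy as np

def quartet_split(samples):
    """Guess the topology of a quartet {0,1,2,3} from i.i.d. samples
    (columns = the four observed scalar variables).  For tree-structured
    correlations, the true split maximizes the product of absolute
    correlations within its pairs."""
    R = np.abs(np.corrcoef(samples, rowvar=False))
    splits = {((0, 1), (2, 3)): R[0, 1] * R[2, 3],
              ((0, 2), (1, 3)): R[0, 2] * R[1, 3],
              ((0, 3), (1, 2)): R[0, 3] * R[1, 2]}
    return max(splits, key=splits.get)
```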
| Rainer Dahlhaus (Ruprecht-Karls-Universität Heidelberg) |
Survey/Tutorial lecture - Locally stationary processes |
Abstract: Locally stationary processes are models for nonstationary time series whose behaviour can locally be approximated by a stationary process. In this situation the classical characteristics of the process, such as the covariance function at some lag k, the spectral density at some frequency lambda, or, e.g., the parameters of an AR(p) process, are curves which change slowly over time. The theory of locally stationary processes allows for a rigorous asymptotic treatment of various inference problems for such processes. Although technically more difficult, many problems are related to classical curve estimation problems.
We give an overview of different methods of nonparametric curve estimation for locally stationary processes. We discuss stationary methods on segments, wavelet expansions, local likelihood methods, and nonparametric maximum likelihood estimates.
Furthermore, we discuss the estimation of instantaneous frequencies for processes with a nonlinear phase. |
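A minimal sketch of the simplest idea mentioned above ("stationary methods on segments"): simulate a time-varying AR(1) and recover the slowly varying coefficient curve by ordinary least squares on sliding windows; all names and window sizes are illustrative.

```python
import numpy as np

T = 4000
a = 0.9 * np.sin(np.pi * np.arange(T) / T)   # slowly varying AR(1) coefficient a(t/T)
x = np.zeros(T)
rng = np.random.default_rng(1)
for t in range(1, T):
    x[t] = a[t] * x[t - 1] + rng.standard_normal()

def local_ar1(x, width=400, step=100):
    """Estimate a(t/T) by least squares on overlapping segments,
    treating the process as stationary within each window."""
    centers, estimates = [], []
    for start in range(0, len(x) - width, step):
        seg = x[start:start + width]
        estimates.append(np.dot(seg[1:], seg[:-1]) / np.dot(seg[:-1], seg[:-1]))
        centers.append(start + width // 2)
    return np.array(centers), np.array(estimates)

centers, a_hat = local_ar1(x)                # a_hat tracks the curve a(t/T)
```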
| Richard A. Davis (Columbia University) |
Detection of Structural Breaks and Outliers in Time Series
|
Abstract: We will consider the problem of modeling a class of non-stationary time series with outliers using piecewise autoregressive (AR) processes. The number and locations of the piecewise autoregressive
segments, as well as the orders of the respective AR processes, are assumed to be unknown and each piece may be contaminated with an unknown number of innovational and/or additive outliers. The minimum description length principle is applied to compare various segmented AR fits to the data. The goal is to find the “best” combination of the number of segments, the lengths of the segments, the orders of the piecewise AR processes, and the number and type of outliers. Such a “best” combination is implicitly defined as the optimizer of an MDL criterion. Since the optimization is carried out over a large number of configurations of segments and positions of outliers, a genetic algorithm is used to find optimal or near-optimal solutions.
Strategies for accelerating the procedure will also be described. Numerical results from simulation experiments and real data analyses show that the procedure enjoys excellent empirical properties. (This is joint work with Thomas Lee and Gabriel Rodriguez-Yam.) |
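To make the idea concrete, here is a heavily hedged sketch of one plausible MDL-style score for a candidate segmentation (per-segment AR fits plus codelength terms for breakpoints and orders); the exact criterion and the genetic-algorithm search used in the talk may differ, and outlier terms are omitted.

```python
import numpy as np

def ar_fit(x, p):
    """Least-squares AR(p) fit; returns the residual variance."""
    X = np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ coef) ** 2)

def mdl_score(x, breaks, orders):
    """Illustrative MDL-style score for a segmentation of x: per-segment
    Gaussian negative log-likelihood plus penalties for the number of
    breakpoints and the AR orders."""
    edges = [0, *breaks, len(x)]
    score = np.log(len(x)) * len(breaks)          # cost of encoding breakpoints
    for (a, b), p in zip(zip(edges[:-1], edges[1:]), orders):
        seg = x[a:b]
        sigma2 = ar_fit(seg, p)
        score += 0.5 * (p + 2) * np.log(len(seg)) + 0.5 * len(seg) * np.log(sigma2)
    return score
```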
| Bjorn Engquist (University of Texas at Austin) |
Sampling strategies and mesh refinements |
| Abstract: Simulations of multiscale solutions to differential equations often require a reduction in the number of unknowns compared to those in a standard discretization. This is in order to limit the memory requirement and the computational complexity. We will discuss common reduction techniques based on mesh refinements in the light of classical and novel sampling theorems from information theory. |
| Jianqing Fan (Princeton University) |
Vast Volatility Matrix Estimation using High Frequency Financial Data |
| Abstract: A stylized feature of high-frequency financial data is that observations are non-synchronized and of mixed frequencies. This, together with market microstructure noise, poses significant challenges for the estimation of the vast-dimensional volatility matrix. This talk outlines volatility matrix estimation using high-dimensional high-frequency data from the perspective of portfolio selection.
Specifically, we propose the use of "pairwise-refresh time" and
"all-refresh-time" methods for estimating the vast covariance matrix and
compare their merits in portfolio selection. We also establish the
large deviation results of the estimates, which guarantee good properties
of the estimated volatility matrix in vast asset allocation with gross
exposure constraints. Extensive numerical studies are made via carefully
designed simulation studies. Compared with methods based on low-frequency daily data, our methods can capture the most recent trend of the time-varying volatility and correlation, and hence provide more accurate guidance for the portfolio allocation in the next time period. The advantage of using high-frequency data is significant in our simulation and empirical studies, which involve the 30 Dow Jones industrial stocks. |
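A hedged sketch of the pairwise-refresh-time idea for two asynchronously observed assets: a refresh time is recorded each time both assets have produced a new observation since the previous refresh time, and the realized covariance is then computed from previous-tick prices at those times. Variable names are illustrative.

```python
import numpy as np

def pairwise_refresh_times(t1, t2):
    """Refresh times for two assets with sorted observation times t1, t2:
    each refresh time is the first instant by which both assets have a
    new observation since the previous refresh time."""
    times, i, j = [], 0, 0
    while i < len(t1) and j < len(t2):
        tau = max(t1[i], t2[j])
        times.append(tau)
        i = np.searchsorted(t1, tau, side="right")
        j = np.searchsorted(t2, tau, side="right")
    return np.array(times)

def realized_cov(t1, p1, t2, p2):
    """Realized covariance of two log-price series sampled at the
    pairwise refresh times (previous-tick interpolation)."""
    taus = pairwise_refresh_times(t1, t2)
    r1 = np.diff(p1[np.searchsorted(t1, taus, side="right") - 1])
    r2 = np.diff(p2[np.searchsorted(t2, taus, side="right") - 1])
    return np.sum(r1 * r2)
```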
| Alfred O. Hero III (University of Michigan) |
Correlation screening from random matrices: phase transitions and Poisson limits |
| Abstract: Random matrices are measured in many areas of engineering, social science, and natural science. When the rows of the matrix are random samples of a vector of dependent variables, the sample correlations between the columns of the matrix specify a correlation graph that can be used to explore the dependency structure. However, as the number of variables increases, screening for highly correlated variables becomes futile due to a high dimensional phase transition phenomenon: with high probability most variables will have large sample correlations even when there is no correlation present. We will present a general theory for predicting the phase transition and provide Poisson limit theorems that can be used to predict finite sample behavior of the correlation graph. We apply our framework to the problem of hub discovery in correlation and partial correlation (concentration) graphs. We illustrate our correlation screening framework in computational biology and, in particular, for discovery of influential variables in gene regulation networks. This is joint work with Bala Rajaratnam at Stanford University. |
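A small simulation of the phenomenon the abstract describes: with n samples of p independent variables, the fraction of variables whose largest absolute sample correlation exceeds a fixed threshold rises toward one as p grows. The numbers are purely illustrative.

```python
import numpy as np

def screening_fraction(n, p, rho=0.5, seed=0):
    """Fraction of variables whose maximal absolute sample correlation with
    another variable exceeds rho, when all p variables are independent."""
    X = np.random.default_rng(seed).standard_normal((n, p))
    R = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(R, 0.0)
    return np.mean(R.max(axis=0) > rho)

for p in (10, 100, 1000, 5000):
    print(p, screening_fraction(n=20, p=p))   # grows toward 1: spurious "hubs"
```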
| Thomas Yizhao Hou (California Institute of Technology) |
Sparse time-frequency representation of multiscale data by nonlinear optimization |
| Abstract: We introduce a sparse time-frequency analysis method for analyzing nonlinear and non-stationary data. This method is inspired by the Empirical Mode Decomposition method (EMD) and the recently developed compressed sensing theory. The main idea is to look for the sparsest representation of multiscale data within the largest possible dictionary consisting of intrinsic mode functions. We formulate this as a nonlinear optimization problem. Further, we propose an iterative algorithm to solve this nonlinear optimization problem recursively. Numerical examples will be given to demonstrate the robustness of our method and comparison will be made with the EMD method. One advantage of performing such decomposition is to preserve some intrinsic physical properties of the signal, such as trend and instantaneous frequency. Our method provides a mathematical foundation for the EMD method and can be considered to some extent
as a nonlinear version of compressed sensing. |
| Norden Huang (National Central University) |
Instantaneous Frequencies and Trends for Nonstationary Nonlinear Data |
| Abstract: As scientific research becomes increasingly sophisticated, the inadequacy of traditional data analysis methods is becoming glaringly obvious. The only alternative is to break away from these limitations; we should let the data speak for themselves so that the results can reveal the full range of consequences of nonlinearity and nonstationarity. To do so, we need a new paradigm of data analysis methodology, without an a priori basis, that fully accommodates the variations of the underlying driving mechanisms. That is an adaptive data analysis method, which will be introduced in this talk. The emphases will be on the Empirical Mode Decomposition method and its applications in determining the trend and instantaneous frequency, and the implications for quantifying the degree of nonstationarity and nonlinearity. |
| Norden Huang (National Central University), Man-Li Wu (National Aeronautics and Space Administration (NASA)) |
Using 2D-EEMD to understand African Easterly Waves and their role in initiation and development of tropical storms/hurricanes. |
Abstract: We are using the 2D-EEMD to increase our understanding of the relationships between the African Easterly Waves and the initiation and development of tropical storms/hurricanes over the Northern Atlantic Ocean. We are using large-scale parameters including zonal and meridional wind, sea surface temperature, atmospheric stability parameters, ocean heat capacity, relative humidity, low-level vorticity, and vertical wind shear to carry out our studies. We will focus on case studies during July, August, and September of 2005 and 2006.
By Man-Li C. Wu (NASA/GSFC/GMAO), Siegfried D. Schubert (NASA/GSFC/GMAO), and Norden E. Huang (NCU, Taiwan). |
| Wenbo Li (University of Delaware) |
Small Value Probability and Metric Entropy |
| Abstract: Small value probabilities, or small deviations, study the decay rate of the probability that a positive random variable takes values near zero.
In particular, small ball probabilities provide the asymptotic behavior of the probability measure inside a ball as the radius of the
ball tends to zero.
Metric entropy is defined as the logarithm of the minimum number of balls of very small radius needed to cover a compact set.
In this talk, we will provide an overview of the precise connections between the small value probability
of a Gaussian process/measure and the metric entropy of the associated compact operator.
Interplays and applications to many related problems/areas will be given, along with various fundamental tools and techniques from high dimensional probability theory.
Throughout, we use Brownian motion (integral operator) and Brownian
sheets (tensored Brownian motion and operator) as illustrating examples. |
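As a reminder of the flavor of the connection (a standard textbook illustration, stated here under the usual normalizations and not as part of the talk itself), for Brownian motion in sup norm:

```latex
% Brownian motion on [0,1], sup norm; K is the unit ball of its
% Cameron--Martin space.  The exponents match under the Kuelbs--Li
% correspondence between small ball rates and metric entropy.
\[
  -\log \mathbb{P}\Big(\sup_{0\le t\le 1}|W(t)|\le \varepsilon\Big)
  \;\sim\; \frac{\pi^2}{8}\,\varepsilon^{-2},
  \qquad
  \log N\big(\varepsilon, K, \|\cdot\|_\infty\big) \;\asymp\; \varepsilon^{-1},
\]
\[
  \text{and, roughly, } -\log\mathbb{P}(\|X\|\le\varepsilon)\approx\varepsilon^{-\alpha}
  \;\Longleftrightarrow\;
  \log N(\varepsilon, K)\approx \varepsilon^{-\frac{2\alpha}{2+\alpha}} .
\]
```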
| Mauro Maggioni (Duke University) |
Graph-based and multiscale geometric methods for the analysis of data sets in high dimensions |
| Abstract: We discuss several geometric approaches to the study of data sets in high-dimensional spaces that are assumed to have low intrinsic dimension. On the one hand, we discuss diffusion geometry approaches, based on constructing proximity graphs between the data points in high dimensions, and using diffusion processes on such graphs to construct coordinates for the data and perform learning tasks. On the other hand, we discuss novel approaches based on multiscale geometric analysis, based on studying the behavior of local covariance matrices of the data at different scales. We apply this latter approach to intrinsic dimension estimation, multiscale dictionary learning, and density estimation. |
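A hedged sketch of the local-covariance idea: for a reference point and a range of radii, compute the covariance of the neighbors within each radius and watch how the eigenvalues grow with the radius; eigenvalues growing like the squared radius indicate tangent (intrinsic) directions. The specific example and thresholds are illustrative.

```python
import numpy as np

def local_singular_values(X, x0, radii):
    """For each radius r, eigenvalues (descending) of the covariance of the
    points of X within distance r of x0.  Eigenvalues scaling like r^2
    indicate intrinsic directions; the rest are noise or curvature."""
    out = []
    d = np.linalg.norm(X - x0, axis=1)
    for r in radii:
        C = np.cov(X[d <= r], rowvar=False)
        out.append(np.sort(np.linalg.eigvalsh(C))[::-1])
    return out

# Noisy 1-D manifold (a circle) embedded in 10 dimensions.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 2000)
X = np.zeros((2000, 10))
X[:, 0], X[:, 1] = np.cos(theta), np.sin(theta)
X += 0.01 * rng.standard_normal(X.shape)
print(local_singular_values(X, X[0], radii=[0.1, 0.2, 0.4])[0][:3])
```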
| Azadeh Moghtaderi (Queen's University) |
Applications of the Empirical Mode Decomposition to Trend Filtering and Gap Filling |
Abstract: In this talk, we will address two fundamental problems in time series analysis: The problem of filtering (or extracting) low-frequency trend, and the problem of interpolating missing data. We propose nonparametric techniques to solve these two problems. These techniques are based on the empirical mode decomposition (EMD), and accordingly they are named EMD trend filtering and EMD
interpolation. The EMD is an algorithm which decomposes a time series into an additive superposition of oscillatory components. These components are known as the intrinsic mode functions (IMFs) of the time series.
The basic observation behind EMD trend filtering is that higher-order IMFs exhibit slower oscillations. Since low-frequency trend is comprised of slow oscillations relative to the residual time series, in many situations it should be captured by one or more of the higher-order IMFs. It remains to answer the question "How many higher-order IMFs are needed?" We propose a method to answer this question automatically. This method is based on empirical evidence, which indicates that certain changes in the IMFs' energies and zero crossing numbers demarcate the trend and residual time series. To illustrate the performance of EMD trend filtering, we apply it to artificial time series containing different types of trend, as well as several real-world time series.
The latter group includes Standard & Poor's 500 index data, environmental data, sunspot numbers, and data gathered from an urban bicycle rental system.
On the other hand, EMD interpolation is based on the following basic observation: If a time series has missing data, then the IMFs of the time series have missing data as well. However, interpolating the missing data of each IMF individually should be easier than interpolating the missing data of the original time series. This is because each IMF varies much more slowly than the original time series, and also because the IMFs have regularity properties which can be exploited. The performance of EMD interpolation is illustrated by its application to artificial time series, as well as speech data and pollutant data. |
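A hedged sketch of the bookkeeping behind EMD trend filtering, assuming the IMFs have already been computed (for instance with the PyEMD package; any EMD implementation would do). The selection rule shown here, the largest jump in log-energy across IMF order, is illustrative and not necessarily the authors' automatic rule, which uses changes in both energies and zero-crossing numbers.

```python
import numpy as np

def imf_statistics(imfs):
    """Per-IMF energy and zero-crossing counts; imfs has shape (n_imfs, n)."""
    energy = np.sum(imfs ** 2, axis=1)
    signs = np.signbit(imfs).astype(int)
    zero_crossings = np.sum(np.abs(np.diff(signs, axis=1)), axis=1)
    return energy, zero_crossings

def emd_trend(imfs):
    """Illustrative trend rule: take the largest jump in log-energy between
    consecutive IMFs as the trend/residual boundary and sum all
    higher-order IMFs beyond it."""
    energy, _ = imf_statistics(imfs)
    k = int(np.argmax(np.abs(np.diff(np.log(energy + 1e-12))))) + 1
    return imfs[k:].sum(axis=0)
```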
| Robert Nowak (University of Wisconsin-Madison) |
Sequential Analysis in High Dimensional Multiple Testing and Sparse Recovery |
| Abstract: This talk considers the problem of high-dimensional multiple testing and sparse recovery from the perspective of sequential analysis. For example, consider testing to decide which of n>1 genes are differentially expressed in a certain disease. Suppose each test takes the form H0: X ~ N(0,1) vs. H1: X ~ N(m,1), where N(m,1) is the Gaussian density with mean m and variance 1, and H1 represents the case of differential expression. The signal-to-noise ratio (SNR = m^2) must be sufficiently large in order to guarantee a small probability of error. Non-sequential methods, which use an equal number of samples per test, require that the SNR grow like log(n). This is simply because the squared magnitude of the largest of n independent N(0,1) noises is on the order of log(n). Non-sequential methods cannot overcome this curse of dimensionality. Sequential testing, however, is capable of breaking the curse by adaptively focusing experimental resources on certain components at the expense of others. I will discuss a simple sequential method, in the spirit of classic sequential probability ratio testing, that only requires the SNR to grow like log(s), where s is the number of tests in which the data are distributed according to H1. In applications like gene testing, s is much smaller than n and so the gains can be quite significant. For example, if s = log n, then the sequential method is reliable if the SNR is O(log log n), compared to the O(log n) requirement of non-sequential approaches. The distinction is even more dramatic when the tests involve certain one-sided distributions which arise, for example, in cognitive radio spectrum sensing. In this situation the gains can be doubly exponential in n. |
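A minimal sketch of a sequential probability ratio test for a single component, H0: N(0,1) vs. H1: N(m,1); the multiple-testing procedure in the talk runs many such tests and allocates samples adaptively. Error levels and names are illustrative.

```python
import numpy as np

def sprt(sample, m, alpha=0.05, beta=0.05, max_steps=1000):
    """Wald's sequential probability ratio test of N(0,1) vs N(m,1).
    `sample()` draws one new observation; returns (decision, n_used)."""
    upper, lower = np.log((1 - beta) / alpha), np.log(beta / (1 - alpha))
    llr = 0.0
    for n in range(1, max_steps + 1):
        x = sample()
        llr += m * x - 0.5 * m ** 2       # log N(x; m,1) - log N(x; 0,1)
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", max_steps

rng = np.random.default_rng(0)
print(sprt(lambda: rng.normal(0.5, 1.0), m=0.5))   # typically decides H1 after few samples
```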
| Sofia C Olhede (University College London) |
Inference for Harmonizable Processes |
Abstract: Most analysis methods for nonstationary processes are developed from the local Fourier transform of the process. Such methods have the theoretical underpinning developed for a number of (overlapping) classes of processes, such as oscillatory processes (Priestley (1965)) and locally stationary processes (Dahlhaus (1997), Silverman (1957) and Grenier (1983)). These processes have strongly related inference mechanisms that naturally tie in with the model specification. Unfortunately, all of these methods rely on implicit strong smoothing of the data, removing much of the observed bandwidth of the process.
The class of Harmonizable processes is considerably larger than that of locally stationary processes. The representation of a harmonizable process in terms of the Loeve spectrum does not naturally suggest any given inference procedure. We shall discuss possible subsets of harmonizable processes for which inference is possible, and discuss natural specification of such inference methods. We shall also treat practical examples in neuroscience and oceanography, showing how viewing a process as harmonizable may yield important insights into the data.
This is joint work with Hernando Ombao and Jonathan Lilly, sponsored by the EPSRC. |
| Chung-Kang Peng (Harvard Medical School) |
Instantaneous Frequencies and Trends in Biomedical Applications |
| Abstract: In recent years, we have developed a framework to study fluctuating signals generated by complex systems, specifically biological systems. We have demonstrated that it is possible to gain significant understanding of a complex biological system by studying its spontaneous fluctuations in time. This framework has tremendous utility for biomedical problems. However, a major technical challenge is that the fluctuating time series generated by biological systems are often nonlinear and nonstationary, and thus require novel analysis techniques that can handle nonstationary trends and quantify instantaneous frequencies. |
| Luis Rademacher (Ohio State University) |
Randomized algorithms for the approximation of matrices |
| Abstract: I will discuss recent algorithmic developments for the classical problem of approximating a given matrix by a low-rank matrix. This is motivated by the need for faster algorithms for very large data and by certain applications that want the approximating matrix to have rows living in the span of only a few rows of the original matrix, which adds a combinatorial twist to the problem. The novel algorithms are based on sampling rows randomly (but non-uniformly) and on random projection, from which a low rank approximation can be computed. |
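A minimal sketch of the random-projection route to a low-rank approximation (sketch the range with a Gaussian test matrix, orthonormalize, project); the row-sampling variants mentioned above instead sample rows with non-uniform probabilities. Parameters are illustrative.

```python
import numpy as np

def randomized_low_rank(A, k, oversample=10, seed=0):
    """Rank-k approximation via random projection: sample the range of A
    with a Gaussian test matrix, orthonormalize, and project A onto it."""
    rng = np.random.default_rng(seed)
    Y = A @ rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(Y)                 # orthonormal basis for a sketch of range(A)
    B = Q.T @ A                            # small (k + oversample) x n matrix
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U[:, :k]) @ np.diag(s[:k]) @ Vt[:k]
```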
| Jack W Silverstein (North Carolina State University) |
Estimating Population Eigenvalues From Large Dimensional Sample Covariance Matrices |
| Abstract: I will begin by reviewing limiting properties of the eigenvalues of a class of sample covariance matrices, where the vector dimension and the sample size
approach infinity, their ratio approaching a positive constant. These properties are relevant in situations in multivariate analysis where the vector dimension is large, but the number of samples needed to adequately approximate the population matrix (as prescribed in standard statistical procedures) cannot be attained. Work has been done in estimating the population eigenvalues from those of the sample covariance matrix. I will introduce a method devised by X. Mestre, and will present an extension of his method to another ensemble of random matrices important in wireless communications. |
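A small simulation of the starting point of the talk: when the dimension and the sample size are comparable, the sample covariance eigenvalues spread far from the population values (here all equal to one), so naive plug-in estimation of the population eigenvalues fails. Numbers are illustrative.

```python
import numpy as np

def sample_eigs(p, n, seed=0):
    """Eigenvalues of the sample covariance of n i.i.d. N(0, I_p) vectors."""
    X = np.random.default_rng(seed).standard_normal((n, p))
    return np.linalg.eigvalsh(X.T @ X / n)

eigs = sample_eigs(p=200, n=400)   # population eigenvalues are all 1
print(eigs.min(), eigs.max())      # spread roughly over [(1-sqrt(0.5))^2, (1+sqrt(0.5))^2]
```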
| Richard L. Smith (University of North Carolina) |
Trends in Climatic Data |
| Abstract: The Fourth Assessment Report of the Intergovernmental Panel on Climate Change reported that “warming of the climate system is unequivocal” and also that it is “very likely” (probability greater than ninety percent) that “most of the observed warming is due to the observed increase in anthropogenic greenhouse gas concentrations.” The choice of wording implies not only the existence of a statistically significant trend in temperature averages, but also that it is possible to distinguish between trends due to greenhouse gases and those due to other causes, including natural variation. In this talk, I shall describe some of the statistical methods that have been used to justify such statements. Some key points include determining the statistical significance of trends in time series subject to various kinds of autocorrelation assumptions, comparisons between trends in observed data and in climate models, and extensions from temperature averages to other forms of meteorological data, such as extreme precipitation or counts of tropical cyclones, where the statistical conclusions are not so clear-cut. |
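As one textbook illustration of the kind of autocorrelation adjustment at issue (not the specific methods of the talk): fit an ordinary least-squares trend, estimate the lag-1 autocorrelation of the residuals, and inflate the trend's standard error through an effective-sample-size correction.

```python
import numpy as np

def trend_with_ar1_adjustment(y):
    """OLS trend per time step, with the standard error inflated for lag-1
    autocorrelated residuals via an effective-sample-size correction."""
    t = np.arange(len(y), dtype=float)
    X = np.column_stack([np.ones_like(t), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    n_eff = len(y) * (1 - rho) / (1 + rho)           # effective sample size
    sigma2 = np.sum(resid ** 2) / (n_eff - 2)
    se = np.sqrt(sigma2 / np.sum((t - t.mean()) ** 2))
    return beta[1], se                               # trend estimate and adjusted s.e.
```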
| Stanislaw Szarek (Case Western Reserve University) |
Phase transitions for high-dimensional random quantum states |
| Abstract: We study generic properties of high-dimensional quantum states.
Specifically, for a random state on H=C^d otimes C^d obtained by partial tracing a random pure state on H otimes C^s, we consider the problem whether it is typically separable or typically entangled. We show that a threshold occurs when the ancilla dimension s is of order roughly d^3.
Our approach allows us to analyze other properties in a similar way, such as
positive partial transpose (PPT). Mathematically, each problem reduces to studying properties (albeit somewhat exotic) of high-dimensional complex Wishart matrices. The arguments rely on high-dimensional probability, classical convexity, random matrices, and geometry of Banach spaces.
Based on joint work with G. Aubrun and D. Ye. |
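A hedged numerical sketch of the setup: draw a random pure state on C^d ⊗ C^d ⊗ C^s, partial-trace the ancilla to obtain a random induced state on C^d ⊗ C^d, and check the PPT criterion. The dimensions below are illustrative and far below the d^3 regime of the theorem.

```python
import numpy as np

def random_induced_state(d, s, seed=0):
    """Partial trace over a dimension-s ancilla of a Haar-random pure state
    on C^d (x) C^d (x) C^s, giving a random mixed state on C^d (x) C^d."""
    rng = np.random.default_rng(seed)
    psi = rng.standard_normal((d * d, s)) + 1j * rng.standard_normal((d * d, s))
    psi /= np.linalg.norm(psi)
    return psi @ psi.conj().T                  # (d^2 x d^2) density matrix

def is_ppt(rho, d):
    """PPT criterion: the partial transpose on the second d-dimensional
    factor must remain positive semidefinite."""
    R = rho.reshape(d, d, d, d)                # indices (i, k, j, l)
    rho_pt = R.transpose(0, 3, 2, 1).reshape(d * d, d * d)
    return np.linalg.eigvalsh(rho_pt).min() >= -1e-10

rho = random_induced_state(d=3, s=20)
print(is_ppt(rho, d=3))
```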
| Vladimir Temlyakov (University of South Carolina) |
Greedy approximation in compressed sensing |
| Abstract: While the ℓ1 minimization technique plays an important role in designing computationally tractable recovery methods in compressed sensing, its complexity is still impractical for many applications. An attractive alternative to the ℓ1 minimization is a family of greedy algorithms. We will discuss several greedy algorithms from the point of view of their practical applicability and theoretical performance. |
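A minimal sketch of one member of the greedy family discussed, orthogonal matching pursuit, with a fixed sparsity level as the stopping rule for simplicity:

```python
import numpy as np

def omp(A, y, sparsity):
    """Orthogonal matching pursuit: greedily pick the column most correlated
    with the residual, then re-fit by least squares on the chosen support."""
    support, residual = [], y.copy()
    x = np.zeros(A.shape[1])
    for _ in range(sparsity):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x
```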
| Joel Tropp (California Institute of Technology) |
Tutorial - User-friendly tail bounds for sums of random matrices
| Abstract: We introduce a new methodology for studying the maximum eigenvalue of a sum of independent, symmetric random matrices. This approach results in a complete set of extensions to the classical tail bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Freedman, Hoeffding, and McDiarmid. Results for rectangular random matrices follow as a corollary. This research is inspired by the work of Ahlswede--Winter and Rudelson--Vershynin, but the new methods yield essential improvements over earlier results. We believe that these techniques have the potential to simplify the study of a large class of random matrices. |
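For orientation, the flavor of bound the tutorial develops, stated here in the matrix Bernstein form as it commonly appears in the literature (constants per the standard statement, not necessarily those emphasized in the talk):

```latex
% X_1, ..., X_n independent, self-adjoint, d x d, with E X_k = 0,
% lambda_max(X_k) <= R almost surely, and sigma^2 = || sum_k E[X_k^2] ||.
\[
  \mathbb{P}\Big\{\lambda_{\max}\Big(\sum_{k=1}^{n} X_k\Big) \ge t\Big\}
  \;\le\; d \cdot \exp\!\left(\frac{-t^2/2}{\sigma^2 + Rt/3}\right),
  \qquad t \ge 0 .
\]
```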
| Joel Tropp (California Institute of Technology) |
Survey/Tutorial lecture - Sparsity, Regularization, and Applications
|
| Abstract: The purpose of this tutorial is to describe the intellectual apparatus that supports some modern techniques in statistics, machine learning, signal processing, and related areas. The main ingredient is the observation that many types of data admit parsimonious representations, i.e., there are far fewer degrees of freedom in the data than the ambient dimension would suggest. The second ingredient is a collection of tractable algorithms that can effectively search for a parsimonious solution to a data analysis problem, even though these types of constraints tend to be nonconvex. Together, the theory of sparsity and sparse regularization can be viewed as a framework for treating a huge variety of computational problems in data analysis. We conclude with some applications where these two ideas play a dominant role. |
| Joshua Trzasko (Mayo Clinic) |
Locally Low-Rank Promoting Reconstruction Strategies for
Accelerated Dynamic MRI Series Applications |
| Abstract: Several recent works
have suggested that dynamic MRI series reconstructions can be
significantly improved by promoting low-rank (LR) structure in the
estimated image series when it is reshaped into Casorati form (e.g., for
a 2D acquisition, an NxNxT series -> an N^2xT matrix). When T << N^2, the rank
of the (reshaped) true underlying image may actually be not much less
than T. For such cases, aggressive rank reduction will result in
temporal/parametric blurring while only modest rank reduction will fail
to remove noise and/or undersampling artifact. In this work, we propose
that a restriction to spatially localized operations can potentially
overcome some of the challenges faced by global LR promoting methods
when the row and column dimensions of the Casorati matrix differ
significantly. This generalization of the LR promoting image series
reconstruction paradigm, which we call Locally Low Rank (LLR) image
recovery, spatially decomposes an image series estimate into a
(redundant) collection of overlapping blocks and promotes that each block, when
put into Casorati form, be independently LR. As demonstrated for
dynamic cardiac MRI, LLR-based image reconstruction can simultaneously
provide improvements in noise reduction and spatiotemporal resolution
relative to global LR-based methods, and can be practically realized using
efficient and highly parallelizable computational strategies. |
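A hedged sketch of one simple way to promote local low-rankness, blockwise singular-value soft-thresholding (non-overlapping blocks for brevity; the reconstruction described in the talk also handles overlapping blocks, data consistency, and undersampling):

```python
import numpy as np

def llr_denoise(series, block=8, tau=0.1):
    """Locally low-rank shrinkage of an image series of shape (N, N, T):
    each spatial block is reshaped into a (block*block x T) Casorati matrix
    and its singular values are soft-thresholded."""
    N, _, T = series.shape
    out = np.zeros_like(series)
    for i in range(0, N - block + 1, block):
        for j in range(0, N - block + 1, block):
            C = series[i:i + block, j:j + block, :].reshape(block * block, T)
            U, s, Vt = np.linalg.svd(C, full_matrices=False)
            s = np.maximum(s - tau, 0.0)   # soft-threshold the singular values
            out[i:i + block, j:j + block, :] = (U @ np.diag(s) @ Vt).reshape(block, block, T)
    return out
```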
| Van H. Vu (Yale University) |
Principal component analysis with random noise |
Abstract: Computing the first few singular vectors of a large matrix is a problem
that frequently comes up in statistics and numerical analysis. Given the presence of noise, exact calculation is hard to achieve, and the following problem is of importance:
How much does a small perturbation to the matrix change the singular vectors ?
Answering this question, classical theorems, such as those of Davis-Kahan and Wedin, give tight estimates for the worst-case scenario.
In this paper, we show that if the perturbation (noise) is random and our matrix has low rank, then
better estimates can be obtained. Our method relies on high dimensional geometry and is different from those used in earlier studies. |
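A small numerical illustration of the question posed (not the paper's bound): add Gaussian noise to a low-rank matrix and measure how far the leading singular vector moves.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 500, 3
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, n)) * 10   # low-rank signal
E = rng.standard_normal((n, n))                                      # random perturbation

u_clean = np.linalg.svd(A)[0][:, 0]
u_noisy = np.linalg.svd(A + E)[0][:, 0]
# sin of the angle between the leading left singular vectors
print(np.sqrt(1 - min(1.0, abs(u_clean @ u_noisy)) ** 2))
```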
| Zhaohua Wu (Florida State University) |
On the Trend, Detrending and Variability of Nonlinear and Non-stationary Time Series |
| Abstract: Determining trend and implementing detrending operations are important steps in data analysis. Traditionally, various extrinsic methods have been used to determine the trend, and to facilitate a detrending operation. In this talk, a simple and logical definition of trend is given for any nonlinear and non-stationary time series as an intrinsically determined monotonic function within a certain temporal span (most often that of the data span), or a function in which there can be at most one extremum within that temporal span. Being intrinsic, the method to derive the trend has to be adaptive. This definition of trend also presumes the existence of a natural timescale. All these requirements suggest the Empirical Mode Decomposition method (EMD) as the logical choice of algorithm for extracting various trends from a data set. Once the trend is determined, the corresponding detrending operation can be implemented. With this definition of trend, the variability of the data on various timescales can also be derived naturally. Climate data are used to illustrate the determination of the intrinsic trend and natural variability. |
| Lexing Ying (University of Texas at Austin) |
Fast algorithms for oscillatory kernels |
| Abstract: Computations involving oscillatory kernels arise in many computational problems associated with high frequency wave phenomena. In this talk, we will discuss recent progress on developing fast linear complexity algorithms for several problems of this type. Two common ingredients of these algorithms are discovering new structures with low-rank property and developing new hierarchical decompositions based on these structures. Examples will include N-body problems of the Helmholtz kernel, sparse Fourier transforms, Fourier integral operators, and fast Helmholtz solvers. |
| Bin Yu (University of California, Berkeley) |
Tutorial - Spectral clustering and high-dim stochastic block model for undirected and directed graphs |
Abstract: In recent years, network analysis has become the focus of much
research in many fields including biology, communication studies,
economics, information science, organizational studies,
and social psychology. Communities or clusters of highly connected actors
form an essential feature in the structure of several empirical networks.
Spectral clustering is a popular and computationally feasible method to
discover these communities.
The Stochastic Block Model is a social network model with well defined
communities. This talk will give conditions for spectral clustering to
correctly estimate the community membership of nearly all nodes.
These asymptotic results are the first clustering results that
allow the number of clusters in the model to
grow with the number of nodes, hence the name high-dimensional.
Moreover, I will present on-going work on directed spectral clustering
for networks whose edges are directed, including the Enron data as an example. |
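A minimal sketch of spectral clustering on a two-block stochastic block model (undirected case only): compute leading eigenvectors of the degree-normalized adjacency matrix and cluster the nodes with k-means. The normalization and scikit-learn's KMeans are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def sbm(n_per_block, p_in=0.2, p_out=0.05, seed=0):
    """Symmetric adjacency matrix of a 2-block stochastic block model."""
    rng = np.random.default_rng(seed)
    n = 2 * n_per_block
    labels = np.repeat([0, 1], n_per_block)
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    A = (rng.random((n, n)) < probs).astype(float)
    A = np.triu(A, 1)
    return A + A.T, labels

def spectral_clusters(A, k=2):
    """Cluster nodes via k-means on the leading eigenvectors of the
    degree-normalized adjacency matrix D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d) + 1e-12)
    _, vecs = np.linalg.eigh(L)
    return KMeans(n_clusters=k, n_init=10).fit_predict(vecs[:, -k:])

A, truth = sbm(100)
print(spectral_clusters(A))   # should recover the two blocks up to label swap
```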
| Ofer Zeitouni (University of Minnesota) |
Limiting distributions of eigenvalues for non-normal matrices |
| Abstract: I will describe recent results (obtained jointly with A. Guionnet and P. Wood) concerning perturbations of non-normal random matrices and their stabilization by additive noise. This builds on techniques introduced earlier in
the context of the ``single ring theorem'', by Guionnet, Krishnapur, and the speaker. |
| Shuheng Zhou (University of Michigan) |
High-dimensional covariance estimation based on Gaussian graphical models |
| Abstract: Undirected graphs are often used to describe high dimensional distributions. Under sparsity conditions, the graph can be estimated using ℓ1-penalization methods. This talk presents the following method. We combine a multiple regression approach with ideas of thresholding and refitting: first we infer a sparse undirected graphical model structure via thresholding of each among many ℓ1-norm penalized regression functions; we then estimate the covariance matrix and its inverse using the maximum likelihood estimator. Under suitable conditions, this approach yields consistent estimation in terms of graphical structure and fast convergence rates with respect to the operator and Frobenius norm for the covariance matrix and its inverse. We also derive an explicit bound for the Kullback Leibler divergence. Reference: http://arxiv.org/abs/1009.0530 |
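A hedged sketch of the first stage described (one ℓ1-penalized regression per node, followed by thresholding and symmetrization to get a graph estimate); the second stage, a Gaussian maximum likelihood fit restricted to the estimated graph, is only indicated in a comment. The use of scikit-learn's Lasso and the particular OR symmetrization rule are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

def estimate_graph(X, alpha=0.1, thresh=0.05):
    """Stage 1: regress each variable on all others with an l1 penalty,
    threshold the coefficients, and symmetrize (OR rule) to obtain an
    undirected graph estimate.  Stage 2 (not shown) refits the covariance
    and its inverse by maximum likelihood restricted to this zero pattern."""
    n, p = X.shape
    edges = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = np.delete(np.arange(p), j)
        coef = Lasso(alpha=alpha).fit(X[:, others], X[:, j]).coef_
        edges[j, others] = np.abs(coef) > thresh
    return edges | edges.T
```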
| Alexandre d'Aspremont (Princeton University) |
Tutorial - SDP, GFA, ETC... |
| Abstract: This tutorial will briefly cover recent developments in semidefinite programming and some of their geometrical applications. |