| Institute for Mathematics and its Applications University of Minnesota 400 Lind Hall 207 Church Street SE Minneapolis, MN 55455 |
2011-2012 Program
See http://www.ima.umn.edu/2011-2012/ for a full description of the 2011-2012 program on Mathematics of Information.
| 11:15am-12:15pm | Regularization Methods for Probabilistic Optimization | Gabriela Martínez (University of Minnesota) | Keller Hall 3-180 | PS |
| 2:30pm-3:00pm | Coffee break | Lind Hall 400 |
| 2:30pm-3:00pm | Coffee break | Lind Hall 400 |
| 2:30pm-3:00pm | Coffee break | Lind Hall 400 |
| 2:30pm-3:30pm | The "P vs. NP" Problem: Efficient Computation, Internet Security, and the Limits to Human Knowledge | Avi Wigderson (Institute for Advanced Study) | Lind Hall 305 | S |
| 7:00pm-8:00pm | Cryptography: Secrets and Lies, Knowledge and Trust | Avi Wigderson (Institute for Advanced Study) | Willey Hall 175 | PUB11.3.11 |
| 2:30pm-3:00pm | Coffee break | Lind Hall 400 |
| 2:30pm-3:00pm | Coffee break | Lind Hall 400 |
| 11:15am-12:15pm | Exact semidefinite relaxation for the clustering and biclustering problems | Brendan P.W. Ames (University of Minnesota) | Keller Hall 3-180 | PS |
| 2:30pm-3:00pm | Coffee break | Lind Hall 400 |
| 2:30pm-3:00pm | Coffee break | Lind Hall 400 |
| 2:30pm-3:00pm | Coffee break | Lind Hall 400 |
| 2:30pm-3:00pm | Coffee break | Lind Hall 400 |
| 8:00am-8:45am | Coffee and Registration | Keller Hall 3-176 | W11.14-18.11 | |
| 8:45am-9:00am | Welcome and Introduction | Keller Hall 3-180 | W11.14-18.11 | |
| 9:00am-10:00am | Tutorial - Quantitative biological imaging: from cells to numbers | Jean-Christophe Olivo-Marin (Institut Pasteur) | Keller Hall 3-180 | W11.14-18.11 |
| 10:00am-10:15am | Coffee Break | Keller Hall 3-176 | W11.14-18.11 | |
| 10:15am-11:15am | Tutorial - Quantitative biological imaging: from cells to numbers | Jean-Christophe Olivo-Marin (Institut Pasteur) | Keller Hall 3-180 | W11.14-18.11 |
| 11:15am-11:30am | Coffee Break | Keller Hall 3-176 | W11.14-18.11 | |
| 11:30am-12:30pm | Scalable methods for analyzing 3D/4D/5D Images of Complex and Dynamic Biological Microenvironments | Badri Roysam (University of Houston) | Keller Hall 3-180 | W11.14-18.11 |
| 12:30pm-2:00pm | Lunch | W11.14-18.11 | ||
| 2:00pm-3:00pm | How is Biology as a Science Possible? | Edward R. Dougherty (Texas A & M University) | Keller Hall 3-180 | W11.14-18.11 |
| 3:00pm-3:15pm | Coffee Break | Keller Hall 3-176 | W11.14-18.11 | |
| 3:15pm-4:15pm | Pattern Discovery in Biomedical Modelling | Yi-Ping Phoebe Chen (La Trobe University) | Keller Hall 3-180 | W11.14-18.11 |
| 4:15pm-4:30pm | Coffee Break | Keller Hall 3-176 | W11.14-18.11 | |
| 4:30pm-5:30pm | Towards Closing the Semantic Gap in Decision Support for Clinical Sequencing in Oncology | Nevenka Dimitrova (Philips Research Laboratory) | Keller Hall 3-180 | W11.14-18.11 |
| 7:00pm-9:00pm | Social Hour | Stub and Herbs, 227 SE Oak St, Minneapolis, MN 55455 | W11.14-18.11 |
| 8:30am-9:00am | Coffee | Keller Hall 3-176 | W11.14-18.11 | |
| 9:00am-10:00am | Tutorial - Translational medical imaging | Guillermo R. Sapiro (University of Minnesota) | Keller Hall 3-180 | W11.14-18.11 |
| 10:00am-10:15am | Coffee Break | Keller Hall 3-176 | W11.14-18.11 | |
| 10:15am-11:15am | Tutorial - Translational medical imaging | Guillermo R. Sapiro (University of Minnesota) | Keller Hall 3-180 | W11.14-18.11 |
| 11:15am-11:30am | Coffee Break | Keller Hall 3-176 | W11.14-18.11 | |
| 11:30am-12:30pm | Interactive Segmentation of 3D Imagery | Allen Tannenbaum (Boston University) | Keller Hall 3-180 | W11.14-18.11 |
| 12:30pm-2:00pm | Lunch | W11.14-18.11 | ||
| 2:00pm-3:00pm | Correlation-based variable selection for differential gene expression analysis | Alfred O. Hero III (University of Michigan) | Keller Hall 3-180 | W11.14-18.11 |
| 3:00pm-3:15pm | Coffee Break | Keller Hall 3-176 | W11.14-18.11 | |
| 3:15pm-4:15pm | Impact of Sensing Structure in Classification of High-Dimensional Medical Informatics Data | W. Clem Karl (Boston University) | Keller Hall 3-180 | W11.14-18.11 |
| 4:15pm-4:30pm | Group Photo | W11.14-18.11 | ||
| 4:30pm-6:30pm | Reception and Poster Session | Lind Hall 400 | W11.14-18.11 | |
| Poster - Inverse Perturbation for Optimal Intervention in Genetic Regulatory Networks | Nidhal Bouaynaya (University of Arkansas) Dan Schonfeld (University of Illinois) | |||
| Poster - Exact Sample Conditioned Performance of Estimators for Classification Error Under Bayesian Contexts | Lori Dalton (Texas A & M University) | |||
| Poster - 3D Active Meshes: a versatile mathematical tool to study cell shape and motility in live microscopy | Alexandre Dufour (Institut Pasteur) | |||
| Poster - Misaligned Principal Component Analysis | Alfred O. Hero III (University of Michigan) | |||
| Poster - Automatic Detection Of Vaccine Adverse Reactions By Incorporating Historical Medical Conditions | Zhonghua Jiang (University of Minnesota) George Karypis (University of Minnesota) |
| 8:30am-9:00am | Coffee | Keller Hall 3-176 | W11.14-18.11 | |
| 9:00am-10:00am | Inverse problems in tomographic imaging | Charles A. Bouman (Purdue University) | Keller Hall 3-180 | W11.14-18.11 |
| 10:00am-10:15am | Coffee Break | Keller Hall 3-176 | W11.14-18.11 | |
| 10:15am-11:15am | Inverse problems in tomographic imaging | Charles A. Bouman (Purdue University) | Keller Hall 3-180 | W11.14-18.11 |
| 11:15am-11:30am | Coffee Break | Keller Hall 3-176 | W11.14-18.11 | |
| 11:30am-12:30pm | Designing fast and robust algorithms for medical image processing | Elsa Angelini (Telecom ParisTech) | Keller Hall 3-180 | W11.14-18.11 |
| 12:30pm-2:00pm | Lunch | W11.14-18.11 | ||
| 2:00pm-3:00pm | Imaging development of the embryonic heart over multiple spatial dimensions, modalities and time-scales | Michael Liebling (University of California, Santa Barbara) | Keller Hall 3-180 | W11.14-18.11 |
| 3:00pm-3:15pm | Coffee Break | Keller Hall 3-176 | W11.14-18.11 | |
| 3:15pm-4:15pm | Multiscale Set Estimation in Biomedical Inverse Problems | Rebecca Willett (Duke University) | Keller Hall 3-180 | W11.14-18.11 |
| 4:15pm-4:30pm | Coffee Break | Keller Hall 3-176 | W11.14-18.11 | |
| 4:30pm-5:30pm | Large-Scale Multiple Testing in Medical Informatics | Robert Nowak (University of Wisconsin-Madison) | Keller Hall 3-180 | W11.14-18.11 |
| 8:30am-9:00am | Coffee | Keller Hall 3-176 | W11.14-18.11 | |
| 9:00am-10:00am | Tutorial - Massive scale of DNA sequencing data presents challenges in processing and analysis | Fuli Yu (Baylor College of Medicine) | Keller Hall 3-180 | W11.14-18.11 |
| 10:00am-10:15am | Coffee Break | Keller Hall 3-176 | W11.14-18.11 | |
| 10:15am-11:15am | Tutorial - Massive scale of DNA sequencing data presents challenges in processing and analysis | Fuli Yu (Baylor College of Medicine) | Keller Hall 3-180 | W11.14-18.11 |
| 11:15am-11:30am | Coffee Break | Keller Hall 3-176 | W11.14-18.11 | |
| 11:30am-12:30pm | Large Data Challenges in Medical Imaging and Bioinformatics | Ahmed H. Tewfik (University of Texas at Austin) | Keller Hall 3-180 | W11.14-18.11 |
| 12:30pm-2:00pm | Lunch | W11.14-18.11 | ||
| 2:00pm-3:00pm | Estimation of Individual’s Risk for Complex Trait Diseases: Methods and Challenges using Allelic Specific Expression and Mapping Cis-variance from NGS Data (RNA and Exome Sequencing Data) | Shipra Agrawal (BioCOS Life Sciences Private Limited) | Keller Hall 3-180 | W11.14-18.11 |
| 3:00pm-3:15pm | Coffee Break | W11.14-18.11 | ||
| 3:15pm-4:15pm | Intervention and Control of Large-Scale Gene Regulatory Networks | Nidhal Bouaynaya (University of Arkansas) Dan Schonfeld (University of Illinois) | Keller Hall 3-180 | W11.14-18.11 |
| 4:15pm-5:00pm | Discussion | Keller Hall 3-180 | W11.14-18.11 |
| 8:30am-9:00am | Coffee | Keller Hall 3-176 | W11.14-18.11 | |
| 9:00am-10:00am | 3-D reconstructions of biological macromolecular complexes by electron microscopy | Peter C Doerschuk (Cornell University) | Keller Hall 3-180 | W11.14-18.11 |
| 10:00am-10:15am | Coffee Break | Keller Hall 3-176 | W11.14-18.11 | |
| 10:15am-11:15am | Modeling and Acceleration of Maximum A Posteriori Reconstruction from Large CT Datasets | Jean-Baptiste Thibault (GE Healthcare) | Keller Hall 3-180 | W11.14-18.11 |
| 11:15am-11:30am | Coffee Break | Keller Hall 3-176 | W11.14-18.11 | |
| 11:30am-12:30pm | Using Algorithms to Produce High Content Information from Cell and Tissue Images | Jens Rittscher (General Electric) | Keller Hall 3-180 | W11.14-18.11 |
| 2:30pm-3:00pm | Coffee break | Lind Hall 400 |
| 11:15am-12:15pm | TBA | Xin Liu (University of Minnesota) | Keller Hall 3-180 | PS |
| 2:30pm-3:00pm | Coffee break | Lind Hall 400 |
| 2:30pm-3:00pm | Coffee break | Lind Hall 400 |
| All Day | Thanksgiving Day. The IMA is closed. |
| All Day | Floating Holiday. The IMA is closed. |
| 2:30pm-3:00pm | Coffee break | Lind Hall 400 |
| 11:15am-12:15pm | A novel M-estimator for robust PCA | Teng Zhang (University of Minnesota) | Keller Hall 3-180 | PS |
| 2:30pm-3:00pm | Coffee break | Lind Hall 400 |
| 2:30pm-3:00pm | Coffee break | Lind Hall 400 |
Event Legend: |
|
| PS | IMA Postdoc Seminar |
| PUB11.3.11 | Cryptography: Secrets and Lies, Knowledge and Trust, Avi Wigderson (Institute for Advanced Study) |
| S | Seminar |
| W11.14-18.11 | Large Data Sets in Medical Informatics |
| Shipra Agrawal (BioCOS Life Sciences Private Limited) | Estimation of Individual’s Risk for Complex Trait Diseases: Methods and Challenges using Allelic Specific Expression and Mapping Cis-variance from NGS Data (RNA and Exome Sequencing Data) |
| Abstract: In current genetic and clinical research, identification of disease-specific variations, particularly from non-coding RNA and cis-elements, is a major bottleneck. Massively parallel sequencing of exome and transcriptome is being widely used to effectively interrogate the key protein-coding and non-coding RNA regions. In such scenarios, the deep sequencing data of exome and transcriptome could be used for estimating levels of allele-specific expression in diseased vs. control samples (case-control cohorts) and hence for the identification of disease-specific signatures. This provides a functional basis to identify the differentially expressed alleles, mono-allelic expression, imprinting of alleles and allele-regulated alternative splicing. All such data and approaches together make a stronger strategy to predict the disease susceptibility alleles and their functional role in disease mechanism.
Our approaches at BioCOS Life Sciences use Next Generation Sequencing (NGS) data analysis for the precise detection of differential expression of alleles, which is important in identifying causal/susceptibility genes by mapping their variance in both coding and non-coding DNA/RNA regions. I will present our current research work on developing methods and data processing approaches, which can be applied in identification of the susceptibility alleles using the combined approaches from RNA-Seq and Exome-Seq data as well as directly predicting them from RNA-Seq data. The talk will also discuss the existing bottlenecks in the area and approaches to obtain high quality results with a focus on calling genotypes from RNA-Seq data. |
|
| Brendan P.W. Ames (University of Minnesota) | Exact semidefinite relaxation for the clustering and biclustering problems |
| Abstract: Identifying clusters of similar objects in data plays a significant role in a wide range of applications such as information retrieval, pattern recognition, computational biology, and image processing. We consider as a model problem for clustering the average weight k-disjoint clique problem (WKDC), whose goal is to identify the collection of k disjoint cliques of a given weighted complete graph maximizing the sum of the average edge weights over the complete subgraphs induced by these cliques. We show that this problem can be formulated as a nonconvex quadratic maximization problem and subsequently relaxed to a semidefinite program using symmetric matrix lifting. Although the WKDC problem is NP-hard, we show that this relaxation is exact under certain assumptions on the input graph. That is, the optimal solution for the original hard combinatorial problem can be recovered directly from the solution of the relaxed problem for certain program inputs. In particular, the semidefinite relaxation is exact for input graphs corresponding to data consisting of k large, distinct clusters and a small number of outliers. This approach also yields a semidefinite relaxation for the biclustering problem with similar recovery guarantees. Given a set of objects and a set of features exhibited by these objects, biclustering seeks to simultaneously group the objects and features according to their expression levels. We pose this problem as partitioning of a weighted complete bipartite graph such that the edge weight within the resulting bicliques is maximized. As in our analysis of the WKDC problem, we consider a nonconvex quadratic programming formulation for this problem, and relax to semidefinite programming using matrix lifting. 
As before, we show that the correct partition of the objects and features can be recovered from the optimal solution of the semidefinite relaxation, in the case that the input instance consists of several disjoint sets of objects exhibiting similar features. |
|
| Elsa Angelini (Telecom ParisTech) | Designing fast and robust algorithms for medical image processing |
| Abstract: Quantification from medical images involves three levels of development:
- Modeling of the organs
- Extraction of the visual features
- Formulation of the quantification task
Regarding organ modeling, geometric encoding of the shape is designed as a tradeoff between flexibility and robustness. Encoding of the variability within a population is a complex task that can have drawbacks when handling pathological cases. On the other hand, generic anatomical knowledge, especially regarding the context of the organ, can provide rich and more robust information, with spatial relations for example.
Regarding the visual features, images are richer than they appear in terms of tissue signature, embedding multiscale information. The field of image processing has evolved slowly in the design of sophisticated organ-specific visual features, the majority of them remaining very basic. Future challenges remain open regarding the need to correlate multi-modal tissue signatures with physiological characteristics.
The quantification task, such as segmentation, tracking or detection of longitudinal changes, can be formulated with either a deterministic or a stochastic formalism. Algorithms remain poorly robust to image quality, lack of image calibration, parameter tuning and presence of pathologies. Finer interactions between algorithmic tuning and image content and better calibration of image content are currently under investigation to address this lack of robustness and reproducibility. These three components of the pipeline will be discussed, with illustrations on brain, cardiac, liver and obstetric data. Emphasis will be placed on the constraints of being fast and robust, in the context of handling large data sets with great variability and pathologies. |
|
| Nidhal Bouaynaya (University of Arkansas), Dan Schonfeld (University of Illinois) | Intervention and Control of Large-Scale Gene Regulatory Networks |
| Abstract: We formulate the optimal intervention problem in genetic regulatory networks as a minimal perturbation of the network in order to force it to converge to a desired steady-state distribution of gene regulation. We cast optimal intervention in gene regulation as a convex optimization problem, thus providing a globally optimal solution which can be efficiently computed using standard techniques for convex optimization. The criterion adopted for optimality is chosen to minimize potential adverse effects as a consequence of the intervention strategy. We consider a perturbation that minimizes (i) the overall energy of change between the original and controlled networks and (ii) the time needed to reach the desired steady-state of gene regulation. Moreover, we show that there is an inherent tradeoff between minimizing the energy of the perturbation and the convergence rate to the desired distribution. We further show that the optimal inverse perturbation control is robust to estimation errors in the original network. The proposed control is applied to the human melanoma gene regulatory network. | |
| Nidhal Bouaynaya (University of Arkansas), Dan Schonfeld (University of Illinois) | Poster - Inverse Perturbation for Optimal Intervention in Genetic Regulatory Networks |
| Abstract: We formulate the optimal intervention problem in genetic regulatory networks as a minimal perturbation of the network in order to force it to converge to a desired steady-state distribution of gene regulation. We cast optimal intervention in gene regulation as a convex optimization problem, thus providing a globally optimal solution which can be efficiently computed using standard techniques for convex optimization. The criterion adopted for optimality is chosen to minimize potential adverse effects as a consequence of the intervention strategy. We consider a perturbation that minimizes (i) the overall energy of change between the original and controlled networks and (ii) the time needed to reach the desired steady-state of gene regulation. Moreover, we show that there is an inherent tradeoff between minimizing the energy of the perturbation and the convergence rate to the desired distribution. We further show that the optimal inverse perturbation control is robust to estimation errors in the original network. The proposed control is applied to the human melanoma gene regulatory network. | |
| Yi-Ping Phoebe Chen (La Trobe University) | Pattern Discovery in Biomedical Modelling |
| Abstract: Solving modern biomedical problems, especially those involving genome data, requires advanced computational and analytical methods. The huge quantities of data and escalating demands of modern biomedical research increasingly require the sophistication and power of computational techniques for their pattern discovery. Key techniques include relational data management, pattern recognition, data mining, modelling and visualization of biomedical data. In this talk, I will demonstrate recent methodologies and data structures for gathering high-quality approximations and modelling of genomic information, and will use these innovations as the basis for developing methods to cluster and visualize biomedical data in pattern discovery. | |
| Lori Dalton (Texas A & M University) | Poster - Exact Sample Conditioned Performance of Estimators for Classification Error Under Bayesian Contexts |
| Abstract: In recent years, biomedicine has been faced with difficult high-throughput small-sample classification problems, which are typically validated with re-sampling error estimation methods such as cross-validation. While heuristically designed error estimation techniques may be acceptable in problems where large amounts of data are available, the small-sample setting is different because asymptotic results are not meaningful and validation becomes a critical issue. A recently proposed classifier error estimator places the problem in a signal estimation framework in the presence of uncertainty, thereby permitting a rigorous optimal solution in a minimum-mean-square error (MMSE) sense. The uncertainty in this model is relative to the parameters of the feature-label distributions, resulting in a Bayesian approach to error estimation. The same Bayesian framework also produces the theoretical MSE for both Bayesian error estimators and arbitrary error estimators, where uncertainty is again relative to the unknown model parameters and conditioned on the observed sample. Thus, the Bayesian error estimator has a unique advantage over classical error estimators in that its mathematical framework naturally gives rise to a practical expected measure of performance given a fixed sample. | |
| Nevenka Dimitrova (Philips Research Laboratory) | Towards Closing the Semantic Gap in Decision Support for Clinical Sequencing in Oncology |
| Abstract: Within only a decade since the first draft of the human genome, we’ve witnessed an astonishing pace of development of technologies for high throughput molecular profiling that probe various aspects of genome biology and its relationship to tumorigenesis and cancer treatment. There have been giant steps towards cataloging massive amounts of data and providing fairly good annotation information. However, computational methods that tried to tease out the relationship between the genotype and its functional readout - in normal and cancer states - revealed a semantic gap that is yet to be bridged. Narrowing this gap is essential in order to develop meaningful clinical decision support technologies. In addition to imaging modalities which give the gross tissue level properties, crucial decisions in the context of oncology therapy selection require molecular level information that is increasingly captured by the emerging sequencing modalities. We took part in multiple studies aiming to understand the tumor heterogeneity and response to chemotherapy. Our efforts span several complementary modalities. In this talk I will provide several examples from our recent high throughput genomic studies: 1. DNA Sequencing: Assembly and downstream analysis of genomic data from normal individuals to understand and establish variation within normal individuals at the single nucleotide and structural level as well as the functional impact of these variations. 2. RNA Sequencing, CNV and DNA methylation: analysis in the context of chemo- and biological therapy response in breast and ovarian cancer. 3. Integration into a computational framework that combines genome-wide DNA methylation, gene expression and copy number variation data in a comprehensive fashion with the aim of finding mechanistic associations as well as signatures indicative of therapy resistance.
Our goal is to include these modalities in a Comprehensive Clinical Decision Support system where we need to integrate sequencing with imaging, pathology and other clinical data. |
|
| Peter C Doerschuk (Cornell University) | 3-D reconstructions of biological macromolecular complexes by electron microscopy |
| Abstract: Single-particle cryo electron microscopy provides images of biological macromolecular complexes with spatial sampling on the order of 1-2 Angstrom. Combining on the order of 100,000 such images can result in 3-D reconstructions of the electron scattering intensity of the complex with a spatial resolution as fine as 4-5 Angstrom. Due to damage in the imaging process, each complex is imaged only once and therefore having a homogeneous ensemble of complexes is important. Algorithms and results will be presented for the case where the complexes are not homogeneous and the reconstruction yields a statistical description of the electron scattering intensity rather than a single unique intensity. Related work on computed electron tomography, where the electron scattering intensity of individual complexes is determined but at lower resolution, will also be presented. | |
| Edward R. Dougherty (Texas A & M University) | How is Biology as a Science Possible? |
| Abstract: A perusal of the contemporary biological literature involving high-throughput data sets reveals the generation of a vast amount of data and an enormous number of models (classifiers, clusters, networks) derived from this data via a plethora of algorithms. There tend to be four interrelated characteristics common to these publications: (1) no experimental design, (2) data sets where the number of measured variables greatly exceeds the number of replications, (3) algorithms whose performance is unknown for the populations to which they are applied – and often known to work poorly when applied to a small number of replicates, and (4) models that are epistemologically meaningless because they have not been validated. Hence, we find ourselves in a position somewhat akin to that confronted by Immanuel Kant in the Eighteenth Century when he famously asked, “How is metaphysics as a science possible?” Certainly there was a lot of “metaphysical” talk in the air, but to what sureties had it led? To address the problem, Kant had to tackle the meaning of science and then appreciate what constraints had to be placed on metaphysical statements to make them “scientific.” Fortunately for us, we do not have to take on the monumental task of characterizing scientific knowledge, an endeavor that stretched from Galileo to Einstein. But we do have to consider what constraints must be placed on biological statements to make them meaningful, that is, so that they constitute biological scientific knowledge. Moreover, we need to address a critical methodological scientific issue addressed by Kant: What differentiates productive observation of Nature from “groping in the dark,” to use his phrase? | |
| Alexandre Dufour (Institut Pasteur) | Poster - 3D Active Meshes: a versatile mathematical tool to study cell shape and motility in live microscopy |
| Abstract: Dynamic processes such as cell motility and deformation are key components of numerous scenarios including cell division & differentiation, morphogenesis, immune response strategies, but also parasite invasion, cancer development & proliferation and host-pathogen interactions. Continuous advances in microscopy imaging techniques have allowed scientists to shed light on many of these processes over extended periods of time, yielding huge amounts of time-lapse imaging data in multiple colors and various experimental conditions. In such context, visual interpretation and manual analysis have proved to be limited by the lack of reproducibility, user bias and fatigue. Scientists thus progressively turn to automatic quantification methods able to process spatiotemporal data in a robust and systematic manner. In this work we present a novel framework for automatic cell segmentation and tracking based on the theory of deformable models, and show how such a versatile mathematical tool can be used to extract various information related to cellular motility, shape analysis and morpho-dynamic studies. | |
| Alfred O. Hero III (University of Michigan) | Correlation-based variable selection for differential gene expression analysis |
| Abstract: The problem of variable selection is useful for identifying the principal drivers of differential gene-response under one or more treatments, phenotypes, or conditions. Once identified, such drivers can be targeted as potential knockouts or enhancers in drug discovery or diagnostic testing. In high throughput data such as gene or protein expression the large number of variables has made it impractical to implement all but the simplest univariate methods for variable selection, e.g., detecting significant shifts in t-test or Wilcoxon test statistics. We propose an alternative approach based on detecting significant shifts in patterns of connectivity of genes in a correlation graph or concentration graph. Remarkably, it is precisely when the sample size is small that the approach is scalable, e.g., to whole genome analysis. Furthermore a statistical performance analysis establishes phase transition behaviors and tight approximations to false discovery rate that can be used for error control. We will illustrate the approach on several gene expression datasets. | |
| Alfred O. Hero III (University of Michigan) | Poster - Misaligned Principal Component Analysis |
| Abstract: Principal component analysis (PCA) is a widely applied method for extracting structure from samples of high dimensional biological data.
Often there exist misalignments between different samples and this can cause severe problems in PCA if not properly taken into account. For example, subject-dependent temporal differences in gene expression response to a treatment will create relative time shifts in the samples that decohere the PCA analysis. The sensitivity of PCA to such misalignments is severe, leading to phase transitions that can be studied using the spectral theory of high dimensional matrices. With this as motivation, we propose a new method of PCA, called misPCA, that explicitly accounts for the effects of misalignments in the samples. We illustrate misPCA on clustering longitudinal temporal gene expression data. With Arnau Tibau-Puig, Ami Wiesel, and Raj Rao Nadakuditi |
|
| Zhonghua Jiang (University of Minnesota), George Karypis (University of Minnesota) | Poster - Automatic Detection Of Vaccine Adverse Reactions By Incorporating Historical Medical Conditions |
| Abstract: This paper extends the problem of vaccine adverse reaction detection by incorporating historical medical conditions. We propose a novel measure called dual-lift for this task, and formulate this problem in the framework of constraint pattern mining. We present a pattern mining algorithm DLiftMiner which utilizes a novel approach to upper bound the dual-lift measure for reducing the search space. Experimental results on both synthetic and real world datasets show that our method is effective and promising. | |
| W. Clem Karl (Boston University) | Impact of Sensing Structure in Classification of High-Dimensional Medical Informatics Data |
| Abstract: There has been an explosion of non-invasive biomedical sensing modalities that have revolutionized our ability to probe the biomedical world. Often decisions have to be made on the basis of these increasingly high-dimensional observations. An example would be the determination of cancer or stroke from indirect tomographic projection measurements. The problem is frequently exacerbated by the lack of labeled training samples from which to learn class models. In many cases, however, there exists a latent low-dimensional sensing structure that can potentially be exploited for inferencing aims. This work investigates the impact of latent sensing structure on supervised classification performance when the data dimension scales to infinity faster than the number of samples. In contrast to some existing studies, here the classification difficulty is held fixed and finite as the data dimension scales. For a binary supervised classification problem with Gaussian likelihood functions, it is shown that the asymptotic error probability converges to that of pure guessing if the sensing structure is totally ignored, whereas it converges to the Bayes risk if the sensing structure is sufficiently regular and the classification method is "sensing aware". It is also shown, however, that without suitable regularity in the latent low-dimensional sensing structure, it is impossible to attain nontrivial asymptotic error probability. These findings are validated through various simulations. Additional numerical results for support vector machines and sensitivity to mismatch between true and assumed structure are also provided. | |
| Michael Liebling (University of California, Santa Barbara) | Imaging development of the embryonic heart over multiple spatial dimensions, modalities and time-scales |
| Abstract: Recent breakthroughs in optical microscopy have enabled in vivo imaging of the embryonic heart as it develops and gains function. Despite these advances, it remains difficult to simultaneously characterize heart morphology, heart function (the embryonic heart is beating before it is fully developed), and gene expression levels. We have developed computational tools to capture, process, and combine images acquired with different microscopy modalities, at different temporal and spatial scales, and over multiple samples, in an effort to build a multi-dimensional model of the beating and developing heart where morphology, function, and genetics can be simultaneously studied. Here, I will discuss image acquisition protocols and reconstruction strategies to overcome instrumentation and biological limitations that prevent simultaneous acquisition of these large, high-dimensional data sets. These tools will facilitate quantitative and systematic characterization of both morphology and function and study their relationship to genetic and epi-genetic factors that affect development in normal and diseased hearts. | |
| Gabriela Martínez (University of Minnesota) | Regularization Methods for Probabilistic Optimization |
| Abstract: We analyze nonlinear stochastic optimization problems with joint probabilistic constraints using the concept of a $p$-efficient point of a probability distribution. If the problem is described by convex functions, we develop two algorithms based on first order optimality conditions and a dual approach to the problem. The algorithms yield an optimal solution for problems involving $\alpha$-concave probability distributions. For arbitrary distributions, the algorithms provide upper and lower bounds for the optimal value and nearly optimal solutions. When the problem is described by continuously differentiable non-convex functions, we describe the tangent and the normal cone to the level set of the underlying probability function. Furthermore, we formulate first order and second order conditions of optimality based on the notion of $p$-efficient points. For the case of discrete distribution functions, we develop an augmented Lagrangian method based on progressive inner approximation of the level set of the probability function by generation of $p$-efficient points. Numerical experience is provided. | |
| Robert Nowak (University of Wisconsin-Madison) | Large-Scale Multiple Testing in Medical Informatics |
| Abstract: In this talk I will discuss novel experimental designs for large-scale multiple hypothesis testing problems. Testing to determine which genes are differentially expressed in a certain disease is a classic instance of multiple testing in medical informatics. Tremendous progress has been made in high-dimensional inference and testing problems by exploiting intrinsic low-dimensional structure. Sparsity is perhaps the simplest model for low-dimensional structure. It is based on the assumption that the signal of interest can be represented as a combination of a small number of elementary components. Sparse recovery is the problem of determining which components are needed in the representation based on measurements of the signal. For example, diseases are often characterized by a relatively small number of genes, which can be identified using high-throughput experimental techniques. This talk focuses on two issues related to this line of research. 1. Most theory and methods for sparse recovery are based on non-adaptive measurements. I will discuss the advantages of sequential measurement schemes that adaptively focus sensing using information gathered throughout the measurement process. In particular, I will show that sequential testing procedures can be significantly more powerful than non-sequential methods in the high-dimensional setting. 2. The standard sparse recovery problem involves inferring sparse linear functions. I will discuss generalizations of the standard problem to the recovery of sparse multilinear functions. Such functions are characterized by multiplicative interactions between the input variables, with sparsity meaning that relatively few of all conceivable interactions are present. This problem is motivated by the study of interactions between processes in complex networked systems (e.g., among genes and proteins in living cells). Our results extend the notion of compressed sensing from the linear sparsity model to nonlinear forms of sparsity encountered in complex systems. In contrast to linear sparsity models, in the multilinear case the pattern of sparsity can significantly affect sensing requirements. |
|
| Jean-Christophe Olivo-Marin (Institut Pasteur) | Tutorial - Quantitative biological imaging: from cells to numbers |
| Abstract: The lecture will present biological imaging topics ranging from fundamentals in microscopy to specific methods and algorithms for the processing and quantification of 2- and 3-D+t image sequences in biological microscopy. We will demonstrate algorithms of PSF approximations for image deconvolution, image segmentation, multi-particle tracking and active contour models for cell shape and deformation analysis. We will illustrate the application of our methods in projects related to the study of the dynamics of genes in cell nuclei, the movement of parasites in cells and the detection and tracking of microbes in cells. One specific goal in biological imaging is to automate the quantification of dynamic parameters or the characterization of phenotypic and morphological changes occurring as a consequence of cell/cell or pathogen/host cell interactions. The availability of this information and its thorough analysis are of key importance in helping to decipher the underlying molecular mechanisms of, e.g., infectious diseases. | |
| Jens Rittscher (General Electric) | Using Algorithms to Produce High Content Information from Cell and Tissue Images |
| Abstract: While the chemical structure of DNA is well understood, determining how genome-encoded components function in an integrated manner to perform cellular and organismal function is still an open challenge. The talk will argue that imaging, and more specifically the extraction of quantitative information, plays a critical role in this process. Such measurements will enable the automatic monitoring of cellular and intracellular events and provide information about specific molecular mechanisms in individual cells. Specific examples will illustrate how computer vision algorithms enable the analysis of data sets and complex biological specimens that cannot be analyzed through manual inspection, and how image analysis algorithms can be used to extract high content data. Specifically, I will show how image segmentation methods are used to extract protein expression information in a novel sequential multiplexing process GE developed. In addition, it will be discussed how statistical shape analysis methods can be applied to assess cellular morphology as well as the structure of entire organisms. Finally, it will be shown how the analysis of apparent motion can be used to monitor cardiomyocyte populations. While imaging data potentially has much to add to models for systems biology, the usefulness of imaging information is dependent on the quantitative nature of the data and other aspects of its quality. Developing an awareness of the important long-term factors and challenges will help ensure acceptance of image analysis methods. Today image analysis methods are already used to study complex biological processes. |
|
| Badri Roysam (University of Houston) | Scalable methods for analyzing 3D/4D/5D Images of Complex and Dynamic Biological Microenvironments |
| Abstract: Modern optical microscopy has grown into a multi-dimensional imaging tool. It is now possible to record dynamic processes in living specimens in their spatial context and temporal order, yielding information-rich 5-D images (3-D space, time, spectra). Of particular interest are complex and dynamic tissue microenvironments that play critical roles in health and disease, e.g., tumors, stem-cell niches, brain tissue surrounding neuroprosthetic devices, retinal tissue, cancer stem-cell niches, glands, and immune system tissues. The task of analyzing these images exceeds human ability due to the sheer volume of the data (images routinely exceed 20 GB in size), its structural complexity, and the dynamic behaviors of cells and organelles. First, there is a need for automated systems to assist the human analyst to map the tissue anatomy, quantify structural associations, identify critical events, map event locations and timing to the tissue anatomic context, identify and quantify spatial and temporal dependencies, produce meaningful summaries of multivariate measurement data, and compare perturbed and normal datasets for testing hypotheses, exploration, and systems modeling. Beyond automation, there is a need for "computational sensing" of tissue patterns and cell behaviors that are too subtle for the human visual system to detect. In this talk, I will describe large-scale application of image processing, active machine learning, multivariate clustering, and parallel computation methods that enable scalable analysis of multi-dimensional microscopy data. A particularly valuable application of these methods is to validate the large-scale automated analysis results. All of the software from this work is free and open source (www.farsight-toolkit.org). |
|
| Guillermo R. Sapiro (University of Minnesota) | Tutorial - Translational medical imaging |
| Abstract: In this talk I will describe some of our efforts in the area of translational medical imaging, and illustrate how mathematics and formalism play a fundamental role. I will start with our work on brain imaging, where we have developed entire analysis pipelines, going from fixing basic mathematical errors in the classical formulas of high angular resolution diffusion imaging (HARDI), all the way to studying gender and kinship in brain connectivity networks and to helping neurosurgeons in deep brain stimulation procedures. I will then present some of our work on the analysis of the structure of HIV and other viruses with data obtained from cryo-tomography, a critical step in vaccine development. Additional applications for helping surgeons in the operating room will be mentioned as well. | |
| Allen Tannenbaum (Boston University) | Interactive Segmentation of 3D Imagery |
| Abstract: In this talk, we will describe a new interactive procedure for segmenting 3D data sets using a mixture of ideas from control and image processing. More precisely, using a Lyapunov control design, a balance is established between the influence of a data-driven gradient flow and the human’s input over time. Automatic segmentation is thus smoothly coupled with interactivity. An application of the mathematical methods to orthopedic segmentation is shown, demonstrating the expected transient and steady state behavior of the implicit segmentation function and auxiliary observer. | |
| Jean-Baptiste Thibault (GE Healthcare) | Modeling and Acceleration of Maximum A Posteriori Reconstruction from Large CT Datasets |
| Abstract: Recent increases in detector coverage and trigger frequencies have opened up new clinical applications in modern Computed Tomography, but have also led to an explosion in the volume of CT raw datasets. This represents a particular challenge for accurate tomographic image reconstruction, especially when using a model-based iterative framework based on Maximum A Posteriori estimation. Inclusion of detector physics, tube response, noise statistics, and image modeling involves certain complexity that drives up reconstruction time. However, recent results have started to demonstrate the significant potential of model-based iterative reconstruction for ultra-low-dose imaging aimed at improving patient safety, as well as high quality results in other targeted applications such as low contrast complex abdomen imaging and high-resolution medullar and cortical bone. This makes it a particular challenge to come up with fast convergent algorithms that do not trade off significant quality for speed, and are amenable to modern parallel computing hardware. This talk will present the modeling framework for high quality model-based tomographic reconstruction and its advantages relative to alternative iterative approaches designed primarily with concern about reconstruction speed. In the proposed approach, speed and quality can be thought of as relatively orthogonal design elements, to the extent that convergence is reasonably achieved. First, the formulation of the optimization problem fully defines the target quality level as a function of the number and accuracy of the models designed to explicitly explain x-ray attenuation measurements based on realistic modeling of scanner behavior and non-idealities. Second, a globally convergent optimization algorithm chosen among a variety of potential alternatives is optimized to realize the performance targets for fast convergence, efficient implementation, and massive parallelization for practical applications. The development and continuous improvement of such tools and models for tomographic reconstruction promise the establishment of a new platform for iterative reconstruction in modern CT that may someday replace standard analytical methods for routine high-quality low-dose imaging. |
|
| Rebecca Willett (Duke University) | Multiscale Set Estimation in Biomedical Inverse Problems |
| Abstract: Sparse decomposition methods are effective tools in myriad biomedical inverse problems. However, in many settings reconstruction is only an intermediate goal preceding additional quantitative analysis. For instance, we may wish to classify tissue types in microscope images or identify tumors or lesions based on computed tomography data. This talk describes how sparse image decomposition methods can be used in conjunction with multiscale set estimation methods to improve subsequent quantitative analyses on large medical datasets. For instance, sparse decomposition for tissue differentiation breaks down in images with boundaries, but multiscale set estimation can be used to accurately identify regions where sparse decomposition can be effectively applied. Similarly, sparse image reconstruction methods alone can spend significant computing resources on estimating features irrelevant to the quantitative goals, but by incorporating multiscale set estimation metrics into the objective function we can perform accurate quantitative analysis much more quickly. This talk will cover both the theoretical underpinnings of these methods and their application to challenging large-scale problems in microendoscopy and tomography. | |
| Fuli Yu (Baylor College of Medicine) | Tutorial - Massive scale of DNA sequencing data presents challenges in processing and analysis |
| Abstract: Recent advancements in DNA sequencing technologies have led to wide dissemination of instrumentation, resulting data and excitement. As a result of declining costs and increasing throughput, the amount of sequence data produced is growing rapidly. It is predicted that DNA sequence data will soon become one of the largest data types, requiring powerful infrastructure development and deployment in both software and hardware in order to enable routine and robust handling and analysis. This tutorial will guide participants through multiple topics regarding next generation sequencing (NGS) data production and processing. Emphasis will be placed on both didactic presentation and group discussion in the following areas: (1) What is happening; (2) The excitement; (3) Best practice: lessons from the 1000 Genomes Project; (4) Remaining bottlenecks in data handling; and (5) A view toward the future. The HGSC has been pioneering the deployment of multiple NGS platforms (Roche 454, Illumina, SOLiD, PacBio, Ion Torrent), and has spearheaded personal genomics (Watson Genome, Lupski Genome, and Beery Family), population genomics (1000 Genomes), cohort disease mapping (ARIC Studies), and Cancer Studies (TCGA, familial cancer). A great deal of experience in processing and handling NGS data and variant calling has been accumulated, which forms a solid foundation to meet future challenges. My group has been a major part of the 1000 Genomes Project for variant calling, imputation and integration for both low-coverage (~4X/genome) and exome data. We developed integrative variant analysis pipelines, Atlas2 and SNPTools (http://www.hgsc.bcm.tmc.edu/cascade-tech-software-ti.hgsc), which achieved high quality SNP and INDEL datasets in the 1000 Genomes Phase I project. I will share this experience as one example. |
|
| Teng Zhang (University of Minnesota) | A novel M-estimator for robust PCA |
| Abstract: We formulate a convex minimization to robustly recover a subspace from a contaminated data set, partially sampled around it, and propose a fast iterative algorithm to achieve the corresponding minimum. We establish exact recovery by this minimizer, quantify the effect of noise and regularization, explain how to take advantage of a known intrinsic dimension and establish linear convergence of the iterative algorithm. We compare our method with many other algorithms for Robust PCA on synthetic and real data sets and demonstrate state-of-the-art speed and accuracy. | |
| Osama Y Abuomar | Mississippi State University | 11/13/2011 - 11/19/2011 |
| Shipra Agrawal | BioCOS Life Sciences Private Limited | 11/13/2011 - 11/20/2011 |
| Brendan P.W. Ames | University of Minnesota | 8/31/2011 - 8/30/2012 |
| Elsa Angelini | Telecom ParisTech | 11/14/2011 - 11/18/2011 |
| James Ashe | University of Minnesota | 11/14/2011 - 11/18/2011 |
| Bubacarr Bah | University of Edinburgh | 9/15/2011 - 12/15/2011 |
| Arindam Banerjee | University of Minnesota | 9/1/2011 - 6/30/2012 |
| Andrew John Beveridge | Macalester College | 9/1/2011 - 5/15/2012 |
| Peter Beyerlein | Technische Hochschule Wildau (FH) | 11/13/2011 - 11/19/2011 |
| Sergey G Bobkov | University of Minnesota | 9/1/2011 - 6/30/2012 |
| Nidhal Bouaynaya | University of Arkansas | 11/13/2011 - 11/18/2011 |
| Charles A. Bouman | Purdue University | 11/13/2011 - 11/18/2011 |
| Luca Capogna | University of Minnesota | 8/15/2011 - 6/10/2012 |
| Aycil Cesmelioglu | University of Minnesota | 9/30/2010 - 8/30/2012 |
| Yi-Ping Phoebe Chen | La Trobe University | 11/14/2011 - 11/19/2011 |
| Paolo Codenotti | University of Minnesota | 9/1/2011 - 8/30/2012 |
| Jintao Cui | University of Minnesota | 8/31/2010 - 8/30/2012 |
| Lori Dalton | Texas A & M University | 11/13/2011 - 11/18/2011 |
| Isabel K. Darcy | University of Iowa | 9/1/2011 - 6/30/2012 |
| Nevenka Dimitrova | Philips Research Laboratory | 11/14/2011 - 11/18/2011 |
| Peter C Doerschuk | Cornell University | 11/13/2011 - 11/18/2011 |
| Edward R. Dougherty | Texas A & M University | 11/13/2011 - 11/16/2011 |
| Alexandre Dufour | Institut Pasteur | 11/13/2011 - 11/18/2011 |
| Dainius Dzindzalieta | Vilnius State University | 9/1/2011 - 12/31/2011 |
| Leonardo Espin | NONE | 9/1/2011 - 6/30/2012 |
| Arie Feuer | Technion-Israel Institute of Technology | 11/13/2011 - 11/18/2011 |
| Qiang Fu | University of Minnesota | 11/14/2011 - 11/18/2011 |
| Zoltan Furedi | Hungarian Academy of Sciences (MTA) | 9/22/2011 - 11/21/2011 |
| Carlos Andres Garavito-Garzon | University of Minnesota | 9/8/2011 - 6/30/2012 |
| Wuming Gong | University of Minnesota | 11/14/2011 - 11/18/2011 |
| Marshall Hampton | University of Minnesota | 11/13/2011 - 11/15/2011 |
| Alfred O. Hero III | University of Michigan | 11/13/2011 - 11/16/2011 |
| Yulia Hristova | University of Minnesota | 9/1/2010 - 8/31/2012 |
| Xiaoping Philip Hu | Emory University | 11/14/2011 - 11/16/2011 |
| Zhonghua Jiang | University of Minnesota | 11/14/2011 - 11/18/2011 |
| Chiu-Yen Kao | Ohio State University | 11/14/2011 - 11/17/2011 |
| W. Clem Karl | Boston University | 11/13/2011 - 11/18/2011 |
| George Karypis | University of Minnesota | 11/14/2011 - 11/18/2011 |
| Rui Kuang | University of Minnesota | 11/14/2011 - 11/18/2011 |
| Gilad Lerman | University of Minnesota | 9/1/2011 - 6/30/2012 |
| Wenbo Li | University of Delaware | 9/1/2011 - 5/30/2012 |
| Michael Liebling | University of California, Santa Barbara | 11/13/2011 - 11/17/2011 |
| Xin Liu | University of Minnesota | 8/31/2011 - 8/30/2012 |
| Shiqian Ma | University of Minnesota | 8/31/2011 - 8/30/2013 |
| Rakesh Malladi | Rice University | 11/13/2011 - 11/19/2011 |
| Yi Mao | University of Washington | 11/13/2011 - 11/18/2011 |
| Yu (David) Mao | University of Minnesota | 8/31/2010 - 8/30/2012 |
| Gabriela Martínez | University of Minnesota | 8/31/2011 - 8/30/2013 |
| Saurabh Mishra | Eagan High School | 8/22/2011 - 12/31/2011 |
| Dimitrios Mitsotakis | University of Minnesota | 10/27/2010 - 8/31/2012 |
| Prateek Mittal | University of Illinois at Urbana-Champaign | 11/13/2011 - 11/19/2011 |
| Chad Myers | University of Minnesota | 11/14/2011 - 11/18/2011 |
| Linda A. Ness | Telcordia | 11/10/2011 - 11/14/2011 |
| Robert Nowak | University of Wisconsin-Madison | 11/16/2011 - 11/17/2011 |
| Jean-Christophe Olivo-Marin | Institut Pasteur | 11/13/2011 - 11/19/2011 |
| Luke Olson | University of Illinois at Urbana-Champaign | 9/1/2011 - 12/31/2011 |
| Broderick O. Oluyede | Georgia Southern University | 11/13/2011 - 11/18/2011 |
| Mary Therese Padberg | University of Iowa | 8/16/2011 - 6/1/2012 |
| Candice Renee Price | University of Iowa | 8/1/2011 - 7/31/2012 |
| Weifeng (Frederick) Qiu | University of Minnesota | 8/31/2010 - 8/30/2012 |
| Jens Rittscher | General Electric | 11/13/2011 - 11/18/2011 |
| Badri Roysam | University of Houston | 11/13/2011 - 11/17/2011 |
| Guillermo R. Sapiro | University of Minnesota | 9/1/2011 - 5/31/2012 |
| Dan Schonfeld | University of Illinois | 11/13/2011 - 11/18/2011 |
| Arthur Szlam | University of Minnesota | 8/31/2011 - 8/30/2012 |
| Allen Tannenbaum | Boston University | 11/13/2011 - 11/18/2011 |
| Jared Tanner | University of Edinburgh | 9/20/2011 - 12/15/2011 |
| Ahmed H. Tewfik | University of Texas at Austin | 11/13/2011 - 11/19/2011 |
| Jean-Baptiste Thibault | GE Healthcare | 11/13/2011 - 11/18/2011 |
| Kursad Tosun | Southern Illinois University | 11/13/2011 - 11/18/2011 |
| Divyanshu Vats | University of Minnesota | 8/31/2011 - 8/30/2012 |
| Lan Wang | University of Minnesota | 9/1/2011 - 5/12/2012 |
| Rachel Ward | University of Texas at Austin | 11/13/2011 - 11/26/2011 |
| Rachel Ward | University of Texas at Austin | 10/23/2011 - 11/5/2011 |
| Ke Wei | University of Edinburgh | 10/10/2011 - 12/10/2011 |
| Elisabeth Werner | Case Western Reserve University | 9/1/2011 - 12/20/2011 |
| Avi Wigderson | Institute for Advanced Study | 11/2/2011 - 11/4/2011 |
| Rebecca Willett | Duke University | 11/15/2011 - 11/17/2011 |
| Steve Wright | University of Wisconsin-Madison | 11/27/2011 - 12/3/2011 |
| Lingzhou Xue | University of Minnesota | 9/1/2011 - 6/30/2012 |
| Byung-Jun Yoon | Texas A & M University | 11/13/2011 - 11/18/2011 |
| Fuli Yu | Baylor College of Medicine | 11/15/2011 - 11/18/2011 |
| Ofer Zeitouni | University of Minnesota | 9/1/2011 - 12/9/2011 |
| Teng Zhang | University of Minnesota | 8/31/2011 - 8/30/2012 |