Abstracts and Talk Materials
Large Data Sets in Medical Informatics
November 14-18, 2011


Shipra Agrawal (BioCOS Life Sciences Private Limited)
http://www.biocosls.com/

Estimation of Individual’s Risk for Complex Trait Diseases: Methods and Challenges using Allelic Specific Expression and Mapping Cis-variance from NGS Data (RNA and Exome Sequencing Data)
November 17, 2011

Keywords of the presentation: Allele specific expression, RNA-Seq, Exome-Seq, Genotypes, Susceptibility alleles

In current genetic and clinical research, the identification of disease-specific variations, particularly in non-coding RNA and cis-regulatory elements, is a major bottleneck. Massively parallel sequencing of the exome and transcriptome is now widely used to interrogate the key protein-coding and non-coding RNA regions. Deep sequencing data from the exome and transcriptome can be used to estimate levels of allele-specific expression in diseased versus control samples (case-control cohorts) and hence to identify disease-specific signatures. This provides a functional basis for detecting differentially expressed alleles, mono-allelic expression, allelic imprinting and allele-regulated alternative splicing. Together, these data and approaches form a stronger strategy for predicting disease susceptibility alleles and their functional role in disease mechanisms. Our work at BioCOS Life Sciences on Next Generation Sequencing (NGS) data analysis for the precise detection of differential allelic expression supports the identification of causal/susceptibility genes by mapping their variants in both coding and non-coding DNA/RNA regions.

I will present our current research on methods and data processing approaches for identifying susceptibility alleles, both by combining RNA-Seq and Exome-Seq data and by predicting them directly from RNA-Seq data. The talk will also discuss existing bottlenecks in the area and approaches for obtaining high-quality results, with a focus on calling genotypes from RNA-Seq data.
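
A minimal sketch of how allele-specific expression is commonly screened from sequencing data, assuming per-site reference/alternate read counts at heterozygous positions; this is a generic illustration, not the BioCOS pipeline described in the talk.

```python
# Minimal sketch: binomial test for allele-specific expression (ASE) at a
# heterozygous site, given reference/alternate read counts from RNA-Seq.
# Generic illustration only, not the pipeline described in the talk.
from scipy.stats import binomtest

def ase_pvalue(ref_count: int, alt_count: int, expected_ref_fraction: float = 0.5) -> float:
    """P-value against the null of balanced expression of the two alleles."""
    n = ref_count + alt_count
    if n == 0:
        return 1.0
    return binomtest(ref_count, n=n, p=expected_ref_fraction).pvalue

# Example: 90 reads support the reference allele, 30 the alternate allele.
print(ase_pvalue(90, 30))  # a small p-value suggests allele-specific expression
```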


Elsa Angelini (Telecom ParisTech)
http://perso.telecom-paristech.fr/~angelini/

Designing fast and robust algorithms for medical image processing
November 16, 2011

Quantification from medical images involves three levels of development:
- modeling of the organs,
- extraction of the visual features,
- formulation of the quantification task.

Regarding organ modeling, geometric encoding of the shape is designed as a tradeoff between flexibility and robustness. Encoding the variability within a population is a complex task and can break down when handling pathological cases. On the other hand, generic anatomical knowledge, especially regarding the context of the organ (for example, spatial relations), can provide richer and more robust information. Regarding the visual features, images are richer than they appear in terms of tissue signature, embedding multiscale information. The field of image processing has evolved slowly in the design of sophisticated organ-specific visual features, and the majority remain very basic. Open challenges remain in correlating multi-modal tissue signatures with physiological characteristics. The quantification task itself, such as segmentation, tracking or detection of longitudinal changes, can be formulated with either a deterministic or a stochastic formalism. Algorithms remain poorly robust to image quality, lack of image calibration, parameter tuning and the presence of pathologies. Finer interaction between algorithmic tuning and image content, and better calibration of image content, are currently under investigation to address this lack of robustness and reproducibility.

These three components of the pipeline will be discussed, with illustrations on brain, cardiac, liver and obstetric data. Emphasis will be placed on the constraints of being fast and robust in the context of handling large data sets with great variability and pathologies.
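
As a minimal, hedged illustration of the "formulation" component above (a deterministic formalism only), the sketch below fits a two-region piecewise-constant model by alternating pixel assignment and mean updates, i.e. the data term of a Chan-Vese-type energy without spatial regularization; real organ-specific pipelines are far richer.

```python
# Minimal sketch of a deterministic segmentation formulation: a two-region
# piecewise-constant model fit by alternating assignment and mean updates.
# Illustrative only; no spatial regularization or organ model is included.
import numpy as np

def two_region_segmentation(image: np.ndarray, n_iter: int = 20) -> np.ndarray:
    c_in, c_out = image.max(), image.min()      # initial region intensities
    mask = np.zeros(image.shape, dtype=bool)
    for _ in range(n_iter):
        mask = (image - c_in) ** 2 < (image - c_out) ** 2   # assign pixels
        c_in = image[mask].mean() if mask.any() else c_in   # update region means
        c_out = image[~mask].mean() if (~mask).any() else c_out
    return mask

# Synthetic example: a bright disk on a noisy background.
y, x = np.mgrid[:128, :128]
img = ((x - 64) ** 2 + (y - 64) ** 2 < 30 ** 2).astype(float) + 0.2 * np.random.randn(128, 128)
print(two_region_segmentation(img).sum(), "pixels labeled as the object")
```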

Juan Andres Bazerque (University of Minnesota, Twin Cities)
http://www.tc.umn.edu/~bazer002/

Poster - Gene Network Inference via Sparse Structural Equation Modeling with Genetic Perturbations

Deciphering the structure of gene regulatory networks is crucial to understanding the functionality of genes as well as the behavior of cells. To this end, a network topology estimator is developed in this work based on the structural equation model (SEM) approach, which capitalizes on naturally occurring genetic variations viewed as statistical perturbations that enable inference of the causal relationships between genes. The SEM offers a suitable framework for the estimation of cyclic directed networks, but typically requires searching over a huge parameter space, incurring prohibitively high computational complexity. As gene networks are sparse, meaning that the number of edges is relatively small when compared to the number of all possible edges, the present work contributes a SEM-based sparsity-aware inference methodology. Simulated tests demonstrate that the novel method can markedly improve inference accuracy.

Joint work with G. B. Giannakis.
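
A hedged sketch of the sparsity-aware idea, greatly simplified: regress each gene's expression on all other genes plus its own genetic perturbation with an l1 penalty and read directed edges off the nonzero coefficients. This is an illustration in the spirit of the poster, not the authors' estimator; the variable names and penalty choice are assumptions.

```python
# Simplified sparsity-aware SEM-style network inference: per-gene l1-penalized
# regression on the other genes plus that gene's own genetic perturbation.
import numpy as np
from sklearn.linear_model import Lasso

def sparse_sem_edges(X, U, lam=0.1):
    """X: (samples, genes) expression; U: (samples, genes) genetic perturbations."""
    n, g = X.shape
    B = np.zeros((g, g))                          # B[i, j]: edge from gene j to gene i
    for i in range(g):
        others = [j for j in range(g) if j != i]
        design = np.hstack([X[:, others], U[:, [i]]])   # other genes + own perturbation
        coef = Lasso(alpha=lam, max_iter=10000).fit(design, X[:, i]).coef_
        B[i, others] = coef[:-1]                  # drop the perturbation coefficient
    return B

# Toy example with random data (structure recovery needs a real perturbation design).
rng = np.random.default_rng(0)
X, U = rng.normal(size=(50, 10)), rng.normal(size=(50, 10))
print((sparse_sem_edges(X, U) != 0).sum(), "candidate edges")
```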

Nidhal Bouaynaya (University of Arkansas)
http://syen.ualr.edu/nxbouaynaya/
Dan Schonfeld (University of Illinois, Chicago)
http://www.ece.uic.edu/~ds

Poster - Inverse Perturbation for Optimal Intervention in Genetic Regulatory Networks

We formulate the optimal intervention problem in genetic regulatory networks as a minimal perturbation of the network that forces it to converge to a desired steady-state distribution of gene regulation. We cast optimal intervention in gene regulation as a convex optimization problem, thus providing a globally optimal solution which can be efficiently computed using standard techniques for convex optimization. The criterion adopted for optimality is chosen to minimize potential adverse effects of the intervention strategy. We consider a perturbation that minimizes (i) the overall energy of change between the original and controlled networks and (ii) the time needed to reach the desired steady state of gene regulation. Moreover, we show that there is an inherent tradeoff between minimizing the energy of the perturbation and the convergence rate to the desired distribution. We further show that the optimal inverse perturbation control is robust to estimation errors in the original network. The proposed control is applied to the human melanoma gene regulatory network.

Nidhal Bouaynaya (University of Arkansas)
http://syen.ualr.edu/nxbouaynaya/
Dan Schonfeld (University of Illinois, Chicago)
http://www.ece.uic.edu/~ds

Intervention and Control of Large-Scale Gene Regulatory Networks
November 17, 2011

Keywords of the presentation: genetic regulatory networks, control, perturbation.

We formulate the optimal intervention problem in genetic regulatory networks as a minimal perturbation of the network that forces it to converge to a desired steady-state distribution of gene regulation. We cast optimal intervention in gene regulation as a convex optimization problem, thus providing a globally optimal solution which can be efficiently computed using standard techniques for convex optimization. The criterion adopted for optimality is chosen to minimize potential adverse effects of the intervention strategy. We consider a perturbation that minimizes (i) the overall energy of change between the original and controlled networks and (ii) the time needed to reach the desired steady state of gene regulation. Moreover, we show that there is an inherent tradeoff between minimizing the energy of the perturbation and the convergence rate to the desired distribution. We further show that the optimal inverse perturbation control is robust to estimation errors in the original network. The proposed control is applied to the human melanoma gene regulatory network.
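
As a hedged illustration of the kind of convex program described above (a simplified stand-in, not the authors' exact formulation), the sketch below finds the smallest Frobenius-norm perturbation of a row-stochastic transition matrix P that makes a desired distribution pi_d stationary.

```python
# Simplified inverse-perturbation sketch: minimal Frobenius-norm change Delta to a
# row-stochastic matrix P so that pi_d becomes a stationary distribution of P + Delta.
import numpy as np
import cvxpy as cp

def minimal_perturbation(P: np.ndarray, pi_d: np.ndarray) -> np.ndarray:
    n = P.shape[0]
    Delta = cp.Variable((n, n))
    Q = P + Delta
    constraints = [
        Q >= 0,                   # perturbed matrix stays a valid transition matrix
        cp.sum(Q, axis=1) == 1,   # rows sum to one
        pi_d @ Q == pi_d,         # desired distribution is stationary
    ]
    cp.Problem(cp.Minimize(cp.norm(Delta, "fro")), constraints).solve()
    return Delta.value

P = np.array([[0.9, 0.1], [0.5, 0.5]])
pi_d = np.array([0.3, 0.7])
print(minimal_perturbation(P, pi_d))
```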

Yi-Ping Phoebe Chen (La Trobe University)
http://homepage.cs.latrobe.edu.au/ypchen/index.htm

Pattern Discovery in Biomedical Modelling
November 14, 2011

Keywords of the presentation: Bioinformatics, Biomedical Modelling, Genomic Analysis

Solving modern biomedical problems, especially those involving genome data, requires advanced computational and analytical methods. The huge quantities of data and the escalating demands of modern biomedical research increasingly require sophisticated and powerful computational techniques for pattern discovery. Key techniques include relational data management, pattern recognition, data mining, modelling and visualization of biomedical data. In this talk, I will demonstrate recent methodologies and data structures for gathering high-quality approximations and models of genomic information, and will use these innovations as the basis for developing methods to cluster and visualize biomedical data in pattern discovery.

Lori Dalton (Texas A & M University)

Poster - Exact Sample Conditioned Performance of Estimators for Classification Error Under Bayesian Contexts

In recent years, biomedicine has been faced with difficult high-throughput small-sample classification problems, which are typically validated with re-sampling error estimation methods such as cross-validation. While heuristically designed error estimation techniques may be acceptable in problems where large amounts of data are available, the small-sample setting is different because asymptotic results are not meaningful and validation becomes a critical issue. A recently proposed classifier error estimator places the problem in a signal estimation framework in the presence of uncertainty, thereby permitting a rigorous optimal solution in a minimum-mean-square error (MMSE) sense. The uncertainty in this model is relative to the parameters of the feature-label distributions, resulting in a Bayesian approach to error estimation. The same Bayesian framework also produces the theoretical MSE for both Bayesian error estimators and arbitrary error estimators, where uncertainty is again relative to the unknown model parameters and conditioned on the observed sample. Thus, the Bayesian error estimator has a unique advantage over classical error estimators in that its mathematical framework naturally gives rise to a practical expected measure of performance given a fixed sample.
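
A minimal sketch of the Bayesian MMSE idea on a toy discrete model: one binary feature, the classifier that predicts the class matching the feature, Beta priors on the two misclassification probabilities, and a known class prior. The error estimate is the posterior mean of the true error given the observed sample. This illustrates the framework only; the poster's results are far more general, and the hyperparameters below are assumptions.

```python
# Toy Bayesian (MMSE-style) error estimate for a discrete model with one binary
# feature and the classifier "predict the class whose label matches the feature".
# Beta(a, b) priors on each class-conditional misclassification probability;
# c = P(Y = 0) is assumed known. Illustration only.

def bayesian_error_estimate(n0_x1, n0, n1_x0, n1, c=0.5, a=1.0, b=1.0):
    """n0_x1: class-0 samples with X=1 (misclassified); n0: class-0 sample size.
    n1_x0: class-1 samples with X=0 (misclassified); n1: class-1 sample size."""
    post_p0 = (a + n0_x1) / (a + b + n0)   # posterior mean of P(X=1 | Y=0)
    post_p1 = (a + n1_x0) / (a + b + n1)   # posterior mean of P(X=0 | Y=1)
    return c * post_p0 + (1 - c) * post_p1  # posterior mean of the true error

# Example: 3 of 20 class-0 samples and 2 of 15 class-1 samples are misclassified.
print(bayesian_error_estimate(3, 20, 2, 15))
```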

Nevenka Dimitrova (Philips Research Laboratory)
http://www.research.philips.com/profile/people/fellows/dimitrova.html

Towards Closing the Semantic Gap in Decision Support for Clinical Sequencing in Oncology
November 14, 2011

Within only a decade of the first draft of the human genome, we have witnessed an astonishing pace of development of technologies for high-throughput molecular profiling that probe various aspects of genome biology and its relationship to tumorigenesis and cancer treatment. Giant steps have been taken towards cataloging massive amounts of data and providing fairly good annotation information. However, computational methods that try to tease out the relationship between the genotype and its functional readout, in normal and cancer states, have revealed a semantic gap that is yet to be bridged. Narrowing this gap is essential in order to develop meaningful clinical decision support technologies.

In addition to imaging modalities, which give gross tissue-level properties, crucial decisions in oncology therapy selection require molecular-level information that is increasingly captured by the emerging sequencing modalities. We took part in multiple studies aiming to understand tumor heterogeneity and response to chemotherapy. Our efforts span several complementary modalities. In this talk I will provide several examples from our recent high-throughput genomic studies:

1. DNA Sequencing: Assembly and downstream analysis of genomic data from normal individuals to understand and establish variation within normal individuals at the single nucleotide and structural level as well as the functional impact of these variations.

2. RNA Sequencing, CNV and DNA methylation: analysis in the context of chemo- and biological therapy response in breast and ovarian cancer.

3. Integration into a computational framework that combines genome-wide DNA methylation, gene expression and copy number variation data in a comprehensive fashion with the aim of finding mechanistic associations as well as signatures indicative of therapy resistance.

Our goal is to include these modalities in a Comprehensive Clinical Decision Support system where we need to integrate sequencing with imaging, pathology and other clinical data.

Peter C Doerschuk (Cornell University)
http://www.bme.cornell.edu/people/faculty/profile.cfm?id=3604

3-D reconstructions of biological macromolecular complexes by electron microscopy
November 18, 2011

Keywords of the presentation: tomography, 3-D signal reconstruction, inverse problems, electron microscopy, statistical image processing

Single-particle cryo electron microscopy provides images of biological macromolecular complexes with spatial sampling on the order of 1-2 Angstrom. Combining on the order of 100,000 such images can result in 3-D reconstructions of the electron scattering intensity of the complex with a spatial resolution as fine as 4-5 Angstrom. Due to damage in the imaging process, each complex is imaged only once, so having a homogeneous ensemble of complexes is important. Algorithms and results will be presented for the case where the complexes are not homogeneous and the reconstruction yields a statistical description of the electron scattering intensity rather than a single unique intensity. Related work on computed electron tomography, in which the electron scattering intensities of individual complexes are determined, but at lower resolution, will also be presented.
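
As background only, the sketch below runs the classical 2-D analogue of the reconstruction problem (simulate projections of a phantom, then apply filtered back-projection); single-particle cryo-EM as described in the talk is a much harder 3-D problem with unknown particle orientations and heterogeneity.

```python
# Classical 2-D tomography illustration: simulate projections of a phantom and
# reconstruct by filtered back-projection. Background sketch only; not the
# cryo-EM reconstruction method discussed in the talk.
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

phantom = shepp_logan_phantom()
angles = np.linspace(0.0, 180.0, 180, endpoint=False)
sinogram = radon(phantom, theta=angles)          # forward projections
reconstruction = iradon(sinogram, theta=angles)  # filtered back-projection
print("reconstruction RMSE:", np.sqrt(np.mean((reconstruction - phantom) ** 2)))
```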

Edward R. Dougherty (Texas A & M University)
http://www.ece.tamu.edu/People/bios/dougherty.htm

How is Biology as a Science Possible?
November 14, 2011

Keywords of the presentation: biological knowledge, epistemology, high-throughput data, scientific models

A perusal of the contemporary biological literature involving high-throughput data sets reveals the generation of a vast amount of data and an enormous number of models (classifiers, clusters, networks) derived from this data via a plethora of algorithms. There tend to be four interrelated characteristics common to these publications: (1) no experimental design, (2) data sets where the number of measured variables greatly exceeds the number of replications, (3) algorithms whose performance is unknown for the populations to which they are applied – and often known to work poorly when applied to a small number of replicates, and (4) models that are epistemologically meaningless because they have not been validated. Hence, we find ourselves in a position somewhat akin to that confronted by Immanuel Kant in the Eighteenth Century when he famously asked, “How is metaphysics as a science possible?” Certainly there was a lot of “metaphysical” talk in the air, but to what sureties had it led? To address the problem, Kant had to tackle the meaning of science and then appreciate what constraints had to be placed on metaphysical statements to make them “scientific.” Fortunately for us, we do not have to take on the monumental task of characterizing scientific knowledge, an endeavor that stretched from Galileo to Einstein. But we do have to consider what constraints must be placed on biological statements to make them meaningful, that is, so that they constitute biological scientific knowledge. Moreover, we need to address a critical methodological scientific issue addressed by Kant: What differentiates productive observation of Nature from “groping in the dark,” to use his phrase?

Alexandre Dufour (Institut Pasteur)
http://www.bioimageanalysis.org/~dufour

Poster - 3D Active Meshes: a versatile mathematical tool to study cell shape and motility in live microscopy

Dynamic processes such as cell motility and deformation are key components of numerous scenarios, including cell division & differentiation, morphogenesis and immune response strategies, but also parasite invasion, cancer development & proliferation and host-pathogen interactions. Continuous advances in microscopy imaging techniques have allowed scientists to shed light on many of these processes over extended periods of time, yielding huge amounts of time-lapse imaging data in multiple colors and various experimental conditions. In this context, visual interpretation and manual analysis have proved to be limited by lack of reproducibility, user bias and fatigue. Scientists are thus progressively turning to automatic quantification methods able to process spatiotemporal data in a robust and systematic manner. In this work we present a novel framework for automatic cell segmentation and tracking based on the theory of deformable models, and show how such a versatile mathematical tool can be used to extract various information related to cellular motility, shape analysis and morpho-dynamic studies.
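
A hedged sketch of one downstream step: once a cell has been segmented in 3-D (here faked with a synthetic binary ball rather than an active mesh), simple shape readouts such as volume, surface area and sphericity can be computed from a surface mesh of the mask. Illustrative only; this is not the poster's framework.

```python
# Shape readouts from a 3-D binary segmentation via a surface mesh.
# The "cell" below is a synthetic ball standing in for a real segmentation.
import numpy as np
from skimage import measure

z, y, x = np.mgrid[:64, :64, :64]
cell = ((x - 32) ** 2 + (y - 32) ** 2 + (z - 32) ** 2) < 20 ** 2

verts, faces, _, _ = measure.marching_cubes(cell.astype(float), level=0.5)
surface_area = measure.mesh_surface_area(verts, faces)
volume = cell.sum()                                   # voxel count (unit voxels)
sphericity = (np.pi ** (1 / 3)) * (6 * volume) ** (2 / 3) / surface_area
print(f"volume={volume}, surface={surface_area:.1f}, sphericity={sphericity:.2f}")
```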

Arie Feuer (Technion-Israel Institute of Technology)
http://webee.technion.ac.il/people/feuer/feuer_hp.html

Poster - Sparse Sampling of Helical Cone Beam CT

One method used to generate a 3D image in medical imaging with Computerized Tomography (CT) is the helical cone beam scan. This method obviously generates a large amount of data. As in all methods using X-ray scans, one is motivated to reduce the amount of radiation to which the patient is exposed. Combined with the desire to reduce the amount of data being generated, this motivates the development of efficient sampling methods. In the work presented here, the essential frequency support of a scanned body is estimated, bounded and then used to develop a sparse sampling method resulting in minimal loss of quality in the reconstructed image. Our initial results show that we can use this sparse sampling to reduce the data by at least a factor of two.

Coauthored by Tamir Ben-Dory.
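
A hedged 1-D toy version of the underlying idea: estimate the "essential" frequency band carrying, say, 99% of a signal's energy, which bounds how coarsely it can be sampled without meaningful loss. The signal and threshold below are assumptions; the poster's analysis concerns the far more involved helical cone-beam geometry.

```python
# Toy estimate of essential frequency support: the smallest band holding 99% of
# a signal's energy, which bounds the sampling rate actually needed.
import numpy as np

fs = 1000.0                                   # dense sampling rate, Hz
t = np.arange(0, 1.0, 1 / fs)
signal = np.sin(2 * np.pi * 15 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

spectrum = np.abs(np.fft.rfft(signal)) ** 2
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
cumulative = np.cumsum(spectrum) / spectrum.sum()
essential_bandwidth = freqs[np.searchsorted(cumulative, 0.99)]
print(f"essential bandwidth ~ {essential_bandwidth:.1f} Hz "
      f"(Nyquist rate {2 * essential_bandwidth:.0f} Hz vs. {fs:.0f} Hz used)")
```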

Nathanael Fillmore (University of Wisconsin, Madison)

Poster - Progression and gene expression in cervical cancer

We present our work toward a statistical model of changes in gene expression through four stages in the development of cervical cancer. These stages are characterized in part by changes in the proportion of cells of particular types. For example, normal tissue is organized in layers with more well-differentiated cells at the surface and with less differentiated, but more actively dividing cells further inside the tissue. Neoplastic lesions shift the balance of types, at least partly by having relatively more of the less differentiated types and having fewer of the well-differentiated types. In our model, we make use of this insight by postulating the existence of several distinct (and unknown) types of cells which are present in all stages of the progression, but whose relative proportions (also unknown) change during the course of the progression. We then study differential expression across the postulated cell types.
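
As a hedged illustration of the mixture idea (not the poster's statistical model), non-negative matrix factorization can recover cell-type-like signatures and their per-sample proportions from an expression matrix; the simulation below assumes three hypothetical cell types.

```python
# Mixture-of-cell-types illustration via non-negative matrix factorization:
# expression ~ (sample-by-type proportions) x (type-by-gene signatures).
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_samples, n_genes, n_cell_types = 40, 200, 3

# Simulate: samples = proportions (40x3) times cell-type signatures (3x200).
true_props = rng.dirichlet(alpha=[1, 1, 1], size=n_samples)
signatures = rng.gamma(shape=2.0, size=(n_cell_types, n_genes))
expression = true_props @ signatures + 0.01 * rng.random((n_samples, n_genes))

model = NMF(n_components=n_cell_types, init="nndsvda", max_iter=1000, random_state=0)
est_props = model.fit_transform(expression)      # estimated mixing proportions
est_signatures = model.components_               # estimated cell-type signatures
print(est_props.shape, est_signatures.shape)
```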

Alfred O. Hero III (University of Michigan)
http://www.eecs.umich.edu/~hero/

Correlation-based variable selection for differential gene expression analysis
November 15, 2011

Keywords of the presentation: high dimensional data analysis, screening for high correlations, empirical correlation graphs, differential genomics.

The problem of variable selection is useful for identifying the principal drivers of differential gene response under one or more treatments, phenotypes, or conditions. Once identified, such drivers can be targeted as potential knockouts or enhancers in drug discovery or diagnostic testing. In high-throughput data such as gene or protein expression, the large number of variables has made it impractical to implement all but the simplest univariate methods for variable selection, e.g., detecting significant shifts in t-test or Wilcoxon test statistics. We propose an alternative approach based on detecting significant shifts in patterns of connectivity of genes in a correlation graph or concentration graph. Remarkably, it is precisely when the sample size is small that the approach is scalable, e.g., to whole genome analysis. Furthermore, a statistical performance analysis establishes phase transition behaviors and tight approximations to false discovery rate that can be used for error control. We will illustrate the approach on several gene expression datasets.
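
A hedged, simplified stand-in for the approach: compute the gene-gene sample correlation matrix under each condition, threshold at a high correlation, and rank genes by how much their graph connectivity (degree) shifts between conditions. The talk's method additionally provides the phase-transition and false-discovery-rate analysis, which this sketch omits.

```python
# Simplified correlation-graph screening: per-gene change in connectivity
# between two conditions at a high correlation threshold.
import numpy as np

def degree_shift(X_a, X_b, rho=0.9):
    """X_a, X_b: (samples, genes) expression matrices for two conditions."""
    deg = []
    for X in (X_a, X_b):
        C = np.corrcoef(X, rowvar=False)           # genes x genes correlations
        np.fill_diagonal(C, 0.0)
        deg.append((np.abs(C) > rho).sum(axis=0))  # connectivity at threshold rho
    return deg[1] - deg[0]                         # per-gene change in degree

rng = np.random.default_rng(1)
X_a, X_b = rng.normal(size=(20, 500)), rng.normal(size=(20, 500))
shift = degree_shift(X_a, X_b)
print("genes with largest connectivity change:", np.argsort(-np.abs(shift))[:5])
```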

Alfred O. Hero III (University of Michigan)
http://www.eecs.umich.edu/~hero/

Poster - Misaligned Principal Component Analysis

Principal component analysis (PCA) is a widely applied method for extracting structure from samples of high dimensional biological data. Often there exist misalignments between different samples and this can cause severe problems in PCA if not properly taken into account. For example, subject-dependent temporal differences in gene expression response to a treatment will create relative time shifts in the samples that decohere the PCA analysis. The sensitivity of PCA to such misalignments is severe, leading to phase transitions that can be studied using the spectral theory of high-dimensional matrices. With this as motivation, we propose a new method of PCA, called misPCA, that explicitly accounts for the effects of misalignments in the samples. We illustrate misPCA on clustering longitudinal temporal gene expression data.

With Arnau Tibau-Puig, Ami Wiesel, and Raj Rao Nadakuditi
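
A hedged sketch of the misalignment issue (not the misPCA algorithm itself): each sample below is a circularly shifted, noisy copy of a common time course; aligning samples to a reference concentrates the data's energy in a single leading component, whereas the random shifts spread it over several.

```python
# Illustration of how sample misalignment decoheres low-rank structure, and how
# a simple align-to-reference step restores it. Not the misPCA algorithm.
import numpy as np

def align_to_reference(samples, reference):
    """Circularly shift each row of `samples` to best match `reference`."""
    aligned = np.empty_like(samples)
    for i, s in enumerate(samples):
        scores = [np.dot(np.roll(s, k), reference) for k in range(len(s))]
        aligned[i] = np.roll(s, int(np.argmax(scores)))
    return aligned

rng = np.random.default_rng(0)
T, n = 50, 30
template = np.sin(2 * np.pi * np.arange(T) / T)
samples = np.array([np.roll(template, rng.integers(0, T)) + 0.1 * rng.normal(size=T)
                    for _ in range(n)])

aligned = align_to_reference(samples, template)
# Energy captured by the leading singular component of each data matrix:
for name, data in (("misaligned", samples), ("aligned", aligned)):
    s = np.linalg.svd(data, compute_uv=False)
    print(name, "leading-component energy fraction:",
          round(float(s[0] ** 2 / (s ** 2).sum()), 2))
```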

Zhonghua Jiang (University of Minnesota, Twin Cities)
George Karypis (University of Minnesota, Twin Cities)

Poster - Automatic Detection Of Vaccine Adverse Reactions By Incorporating Historical Medical Conditions

This paper extends the problem of vaccine adverse reaction detection by incorporating historical medical conditions. We propose a novel measure called dual-lift for this task, and formulate this problem in the framework of constraint pattern mining. We present a pattern mining algorithm DLiftMiner which utilizes a novel approach to upper bound the dual-lift measure for reducing the search space. Experimental results on both synthetic and real world datasets show that our method is effective and promising.
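
The dual-lift measure is not defined in this abstract, so as background only the sketch below computes the classical lift of a (vaccine, adverse reaction) pair from report counts, the kind of disproportionality statistic that pattern mining methods in this area build on; the counts are made up.

```python
# Classical lift of a (vaccine, adverse reaction) pair from report counts.
# Background illustration only; not the poster's dual-lift measure.
def lift(n_both: int, n_vaccine: int, n_reaction: int, n_total: int) -> float:
    """lift = P(vaccine and reaction) / (P(vaccine) * P(reaction))."""
    p_both = n_both / n_total
    p_vaccine = n_vaccine / n_total
    p_reaction = n_reaction / n_total
    return p_both / (p_vaccine * p_reaction)

# Example: 40 reports mention both, out of 500 vaccine and 300 reaction reports
# in a database of 10,000 reports; lift > 1 indicates over-representation.
print(lift(40, 500, 300, 10_000))  # ~2.67
```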

Chiu-Yen Kao (Claremont McKenna College)
http://www.cmc.edu/pages/faculty/CKao/

Poster - Semiautomatic Extraction Algorithm for Images of the Ciliary Muscle

In this work, we develop and evaluate a semiautomatic algorithm for segmentation and morphological assessment of the dimensions of the ciliary muscle in Visante Anterior Segment Optical Coherence Tomography images. Furthermore, we investigate the morphology of the ciliary muscle during the act of accommodation in a population of children. Increasing accommodative response was correlated with increases in the thickness of CMTMAX (p < 0.001) and CMT1 (p < 0.001), and decreases in the thickness of CMT3 (p < 0.001).

W. Clem Karl