Poster Presentation and Reception
Tuesday, May 8, 2012 - 3:30pm - 4:30pm
- Poster - BIGDATA in Plant Biology, Agriculture and Ecology: NLP From Biomolecular Networks to Sustainability Science
Amir Assadi (University of Wisconsin, Madison)
The literature in biology is vast and rich with valuable empirical, heuristic and theoretical information. Systematic organization and knowledge mining of research articles play a major role in helping with large-scale project formulation, making groundbreaking discoveries and forming novel hypotheses.
The following is a preliminary report on the development of “Big Data” analysis tools tailored for systems biology, such as dynamic models of -omic networks and systems-level perturbation of pathways. For example, we demonstrate the utility of Natural Language Processing (NLP) in discovering hidden and implicit correlations among pairs of genes in massive gene expression data for diurnal and circadian rhythms in wild-type Arabidopsis thaliana (provided by the Chory Lab). We are also developing a theory of Collective Intelligence in Plant Biology (PhytoCognito), a synthesis of cognitive science, computation and informatics intended to sustain collaborative, user-centered efforts that build on past successes toward future scientific breakthroughs. One of the most important goals of genomic research is to extract functional information from gene expression time series data. Thanks to DNA microarray technology, mRNA levels for thousands of genes can now be sampled with a single chip. This has made it possible to measure whole-genome gene expression repeatedly, and so to explore an organism's response to a change in condition, e.g., application of a drug or other treatment.
- Poster - Causally Motivated Attribution for Online Advertising
Brian Dalessandro (media6degrees)
In many online advertising campaigns, multiple vendors, publishers or search engines (herein called channels) are contracted to serve advertisements to internet users on behalf of a client seeking specific types of conversion. In such campaigns, individual users are often served advertisements by more than one channel. The process of assigning conversion credit to the various channels is called attribution, and is a subject of intense interest in the industry. This work presents a causally motivated methodology for conversion attribution in online advertising campaigns. We first propose a need for the standardization of attribution measurement and offer four principles upon which standardization may be based. Stemming from these standards, we offer an attribution solution that generalizes prior attribution work in cooperative game theory and recasts that work through the lens of a causal framework. We argue that in cases where the causal assumptions are violated, our solution can be interpreted as a variable (or channel) importance measure. Finally, we present a practical solution for managing the potential complexity of the generalized attribution methodology, and show examples of attribution measurement on several online advertising campaign data sets.
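The cooperative game theory work this abstract generalizes is usually associated with Shapley value credit assignment: each channel's credit is its marginal contribution to conversions, averaged over all orders in which channels could have acted. As a hedged illustration (not the authors' actual methodology), here is an exact Shapley computation over a hypothetical two-channel value function:

```python
import math
from itertools import permutations

def shapley_attribution(channels, value):
    """Exact Shapley credit: average each channel's marginal
    contribution over all orderings in which channels could act."""
    phi = {c: 0.0 for c in channels}
    for order in permutations(channels):
        served = set()
        for c in order:
            before = value(frozenset(served))
            served.add(c)
            phi[c] += value(frozenset(served)) - before
    n_orders = math.factorial(len(channels))
    return {c: total / n_orders for c, total in phi.items()}

# Hypothetical conversion values for each subset of active channels.
observed_value = {
    frozenset(): 0.0,
    frozenset({"display"}): 10.0,
    frozenset({"search"}): 30.0,
    frozenset({"display", "search"}): 50.0,
}
credit = shapley_attribution(["display", "search"], observed_value.__getitem__)
print(credit)  # credit sums to the grand-coalition value, 50.0
```

In practice the subset values are not directly observed, which is exactly where the causal framework of the abstract comes in: the value function must be estimated under explicit assumptions.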
- Poster - Sequential event prediction
Benjamin Letham (Massachusetts Institute of Technology)
In sequential event prediction, we are given a sequence database of past sequences to learn from, and we aim to predict the next event within a current event sequence. We focus on applications where the set of past events has predictive power and not the specific order of those past events. Such applications arise in recommender systems, equipment maintenance, medical informatics, and in other domains. Our formalization of sequential event prediction draws on ideas from supervised ranking. We show how specific choices within this approach lead to different sequential event prediction problems and algorithms. We apply our approach to an online grocery store recommender system as well as a novel application in the health event prediction domain.
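The key modeling choice above is that only the *set* of past events matters, not their order. A minimal frequency-based sketch of that idea (much simpler than the supervised-ranking formulation the abstract describes) scores each candidate next event by how often it has followed the currently observed items:

```python
from collections import defaultdict

def fit_pair_scores(sequences):
    """Count how often item b appears after item a is already present.
    Only the set of past items is used, never their order."""
    counts = defaultdict(float)
    for seq in sequences:
        for t, b in enumerate(seq):
            for a in set(seq[:t]):
                counts[(a, b)] += 1.0
    return counts

def predict_next(counts, past_items, candidates):
    """Rank candidates by summed pair scores with the observed item set."""
    def score(b):
        return sum(counts[(a, b)] for a in set(past_items))
    return max(candidates, key=score)

# Hypothetical grocery sequence database.
db = [["milk", "bread", "butter"],
      ["milk", "butter", "jam"],
      ["bread", "butter"]]
print(predict_next(fit_pair_scores(db), ["milk"], ["bread", "butter", "jam"]))
```

Because the score sums over the unordered basket, shuffling the past items leaves the prediction unchanged, which is the order-invariance property the abstract emphasizes.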
- Poster - Personalized Biomedical Information Retrieval: A Microbiome Case Study
Paul Thompson (Dartmouth Medical School)
Relevance judgments provided by a neonatal human microbiome researcher were used to predict the relevance of additional publications to the researcher's information need. Six PubMed queries were run to retrieve documents which the researcher judged for relevance. These relevance judgments were used to produce training and test sets for the evaluation of two machine learning algorithms: C4.5 and support vector machines. These algorithms were evaluated in two ways: 1) tenfold cross-validation and 2) training on publications from 2008-2010 and testing on documents from 2011. It was found that the researcher's relevance judgments could be used to accurately predict relevance.
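The evaluation protocol above (train a classifier on judged documents, score by tenfold cross-validation) can be sketched as follows. This uses synthetic feature vectors and a nearest-centroid classifier as a stand-in for the C4.5 and SVM models of the study; the point is the cross-validation mechanics, not the specific learner:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for document feature vectors with relevance labels.
X_relevant = rng.normal(1.0, 1.0, size=(50, 5))
X_irrelevant = rng.normal(-1.0, 1.0, size=(50, 5))
X = np.vstack([X_relevant, X_irrelevant])
y = np.array([1] * 50 + [0] * 50)

def centroid_fit(X, y):
    """Class centroids play the role of a trained classifier."""
    return X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)

def centroid_predict(model, X):
    c1, c0 = model
    d1 = np.linalg.norm(X - c1, axis=1)
    d0 = np.linalg.norm(X - c0, axis=1)
    return (d1 < d0).astype(int)

def tenfold_accuracy(X, y, k=10):
    """Hold out each fold once, train on the rest, average accuracy."""
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for fold in folds:
        train = np.ones(len(y), dtype=bool)
        train[fold] = False
        model = centroid_fit(X[train], y[train])
        accs.append((centroid_predict(model, X[fold]) == y[fold]).mean())
    return float(np.mean(accs))

acc = tenfold_accuracy(X, y)
print(f"mean 10-fold accuracy: {acc:.2f}")
```

The study's second evaluation, training on 2008-2010 publications and testing on 2011, is the same pattern with a single temporal split instead of random folds.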
- Poster - Doubly Robust Targeted Maximum Likelihood Estimation (TMLE) Of The Effect Of Display Advertising On Browser Conversion
Ori Stitelman (media6degrees)
The effectiveness of online display ads beyond simple click-through evaluation is not well established in the literature. Are the high conversion rates seen for subsets of browsers the result of choosing to display ads to a group that has a naturally higher tendency to convert or does the advertisement itself cause an additional lift? How does showing an ad to different segments of the population affect their tendencies to take a specific action, or convert? We present an approach for assessing the effect of display advertising on customer conversion that does not require the cumbersome and expensive setup of a controlled experiment, but rather uses the observed events in a regular campaign setting. The general approach can be applied to many additional types of causal questions in display advertising and beyond. The approach relies on four steps:
- Defining the question of interest.
- Using domain knowledge and temporal cues to establish causal assumptions.
- Choosing a parameter of interest that directly answers the question of interest under the causal assumptions.
- Estimating the parameter of interest as well as possible using Targeted Maximum Likelihood Estimation (TMLE), a double robust estimating procedure.
We apply the above approach to several display advertising campaigns for m6d, a display advertising company that observes over 5 billion actions a day and uses that data along with machine learning algorithms to determine the best prospects for a brand.
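The four steps above can be sketched on synthetic data. The sketch below uses augmented inverse probability weighting (AIPW), a simpler doubly robust cousin of TMLE, rather than TMLE itself, and a made-up campaign where ad exposure is confounded by a browser covariate; it is an illustration of double robustness, not m6d's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Steps 1-2: synthetic campaign; exposure to the ad is confounded by x.
x = rng.normal(size=n)
g = 1 / (1 + np.exp(-x))        # propensity P(ad | x), known in this sketch
a = rng.binomial(1, g)          # ad shown?
true_lift = 0.05
y = rng.binomial(1, 0.10 + 0.05 * np.clip(x, -1, 1) + true_lift * a)

# Steps 3-4: outcome regression plus doubly robust (AIPW) correction.
design = np.column_stack([np.ones(n), x, a])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
q1 = np.column_stack([np.ones(n), x, np.ones(n)]) @ beta   # predicted y with ad
q0 = np.column_stack([np.ones(n), x, np.zeros(n)]) @ beta  # predicted y without ad
psi = q1 - q0 + a / g * (y - q1) - (1 - a) / (1 - g) * (y - q0)
ate = float(psi.mean())
print(f"estimated lift: {ate:.3f} (true {true_lift})")
```

The estimator is consistent if either the propensity model or the outcome regression is correct; here the outcome model is deliberately misspecified (linear, while the truth is clipped), yet the correct propensity keeps the estimate near the true lift.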
- Poster - Dynamic Logistic Regression and Dynamic Model Averaging for Binary Classification
Tyler McCormick (University of Washington)
We propose an online binary classification procedure for cases when there is uncertainty about the model to use and parameters within a model change over time. We account for model uncertainty through Dynamic Model Averaging (DMA), a dynamic extension of Bayesian Model Averaging (BMA) in which posterior model probabilities may also change with time. We apply a state-space model to the parameters of each model and we allow the data-generating model to change over time according to a Markov chain. Calibrating a forgetting factor accommodates different levels of change in the data-generating mechanism. We propose an algorithm which adjusts the level of forgetting in an online fashion using the posterior predictive distribution, and so accommodates various levels of change at different times.
We apply our method to data from children with appendicitis who receive either a traditional (open) appendectomy or a laparoscopic procedure. Factors associated with which children receive a particular type of procedure changed substantially over the seven years of data collection, a feature that is not captured using standard regression modeling. Because our procedure can be implemented completely online, future data collection for similar studies would require storing sensitive patient information only temporarily, reducing the risk of a breach of confidentiality.
This is joint work of Tyler H. McCormick (University of Washington), Adrian E. Raftery (University of Washington), David Madigan (Columbia University), and Randall Burd (Children's National Medical Center).
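The single-model building block of DMA, dynamic logistic regression with a forgetting factor, can be sketched in a few lines. This is a simplified diagonal-covariance version with a fixed forgetting factor (the paper adapts it online), on data whose true coefficient flips sign halfway through:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dynamic_logistic(xs, ys, forget=0.95):
    """Online logistic regression: each step, inflate the posterior
    variance by the forgetting factor (allowing drift), then apply a
    one-step Laplace/Newton update with the new observation."""
    d = xs.shape[1]
    m = np.zeros(d)   # posterior mean of the coefficients
    v = np.ones(d)    # posterior variance (diagonal approximation)
    for x, y in zip(xs, ys):
        v = v / forget                         # forgetting: admit change
        p = sigmoid(x @ m)
        v = 1 / (1 / v + p * (1 - p) * x**2)   # precision update
        m = m + v * (y - p) * x                # mean update
    return m

# The data-generating coefficient switches from +2 to -2 mid-stream.
T = 4000
xs = np.column_stack([np.ones(T), rng.normal(size=T)])
theta = np.where(np.arange(T) < T // 2, 2.0, -2.0)
ys = rng.binomial(1, sigmoid(theta * xs[:, 1]))
m = dynamic_logistic(xs, ys, forget=0.95)
print(m)  # the slope estimate should track the recent (negative) regime
```

With forgetting disabled (forget=1.0) the variance shrinks toward zero and the estimate averages the two regimes instead of tracking the current one, which is the failure of standard regression modeling the abstract describes.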
- Poster - Computing Google's PageRank by Sparse Approximation
Sou-Cheng (Terrya) Choi (University of Chicago)
Google's PageRank eigenvector is sparse in the sense that most elements are extremely small. Basis Pursuit De-Noising (BPDN) is a reasonable tool for finding the tiny proportion of significant nonzeros.
This is joint work with Michael Saunders.
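The effective sparsity of the PageRank vector is easy to see on a toy graph. The sketch below computes PageRank by standard power iteration (BPDN itself requires a sparse-recovery solver and is not shown); on a hub-dominated graph, nearly all of the mass lands on a couple of pages while the remaining entries stay near the teleportation floor:

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for the PageRank vector of an adjacency matrix."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    # Row-stochastic link matrix; dangling pages link uniformly everywhere.
    P = np.where(out > 0, adj / np.maximum(out, 1e-12), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = damping * (P.T @ r) + (1 - damping) / n
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r_next

# Toy web: every page links to page 0, and page 0 links to page 1.
n = 200
adj = np.zeros((n, n))
adj[:, 0] = 1
adj[0, 1] = 1
r = pagerank(adj)
print("PageRank mass on the top 2 of 200 pages:", round(float(r[:2].sum()), 3))
```

The 198 remaining entries are each close to the teleportation value (1 - 0.85)/200, exactly the "extremely small" tail that a sparse-approximation method would discard.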
- Poster - Interpretable User-Centered Predictions
Cynthia Rudin (Massachusetts Institute of Technology)
I am working on the design of predictive models that are both accurate and interpretable by a human. These models are built from association rules such as dyspepsia & epigastric pain -> heartburn. I will present three algorithms for decision lists, where classification is based on a list of rules:
1) A very simple rule-based algorithm, which is to order rules based on the adjusted confidence. In this case, users can understand the whole algorithm as well as the reason for the prediction.
2) A Bayesian hierarchical model for sequentially predicting conditions of medical patients, using association rules.
3) A mixed-integer optimization (MIO) approach for learning decision lists. This algorithm has high accuracy and interpretability - both owing to the use of MIO.
This is joint work with David Madigan, Tyler McCormick, Ben Letham, Allison Chang, and Dimitris Bertsimas.
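Algorithm 1) above can be sketched directly, assuming the adjusted confidence takes a shrinkage form count(a and b) / (count(a) + K) for a constant K; the exact form used in the work may differ. The shrinkage demotes rules that look perfect only because their support is tiny:

```python
def adjusted_confidence(n_ab, n_a, k=10):
    """Rule confidence shrunk toward zero for low-support rules:
    count(a and b) / (count(a) + k)."""
    return n_ab / (n_a + k)

# Hypothetical rule counts: (lhs, rhs, count(lhs), count(lhs and rhs)).
rules = [
    ("dyspepsia & epigastric pain", "heartburn", 40, 30),
    ("rare symptom", "heartburn", 2, 2),   # confidence 1.0, but tiny support
    ("nausea", "heartburn", 200, 120),
]
decision_list = sorted(
    rules, key=lambda r: adjusted_confidence(r[3], r[2]), reverse=True)
for lhs, rhs, n_a, n_ab in decision_list:
    print(f"{lhs} -> {rhs}: {adjusted_confidence(n_ab, n_a):.3f}")
```

Ordering by plain confidence would put the two-observation rule first; the adjusted confidence ranks the well-supported rules above it, which is what makes both the list and each individual prediction easy to explain to a user.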