Past Events

Math-to-Industry Boot Camp II

Advisory: Application deadline is February 17, 2017

Organizers: 

The Math-to-Industry Boot Camp is an intense six-week session designed to provide graduate students with training and experience that is valuable for employment outside of academia. The program is targeted at Ph.D. students in pure and applied mathematics. The boot camp consists of courses in the basics of programming, data analysis, and mathematical modeling. Students will work in teams on projects and will be provided with soft skills training.

There will be two group projects during the session: a small-scale project designed to introduce the concept of solving open-ended problems and working in teams, and a "capstone project" that will be posed by industry scientists. The students will be able to interact with industry participants at various points in the program. 

Eligibility

Applicants must be current graduate students in a Ph.D. program at a U.S. institution during the period of the boot camp.

Logistics

The program will take place at the IMA on the campus of the University of Minnesota. Students will be housed in a residence hall on campus and will receive a per diem and a travel budget, as well as an $800 stipend.

Applications

To apply, please supply the following materials through the link at the top of the page:

  • Statement of reason for participation, career goals, and relevant experience
  • Unofficial transcript, evidence of good standing, and proof of full-time status
  • Letter of support from advisor, director of graduate studies, or department chair

Selection criteria will be based on background and statement of interest, as well as geographic and institutional diversity. Women and minorities are especially encouraged to apply. Selected participants will be contacted by March 17.

Participants

Name Department Affiliation
Sameed Ahmed Department of Mathematics University of South Carolina
Christopher Bemis   Whitebox Advisors
Amanda Bernstein Department of Mathematics North Carolina State University
Jesse Berwald Enterprise Data Analytics & Business Intelligence Target Corporation
Neha Bora Department of Mathematics Iowa State University
Jeremy Brandman Computational Physics ExxonMobil
Phillip Bressie Mathematics Kansas State University
Nicole Bridgland School of Mathematics University of Minnesota, Twin Cities
Yiying Cheng Department of Mathematics University of Kansas
Michael Dairyko Department of Mathematics Iowa State University
Miandra Ellis School of Mathematical and Statistical Sciences Arizona State University
Wen Feng Department of Applied Mathematics University of Kansas
Jasmine Foo School of Mathematics University of Minnesota, Twin Cities
Melissa Gaddy Department of Mathematics North Carolina State University
Thomas Grandine   The Boeing Company
Ngartelbaye Guerngar Department of Mathematics and Statistics Auburn University
Jamie Haddock Department of Applied Mathematics University of California, Davis
Madeline Handschy   University of Minnesota, Twin Cities
Qie He Department of Industrial and Systems Engineering University of Minnesota, Twin Cities
Thomas Hoft Department of Mathematics University of St. Thomas
Tahir Bachar Issa Department of Mathematics and Statistics Auburn University
Alicia Johnson   Macalester College
Cassidy Krause Department of Mathematics University of Kansas
Kevin Leder Department of Industrial System and Engineering University of Minnesota, Twin Cities
Gilad Lerman School of Mathematics University of Minnesota, Twin Cities
Hongshan Li Department of Mathematics Purdue University
Wenbo Li Applied Mathematics & Statistics, and Scientific Computation University of Maryland
Youzuo Lin   Los Alamos National Laboratory
John Lynch Department of Mathematics University of Wisconsin, Madison
Eric Malitz Department of Mathematics, Statistics and Computer Science University of Illinois, Chicago
Tianyi Mao Department of Mathematics City University of New York
Emily McMillon Department of Mathematics University of Nebraska
Christine Mennicke Department of Applied Mathematics North Carolina State University
Kacy Messerschmidt Department of Mathematics Iowa State University
Sarah Miracle Department of Computer and Information Sciences University of St. Thomas
Ngai Fung Ng   Purdue University
Hieu Nguyen Institute for Computational Engineering and Sciences The University of Texas at Austin
Kelly O'Connell Department of Mathematics Vanderbilt University
Luca Pallucchini   Temple University
Karoline Pershell Strategy and Evaluation Division Service Robotics & Technologies
Fesobi Saliu Department of Mathematical Sciences University of Memphis
Fadil Santosa Institute for Mathematics and its Applications University of Minnesota, Twin Cities
Richard Sharp   Starbucks
Samantha Shumacher   Target Corporation
Sudip Sinha Department of Mathematics Louisiana State University
Ryan Siskind   Target Corporation
Daniel Spirn University of Minnesota University of Minnesota, Twin Cities
Anna Srapionyan Center for Applied Mathematics Cornell University
Trevor Steil School of Mathematics University of Minnesota, Twin Cities
Andrew Stein Department of Modeling and Simulation Novartis Institute for Biomedical Research
Aditya Vaidyanathan Center for Applied Mathematics Cornell University
Zachary Voller   Target Corporation
Zhaoxia Wang   Louisiana State University
Dara Zirlin Mathematics Department University of Illinois at Urbana-Champaign


Projects and teams

Team 1: Dictionary-Based Remote Sensing Imagery Classification/Clustering Techniques: Feature Selection and Optimization Methods

  • Mentor Youzuo Lin, Los Alamos National Laboratory

Remotely sensed imagery classification/clustering seeks to group pixels that represent land-cover features. It has broad applications across engineering and science domains. However, because of the large volume of imagery data and the limited features available, it is challenging to correctly understand the contents of the imagery. This project team will develop efficient and accurate machine-learning methods for remotely sensed imagery classification/clustering. To achieve this goal, we will explore various image classification/clustering methods. In particular, we are interested in dictionary-learning-based image analysis methods. As one of the most successful machine-learning approaches, dictionary learning has shown promising performance in a variety of applications. In this project, the team will focus on the following tasks:

  •  look into state-of-the-art dictionary learning methods, including K-SVD [1] and SPORCO [2]
  •  apply dictionary-learning techniques to remotely sensed imagery classification/clustering
  •  compare the performance of different dictionary-learning methods
  •  analyze computational costs and further improve computational efficiency

Through this project, the team will learn the fundamentals of machine learning as applied to image analysis, understand the computational tools for solving large-scale applications, and become capable of solving real problems with the aforementioned techniques.
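As a minimal sketch of the dictionary-learning workflow, the snippet below uses scikit-learn's online dictionary learner as a stand-in for K-SVD or SPORCO (neither of which ships with scikit-learn); the random "patches" are hypothetical placeholders for real image data:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
# Hypothetical stand-in for image data: 500 flattened 8x8 patches.
patches = rng.normal(size=(500, 64))

# Learn an overcomplete dictionary of 100 atoms with sparse codes.
# (Online solver; K-SVD itself would be swapped in from [1].)
dico = MiniBatchDictionaryLearning(n_components=100, alpha=1.0,
                                   batch_size=50, random_state=0)
codes = dico.fit_transform(patches)

print(dico.components_.shape)  # (100, 64): 100 atoms of dimension 64
print(codes.shape)             # (500, 100): one sparse code per patch
```

The sparse codes would then feed a downstream classifier or clustering step, and the cost of the learning stage is what the fourth task above would profile and improve.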

References:

[1] K-SVD: M. Aharon, M. Elad and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311-4322, 2006. (Sources Available at http://www.cs.technion.ac.il/~elad/software/)

[2] SPORCO: B. Wohlberg, "Efficient Algorithms for Convolutional Sparse Representations," IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 301-315, 2016. (Sources Available at http://brendt.wohlberg.net/software/SPORCO/)

Team 2: Optimizing Well Placement in an Oil Reservoir

  • Mentor Jeremy Brandman, ExxonMobil

Oil and gas – also known as hydrocarbons – are typically found thousands of meters below the earth’s surface in the void space of sedimentary rocks. The extraction of these hydrocarbons relies on the operation of injection and production wells.

Injection wells are used to displace hydrocarbons through the injection of other fluids (e.g. water and CO_2) and maintain overall reservoir pressure. Production wells are responsible for extracting reservoir fluids from the rocks and transporting them to the surface.

Drilling a well is expensive – the cost can be in the hundreds of millions of dollars – and time-consuming. Therefore, it is imperative that wells are placed and operated in a manner that optimizes reservoir profitability. The goal of this project is to develop a well placement strategy that addresses this business need.

The project’s focus will be non-invasive (i.e., black-box or derivative-free) optimization strategies for well placement. Non-invasive approaches are appealing because they do not require access to the computer code used to simulate the flow of hydrocarbons and other fluids. This is an important consideration as industrial flow simulators are complex and constantly in flux, making gradient information potentially difficult to acquire.

In order to test ideas and verify algorithms, the project will begin by considering well placement optimization in the context of a homogeneous two-dimensional reservoir. Following this, students will consider problems in heterogeneous reservoirs inspired by real-world examples.

Students will be provided with a flow simulator written in C that can be coupled to optimization algorithms written in C or Python. An introduction to modeling fluid flow in porous media will also be given.
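As a toy illustration of the non-invasive approach, the sketch below optimizes a 2-D well location with Nelder-Mead, which uses only function evaluations; the quadratic objective is a hypothetical stand-in for the provided C flow simulator:

```python
from scipy.optimize import minimize

# Hypothetical stand-in for the flow simulator: negative profit as a
# function of (x, y) well coordinates in a homogeneous 2-D reservoir.
# In the real project this function would call the C simulator.
def negative_profit(xy):
    x, y = xy
    return (x - 3.0) ** 2 + (y - 1.5) ** 2  # best placement at (3, 1.5)

# Nelder-Mead is derivative-free: no gradients of the simulator needed.
result = minimize(negative_profit, x0=[0.0, 0.0], method="Nelder-Mead",
                  options={"xatol": 1e-6, "fatol": 1e-6})
print(result.x)  # approximately [3.0, 1.5]
```

Because only function values are consumed, the same driver works unchanged when the toy objective is replaced by a black-box simulator call.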

Team 3: Machine Tool and Robot Calibration through Kinematic Analysis: A Least Squares Approach

  • Mentor Thomas Grandine, The Boeing Company

Modern machine tools and robots are constructed by assembling sequences of joints and linkages. An end effector, typically a cutter, tool, probe, or other device, is attached to the end of the last linkage. These devices are controlled through a controller in which the locations of the various components are programmed. In the usual case, programming these joint and linkage locations leads to a programmed nominal position for the end effector. Because of mechanical variation and other sources of error, the nominal programmed location of the end effector and its actual location are not exactly the same.

Most controllers are equipped with compensation functions to account for this: the actual location of the linkages is set to the nominal position plus a correction term, with the intent that the final position of the actual end effector will be much closer to the intended nominal position. One way of constructing the compensation functions is to program the machines to move the end effector to a collection of different locations. The actual location of the end effector is then measured by some independent means, often a laser scanner or other device, and the difference between the actual and nominal end effector locations can be computed. Given these discrepancies, a nonlinear least squares problem can be formulated from which accurate error functions can be constructed.

In this workshop, we will review the standard methods for solving these problems and then explore some potential new ways of modeling the error functions, with a view toward taking this good procedure and making it even better.
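A minimal sketch of the calibration step, with a hypothetical one-axis error model (offset plus scale error) standing in for a full kinematic model of joints and linkages:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)

# Hypothetical 1-D compensation model: actual = nominal + a + b*nominal,
# i.e. an offset plus a scale error along one machine axis.
true_a, true_b = 0.05, 0.002
nominal = np.linspace(0.0, 100.0, 25)            # commanded positions
measured = nominal + true_a + true_b * nominal   # laser-scanner readings
measured += rng.normal(scale=1e-4, size=nominal.size)  # measurement noise

def residuals(params):
    a, b = params
    return (nominal + a + b * nominal) - measured

fit = least_squares(residuals, x0=[0.0, 0.0])
print(fit.x)  # close to [0.05, 0.002]
```

In the real problem the residual function would encode the full nonlinear kinematics, but the least squares machinery is the same.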

Team 4: Personalized Marketing

  • Mentor Richard Sharp, Starbucks

The goal of personalized marketing is to send the right message to the right person at the right time. Rules-based, targeted marketing suffers from a measurement problem: it works on average, being useful for some but irrelevant for others, and you can’t tell one group from the other. Online retailers are generally better able to track individual customer behavior than their brick-and-mortar counterparts, but still suffer from an inability to put that behavior in context. A common result is that a shed (or book or shoes or tent or whatever) chases you around the internet. Yes, you searched for it, but then you went down to the store and bought it in person. The next time that ad pops up, it’s gotten the behavior right but completely missed the context: right message, right person, wrong time.

Personalized marketing attempts to reduce the inefficiency of targeted marketing by making algorithmic, rather than rules-based, decisions that treat the recipient as an individual rather than a representative of a general class. Challenges include discovering useful behavioral and contextual clues in a mountain of transactional and other data, determining an optimal decision strategy for making use of that information toward some objective, and selecting the objective itself. Unsurprisingly, increasing revenue is a common objective, but so is increasing engagement (or, similarly, decreasing churn), and objectives can range as widely as supporting health-related decisions like smoking cessation or helping individuals make better financial decisions.

We will develop a mathematical model that is part of a working system for making offer decisions. Some of the significant topics we will work to address are:

  • measuring incremental impact
  • behavioral and contextual feature engineering
  • decision strategies and objectives
  • continual operation in a real-world setting (including feedback for system operators)
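One standard family of decision strategies for problems like this is the multi-armed bandit. The sketch below is a hypothetical epsilon-greedy example (the offers, redemption rates, and simulated responses are all invented for illustration), showing how a system can learn which offer to send while continuing to explore:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical offer-decision problem: three candidate offers with
# unknown redemption probabilities the strategy must learn.
true_rates = [0.05, 0.12, 0.08]
counts = np.zeros(3)
values = np.zeros(3)  # running estimate of each offer's redemption rate

def choose(eps=0.1):
    # Explore with probability eps, otherwise send the best-looking offer.
    if rng.random() < eps:
        return int(rng.integers(3))
    return int(np.argmax(values))

for _ in range(50000):
    arm = choose()
    reward = rng.random() < true_rates[arm]  # simulated customer response
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print(int(np.argmax(values)))  # index of the best-estimated offer
```

The incremental-mean update keeps the system cheap to run continually, and the exploration rate is one lever operators would monitor in a real-world deployment.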

Team 5: Supporting oncology drug development by deriving a lumped parameter for characterizing target inhibition in standard math

  • Mentor Andrew Stein, Novartis Institute for Biomedical Research

During the development of biotherapeutic drugs, modelers are often asked to predict the dosing regimen needed to achieve sufficient target inhibition for efficacy in a solid tumor [1, 2]. Previous work showed that under many relevant clinical scenarios, target inhibition in blood can be characterized by a single lumped parameter: Kd*Tacc/Cavg, where Kd is the binding affinity of the drug, Tacc is the fold-accumulation of the target during therapy, and Cavg is the average drug concentration under the dosing regimen of interest [3]. This project will focus on extending these results to characterizing target inhibition in a tumor, to assist in development of targeted therapies and immunotherapies in oncology.
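The lumped parameter described above, Kd*Tacc/Cavg, is straightforward to compute; the sketch below uses hypothetical illustrative values and units (nM):

```python
# Sketch of the lumped potency metric from [3]: Kd * Tacc / Cavg.
# All values below are hypothetical and purely illustrative.
def target_inhibition_metric(kd_nM, tacc_fold, cavg_nM):
    """Smaller values correspond to stronger expected target inhibition."""
    return kd_nM * tacc_fold / cavg_nM

# Example: Kd = 1 nM, 10-fold target accumulation, Cavg = 100 nM.
print(target_inhibition_metric(1.0, 10.0, 100.0))  # 0.1
```

Extending the approach to a tumor compartment, as the project proposes, would add terms for tissue penetration on top of this blood-based metric.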

References
  1. Deng, Rong, et al. "Preclinical pharmacokinetics, pharmacodynamics, tissue distribution, and tumor penetration of anti-PD-L1 monoclonal antibody, an immune checkpoint inhibitor." MAbs. Vol. 8. No. 3. Taylor & Francis (2016) Suppl Fig 5.
  2. Lindauer, A., et al. "Translational Pharmacokinetic/Pharmacodynamic Modeling of Tumor Growth Inhibition Supports Dose‐Range Selection of the Anti–PD‐1 Antibody Pembrolizumab." CPT: Pharmacometrics & Systems Pharmacology (2017).
  3. Stein AM, Ramakrishna R. "AFIR: A dimensionless potency metric for characterizing the activity of monoclonal antibodies." Clin. Pharmacol. Ther: Pharmacometrics and Systems Pharmacol, doi 10.1002/psp4.12169, 2017.

Team 6: How do robots find their way home? Optimizing RFID beacon placement for robot localization and navigation in indoor spaces

  • Mentor Karoline Pershell, Service Robotics & Technologies

While map apps on mobile devices are excellent for getting around town, they are not precise enough to use within buildings. We are currently working on deploying service robots (vacuuming, security, mail delivery) throughout a facility, and the robotic systems will navigate the space based on a pre-made facility map and built-in obstacle-avoidance technology. However, a robot still needs to localize itself within the map (i.e., determine where it is on the map) at regular intervals. Using RFID beaconing technology to triangulate position is a promising option for localization. Given a map and RFID readings along a path, can we extrapolate the signal strength to any point on the map? That is, can we develop a model that will allow a robot to localize on a map? How do we optimize the placement (and other variable settings) of beacons to reduce cost but ensure localization? How can we model reduced signals (e.g., beacons in neighboring rooms whose signal is coming through a wall) and differentiate between reduced signals and beacons that are far away, acknowledging that signal strength is often variable?
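The core localization step can be sketched as a range-based least squares problem. In this hypothetical example the beacon positions and ranges are invented, and the ranges are exact; a real system would first have to estimate noisy distances from RSSI:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical setup: three beacons at known map positions, plus
# distance estimates derived from RSSI (here exact, for illustration).
beacons = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
robot_true = np.array([3.0, 4.0])
ranges = np.linalg.norm(beacons - robot_true, axis=1)

# Recover the robot's position by minimizing the range residuals.
def residuals(p):
    return np.linalg.norm(beacons - p, axis=1) - ranges

est = least_squares(residuals, x0=[5.0, 5.0]).x
print(est)  # approximately [3.0, 4.0]
```

With noisy, wall-attenuated RSSI, the residual model would be extended with a path-loss term, which is exactly where the project's modeling questions enter.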

Math-to-Industry Boot Camp

Advisory: 
Application deadline is February 15, 2016

Organizers: 

Description

The Math-to-Industry Boot Camp is an intense six-week session designed to provide graduate students with training and experience that is valuable for employment outside of academia. The program is targeted at Ph.D. students in pure and applied mathematics. The Boot Camp consists of courses in the basics of programming, data analysis, and mathematical modeling. Students will work in teams on projects and will be provided with soft skills training.

There will be two group projects during the session: one "baby project" designed to introduce the concept of solving open-ended problems and working in teams, and one "capstone project" that will be posed by industry scientists. The students will be able to interact with industry participants at various points in the program. 

Eligibility

Applicants must be current graduate students in a Ph.D. program at a U.S. institution during the period of the boot camp.

Logistics

The program will take place at the IMA on the campus of the University of Minnesota. Students will be housed in a residence hall on campus and will receive a per diem and a travel budget, as well as an $800 stipend.

Applications

To apply, please supply the following materials through the link at the top of the page:

  • Statement of reason for participation, career goals, and relevant experience
  • Unofficial transcript, evidence of good standing, and proof of full-time status
  • Letter of support from advisor, director of graduate studies, or department chair

Selection criteria will be based on background and statement of interest, as well as geographic and institutional diversity. Women and minorities are especially encouraged to apply. Selected participants will be contacted by March 15.

Participants

Name Department Affiliation
Luis Aguirre Department of Mathematics Texas Christian University
Kirsten Anderson   Kirsten L Anderson, LLC
Niles Armstrong Department of Mathematics Kansas State University
Christopher Bemis   Whitebox Advisors
Jesse Berwald Enterprise Data Analytics & Business Intelligence Target Corporation
Mark Blumstein Department of Mathematics Colorado State University
Marina Brockway   VivaQuant
Anthea Cheung Department of Mathematics and Statistics Boston University
Lise Chlebak Department of Mathematics Tufts University
Kelsey DiPietro Department of Applied Computational Mathematics and Statistics University of Notre Dame
An Do Department of Mathematics Claremont Graduate University
Natalie Durgin Department of Data Science Spiceworks
Jasmine Foo School of Mathematics University of Minnesota, Twin Cities
Richard Frnka Department of Mathematics Louisiana State University
Arezou Ghesmati Department of Mathematics Texas A & M University
John Goes School of Mathematics University of Minnesota, Twin Cities
Rohit Gupta Institute for Mathematics and Its Applications University of Minnesota, Twin Cities
Alex Happ Department of Mathematics University of Kentucky
Mela Hardin Department of Mathematics Arizona State University
Lindsey Hiltner Department of Mathematics University of Minnesota, Twin Cities
Brian Hunter Department of Mathematics Texas A & M University
Ahmet Kabakulak Department of Mathematics University of Wisconsin, Madison
Julienne Kabre   Illinois Institute of Technology
Katherine Kinnaird   Brown University
Avary Kolasinski Department of Mathematics University of Kansas
Shaked Koplewitz Department of Mathematics Yale University
Henry Kvinge Department of Mathematics University of California, Davis
George Lankford Department of Mathematics North Carolina State University
Kevin Leder Department of Industrial System and Engineering University of Minnesota, Twin Cities
Gilad Lerman School of Mathematics University of Minnesota, Twin Cities
Alfonso Limon   Oneirix Labs
Mike Makowesky   MSM Investment Partners
Kristina Martin Department of Mathematics North Carolina State University
Sarah Miracle Department of Computer and Information Sciences University of St. Thomas
Yoichiro Mori School of Mathematics University of Minnesota, Twin Cities
Khanh Nguyen Department of Mathematics University of Houston
Marcella Noorman Department of Mathematics North Carolina State University
Dimitrios Ntogkas Department of Mathematics University of Maryland
Brian Preskitt Department of Mathematics University of California, San Diego
Mrinal Raghupathi   USAA Asset Management Company
Analise Rodenberg Department of Mathematics University of Minnesota, Twin Cities
Keith Rush Department of Mathematics University of Wisconsin, Madison
Nathan Salazar Department of Mathematics The University of Iowa
Fadil Santosa Institute for Mathematics and its Applications University of Minnesota, Twin Cities
Jacob Shapiro Department of Mathematics Purdue University
Timothy Spencer Department of Mathematics Georgia Institute of Technology
Daniel Spirn University of Minnesota University of Minnesota, Twin Cities
Sumanth Swaminathan   Revon Systems
Stan Swierczek Program in Applied Mathematics University of Arizona
Carlos Tolmasky Institute for Mathematics and its Applications University of Minnesota, Twin Cities
Katie Tucker Department of Mathematics University of Nebraska
Joshua Wilson Department of Mathematics University of Minnesota, Twin Cities
Yuhong Yang Department of Statistics University of Minnesota, Twin Cities
Camille Zerfas Department of Mathematical Sciences Clemson University
Ding Zhao Department of Mathematics University of Kentucky


Projects and teams

Team 1: Improving Accuracy of ECG Monitoring Using a Wearable Device

  • Mentor Marina Brockway, VivaQuant
  • Lindsey Hiltner, University of Minnesota, Twin Cities
  • Julienne Kabre, Illinois Institute of Technology
  • Dimitrios Ntogkas, University of Maryland
  • Nathan Salazar, The University of Iowa
  • Katie Tucker, University of Nebraska

Remote ECG monitoring through the use of wearable wireless devices continues to play an important role in health care by enabling better diagnostics and individualized medicine delivery at a lower cost. As the use of these devices increases, the cost of analyzing the large volume of data they produce becomes more significant. Considerable progress has been made in rendering remote ECGs more resilient to the noise that is commonly encountered in these recordings. However, more work remains to achieve fully automated analysis of ambulatory ECGs. Accomplishing this goal requires highly accurate beat detection despite the presence of noise and signal corruption, as well as changes in ECG character due to cardiac arrhythmia. This project will concentrate on developing a pattern recognition algorithm capable of identifying errors in the detection of normal and ectopic ventricular beats. 
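As a minimal illustration of the beat-detection starting point, the sketch below builds a synthetic stand-in for an ECG trace (Gaussian spikes plus noise; real beat morphology is far richer) and detects beats as prominent peaks separated by a refractory period:

```python
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(0)

# Synthetic stand-in for an ECG: 10 s at 250 Hz, one sharp "R wave"
# per second on top of baseline noise. Purely illustrative data.
fs = 250
t = np.arange(0, 10, 1 / fs)
ecg = 0.05 * rng.normal(size=t.size)
for beat in np.arange(0.5, 10, 1.0):  # 10 beats, one per second
    ecg += np.exp(-((t - beat) ** 2) / (2 * 0.005 ** 2))

# Detect beats: amplitude threshold plus a 300 ms refractory distance.
peaks, _ = find_peaks(ecg, height=0.5, distance=int(0.3 * fs))
print(len(peaks))  # 10 detected beats
```

The hard part the project addresses is precisely where this naive detector fails: corrupted segments and arrhythmic morphology changes that a fixed threshold cannot handle.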

Team 2: Human Guided Machine Vision (HGMV)

  • Mentor Alfonso Limon, Oneirix Labs
  • Kelsey DiPietro, University of Notre Dame
  • Richard Frnka, Louisiana State University
  • Brian Hunter, Texas A & M University
  • Shaked Koplewitz, Yale University
  • Khanh Nguyen, University of Houston
  • Timothy Spencer, Georgia Institute of Technology

In Human Guided Machine Vision (HGMV), humans and machines collaborate to achieve vision tasks that neither can achieve individually. Humans are smart, perceptive, and creative, while computers are fast and accurate. Bringing the best qualities of both participants to a shared vision task requires rethinking human-computer interaction (HCI) paradigms. In HGMV frameworks, we endeavor to produce real-time feedback on what will happen if a particular action is taken by the human participant, before the action has been taken.

Take a simple example: detecting a circle with human help. A non-HGMV version of such software would take a mouse click from a user and produce a circle close to where the mouse was clicked. An HGMV version of the same software would produce fast, accurate feedback on what circle would be detected IF the user were to click where the mouse pointer currently is, without the user actually clicking. This feedback is updated in real time as the user moves the mouse pointer. The lag between the user's mouse movement and the provided feedback should be minimal - ideally, below the perception threshold of 80 ms.

This requires a rethinking of image processing algorithms. Under HGMV, image processing algorithms are typically split into a pre-processing step and a real-time step. The pre-processing step should take minimal time, work over the entire image, and produce data (but not too much data) that allows the real-time step to take in the mouse pointer location and compute the feedback very fast. This technique is most successful when the pre-processing step exploits some economy of scale from being run over the entire image rather than locally.

In the present project, the team shall develop an algorithm to detect roads consistent with the HGMV paradigm. Fully automated detection of roads from aerial views remains elusive. A computer cannot always tell the difference between a road and various road-like structures. Furthermore, looking at junctions and intersections, a computer cannot necessarily tell which road went where. A human being has much more understanding of such nuance, which is why humans still digitize road networks for mapping software. The idea of this project is to let humans bring their superior understanding to the table while still using the computer to achieve better speed and accuracy in the road digitization task. For example, finding the center of a road, finding the carriage width, and tracing the road (in obvious stretches) can be done much faster by a computer. The project team is tasked with the following:  

  1. Create a script of how a computer and a human would interact to detect roads, while following the HGMV paradigm. Remember that one mainstay of this paradigm is real time feedback of what will happen IF the user clicked.
  2. Create image processing algorithms for human guided road detection.
  3. Split the algorithms into a preprocessing and real-time step.
  4. Create a working demonstration of the proposed algorithms, with real human interaction, real image processing, and changeable images, but possibly a very simplistic UI.
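The pre-processing/real-time split (step 3 above) can be sketched as follows. The road-candidate pixels here are hypothetical random stand-ins for real detector output, and a k-d tree plays the role of the compact pre-processed data structure that makes per-mouse-move queries fast:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# --- Pre-processing step (run once over the whole image) ----------------
# Hypothetical stand-in for detected road-candidate pixels: (row, col)
# coordinates a road/edge detector produced over the full image.
road_pixels = rng.integers(0, 1000, size=(5000, 2))
tree = cKDTree(road_pixels)  # spatial index built once, offline

# --- Real-time step (run on every mouse move) ---------------------------
# Given the current pointer position, report the nearest road candidate
# fast enough to stay within an interactive latency budget.
def feedback(pointer):
    dist, idx = tree.query(pointer)
    return road_pixels[idx], dist

snap, dist = feedback((500, 500))
print(snap, float(dist))
```

The design choice mirrors the text: the expensive global pass happens once, and each mouse move costs only a logarithmic-time nearest-neighbor query.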

Team 3: Mathematical Prediction of Physician Triage of Asthma

  • Mentor Sumanth Swaminathan, Revon Systems
  • Mark Blumstein, Colorado State University
  • Lise Chlebak, Tufts University
  • Henry Kvinge, University of California, Davis
  • Camille Zerfas, Clemson University

Asthma is a lung condition that imposes a significant burden on patients’ daily lives. Escalations of this condition (or exacerbations) are a frequent trigger of physician and hospital visits, which are both costly and distressing to patients. The need for novel solutions that limit the impact of exacerbations on global health is abundantly apparent.

One emerging approach to addressing asthma exacerbation is early detection by way of mobile app technology. Many of these apps, however, utilize rule-based decision frameworks, which are hampered by the size of the variable space involved in triage and diagnosis.

We are interested in developing a mathematical model that predicts the appropriate triage (urgency of illness) for an asthma patient based on patient health characteristics. In particular, we hope to train a machine-learning model on physician-generated triage data and use it to make out-of-sample predictions. Some of our major goals and questions include:

  1. What are the most important patient health features or combination of features for predicting an accurate patient triage?
  2. Why do those particular features or combination of features matter the most?
    1. Can we understand the temporal component of these features (does the temporal change in these features matter or can we make predictions on features at a snapshot of time?)
  3. Why does the particular machine learning model selected perform better than alternatives?
  4. What insights can be drawn from the physician triage data itself? Are there nontrivial trends in physician diagnosis that can be brought to light?
  5. What data visualization techniques best represent the models, and how might you tune the visualization to convey different aspects of the features and functionality?
  6. How do you represent the probability accuracy for the factors that affect the outcome and instill the appropriate level of confidence that the results are trustable and of high quality?
  7. Can you suggest ways of feeding incorrect algorithm triage back into the current predictor to improve future performance (retraining protocols, real-time retraining, etc.)?
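A minimal sketch of the train-and-predict loop behind questions 1-3, using a decision tree on a hypothetical two-feature dataset (the features, labels, and separable "triage rule" are all invented stand-ins for physician-generated data):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical training set: rows are patients, columns are health
# features (e.g. peak-flow and symptom-score stand-ins); labels are
# triage levels 0 = routine, 1 = urgent, from a toy separable rule.
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Out-of-sample prediction for a new patient, plus the feature
# importances relevant to the first two questions above.
print(model.predict([[1.0, 1.0]]))
print(model.feature_importances_)
```

Tree-based models are a natural first choice here because the learned splits and feature importances give the interpretability that questions 2 and 4 demand.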

Team 4: Universal Identifier

  • Mentor Jesse Berwald, Target Corporation
  • Niles Armstrong, Kansas State University
  • An Do, Claremont Graduate University
  • Arezou Ghesmati, Texas A & M University
  • Alex Happ, University of Kentucky
  • George Lankford, North Carolina State University
  • Ding Zhao, University of Kentucky

Uniquely identifying users or customers is a complex problem when people interact with businesses through multiple channels (store, website, coupon sites, etc.) and through multiple devices (desktop, mobile, phone). It often happens that retailers have multiple records for unique individuals. In trying to provide more personalized context, as well as understand the effectiveness of business strategies, it is useful to have a "Universal ID" which links together all of a user's identities.

To get a rough idea of how this would work, it is easiest to consider a simple example. Consider a small set of identifiers that a user might have:

  • Browser cookie
  • Email address
  • Credit card number (hashed)
  • Login ID

The user can also take certain actions that provide a link between identifiers e.g.:

  • Logging in to a website links a browser cookie to a login ID
  • Making a purchase links a credit card number to an email address, and possibly a login ID
  • Entering an email address in account settings links an email address to a login ID

The goal is to develop a technique to link all of these IDs together, especially those that don't have an action that provides a direct link (e.g. credit card number to browser cookie). We've approached this using graphs and network theory, but don't let that stop you from using other techniques.

Some additional complications you can add in if time allows:

  • A user may have multiple identities of the same type (cookies for mobile and desktop browser, multiple credit cards, etc.)
  • Some "link" actions provide better information than others. A user can log in to their account from another user's computer, which would provide a false cookie-to-login ID link.
  • The above suggests some sort of probabilistic model. It would be nice to have some sort of score that allows trading off precision for recall depending on the use case.
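As a baseline for the graph view mentioned above, a union-find structure links identifiers into connected components, so indirectly connected IDs (e.g. cookie and card) share one Universal ID. The identifier strings below are hypothetical examples of the ID types listed earlier:

```python
# Minimal union-find sketch for linking identifiers into a Universal ID.
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:              # walk up to the component root
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def link(a, b):
    """Record an observed action linking identifiers a and b."""
    parent[find(a)] = find(b)

# Observed actions (hypothetical): a login links a cookie to a login ID;
# a purchase links a hashed card to an email; a login links email to ID.
link("cookie:abc", "login:jdoe")
link("card:9f8e", "email:jdoe@example.com")
link("login:jdoe", "email:jdoe@example.com")

# The cookie and the hashed card now share one Universal ID (same root),
# even though no single action linked them directly.
print(find("cookie:abc") == find("card:9f8e"))  # True
```

The probabilistic refinement suggested in the last bullet would replace these hard merges with edge weights and a score threshold for trading precision against recall.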

Team 5: Examining and Resolving Issues found in Mean Variance Optimization

  • Mentor Christopher Bemis, Whitebox Advisors
  • Avary Kolasinski, University of Kansas
  • Kristina Martin, North Carolina State University
  • Brian Preskitt, University of California, San Diego
  • Jacob Shapiro, Purdue University
  • Stan Swierczek, University of Arizona

Mean variance optimization, while providing a foundation for modern portfolio theory, is rife with known issues. In recent years, attention has been paid to the distribution of eigenvalues of the sample covariance matrix, one of the central parameters of the problem. Random matrix theory has found some general level of application as a result.

We use market data to examine the pitfalls in the standard theory. With this groundwork laid, we proceed to examine several remedies, including shrinkage estimators, principal component analysis, and techniques arising from random matrix theory. Our analyses will focus in large part on the effect of each remedy on the spectrum of the input covariance matrix, allowing a common discussion across methods and identifying exactly why the original formulation fails. 
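A small sketch of the shrinkage remedy and its effect on the spectrum, using the Ledoit-Wolf estimator on a hypothetical returns panel (random data standing in for market data, with more assets than the sample comfortably supports):

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)

# Hypothetical returns panel: 60 observations of 40 assets, so the
# sample covariance is poorly conditioned and its eigenvalues are
# overdispersed relative to the true (identity) covariance.
returns = rng.normal(size=(60, 40))

sample_cov = np.cov(returns, rowvar=False)
lw_cov = LedoitWolf().fit(returns).covariance_

spread = lambda C: np.ptp(np.linalg.eigvalsh(C))  # eigenvalue range
print(spread(sample_cov), spread(lw_cov))  # shrinkage compresses the spectrum
```

Comparing the two eigenvalue ranges is exactly the kind of spectrum-level diagnostic the project uses to put the different remedies on a common footing.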

Team 6: Inventory Demand Kaggle Competition

  • Mentor Natalie Durgin, Spiceworks
  • Luis Aguirre, Texas Christian University
  • Anthea Cheung, Boston University
  • Mela Hardin, Arizona State University
  • Ahmet Kabakulak, University of Wisconsin, Madison
  • Marcella Noorman, North Carolina State University

Data and project information can be found on the Kaggle website.

Many internet companies are monetized by serving ads on their websites. Their “inventory” comprises the ad-slots available in front of their users. Their sales department might agree to run an ad campaign for the month, and serve ads into available slots. Ads are often paid for in units of a thousand served. Internet traffic fluctuates. If there are a dearth of slots and a surplus of ads, revenue will be lost. Alternatively, if not enough ads are booked, slots will run empty when money could have been made. Using historical data to predict user traffic and the availability and performance of ad-slots is an important problem, and has an obvious parallel to the Bimbo bakery problem. We will develop a model for the Grupo Bimbo Inventory Demand problem alongside the Kaggle data science community.