Learning Data Science at the IMA

To meet industry's burgeoning need for data analytics and to create career pathways for young mathematical scientists, the IMA launched its inaugural Data Science Fellowship this past winter. The fellowship prepares Ph.D. students near graduation and recent postdoctoral mathematicians to become successful data science professionals. The seven-week program trains participants in the basic techniques and tools of data science and gives them the opportunity to work in teams on real-world industrial projects proposed by partner companies.

The six selected fellows arrived with varied backgrounds, including low-dimensional topology, partial differential equations, scientific computing, combinatorics, probability, and numerical analysis. While many of them had computing experience, their exposure to data science was limited. The fellowship served to bridge the gap in knowledge between graduate school and industry while giving the fellows a skill set in data science that they could demonstrate to employers.

“I'm looking to start a career in data science, but I come from a background in theoretical math with only occasional experience working with real-world data,” said Gavin King, a fifth-year graduate student at the University of Wyoming.

“During my graduate training, my interest in data science and machine learning grew ‘by the second,’ but I had lots of questions about the field and the best approach to handle some machine learning problems,” added Olabanji Shonibare, a recent graduate of Michigan Technological University.

The program began with two weeks of instruction on basic skills by Daniel Kaplan, a professor of mathematics, statistics, and computer science at Macalester College. The small size of the program let participants interact closely with Kaplan and IMA Director Fadil Santosa for help, advice, and feedback.

The remaining weeks of the fellowship were devoted to real-world projects provided by Ford, New Frontier Data, Corning, 3M, and Alternative Strategies Advisors on topics related to data wrangling, machine learning, predictive analytics, natural language processing, and sentiment analysis. The projects included analyzing glass data and building models to predict properties of glass; examining the results of an owner survey to study capacity loss in the Tesla Model S battery; evaluating municipal bond trading activities and building a model to predict higher trading prices in the future; and analyzing Comcast consumer reviews to identify patterns that could improve customer service strategies.

Over the course of seven weeks, the fellows learned several valuable lessons: speaking the language of their clients, using visualization tools effectively, building a “story” from quantitative findings, and recognizing that some data are nonsense and contain little information.

“We were given the opportunity to learn not only how to analyze data, but how to present it in an easily consumable fashion,” said Melissa Davidson, a graduate of the University of Notre Dame. “Presenting data in a way that everyone can understand is very powerful.”

“I learned the pipeline of a data science project if given a raw data set – how to start from it, how to clean data, perform basic data analysis, and tell stories from data, how to apply machine learning algorithms, and how to deal with clients,” added Xiao Wang, a fifth-year graduate student at the University of Illinois at Urbana-Champaign.

The fellowship also supported participants' career development by helping them seek employment in industry and by inviting data science professionals to talk about their jobs. According to Santosa, all six fellows left the program with promising job leads.