# Methodological Aspects of Predictive Data Analytics and VC-theory

Vapnik-Chervonenkis theory (VC-theory), also known as Statistical Learning Theory, provides a mathematical framework for predictive data-analytic modeling in machine learning, statistics, data mining, signal processing, bioinformatics, etc. VC-theory was developed in the 1970s in the former USSR, but it became widely known only in the mid-1990s, after the introduction of Support Vector Machines. Even though VC-theory is widely known as a mathematical theory, its methodological contributions are not well known or appreciated. This talk focuses on VC-theoretical methodology for estimating data-analytic predictive models. In particular, I will discuss important differences between knowledge discovery in classical science and modern data-analytic knowledge discovery. The classical view (e.g., adopted in statistics and in signal processing) is that knowledge can be discovered by applying readily available statistical/machine learning software to growing volumes of data (aka Big Data): more data, more knowledge. An opposite view is that scientific inquiry starts with asking intelligent questions. That is, science starts from problems, and not from observations (K. Popper). For data-analytic modeling, this view emphasizes proper formalization of application-domain requirements and selection of the proper type of inductive inference.

Vladimir Cherkassky is Professor of Electrical and Computer Engineering at the University of Minnesota, Twin Cities. He received an MS in Operations Research from the Moscow Aviation Institute in 1976 and a PhD in Electrical and Computer Engineering from the University of Texas at Austin in 1985. He has worked on the theory and applications of statistical learning since the late 1980s, and he co-authored the monograph Learning from Data (Wiley-Interscience), now in its second edition. He is also the author of a new textbook, Predictive Learning (see www.VCtextbook.com).

He has served on the editorial boards of IEEE Transactions on Neural Networks (TNN), Neural Networks, Natural Computing, and Neural Processing Letters. He was a Guest Editor of the IEEE TNN Special Issue on VC Learning Theory and Its Applications, published in September 1999. Dr. Cherkassky was the organizer and Director of the NATO Advanced Study Institute (ASI) "From Statistics to Neural Networks: Theory and Pattern Recognition Applications", held in France in 1993. He received the IBM Faculty Partnership Award in 1996 and 1997 for his work on learning methods for data mining. In 2007, he became a Fellow of the IEEE for 'contributions and leadership in statistical learning'. In 2008, he received the A. Richard Newton Breakthrough Award from Microsoft Research for 'development of new methodologies for predictive learning'.