Clustering Mixed-Type Data

Thursday, November 8, 2018 - 9:50am - 10:20am
Lind 305
Marianthi Markatou (University at Buffalo (SUNY))
To combat the opioid epidemic and its associated infectious diseases effectively, we must elucidate the factors associated with initiation of, adherence to, and completion of therapy. However, the patient population itself makes these factors harder to identify. Substance users are characterized by a lack of medical care, generally limited financial resources, and low health and educational literacy. The recent focus on social determinants of health, as well as on factors associated with substance use, attempts to provide a framework for identifying patient subgroups likely to benefit from medical treatment.

Many of the variables measured in this context are of mixed type, combining interval-scale and categorical (nominal and/or ordinal) variables. Current clustering algorithms for mixed-scale variables suffer from at least one of two challenges: 1) they are unable to equitably balance the contribution of continuous and categorical variables without strong parametric assumptions; 2) they are unable to properly handle data sets in which only a subset of the variables is related to the underlying cluster structure of interest. We present KAMILA, a clustering method that addresses (1) and, in many situations, (2) without requiring strong assumptions. We study theoretical aspects of our method and demonstrate its performance using simulated and real data.
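
Challenge (1) can be illustrated with a small sketch. This is not KAMILA; it is a hypothetical toy example of the naive approach, in which a squared Euclidean distance on a continuous variable is added to a 0/1 mismatch penalty on a categorical variable. The function names, the three data points, and the weight values are all assumptions made for illustration. The point is only that the nearest neighbor of a record can flip depending on an arbitrary user-chosen weight for the categorical part.

```python
def mixed_distance(a, b, cat_weight):
    """Naive mixed-type distance (illustrative only, not KAMILA).

    Each record is a (continuous_value, category_label) pair. The
    continuous part contributes a squared difference; the categorical
    part contributes cat_weight when the labels differ, else 0.
    """
    (xa, ca), (xb, cb) = a, b
    continuous_part = (xa - xb) ** 2
    categorical_part = cat_weight * (0.0 if ca == cb else 1.0)
    return continuous_part + categorical_part


# Three hypothetical records: (standardized continuous value, category).
p = (0.0, "A")
q = (0.8, "A")  # same category as p, farther on the continuous variable
r = (0.1, "B")  # different category, close on the continuous variable


def nearest_to_p(cat_weight):
    """Return which of q or r is closer to p under the given weight."""
    d_q = mixed_distance(p, q, cat_weight)
    d_r = mixed_distance(p, r, cat_weight)
    return "q" if d_q < d_r else "r"


# With a small categorical weight the continuous variable dominates,
# and with a larger weight the categorical variable dominates.
low_weight_neighbor = nearest_to_p(0.25)
high_weight_neighbor = nearest_to_p(1.0)
```

With these assumed values, the record nearest to p changes from r to q as the categorical weight moves from 0.25 to 1.0, so any cluster assignment built on such a distance depends on a weight that the data alone do not determine.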