Hypothesis Testing for Matched Pairs with Missing Data by Maximum Mean Discrepancy: An Application to Continuous Glucose Monitoring

A frequent problem in statistical science is the proper handling of missing data in matched paired observations. There is a large body of literature on the univariate case. Yet ongoing technological progress in measuring biological systems creates a need to analyze more complex data, e.g., graphs, strings, and probability distributions. To fill this gap, this paper proposes new estimators of the maximum mean discrepancy (MMD) for complex matched pairs with missing data. These estimators can detect differences between data distributions under different missingness assumptions. The validity of the approach is proven, statistical consistency results are provided, and its behavior is further studied in an extensive simulation study. Data from continuous glucose monitoring in a longitudinal population-based diabetes study illustrate the application of the approach. By combining new distributional representations with cluster analysis, new clinical criteria describing how glucose changes vary at the distributional level over 5 years can be explored.
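To make the core quantity concrete, the following is a minimal sketch of the standard unbiased U-statistic estimate of the squared MMD for fully observed matched pairs, combined with a permutation test that randomly swaps the two members of each pair. It is not the missing-data estimator proposed in this paper. It assumes real-valued vector observations, a Gaussian kernel with a fixed bandwidth, missing pair members encoded as `None`, and incomplete pairs simply dropped (the naive complete-case baseline). All function names (`gaussian_kernel`, `paired_mmd2`, `paired_permutation_test`, `complete_cases`) are hypothetical.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on real vectors given as sequences of floats (assumed kernel)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def complete_cases(pairs):
    """Drop pairs with a missing member (None): the naive complete-case baseline."""
    return [(x, y) for x, y in pairs if x is not None and y is not None]


def paired_mmd2(pairs, kernel=gaussian_kernel):
    """Unbiased U-statistic estimate of MMD^2 from complete pairs (x_i, y_i).

    Averages h(z_i, z_j) = k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(x_j, y_i)
    over all ordered index pairs i != j.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    total = 0.0
    for i in range(n):
        xi, yi = pairs[i]
        for j in range(n):
            if i == j:
                continue
            xj, yj = pairs[j]
            total += (kernel(xi, xj) + kernel(yi, yj)
                      - kernel(xi, yj) - kernel(xj, yi))
    return total / (n * (n - 1))


def paired_permutation_test(pairs, n_perm=200, seed=0, kernel=gaussian_kernel):
    """Return (observed MMD^2, permutation p-value) using within-pair swaps.

    Each permutation independently swaps x_i and y_i with probability 1/2,
    which keeps the pairing structure intact.
    """
    rng = random.Random(seed)
    observed = paired_mmd2(pairs, kernel)
    exceed = 0
    for _ in range(n_perm):
        swapped = [(y, x) if rng.random() < 0.5 else (x, y) for x, y in pairs]
        if paired_mmd2(swapped, kernel) >= observed:
            exceed += 1
    # The +1 correction includes the observed labeling among the permutations.
    return observed, (exceed + 1) / (n_perm + 1)
```

Swapping within pairs, rather than shuffling all observations, is valid under the assumption that the two members of a pair are exchangeable when the null hypothesis of equal distributions holds. Dropping incomplete pairs, as `complete_cases` does, ignores the missingness mechanism; accounting for incomplete pairs under different missingness assumptions is what the estimators proposed in this paper address, and this sketch does not attempt that.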

Keywords: Paired missing data; Distributional representations; Kernel methods