PharmaSUG 2017 - Paper SP01

Multiple Imputation: A Statistical Programming Story
Chris Smith, Cytel Inc., Cambridge, MA
Scott Kosten, DataCeutics Inc., Boyertown, PA

 

ABSTRACT

Multiple imputation (MI) is a technique for handling missing data. MI is becoming an increasingly popular method for sensitivity analyses used to assess the impact of missing data. The statistical theory behind MI is an active and evolving field of research for statisticians. As statistical programmers, it is important to understand the technique in order to collaborate with statisticians on the recommended MI method. In SAS/STAT® software, MI is performed using the MI and MIANALYZE procedures in conjunction with other standard analysis procedures (e.g., the FREQ, GENMOD, or MIXED procedures). We will describe the 3-step process used to perform MI analyses. Our goal is to remove some of the mystery behind these procedures and to address typical misunderstandings of the MI process. We will also illustrate, through an example-driven discussion, how multiply imputed data can be represented using ADaM standards and principles. Lastly, we will present a run-time simulation to determine how the number of imputations influences the MI process. SAS® 9.4 M2 and SAS/STAT 13.2 software were used in the examples presented, but we will call out any version dependencies throughout the text. This paper is written for all levels of SAS users. While we present a statistical programmer's perspective, an introductory-level understanding of statistics, including p-values, hypothesis testing, confidence intervals, mixed models, and regression, is beneficial.
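
As a preview of the workflow discussed in the paper, the sketch below outlines the 3-step MI process in SAS. The data set and variable names (ADEFF, AVAL, BASE), the number of imputations, and the seed are illustrative assumptions rather than values taken from the paper; a production analysis would follow the model and imputation strategy specified in the statistical analysis plan.

   /* Step 1: PROC MI creates NIMPUTE completed copies of the data      */
   /* (data set and variable names here are illustrative assumptions)   */
   proc mi data=adeff out=adeff_mi nimpute=25 seed=54321;
      var base aval;
   run;

   /* Step 2: fit the same analysis model to each imputed copy,         */
   /* driven by the _Imputation_ variable that PROC MI adds             */
   proc reg data=adeff_mi outest=est covout noprint;
      by _imputation_;
      model aval = base;
   run;

   /* Step 3: PROC MIANALYZE pools the estimates across imputations     */
   /* using Rubin's combining rules                                     */
   proc mianalyze data=est;
      modeleffects intercept base;
   run;

The paper walks through each of these steps in turn, including how the stacked imputed data sets can be represented within ADaM structures.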