Ultra high‐dimensional semiparametric longitudinal data analysis
Abstract
As ultra high‐dimensional longitudinal data become increasingly common in fields such as public health and bioinformatics, developing flexible methods that yield sparse models is of high interest. In this setting, the dimension of the covariates can potentially grow exponentially, at the rate exp(n^{1/2}), in the number of clusters n. We consider a flexible semiparametric approach, namely, partially linear single‐index models, for ultra high‐dimensional longitudinal data. Most importantly, we allow not only the partially linear covariates but also the single‐index covariates, which enter through an unknown flexible function estimated nonparametrically, to be ultra high dimensional. Using penalized generalized estimating equations, this approach can capture correlation within subjects, can perform simultaneous variable selection and estimation with a smoothly clipped absolute deviation penalty, and can capture nonlinearity and potentially some interactions among predictors. We establish asymptotic theory for the estimators, including the oracle property in ultra high dimension for both the partially linear and nonparametric components, and we present an efficient algorithm to handle the computational challenges. We demonstrate the effectiveness of our method and algorithm via a simulation study and an analysis of yeast cell cycle gene expression data.
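For readers unfamiliar with the smoothly clipped absolute deviation (SCAD) penalty mentioned above, the following is a minimal sketch of the standard penalty and its derivative in their usual textbook form. It is not the authors' implementation: the function names, the argument names, and the default a = 3.7 (a commonly used value) are assumptions made for illustration, and the sketch does not include the penalized generalized estimating equations or the estimation algorithm of the paper.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty at a scalar coefficient theta (illustrative sketch).

    lam is the tuning parameter (lam > 0); a > 2 controls where the penalty flattens.
    """
    t = abs(theta)
    if t <= lam:
        # Small coefficients: behaves like the lasso (L1) penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic transition region.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no further shrinkage.
    return lam * lam * (a + 1) / 2


def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |theta| (illustrative sketch)."""
    t = abs(theta)
    if t <= lam:
        return lam
    # Decays linearly to zero on (lam, a * lam], then stays at zero.
    return max(a * lam - t, 0.0) / (a - 1)


if __name__ == "__main__":
    # Hypothetical coefficient values chosen only to show the three regions.
    for theta in (0.5, 2.0, 10.0):
        print(theta, scad_penalty(theta, 1.0), scad_derivative(theta, 1.0))
```

Because the derivative is zero for large coefficients, strong signals are left essentially unshrunk while small coefficients are penalized like the lasso; this behavior is what typically underlies oracle-type results for SCAD-penalized estimators.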