Baris Ucar
Hacettepe University and DEPS, University of Siena
Gianni Betti
DEPS, University of Siena
Abstract
The aim of this study is to identify an appropriate procedure for conducting statistical matching with longitudinal data sets. To our knowledge, none of the studies concerned with statistical matching of longitudinal data has specifically covered or identified this issue. The longitudinal data set used in this study carries a longitudinal weight at the individual level, which requires further processing before matching can be applied. The study discusses and proposes ways to deal with statistical matching for such data sets. A four-year longitudinal data set is used, and the data from each year are matched with a cross-sectional data set for the corresponding year. The matching procedure consists of two steps: the Renssen (1998) method followed by nearest neighbor distance hot deck matching, as proposed by D’Orazio (2016) and Donatiello et al. (2015). The application is carried out on Turkish data to impute the consumption expenditure variable from the Household Budget Survey (HBS) into the Statistics on Income and Living Conditions (SILC) Survey. A synthetic longitudinal data set is created from these two surveys. The two data sets share many variables, including income, which makes the Conditional Independence Assumption (CIA) more plausible. The study also carries out validation analyses to assess the quality of the matching procedure. The findings obtained with the proposed procedure indicate that the distribution of consumption expenditure in the synthetic data set is well preserved across all estimates. Poverty indicators, both overall and at the household level, are also examined, and the results indicate a good-quality match.
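As a rough illustration of the second step only, the following minimal Python sketch shows the general idea behind nearest neighbor distance hot deck imputation: each recipient record receives the value observed for its closest donor on the common variables. It is not the procedure applied in the study. The function name `nnd_hotdeck`, the variable names, and the toy records are all hypothetical, and the sketch leaves out the Renssen (1998) step, survey weights, donation classes, and any standardization of the matching variables.

```python
import math


def nnd_hotdeck(recipients, donors, match_vars, donate_var):
    """Copy donate_var from the nearest donor to each recipient.

    Distance is plain Euclidean distance on the unscaled match_vars
    (an assumption made for illustration; other distances are possible).
    """
    matched = []
    for rec in recipients:
        # Find the donor with the smallest distance to this recipient
        best = min(
            donors,
            key=lambda d: math.sqrt(
                sum((rec[v] - d[v]) ** 2 for v in match_vars)
            ),
        )
        # Build a new record so the recipient file is left unchanged
        matched.append({**rec, donate_var: best[donate_var]})
    return matched


# Hypothetical toy records, not taken from HBS or SILC
hbs = [
    {"income": 10.0, "hh_size": 2, "consumption": 8.0},
    {"income": 30.0, "hh_size": 4, "consumption": 22.0},
    {"income": 55.0, "hh_size": 3, "consumption": 35.0},
]
silc = [
    {"income": 12.0, "hh_size": 2},
    {"income": 50.0, "hh_size": 3},
]

synthetic = nnd_hotdeck(silc, hbs, ["income", "hh_size"], "consumption")
```

In this toy setting the donor file plays the role of HBS and the recipient file the role of one SILC wave; in a four-year panel the same call would be repeated once per year against the cross-sectional donor file for that year.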