Track Me but Not Really: Tracking Undercoverage in Metered Data Collection
Date:
Description: Metered data, also called “web-tracking data”, has been proposed as a potentially useful way of measuring online behaviours, since it allows web browsing to be observed unobtrusively and without relying on fallible self-reports. Metered data is generally collected from a sample of respondents who willingly install or configure tracking technologies on their devices; these technologies record the digital traces people leave when they go online (e.g. URLs visited).
To track the complete online behaviour of an individual, tracking technologies must be installed on, and track, all the devices, browsers/apps and/or networks (hereafter, targets) that the individual uses to go online. When only a subset of targets is tracked, only an incomplete record of online behaviour is observed. This undercoverage can degrade metered data quality, potentially producing large biases in population estimates. Although little is currently known about this type of undercoverage, past research indicates that a range of factors can prevent researchers from tracking all the targets that participants use (Bosch and Revilla, 2021), and that a high proportion of individuals participating in metered studies are undercovered (Pew Research Center, 2020).
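The notion of undercoverage above can be made concrete with a small sketch. It assumes a hypothetical per-participant record holding the targets a participant reports using, the targets the tracking technology actually covers, and a count of visits per target; `Participant` and `undercoverage_rate` are illustrative names, not part of any existing tool or of the authors' pipeline. A participant counts as undercovered when at least one used target is untracked, and only visits on tracked targets reach the metered record.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record: targets the participant reports using, targets
    # the tracking technology covers, and true visit counts per target.
    used: set[str]
    tracked: set[str]
    visits: dict[str, int]

    def is_undercovered(self) -> bool:
        # Undercovered if at least one used target is not tracked.
        return bool(self.used - self.tracked)

    def observed_visits(self) -> int:
        # Only traces from tracked targets enter the metered record.
        return sum(n for target, n in self.visits.items() if target in self.tracked)

    def true_visits(self) -> int:
        # Complete behaviour across all targets, observed or not.
        return sum(self.visits.values())


def undercoverage_rate(panel: list[Participant]) -> float:
    """Share of participants with at least one untracked target."""
    return sum(p.is_undercovered() for p in panel) / len(panel)


if __name__ == "__main__":
    panel = [
        Participant({"laptop", "phone"}, {"laptop", "phone"}, {"laptop": 10, "phone": 20}),
        Participant({"laptop", "phone"}, {"laptop"}, {"laptop": 5, "phone": 15}),
    ]
    print(undercoverage_rate(panel))  # 0.5
    print(panel[1].observed_visits(), panel[1].true_visits())  # 5 20
```

In this toy panel the second participant's phone is untracked, so the metered record shows 5 visits instead of 20: the gap between the two figures is the kind of measurement error that undercoverage introduces.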
To assess the impact of this type of undercoverage on the quality of metered data estimates, we collected metered data, survey data and paradata in Spain, Portugal and Italy. Combining paradata with participants’ self-reports of the targets they use and the targets that are tracked, we show the prevalence and characteristics of undercoverage. In addition, using metered data from the subsample of fully covered individuals and simulations, we estimate the extent to which, and the mechanisms through which, undercoverage biases metered data estimates, both univariate (e.g. means and proportions) and bivariate (e.g. correlation and regression coefficients). These estimates are computed for different levels of device, browser and app undercoverage, expressed as the percentage of participants affected.
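The simulation idea can be illustrated with a minimal sketch, assuming a fully covered subsample in which each participant's visits are split across two hypothetical targets (desktop and mobile) and paired with one survey covariate. For a given undercoverage level, a random share of participants loses its mobile data, and the average bias in the mean and in the Pearson correlation with the covariate is computed over repeated draws. The names `simulate_bias` and `pearson`, the two-target split and the synthetic data are all assumptions for illustration; the abstract does not specify the authors' simulation design.

```python
from __future__ import annotations

import random
import statistics


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_bias(desktop: list[float], mobile: list[float],
                  covariate: list[float], rate: float,
                  reps: int = 200, seed: int = 1) -> tuple[float, float]:
    """Average bias of the mean and of the correlation with `covariate`
    when a share `rate` of fully covered participants loses its mobile data.

    Hypothetical design: the true value is desktop + mobile visits; an
    undercovered participant contributes desktop visits only.
    """
    rng = random.Random(seed)
    n = len(desktop)
    true_total = [d + m for d, m in zip(desktop, mobile)]
    true_mean = statistics.fmean(true_total)
    true_r = pearson(true_total, covariate)
    k = round(rate * n)  # number of participants made undercovered
    mean_bias = r_bias = 0.0
    for _ in range(reps):
        lost = set(rng.sample(range(n), k))
        observed = [d if i in lost else d + m
                    for i, (d, m) in enumerate(zip(desktop, mobile))]
        mean_bias += statistics.fmean(observed) - true_mean
        r_bias += pearson(observed, covariate) - true_r
    return mean_bias / reps, r_bias / reps


if __name__ == "__main__":
    # Synthetic illustrative data, not study results.
    gen = random.Random(0)
    n = 500
    cov = [gen.gauss(0, 1) for _ in range(n)]
    desk = [max(0.0, 50 + 10 * c + gen.gauss(0, 10)) for c in cov]
    mob = [max(0.0, 80 - 15 * c + gen.gauss(0, 10)) for c in cov]
    for rate in (0.0, 0.25, 0.5):
        mb, rb = simulate_bias(desk, mob, cov, rate)
        print(f"rate={rate:.2f} mean bias={mb:.2f} correlation bias={rb:.3f}")
```

Because losing a target can only remove visits, the mean is biased downwards whenever any participant is undercovered; the direction and size of the correlation bias instead depend on how the untracked target relates to the covariate, which is why bivariate estimates need to be examined separately from univariate ones.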