Sampling choice data from geo-referenced populations: issues to consider

30 September, 2019
Verona, Italy

Stated choice experiments are now routinely used to elicit preferences and derive welfare estimates in environmental, health, transport and marketing applications. The approach entails asking a sample of respondents to identify their preferred alternative from a number of competing hypothetical alternatives. When deciding on the appropriate design of this sample, a range of factors must be considered, including inclusion criteria, sampling techniques and the intended analysis, interpretation and reporting of the results. Alongside these factors, and although it is typically dictated by budget constraints, the sample size must be chosen to ensure adequate power to detect statistically significant effects. If the sample is too large, resources are wasted; if it is too small, the estimates may be too imprecise to be useful.

An optimal sampling design, therefore, entails a trade-off between accuracy of prediction (which requires larger samples) and the cost of sampling (which limits the sample size). Consequently, sampling efficiency, which is a measure of the optimality of a sampling strategy, is a much sought-after property. A more efficient sampling strategy requires a smaller sample size (and budget) to reach a given level of accuracy. One way to achieve this, which has attracted considerable attention in the stated choice experiment literature, is through the use of efficient experimental designs. However, while efficient experimental designs are now commonplace in this literature, any gain in efficiency afforded by these designs is largely dependent on the sampling strategy itself. Experimental designs are very important for stated choice experiments and have, indeed, been shown to influence sampling efficiency, but the manner in which the sampling units are selected from the population is perhaps of greater (and more obvious) consequence. Yet this issue has attracted much less attention.

An important, and commonly overlooked, disadvantage of classical random sampling is that it ignores any spatial dependence. Yet spatial autocorrelation is ubiquitous in spatial populations. Indeed, there is established evidence that spatial autocorrelation and spatial heterogeneity are inherent in stated preference data. This calls into question the design and execution of many of the sampling schemes currently used by stated preference practitioners. Where spatial dependence exists, random sampling can lead to data redundancy. In extreme cases, this redundancy may seriously impede the efficiency of conventional sampling techniques, so it needs to be addressed. Despite numerous studies showing that spatial variation is present in stated choice data, the literature lacks guidance on how to select a sampling strategy that recognises it. Given the well-known geographical principle that observations close to each other are more likely to be similar than observations further apart, taking samples close to one another may not increase prediction accuracy but will increase costs. The importance of this is explored here.
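
The redundancy argument above can be illustrated with a minimal sketch. It is not taken from the talk: it assumes, purely for illustration, that the correlation between two respondents decays exponentially with the distance between them, and the function names, the `range_km` parameter and the two sets of coordinates are all hypothetical. Under that assumption, the variance of the sample mean can be computed directly and converted into an "effective" sample size, that is, the number of independent observations that would give the same precision.

```python
import math

def mean_variance(points, sigma2=1.0, range_km=10.0):
    """Variance of the sample mean when the correlation between two
    locations is assumed to be exp(-distance / range_km)."""
    n = len(points)
    total = 0.0
    for x1, y1 in points:
        for x2, y2 in points:
            d = math.hypot(x1 - x2, y1 - y2)
            total += sigma2 * math.exp(-d / range_km)
    return total / n ** 2

def effective_sample_size(points, sigma2=1.0, range_km=10.0):
    """Number of independent observations giving the same variance
    of the mean as the spatially correlated sample."""
    return sigma2 / mean_variance(points, sigma2, range_km)

# Hypothetical samples of 25 respondents each (coordinates in km):
# one tightly clustered (1 km spacing), one spread out (25 km spacing).
clustered = [(i * 1.0, j * 1.0) for i in range(5) for j in range(5)]
spread = [(i * 25.0, j * 25.0) for i in range(5) for j in range(5)]

print("clustered:", round(effective_sample_size(clustered), 2))
print("spread:   ", round(effective_sample_size(spread), 2))
```

Both samples cost the same 25 interviews, but under the assumed correlation structure the clustered sample carries the information of far fewer independent respondents than the spread sample does.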

Danny Campbell
Professor of Economics