On estimating variances for Gini coefficients with complex surveys: theory and application




Hoque, Ahmed




Obtaining variances for the plug-in estimator of the Gini coefficient of inequality has preoccupied researchers for decades. The analytic formulae proposed are often regarded as too cumbersome to apply and usually assume an iid structure. We examine several variance estimation techniques for a Gini coefficient estimator obtained from a complex survey, a sampling design often used to collect sample data in inequality studies. In the first part of the dissertation, we prove that Bhattacharya’s (2007) asymptotic variance estimator for data arising from a complex survey is equivalent to an asymptotic variance estimator derived by Binder and Kovačević (1995) over a decade earlier. In addition, to aid applied researchers, we show how auxiliary regressions can be used to generate the plug-in Gini estimator and its asymptotic variance, irrespective of the sampling design. In the second part of the dissertation, we use Monte Carlo (MC) simulations with 36 data generating processes under beta, lognormal, chi-square, and Pareto distributional assumptions, with sample data obtained under various complex survey designs, to explore two finite sample properties: the bias of the Gini coefficient estimator and the empirical coverage probabilities of interval estimators for the Gini coefficient. We find high sensitivity to the number of strata and the underlying distribution of the population data. We compare the performance of four interval estimators under a complex survey design: two standard normal (SN) approximation interval estimators using the asymptotic variance estimators of Binder and Kovačević (1995) and Bhattacharya (2007), a third SN approximation interval estimator using a traditional bootstrap variance estimator, and a standard MC bootstrap percentile interval estimator.
With few exceptions, namely small samples and/or highly skewed population distributions, where the bootstrap methods work relatively better, the SN approximation interval estimators using asymptotic variances perform quite well. Finally, health data on the body mass index of Bangladeshi women and the hemoglobin levels of Bangladeshi children are used as illustrations. Inequality analysis of these two important indicators provides a better understanding of the health status of women and children. Our empirical results show that statistical inferences regarding inequality in these well-being variables, measured by the Gini coefficient, give equivalent outcomes whether they are based on Binder and Kovačević’s or Bhattacharya’s asymptotic variance estimator. Although the bootstrap approach often generates slightly smaller variance estimates in small samples, the hypothesis test results and the widths of interval estimates using this method are practically similar to those using the asymptotic variance estimators. Our results are useful, both theoretically and practically, because the asymptotic variance estimators are simpler and faster to calculate than the bootstrap methods that researchers have often advocated. These findings suggest that applied researchers can often be comfortable undertaking inferences about the inequality of a well-being variable using the Gini coefficient with asymptotic variance estimators that are not difficult to calculate, irrespective of whether the sample data are obtained under a complex survey or a simple random sample design.
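To make the bootstrap-based interval estimators mentioned above concrete, the following is a minimal Python sketch, not the dissertation's own implementation. It assumes a generic weighted plug-in Gini estimator computed from the area under the weighted Lorenz curve, and a naive bootstrap that resamples primary sampling units (PSUs) with replacement within each stratum. The function names `weighted_gini` and `bootstrap_gini`, the synthetic data, and the choice to redraw the same number of PSUs per stratum are all illustrative assumptions; the asymptotic estimators of Binder and Kovačević (1995) and Bhattacharya (2007) are not reproduced here.

```python
import numpy as np


def weighted_gini(y, w):
    """Plug-in Gini from the weighted Lorenz curve (trapezoid rule).

    Assumes non-negative y with a positive weighted total.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(y, kind="stable")
    y, w = y[order], w[order]
    cum = np.cumsum(w * y)                      # cumulative weighted totals
    prev = np.concatenate(([0.0], cum[:-1]))    # totals up to the previous unit
    return 1.0 - np.sum(w * (prev + cum)) / (w.sum() * cum[-1])


def bootstrap_gini(y, w, stratum, psu, n_boot=500, seed=0):
    """Naive bootstrap replicates: resample PSUs within strata (assumption)."""
    rng = np.random.default_rng(seed)
    y, w, stratum, psu = map(np.asarray, (y, w, stratum, psu))
    # Row indices of each PSU, grouped by stratum.
    groups = {}
    for h in np.unique(stratum):
        in_h = stratum == h
        groups[h] = [np.flatnonzero(in_h & (psu == c))
                     for c in np.unique(psu[in_h])]
    reps = np.empty(n_boot)
    for b in range(n_boot):
        rows = []
        for clusters in groups.values():
            # Draw as many PSUs as were sampled in this stratum.
            draw = rng.integers(0, len(clusters), size=len(clusters))
            rows.extend(clusters[k] for k in draw)
        idx = np.concatenate(rows)
        reps[b] = weighted_gini(y[idx], w[idx])
    return reps


# Synthetic illustration only: 3 strata, 6 PSUs each, 10 units per PSU.
rng = np.random.default_rng(1)
stratum = np.repeat(np.arange(3), 60)
psu = np.tile(np.repeat(np.arange(6), 10), 3)
y = rng.lognormal(mean=0.0, sigma=0.5, size=180)
w = rng.uniform(1.0, 3.0, size=180)

est = weighted_gini(y, w)
reps = bootstrap_gini(y, w, stratum, psu, n_boot=300, seed=2)
se = reps.std(ddof=1)
sn_interval = (est - 1.96 * se, est + 1.96 * se)          # SN approximation
pct_interval = tuple(np.percentile(reps, [2.5, 97.5]))    # percentile interval
```

The SN interval pairs the point estimate with a bootstrap standard error, while the percentile interval reads its limits directly from the replicate distribution. A production analysis would typically also rescale the weights within each replicate, which this sketch omits.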



Gini coefficient, complex survey, variance estimation, asymptotic, body mass index, hemoglobin, Monte Carlo simulation, Demographic and Health Survey, Bangladesh