Permutation in regression revisited: the residual route proven optimal theoretically
dc.contributor.author | Kim, Soojeong | |
dc.contributor.supervisor | Zhang, Xuekui | |
dc.date.accessioned | 2025-09-08T21:14:18Z | |
dc.date.available | 2025-09-08T21:14:18Z | |
dc.date.issued | 2025 | |
dc.degree.department | Department of Mathematics and Statistics | |
dc.degree.level | Master of Science MSc | |
dc.description.abstract | The assumptions of the classical linear model are never met in practice. Recent evidence shows that such violations inflate Type I error as sample size grows, while simple permutation tests can restore control in single-predictor regressions. Yet in multiple regression, practitioners face a confusing menu of residual- and raw-data shuffling schemes, with little theory to guide the choice. We develop the first closed-form, finite-sample comparison of six widely used permutation strategies for a coefficient of interest in the presence of nuisance covariates. We derive exact means and variances of the permuted estimator, and we establish its asymptotic distribution. Based on this, we discuss the Type I error and power of each permutation strategy, as well as how these are affected by correlation between the covariates and the focal predictor. The analysis reveals that (i) the three residual-based schemes (permuting response residuals, predictor residuals, or both) are identically distributed; they match the true null up to second moments in finite samples and match in distribution as n → ∞, guaranteeing valid Type I error control. (ii) Raw-data permutations behave unpredictably: shuffling the response is overly conservative, shuffling the predictor is liberal when covariates are correlated, and shuffling both can be unstable. Closed-form results quantify how predictor–covariate correlation, error variance, and sample size drive these patterns and specify the Monte Carlo sample size needed for accurate p-values. Extensive simulations confirm the theory: residual permutations maintain the nominal error rate and retain power comparable to the classical linear model when its assumptions hold, whereas raw-data schemes either inflate or deflate Type I error and sacrifice power. 
The work reconciles decades of ad hoc practice, provides actionable guidelines, and equips analysts with a principled, computationally feasible framework for exact inference in large-sample regression. | |
dc.description.embargo | 2026-08-14 | |
dc.description.scholarlevel | Graduate | |
dc.identifier.uri | https://hdl.handle.net/1828/22734 | |
dc.language | English | eng |
dc.language.iso | en | |
dc.rights | Available to the World Wide Web | |
dc.subject | Permutation Test | |
dc.subject | Regression Analysis | |
dc.subject | Residual-maker Matrices | |
dc.subject | Type I Error Rate | |
dc.subject | Random Permutation | |
dc.title | Permutation in regression revisited: the residual route proven optimal theoretically | |
dc.type | Thesis |
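
The residual-based route summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis's own code: the function name `residual_permutation_test`, its parameters, and the choice to permute the response residuals (one of the three residual schemes the abstract names) are assumptions made for illustration. Both the response and the focal predictor are residualized on the nuisance covariates with the residual-maker matrix M = I − Z(ZᵀZ)⁻¹Zᵀ, and the response residuals are then shuffled to build a Monte Carlo null distribution for the coefficient.

```python
import numpy as np

def residual_permutation_test(y, x, Z, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the coefficient of x,
    adjusting for nuisance covariates Z (hypothetical helper, illustration only)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Zc = np.column_stack([np.ones(n), Z])      # nuisance design with intercept
    # Residual-maker matrix M = I - Zc (Zc'Zc)^{-1} Zc'
    M = np.eye(n) - Zc @ np.linalg.pinv(Zc)
    ry = M @ y                                 # response residuals
    rx = M @ x                                 # predictor residuals
    sxx = rx @ rx
    beta_obs = (rx @ ry) / sxx                 # equals the full-model OLS coefficient of x
    beta_perm = np.empty(n_perm)
    for b in range(n_perm):
        # Shuffle response residuals, keep predictor residuals fixed
        beta_perm[b] = (rx @ rng.permutation(ry)) / sxx
    # Add-one correction keeps the Monte Carlo p-value strictly positive
    p_value = (1 + np.sum(np.abs(beta_perm) >= abs(beta_obs))) / (n_perm + 1)
    return beta_obs, p_value

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n = 150
    Z = rng.normal(size=(n, 2))
    x = 0.6 * Z[:, 0] + rng.normal(size=n)     # focal predictor correlated with a covariate
    y = Z @ np.array([0.5, -0.3]) + rng.normal(size=n)  # null: x has no effect
    print(residual_permutation_test(y, x, Z))
```

The number of permutations `n_perm` controls the resolution of the p-value (the smallest attainable value is 1 / (n_perm + 1)); the default here is an arbitrary illustrative choice, not the sample size derived in the thesis.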