Permutation in regression revisited: the residual route proven optimal theoretically

Kim, Soojeong

dc.contributor.author	Kim, Soojeong
dc.contributor.supervisor	Zhang, Xuekui
dc.date.accessioned	2025-09-08T21:14:18Z
dc.date.available	2025-09-08T21:14:18Z
dc.date.issued	2025
dc.degree.department	Department of Mathematics and Statistics
dc.degree.level	Master of Science MSc
dc.description.abstract	The assumptions of the classical linear model are never met in practice. Recent evidence shows that such violations inflate Type I error as sample size grows, while simple permutation tests can restore control in single-predictor regressions. Yet in multiple regression, practitioners face a confusing menu of residual- and raw-data shuffling schemes, with little theory to guide the choice. We develop the first closed-form, finite-sample comparison of six widely used permutation strategies for a coefficient of interest in the presence of nuisance covariates. We derive exact means and variances of the permuted estimator, and we establish its asymptotic distribution. Based on this, we discuss the Type I error and power of each permutation strategy, as well as how these are affected by correlation between the covariates and the focal predictor. The analysis reveals that (i) the three residual-based schemes (permuting response residuals, predictor residuals, or both) are identically distributed; they match the true null up to second moments in finite samples and match in distribution as n → ∞, guaranteeing valid Type I error control. (ii) Raw-data permutations behave unpredictably: shuffling the response is overly conservative, shuffling the predictor is liberal when covariates are correlated, and shuffling both can be unstable. Closed-form results quantify how predictor–covariate correlation, error variance, and sample size drive these patterns and specify the Monte Carlo sample size needed for accurate p-values. Extensive simulations confirm the theory: residual permutations maintain nominal error and retain power comparable to the classical linear model when assumptions hold, whereas raw-data schemes either inflate or deflate Type I error and sacrifice power.
The work reconciles decades of ad-hoc practice, provides actionable guidelines, and equips analysts with a principled, computationally feasible framework for exact inference in large-sample regression.
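As an illustration of the residual-based schemes described in the abstract, the following is a minimal Python sketch, not the thesis's own implementation. It assumes NumPy, adds an intercept to the nuisance covariates, and permutes the response residuals. The function names (`residual_maker`, `residual_permutation_test`), the `n_perm` parameter, the slope statistic, and the two-sided p-value convention are all assumptions made for this example.

```python
import numpy as np

def residual_maker(Z):
    """Residual-maker matrix M = I - H, where H projects onto [1, Z].

    Assumption: an intercept column is added to the nuisance covariates Z.
    """
    n = Z.shape[0]
    Z1 = np.column_stack([np.ones(n), Z])
    # Z1 @ pinv(Z1) is the orthogonal projection onto the column space of Z1.
    return np.eye(n) - Z1 @ np.linalg.pinv(Z1)

def residual_permutation_test(y, x, Z, n_perm=999, seed=0):
    """Permutation test for the coefficient of x, adjusting for Z.

    Sketch of one residual-based scheme: shuffle the response residuals
    and recompute the partial slope against the predictor residuals.
    """
    rng = np.random.default_rng(seed)
    M = residual_maker(Z)
    e_y = M @ y            # response residuals after removing Z
    e_x = M @ x            # predictor residuals after removing Z
    denom = e_x @ e_x      # unchanged by any permutation of e_x or e_y

    # Observed partial slope; equals the OLS coefficient of x in y ~ 1 + x + Z.
    beta_hat = (e_x @ e_y) / denom

    # Permutation distribution of the slope under shuffled response residuals.
    perm = np.array([(e_x @ rng.permutation(e_y)) / denom
                     for _ in range(n_perm)])

    # Two-sided Monte Carlo p-value with the usual +1 correction (an assumption).
    p_value = (1 + np.sum(np.abs(perm) >= np.abs(beta_hat))) / (n_perm + 1)
    return beta_hat, p_value
```

In this sketch the denominator is fixed and the numerator is a dot product between one residual vector and a uniformly shuffled copy of the other, so shuffling `e_x` instead of `e_y`, or shuffling both, yields the same permutation distribution. That is consistent with point (i) of the abstract, although the thesis may define its estimators differently.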
dc.description.embargo	2026-08-14
dc.description.scholarlevel	Graduate
dc.identifier.uri	https://hdl.handle.net/1828/22734
dc.language	English	eng
dc.language.iso	en
dc.rights	Available to the World Wide Web
dc.subject	Permutation Test
dc.subject	Regression Analysis
dc.subject	Residual-maker Matrices
dc.subject	Type I Error Rate
dc.subject	Random Permutation
dc.title	Permutation in regression revisited: the residual route proven optimal theoretically
dc.type	Thesis

Files

Original bundle

Name: Kim_Soojeong_MSc_2025.pdf
Size: 4.33 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission

Collections

Electronic Theses and Dissertations (ETD)