Beyond conventional P-values: Addressing statistical challenges in big data

dc.contributor.author: Zhang, Jing
dc.contributor.supervisor: Zhang, Xuekui
dc.contributor.supervisor: Tsao, Min
dc.date.accessioned: 2026-01-23T21:46:10Z
dc.date.available: 2026-01-23T21:46:10Z
dc.date.issued: 2026
dc.degree.department: Department of Mathematics and Statistics
dc.degree.level: Master of Science (MSc)
dc.description.abstract: Do larger sample sizes lead to higher false positive rates in statistical analysis? The answer provided by ChatGPT 4o is "no", an opinion shared by many statisticians. However, empirical evidence from analyses of large datasets, such as those from biobanks and single-cell genomics, challenges this conclusion. Common practice assesses both p-values and effect sizes to mitigate the risk of identifying spurious effects in large samples. Nonetheless, whether p-values themselves need to be adjusted in these contexts remains unaddressed, which motivated this investigation. We found that common beliefs and practices do not hold up in real-world data analysis, because the theoretical assumptions behind standard methods are always violated in practice. Growing sample sizes can amplify the impact of these violations, inflating false positive rates. Using a simulation study, we provide examples that support this claim and illustrate a permutation-based remedy. This work is intended to raise awareness within our community of the pressing need to reevaluate standard statistical methods for datasets with huge sample sizes, and thereby to inspire further substantial efforts to tackle this emerging challenge of the big data era.
dc.description.scholarlevel: Graduate
dc.identifier.uri: https://hdl.handle.net/1828/23071
dc.language: English (eng)
dc.language.iso: en
dc.rights: Available to the World Wide Web
dc.subject: Big data
dc.subject: Hypothesis testing
dc.subject: Inflated type I error
dc.subject: Violated model assumptions
dc.title: Beyond conventional P-values: Addressing statistical challenges in big data
dc.type: Thesis
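The mechanism summarized in the abstract, where a fixed assumption violation produces more false positives as the sample grows, can be illustrated with a short simulation. The sketch below is hypothetical and is not the thesis's own simulation study: it assumes one common violation in single-cell data (cells nested within donors, so cells from the same donor are correlated), a naive large-sample z-test that treats every cell as independent, and, as one possible permutation-based remedy, an exact permutation test that reassigns whole donors between groups. All function names and parameter values (5 donors per group, donor SD 0.5, cell SD 1.0, 200 replicates) are illustrative assumptions.

```python
import itertools
import math
import random
import statistics

# Hypothetical illustration, not the thesis's simulation design.
# Under a TRUE null (no group difference), cells from the same donor share a
# random donor offset, so cells are NOT independent observations.


def simulate_groups(n_donors, cells_per_donor, donor_sd, cell_sd, rng):
    """Return two groups; each group is a list of donors, each donor a list of cell values."""
    groups = []
    for _ in range(2):
        donors = []
        for _ in range(n_donors):
            offset = rng.gauss(0.0, donor_sd)  # shared by every cell of this donor
            donors.append([offset + rng.gauss(0.0, cell_sd) for _ in range(cells_per_donor)])
        groups.append(donors)
    return groups[0], groups[1]


def _mean_var(values):
    """Sample mean and unbiased sample variance."""
    m = math.fsum(values) / len(values)
    v = math.fsum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v


def naive_cell_pvalue(cells_a, cells_b):
    """Two-sided large-sample z-test that (wrongly) treats every cell as independent."""
    mean_a, var_a = _mean_var(cells_a)
    mean_b, var_b = _mean_var(cells_b)
    z = (mean_a - mean_b) / math.sqrt(var_a / len(cells_a) + var_b / len(cells_b))
    return math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| >= |z|) for Z ~ N(0, 1)


def donor_permutation_pvalue(group_a, group_b):
    """Exact permutation p-value that reassigns whole donors, not individual cells."""
    means = [statistics.fmean(donor) for donor in group_a + group_b]
    n_a, total = len(group_a), math.fsum(means)

    def stat(idx):
        # Absolute difference between the mean of donor means in each group.
        s_a = math.fsum(means[i] for i in idx)
        return abs(s_a / n_a - (total - s_a) / (len(means) - n_a))

    observed = stat(range(n_a))
    # Enumerate every way to assign n_a donors to group A (252 splits for 5 vs 5).
    splits = list(itertools.combinations(range(len(means)), n_a))
    extreme = sum(1 for idx in splits if stat(idx) >= observed - 1e-12)
    return extreme / len(splits)


def false_positive_rates(cells_per_donor, n_donors=5, n_reps=200, alpha=0.05, seed=1):
    """Fraction of null replicates declared significant by each test."""
    rng = random.Random(seed)
    naive_hits = perm_hits = 0
    for _ in range(n_reps):
        group_a, group_b = simulate_groups(n_donors, cells_per_donor, 0.5, 1.0, rng)
        cells_a = [x for donor in group_a for x in donor]
        cells_b = [x for donor in group_b for x in donor]
        naive_hits += (naive_cell_pvalue(cells_a, cells_b) < alpha)
        perm_hits += (donor_permutation_pvalue(group_a, group_b) < alpha)
    return naive_hits / n_reps, perm_hits / n_reps


if __name__ == "__main__":
    for m in (10, 100, 500):
        naive, perm = false_positive_rates(m)
        print(f"cells per donor = {m:3d}: naive {naive:.2f}, donor permutation {perm:.2f}")
```

Because the donor effect does not average away as more cells are measured, the naive test's standard error keeps shrinking while the true variability of the group difference does not, so its false positive rate should climb well above alpha as cells per donor increase. The donor-level permutation test respects the actual unit of replication and should stay near the nominal level. Enumerating all C(10, 5) = 252 donor splits is only practical for a small number of donors; larger designs would sample random permutations instead.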

Files

Original bundle
Name: Zhang_Jing_MSc_2026.pdf
Size: 629.49 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission