Code duplication and reuse in Jupyter notebooks

dc.contributor.author: Koenzen, Andreas Peter
dc.contributor.supervisor: Storey, Margaret-Anne
dc.contributor.supervisor: Ernst, Neil A.
dc.date.accessioned: 2020-09-22T02:33:40Z
dc.date.available: 2020-09-22T02:33:40Z
dc.date.copyright: 2020 [en_US]
dc.date.issued: 2020-09-21
dc.degree.department: Department of Computer Science [en_US]
dc.degree.level: Master of Science M.Sc. [en_US]
dc.description.abstract: Reusing code can expedite software creation and the analysis and exploration of data. Expediency can be particularly valuable for users of computational notebooks, where duplication allows them to quickly test hypotheses and iterate over data without creating code from scratch. In this thesis, I explore code duplication and the behaviour of code reuse in Jupyter notebooks, quantifying and describing snippets of code and exploring potential barriers to reuse. As part of this thesis, I conducted two studies of Jupyter notebook use. In my first study, I mined GitHub repositories, quantifying and describing the code duplicates found in repositories that contained at least one Jupyter notebook. For my second study, I conducted an observational user study using contextual inquiry, in which participants solved specific tasks using notebooks while I observed and took notes. The work in this thesis can be categorized as exploratory, since both studies were aimed at generating hypotheses upon which further studies can build. My contribution with this thesis is two-fold: a thorough description of the code duplicates contained within GitHub repositories, and an exploration of the behaviour behind code reuse in Jupyter notebooks. It is my hope that others can build upon this work to provide new tools that address some of the issues outlined in this thesis. [en_US]
dc.description.scholarlevel: Graduate [en_US]
dc.identifier.uri: http://hdl.handle.net/1828/12137
dc.language: English [eng]
dc.language.iso: en [en_US]
dc.rights: Available to the World Wide Web [en_US]
dc.subject: Jupyter [en_US]
dc.subject: computational notebooks [en_US]
dc.subject: code duplication [en_US]
dc.subject: code clones [en_US]
dc.subject: code reuse [en_US]
dc.subject: data analysis [en_US]
dc.subject: data exploration [en_US]
dc.subject: exploratory programming [en_US]
dc.title: Code duplication and reuse in Jupyter notebooks [en_US]
dc.type: Thesis [en_US]

Files

Original bundle
Name: Koenzen_Andreas_MSc_2020.pdf
Size: 9.23 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission