Theses (Computer Science)

Recent Submissions

  • Item
    QPLEX: Towards the Integration of Platform Agnostic Quantum Computation into Combinatorial Optimization Software
    (2024) Giraldo Botello, Juan Fernando; Müller, Hausi A.; Villegas Machado, Norha Milena
    Quantum computing has the potential to surpass the capabilities of current classical computers when solving complex problems. Combinatorial optimization has emerged as a pivotal target area for quantum computers, as problems in this field are renowned for their complexity and resource-intensive nature. Moreover, these challenges play a critical role in various industrial sectors, including logistics, manufacturing, and finance. This thesis explores the integration of quantum computation into classical software tools as a means to potentially address combinatorial optimization problems more efficiently and effectively. This work introduces QPLEX, a Python software library that enables practitioners and researchers to implement the general mathematical formulation of a given combinatorial optimization problem once and execute it seamlessly on multiple quantum devices using various quantum algorithms. This software solution automatically adapts a general optimization model to the specific instructions utilized by the target quantum device’s SDK. It offers a versatile execution workflow capable of running gate-based hybrid quantum-classical algorithms for combinatorial optimization in a platform-agnostic manner. This approach reduces the programming overhead required for modeling and experimenting with combinatorial optimization solutions. Within this manuscript, we address and introduce the various aspects associated with the development of QPLEX in a clear and comprehensive manner. These aspects encompass the quantum algorithms and quantum hardware available in the library, along with QPLEX’s system design and implementation. Additionally, we provide a guide on how to use the library and conduct a thorough evaluation of the software solution within a specific use case as part of this thesis.
  • Item
    The Iceberg Theory on the Shared Understanding of Non-Functional Requirements in Continuous Software Engineering
    (2024) Werner, Colin; Damian, Daniela
    While software is largely considered to be heavily associated with technology, it is ultimately software developers that design, discuss, architect, write, test, re-write, and maintain the code that is compiled into the respective software. These humans are, after all, not perfect, and for the most part are not working in isolation from one another. Thus, building a shared understanding amongst a group of software developers, including requirements, is key to ensuring downstream software activities are efficient and effective. Non-functional requirements (NFR), which include performance, availability, and maintainability, are vitally important to overall software quality and to ensuring that the software fulfills its intended purpose. Research has shown that NFRs are, in practice, poorly defined and difficult to verify, especially in agile environments. A lack of attention to NFRs could potentially derail a software project. Many organizations incur technical debt by making trade-offs between the timely delivery of promised software features and rigorous system design that incorporates sufficient attention for vital NFRs. The software industry has always sought to shorten the time of delivery of systems and features, including the adoption of iterative and incremental methods, in particular agile methods, which have become the norm. Practices such as Continuous Integration, which relies on automatically testing newly integrated code, inspired the automation of other activities within software development, allowing the whole development process to become more continuous. This has led to a trend that has been called continuous software engineering (CSE). CSE relies on automated and fast releases of new versions, delivering new features quickly to users. However, feature development is usually driven by functional requirements (FR), while such fast delivery frequently means that non-functional requirements receive less attention.
Previous work has pointed out that NFRs are frequently neglected in agile development, and indeed, little work exists that has explored NFRs in the context of CSE. A major complication of an NFR is that it relates to an entire system's architecture, which is problematic for two reasons. First, evaluating the impact of frequent updates, which comes with a continuous software engineering process, on NFRs is very challenging. Second, it can be challenging for all developers in a project to have a shared and common understanding of a system's architecture, in particular for very large systems. In this dissertation, I describe a multi-year, multi-case study to empirically investigate how four organizations, for which NFRs are paramount to their business survival, build, manage, and maintain the shared understanding of NFRs in their continuous practices. My research goal is to develop a deep and rich understanding of the relationship between an organization and their shared understanding of NFRs in CSE. Through the results and insights from this in-depth research, I developed the Iceberg Theory on the complex and intricate relationship between a shared understanding of NFRs and CSE. The theory includes a classification of shared understanding, a lack of a shared understanding, nine practices an organization may use to build a shared understanding, in addition to the associated challenges and the triggers that prompted an organization to build said shared understanding.
  • Item
    Secure and Privacy-preserving Data Aggregation in Internet of Vehicles
    (2024) Liu, Rui; Pan, Jianping
    In Internet of Vehicles (IoV), crucial data is aggregated to support the applications for automatic driving, intelligent transportation and smart cities. It is crucial to carefully address certain challenges in this process, particularly regarding security and privacy. In this dissertation, we first target a representative IoV data aggregation scenario, fine-grained air quality monitoring. The major challenges we focus on include: a) the sensory data provided by vehicles usually vary in quality; b) there is a significant difference in traffic volumes of streets or blocks, which leads to a data sparsity problem; and c) the original sensory data, vehicle identities, and trajectories face risks of exposure. To address these issues, we propose a truth discovery algorithm incorporating multiple correlations, and extend it to a privacy-preserving framework, EAirQ. EAirQ relies on a traditional end-to-end data aggregation architecture. Designing a new architecture specifically for vehicular networks may hold significant value. Thus, we introduce a privacy-preserving two-layered architecture with vehicle clusters. Instead of focusing on a specific application, we present how this architecture can be well adopted in a general distributed machine learning scenario. We named this part of the work CRS. CRS not only protects the local data, the identities and trajectories of vehicles, but also ensures the accuracy of aggregated learning models by handling packet loss in the application layer. We further work on eliminating the limitations of the proposed two-layered architecture in the following three aspects: a) to provide fast and easy verification of messages within a cluster; b) to preserve vehicle privacy without adopting the pseudonym technique; c) to consider the adversarial behaviors of vehicles and enhance the security. Our solution introduces a novel concept, data approval, based on the Schnorr signature scheme. 
This part of the work, named SADA, meets more security requirements and is lightweight for vehicles. In addition to exploring new solutions to preserve the privacy of vehicle identities and trajectories, we also pay attention to the latest industry standards. This part of the work focuses on tackling the challenge of certificate provisioning in the latest solution to satisfy the anonymous communication requirement in IoV. We propose a non-interactive approach, named NOINS, empowering vehicles to generate short-term key pairs and anonymous implicit certificates on their side. This new paradigm introduces the possibilities for many extensions and applications.
  • Item
    Algorithms for prediction of RNA secondary structure: coronavirus pseudoknots via Shapify & CParty
    (2024-01-30) Trinity, Luke; Jabbari, Hosna; Stege, Ulrike
    RNA molecules play a vital role in cellular processes, and many possess functional structures. Due to the complex nature of experimental methods to detect RNA structure, computational tools to predict RNA structure formation are invaluable for building comprehensive knowledge. We seek to predict RNA structure algorithmically, with a focus on the following concepts from the literature: (1) Minimum Free Energy (MFE) methods, (2) the hierarchical folding hypothesis, and (3) partition function ensemble approaches. The MFE framework is an RNA folding hypothesis stating that each RNA molecule folds into the structure with the minimum free energy. In conjunction with MFE, we employ the biologically motivated hierarchical folding hypothesis, stating that an RNA molecule will first fold once (initial fold), before a subsequent folding may occur that lowers the structure's free energy. The accuracy of MFE and hierarchical folding methods can be improved by effective incorporation of known RNA structure information such as experimental reactivity data. We introduce Shapify, an algorithm incorporating experimental data within hierarchical RNA folding prediction. Shapify receives SHAPE data as input to guide RNA structure prediction, allowing the unification of multiple experimental results to determine structure-function patterns. Shapify runs in O(N^3) time, where N is the RNA sequence length, enabling faster prediction compared with other methods that also handle a complex RNA structure class. We then consider the partition function model, based on the MFE approach, where we compute the sum of free energies for each possible RNA structure in the ensemble at equilibrium. The likelihood of any particular RNA structure occurring can then be determined based on the energy of the structure itself relative to the total energy in the system.
Currently, partition function methods are restricted to predicting a limited set of RNA structures, because existing algorithms that allow complex RNA structures are too slow, with O(N^5) time complexity at best. We introduce CParty, an O(N^3) time complexity partition function algorithm that includes complex RNA structures in the ensemble. Developing CParty's recursive decomposition schemes and integrating them into the implementation was non-trivial. By providing an input structure to algorithm CParty, we compute a 'conditional' partition function, enabling probabilistic calculation that advances understanding of RNA structure formation patterns. In this dissertation, we (1) incorporate partial RNA structure information into hierarchical secondary structure prediction via Shapify to understand important secondary structure motifs affecting viral function, (2) design and implement CParty, a conditional partition function algorithm to handle complex RNA structures, and (3) apply these and other related algorithms to provide RNA structural information for COVID-19 therapeutic targets. Here, we pinpoint key secondary structure folding motifs in our quest to predict functional RNA structures. Our hierarchical folding algorithms push the frontier of prediction accuracy for functional RNA secondary structures, contributing to coronavirus treatments.
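The relationship between a structure's energy and its likelihood is usually written with Boltzmann weights. The block below is a sketch in standard textbook notation, not the thesis's own formulation: the symbols $\Omega$ (the ensemble of admissible structures), $E(s)$ (free energy of structure $s$), $R$ (gas constant), $T$ (temperature), and $G$ (the input structure) are assumptions. The second line is one plausible reading of a "conditional" partition function, in which the sum is restricted to structures that contain $G$.

```latex
\begin{align}
  Z &= \sum_{s \in \Omega} e^{-E(s)/RT}, &
  P(s) &= \frac{e^{-E(s)/RT}}{Z}, \\
  Z_G &= \sum_{\substack{s \in \Omega \\ s \supseteq G}} e^{-E(s)/RT}, &
  P(s \mid G) &= \frac{e^{-E(s)/RT}}{Z_G} \quad \text{for } s \supseteq G.
\end{align}
```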
  • Item
    Formal Algebraic Reasoning About Compression Function Security
    (2024-01-23) Javar, Zahra; Kapron, Bruce
    Cryptographic hash functions are fundamental in cryptographic constructions, as they transform variable-length input into fixed-length output while maintaining essential security properties like collision resistance, preimage resistance, and indifferentiability from a random oracle (RO). Creating efficient hash functions with provable security has long posed a challenge. Security proofs for hash functions usually fall under the random oracle model or the ideal cipher model, which assumes access to an ideal primitive like a truly random function or permutations. This research endeavors to establish a systematic approach for analyzing the security of hash functions suitable for automated verification and function generation within both ideal models. Building upon prior work [25], which employed an algebraic framework known as Linicrypt[8], primarily for analyzing collision-resistant hash functions in the random oracle model, we extend our efforts in two key directions. We first introduce a simple and easily verifiable property of Linicrypt programs that characterizes preimage awareness, a security property introduced by Dodis, Ristenpart, and Shrimpton [13] who also demonstrate its utility in the construction of indifferentiable hash functions. We also illustrate how this characterization can be efficiently automated and provide an example by enumerating preimage-aware compression functions that employ two random oracle calls. This includes several functions that Dodis et al. previously proved to be preimage aware through manual methods. Next, we broaden the Linicrypt framework, originally proposed in the random oracle setting, to encompass hash function security in the ideal cipher model. Within this context, we delineate collision- and second-preimage-resistance properties using linear-algebraic conditions on Linicrypt programs. We also introduce an efficient algorithm for determining program compliance with these conditions. 
As an application, we delve into the case of block cipher-based hash functions as proposed by Preneel, Govaerts, and Vandewalle [32] and establish that our characterization encapsulates the semantic analysis of PGV presented by Black et al. [5]. Additionally, our research further extends into the ideal cipher model to analyze group-2 compression functions, a category introduced in the well-known work [4]. These are compression functions which are not collision-resistant themselves, but produce collision-resistant hash functions when iterated by the Merkle-Damgård transformation. We also provide a comprehensive characterization of collision-resistant double block length compression functions within the ideal cipher model.
  • Item
    A Language-Agnostic Compression Framework for the Bitcoin Blockchain
    (2024-01-18) Papanastassiou, Orestes; Thomo, Alex
    The surging interdisciplinary interest in Bitcoin within both academic and enterprise realms underscores the need for a versatile framework that can efficiently transform the raw Bitcoin blockchain data into a streamlined format suitable for high-performance data analysis. This research proposes an abstract framework designed to convert this data into a compact, normal form, facilitating its utilization across various programming languages and data analysis tools. Our approach centers on the development of a highly efficient, language-agnostic Application Programming Interface (API). This API is designed to be implementable by any Turing-complete programming language, ensuring broad accessibility and usability. Beyond mere data extraction, our framework extends its capabilities to assemble the Bitcoin user transaction graph, a fundamental resource for downstream analysis, including network analysis, forensics, and pattern detection. To ensure compatibility and ease of integration with an array of programming languages and data analysis software, we export the processed data to the language-agnostic HDF5 file format, recognized and supported by mainstream data analysis tools. This strategic choice empowers researchers and analysts to harness the power of Bitcoin blockchain data without being constrained by software dependencies. Furthermore, to demonstrate the practicality and efficiency of our proposed framework, we present a fully functional CPython implementation as a compelling proof-of-concept. This implementation showcases the feasibility and real-world applicability of our solution, opening doors to a wide range of data-driven investigations and applications within the realm of Bitcoin research. As the interest in Bitcoin continues to evolve and expand, our research offers a comprehensive solution to the challenges of handling and analyzing the vast data included in the blockchain. 
By providing an accessible, language-agnostic extraction and compression framework, we contribute to the democratization of Bitcoin blockchain data analysis, enabling both novices and experts to uncover valuable insights and drive innovations within this field.
  • Item
    Brain network similarity using k-cores
    (2024-01-03) Ferdous, Kazi Tabassum; Thomo, Alex; Srinivasan, Venkatesh
    Autism Spectrum Disorder (ASD) is extensively studied by medical practitioners, health researchers, and educators. ASD symptoms appear in early childhood, within the first two years of life, but diagnosing it remains challenging due to its complex and diverse nature. Nevertheless, early diagnosis is crucial for effective intervention. Traditional methods rely on behavioral observations, while modern approaches involve applying machine learning (ML) to brain networks derived from fMRI scans. The limited explainability of these advanced techniques poses a significant challenge in gaining clinicians' trust. This thesis builds on recent works that design explainable approaches for ASD diagnosis from fMRI data preprocessed as graphs. Our research makes three key contributions. Firstly, we demonstrate that a simple approach based on viewing graphs as tables and using tabular data classifiers can achieve the same performance as state-of-the-art, explainable graph-theoretic methods. Secondly, we provide evidence that adding higher-order connectivity information as attributes does not improve their performance. Most importantly, we show why the classification of brain networks is challenging by demonstrating the similarity between graphs belonging to individuals with ASD and those without, using a novel k-core based approach.
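The abstract does not spell out how the k-core based similarity measure works. As background only, the sketch below computes core numbers by the standard peeling procedure: a node's core number is the largest k for which it belongs to a subgraph in which every node has degree at least k. The function names `core_numbers` and `k_core_nodes` are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph.

    Illustrative peeling algorithm (quadratic time), not the thesis's method.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Repeatedly remove a node of minimum remaining degree.
        node = min(remaining, key=lambda n: degree[n])
        # The core number never decreases as peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def k_core_nodes(edges, k):
    """Nodes that belong to the k-core (core number >= k)."""
    return {n for n, c in core_numbers(edges).items() if c >= k}


# A triangle a-b-c with a pendant node d attached to c.
example_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
example_cores = core_numbers(example_edges)
```

In the example graph the triangle forms the 2-core, while the pendant node only reaches core number 1. Two brain networks could then be compared, for instance, by the node sets of their k-cores, although whether the thesis does this is not stated here.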
  • Item
    A Collaborative Health Adherence Optimization System
    (2023-12-22) Lengwe, Maybins Douglas; Weber, Jens; Perin, Charles
    Non-adherence to prescribed medication negatively impacts the healthcare sector, and over 50% of patients fail to adhere to their prescribed medication regimens. The ramifications of non-compliance are extensive, impacting both healthcare systems and patients alike. Within healthcare systems, non-adherence has been identified as a primary factor contributing to approximately half of all hospital admissions related to medication, incurring annual costs in the billions of dollars. For patients, non-adherence manifests in heightened risks or severity of ailments, potential relapses, and, in severe cases, mortality. The underlying causes of non-adherence are diverse, with unintentional omissions, prominently attributed to forgetfulness, accounting for over one-third of all instances of non-adherence. A viable remedy often involves leveraging reminder systems, which have demonstrated relative effectiveness, particularly when complemented with human support. This dissertation aims to address the issue of non-adherence resulting from challenges in integrating prescriptions into the demanding, active lives of patients. The study delves into ascertaining optimal approaches to support self-management of prescriptions through the utilization of calendars. The research comprises three studies assessing the usability of calendars for effective medication management. These studies encompass: formalizing prescriptions through temporal reasoning frameworks (Study 1); exploring diverse methods of presenting medication information within calendars alongside other events (Study 2); and evaluating the calendar prototype to gauge its efficacy in facilitating medication management (Study 3). Study 1 resulted in the proposition of Simple Temporal Networks (STP) in formalizing prescriptions.
Insights from Study 2 informed the proposition of design guidelines encompassing aspects such as (i) employing a familiar design, (ii) facilitating patients’ self-reflection on medication adherence, (iii) ensuring medications do not clutter the calendar interface, (iv) empowering users to control the privacy of medication information within the calendar, and (v) enabling the sharing of medication-only calendars. Study 3 validated the usability of the calendar and its efficacy in supporting medication management, personal reflection, and schedule refinements. The findings from these studies underscore the potential for calendars to be designed with both expressiveness and efficiency to support medication prescriptions effectively. Additionally, patients utilizing multiple medications expressed receptiveness toward adopting calendars as a means of managing their medication regimens.
  • Item
    Reading Numeric Data Tables: Viewer Behavior and the Effect of Visual Aids
    (2023-12-21) Ji, YongFeng; Nacenta, Miguel; Perin, Charles
    Data tables are one of the most common ways in which people encounter data. Although mostly built with text and numbers, data table representations have strong spatial components, and often exhibit visual elements meant to facilitate their reading. There is an empirical knowledge gap on how people read and use tables and how different visual aids affect people’s ability to use them. In this work, I seek to address this gap through a series of empirical studies. I asked participants to repeatedly perform five different tasks with tables in four conditions where the table representations consisted of Plain tables, tables with Zebra striping, with cell background color encoding the value, and with background bar length in a cell encoding its value. I analyzed the gathered data in multiple ways to characterize human behavior when accessing tables and to assess the benefits of the different visual aids.
  • Item
    Low-Latency Live Video Streaming over Low-Earth-Orbit Satellite Networks
    (2023-12-21) Zhao, Jinwei; Pan, Jianping
    Video streaming currently dominates Internet traffic with an estimated share of 71% of all mobile data traffic, and it is forecast to increase to 80% in 2028. While 5G services are expanding beyond mobile devices to enterprises and Internet of Things (IoT) applications, the demand for service continuity and the surge in mobile data traffic are expected to further drive the evolution and expansion of network architectures into non-traditional areas. Space-Air-Ground Integrated Networks (SAGINs) are promising to transform future Internet connectivity by merging the capabilities of space, aerial, and terrestrial networks. Notably, Starlink's deployment of thousands of low-Earth-orbit (LEO) satellites provides global Internet coverage with significantly lower latency and higher bandwidth than conventional geostationary-equatorial-orbit (GEO)-based or medium-Earth-orbit (MEO)-based satellite networks, particularly enhancing connectivity in remote and rural areas. Yet, while the performance of video-on-demand (VoD) services over Starlink is on par with terrestrial networks, challenges remain in low-latency live video streaming, especially given the dynamic and fluctuating network latency in Starlink networks. In this thesis, we first conduct a comprehensive measurement study of Starlink's performance across different protocol layers, from access networks to live video streaming, at multiple geographical Starlink installations, including one where inter-satellite links (ISLs) are utilized in practice. Then, we present a novel adaptive video bitrate algorithm that utilizes the unique characteristics of Starlink networks to improve the quality of experience (QoE) of live video streaming. Finally, we discuss a joint decision-making solution with the multipath QUIC protocol to address the multipath video streaming problem, which can further improve the video streaming experience.
  • Item
    Step-by-Step: Quantum Walk Implementations and Visualizations
    (2023-12-20) Jordon, Addie; Stege, Ulrike
    Quantum walks (QWs) are the quantum analogue to classical random walks. I provide background and motivation for quantum walk algorithms and their applications. I present new visualizations for one-, two-, and three-dimensional quantum walks and explain how the visualizations can help teach quantum concepts such as superposition and interference. I contribute the Quantum Walk Visualization (QWalkVis) application for visualizing quantum walks. Users can select the dimensions, number of states, and number of steps in the walk, and generate probabilistic plots on-the-fly. Users are able to view the probabilities for each step of the walk one by one. QWalkVis aims to aid students in learning about quantum walks and foundational quantum concepts through interactive exploration. I present use cases of QWalkVis for both education and research purposes. I also propose a new application of quantum walks for creating noise distributions. Noise distributions are essential for noise sampling used in differential privacy, and help keep query data private.
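How QWalkVis computes its plots is not described in the abstract. As a minimal sketch of the kind of step-by-step probabilities such a tool displays, the code below simulates a one-dimensional discrete-time quantum walk, assuming a Hadamard coin and the symmetric initial coin state (|L> + i|R>)/sqrt(2). The function name `hadamard_walk` and these modelling choices are assumptions for illustration.

```python
import math


def hadamard_walk(steps):
    """Position probabilities after `steps` steps of a coined walk on a line.

    Index i of the returned list corresponds to position i - steps.
    Hypothetical helper for illustration; not part of QWalkVis.
    """
    size = 2 * steps + 1
    h = 1 / math.sqrt(2)

    # Amplitudes of the "move left" and "move right" coin components.
    left = [0j] * size
    right = [0j] * size
    # Symmetric initial state at the origin: (|L> + i|R>) / sqrt(2).
    left[steps] = h
    right[steps] = 1j * h

    for _ in range(steps):
        new_left = [0j] * size
        new_right = [0j] * size
        for x in range(size):
            a, b = left[x], right[x]
            if a == 0 and b == 0:
                continue  # nothing to propagate from this position
            # Hadamard coin mixes the two components (interference).
            new_left[x - 1] += h * (a + b)   # left component shifts left
            new_right[x + 1] += h * (a - b)  # right component shifts right
        left, right = new_left, new_right

    return [abs(l) ** 2 + abs(r) ** 2 for l, r in zip(left, right)]


probs_after_3 = hadamard_walk(3)
```

Unlike a classical random walk, whose distribution concentrates around the origin, the quantum walk's probability mass spreads toward the edges as the number of steps grows, which is the interference effect the visualizations are meant to convey.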
  • Item
    Energy-Aware Inter-Data Center Virtual Machine Migration over Elastic Optical Networks
    (2023-12-15) Fatima, Salehnejad Amri; Pan, Jianping
    The rapid growth of data processing demands in large-scale data centers (DCs) has increased the network's brown energy (BE) consumption. The BE is generated from fossil fuels and has an adverse effect on the environment. Since most DCs are now powered by both BE and renewable energy (RE), migrating workloads from DCs with insufficient RE to DCs with sufficient RE can decrease the total BE consumption in the network. However, selecting a destination DC under the dynamic nature of the underlying network and without any prior information is challenging. In addition, although migration can help reduce BE consumption, it comes with an additional cost due to using network devices for the migration. This thesis focuses on minimizing the total cost, which includes BE consumption costs, migration costs, and optical network device costs. To achieve this goal, we optimize both the DC selection process and the efficient transfer of virtual machines (VMs) between DCs. In Chapter 3, we formulate the DC selection as a two-stage Multi-Armed Bandit (MAB) problem. In the first stage, we define an arm as a destination DC, and select a DC with the lowest power consumption and available RE. In the second stage, we define the arm as a path, and we implement MAB to find a path with the lowest delay from the source to the selected destination DC. Using the proposed modified sliding-window lower confidence bound (MSW-LCB), we estimate the lowest power consumption among DCs and the lowest migration cost at each round to find a proper destination DC and path, respectively. Additionally, we adopt optical grooming techniques to minimize the cost of optical network devices used during VM transfer. Furthermore, to validate the effectiveness of our algorithm, we conduct an evaluation using three different real-world datasets to provide different inputs for the algorithm. The evaluation is run on the USNET topology.
In comparison to the sliding-window lower confidence bound (SW-LCB) and two other MAB-based algorithms, namely the knapsack-based upper confidence bound (KUBE) and ε-Greedy, the MSW-LCB approach reduces the total cost by about 15%, 23%, and 34%, respectively, while having low regret. The MSW-LCB regret and migration costs are each shown to be around 15% lower than those of SW-LCB. We also evaluate our algorithm in two scenarios, one with and one without optical grooming, to show the efficiency of optical grooming in decreasing optical network costs. The results indicate a 12% drop in network costs. Overall, the evaluation indicates that our algorithm is capable of handling real-world data and effectively addressing the challenges associated with inter-DC VM migration.
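The abstract does not give the details of MSW-LCB, so the following is only a generic sliding-window lower-confidence-bound sketch for cost minimization, in the spirit of the SW-LCB baseline it mentions. Each arm stands for a candidate destination DC (or path), and the arm with the smallest optimistic cost estimate over the most recent observations is chosen. The class name `SlidingWindowLCB`, the exploration constant `c`, the window size, and the fixed costs in `run_demo` are all assumptions made for illustration.

```python
import math
from collections import deque


class SlidingWindowLCB:
    """Generic sliding-window LCB bandit for minimizing cost (illustrative)."""

    def __init__(self, n_arms, window, c=1.0):
        self.n_arms = n_arms
        self.c = c
        # Only the last `window` (arm, cost) observations are remembered,
        # so estimates can track costs that drift over time.
        self.history = deque(maxlen=window)

    def select(self):
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, cost in self.history:
            counts[arm] += 1
            sums[arm] += cost

        # Play any arm that has no observation inside the window.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm

        horizon = len(self.history)

        def lower_bound(arm):
            mean = sums[arm] / counts[arm]
            bonus = self.c * math.sqrt(math.log(horizon) / counts[arm])
            return mean - bonus  # optimistic (low) estimate of the cost

        return min(range(self.n_arms), key=lower_bound)

    def update(self, arm, cost):
        self.history.append((arm, cost))


def run_demo(rounds=300):
    """Run the policy on three arms with fixed, made-up costs."""
    costs = [0.9, 0.2, 0.5]  # hypothetical per-round cost of each arm
    policy = SlidingWindowLCB(n_arms=3, window=50)
    pulls = [0, 0, 0]
    for _ in range(rounds):
        arm = policy.select()
        policy.update(arm, costs[arm])
        pulls[arm] += 1
    return pulls


demo_pulls = run_demo()
```

In this toy run the cheapest arm receives most of the pulls, while the other arms are still revisited occasionally because their observations age out of the window or their confidence bonus grows.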
  • Item
    K-edge Connected Components in Large Graphs: An Empirical Analysis
    (2023-12-08) Sadri, Hanieh; Thomo, Alex; Srinivasan, Venkatesh
    Graphs play a pivotal role in representing complex relationships across various domains, such as social networks and bioinformatics. Key to many applications is the identification of communities or clusters within these graphs, with k-edge-connected components emerging as an important method for finding well-connected communities. Although there exist other techniques such as k-plexes, k-cores, and k-trusses, they are known to have some limitations. This study delves into four existing algorithms designed for computing maximal k-edge-connected subgraphs. We conduct a thorough study of these algorithms to understand the strengths and weaknesses of each in detail and propose algorithmic refinements to optimize their performance. We provide a careful implementation of each of these algorithms, which we use to analyze and compare their performance on graphs of varying sizes. Our work is the first to provide such a direct experimental comparison of these four methods. Finally, we also address an incorrect claim made in the literature about one of these algorithms.
  • Item
    Balancing Autonomy and Persona: Investigating Developer Preferences for Effective Human-Bot Interaction
    (2023-12-07) Ghorbani, Amir; Ernst, Neil
    Software bots play a pivotal role in collective software development, promising enhanced productivity. While prior research has highlighted that excessive bot communication can lead to developer irritation, the broader array of human-bot collaboration attributes influencing developer preferences and their consequences remain less clear. This thesis delves into the key characteristics that shape developers' preferences for interactions between humans and bots, focusing on the context of GitHub pull requests. Employing an exploratory sequential approach, we conducted interviews in Phase I, followed by a vignette-based survey in Phase II. The current thesis primarily reports on Phase II and its findings. A custom-designed vignette-based instrument was employed to survey open-source developers, recruiting participants from third-year software engineering students and the Prolific platform. Rigorous screening procedures ensured data collection from eligible participants only. The study's results reveal a prevalent inclination among developers towards personable bots that demonstrate limited autonomy. Interestingly, the preferences appear to be influenced by developers' experience levels, with more seasoned developers exhibiting a preference for bots possessing greater autonomy. These empirical insights advocate for bot developers to enhance configurability options, allowing developers and projects to tailor bot behaviors according to individual preferences and project contexts.
  • Item
    Towards more Inclusive Software: A Large Scale Analysis of Inclusiveness from User Feedback
    (2023-11-23) Arony, Nowshin Nawar; Damian, Daniela
    In an era of rapidly evolving software usage, addressing the diverse needs of users from around the world has emerged as a critical challenge. Diverse users bring forth diverse requirements, encompassing factors such as human values, ethnicity, culture, educational background, technical expertise, preferences, personality traits, emotional states, and mental and physical considerations. Among the various aspects, inclusiveness, representing a core human value, is often unknowingly neglected during software development, leading to user dissatisfaction. Online platforms, such as forums and social media, offer users a space to express their opinions regarding software. As a result, in recent times, software companies have recognized these platforms as a source of user feedback. Therefore, in this study, I leverage user feedback from three popular online sources: Reddit, Google Play Store, and Twitter (now known as X) to explore the inclusiveness-related concerns of end users. I collected user feedback from the three sources for 50 of the most popular apps in the world. The 50 apps are selected from 5 types of software: business, entertainment, financial, e-commerce, and social media. I employed a Socio-Technical Grounded Theory approach and manually analyzed 23,107 posts across the three sources. Through this process, I identified 1,211 inclusiveness-related posts. The research resulted in the development of a taxonomy for inclusiveness comprising 6 major categories: Fairness, Technology, Privacy, Demography, Usability, and Other Human Values. Along with that, I investigated the process of automatically identifying inclusiveness-related and non-inclusiveness-related posts using five popular deep learning-based models. I found that GPT-2 performed best on Reddit, achieving an F1-score of 0.838, BERT on the Google Play Store with an F1-score of 0.849, and BART on Twitter with an F1-score of 0.930.
My research provides a detailed view of inclusiveness-related user feedback, enabling software practitioners to gain a more holistic understanding of such user concerns. The insights from this thesis can guide software organizations to increase awareness and address the inclusiveness aspects relevant to their product from an end-user perspective. I further provided implications and suggestions that can be used to bridge the gap between user values and software so that software can truly resonate with the varied and evolving needs of diverse users.
  • Item
    Geny: Genotyping Tool for Allelic Decomposition of Killer-cell Immunoglobulin-Like Receptor Genes
    (2023-09-28) Ghezelji, Mazyar; Numanagić, Ibrahim
    The accurate genotyping of Killer Immunoglobulin-like Receptors (KIR) plays a pivotal role in enhancing our comprehension of immune responses, disease correlations, and the advancement of personalized medicine. This thesis delves into the intricacies of KIR genotyping methodologies and introduces “Geny,” an innovative computational tool formulated for precise allele-level genotyping. Through a comprehensive evaluation, Geny consistently demonstrates superior performance compared to existing tools, notably surpassing T1K, especially within crucial gene segments. The tool’s resilience in addressing both fundamental and advanced genotyping tasks highlights its robustness in the face of various challenges. The exceptional precision demonstrated by Geny in identifying critical genes positions it as a valuable resource for advancing the field of patient-centric medicine. By contributing to the evolution of KIR genotyping, this study not only establishes a new benchmark but also highlights the continuing requirement for innovative approaches. We emphasize Geny’s remarkable capabilities, recognizing the ever-evolving landscape of genomics. Furthermore, we outline potential future directions, encompassing the detection of gene fusions and the enhancement of mutation identification. These insights pave the way for KIR genotyping to play a pivotal role in shaping the landscape of modern medical research.
  • Item
    ABLOC: Accountable Blockchain Logging for Offline Care
    (2023-09-22) Krysl, Joseph; Weber, Jens; Price, Morgan
    Retroactive security is important to cyber security; it is used to hold people accountable for their actions [1]. In the medical world, it is difficult to assign proper privileges, as they can be too wide and vulnerable to misuse, or too narrow [1, 2, 3, 4] restricting access to patient data [2, 4]. Clinicians are often given wide privileges to ensure they can access the data required to care for patients [2]. Logging is relied upon to find breaches of policies [2, 3, 4, 5] but, without reliable logs, changes can be made to the data in the EMR without anyone knowing [6]. Blockchain-based logging has been proposed but requires a stable internet connection [7]. This thesis presents Accountable Blockchain Logging for Offline Care (ABLOC), a Directed Acyclic Graph (DAG) based blockchain, that is combined with a gossip protocol to improve the forensic reliability and accountability of logs. ABLOC can tolerate participating realms, the internet space that houses one or multiple pieces of medical software, going offline, recovering, and resynchronizing with the rest of the network. The ABLOC system receives log hashes, summarizes them, and shares the summary with different realms on the ABLOC network. This work presents the necessary background information, discusses the design of the ABLOC system, and evaluates the proposed system theoretically and with a prototype. The proposed system has promising results in the scalability tests performed.
  • Item
    Application of data augmentation techniques in metabolomics correlation network
    (2023-09-20) Emadi, Mina; Jabbari, Hosna
    Motivation: Metabolomics stands as a beacon in modern biological research, enabling a deep understanding of the intricate play of small molecules within organisms. The role of these small molecules proves to be pivotal in disease detection, progression monitoring, and tailoring therapeutic strategies. Correlation networks, which depict the intricate interdependencies between metabolites, form the backbone of these metabolomic studies. However, the quest for precision in these networks is often hampered by the lack of expansive, high-quality datasets, a recurring challenge in clinical metabolomics. While machine learning has transformed numerous disciplines by extracting patterns from vast datasets, its application to typically smaller clinical metabolomics datasets remains suboptimal. This gap between the potential of machine learning and the constraints of available data forms the crux of our study. Results: Through this research, we pioneered the implementation of two innovative data augmentation techniques: pairwise mean augmentation and noise introduction. These techniques effectively augmented the scale and variability of our datasets, enhancing the reliability of the resulting correlation networks. Furthermore, we introduced the “Strongly Correlated Network,” a novel network construction algorithm. Simplifying network complexities while retaining critical interconnections, our method, when juxtaposed with traditional correlation networks, manifested superior reliability and robustness. Importantly, we underscored the transformative potential of data augmentation techniques in fortifying correlation networks, especially when navigating the shoals of limited sample sizes.
  • Item
    A Music Virtual Assistant Based on Machine Learning
    (2023-09-01) Shameli Derakhshan, Shadan; Tzanetakis, George
    This study focuses on the development of a music chatbot designed to address a gap in the digital music services market. The chatbot provides users with a seamless and interactive experience, enabling them to engage in music-related interactions. What differentiates this chatbot is its conversational interface: while the standard UIs of popular music streaming platforms like Spotify, Tidal, and Apple Music offer access to a vast library of songs and user-friendly interfaces, they may lack personalized and engaging interactions. That is, unlike traditional UIs where users navigate through menus and search boxes, the chatbot engages users in a conversation, allowing them to interact naturally using text, which could later be extended to voice input. This creates a more human-like and enjoyable experience. The Music4all dataset was chosen in this research to train, develop, and evaluate the chatbot. This dataset contains data from 15,602 anonymous users, their listening histories, and 109,269 songs represented by their audio clips, lyrics, and 16 other metadata/attributes. Various techniques, including pattern-matching approaches, TF-IDF combined with machine learning algorithms, Word2vec embeddings, and the BERT model, were explored to determine the most effective methods for creating engaging and responsive music chatbots. To achieve this, the study initially involved data preparation and the creation of a JSON file containing patterns as features and artists as classes. Numerical values representing 60 classes and TF-IDF sparse matrices of 8027 songs were then fed into various machine learning algorithms, including decision trees, random forests, KNN, and SVM. This was followed by a comprehensive comparison of the metrics obtained from these algorithms. The experimental results indicate that the combination of TF-IDF and SVM yielded the best results for designing the chatbot, achieving a classification accuracy of 91%.
However, advanced methods such as the BERT model and Word2vec were found to be less useful due to over-fitting issues and because the task is a classification problem over labeled data. Finally, the classification of artists was integrated into a Flask app, which provided the song name and ID based on user-requested tags. This study contributes valuable insights into the development of a music chatbot and highlights the most effective methods for classification and response generation using the given dataset.
  • Item
    Asymmetric Agent Geometries in Synthetic Crowds
    (2023-08-31) Ferreira, Dominic; Haworth, Brandon
    Crowd simulation plays a crucial role in various domains, including urban planning, evacuation analysis, and in the creation of films and games. The representation of agents used in these simulations greatly influences the accuracy of modelling crowd movements and behaviours. Many existing methods lack diversity and use simplified two-dimensional, typically static, agent representations, which constrain the emergent phenomena and possible applications. By incorporating asymmetric agents, new behaviours and environments can be explored. This thesis investigates two asymmetric agent representations that violate many common assumptions requiring novel solutions, particularly in developing a predictive collision avoidance algorithm that accommodates the rotational uncertainty introduced by asymmetric agents. The first model focuses on dynamic agent representations to enhance fidelity and usability in common crowd simulation applications. A mesh-adaptive deformable representation of agents is proposed, which contrasts existing methods that use static primitive geometries. An efficient method for generating elliptical particles that can deform to any mesh and animation state in real time is presented. A physically-based steering algorithm is developed with predictive collision avoidance which accounts for the novel asymmetric agent geometry. The model exhibits realistic packing behaviour of agents under high-density conditions and unpacking when the flow is unconstrained, which is particularly important in evacuation simulation. This approach is exceptionally generalizable and supports diverse heterogeneous crowds. The second model considers three-dimensional agent geometries to facilitate the study of human navigation in microgravity environments.
The model addresses the complications of a multi-agent non-symmetric particle-based model of astronauts with biomechanically constrained reachable workspace of limbs for steering and collision avoidance in three dimensions under the conditions of microgravity. A three-dimensional predictive collision avoidance algorithm is proposed, which accommodates the rotational uncertainty of these agents. A multi-layered real-time simulation model is developed for agents aboard spacecraft in microgravity, including the ability for agents to use handles or surfaces to maneuver and float freely through the spacecraft. A path finding algorithm is defined over a graph of usable handles, constructing a representation of navigable space. This approach could be used in the planning and safety analysis of future spacecraft design. Both of these models highlight the potential benefits of using asymmetric agent geometries in synthetic crowds research.