Master's Projects
Browsing Master's Projects by Department "Department of Computer Science"
Now showing 1 - 20 of 87
Item A performance evaluation of collective communication libraries (2026)
Srinivasan, Subiksha; Wu, Kui; Prakash Champati, Jaya

Collective communication operations such as AllGather and AllToAll are fundamental to high-performance computing (HPC) and large-scale machine learning workloads. Their performance, however, is tightly constrained by network structure, link latency, and bandwidth availability across modern multi-GPU and multi-node systems. As systems scale and become increasingly heterogeneous, traditional collective scheduling approaches, which often assume unrealistic symmetry in latency and topology, become ineffective. This project investigates Traffic Engineering for Collective Communication (TE-CCL), an optimization-based framework that formulates collective scheduling as a Mixed-Integer Linear Programming (MILP) problem. TE-CCL explicitly incorporates link-level latency (α) into its scheduling formulation, enabling more realistic modelling of heterogeneous multi-fabric GPU clusters. This project examines how varying α across links affects routing decisions, epoch schedules, and solver behaviour. By introducing heterogeneous α values, rather than assuming a fixed latency across all links, the model adapts its schedules to prioritize low-latency paths, reduce hop count where beneficial, and capture realistic communication delays found in cloud and datacenter clusters. This work provides an analysis of TE-CCL under latency variability, evaluating solver behaviour, schedule structures, and topology sensitivity across multiple cluster designs. The study highlights how α-aware scheduling reshapes the communication patterns selected by the solver and provides insights into when and why topology regularity influences optimization stability. Overall, this investigation clarifies the importance of latency modelling in collective communication and offers guidance for extending TE-CCL toward more robust, topology-adaptive scheduling strategies for next-generation HPC and ML systems.
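The MILP formulation itself is not reproduced in the abstract, so the sketch below only illustrates the cost idea it builds on: each link contributes its own latency (α) plus a bandwidth-dependent transfer term, so the cheapest route shifts when α varies across links. This is a minimal Python illustration over a hypothetical four-GPU topology, not TE-CCL's solver.

```python
import heapq

def best_route(graph, src, dst, size):
    """Dijkstra over alpha-aware edge weights: traversing a link costs
    alpha (per-link latency) + size / bandwidth (serialization time).
    graph: {node: [(neighbor, alpha, bandwidth), ...]}"""
    dist = {src: 0.0}
    queue = [(0.0, src)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, alpha, bw in graph[u]:
            nd = d + alpha + size / bw
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return float("inf")

# Hypothetical 4-GPU cluster: fast intra-node links (small alpha, high
# bandwidth) and slower inter-node links with heterogeneous alphas.
graph = {
    "g0": [("g1", 1.0, 100.0), ("g2", 10.0, 10.0)],
    "g1": [("g0", 1.0, 100.0), ("g3", 20.0, 10.0)],
    "g2": [("g3", 1.0, 100.0), ("g0", 10.0, 10.0)],
    "g3": [("g2", 1.0, 100.0), ("g1", 20.0, 10.0)],
}
# With a smaller alpha on the g2 route, the cheapest g0 -> g3 path avoids g1.
print(best_route(graph, "g0", "g3", size=256.0))
```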
Item A prototype architecture for interactive 3D maps on the web (2024)
Liu, Ting; Coady, Yvonne

Virtual 3D city models offer detailed 3D representations of urban space and serve various fields, such as urban planning, architecture, navigation, and environmental simulation. With the advancement of technologies such as photogrammetry and laser scanning, the scale of 3D city models has increased significantly, making it a challenge to transmit and visualize such large datasets for sharing purposes. The development of advanced web technologies and the emergence of WebGL have made it possible to render and share large-scale 3D city models on the Internet. In addition, the introduction of game engines has further enhanced the simulation and interactive functions of 3D GIS applications. In this project, the exploration focused on using and integrating WebGL-based rendering tools to visualize large 3D city models, providing a portal where users can navigate and interact with urban scenarios from different perspectives. The architecture utilized 3DCityDB for tiling and format conversion of 3D models, the 3D Web Client/Cesium.js virtual globe for loading large-scale tiled data, and Babylon.js to achieve interactive functions and environmental simulation. A GridMap mechanism was proposed to solve the problem of loading a large number of models with geographic coordinates in the Babylon scene. Test results show that this mechanism maintains loading efficiency even when the size of the dataset grows significantly: loading time and memory consumption do not increase, and the frame rate remains high enough to ensure smooth interaction. This study broadens the applicability of 3D GIS data in web-based game engines through enhanced interactivity and simulation.
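The abstract describes the GridMap mechanism only at a high level. The following is a minimal sketch of one plausible reading, with a hypothetical API and cell size: geo-referenced models are bucketed into fixed-size grid cells, and only the cells near the camera are kept loaded, which is what keeps loading time and memory flat as the dataset grows.

```python
import math

class GridMap:
    """Sketch of a grid-based loader: geo-referenced models are bucketed
    into fixed-size cells, and only cells near the camera stay loaded."""

    def __init__(self, cell_size=500.0):
        self.cell_size = cell_size
        self.cells = {}      # (ix, iy) -> list of model ids
        self.loaded = set()  # keys of currently loaded cells

    def _key(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def add_model(self, model_id, x, y):
        self.cells.setdefault(self._key(x, y), []).append(model_id)

    def update(self, cam_x, cam_y, radius=1):
        """Return (models to load, models to unload) for the new camera cell."""
        cx, cy = self._key(cam_x, cam_y)
        wanted = {(cx + dx, cy + dy)
                  for dx in range(-radius, radius + 1)
                  for dy in range(-radius, radius + 1)}
        to_load = [m for k in wanted - self.loaded for m in self.cells.get(k, [])]
        to_unload = [m for k in self.loaded - wanted for m in self.cells.get(k, [])]
        self.loaded = wanted
        return to_load, to_unload

grid = GridMap(cell_size=500.0)
grid.add_model("building_001", x=120.0, y=80.0)
grid.add_model("building_002", x=2400.0, y=900.0)
to_load, to_unload = grid.update(cam_x=100.0, cam_y=100.0)  # only nearby cells load
```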
Item Abstract and Metaphoric visualization of emotionally sensitive data (2022-04-28)
Malik, Mona

Standard visualizations such as bar charts and scatterplots, especially those representing qualitative, emotionally sensitive issues, fail to build a connection between the data that the visualization represents and the viewer of the visualization. To address this challenge, the information visualization community has become increasingly interested in exploring creative visualization techniques that could potentially help viewers relate to the suffering and pain in emotionally sensitive data. We contribute to this open question by investigating whether visualizations that rely on metaphors (i.e., those involving existing mental images, such as a tree or a person) with some emotional connection can foster viewers' empathy and engagement with the data. Specifically, we conducted an empirical study in which we compared the effect of visualization type (metaphoric and abstract) on people's engagement and empathy when exposed to emotionally sensitive data (data about sexual harassment in academia). We designed a metaphoric visualization that relies on the metaphor of a flower, symbolizing life, beauty, and fragility, which might help viewers relate to the victim and build some emotional connection, and an abstract visualization that relies on purely geometric forms with which people should not have any existing emotional connection. In our study, we found no clear difference in engagement and empathy between the metaphoric and abstract visualizations. Our findings indicate that female participants were slightly more engaged and empathic with both visualizations compared to other participants. Additionally, we learned that measuring empathy in a data visualization is a complex task. Informed by these findings on how people engage and empathize with metaphoric and abstract visualizations, new and improved visualizations and experiences can be developed for similar emotionally charged and fear-provoking topics.

Item Adaptive lifelong learning (2018-12-20)
Parul; Mehta, Nishant

Lifelong learning is an emerging field in machine learning that still requires a lot of research. In lifelong learning, tasks are presented sequentially; the system learns knowledge from each task, and the goal is to retain the learned knowledge and utilize it when learning a new task. Exponentially Weighted Aggregation for Lifelong Learning (EWA-LL) is a meta-algorithm used in the lifelong learning setting. It transfers information from previous tasks to the next. A prior distribution is maintained on the set of representations and is updated after each new task using the exponentially weighted aggregation (EWA) procedure. This project relaxes the problem and explores an easier scenario in which some additional information about the data is available. It implements adaptive learning in the lifelong learning setting, utilizing the adaptive algorithm Follow The Leader with Dropout Perturbations (FTL-DP) from online prediction with expert advice. FTL-DP sets the losses of the experts to 0 or 1 at each task, based on the dropout probability, before selecting the leader. This project transports FTL-DP to the lifelong learning setting. The goal is to show that the adaptive algorithm is a better approach than EWA-LL: it achieves smaller regret on certain easy problems while maintaining regret bounds similar to EWA-LL on harder problems.
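As a rough illustration of the dropout perturbation described above (not the project's code), the sketch below drops each past loss with probability dropout_p before choosing the leader; all loss values are made up.

```python
import random

def ftl_dropout(loss_history, dropout_p, rng=None):
    """Follow the Leader with Dropout Perturbations (sketch).
    Each past loss is independently dropped (set to 0) with probability
    dropout_p; the expert with the smallest perturbed cumulative loss
    becomes the leader for the next task."""
    rng = rng or random.Random(0)
    if not loss_history:
        return 0  # no evidence yet: default to the first expert
    n_experts = len(loss_history[0])
    perturbed = [0.0] * n_experts
    for task_losses in loss_history:
        for i, loss in enumerate(task_losses):
            if rng.random() >= dropout_p:  # keep this loss w.p. 1 - dropout_p
                perturbed[i] += loss
    return min(range(n_experts), key=perturbed.__getitem__)

# Illustrative run: two experts (representations) with 0/1 losses over five tasks.
history = [[1, 0], [1, 0], [0, 1], [1, 0], [1, 0]]
leader = ftl_dropout(history, dropout_p=0.5)  # expert 1 is the likely leader
```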
Item Adaptive teaching: learning to teach (2018-12-20)
Lakhani, Aazim; Mehta, Nishant

Traditional approaches to teaching were not designed to address individual students' needs. We propose a new way of teaching, one that personalizes the learning path for each student. We frame this use case as a contextual multi-armed bandit (CMAB) problem, a sequential decision-making setting in which the agent must pull an arm based on context to maximize rewards. We customize a contextual bandit algorithm for adaptive teaching to present the best way to teach a topic based on contextual information about the student and the topic the student is trying to learn. To streamline learning, we add a feature that allows our algorithm to skip a topic that a student is unlikely to learn. We evaluate our algorithm on a synthesized, unbiased, heterogeneous dataset to show that our baseline learning algorithm can maximize rewards and achieve results similar to an omniscient policy.
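The abstract does not specify which CMAB algorithm was customized; the sketch below uses a simple epsilon-greedy contextual bandit with an extra skip arm as a stand-in, with hypothetical context features.

```python
import random

class EpsilonGreedyTeacher:
    """Epsilon-greedy contextual bandit sketch: arms are teaching methods
    plus a 'skip' arm, and per-(context, arm) mean rewards are tracked."""

    def __init__(self, arms, epsilon=0.1, seed=42):
        self.arms = arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {}  # (context, arm) -> number of pulls
        self.values = {}  # (context, arm) -> running mean reward

    def select(self, context):
        if self.rng.random() < self.epsilon:  # explore
            return self.rng.choice(self.arms)
        return max(self.arms,                 # exploit the best-known arm
                   key=lambda a: self.values.get((context, a), 0.0))

    def update(self, context, arm, reward):
        key = (context, arm)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        old = self.values.get(key, 0.0)
        self.values[key] = old + (reward - old) / n  # incremental mean

teacher = EpsilonGreedyTeacher(["video", "text", "quiz", "skip_topic"])
context = ("visual_learner", "recursion")  # hypothetical (student, topic) features
arm = teacher.select(context)
teacher.update(context, arm, reward=1.0)   # e.g., the student passed the assessment
```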
Item Agile Requirements Change Management Model For Global Software Development (2023-05-22)
Koulecar, Neha Sheilesh; Damian, Daniela

We propose a comprehensive and robust agile requirements change management (ARCM-GSD) model that addresses the limitations of existing models and is tailored for agile software development in the global software development paradigm. To achieve this goal, we conducted an exhaustive literature review and an empirical study with RCM industry experts. Our study evaluated the effectiveness of the proposed RCM model in a real-world setting and identified its limitations and areas for improvement. The results of our study provide valuable insights into how the proposed ARCM-GSD model can be applied in agile global software development environments to improve software development practices and optimize project success rates.

Item An Exploratory Study of Data Physicalization Using Household Objects (2024-03-20)
Ramesh, Shanker; Somanath, Sowmya; Perin, Charles

We explore people's perceptions and ideas regarding creating data physicalizations using household objects such as chairs, flower pots, and photo frames to enable data-driven self-reflection. By conducting a sketching-based qualitative study with 11 participants, we identified the styles of physical encoding participants used, the strategies for creating physicalizations they employed, and the techniques for constructing physicalizations they relied on. From the study results, we contribute i) a bottom-up list of physical variables people might use for different data types, ii) a comparison between the theory about visual variables and the empirical use of physical variables, iii) an identification of the need for flexible taxonomies for physical representations, and iv) a discussion of the relationship between social pressure and the location of physical representations in the household.

Item Analysis of Students' Learning for Efficient Task Assignment in a Distributed Scrum Setting (2015-08-12)
Chhabra, Prashant; Damian, Daniela

The development of software across multiple sites, called global software development (GSD), is the norm in industry. Various factors, such as monetary benefits, the desire to tap into a pool of skilled workers, and proximity to customers, have led to its growth. However, GSD suffers from various challenges, with communication being the biggest. It is difficult to divide complex software into independent modules in a way that minimizes communication requirements. Development of these interdependent modules takes place across sites separated by different time zones, cultures, and languages. These differences create difficulty in the communication and collaboration that are essential for the success of GSD. Scrum, an agile methodology, has been shown to mitigate some of these challenges by enforcing structured communication. This report presents a case study of students developing a mobile application in a distributed Scrum setting. The aim of this report is to see whether students following the Scrum methodology learn to assign tasks so as to reduce cross-team and cross-site dependencies.

Item Analysis of Trajectories of Clarifying Communication of Requirements (2015-08-12)
Sheoran, Jyoti; Damian, Daniela

Stakeholders in the IBM Rational Team Concert (RTC) project use text-based online tools for communication. Requirements-related text-based discussions can be classified into different communication patterns. Identifying ineffective communication patterns in requirement discussions can help management take immediate action to ensure the completion of a requirement within the planned time. This report presents results from a study of the trajectories of clarifying communication and the history of user requirements in the RTC project. There are statistically significant differences among the six communication patterns: discordant, procrastination, textbook-example, back-to-draft, happy-ending, and indifferent. Certain requirement attributes, such as the number of comments and priority, also significantly affect the communication pattern of a requirement.

Item Analyzing GitHub as a Collaborative Software Development Platform: A Systematic Review (2017-04-24)
Reyes López, Arturo; German, Daniel M.

GitHub is a popular social coding site where developers not only host their code and use git functions, but also use social features to communicate, collaborate, and stay aware of changes and others' activities. This new paradigm of coding together, and the availability of data, have given rise to much research studying collaboration from different angles. However, the vast accumulated knowledge about GitHub tends to be scattered and fragmented. The goal of this study is to collect the available research on GitHub that focuses on identifying the impact of GitHub on software development. The design of the study includes two parts. First, a systematic search of 7 electronic digital libraries was conducted using a defined search protocol, which included a keyword string and exclusion/inclusion criteria. Second, data extraction from each publication and manual coding were conducted to define categories of knowledge based on research questions and findings. The study results show a growing trend in research, with an increase in mixed methodology. The preferred data sources for empirical studies about GitHub are the GitHub API and GHTorrent, used in 72.57% of publications. The study reveals that a group of 30 researchers publishes 45.86% of the total research. Research in North America represents 26% of publications. The research on GitHub focuses on the evaluation of pull requests and the use of issues (30.77%), characteristics of popular projects (20.88%), collaboration and transparency (15.38%), developers' roles (9.89%), the influence of popular developers (8.79%), quick-start packages with guidelines and datasets (8.79%), tools to improve contributions and collaboration (4.40%), and other topics (1.10%).

Item Artificial Intelligence: Where We Came From, Where We Are Now, and Where We Are Going (2017-07-11)
Evans, Guy-Warwick; Kapron, Bruce

From ancient myths to early advances in formal logic and mathematics, the story of artificial intelligence (AI) began centuries before the rise of modern computers. Today, modern AI has impacted nearly every area of human activity, from industries such as healthcare and transportation to science fiction media and popular culture. The story of artificial intelligence is far from over, with current trends and research suggesting large areas of impact in the future. This report examines three questions relating to AI: where did AI come from, what is the current state of AI, and what does the future of AI look like? A brief history of artificial intelligence is presented, followed by a literature review and a discussion of the impacts and trends of artificial intelligence research.

Item Automating Static Code Analysis for Risk Assessment and Quality Assurance of Medical Record Software (2017-12-14)
Kaur, Harneet; Weber, Jens

Item Automating the Configuration of Virtual Private Network Servers (2016-10-20)
Xu, Yongjun; Coady, Yvonne

The challenge of consistent and reliable deployment of a distributed application at large scale is significant, particularly if all of the steps must be executed manually. This project explores an automated approach to populating a distributed environment using a freely available tool called Chef. In particular, we focus on configuring cloud servers into a Virtual Private Network (VPN) of service providers. To demonstrate a fully implemented distributed VPN service, we present an infrastructure including a web interface, a payment service, and database integration. The prototype system allows one-line command setup of VPN servers, leveraging an automated deployment framework. Furthermore, a preliminary evaluation and analysis of the automated approach is presented, concretely demonstrating the advantages and disadvantages of automated deployment within the setup process at large scale.

Item BDD Automation Framework for Oscar-EMR (2018-11-29)
Jain, Harsh; Weber, Jens; Ernst, Neil

The purpose of this project is to introduce Behavior Driven Development (BDD) testing for the Oscar EMR clinical management system using a Cucumber-Selenium automation framework.

Item CADConverter - For Converting Complex CAD files into HDF5 Format (2023-08-29)

In this paper, we offer an effective approach for extracting data from Computer-Aided Design and Manufacturing (CAD/CAM) STEP files and converting it to the self-descriptive, open-source HDF5 format. This format provides for the seamless integration of data and metadata inside a single file, making it ideal for dealing with complex data. The difficulties associated with converting CAD files between applications, and the sophisticated data organisation inside the many CAD file formats, motivated our research. The international standard ISO 10303, or STEP ('STandard for the Exchange of Product model data'), addresses the format complexity and the problem of CAD data conversion between different computer-aided design (CAD) systems. We reviewed current CAD datasets such as ABC and Fusion360 for geometric data processing and created an algorithm that converts CAD models to an open-source, human-readable format that can feed deep learning algorithms directly; eliminating the requirement for third-party file-conversion software boosts the efficiency of CAD data administration and processing. To do this, we employ a method that combines data pre-processing with an algorithm that converts around one million CAD models to the HDF5 format, and we use the derived data for a number of applications, including machine learning and data analysis. The findings of the study lead to a better understanding of CAD data processing procedures, establish a framework for future research and development, and integrate the converted dataset with PyTorch and TensorFlow to address current CAD system constraints, such as standard neutral 3D CAD file formats that are difficult to interpret. It opens the way for more efficient and simplified procedures in CAD-intensive sectors.
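The conversion pipeline is not shown in the abstract; the sketch below illustrates only the HDF5 side with h5py, stubbing out the STEP-parsing step that a real converter would delegate to a CAD kernel.

```python
import numpy as np
import h5py

def extract_geometry(step_path):
    """Stub for STEP parsing: a real converter would call a CAD kernel
    (e.g., Open CASCADE) here. A toy triangle mesh stands in for it."""
    vertices = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=np.float64)
    faces = np.array([[0, 1, 2]], dtype=np.int64)
    return vertices, faces

def convert_to_hdf5(step_path, h5_path):
    vertices, faces = extract_geometry(step_path)
    with h5py.File(h5_path, "w") as f:
        grp = f.create_group("model")
        grp.create_dataset("vertices", data=vertices, compression="gzip")
        grp.create_dataset("faces", data=faces, compression="gzip")
        grp.attrs["source_file"] = step_path  # metadata lives beside the data

convert_to_hdf5("part.step", "part.h5")
```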
Item A Case Study in Web Application Performance Measurement (2015-12-15)
Goyal, Nitin; Hoffman, Daniel

The Computational Quiz Generation (CQG) system is a web application that provides online programming quizzes. CQG has been used in CSC 111, CSC 116, CSC 361, SEng 265, and SEng 360. In the future we want to use CQG in larger sections, but this would be risky given the unavailability of performance metrics for CQG. We want quantitative performance data. We are interested in identifying the maximum number of users CQG can support stably, the quiz start-up time, and whether Java questions are expensive. Hence, performance testing was conducted on CQG using Apache JMeter. Several tests were conducted to collect quantitative performance data relating to speed, stability, and scalability. This project is a deployment of test infrastructure on CQG that helps the stakeholders in CQG better determine and understand problems related to the maximum number of supported users, start-up delays, expensive questions, etc. Experimental results show that the quiz start-up time is high and depends on the size of the question library. It was also found that Java questions are much more expensive to use than C, C++, and Python questions. Performance testing also uncovered the modules in CQG that require optimization.
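The actual tests were written as Apache JMeter plans; purely as an illustration of the same measurement idea in Python, with a hypothetical quiz endpoint:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

QUIZ_URL = "http://localhost:8080/quiz/start"  # hypothetical CQG endpoint

def start_quiz(_):
    """Time one quiz start-up request."""
    t0 = time.monotonic()
    with urlopen(QUIZ_URL) as resp:
        resp.read()
    return time.monotonic() - t0

def load_test(n_users):
    """Fire n_users concurrent quiz start-ups and summarize latencies."""
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        times = sorted(pool.map(start_quiz, range(n_users)))
    return {"users": n_users,
            "median_s": times[len(times) // 2],
            "max_s": times[-1]}

for n in (10, 50, 100):  # ramp up to probe the stable-user ceiling
    print(load_test(n))
```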
Item Cloud Resource Monitoring for Facilitating Administration (2016-05-24)
Kadyan, Sumit; Ganti, Sudhakar

This project presents the design and implementation of a web dashboard used for monitoring cloud resources. Cloud computing enables sharing and managing data and performing computations on shared resources via the Internet. In this dissertation, the cloud being monitored by the dashboard is the SAVI testbed. The dashboard provides a bird's-eye view for monitoring SAVI testbed resources. It allows users to monitor parameters of different resources, such as CPU utilization, memory usage, CPU time, virtual CPUs, and instances. It also allows users to browse information about different resources and then make decisions on resource migration. This dissertation outlines some of the identified weaknesses and lists suggestions to further improve the dashboard.

Item A Conceptual Design: The Verification of QIR's Generic Quantum Optimization Passes (2023-08-30)
Naghavi, Paria; Weber, Jens

This report outlines the design of a verification system for QIR quantum optimization passes using the Vellvm interpreter, along with QWIRE semantics and its memory model. The primary objective is to establish a foundation for future advancements in quantum optimization and verification techniques. The system focuses on a small circuit in a one-qubit system but can be scaled up for more complex analytics and optimization passes in QAT. The design includes three components: the QIR input, the Vellvm parser, and the verification tools of QWIRE. The input consists of the QIR quantum blocks before and after the optimization pass. The parser utilizes QWIRE semantics to map the QIR code to Coq objects and types, generating a Coq AST for analysis. The verification is performed by Coq's proof assistant, which checks the well-formedness and equivalence of the density matrices derived from the QIR code blocks before and after the quantum optimization pass is applied. The design can be extended to other generic or targeted quantum transformations in QAT, including target-specific gate merging. The overall goal is to provide an agnostic, hybrid-compatible verification system that can improve the reliability of quantum computations on diverse hardware platforms.
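The verification itself is carried out in Coq; as a plain numerical illustration of the density-matrix equivalence being checked, the numpy sketch below compares a one-qubit block before and after a hypothetical gate-merging pass (H·H = I).

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate
I = np.eye(2)

def density_matrix(unitary, state=np.array([1.0, 0.0])):
    """rho = U |psi><psi| U-dagger for a pure input state (|0> by default)."""
    psi = unitary @ state
    return np.outer(psi, psi.conj())

rho_before = density_matrix(H @ H)  # original block: H followed by H
rho_after = density_matrix(I)       # optimized block: the pass merged H.H into I

assert np.allclose(rho_before, rho_after)  # the pass preserved the semantics
```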
Item CrowdLabs: A platform for human behaviour and perception studies (2023-08-29)
Parikh, Kunal; Haworth, Brandon

Advancements in technology encourage researchers to investigate intricate subjects such as human perception and behaviour. Though engineering supports these scientists, it also demands enormous effort from them. A system that reduces researchers' time and toil therefore becomes a necessity. The proposed system, CrowdLabs, allows researchers to conduct a wide variety of human experiments and simulations, and can scale up to capture fast-flowing data streams, without their worrying about most of the underlying engineering concerns. The apparatus serves two major roles: the experimenter, who creates new trials and collects several forms of data for their studies, and the participants, who take part in these experiments. This modular framework offers a minimal backend for different perspective controllers, data collection, and control, allowing scientists to load and manage their experimental designs, repeat scenes, randomize sequences, and integrate training through a simple interface. The system was assessed with an interactive experiment to determine its capability to support a wide variety of complex human perception and behaviour experiments. Subsequently, the system's performance was evaluated for CPU, GPU, memory, and frame rates. This analysis indicated that the apparatus was responsive and robust across all the experiment scenarios. Furthermore, the system was evaluated extensively, and several avenues for future work have been identified.

Item Data Cleaning using a Matching Dependency Technique (2019-01-02)
Jain, Shashank; Coady, Yvonne

In today's digital society, people are often required to enter their home or office addresses on online forms. It is not uncommon for people to introduce minor mistakes, such as misspelled addresses or incorrect postal codes/zip codes. Such mistakes can be quite problematic when automated systems must process the request. For example, if a person orders something online and provides an incorrect postal code in the entered address, this mistake could delay the delivery of the item or, even worse, the item may remain undelivered. To avoid such situations, these systems often use a technique called 'matching dependency', which has proven helpful in making recommendations for the correction of incorrect values in an input address. This technique uses a binary search algorithm to reduce the number of cycles the process goes through to make recommendations. Our exploration of one possible implementation of this algorithm uses our own synthesized sample data sets, instead of real user input, together with external data. The external data is used as the authenticated data source to verify the user input; we compare our synthesized user input data with this external data, which is considered completely trustworthy. The system then makes recommendations based on the correctness of the user input. The evaluation was done mainly on two data set sizes, 1,000 and 15,000 records. The results had zero false negatives, few false positives, and mostly relevant recommendations.
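The report's implementation is not reproduced here; the sketch below shows the general idea under stated assumptions: binary-search a sorted, authenticated reference list and recommend the trusted value when the user's entry disagrees. The reference data is made up.

```python
import bisect

# Hypothetical authenticated reference data, sorted by street name.
REFERENCE = sorted([
    ("douglas st", "V8W 2E7"),
    ("fort st", "V8W 1G1"),
    ("yates st", "V8W 1L3"),
])
STREETS = [street for street, _ in REFERENCE]

def recommend_postal_code(street, postal_code):
    """Binary-search the reference list; if the street is known but the
    entered postal code disagrees, recommend the authenticated value."""
    i = bisect.bisect_left(STREETS, street.lower())
    if i == len(STREETS) or STREETS[i] != street.lower():
        return None  # street not found: no recommendation possible
    ref_code = REFERENCE[i][1]
    return None if ref_code == postal_code else ref_code

print(recommend_postal_code("Fort St", "V8W 9Z9"))  # -> 'V8W 1G1'
```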