Inferring network topology for distributed machine learning model training
dc.contributor.author | An, Renjun | |
dc.contributor.supervisor | Wu, Kui | |
dc.date.accessioned | 2024-10-16T20:22:46Z | |
dc.date.available | 2024-10-16T20:22:46Z | |
dc.date.issued | 2024 | |
dc.degree.department | Department of Computer Science | |
dc.degree.level | Master of Science (MSc) | |
dc.description.abstract | As distributed machine learning spreads across industries, demand is growing for model training on cloud computing resources. However, many cloud service providers do not disclose the underlying network topology to end users, for commercial and security reasons. This opacity makes it difficult to place computation modules across Virtual Machines (VMs) in a way that maximizes resource utilization. To address this problem, we propose an algorithm called Flow Tracking (FT), which uses external measurements to infer the internal structure of a general graph. Compared with state-of-the-art topology inference algorithms, FT infers the most accurate topology as measured by four different metrics. Notably, FT achieves 100% reconstruction of the underlying topology when the underlying network uses shortest-path routing. In experiments, resource allocation based on the inferred topology significantly improves model training efficiency compared with random allocation. | |
dc.description.scholarlevel | Graduate | |
dc.identifier.uri | https://hdl.handle.net/1828/20602 | |
dc.language | English | eng |
dc.language.iso | en | |
dc.rights | Available to the World Wide Web | |
dc.subject | Network tomography | |
dc.subject | Topology inference | |
dc.subject | Distributed machine learning | |
dc.title | Inferring network topology for distributed machine learning model training | |
dc.type | Thesis |