Inferring network topology for distributed machine learning model training

dc.contributor.author: An, Renjun
dc.contributor.supervisor: Wu, Kui
dc.date.accessioned: 2024-10-16T20:22:46Z
dc.date.available: 2024-10-16T20:22:46Z
dc.date.issued: 2024
dc.degree.department: Department of Computer Science
dc.degree.level: Master of Science (MSc)
dc.description.abstract: As distributed machine learning is adopted across industries, demand is growing for model training on cloud computing resources. However, many cloud service providers decline to give end users information about the underlying network topology, for commercial and security reasons. This opacity makes it difficult to place computation modules on different Virtual Machines (VMs) in a way that maximizes resource utilization. To address this problem, we propose an algorithm called Flow Tracking (FT), which uses external measurements to infer the internal structure of a general graph. Compared with state-of-the-art topology inference algorithms, FT infers the most accurate topology under four different metrics. Notably, FT fully (100%) reconstructs the underlying topology when the underlying network uses shortest-path routing. In experiments, resource allocation based on the inferred topology significantly improves model training efficiency compared with random allocation.
dc.description.scholarlevel: Graduate
dc.identifier.uri: https://hdl.handle.net/1828/20602
dc.language: English (eng)
dc.language.iso: en
dc.rights: Available to the World Wide Web
dc.subject: Network tomography
dc.subject: Topology inference
dc.subject: Distributed machine learning
dc.title: Inferring network topology for distributed machine learning model training
dc.type: Thesis

Files

Original bundle
Name: An_Renjun_Msc_2024.pdf
Size: 2.8 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission