Theses (Computer Science)
http://hdl.handle.net/1828/76
2023-03-28T19:36:18Z

Automatic Field Extraction of Extended TLV for Binary Protocol Reverse
http://hdl.handle.net/1828/14593
2022-12-23T00:02:50Z
Huang, Zewen
Type-Length-Value (TLV) is one of the main structures commonly used in network
protocols. A large number of proprietary protocols, whose specifications are unknown to
the public, run on today's Internet as well as in domain-specific Internet of Things
(IoT) applications. It is critical to infer the TLV fields within a packet because
this information can help network administrators quickly identify abnormal traffic
and potential attacks. Inferring TLV fields belongs to the general task of protocol
reverse engineering and is particularly challenging for binary protocols, where the
boundaries of TLV fields have many possible positions. Existing methods for reverse
engineering binary protocols involve many parameters and only work for protocols
strictly following the conventional TLV format. We extend the concept of TLV to
accommodate a broader category of structural patterns in various binary protocols,
such as TCP, IP, ModBus, and MQTT. We then design algorithms to automatically
extract the extended-TLV fields from packets. Via a series of experiments over several
protocols, we demonstrate that our algorithms can accurately and quickly identify the
extended-TLV fields in all the tested protocols. Our approach can thus be deployed
as a general method for automatically reverse engineering binary protocol formats.
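
To make the conventional TLV layout concrete, here is a minimal sketch that splits a byte string into fields. It assumes a one-byte type followed by a one-byte length that counts only the value bytes; those widths and the name `parse_tlv` are assumptions for illustration. This is not the extraction algorithm of the thesis, which has to infer field boundaries without knowing the layout in advance.

```python
from typing import List, Tuple


def parse_tlv(data: bytes) -> List[Tuple[int, bytes]]:
    """Split data into (type, value) pairs.

    Assumed layout (hypothetical): 1-byte type, 1-byte length, then
    `length` bytes of value, repeated until the input is consumed.
    """
    fields = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError(f"truncated TLV header at offset {offset}")
        field_type = data[offset]
        length = data[offset + 1]
        start = offset + 2
        end = start + length
        if end > len(data):
            raise ValueError(f"value runs past end of data at offset {offset}")
        fields.append((field_type, data[start:end]))
        offset = end  # the next field begins right after this value
    return fields


# Three fields: type 1 with two value bytes, type 2 with none, type 3 with one.
packet = bytes([0x01, 0x02, 0xAA, 0xBB, 0x02, 0x00, 0x03, 0x01, 0xFF])
fields = parse_tlv(packet)
```

With a known layout the boundaries follow directly from the length bytes. In an unknown binary protocol, any byte could be a type or a length, which is why the number of candidate boundaries grows so quickly.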
2022-12-22T00:00:00Z

Germline genetic contribution to metabolic pathways in cancer
http://hdl.handle.net/1828/14578
2022-12-21T19:24:02Z
Jalilkhany, Mansoureh
Cancer research is essential to improving cancer prevention, detection, and treatment. The analysis of cancer genomes helps uncover gene abnormalities that cause the emergence and spread of many types of cancer. While many studies have investigated various landscapes of cancer, the role of inherited genetic mutations remains largely unexplored. In this work, we studied the genetic variations affecting metabolic pathways in cancer at the SNP, gene, and pathway levels. First, we identified the significant SNPs and genes associated with metabolic traits. Then we introduced A-LAVA to perform gene set analysis and detect the most significant gene sets associated with the target traits. A-LAVA is a competitive gene set analysis approach that, in addition to the standard corrections performed by current methods, resolves the bias resulting from overlapping gene sets, a potential confounding effect. We also showed that accounting for shared genes is essential for any gene set analysis approach when gene sets overlap, as it markedly affects the results.
2022-12-21T00:00:00Z

Computing (1+ϵ)-Approximate Degeneracy in Sublinear Time
http://hdl.handle.net/1828/14576
2022-12-21T19:07:22Z
Yong, Quinton
The problem of finding the degeneracy of a graph is a subproblem of the k-core
decomposition problem. In this paper, we present a (1 + ϵ)-approximate solution to
the degeneracy problem which runs in O(n log n) time on a graph with n nodes, sublinear
in the input size for dense graphs, by sampling a small number of neighbours
of high-degree nodes. Our algorithm can also be extended to an O(n log n)
time solution to the k-core decomposition problem. This improves upon the method
by Bhattacharya et al., which implies a (4 + ϵ)-approximate Õ(n) solution to the
degeneracy problem. Our techniques are similar to other sketching methods which
use sublinear space for k-core and degeneracy. We prove theoretical guarantees of
our algorithm and provide optimizations which improve the running time of our algorithm
in practice. Experiments on massive real-world web graphs show that our
algorithm performs significantly faster than previous methods for computing degeneracy,
including the 2022 exact degeneracy algorithm by Li et al.
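
For reference, the exact degeneracy that the approximation targets can be computed by repeatedly removing a node of minimum remaining degree; the largest degree observed at removal time is the degeneracy. The sketch below is this standard peeling baseline with a lazy heap, not the sampling algorithm of the thesis. The adjacency-dictionary input with integer node ids and the name `degeneracy` are assumptions for illustration.

```python
import heapq
from typing import Dict, Set


def degeneracy(adj: Dict[int, Set[int]]) -> int:
    """Exact degeneracy of an undirected graph by min-degree peeling.

    `adj` maps each node to the set of its neighbours and is assumed to be
    symmetric and free of self-loops.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)
    removed = set()
    result = 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != degree[v]:
            continue  # stale heap entry left over from an earlier degree
        removed.add(v)
        result = max(result, d)
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                heapq.heappush(heap, (degree[u], u))
    return result


# A triangle (0, 1, 2) with a pendant node 3 attached to node 0.
example = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
example_degeneracy = degeneracy(example)
```

This baseline reads every edge, so its cost grows with the number of edges. An O(n log n) method on a dense graph cannot afford that, which is where sampling neighbours comes in.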
2022-12-21T00:00:00Z

Scaling Up Structural Clustering to Large Probabilistic Graphs Using Lyapunov Central Limit Theorem
http://hdl.handle.net/1828/14572
2022-12-20T01:25:32Z
Howie, Joseph
In this thesis, we focus on structural clustering of probabilistic graphs, which poses significant computational challenges and has so far resisted efficient solutions that scale to large graphs; for example, the state of the art can only handle graphs with a few million edges. We address the main bottleneck of probabilistic structural clustering: computing the structural similarity of vertices based on their Jaccard similarity over the set of possible worlds of a given probabilistic graph. The state of the art uses dynamic programming, a quadratic-time algorithm that does not scale to pairs of high-degree vertices. We present a novel approach based on the Lyapunov Central Limit Theorem. By using a carefully chosen set of random variables, we cast the computation of structural similarity as computing a one-tailed area under the normal distribution. Our approach runs in linear rather than quadratic time and therefore scales to much larger inputs. Extensive experiments show that it can handle massive web-scale graphs that the state of the art cannot.
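
The contrast between a quadratic dynamic program and a normal approximation can be illustrated on a simpler quantity: the probability that a sum of independent Bernoulli variables reaches a threshold. The sketch below works under that simplification and is not the construction used in the thesis, whose choice of random variables for Jaccard similarity is not given here. The function names and the continuity correction are assumptions.

```python
import math
from typing import Sequence


def tail_exact_dp(probs: Sequence[float], k: int) -> float:
    """P(X_1 + ... + X_n >= k), computed exactly in O(n^2) time."""
    dist = [1.0]  # dist[s] = probability that the partial sum equals s
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            nxt[s] += mass * (1.0 - p)
            nxt[s + 1] += mass * p
        dist = nxt
    return sum(dist[max(k, 0):])


def tail_normal(probs: Sequence[float], k: int) -> float:
    """Approximate the same tail in O(n) time with a normal distribution."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:
        return 1.0 if mu >= k else 0.0  # the sum is deterministic
    z = (k - 0.5 - mu) / math.sqrt(var)  # 0.5 is a continuity correction
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail, 1 - Phi(z)


probs = [0.5] * 100
exact = tail_exact_dp(probs, 50)
approx = tail_normal(probs, 50)
```

The dynamic program keeps the whole distribution of the partial sum, which is what makes it quadratic. The approximation needs only a mean and a variance, each obtained in a single pass over the probabilities.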
2022-12-19T00:00:00Z