Copula Theory and Its Applications in Computer Networks
by
Fang Dong
B.Sc., Wuhan University, 2011
M.Eng., Wuhan University, 2013
A Dissertation Submitted in Partial Fulfillment of the
Requirements for the Degree of
DOCTOR OF PHILOSOPHY
in the Department of Computer Science
© Fang Dong, 2017
University of Victoria
All rights reserved. This dissertation may not be reproduced in whole or in part, by
photocopying or other means, without the permission of the author.
Supervisory Committee
Dr. Kui Wu, Co-Supervisor
(Department of Computer Science)
Dr. Venkatesh Srinivasan, Co-Supervisor
(Department of Computer Science)
Dr. Lin Cai, Outside Member
(Department of Electrical and Computer Engineering)
ABSTRACT
Traffic modeling in computer networks has been researched for decades. A good
model should reflect the features of real-world network traffic. With a good model,
synthetic traffic data can be generated for experimental studies; network performance
can be analysed mathematically; and service provisioning and scheduling can be designed
to align with traffic changes. An important part of traffic modeling is to capture
the dependence, either the dependence among different traffic flows or the temporal
dependence within the same traffic flow. Nevertheless, the power of dependence
models, especially those that capture the functional dependence, has not been fully
explored in the domain of computer networks.
This thesis studies copula theory, a theory for describing dependence between random
variables, and applies it to improve performance evaluation and network resource
provisioning. We apply copulas to model both contemporaneous dependence between
traffic flows and temporal dependence within the same flow. The dependence models
are powerful and capture functional dependence beyond the linear scope. With
numerical examples, real-world experiments and simulations, we show that copula
modeling can benefit many applications in computer networks, including, for example,
tightening performance bounds in statistical network calculus, capturing the full
dependence structure in the Markov Modulated Poisson Process (MMPP), MMPP parameter
estimation, and predictive resource provisioning for cloud-based composite services.
Contents
Supervisory Committee
Abstract
Table of Contents
List of Tables
List of Figures
Nomenclature
Acknowledgements
Dedication
1 Introduction
1.1 Motivation
1.2 Research Goals
1.3 Contributions
1.4 Publications
2 Preliminaries on Copula Theory
2.1 Definitions and Basic Properties
2.2 Copula-based Dependence Measures
2.3 Parametric Copulas
2.4 Empirical Copula
2.5 Summary
3 Copula Analysis for Contemporaneous Dependence and Its Application in Statistical Network Calculus
3.1 Introduction
3.2 Related Work
3.3 Background of Stochastic Network Calculus
3.4 Insights of Copula Analysis
3.4.1 Basic Lemmas
3.4.2 An Example of Copula Analysis
3.4.3 Performance Bounds of SNC with Copulas
3.5 Copula Modelling at Work
3.5.1 Copula Analysis in Real-world Applications
3.5.2 Copula Analysis with Simulated Traffic
3.6 Summary
4 Copula Analysis of Temporal Dependence of Markov Modulated Poisson Process
4.1 Introduction
4.2 Related Work
4.3 Preliminaries
4.3.1 Markov Modulated Poisson Process
4.3.2 Why Do Existing Results Not Suffice?
4.4 Theoretical Copula Analysis for MMPP, HoMMPP and HeMMPP
4.4.1 Theoretical Copula Analysis for Single MMPP
4.4.2 Theoretical Copula Analysis for HoMMPP
4.4.3 Theoretical Copula Analysis for HeMMPP
4.4.4 An Algorithm to Compute HeMMPP Copula
4.5 Parametric Copula Modeling for MMPP trace
4.6 Summary
5 Application of MMPP Copulas for Network Traffic Prediction
5.1 Introduction
5.2 Copula-based Prediction
5.2.1 Prediction Based on Theoretical Copulas
5.2.2 Prediction Based on Parametric Copulas
5.3 Experimental Evaluation
5.3.1 Evaluation Methods
5.3.2 Case Study on A Single MMPP Trace from Real-world
5.3.3 Case Study on HoMMPP Trace with Simulation
5.3.4 Case Study on HeMMPP trace
5.4 Summary
6 Application of MMPP Copulas in Composite Cloud Service Provisioning
6.1 Introduction
6.2 Related Work
6.3 System Model
6.4 A Copula Model for Latent Dependence Structure in Service Composition
6.5 Collaborative Auto-Scaling of Virtualized Functions
6.5.1 Overview
6.5.2 Copula-based Scaling Matrix
6.5.3 Utilization-based Individual Scaling Matrix
6.5.4 Integrated Scaling Matrix
6.6 Performance Evaluation
6.6.1 MMPP modeling of Real-world Cloud Trace
6.6.2 Performance Evaluation with Synthetic Data
6.7 Summary
7 Application of MMPP Copulas in Parameter Estimation
7.1 Introduction
7.2 Related Work
7.3 Copula-based Parameter Estimation of MMPP
7.3.1 Matching Marginal Distribution
7.3.2 Matching Copula
7.3.3 A Summary of MarCpa Algorithm
7.4 Performance Evaluation
7.4.1 Performance Evaluation Based on Ground Truth
7.4.2 Performance Evaluation Based on Average Goodness-of-Fitting and Running Time
7.5 Summary
8 Conclusions and Future Work
8.1 Contemporaneous Dependence Modeling
8.2 Temporal Dependence Modeling
8.3 Future Work
Bibliography
List of Tables
Table 3.1 Kolmogorov-Smirnov goodness of fit test for $a_1$ and $a_2$ in three datasets.
Table 3.2 “Blanket” goodness of fit test for copula between $a_1$ and $a_2$ across three datasets.
Table 3.3 Kolmogorov-Smirnov goodness of fit test for backlog based on simulated dataset
Table 3.4 “Blanket” goodness of fit test for copula between $B_1$ and $B_2$ based on simulated dataset
Table 4.1 Definition of Matrices
Table 5.1 Dependence Measures of BCpAug89 Trace from Theoretical Analysis and Empirical Analysis
Table 5.2 One-Step Prediction RMSE on BC-pAug89 trace with Different Training Percentages.
Table 5.3 Dependence Measures of the Associated Trace from Theoretical Analysis and Empirical Analysis
Table 5.4 One-Step Prediction RMSE on the Associated Trace with Different Training Percentages.
Table 5.5 Dependence Measures of the HoMMPP trace from Theoretical Analysis and Empirical Analysis
Table 5.6 One-Step Prediction RMSE on the HoMMPP Trace with Different Training Percentage.
Table 5.7 Two-step Dependence Measures of the HoMMPP Trace from Theoretical Analysis and Empirical Analysis
Table 5.8 Two-Step Prediction RMSE on the HoMMPP Trace with Different Training Percentage.
Table 5.9 One-Step Prediction RMSE on the HeMMPP trace with Different Training Percentages.
Table 5.10 Two-Step Prediction RMSE on the HeMMPP trace with Different Training Percentages.
Table 6.1 Calculation of Collaborative Scaling Matrix $S_g$
Table 6.2 Comparison of The First Two Order of Moments of Arrival Counts in Every 300 Seconds
Table 6.3 Parameters of Simulated Composite System
Table 6.4 Simulation results with initial capacity as $?_j = 1$
Table 6.5 Simulation results with initial capacity as $?_j = 2$
Table 7.1 Estimated parameters for the simulation trace.
Table 7.2 Kolmogorov-Smirnov test results on sample trace.
Table 7.3 Running time in seconds.
Table 7.4 Ratio of experiments that pass K-S tests.
List of Figures
Figure 1.1 Scatter plot of successive arrival counts of BCpAug89
Figure 2.1 An explanatory example of the definition of copula.
Figure 2.2 An explanatory example of Sklar’s theorem.
Figure 2.3 An explanatory example of the invariant property
Figure 2.4 Fréchet-Hoeffding lower bound copula $C_{lb}$
Figure 2.5 Product copula $C_{ind}$
Figure 2.6 Fréchet-Hoeffding upper bound copula $C_{ub}$
Figure 2.7 Scatter plot figures of three Archimedean copulas with parameter ? = 7.
Figure 3.1 Different Bounds with $r_1 = 0.5$, $r_2 = 1$
Figure 3.2 Different Bounds with $r_1 = 2$, $r_2 = 2$
Figure 3.3 Experiment scenario
Figure 3.4 Histogram of $a_1$ and $a_2$ based on samples in one dataset.
Figure 3.5 Histograms of $B_1$ and $B_2$ based on samples in simulated dataset.
Figure 3.6 Backlog bound curves of two input flows of the simulated system.
Figure 3.7 Backlog bound for aggregate traffic $A$.
Figure 4.1 Arrival counts of the two traces
Figure 4.2 Covariances of two MMPPs over different time lags
Figure 4.3 Scatter plot with marginal histograms of $A_i$ and $A_{i+1}$ in two traces
Figure 4.4 Bivariate frequency histogram (upper layer) with its heat map (lower layer)
Figure 5.1 Copula contours for MMPP learned from BCpAug89 trace.
Figure 5.2 Prediction with theoretical copula on the testing set (last 20%) of BCpAug89 trace
Figure 5.3 Prediction with theoretical copula on the testing set (last 20%) of the associated trace
Figure 5.4 One-step copula contours for HoMMPP.
Figure 5.5 Prediction with theoretical HoMMPP copula on the testing set (last 20%)
Figure 5.6 Two-step copula contours for HoMMPP.
Figure 5.7 Two-step prediction with theoretical copula on the testing set (last 20%) of the HoMMPP trace
Figure 5.8 Copula contours for HeMMPP.
Figure 6.1 The conceptual diagram of service composition
Figure 6.2 A queueing model for composite service
Figure 6.3 Q-Q plot of arrival counts in every 300 seconds
Figure 6.4 Copula-based inference on call arrival counts
Figure 7.1 An example of the initialization of parameter ?
Figure 7.2 Arrival counts of simulation trace.
Figure 7.3 Performance in $D_M$ for 3-state MMPP traces.
Figure 7.4 Performance in $D_C$ for 3-state MMPP traces.
Figure 7.5 Performance in running time for 3-state MMPP traces.
Figure 7.6 Performance in $D_M$ for 5-state MMPP traces.
Figure 7.7 Performance in $D_C$ for 5-state MMPP traces.
Figure 7.8 Performance in running time for 5-state MMPP traces.
Nomenclature
Notation of Chapter 2
$C$ Copula
$C(u, v; ?)$ Parametric copula
$C_{lb}$ Fréchet-Hoeffding lower bound copula
$C_{ub}$ Fréchet-Hoeffding upper bound copula
$C_{ind}$ Product copula
$\hat{C}$ Empirical copula
$u, v$ The argument value of copula, or the sample value of marginal distribution function
$U, V, X, Y$ Random variables
$x, y$ Sample value of random variables
$F$ Cumulative distribution function
$\hat{F}$ Empirical cumulative distribution function
$?_t$ Kendall’s tau
$?_s$ Spearman’s rho
$?$ Pearson correlation coefficient
$?^+_t$ Upper tail dependence
$?^-_t$ Lower tail dependence
Notation of Chapter 3
$A(t)$ Cumulative traffic that arrives in time interval $(0, t]$
$A^*(t)$ Cumulative traffic that departs in time interval $(0, t]$
$S(t)$ Cumulative amount of service in time interval $(0, t]$
$A$ Traffic model
$S$ Service model
$\bar{F}$ Complementary distribution function / survival function
$\alpha$ The curve function in the definition of arrival model
$\beta$ The curve function in the definition of service model
$?$ A sliding window size
$?$ Rate in SBB model
$r_1, r_2$ Parameters of exponential distributions
$R_1, R_2$ Constant service rate to flows
$B(t)$ Backlog at time $t$
$D(t)$ Delay at time $t$
$B$ Random variable of backlog
$a$ Random variable of the amount of data sent per unit of time
$a_i$ Sample value of $a$ in the $i$-th unit of time
$(?, \mu_1, \sigma_1, \mu_2, \sigma_2)$ Parameters of mixture of two Gaussian distributions
Notation of Chapter 4
$(Q, ?)$ Parameter of MMPP
$m$ Number of states in MMPP
$?$ The stationary distribution for the CTMC
$P(t)$ The transition matrix for the CTMC after time $t$
$I_i$ $i$-th time slot
$A_i$ The random variable of the arrival count in $i$-th time slot of single MMPP trace
$S_i$ The random variable of the state of MMPP in $i$-th time slot
$?$ Length of time slots
$M$ The cumulative distribution function of $A_i$
$C_{i'}$ The copula between arrival counts $A_i$ and $A_{i+i'}$, $i' \in \mathbb{N}$
$G_j$ The marginal distribution of $A_i$ on the condition that associated CTMC is in state $j$
$G(x)$ The vector $G(x) = [G_1(x), G_2(x), \cdots, G_m(x)]$
$A^l_i$ The random variable of the arrival count in $i$-th time slot of HoMMPP/HeMMPP traces
$M^l$ The cumulative distribution function of $A^l_i$
$C^l_{i'}$ The copula between arrival counts $A^l_i$ and $A^l_{i+i'}$, $i' \in \mathbb{N}$
$\nabla C_{i'}$ The single MMPP copula gradient
$\nabla C^l_{i'}$ The HoMMPP/HeMMPP copula gradient
$({}^{l}Q, {}^{l}?)$ The parameters of the $l$-th MMPP in HoMMPP/HeMMPP
${}^{l}A_i$ The random variable of the arrival count in $i$-th time slot of the $l$-th MMPP trace
${}^{l}M$ The cumulative distribution function of ${}^{l}A_i$
${}^{l}p$ The probability mass function of ${}^{l}A_i$
${}^{l}C_{i'}$ The copula between arrival counts ${}^{l}A_i$ and ${}^{l}A_{i+i'}$, $i' \in \mathbb{N}$
$\nabla\,{}^{l}C_{i'}$ The single MMPP copula gradient of the $l$-th MMPP
$\hat{a}$ The upper threshold of interested range of arrival counts
$\hat{M}^l$ The empirical cumulative distribution function of $A^l_i$
$C(u_i, u_{i+i'}; ?)$ The parametric copula between $A^l_i$ and $A^l_{i+i'}$ learnt from trace
Notation of Chapter 5
$x_i$ Sample value of $A_i$ or $A^l_i$
$\hat{x}_i$ Predicted value of $A_i$ or $A^l_i$
$c(u_i, u_{i+i'}; ?)$ Parametric copula density function
$(?_1, ?_2, ?_t)$ Parameters of AR(1) model
$s$ Parameter of LPC(1) model
$A'_i$ An associate trace of $A_i$
Notation of Chapter 6
$d$ Scaling delay
$\beta$ Capacity unit
$?_j$ Current capacity for VF $j$
$\mu_j$ Capacity level for VF $j$
$S_c$ Copula-based scaling matrix
$S_u$ Utilization-based scaling matrix
$S_g$ Integrated scaling matrix
% Utilization of queueing system
Notation of Chapter 7
$u_i$ ($\hat{u}_i$) Marginal (empirical) distribution value of $A_i$
$?_i$ ($\hat{?}_i$) (Empirical) copula value of $A_i$ and $A_{i+1}$
$W_1, W_2$ Objective function to minimize in two-step matching
$T_1, T_2$ Parameter sets to estimate in the first, and the second step
$a$ Step-size of gradient descent
$T_1^{(r)}, a^{(r)}$ Estimated parameter, step-size in the $r$-th iteration
$H$ Coefficient matrix for copula matching
$E$ Constraints coefficient matrix for copula matching
$b$ Constraints vector for copula matching
$D_M$ K-S distance between testing marginal and empirical marginal distributions
$D_C$ K-S distance between testing copula and empirical copula
ACKNOWLEDGEMENTS
I would like to thank:
my supervisors, Dr. Kui Wu and Dr. Venkatesh Srinivasan, for giving me
strong support and guidance during my PhD. Whenever I was stuck on a research
problem or had questions about research, you were always open to help
me. Your continuous advising and mentoring over the past four years have been of great
value to me. I am deeply grateful and happy to pursue a PhD degree under
your supervision.
my husband, Dr. Cheng Chen, for your love. You have always been with me
through all those tough moments. Your encouragement always gives me the
passion and strength to pursue what we believe and what we value the most.
Your companionship makes our life wonderful and full of happiness.
my family, for your unconditional love and companionship. You are always there
to share and witness every moment of my life even though we are not living in
the same country. I feel sorry that we don’t have much time together physically
these years. I would like to express my sincere appreciation for your support
and encouragement during the years of my education.
my labmates and friends, for sincere friendship, your valuable advice and help,
and the unforgettable moments we have spent together.
DEDICATION
To my family.
Chapter 1
Introduction
In this chapter, we describe the motivation for applying copula theory in the computer
network domain, and explain our research goals and contributions.
1.1 Motivation
In modern society, our daily life depends heavily on computer networks. Every day,
a tremendous amount of network traffic is transmitted in both local area networks and the Internet
for various applications. Whenever we transmit files between hosts, access a remote
computer, visit a website, or watch a video online, network packets are generated
and transmitted on networks. As more and more applications are emerging over the
Internet, there is a high demand to explore accurate and robust models for network
traffic flows.
In many cases, a good network traffic model is a prerequisite for research in computer networks. A good network traffic model means that the model can characterize
and mimic specific real network traffic well. A good model can identify specific network traffic [57], simulate the traffic similar to the real traffic [48], and analyse the
network performance [9].
Network traffic models can be divided into two groups: the models for statistical
properties, such as mean, variance, skewness [21], and the models for dependence,
such as covariance and correlation [46, 53]. The dependence modeling is of great
significance to characterize network traffic and deepen our understanding of network
traffic from a different angle. Consider the period of the FIFA World Cup or the Olympic
Games, when hundreds of thousands of people may visit the same website to watch the
game videos from home computers. When modeling the network traffic flows sent
from home computers to the designated server, we cannot just add up the models of
each individual flow, rather we need to take the dependence among the constituent
flows into consideration. A dependence model between traffic flows will lead to a more
accurate model for the aggregate flow and help to improve the analysis of network
performance. In another example where there is a single traffic flow from a source to
a destination, understanding the dependence between its arrivals over different times
is important to predict future arrivals or detect abnormal events [4].
The two scenarios above show the impact of two categories of dependence in network
traffic: contemporaneous dependence and temporal dependence. Contemporaneous
dependence is the dependence between arrivals from different traffic flows, while
temporal dependence is the dependence between arrivals from the same traffic flow
at different times. Both are nontrivial to model. Contemporaneous dependence in
network traffic is normally ignored for ease of analysis. Network performance analysis
under the stochastic network calculus framework suffers from this omission, which leads
to loose bounds on network delay or backlog in practice [44]. Existing approaches to
temporal dependence within a single traffic flow are mostly based on covariance or
correlation [53]. However, covariance and correlation can only measure linear
dependence, which discards abundant dependence information carried by traffic flows.
We take the traffic trace BCpAug89 [32] as an example. Fig. 1.1 shows the scatter plot
of the successive arrival counts (number of arrivals) in every second. The shape of the
scatter plot shows the dependence between successive arrival counts. As the figure
suggests, a linear dependence measure considers only the projection of the points onto
a straight line and neglects their (varying) vertical distances to the line. Therefore,
linear dependence measures, such as covariance and autocorrelation, measure the
dependence only partially, and are far from sufficient to reflect the complex
dependence structure.
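To make this limitation concrete, the following minimal Python sketch (not part of the experiments in this thesis) compares the Pearson correlation with a rank-based measure on synthetic data. The sample size, the two transformations, and the helper names `pearson`, `ranks`, and `spearman` are illustrative assumptions; Spearman's rho is used only as one example of a measure that depends on ranks rather than on a linear fit.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient of two 1-D samples."""
    return float(np.corrcoef(x, y)[0, 1])


def ranks(x):
    """Ranks 1..n of a sample (assumes no ties, as for continuous data)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=20000)   # hypothetical synthetic sample

y_quadratic = x ** 2        # fully determined by x, but not linearly
y_monotone = np.exp(5 * x)  # strictly increasing in x, strongly nonlinear

r_quad = pearson(x, y_quadratic)   # linear measure on a non-monotone relation
r_mono = pearson(x, y_monotone)    # linear measure on a monotone relation
s_mono = spearman(x, y_monotone)   # rank-based measure on the same relation

print(f"Pearson,  y = x^2:      {r_quad:.3f}")
print(f"Pearson,  y = exp(5x):  {r_mono:.3f}")
print(f"Spearman, y = exp(5x):  {s_mono:.3f}")
```

Under these assumptions, the Pearson coefficient is close to zero for the quadratic relation even though `y_quadratic` is a deterministic function of `x`, and it stays well below one for the monotone relation, whereas the rank-based coefficient equals one up to floating-point error.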
[Figure 1.1: Scatter plot of successive arrival counts of BCpAug89 (axes: $A_i$ and $A_{i+1}$, each ranging from 0 to 800)]
Given the significance of dependence modeling for network traffic and the lack of rich
models that capture the full spectrum of dependence structures, we are motivated to
apply an advanced tool, the copula, to model the functional dependence of network
traffic and to use the new model to improve network studies. Copulas, as the term
indicates, are functions that join one-dimensional marginal distributions to
multivariate distributions. As an effective mathematical tool to capture dependence,
copulas have been very popular in financial analysis, especially for risk management.
To estimate market risk appropriately, more than one asset needs to be considered.
Copulas have been shown to be flexible and useful for measuring the dependence between
assets [67, 40] and the dependence along the time series of a single asset [66, 65, 71].
Although copulas have been studied extensively in finance, they are relatively new and
rarely exploited in other domains. In recent years, researchers have attempted to
extend the use of copulas to other areas. Specifically, copulas have been used in
telecommunication networks to model shortest-path trees [60], and in agriculture to
model the dependence between energy and agricultural commodities [50, 49]. To the best
of our knowledge, copulas are seldom applied in the computer networks domain, even
though dependence modeling of network traffic attracts a lot of attention and is
considered of great significance for examining and improving network
performance [46].
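The description of copulas as functions that join one-dimensional marginal distributions to multivariate distributions can be stated compactly for the bivariate case. The display below is the standard statement of Sklar's theorem, given here only as a reading aid; the symbols $H$, $F_X$, and $F_Y$ (the joint distribution function of $(X, Y)$ and its two marginals) are notation assumed for this illustration.

```latex
H(x, y) = C\bigl(F_X(x),\, F_Y(y)\bigr), \qquad x, y \in \mathbb{R}.
```

When $F_X$ and $F_Y$ are continuous, the copula $C$ in this representation is unique, so the dependence between $X$ and $Y$ can be studied separately from their marginal behavior.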
1.2 Research Goals
This thesis applies copulas to improve both contemporaneous and temporal dependence modeling of network traffic, which could further benefit the applications relying
on dependence. Specifically, the research goals are described as follows:
1. Contemporaneous dependence modeling: Model the contemporaneous
dependence between network traffic flows with copulas. Contemporaneous dependence
modeling is integrated into a network analysis framework, stochastic
network calculus (SNC). With the contemporaneous dependence captured, the
derived performance bounds become tighter and more accurate.
2. Temporal dependence modeling: Model the temporal dependence within a network
traffic flow with copulas. The temporal dependence in terms of copulas can
be used to improve the following network applications:
• Network traffic prediction: By understanding the temporal dependence
of network traffic, we can find a solution to predict the future arrivals based
on current observations.
• Cloud service provisioning: This application is based on network traffic
predictions. Cloud service can be better offered according to the requested
amount. Designing an effective service provisioning strategy based on prediction of requested amount is a goal in this context.
• Parameter estimation problem: We propose a parameter estimation
method for a widely-used network traffic model, Markov Modulated Poisson Process. The parameters will be estimated by matching statistical
moments and temporal dependence, separately. We study both theoretical and parametric copulas for MMPP and design a method for fast and
accurate parameter estimation.
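Since the MMPP underlies the temporal-dependence goals above, a small simulation may help fix ideas. The sketch below generates arrival counts per time slot from a two-state MMPP under the usual definition (a Poisson process whose rate is selected by a continuous-time Markov chain). The function name `simulate_mmpp_counts`, the generator matrix, the rates, and the slot length are hypothetical choices made for illustration, not parameters taken from this thesis.

```python
import numpy as np


def simulate_mmpp_counts(Q, rates, slot_len, n_slots, rng):
    """Arrival counts per slot for an MMPP with CTMC generator Q and Poisson rates.

    Assumes every state has a positive exit rate (no absorbing states).
    """
    Q = np.asarray(Q, dtype=float)
    rates = np.asarray(rates, dtype=float)
    horizon = slot_len * n_slots
    counts = np.zeros(n_slots, dtype=int)
    t, state = 0.0, 0  # starting in state 0 is an arbitrary choice

    while t < horizon:
        exit_rate = -Q[state, state]
        # Sojourn time in the current state is exponential with the exit rate.
        end = min(t + rng.exponential(1.0 / exit_rate), horizon)
        # Given the state, arrivals on [t, end) form a Poisson process.
        n_arrivals = rng.poisson(rates[state] * (end - t))
        times = rng.uniform(t, end, size=n_arrivals)
        slots = np.minimum((times // slot_len).astype(int), n_slots - 1)
        np.add.at(counts, slots, 1)
        # Jump to another state with probability proportional to Q[state, j].
        probs = Q[state].copy()
        probs[state] = 0.0
        probs /= exit_rate
        state = int(rng.choice(len(rates), p=probs))
        t = end
    return counts


rng = np.random.default_rng(1)
Q = [[-0.1, 0.1],
     [0.2, -0.2]]      # hypothetical generator matrix (rows sum to zero)
rates = [5.0, 50.0]    # hypothetical Poisson rate in each state
counts = simulate_mmpp_counts(Q, rates, slot_len=1.0, n_slots=5000, rng=rng)

# Linear summary of the dependence between successive slot counts.
lag1 = float(np.corrcoef(counts[:-1], counts[1:])[0, 1])
print(f"mean count per slot: {counts.mean():.2f}, lag-1 correlation: {lag1:.2f}")
```

With a slowly switching chain and well-separated rates, as assumed here, successive slot counts tend to be positively correlated; the value in `lag1` is only a linear summary of that dependence.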
1.3 Contributions
The thesis makes the following contributions:
1. Copula analysis for contemporaneous dependence in statistical network calculus
In Chapter 3, we integrate copula into the framework of SNC and make the
following contributions:
• We augment the power of SNC with copula analysis to utilize the dependence structure between traffic flows. In particular, copula analysis can be
integrated into the SNC framework to provide tighter performance bounds.
Such analysis offers extra benefit in inferring the adaptive behavior of some
proprietary systems.
• Using copula analysis, we show the range of stochastic bounds that SNC
can achieve. This discovery has a deep implication in the future design of
flow scheduling or input buffering methods.
• A real-world case study as well as simulation evaluation demonstrate the
practicality of copula analysis and its improvement over the performance
of SNC that is oblivious to the dependence structures between flows.
2. Copula analysis for temporal dependence of Markov Modulated Poisson Process
In Chapter 4, we study the temporal dependence of the Markov Modulated
Poisson Process in full and make the following contributions:
• We use copula to analyse the dependence structure of MMPP traffic. The
copula-based dependence reveals richer information of temporal dependence and is more powerful than the commonly-used measures, covariance
and correlation.
• We give the exact form of temporal dependence of MMPP with arbitrary
number of states. This is the first theoretical result on the functional
temporal dependence of multi-state MMPP.
• We propose a way to construct copula for superposition of MMPPs. Recursive algorithms are designed to calculate the numerical values of copulas.
• We propose parametric copula modeling method for both single MMPP
and superposition of MMPPs.
3. Application of MMPP copula for traffic prediction
In Chapter 5, we apply MMPP copula for network traffic flow prediction and
make the following contributions:
• We introduce MMPP traffic prediction based on either theoretical copulas
or parametric copulas.
• We demonstrate applications of MMPP copula on both real-world traffic
traces and simulated traffic traces. Both single MMPP flow and superposition of multiple MMPP flows are studied.
• Case studies show that our copula-based traffic prediction method is more
accurate and stable than existing methods.
4. Application of MMPP copula in collaborative auto-scaling of cloud
service
In Chapter 6, we apply the MMPP copula in a composite cloud service system to
design an effective service provisioning strategy, and make the following contributions:
• We introduce a novel approximation approach that transforms the time-ordered,
spatially distributed calls to virtual functions (VFs) into a Markov
Modulated Poisson Process (MMPP). This method solves the challenging
problem in performance modeling of composite service, where the workflow
of a task may pass through multiple VFs in an arbitrary order. By
analysing the performance of MMPP input into a virtual queue, we can
easily estimate the performance of composite services.
• The numbers of calls at different VFs may scale up differently. To address
this difficulty, we introduce a copula model that captures the dependence
structure, which remains stable under such scaling. This unique feature
greatly simplifies the dependence modeling, since there is no need to rebuild
the dependence model when the total amount of service calls varies.
• Cloud brokerage needs a mechanism to carefully balance the cost of purchasing VF resources and the QoS of composite service. As such, we
propose a tiered, collaborative resource auto-scaling strategy, based on the
predictive power of the copula model.
5. Application of MMPP copula in parameter estimation
In Chapter 7, we apply MMPP copula to develop a fast and accurate estimation
method to learn parameters of MMPP, and make the following contributions:
• We model the joint behavior of successive arrival counts in terms of their
marginal distribution and copula. The theoretical forms of the marginal distribution and the copula of arrival counts in MMPP lay a solid foundation for
parameter estimation.
• Based on the MMPP copula, we propose a two-step estimation algorithm,
MarCpa, to estimate MMPP parameters by matching marginal and matching copula separately.
• Case studies with a large number of simulations demonstrate that our
proposed method is more efficient and accurate than existing estimation
methods that learn MMPP parameters from arrival counts.
1.4 Publications
Fang Dong, Kui Wu, and Venkatesh Srinivasan. “Copula Analysis for Statistical
Network Calculus,” in 2015 IEEE Conference on Computer Communications (INFOCOM), April 2015.
Fang Dong, Kui Wu, Venkatesh Srinivasan, and Jianping Wang. “Copula Analysis
of Latent Dependency Structure for Collaborative Auto-scaling of Cloud Services”,
in 2016 25th International Conference on Computer Communication and Networks
(ICCCN), August 2016.
Fang Dong, Kui Wu, Venkatesh Srinivasan. “Copula-based Parameter Estimation
for Markov-modulated Poisson Process”, in Proceedings of IEEE/ACM International
Symposium on Quality of Service (IWQoS), June 2017.
Fang Dong, Kui Wu, Venkatesh Srinivasan. “Copula Analysis of Temporal Dependence Structure in Markov Modulated Poisson Process and Its Applications,”
ACM Transactions on Modeling and Performance Evaluation of Computing Systems
(ToMPECS), accepted in May 2017.
Chapter 2
Preliminaries on Copula Theory
2.1 Definitions and Basic Properties
We start with the definition of copulas and three core theorems.
Definition 1. (Copulas) A 2-dimensional copula is a function $C$ having the following properties [59]:
1. Its domain is $[0, 1] \times [0, 1]$;
2. $C$ is 2-increasing, i.e., for every $u_1, u_2, v_1, v_2 \in [0, 1]$ with $u_1 \le u_2$ and $v_1 \le v_2$, we have $C(u_2, v_2) - C(u_2, v_1) - C(u_1, v_2) + C(u_1, v_1) \ge 0$;
3. $C(u, 0) = C(0, v) = 0$, $C(u, 1) = u$, $C(1, v) = v$, for every $u, v \in [0, 1]$.
The function is called a subcopula if it has the second and the third properties of a copula, but its domain is $b_1 \times b_2$, where $b_1$ and $b_2$ are subsets of $[0, 1]$ containing 0 and 1.
By definition, a copula is the joint distribution function of two random variables, denoted by $U$ and $V$, that follow uniform distributions on the interval $[0, 1]$. That is, $C(u, v) = F_{UV}(u, v)$, where $U \sim \mathrm{Uni}(0, 1)$, $V \sim \mathrm{Uni}(0, 1)$ and $F_{UV}$ is their joint distribution. Example 1 visualizes the idea. In the example, the scatter plot shows how $U$ and $V$ are jointly distributed; in other words, the plot suggests the relationship between $U$ and $V$. Different relationships lead to different copulas, so the shape of a scatter plot of $U$ and $V$ indicates the copula. Scatter plots and contour plots are both widely used to visualize a copula.
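The defining properties of a copula can be sanity-checked numerically. The following is a minimal sketch in Python; the helper names `product_copula` and `is_copula_on_grid` are illustrative and do not come from the thesis. It tests the boundary conditions and the 2-increasing property on a finite grid, using $C(u, v) = uv$ (the joint distribution of two independent uniform variables) as the test case. Passing the grid check is only a necessary condition, not a proof.

```python
def product_copula(u, v):
    """Joint distribution function of two independent Uni(0,1) variables."""
    return u * v


def is_copula_on_grid(c, steps=20, tol=1e-12):
    """Check the boundary and 2-increasing properties of c on a uniform grid."""
    grid = [i / steps for i in range(steps + 1)]
    # Boundary conditions: C(u,0) = C(0,v) = 0, C(u,1) = u, C(1,v) = v.
    for t in grid:
        if abs(c(t, 0.0)) > tol or abs(c(0.0, t)) > tol:
            return False
        if abs(c(t, 1.0) - t) > tol or abs(c(1.0, t) - t) > tol:
            return False
    # 2-increasing: every grid cell must carry non-negative probability mass.
    for i in range(steps):
        for j in range(steps):
            u1, u2 = grid[i], grid[i + 1]
            v1, v2 = grid[j], grid[j + 1]
            mass = c(u2, v2) - c(u2, v1) - c(u1, v2) + c(u1, v1)
            if mass < -tol:
                return False
    return True


print(is_copula_on_grid(product_copula))
```

A function such as `max(u, v)` fails the check because it violates the boundary condition $C(u, 0) = 0$.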
Example 1. Consider two random variables $U$ and $V$ that follow the uniform distribution on $[0, 1]$, with their samples shown in the scatter plot in Fig. 2.1a. The scatter plot shows how $U$ and $V$ are jointly distributed on the two-dimensional plane. The contour of the related copula is shown in Fig. 2.1b.
Figure 2.1: An explanatory example of the definition of copula. (a) Scatter plot of samples of $(U, V)$, with axes $U \sim \mathrm{Uni}(0,1)$ and $V \sim \mathrm{Uni}(0,1)$. (b) Contour of the related copula, with axes $u$ and $v$.
Theorem 1. (Sklar’s theorem) [59] Let $F_{XY}$ be a joint distribution function with marginals $F_X$ and $F_Y$. Then there exists a copula $C$ such that for all $x$ and $y$,
$$F_{XY}(x, y) = \Pr(X \le x, Y \le y) = C(F_X(x), F_Y(y)).$$
If the marginals $F_X$ and $F_Y$ are continuous, then the copula $C$ is unique; otherwise, $C$ is uniquely determined on the range of the marginals. Example 2 illustrates the theorem. Sklar’s theorem is the core of copula theory. It shows how a copula connects the marginals with the joint distribution, which is how a copula captures the dependence between random variables. On the one hand, Sklar’s theorem is especially useful because the joint distribution of random variables is hard to find directly in many applications [11, 59]. In this situation, combining a copula model with the marginals makes it easy to understand the joint behaviour. On the other hand, Sklar’s theorem implies that the copula, as a dependence measure, is separated from the marginals. The marginal distributions and the copula can therefore be modeled separately to fit different application scenarios.
Example 2. Consider two random variables $X \sim \mathrm{Exp}(1)$ and $Y \sim \mathrm{Gaussian}(1, 2.5)$, with their samples $(x, y)$ shown in Fig. 2.2a. Regarding the marginal distribution values of $X$ and $Y$ as random variables $U$ and $V$, every sample pair $(x, y)$ is mapped to a sample pair $(u, v)$ in the marginal domain by
$$u = F_X(x) = \Pr(X \le x) = 1 - e^{-x},$$
$$v = F_Y(y) = \Pr(Y \le y) = \frac{1}{2.5\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{(y' - 1)^2}{2 \cdot 2.5^2}} \, dy'.$$
The scatter plot of $U$ and $V$ in Fig. 2.2b indicates the copula that represents the joint distribution of $U$ and $V$, and is called the copula between $X$ and $Y$. The copula links the marginal distributions of $X$ and $Y$ into their joint distribution by
$$\Pr(X \le x, Y \le y) = \Pr(U \le u, V \le v) = C(u, v) = C(F_X(x), F_Y(y)).$$
Figure 2.2: An explanatory example of Sklar’s theorem. (a) X-Y scatter plot, with axes $X \sim \mathrm{Exp}(1)$ and $Y \sim \mathrm{Gaussian}(1, 2.5)$. (b) U-V scatter plot, with axes $U \sim \mathrm{Uni}(0,1)$ and $V \sim \mathrm{Uni}(0,1)$.
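The mapping from the sample domain to the marginal domain in Example 2 can be sketched in Python as follows. This is an illustration under stated assumptions: Gaussian(1, 2.5) is read as mean 1 and standard deviation 2.5, as the marginal formula suggests, and $X$ and $Y$ are drawn independently because the example does not specify how they depend on each other. The function names are hypothetical.

```python
import math
import random


def exp_cdf(x, rate=1.0):
    """F_X(x) = 1 - exp(-rate * x) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def normal_cdf(y, mean=1.0, std=2.5):
    """F_Y(y) for a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((y - mean) / (std * math.sqrt(2.0))))


def to_marginal_domain(n=5000, seed=1):
    """Draw (x, y) samples and map each pair to (u, v) = (F_X(x), F_Y(y))."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.expovariate(1.0)   # X ~ Exp(1)
        y = rng.gauss(1.0, 2.5)    # Y ~ Gaussian(1, 2.5), drawn independently (assumption)
        pairs.append((exp_cdf(x), normal_cdf(y)))
    return pairs


uv = to_marginal_domain()
mean_u = sum(u for u, _ in uv) / len(uv)
mean_v = sum(v for _, v in uv) / len(uv)
print(round(mean_u, 2), round(mean_v, 2))
```

Whatever the marginals of $X$ and $Y$ are, the transformed values $u$ and $v$ are approximately uniform on $[0, 1]$, so their sample means should be close to 0.5.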
Theorem 2. (The invariant property of copulas) [59] Let $X$ and $Y$ be continuous random variables with copula $C_{XY}$. If $\alpha_1$ and $\alpha_2$ are strictly increasing functions on the range of $X$ and the range of $Y$, respectively, then $C_{\alpha_1(X)\alpha_2(Y)} = C_{XY}$. In other words, $C_{XY}$ is invariant under strictly increasing transformations of $X$ and $Y$.
As Sklar’s theorem shows, a copula is separated from the marginals, so the dependence expressed by the copula stays the same when the marginals are transformed by strictly increasing functions; this is formalized by the invariant property above. In the computer networks domain, the practical meaning of the invariant property is that the contemporaneous dependence between traffic flows and the temporal dependence within one traffic flow, in terms of copula, remain the same even when the flow arrivals all scale up functionally. In that case, we do not need to rebuild the dependence model. Example 3 illustrates the invariant property. The example also shows that other dependence measures, such as correlation and covariance, do not satisfy the invariant property, which makes copulas much more stable for practical use.
Example 3. $X_1$ is a Gaussian random variable with mean 0 and standard deviation 1. $Y_1$ is a random variable functionally dependent on $X_1$, i.e., $Y_1 = X_1^2$. Figs. 2.3a and 2.3b show the scatter plot of $X_1$ and $Y_1$, and the scatter plot in the marginal domain, respectively.
Figure 2.3: An explanatory example of the invariant property. (a) $X_1$-$Y_1$ scatter plot, with axes $X_1 \sim \mathrm{Gaussian}(0,1)$ and $Y_1 \sim X_1^2$. (b) $U_1$-$V_1$ scatter plot, with axes $U_1 \sim \mathrm{Uni}(0,1)$ and $V_1 \sim \mathrm{Uni}(0,1)$. (c) $X_2$-$Y_2$ scatter plot, with axes $X_2 \sim X_1^2$ and $Y_2 \sim Y_1 + 3$. (d) $U_2$-$V_2$ scatter plot, with axes $U_2 \sim \mathrm{Uni}(0,1)$ and $V_2 \sim \mathrm{Uni}(0,1)$.
We generate another two random variables by applying increasing functions to $X_1$ and $Y_1$, respectively, e.g., $X_2 = X_1^2$, $Y_2 = Y_1 + 3$. After the transformation, the $X_2$-$Y_2$ scatter plot, in Fig. 2.3c, appears completely different from the $X_1$-$Y_1$ scatter plot. However, in the marginal domain, the scatter plots are the same, as a comparison of Figs. 2.3d and 2.3b shows. Since the $U_1$-$V_1$ and $U_2$-$V_2$ scatter plots indicate the two copulas, we can tell that the dependence structure between the random variables, in terms of copulas, remains stable under the increasing function transformation. From Figs. 2.3a and 2.3c, we can also tell that $X_1$ and $Y_1$ are not linearly dependent, whereas $X_2$ and $Y_2$ are. Therefore, the linear dependence structure is not invariant under functional transformation.
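The invariant property of Theorem 2 can also be checked on samples. The sketch below is an illustration, not the thesis's experiment: it uses a hypothetical noisy linear pair of variables and two transformations that are strictly increasing on the whole real line ($e^x$ and $y^3$), chosen for this sketch. Because Kendall's tau depends only on the ordering of the samples, it is unchanged by the transformations, whereas the Pearson correlation coefficient is not.

```python
import math
import random


def kendall_tau(xs, ys):
    """Sample Kendall's tau: (concordant - discordant) / number of pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            if dx == 0 or dy == 0:
                continue  # ties are ignored in this sketch
            if (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(7)
x1 = [rng.gauss(0.0, 1.0) for _ in range(300)]
y1 = [x + 0.5 * rng.gauss(0.0, 1.0) for x in x1]  # hypothetical dependent pair
x2 = [math.exp(x) for x in x1]                    # strictly increasing transform
y2 = [y ** 3 for y in y1]                         # strictly increasing transform

tau_before, tau_after = kendall_tau(x1, y1), kendall_tau(x2, y2)
r_before, r_after = pearson(x1, y1), pearson(x2, y2)
print(tau_before, tau_after)
print(r_before, r_after)
```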
Theorem 3. (Fréchet-Hoeffding bounds) [59] For every copula $C$ and for all $u, v$ in $[0, 1]$, the following inequality holds:
$$C_{lb}(u, v) = \max(u + v - 1, 0) \le C(u, v) \le \min(u, v) = C_{ub}(u, v). \tag{2.1}$$
We refer to $C_{ub}$ as the Fréchet-Hoeffding upper bound and $C_{lb}$ as the Fréchet-Hoeffding lower bound.
The Fréchet-Hoeffding bounds show the range of all possible copulas. Consider a copula $C$ that models the dependence between $X$ and $Y$. When $C = C_{lb}$, $Y$ is a decreasing function of $X$; when $C = C_{ub}$, $Y$ is an increasing function of $X$ [35]. Therefore, the Fréchet-Hoeffding bounds capture two extreme functional dependencies. Besides these two special copulas, a third important copula is the product copula, $C_{ind}(u, v) = uv$. $X$ and $Y$ are independent if their copula is $C_{ind}$. Figs. 2.4, 2.5 and 2.6 visualize the copulas $C_{lb}$, $C_{ind}$ and $C_{ub}$, respectively, with their scatter plots and contour plots.
Theorem 4. (Inversion method) [59] Let $F_{XY}$ be a joint distribution function with marginals $F_X$ and $F_Y$. Let $F_X^{-1}$ and $F_Y^{-1}$ be the inverse functions of $F_X$ and $F_Y$. Then the copula between $X$ and $Y$ can be constructed as
$$C(u, v) = F_{XY}(F_X^{-1}(u), F_Y^{-1}(v)) \quad \forall u, v,$$
such that
$$F_{XY}(x, y) = C(F_X(x), F_Y(y)) \quad \forall x, y.$$
Figure 2.4: Fréchet-Hoeffding lower bound copula $C_{lb}$. (a) Scatter plot in the U-V plane, with axes $U \sim \mathrm{Uni}(0,1)$ and $V \sim \mathrm{Uni}(0,1)$. (b) Copula contour, with axes $u$ and $v$.
Figure 2.5: Product copula $C_{ind}$. (a) Scatter plot in the U-V plane, with axes $U \sim \mathrm{Uni}(0,1)$ and $V \sim \mathrm{Uni}(0,1)$. (b) Copula contour, with axes $u$ and $v$.
The inversion method is used to construct a theoretical copula for the problem at hand. It uses Sklar’s theorem to construct copulas. The inversion method leads to a unique copula when the marginals are continuous, and leads to a unique subcopula when the marginals are not continuous. The unique subcopula can be easily extended to a copula in various ways, for instance, by bilinear interpolation [59]. Thus, a subcopula shares most properties of copulas. In the following, we do not differentiate between subcopula and copula, because their difference does not affect our analysis and applications in the following chapters.
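As an illustration of the inversion method, the sketch below applies it to a standard textbook joint distribution that is not used in the thesis: the bivariate logistic distribution $F_{XY}(x, y) = (1 + e^{-x} + e^{-y})^{-1}$, whose marginals are standard logistic. Substituting the marginal quantile functions gives the closed form $C(u, v) = uv / (u + v - uv)$, and the code compares the two numerically. All function names are hypothetical.

```python
import math


def joint_cdf(x, y):
    """Bivariate logistic joint distribution function (illustrative choice)."""
    return 1.0 / (1.0 + math.exp(-x) + math.exp(-y))


def logistic_quantile(p):
    """Inverse of the standard logistic CDF F(x) = 1 / (1 + exp(-x)), 0 < p < 1."""
    return -math.log(1.0 / p - 1.0)


def copula_by_inversion(u, v):
    """C(u, v) = F_XY(F_X^{-1}(u), F_Y^{-1}(v))."""
    return joint_cdf(logistic_quantile(u), logistic_quantile(v))


def closed_form(u, v):
    """Closed form obtained by carrying out the substitution by hand."""
    return u * v / (u + v - u * v)


grid = [i / 10 for i in range(1, 10)]
max_gap = max(abs(copula_by_inversion(u, v) - closed_form(u, v))
              for u in grid for v in grid)
print(max_gap)
```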
2.2 Copula-based Dependence Measures
The copula-based dependence measures satisfy the invariant property shown in Theorem 2. There are two main ways to measure copula-based dependence. One is based on concordance statistics, which measure the extent to which two random variables are both large or both small at the same time. The other is tail dependence, which measures the amount of dependence in the upper and lower quadrant tails of the joint distribution.
Figure 2.6: Fréchet-Hoeffding upper bound copula $C_{ub}$. (a) Scatter plot in the U-V plane, with axes $U \sim \mathrm{Uni}(0,1)$ and $V \sim \mathrm{Uni}(0,1)$. (b) Copula contour, with axes $u$ and $v$.
Kendall’s tau and Spearman’s rho are two popular copula-based dependence measures defined in terms of concordance. Their definitions are as follows:
Definition 2. (Kendall’s tau) [59] Let $(X_i, Y_i)$ and $(X_j, Y_j)$ denote two observations from a vector $(X, Y)$ of continuous random variables whose copula is $C(u, v)$. Kendall’s tau is defined as
$$\rho_\tau = \Pr((X_i - X_j)(Y_i - Y_j) > 0) - \Pr((X_i - X_j)(Y_i - Y_j) < 0) = 4 \int_0^1 \int_0^1 C(u, v) \, dC(u, v) - 1. \tag{2.2}$$
Definition 3. (Spearman’s rho) [59] Let $(X_i, Y_i)$, $(X_j, Y_j)$ and $(X_k, Y_k)$ denote three observations from a vector $(X, Y)$ of continuous random variables whose copula is $C(u, v)$. Spearman’s rho is defined as
$$\rho_s = 3\left(\Pr((X_i - X_j)(Y_i - Y_k) > 0) - \Pr((X_i - X_j)(Y_i - Y_k) < 0)\right) = 12 \int_0^1 \int_0^1 C(u, v) \, du \, dv - 3. \tag{2.3}$$
Essentially, both Kendall’s tau and Spearman’s rho are calculated as concordance minus discordance between samples of two random variables. Although their values can be quite different, they have the same range, from -1 to 1, and generally increase together. The values of Kendall’s tau and Spearman’s rho are interpreted as follows: a value close to 1 indicates strong positive functional dependence between the variables, and a value close to -1 indicates strong negative functional dependence. The degree of functional dependence is reflected by the absolute values $|\rho_\tau|$ or $|\rho_s|$. Three special dependence values are listed below with the related copulas [29]:
• $\rho_\tau = 1$ or $\rho_s = 1$ is equivalent to $C = C_{ub}$, indicating the largest positive functional dependence;
• $\rho_\tau = -1$ or $\rho_s = -1$ is equivalent to $C = C_{lb}$, indicating the largest negative functional dependence;
• $C = C_{ind}$, i.e., independence, gives $\rho_\tau = 0$ and $\rho_s = 0$.
As copula-based measures, both Kendall’s tau and Spearman’s rho can capture dependence beyond the linear scope. Taking $X_1$ and $Y_1$ in Example 3 as an example, the copula between $X_1$ and $Y_1$ is $C_{ub}$, and the copula-based dependence degrees between the two random variables are $\rho_\tau = 1$ and $\rho_s = 1$. The copula-based dependence measures thus show the strong functional dependence between $X_1$ and $Y_1$. However, a linear dependence measure such as the Pearson correlation coefficient gives $\rho = 0$ between $X_1$ and $Y_1$, which indicates zero dependence and does not reflect the actual dependence.
Tail dependence is the probability that two random variables take extremely large (or small) values simultaneously. The upper tail dependence and lower tail dependence are defined as follows:
Definition 4. (Tail dependence) [29] Given two random variables $X$ and $Y$ with marginals $F_X$ and $F_Y$ and copula $C$, the upper tail dependence is
$$\lambda_t^{+} = \lim_{u \to 1} \Pr(X > F_X^{-1}(u) \mid Y > F_Y^{-1}(u)) = \lim_{u \to 1} \frac{1 - 2u + C(u, u)}{1 - u}; \tag{2.4}$$
the lower tail dependence is
$$\lambda_t^{-} = \lim_{u \to 0} \Pr(X < F_X^{-1}(u) \mid Y < F_Y^{-1}(u)) = \lim_{u \to 0} \frac{C(u, u)}{u}. \tag{2.5}$$
In practice, the tail dependence shows how likely two extreme events are to occur together. This information gives a new perspective on dependence, and helps to monitor and identify events under extreme conditions.
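The two limits in Definition 4 can be approximated by evaluating the ratios close to the boundary. The sketch below is an illustration with hypothetical helper names; it uses only the Fréchet-Hoeffding upper bound copula $\min(u, v)$ and the product copula $uv$ from Section 2.1. The first gives tail dependence close to 1 in both tails, and the second gives values close to 0.

```python
def c_ub(u, v):
    """Frechet-Hoeffding upper bound copula."""
    return min(u, v)


def c_ind(u, v):
    """Product (independence) copula."""
    return u * v


def upper_tail_ratio(c, u):
    """(1 - 2u + C(u, u)) / (1 - u); approximates the upper tail dependence as u -> 1."""
    return (1.0 - 2.0 * u + c(u, u)) / (1.0 - u)


def lower_tail_ratio(c, u):
    """C(u, u) / u; approximates the lower tail dependence as u -> 0."""
    return c(u, u) / u


eps = 1e-6  # distance from the boundary, an arbitrary choice for this sketch
for name, c in (("C_ub", c_ub), ("C_ind", c_ind)):
    print(name, upper_tail_ratio(c, 1.0 - eps), lower_tail_ratio(c, eps))
```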
2.3 Parametric Copulas
In many applications, the exact copulas between random variables are difficult to construct. Parametric families of copulas have therefore been proposed and explored to cover various types of dependence structures. Elliptical copulas and Archimedean copulas are the two most studied copula families. Elliptical copulas are derived implicitly from multivariate distributions. Their lower tail dependence and upper tail dependence are strictly symmetric, indicating that the probability of jointly extreme large values is equal to the probability of jointly extreme small values. The typical elliptical copulas are the Gaussian copula and Student’s t copula.
Archimedean copulas are explicit copulas, which have clear closed forms. Compared with elliptical copulas, Archimedean copulas are more flexible in tail dependence: they can model either equal or distinct upper and lower tail dependence. Besides, Archimedean copulas are easier to construct because they have few parameters to estimate. Even with few parameters, this family includes a great variety of copulas, and can model the dependence structure very effectively. All these advantages make Archimedean copulas good candidates for most applications. Three popular one-parameter Archimedean copulas are the Clayton copula, the Frank copula and the Gumbel copula:
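• Clayton copula
$$C(u, v; \theta) = [\max\{u^{-\theta} + v^{-\theta} - 1, 0\}]^{-1/\theta}, \quad \theta \in [-1, \infty) \setminus \{0\};$$
• Frank copula
$$C(u, v; \theta) = -\frac{1}{\theta} \log\left[1 + \frac{(\exp(-\theta u) - 1)(\exp(-\theta v) - 1)}{\exp(-\theta) - 1}\right], \quad \theta \in (-\infty, \infty) \setminus \{0\};$$
• Gumbel copula
$$C(u, v; \theta) = \exp[-((-\log u)^{\theta} + (-\log v)^{\theta})^{1/\theta}], \quad \theta \in [1, \infty).$$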
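The three closed forms translate directly into code. The sketch below is a minimal Python illustration with hypothetical function names; it evaluates the copulas with $\theta = 7$ and checks on a grid that each one lies between the Fréchet-Hoeffding bounds of Theorem 3. The guards for $u = 0$ or $v = 0$ are an implementation detail of this sketch that avoids dividing by zero or taking the logarithm of zero.

```python
import math


def clayton(u, v, theta):
    """Clayton copula, theta in [-1, inf) excluding 0."""
    if u == 0.0 or v == 0.0:
        return 0.0
    s = u ** (-theta) + v ** (-theta) - 1.0
    return max(s, 0.0) ** (-1.0 / theta)


def frank(u, v, theta):
    """Frank copula, theta non-zero; expm1/log1p keep small values accurate."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def gumbel(u, v, theta):
    """Gumbel copula, theta in [1, inf)."""
    if u == 0.0 or v == 0.0:
        return 0.0
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def within_frechet_bounds(c, theta, steps=20, tol=1e-9):
    """Check max(u+v-1, 0) <= C(u,v) <= min(u,v) on a uniform grid."""
    grid = [i / steps for i in range(steps + 1)]
    for u in grid:
        for v in grid:
            value = c(u, v, theta)
            if value < max(u + v - 1.0, 0.0) - tol or value > min(u, v) + tol:
                return False
    return True


for name, c in (("Clayton", clayton), ("Frank", frank), ("Gumbel", gumbel)):
    print(name, within_frechet_bounds(c, 7.0))
```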
The scatter plots of these three copulas are shown in Fig. 2.7. The three copulas are widely used for several reasons. First, they are all one-parameter copulas, which makes it easier to fit the models to real problems. Second, the copula parameter relates directly to the copula-based dependence measures, Kendall’s tau and Spearman’s rho.
Figure 2.7: Scatter plots of three Archimedean copulas with parameter $\theta = 7$. (a) Clayton copula. (b) Frank copula. (c) Gumbel copula. Each panel plots $v$ against $u$ on $[0, 1]$.
For instance, $\rho_\tau = \theta/(\theta + 2)$ for the Clayton copula, and $\rho_\tau = 1 - 1/\theta$ for the Gumbel copula. Thus the copula parameter itself reflects the degree of dependence. Finally, the three copulas capture three distinct tail dependencies. Specifically, the Clayton copula captures lower tail dependence, the Gumbel copula captures upper tail dependence, and the Frank copula captures symmetric tail dependence. Taking the Clayton copula as an example, we can observe that samples cluster at the bottom left of the scatter plot in Fig. 2.7a, indicating strong lower tail dependence. In this thesis, we will exploit these three Archimedean copulas for dependence modeling in network applications.
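The relation between the Clayton parameter and Kendall's tau can be checked by simulation. The sketch below is an illustration, not the procedure used in the thesis: it draws samples from a Clayton copula with the standard conditional-inversion formula, assumes $\theta = 2$ (so that $\theta/(\theta + 2) = 0.5$), and compares the sample Kendall's tau with that value. The function names and the sample size are arbitrary choices.

```python
import random


def sample_clayton(n, theta, seed=0):
    """Draw n pairs (u, v) from a Clayton copula (theta > 0) by conditional inversion."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids u == 0
        w = 1.0 - rng.random()  # conditional probability level, uniform on (0, 1]
        v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def kendall_tau(pairs):
    """Sample Kendall's tau: (concordant - discordant) / number of pairs."""
    n = len(pairs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            du = pairs[i][0] - pairs[j][0]
            dv = pairs[i][1] - pairs[j][1]
            if du == 0 or dv == 0:
                continue  # ties are ignored in this sketch
            score += 1 if (du > 0) == (dv > 0) else -1
    return score / (n * (n - 1) / 2)


theta = 2.0
samples = sample_clayton(500, theta)
tau_hat = kendall_tau(samples)
tau_theory = theta / (theta + 2.0)
print(tau_hat, tau_theory)
```

With a few hundred samples the estimate typically lands within a few hundredths of the theoretical value, which is the sense in which the copula parameter reflects the degree of dependence.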
2.4 Empirical Copula
The empirical copula is counted statistically from samples and is defined as follows.
Definition 5. Given two random variables $X$ and $Y$, and $n$ observed sample pairs $(x_i, y_i)$, the empirical copula between $X$ and $Y$, $\hat{C}$, is defined as
$$\hat{C}(u, v) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(u_i \le u, v_i \le v) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(\hat{F}_X(x_i) \le u, \hat{F}_Y(y_i) \le v), \tag{2.6}$$
where $\hat{F}_X$ and $\hat{F}_Y$ are the empirical marginal distribution functions defined as
$$\hat{F}_X(x_i) = \frac{1}{n} \sum_{i'=1}^{n} \mathbf{1}(x_{i'} \le x_i), \tag{2.7}$$
$$\hat{F}_Y(y_i) = \frac{1}{n} \sum_{i'=1}^{n} \mathbf{1}(y_{i'} \le y_i). \tag{2.8}$$
By definition, the empirical copula is determined purely by the samples, so it is the raw model that represents the samples. The empirical copula can be used as a benchmark to test whether a parametric copula is the underlying copula of the samples [36]. Many research works use the empirical copula for goodness-of-fit tests [26, 36]. If the parametric copula under test is close enough to the empirical copula, it can be accepted as the underlying copula; otherwise, it is rejected.
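Definition 5 can be implemented in a few lines. The sketch below is a direct, unoptimized Python transcription with hypothetical function names; it builds the empirical marginals first and then counts the sample pairs that fall below $(u, v)$ in the marginal domain. The toy data ($y = x^2$ on positive integers) is an assumption made for illustration: since $y$ increases with $x$ there, the empirical copula coincides with $\min(u, v)$ at the evaluated point.

```python
def empirical_cdf_values(samples):
    """Return F_hat(s_i) = (1/n) * #{i' : s_i' <= s_i} for every sample s_i."""
    n = len(samples)
    return [sum(1 for other in samples if other <= s) / n for s in samples]


def empirical_copula(xs, ys):
    """Build C_hat(u, v) from n observed sample pairs (x_i, y_i)."""
    n = len(xs)
    us = empirical_cdf_values(xs)
    vs = empirical_cdf_values(ys)

    def c_hat(u, v):
        return sum(1 for ui, vi in zip(us, vs) if ui <= u and vi <= v) / n

    return c_hat


xs = list(range(1, 11))      # toy samples 1, 2, ..., 10
ys = [x * x for x in xs]     # increasing in x on this sample
c_hat = empirical_copula(xs, ys)
print(c_hat(0.55, 0.35), c_hat(1.0, 1.0))
```

A goodness-of-fit comparison would then measure the distance between `c_hat` and a candidate parametric copula over a grid of $(u, v)$ points.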
2.5 Summary
From the introduction to copula theory in this chapter, we show the advantages of
copulas for dependence modeling. First, copulas can measure the functional dependence beyond linear scope with Spearman’s rho and Kendall’s tau. Second, copulas
separate marginals from joint distributions, allowing copulas to remain stable and
invariant even when the marginals change functionally. Third, copulas are very useful to reveal the joint information of random variables. Usually, marginals are more
accessible than joint distributions, and joint distributions are hard to find directly.
In this situation, integration of a copula model and marginals makes it easier to understand the joint behaviour. All these benefits of copulas help to better understand
the dependence in network traffic. Therefore, in this thesis, we use copula theory
for dependence modeling, and explore its applications in different computer network
scenarios.
Chapter 3
Copula Analysis for
Contemporaneous Dependence and
Its Application in Statistical
Network Calculus
3.1 Introduction
Since its introduction in the early 1990s [22], network calculus has been widely adopted to
analyse complex queueing systems, such as multimedia networks, where the Markovian property of arrivals generally does not hold and thus traditional queueing theory
becomes hard to apply. Network calculus was initially developed along the deterministic track [15, 51] and later evolved into a stochastic version [15, 18, 30, 44]. Stochastic
network calculus (SNC) has received much attention in recent years due to its power
in deriving probabilistic performance bounds, which are more meaningful in practice.
The practical use of SNC, however, has faced challenges due to the lingering
problem in deriving tight stochastic performance bounds [20]. In particular, inappropriate traffic models and the extensive use of model transform may lead to loose
performance bounds [20]. While substantial efforts have been devoted to improving
the bounds [19, 42], the problem has only been tackled for special types of traffic and
service models, using probability inequalities, e.g., Chernoff bounds and martingale
inequalities. In many cases, the independence assumption is required to ease the
analysis, e.g., the independence of the traffic arrivals and the independence between
the arrivals and the service.
In practice, loose bounds may occur due to inaccurate a priori traffic arrival
models and/or the neglect of potential correlations in traffic flows. To alleviate
this problem, Beck et al. [9] proposed to integrate statistical inference, based on past
traffic data, into SNC. This important move opens the door to using the powerful
analytical toolsets of SNC in real-world applications.
Along the same line of statistical modeling and inference, this chapter points
out the potential benefits of using copula theory in SNC. With copula analysis and
numerical experiments, we show the region where copulas can be helpful and
the best bound that SNC can possibly achieve. Statistical analysis of real-world trace
data from an experiment with Skype conference calls shows that copula analysis can
discover the (hidden) correlation between traffic flows, which in turn can help obtain
tighter performance bounds. The copula model discovered also sheds light on the
adaptive strategies in proprietary systems such as Skype. To the
best of our knowledge, no existing work has utilized copula analysis to enhance
the capability of SNC in traffic modeling and performance bound improvement.
3.2 Related Work
The theory of network calculus was first proposed by Cruz in 1991 [22, 23] for network performance evaluation. There are two main tracks of network calculus theory: deterministic network calculus (DNC) and stochastic network calculus (SNC). The
details and results of deterministic network calculus theory can be found in the
books [51, 15]. This track of research analyses only worst-case performance bounds,
which are too loose for practical use.
As an alternative, stochastic network calculus was developed. The basic properties
and results of SNC are summarized in [43, 44]. It is generally non-trivial to derive
tight performance bounds, and union bounds are generally used [44]. To achieve tighter
performance bounds, an independence case study is introduced in [44], which assumes
independence between arrivals and service. In addition, martingales have been
used to tighten the performance bounds [68, 20, 19, 42]. The basic idea is to construct
a martingale process and derive performance bounds with Doob’s inequality.
Whether the above assumptions accord well with real traffic or service processes is an open question. An erroneous model may cause the theoretical bounds to fail in a real case study. To avoid this situation, Beck et al. [9] propose statistical network calculus (StatNC). In their work, traffic models are established by measuring arrivals statistically, and the performance bounds are also analysed in a statistical way. As a result, the performance bounds achieve good accuracy and robustness across different cases. Our work in this chapter shares the same spirit as StatNC. Nonetheless, the two are significantly different in that we use copulas to capture the dependence between flows, while the work of [9] mainly focuses on the statistical estimation of a single flow.
3.3 Background of Stochastic Network Calculus
We introduce the notation and key concepts of stochastic network calculus [44, 54]. We assume that all arrival curves and service curves are non-negative and wide-sense increasing functions. Conventionally, $A(t)$ and $A^*(t)$ denote the cumulative traffic that arrives and departs in the time interval $(0, t]$, respectively, and $S(t)$ denotes the cumulative amount of service provided by the system in the time interval $(0, t]$. For any $0 \le s \le t$, let $A(s, t) = A(t) - A(s)$, $A^*(s, t) = A^*(t) - A^*(s)$, and $S(s, t) = S(t) - S(s)$. By default, $A(0) = A^*(0) = S(0) = 0$.
We denote by $\mathcal{F}$ the set of non-negative wide-sense increasing functions, i.e.,
$$\mathcal{F} = \{f(\cdot) : \forall\, 0 \le x \le y,\ 0 \le f(x) \le f(y)\},$$
and by $\bar{\mathcal{F}}$ the set of non-negative wide-sense decreasing functions, i.e.,
$$\bar{\mathcal{F}} = \{f(\cdot) : \forall\, 0 \le x \le y,\ 0 \le f(y) \le f(x)\}.$$
For any random variable $X$, its distribution function, denoted by
$$F_X(x) = \Pr\{X \le x\},$$
belongs to $\mathcal{F}$, and its complementary distribution function (or survival function), denoted by
$$\bar{F}_X(x) = \Pr\{X > x\},$$
belongs to $\bar{\mathcal{F}}$.
The (min, +) convolution of functions $f$ and $g$ is useful for SNC, and is defined under the (min, +) algebra [15, 22, 51]:
$$(f \otimes g)(t) = \inf_{0 \le s \le t} \{f(s) + g(t - s)\}. \tag{3.1}$$
In addition, the (min, +) deconvolution [15, 22, 51] of functions $f$ and $g$ is defined as:
$$(f \oslash g)(t) = \sup_{s \ge 0} \{f(t + s) - g(s)\}. \tag{3.2}$$
For simplicity, we denote $[x]_1 = \min\{x, 1\}$ and $[x]^+ = \max\{x, 0\}$ in the following.
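The two (min, +) operators are easy to evaluate on a discrete time grid. The sketch below is an illustration under stated assumptions: time is restricted to non-negative integers, the supremum in the deconvolution is truncated at a finite `horizon`, and the deconvolution uses the standard form $\sup_{s \ge 0}\{f(t + s) - g(s)\}$. The curve shapes (an affine arrival curve and rate-latency service curves) and all names are hypothetical choices that do not come from the thesis.

```python
def min_plus_conv(f, g, t):
    """(f conv g)(t) = min over 0 <= s <= t of f(s) + g(t - s), on integer times."""
    return min(f(s) + g(t - s) for s in range(t + 1))


def min_plus_deconv(f, g, t, horizon):
    """(f deconv g)(t) = max over 0 <= s <= horizon of f(t + s) - g(s)."""
    return max(f(t + s) - g(s) for s in range(horizon + 1))


def affine(burst, rate):
    """Arrival curve burst + rate * t."""
    return lambda t: burst + rate * t


def rate_latency(rate, latency):
    """Service curve rate * max(t - latency, 0)."""
    return lambda t: rate * max(t - latency, 0)


beta1 = rate_latency(2, 3)
beta2 = rate_latency(4, 1)
alpha = affine(5, 1)

# Two rate-latency servers in tandem: the result is min(2, 4) * max(t - (3 + 1), 0).
print(min_plus_conv(beta1, beta2, 10))
# Bound on the departures of alpha after a server offering beta1, evaluated at t = 0.
print(min_plus_deconv(alpha, beta1, 0, horizon=50))
```

The truncation is harmless in this example because the arrival rate is lower than the service rate, so the expression inside the supremum eventually decreases in $s$.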
Stochastic traffic arrival curve and stochastic service curve are core concepts in
stochastic network calculus, with the former used for traffic modeling and the latter
for service modeling. In the literature, there are different definitions of stochastic
arrival curve and stochastic service curve [44].
Definition 6. The t.a.c. model [44]: A flow $A(t)$ is said to have a traffic-amount-centric (t.a.c.) stochastic arrival curve $\alpha \in \mathcal{F}$ with bounding function $f \in \bar{\mathcal{F}}$, denoted by
$$A \sim_{tac} \langle f, \alpha \rangle,$$
if for all $t \ge s \ge 0$ and all $x \ge 0$,
$$\Pr\{A(s, t) - \alpha(t - s) > x\} \le f(x). \tag{3.3}$$
In addition, we call $A(t - \delta, t)$, $0 \le \delta \le \Delta$, the statistic of $A$ within a sliding window of size $\Delta$.
Intuitively, the above model means that the cumulative amount of traffic arriving in any time period is upper-bounded by a function, with some violation probability. The model is quite general and covers several widely used models. For example, the stochastically bounded burstiness (SBB) model [82] is a special case of the t.a.c. model obtained by setting $\alpha(t - s) = \rho \cdot (t - s)$. Following the same notation, when the traffic arrival $A(t)$ follows the SBB model with upper rate $\rho$ and bounding function $f$, we denote it as $A \sim_{SBB} \langle f, \rho \rangle$.
Definition 7. The v.b.c. model [44]: A flow $A(t)$ is said to have a virtual-backlog-centric (v.b.c.) stochastic arrival curve $\alpha \in \mathcal{F}$ with bounding function $f \in \bar{\mathcal{F}}$, denoted by
$$A \sim_{vbc} \langle f, \alpha \rangle,$$
if for all $t \ge 0$ and all $x \ge 0$,
$$\Pr\Big\{\sup_{0 \le s \le t} \{A(s, t) - \alpha(t - s)\} > x\Big\} \le f(x). \tag{3.4}$$
In addition, we call $\sup_{0 \le \delta \le \Delta} \{A(t - \delta, t)\}$ the statistic of $A$ within a sliding window of size $\Delta$.
Remark 1. Assume that a (virtual) server with service rate $\alpha$ is fed with arrival $A$. The term $\sup_{0 \le s \le t} \{A(s, t) - \alpha(t - s)\}$ represents the backlog of this virtual server at time $t$. Intuitively, the v.b.c. model implies that the queue length of a virtual server of service rate $\alpha$ fed with the flow $A$ is upper-bounded with some violation probability [44].
We adopt the following model for services:
Definition 8. The s.s.c. model [44]: A server is said to provide a strict stochastic service curve (s.s.c.) $\beta \in \mathcal{F}$ with bounding function $g \in \bar{\mathcal{F}}$, denoted by
$$S \sim_{ssc} \langle g, \beta \rangle,$$
if during any period $(s, t]$ the amount of service $S(s, t)$ provided by the server satisfies
$$\Pr\{S(s, t) < \beta(t - s) - x\} \le g(x). \tag{3.5}$$
Remark 2. In the literature, there are different definitions of stochastic arrival curve
and stochastic service curve [44]. We adopt the above models with the consideration
of their capability to model real-world traffic/services and the convenience to derive
performance bounds.
3.4 Insights of Copula Analysis
3.4.1 Basic Lemmas
In stochastic network calculus, we are often interested in the complementary distribution function of $Z = X + Y$, i.e., $\Pr\{Z > z\}$. The following two lemmas have been widely used in the derivation of stochastic bounds.
Lemma 1. General case [44]: For the sum of two random variables $X$ and $Y$, $Z = X + Y$, no matter whether $X$ and $Y$ are independent or not,
$$\bar{F}_Z(z) \le (\bar{F}_X \otimes \bar{F}_Y)(z).$$
Lemma 2. Independent case: Assume that non-negative random variables $X$ and $Y$ are independent and $\bar{F}_X(x) \le f(x)$ and $\bar{F}_Y(x) \le g(x)$, where $f, g \in \bar{\mathcal{F}}$. Then, for all $x \ge 0$, $\Pr\{X + Y > x\} \le 1 - (\bar{f} * \bar{g})(x)$, where $\bar{f}(x) = 1 - [f(x)]_1$, $\bar{g}(x) = 1 - [g(x)]_1$, and $*$ is the Stieltjes convolution operation.
The following lemmas from copula analysis are useful for SNC:
Lemma 3. Let $Z$ be the sum of two random variables $X$ and $Y$. The survival function of $Z$, $\bar{F}_Z(z)$, can be calculated in terms of $F_X(x)$, $F_Y(y)$ and their copula $C_{XY}$:
$$\bar{F}_Z(z) = 1 - \iint_{x + y < z} dC(F_X(x), F_Y(y)). \tag{3.6}$$
Proof. Let $f_Z(z)$ be the probability density function (pdf) of $Z$, and $F_Z(z)$ be its distribution function. Let $f_{XY}(x, y)$ be the joint probability density function of $X$ and $Y$, and $F_{XY}(x, y)$ be their joint distribution function. Since $Z$ is the sum of $X$ and $Y$, its pdf can be represented as
$$f_Z(z) = \int_{-\infty}^{\infty} f_{XY}(x, z - x) \, dx = \int_{-\infty}^{\infty} f_{XY}(z - y, y) \, dy. \tag{3.7}$$
Accordingly, $F_Z(z)$ is derived as follows:
$$\begin{aligned}
F_Z(z) &= \int_{-\infty}^{z} f_Z(t) \, dt = \int_{-\infty}^{z} \int_{-\infty}^{\infty} f_{XY}(x, t - x) \, dx \, dt \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{z - x} f_{XY}(x, y) \, dy \, dx = \iint_{x + y < z} f_{XY}(x, y) \, dx \, dy \\
&= \iint_{x + y < z} dF_{XY}(x, y) \\
&= \iint_{x + y < z} dC(F_X(x), F_Y(y)).
\end{aligned} \tag{3.8}$$
With $\bar{F}_Z(z) = 1 - F_Z(z)$, Eq. (3.6) holds.
Note that Lemma 3 can be extended to the multivariate case.
Lemma 4. Copula case: Let $Z$ be the sum of two random variables $X$ and $Y$. Then
$$\hat{\bar{F}}_Z(z) \ge \bar{F}_Z(z) \ge \check{\bar{F}}_Z(z), \tag{3.9}$$
where
$$\hat{\bar{F}}_Z(z) = 1 - \sup_{x + y = z} \{C_{lb}(F_X(x), F_Y(y))\}, \tag{3.10}$$
$$\check{\bar{F}}_Z(z) = 1 - \inf_{x + y = z} \{\tilde{C}_{lb}(F_X(x), F_Y(y))\}, \tag{3.11}$$
where $C_{lb}$ is the Fréchet-Hoeffding lower bound copula defined in Theorem 3, and $\tilde{C}_{lb}(u, v) = u + v - C_{lb}(u, v) = \min(u + v, 1)$. The proof of Lemma 4 is similar to that in [59] with slight modifications.
3.4.2 An Example of Copula Analysis
Markov modulated processes have been extensively used for representing multimedia traffic [78]. It has been shown that Markov modulated traffic can be captured with the stochastically bounded burstiness (SBB) model. Assume that we are given two Markov modulated processes $A_1 \sim_{SBB} \langle f_1, \rho_1 \rangle$ and $A_2 \sim_{SBB} \langle f_2, \rho_2 \rangle$. As a concrete example, we assume that both bounding functions, $f_1$ and $f_2$, have the exponential form [82] with mean values of $r_1$ and $r_2$, respectively. We are interested in modeling the superposition of $A_1$ and $A_2$, $A = A_1 + A_2$.
Let $X$ and $Y$ be exponentially distributed random variables with mean values of $r_1$ and $r_2$, respectively, and let $Z = X + Y$. With Lemma 1, we have the following bound, denoted as the general bound since it holds for any $X$ and $Y$:
$$\bar{F}_Z(z) \le \begin{cases} 1, & z < \zeta \\ e^{-\frac{z - \zeta}{r_1 + r_2}}, & z \ge \zeta \end{cases} \tag{3.12}$$
where $\zeta = (r_1 + r_2)\ln(r_1 + r_2) - r_1 \ln(r_1) - r_2 \ln(r_2)$.
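If we know that $X$ and $Y$ are independent, we have the following bound, denoted as the independent bound:
$$\bar{F}_Z(z) = \begin{cases} \left[\left(1 + \frac{z}{r}\right) e^{-\frac{z}{r}}\right]_1, & r_1 = r_2 = r \\[4pt] \left[\dfrac{r_1 e^{-\frac{z}{r_1}} - r_2 e^{-\frac{z}{r_2}}}{r_1 - r_2}\right]_1, & r_1 \ne r_2 \end{cases} \tag{3.13}$$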
Based on Lemma 4, we have the following upper and lower bounds of $\bar{F}_Z$ from copula.
Theorem 5. Let $X$ and $Y$ be exponentially distributed random variables with means $r_1$ and $r_2$, respectively. Let $\hat{\bar{F}}_Z$ and $\check{\bar{F}}_Z$ be as in Lemma 4. Then
$$\hat{\bar{F}}_Z(z) = \begin{cases} 1, & z < \zeta \\ e^{-\frac{z - \zeta}{r_1 + r_2}}, & z \ge \zeta \end{cases} \tag{3.14}$$
and
$$\check{\bar{F}}_Z(z) = \begin{cases} 1, & z < 0 \\ e^{-\frac{z}{\max(r_1, r_2)}}, & z \ge 0 \end{cases} \tag{3.15}$$
where $\zeta = (r_1 + r_2)\ln(r_1 + r_2) - r_1 \ln(r_1) - r_2 \ln(r_2)$.
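The closed forms in Theorem 5 can be cross-checked numerically. The following is a minimal sketch, not the author's code: it evaluates the sup and inf of Eqs. (3.10) and (3.11) by a grid search over splits $x + y = z$ for exponential marginals and compares them with Eqs. (3.14) and (3.15). All function names, the grid resolution and the restriction to $z \ge 0$ are assumptions made for illustration.

```python
import math

def exp_cdf(x, mean):
    """CDF of an exponential random variable with the given mean."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0

def zeta(r1, r2):
    """Threshold appearing in Eqs. (3.12) and (3.14)."""
    return (r1 + r2) * math.log(r1 + r2) - r1 * math.log(r1) - r2 * math.log(r2)

def upper_closed(z, r1, r2):
    """Closed form of the upper copula bound, Eq. (3.14)."""
    zt = zeta(r1, r2)
    return 1.0 if z < zt else math.exp(-(z - zt) / (r1 + r2))

def lower_closed(z, r1, r2):
    """Closed form of the lower copula bound, Eq. (3.15)."""
    return 1.0 if z < 0 else math.exp(-z / max(r1, r2))

def independent_closed(z, r1, r2):
    """Independent bound, Eq. (3.13), for z >= 0."""
    if r1 == r2:
        return min(1.0, (1.0 + z / r1) * math.exp(-z / r1))
    return min(1.0, (r1 * math.exp(-z / r1) - r2 * math.exp(-z / r2)) / (r1 - r2))

def copula_bounds_numeric(z, r1, r2, steps=20000):
    """Grid search over splits x + y = z for Eqs. (3.10) and (3.11), z >= 0.

    Only x in [0, z] is scanned: outside that range one marginal CDF is 0,
    which cannot raise the sup or lower the inf for nonnegative variables.
    """
    sup_lb = 0.0    # sup of C_lb(u, v) = max(u + v - 1, 0)
    inf_dual = 1.0  # inf of min(u + v, 1)
    for k in range(steps + 1):
        x = z * k / steps
        u, v = exp_cdf(x, r1), exp_cdf(z - x, r2)
        sup_lb = max(sup_lb, max(u + v - 1.0, 0.0))
        inf_dual = min(inf_dual, min(u + v, 1.0))
    return 1.0 - sup_lb, 1.0 - inf_dual
```

For instance, `copula_bounds_numeric(3.0, 0.5, 1.0)` can be compared against `upper_closed(3.0, 0.5, 1.0)` and `lower_closed(3.0, 0.5, 1.0)`; the independent bound should fall between the two.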
It is easy to show the following theorem to model the superposition of $A_1$ and $A_2$, $A = A_1 + A_2$.
Theorem 6. Assume that $A_1 \sim_{SBB} \langle f_1, \rho_1 \rangle$ and $A_2 \sim_{SBB} \langle f_2, \rho_2 \rangle$, where $f_1$ and $f_2$ have the exponential form. The superposition of $A_1$ and $A_2$ satisfies $A \sim_{SBB} \langle g, \rho_1 + \rho_2 \rangle$, where $g$ can be calculated:
• with Equation (3.12) (general bound, applicable to any situation),
• or with Equation (3.13) (independent bound, applicable when $A_1$ and $A_2$ are independent),
• or with Equation (3.14) (upper bound with copula), or with Equation (3.15) (lower bound with copula), or with Equation (3.6) (if the copula between $A_1$ and $A_2$ is known).
Note that due to Lemma 4, the lower bound with copula indicates the tightest bound that we can possibly obtain with SNC when the upper rate is $\rho_1 + \rho_2$.
Figs. 3.1 and 3.2 show two numerical examples. We have the following interesting
observations from the figures:
• The general bound is the same as the upper bound with copula, indicating that
the general bound is actually the loosest bound.
• There is a clear gap between the independent bound and the lower bound with
copula.
Figure 3.1: Different Bounds with $r_1 = 0.5$, $r_2 = 1$. (Axes: $z$ vs. Prob. Curves: general, upper bound, independent, lower bound.)
Figure 3.2: Different Bounds with $r_1 = 2$, $r_2 = 2$. (Axes: $z$ vs. Prob. Curves: general, upper bound, independent, lower bound.)
Remark 3. The gap between the independent bound and the lower bound with copula has an important implication. When the dependence of flows is unclear or hard to determine, independent-case analysis does not always lead to the best bound. There is much room to improve stochastic bounds with copulas.
3.4.3 Performance Bounds of SNC with Copulas
The following measures are of interest in service guarantee analysis:
• The backlog $B(t)$ of flow $A$ in the system at time $t$ is defined as:
$$B(t) = A(t) - A^*(t). \tag{3.16}$$
• The delay $D(t)$ of flow $A$ at time $t$ is defined as:
$$D(t) = \inf\{\tau \ge 0 : A(t) \le A^*(t + \tau)\}. \tag{3.17}$$
If copula statistics are known, we have the following theorem to model the superposition of traffic flows:
Theorem 7. Assume that $A_i \sim_{vbc} \langle f_i, \rho_i \rangle$ $(i = 1, \ldots, n)$. Assume that the statistic of $A_i$ $(i = 1, \ldots, n)$ within a sliding window of size $\Delta$, denoted as $X_i$, has a marginal distribution function $F_{X_i}$. Assume that the copula of $X_1, \ldots, X_n$, $C(F_{X_1}, \ldots, F_{X_n})$, is known. Based on Definition 7 and the extended multivariate case of Lemma 3, the superposition of $A_1, \ldots, A_n$ satisfies $A \sim_{vbc} \langle g, \rho_1 + \ldots + \rho_n \rangle$, where $g$ can be calculated as:
$$g(z) = 1 - \int \cdots \int_{x_1 + \ldots + x_n < z} dC\big(F_{X_1}(x_1), \ldots, F_{X_n}(x_n)\big). \tag{3.18}$$
With Theorem 7 in this thesis and Theorems 4.9 and 5.1 in [44], we have the following bound on backlog:
Theorem 8. Backlog Bound: Consider a system with input flows $A_1, \ldots, A_n$. Assume that $A_i \sim_{vbc} \langle f_i, \rho_i \rangle$. Assume that the statistic of $A_i$ $(i = 1, \ldots, n)$ within a sliding window of size $\Delta$, $X_i$, has a marginal distribution function $F_{X_i}$ and that the copula of $X_1, \ldots, X_n$, $C(F_{X_1}, \ldots, F_{X_n})$, is known. Assume that the system provides to the input a service curve $S \sim_{ssc} \langle g, \beta \rangle$. The backlog $B(t)$ is bounded by
$$Pr\{B(t) > x\} \le (f \otimes g)\big(x - \alpha \oslash \beta(0)\big) \tag{3.19}$$
where
$$\alpha = \sum_{i=1}^{n} \rho_i, \tag{3.20}$$
$$f(z) = 1 - \int \cdots \int_{x_1 + \ldots + x_n < z} dC\big(F_{X_1}(x_1), \ldots, F_{X_n}(x_n)\big). \tag{3.21}$$
In addition, with Theorem 7 in this thesis and Theorems 4.9 and 5.4 in [44], we have the following bound on delay:
Theorem 9. Delay Bound: Consider a system with input flows $A_1, \ldots, A_n$. Assume that $A_i \sim_{vbc} \langle f_i, \rho_i \rangle$. Assume that the statistic of $A_i$ $(i = 1, \ldots, n)$ within a sliding window of size $\Delta$, $X_i$, has a marginal distribution function $F_{X_i}$ and that the copula of $X_1, \ldots, X_n$, $C(F_{X_1}, \ldots, F_{X_n})$, is known. Assume that the system provides to the input a service curve $S \sim_{ssc} \langle g, \beta \rangle$. The delay $D(t)$ is bounded by
$$Pr\{D(t) > h(\alpha + x, \beta)\} \le (f \otimes g)(x) \tag{3.22}$$
where $h$ denotes the maximum horizontal distance between two curves and
$$\alpha = \sum_{i=1}^{n} \rho_i, \tag{3.23}$$
$$f(z) = 1 - \int \cdots \int_{x_1 + \ldots + x_n < z} dC\big(F_{X_1}(x_1), \ldots, F_{X_n}(x_n)\big). \tag{3.24}$$
Remark 4. Following the same principle presented in [9], Theorems 8 and 9 both
rely on the statistics of flow arrivals.
3.5 Copula Modelling at Work
3.5.1 Copula Analysis in Real-world Applications
Real-world Experiments
To obtain an initial idea of the traffic model of real-world flows and their dependence, we study traffic in Skype group calls as a preliminary step. The experiment scenario is shown in Fig. 3.3. Three users enter a Skype group call over a campus network. We name their IP addresses IP 1, IP 2 and IP 3. During the group chat, data flows (marked as dashed lines in Fig. 3.3) are transmitted between each pair of terminals. The two outflows from IP 1 (marked as red dashed lines in Fig. 3.3) are identified as the 1st flow and the 2nd flow, respectively. The data packets of the two flows are captured with Wireshark. The captured information includes the frame number, the time, the source IP, the destination IP, the protocol, the packet length, etc. To draw a reliable conclusion, we perform three independent experiments, each of which records traffic data of a group call for more than 20 minutes. Data collected from the three experiments is saved in Dataset 1, Dataset 2, and Dataset 3, respectively.
Figure 3.3: Experiment scenario (three terminals, IP 1, IP 2 and IP 3, in a Skype group call)
Traffic Modelling
We define a random variable $a$ to represent the amount of data sent per unit of time (set as 1 second in our analysis). Observed sample values of this random variable carry a time index, $a_i$. Then the traffic during time interval $(s, t]$ can be regarded as the cumulative amount of traffic in each unit of time, i.e.
$$A(s, t) = \sum_{i=s+1}^{t} a_i, \tag{3.25}$$
where $a_i$ is the value of $a$ observed in the $i$-th unit of time in the interval. Similarly, the traffic process $A(0, t)$ can be represented as a series of observed samples $a_1, a_2, \ldots, a_t$ of $a$. By modelling the distribution of random variable $a$, a traffic process becomes analytically easy to study.
Therefore, we have two random variables to model in the datasets. One is the amount of data sent per unit of time in the 1st flow, denoted as $a_1$; the other is the amount of traffic sent per unit of time in the 2nd flow, denoted as $a_2$ (here the subscript indexes the flow). To save space, we only show the histograms of sample values of $a_1$ and $a_2$ based on one dataset (Dataset 1), since the results from the other two datasets are similar. As shown in Fig. 3.4, the shape of the histograms seems to suggest a mixture of two Gaussian distributions. (Footnote 1: We also tested other distributions such as Gaussian and gamma distributions, but the data failed the test.) The general form of the cumulative distribution function (CDF) of the mixed distribution is
$$F(x) = \lambda\,\Phi\!\left(\frac{x - \mu_1}{\sigma_1}\right) + (1 - \lambda)\,\Phi\!\left(\frac{x - \mu_2}{\sigma_2}\right), \tag{3.26}$$
where $\Phi$ is the CDF of the standard Gaussian distribution Gaussian(0, 1). The formula indicates that the mixed distribution combines two weighted Gaussian distributions, Gaussian($\mu_1, \sigma_1$) and Gaussian($\mu_2, \sigma_2$). There are five parameters to estimate in Eq. (3.26): $\lambda$, $\mu_1$, $\sigma_1$, $\mu_2$, $\sigma_2$. These parameters are computed by using the maximum likelihood estimate method on sample data.
Figure 3.4: Histogram of $a_1$ and $a_2$ based on samples in one dataset. (a) Histogram of samples of $a_1$. (b) Histogram of samples of $a_2$. (Axes: Sending Bytes Per Second vs. Frequency.)
We then test the null hypothesis,
• the random variable $a_i$ $(i = 1, 2)$ conforms to the mixture of two Gaussian distributions with parameters given by the parameter estimates.
The goodness of fit test is conducted with the Kolmogorov-Smirnov test [7]. The test results for the three datasets are shown in Table 3.1. The degrees of freedom are determined by the size of observed samples. From the table, the Kolmogorov-Smirnov statistic values $D$ are always smaller than the critical values $D_{0.01}$. Therefore, the above null hypothesis cannot be rejected at the 0.01 significance level, and we model both $a_1$ and $a_2$ as a mixture of two Gaussian distributions. This result suggests that Skype may adapt its sending rates according to different channel conditions.
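The test above can be sketched in a few lines. This is an illustration rather than the procedure actually used for Table 3.1: the function names are hypothetical, and the 0.01-level critical value is assumed to follow the large-sample approximation $1.63/\sqrt{n}$, which happens to reproduce the tabulated critical values for the reported sample sizes.

```python
import math

def std_normal_cdf(x):
    """CDF of the standard Gaussian distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mixture_cdf(x, lam, mu1, sigma1, mu2, sigma2):
    """Two-component Gaussian mixture CDF in the form of Eq. (3.26)."""
    return (lam * std_normal_cdf((x - mu1) / sigma1)
            + (1.0 - lam) * std_normal_cdf((x - mu2) / sigma2))

def ks_statistic(samples, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare F with the empirical CDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_critical_001(n):
    """Assumed large-sample critical value at the 0.01 level: 1.63 / sqrt(n)."""
    return 1.63 / math.sqrt(n)
```

A fit is not rejected when `ks_statistic(samples, lambda x: mixture_cdf(x, lam, mu1, sigma1, mu2, sigma2))` stays below `ks_critical_001(len(samples))`.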
Copula-based Dependence between Flows
The dependence between the 1st flow and 2nd flow in each dataset is unknown to us.
Copula analysis can help to disclose the hidden dependence structure. As described in the above section, traffic of each flow is represented by random variable $a$. Then the copula between random variables $a_1$ and $a_2$ shows how the two flows correlate with each other. In order to disclose the copula-based dependence between $a_1$ and $a_2$, we test three popular copulas: the Gumbel copula, the Frank copula and the Clayton copula. By goodness of fit tests on these three copulas, we can quickly understand the dependence structure between $a_1$ and $a_2$.

Table 3.1: Kolmogorov-Smirnov goodness of fit test for $a_1$ and $a_2$ in three datasets.
                          Dataset 1              Dataset 2              Dataset 3
Random variable           a1         a2          a1         a2          a1         a2
Estimate of parameters
  λ                       0.60926    0.508744    0.574633   0.548374    0.434783   0.42081
  µ1                      5674.983   6886.837    5617.563   6219.708    5760.396   5857.61
  µ2                      6271.071   8072.268    6183.135   7427.538    6316.716   6463.853
  σ1                      151.3938   532.1466    171.9808   388.4737    201.6061   183.0272
  σ2                      470.381    394.1223    437.9055   605.1354    441.7937   461.2388
Statistical value D       0.033909   0.025224    0.023118   0.030246    0.031631   0.035658
Degree of freedom         1329                   1837                   1566
Critical values D_0.01    0.0447                 0.038                  0.0412
The fit to the Gumbel, Frank and Clayton copulas is tested with “Blanket tests” based on the empirical copula [36]. The main idea is to measure how far the empirical copula is from the tested copula. The test results are reported as P-values. Statistically, “the P-value can be viewed as a measure of fit, with larger values being better. This suggests that we could fit every distribution at our disposal, compute the test statistic for each fit, and then choose the distribution that yields the largest P-value” [7].
The fit results for the three copulas based on samples from the three datasets are listed in Table 3.2. The Gumbel copula fits the samples best across all three datasets. Therefore, it is suitable to use the Gumbel copula to capture the dependence between flows $a_1$ and $a_2$.
Table 3.2: “Blanket” goodness of fit test for copula between $a_1$ and $a_2$ across three datasets.
Copula     Quantity    Dataset 1   Dataset 2   Dataset 3
Gumbel     θ           1.1464      1.0597      1.6791
           P-value     0.41        0.5         0.94
Frank      θ           1.2531      0.4483      4.465
           P-value     0.23        0.41        0.4
Clayton    θ           0.2057      0.0327      0.7574
           P-value     0.07        0.21        0
The Gumbel-based dependence actually reveals information about the transmission processes during Skype group calls in our experiments. It has been shown that Skype adapts its sending rate to packet losses, packet delay, and available bandwidth [89]. Specifically, Skype will reduce the sending rate if the transmission channel is busy [89]. From the Gumbel-based dependence, we can infer that
• in most situations, the transmission channels for the two flows are not busy.
Thus there is a relatively high probability that Skype arranges a high sending
rate to the two destinations at the same time, causing the strong upper tail
dependence;
• when the transmission channel for one flow becomes busy, Skype reduces the
transmission rate to the corresponding destination, while the transmission rate
to the other destination does not need to change, resulting in the weak lower
tail dependence.
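The tail behaviour referred to in the two points above can be quantified with the standard tail dependence coefficients of the Gumbel copula: the upper coefficient is $2 - 2^{1/\theta}$ and the lower coefficient is zero. The sketch below evaluates the upper coefficient for the fitted Gumbel parameters in Table 3.2; the function name is hypothetical and the computation is an illustration that does not appear in the original analysis.

```python
def gumbel_upper_tail(theta):
    """Upper tail dependence coefficient of a Gumbel copula (theta >= 1)."""
    return 2.0 - 2.0 ** (1.0 / theta)

# Gumbel parameters fitted to Datasets 1-3 (Table 3.2).
fitted = [("Dataset 1", 1.1464), ("Dataset 2", 1.0597), ("Dataset 3", 1.6791)]
for name, theta in fitted:
    # The lower tail dependence coefficient of a Gumbel copula is always 0.
    print(f"{name}: upper = {gumbel_upper_tail(theta):.4f}, lower = 0")
```

A value of `theta = 1` corresponds to independence and gives an upper coefficient of zero; larger values of `theta` give stronger upper tail dependence.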
Remark 5. Our test results do not exclude the possibility that the data may fit another
possible distribution or another possible copula. Nevertheless, the completeness of
statistical tests is not the main focus of our work, and our framework is generally
applicable to other (possible) distributions and copulas.
3.5.2 Copula Analysis with Simulated Traffic
Due to the difficulty in accurately tracking the buffer size allocated and used by Skype traffic, we study the performance bounds using simulated traffic. The simulated traffic flows follow the statistical model obtained in the above real-world experiments. On the service side, we simulate a constant rate server. To save space, we only show the backlog bounds. The delay bounds can be studied with a similar method.
Generation of Simulated Traffic
Consider a system with two input flows ($A_1$ and $A_2$) and a node with constant service rate to the input flows ($R_1$ to $A_1$ and $R_2$ to $A_2$). The two input flows follow the same distributions and dependence structure as the two outflows in the above Skype group calls. In particular, the generated traffic amounts per unit of time, $a_1$ and $a_2$, follow a mixture of two Gaussian distributions, and their correlation is modeled by the Gumbel copula. Denote the CDFs of $a_1$ and $a_2$ as $F_{a_1}$ and $F_{a_2}$, respectively, and denote the copula between $a_1$ and $a_2$ as $C(F_{a_1}, F_{a_2}; \theta)$. The copula is chosen as the Gumbel copula, and the parameter is chosen as $\theta = 1.6791$ according to our case study of real traffic Dataset 3. Note that similar performance results can be obtained with the other two datasets. Algorithm 1 shows the method to generate simulated traffic.
Algorithm 1 Traffic Generation Based On Given Distributions and Copula
Require: Distributions of $a_1$ and $a_2$, copula between them, the length of time of simulated process $t$
Ensure: Traffic data of two flows $A_1$ and $A_2$
for $i \leftarrow 1 : t$ do
    Generate a random pair $(u_1, u_2)$ based on the given copula using the method introduced in [59];
    Generate a sample of $a_1$ within the $i$-th unit of time by $a_1^i = F_{a_1}^{-1}(u_1)$;
    Generate a sample of $a_2$ within the $i$-th unit of time by $a_2^i = F_{a_2}^{-1}(u_2)$;
end for
The sample sequence of $a_1$, $\{a_1^1, a_1^2, \ldots, a_1^t\}$, represents traffic data of flow $A_1$;
The sample sequence of $a_2$, $\{a_2^1, a_2^2, \ldots, a_2^t\}$, represents traffic data of flow $A_2$;
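The steps of Algorithm 1 can be sketched in Python as follows. This is one possible realization, not the implementation used in the thesis: the copula pair is drawn by conditional inversion with a numerical bisection (a stand-in for the method of [59]), the inverse mixture CDF is also obtained by bisection, and all function names, the 10-sigma search bracket and the seed handling are assumptions. The two parameter tuples are the Dataset 3 estimates from Table 3.1.

```python
import math
import random

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mixture_cdf(x, lam, mu1, s1, mu2, s2):
    """Two-component Gaussian mixture CDF, Eq. (3.26)."""
    return lam * std_normal_cdf((x - mu1) / s1) + (1 - lam) * std_normal_cdf((x - mu2) / s2)

def bisect_increasing(func, target, lo, hi, iters=60):
    """Solve func(x) = target for a nondecreasing func on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mixture_ppf(u, params):
    """Inverse mixture CDF; the bracket spans 10 sigma around both means."""
    lam, mu1, s1, mu2, s2 = params
    lo = min(mu1 - 10 * s1, mu2 - 10 * s2)
    hi = max(mu1 + 10 * s1, mu2 + 10 * s2)
    return bisect_increasing(lambda x: mixture_cdf(x, *params), u, lo, hi)

def gumbel_conditional(v, u, theta):
    """dC(u, v)/du for the Gumbel copula: conditional CDF of V given U = u."""
    a, b = -math.log(u), -math.log(v)
    s = a ** theta + b ** theta
    c = math.exp(-s ** (1.0 / theta))
    return c * s ** (1.0 / theta - 1.0) * a ** (theta - 1.0) / u

def gumbel_pair(theta, rng, eps=1e-12):
    """Draw (u1, u2) from a Gumbel copula by conditional inversion."""
    u = min(max(rng.random(), eps), 1 - eps)
    w = rng.random()
    v = bisect_increasing(lambda x: gumbel_conditional(x, u, theta), w, eps, 1 - eps)
    return u, v

def generate_traffic(t, params1, params2, theta, seed=0):
    """Return per-unit-time samples of the two flows over t units of time."""
    rng = random.Random(seed)
    flow1, flow2 = [], []
    for _ in range(t):
        u1, u2 = gumbel_pair(theta, rng)
        flow1.append(mixture_ppf(u1, params1))
        flow2.append(mixture_ppf(u2, params2))
    return flow1, flow2

# (lambda, mu1, sigma1, mu2, sigma2) for a1 and a2 in Dataset 3 (Table 3.1).
PARAMS_A1 = (0.434783, 5760.396, 201.6061, 6316.716, 441.7937)
PARAMS_A2 = (0.42081, 5857.61, 183.0272, 6463.853, 461.2388)
```

A call such as `generate_traffic(1000, PARAMS_A1, PARAMS_A2, 1.6791)` returns two lists of per-second traffic amounts whose rank correlation should be close to the Kendall's tau of the Gumbel copula, $1 - 1/\theta$.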
The output of Algorithm 1 is the traffic that arrives in each unit of time for input flows $A_1$ and $A_2$. All the output traffic data is combined into the Simulated Dataset. From the output traffic, the amount of traffic that arrives within any time interval $(s, t]$ can be computed cumulatively with Eq. (3.27). Thus, with the simulated traffic data, the arrival processes of the two flows can be entirely recovered and further used for the backlog bounds study.
$$A_1(s, t) = \sum_{i=s+1}^{t} a_1^i, \qquad A_2(s, t) = \sum_{i=s+1}^{t} a_2^i. \tag{3.27}$$
Backlog Bounds for Each Flow
Given the arrival $A_i(s, t)$ and a constant service rate $R_i$, the backlog $B_i(t)$ is:
$$B_i(t) = \sup_{0 \le s \le t} \{A_i(s, t) - R_i (t - s)\}, \quad i = 1, 2. \tag{3.28}$$
If we characterize the backlog as a random variable $B_i$ $(i = 1, 2)$, the backlog sequence along time, $B_i(1), B_i(2), \ldots, B_i(t)$, consists of the observed samples of $B_i$. Moreover, the backlog bounding function is the survival function of $B_i$. Then the backlog bound can be estimated by the statistical distribution of backlog $B_i$.
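Eq. (3.28) does not need to be evaluated as an explicit supremum at every time step. A minimal sketch, with hypothetical function names, is given below: it uses the equivalent one-step recursion $B(k) = \max(0, B(k-1) + a_k - R)$, keeps a brute-force evaluation of the supremum for cross-checking, and estimates the survival function from the resulting samples.

```python
def backlog_series(arrivals, rate):
    """Backlog B(1), ..., B(t) under a constant-rate server.

    Uses B(k) = max(0, B(k-1) + a_k - rate), which equals the sup in Eq. (3.28).
    """
    backlog, series = 0.0, []
    for a in arrivals:
        backlog = max(0.0, backlog + a - rate)
        series.append(backlog)
    return series

def backlog_bruteforce(arrivals, rate, t):
    """Direct evaluation of Eq. (3.28) at time t (1-based), for cross-checking."""
    best = 0.0  # s = t contributes A(t, t) - rate * 0 = 0
    for s in range(t):
        # arrivals[s:t] holds a_{s+1}, ..., a_t, i.e. A(s, t).
        best = max(best, sum(arrivals[s:t]) - rate * (t - s))
    return best

def empirical_survival(samples, x):
    """Fraction of samples strictly greater than x: an estimate of Pr{B > x}."""
    return sum(1 for b in samples if b > x) / len(samples)
```

The recursion runs in linear time, whereas the brute-force form is quadratic per time step and is only practical for verification on short traces.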
The service rate assigned to each flow equals its average arrival rate. By computation with Eq. (3.28), samples of $B_i$ $(i = 1, 2)$ can be obtained. The histograms of the sample values are shown in Fig. 3.5. The bimodal histograms suggest that $B_1$ and $B_2$ may also follow a mixture of two Gaussian distributions, which is essentially inherited from the model of simulated input flows.
Figure 3.5: Histograms of $B_1$ and $B_2$ based on samples in simulated dataset. (a) Histogram of samples of $B_1$. (b) Histogram of samples of $B_2$. (Axes: Backlog (Bytes) vs. Frequency.)
The parameter estimation and Kolmogorov-Smirnov goodness of fit test results are shown in Table 3.3. For both $B_1$ and $B_2$, the statistical values are smaller than the critical values, so the hypothesis that they both follow a mixture of two Gaussian distributions cannot be rejected. With the estimated parameters, the survival functions of backlog variables $B_1$ and $B_2$ can be determined. Accordingly, the backlog bounds of flows $A_1$ and $A_2$ can be drawn, as shown in Fig. 3.6.
Remark 6. The “traditional” way to obtain a backlog bound with SNC is to derive the bound with traffic and service models. We instead treat the backlog as a random variable and model its statistical features directly. Nevertheless, this is not unusual. For the convenience of bound analysis, previous work [44] introduces some traffic models, such as the v.b.c. model, which could be considered as the same type of practice as ours, as per Definition 7 and Remark 1.
Table 3.3: Kolmogorov-Smirnov goodness of fit test for backlog based on simulated dataset
Random variable           B1          B2
Estimate of parameters
  λ                       0.316657    0.31119
  µ1                      6402.2      6912.625
  µ2                      10741.37    12382.1
  σ1                      1650.444    2439.608
  σ2                      1930.165    4116.222
Statistical value D       0.021       0.0233
Degree of freedom         1000
Critical values D_0.01    0.0515

Figure 3.6: Backlog bound curves of two input flows of the simulated system. (a) Backlog bound curve of flow $A_1$. (b) Backlog bound curve of flow $A_2$. (Axes: Backlog (Bytes) vs. Prob.)

Backlog Bound for Superposition of Two Flows
We next consider the backlog of the aggregated flow. The aggregated traffic is $A = A_1 + A_2$. The service rate assigned to $A$ is $R = R_1 + R_2$. The backlog of $A$ can be represented as the summation of backlogs of $A_1$ and $A_2$:
$$\begin{aligned}
B(t) &= \sup_{0 \le s \le t} \{A(s, t) - R(t - s)\} \\
&= \sup_{0 \le s \le t} \{A_1(s, t) - R_1(t - s)\} + \sup_{0 \le s \le t} \{A_2(s, t) - R_2(t - s)\} \\
&= B_1(t) + B_2(t).
\end{aligned} \tag{3.29}$$
Based on the analysis in the previous section, we can obtain the survival functions of $B_1$ and $B_2$, denoted as $\bar{F}_{B_1}$ and $\bar{F}_{B_2}$, respectively. With Lemma 1, the general bound of $B(t)$ can be calculated as:
$$Pr\{B(t) > x\} \le (\bar{F}_{B_1} \otimes \bar{F}_{B_2})(x). \tag{3.30}$$
By introducing a proper copula capturing the correlation between $B_1$ and $B_2$, the backlog bound of $B(t)$ can be calculated with Lemma 3 and is tighter:
$$Pr\{B(t) > x\} = 1 - \iint_{b_1 + b_2 < x} dC\big(F_{B_1}(b_1), F_{B_2}(b_2)\big). \tag{3.31}$$
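The two bounds can be compared numerically. The sketch below is an illustration under stated assumptions rather than the computation behind the thesis figures: Eq. (3.30) is evaluated by a grid search over the split of $x$, Eq. (3.31) is estimated by Monte Carlo sampling instead of numerical integration, a Clayton copula is assumed for the dependence, and the marginals are taken to be two-Gaussian mixtures. All function names, the grid size and the sample size are hypothetical.

```python
import math
import random

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mixture_sf(x, p):
    """Survival function of a two-Gaussian mixture p = (lam, mu1, s1, mu2, s2)."""
    lam, mu1, s1, mu2, s2 = p
    cdf = lam * std_normal_cdf((x - mu1) / s1) + (1 - lam) * std_normal_cdf((x - mu2) / s2)
    return 1.0 - cdf

def mixture_ppf(u, p, iters=50):
    """Inverse mixture CDF by bisection over a 10-sigma bracket."""
    lam, mu1, s1, mu2, s2 = p
    lo = min(mu1 - 10 * s1, mu2 - 10 * s2)
    hi = max(mu1 + 10 * s1, mu2 + 10 * s2)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 1.0 - mixture_sf(mid, p) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def general_bound(x, p1, p2, steps=400):
    """Eq. (3.30): minimum over splits y + (x - y) of the summed survival
    functions, capped at 1. Any grid point gives a valid upper bound."""
    best = 1.0
    for k in range(steps + 1):
        y = x * k / steps
        best = min(best, mixture_sf(y, p1) + mixture_sf(x - y, p2))
    return best

def clayton_pair(theta, rng):
    """Draw (u, v) from a Clayton copula by closed-form conditional inversion."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    w = 1.0 - rng.random()
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
    return u, v

def copula_bound_mc(x, p1, p2, theta, n=4000, seed=0):
    """Monte Carlo estimate of Eq. (3.31): Pr{B1 + B2 > x} under a Clayton copula."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        u, v = clayton_pair(theta, rng)
        if mixture_ppf(u, p1) + mixture_ppf(v, p2) > x:
            hits += 1
    return hits / n
```

Because the general bound holds for any dependence structure, `copula_bound_mc` should not exceed `general_bound` at the same `x`, up to sampling noise.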
To identify a proper copula between $B_1$ and $B_2$, we run the goodness of fit test on the Gumbel, Frank and Clayton copulas, respectively. The estimated parameters and the test results are shown in Table 3.4. The Clayton copula yields the largest P-value and thus best models the dependence between $B_1$ and $B_2$. The fit of the Clayton copula shows that the backlogs of the two flows exhibit stronger lower tail dependence. That is, small backlogs are more likely to appear in both traffic flows at the same time.
Table 3.4: “Blanket” goodness of fit test for copula between $B_1$ and $B_2$ based on simulated dataset
Copula     θ         P-value
Gumbel     2.48      0.03
Frank      9.3526    0.21
Clayton    3.5       0.68
Given the Clayton copula with the estimated parameter and the known marginal distributions $F_{B_1}$ and $F_{B_2}$, the copula-based backlog bound can be computed with Eq. (3.31). Note that the copula-based bound obtained here is a special case of Theorem 8, for the service process is simplified as a constant-rate service. Both the general bound and the copula-based bound are shown in Fig. 3.7. We also label the values $x_{bound} = \inf\{x : Pr(B(t) > x) \le 0.1\}$ from simulation, copula bound, and general bound with vertical lines in the figure. Practically, the values $x_{bound}$ bound the backlog with a small violation probability (at most 0.1). The value from the copula bound is very close to the simulation result and much smaller than the value from the general bound. It is clear that the copula-based bound is closer to reality and tighter than the general one.
Figure 3.7: Backlog bound for aggregate traffic A. (Axes: Backlog (Bytes) vs. Prob. Curves: Copula Bound, General Bound. Marked values: General $3.165 \times 10^4$, Copula $2.909 \times 10^4$, Simulation $2.878 \times 10^4$.)

3.6 Summary
Integrating the statistical method in SNC has been shown to be promising [9]. With a concrete real-world case study and numerical examples, this chapter illustrates the benefit of applying copula analysis in SNC for tighter performance bounds. This analysis also sheds light on several important issues in SNC, such as the region where we can take advantage of dependence of random processes, and the tightest bound that SNC can possibly achieve.
Chapter 4
Copula Analysis of Temporal Dependence of Markov Modulated Poisson Process
4.1 Introduction
Markov modulated Poisson process (MMPP) is the doubly stochastic Poisson process whose arrival rate is modulated by an irreducible continuous time Markov chain (CTMC) independent of the arrival process [31]. Specifically, the arrival process is a Poisson process with arrival rate $\lambda_j$ whenever the CTMC is in state $j$. MMPP was first proposed by Yechiali and Naor to model non-homogeneous Poisson arrival processes in queueing systems [86]. Compared with the traditional Poisson process, MMPP allows the arrival rates to vary from time to time, making the model more flexible. Besides, MMPP is effective in capturing burst arrivals and sudden changes in arrivals since it can integrate significantly different rates into one model. All these benefits make MMPP a widely applied model for the arrival processes in networks [17, 38], for the processes that show pattern changes [24], and for burst event detection [79, 41].
The good properties and the broad applications of MMPP all rest on the temporal dependence carried by MMPP. Essentially, the dependence/correlation among inter-arrival times is the main difference between MMPP and the Poisson process [10]. In the model of MMPP, the inter-arrival times are not independent. The dependence between inter-arrival times comes from the CTMC that modulates the state switches over time. With the dependence structure of MMPP, we can better understand the process and predict its trend. For example, when we model or detect network traffic with MMPP, the temporal dependence of MMPP can be the objective to match with that of the real traffic trace. Another example is to model web traffic or cloud traffic with MMPP [70, 69, 64]. In this case, resource provisioning based on MMPP arrivals is the problem of interest. The capability of predicting arrivals based on temporal dependence structures is critical in designing the resource provisioning policy.
Existing theoretical studies of MMPP mainly fall into two categories. One track of studies uses MMPP as the input of a queueing system and studies the queueing performance. Representative works include [17, 70]. The other track of studies develops algorithms to estimate the parameters of MMPP. Recent developments cover algorithms for fitting MMPP to IP traffic traces [6], expectation-maximization (EM) based algorithms to learn MMPP as a type of Markovian Arrival Process [63], algorithms to learn MMPP through the detection of change points along with arrival rate estimation [14], and online learning algorithms that model MMPP as a Hidden Markov Model [16]. All these learning algorithms are based either on the arrival times or on the number of arrivals within every unit of time (arrival counts). Despite the abundant existing theoretical results on MMPP, there is still a large gap in the formal analysis of the dependence structure of MMPP in the literature. This gap is reflected in the following aspects.
First, the temporal dependence of MMPP is not well understood. The existing results related to the MMPP temporal dependence are all based on covariance/autocorrelation. Neuts derived the covariance between arrival counts over any two time slots for stationary MMPP in 1989 [61], which is still the strongest result known. This covariance result is not sufficient for many applications. To begin with, the covariance is not easy to compute due to the matrix exponential and matrix inverse involved (especially when the number of states becomes large). For 2-state stationary MMPP, the closed form of the covariance between arrival counts was given by Andersen and Nielsen in 1998 [3]. For multi-state MMPPs, the covariances are usually obtained approximately by statistical counting on simulated traces [64, 16, 3]. Furthermore, the covariance or the autocorrelation is only capable of measuring the degree of linear dependence over time. However, MMPP network traffic may contain temporal dependence more complex than linear dependence. The detailed example given in Section 4.3.2 shows that the covariance captures the MMPP dependence structure only partially and is far from reflecting its whole dependence structure. This motivates us to search for the exact and functional temporal dependence structure of MMPPs.
Second, there is no analysis of the temporal dependence in the superposition of MMPPs, i.e., the aggregation of multiple flows, each modeled as an MMPP [39]. Although it has been proved that the superposition of MMPPs is still an MMPP [31], analysing the superposition of MMPPs becomes intractable in real applications due to the exponential increase of the number of states. For instance, the superposition of two 20-state MMPPs is computationally expensive to solve [39]. In other words, simply treating the superposition of MMPPs as one MMPP with a larger number of states would not work in practice. The temporal dependence of the superposition of MMPPs thus requires a different analytical method.
Copula, an advanced dependence measure that links marginals into joint distributions, is ideal for modeling the temporal dependence of MMPP. First, copula can be
constructed theoretically based on the analysis on marginals and joint distributions
of the observations in MMPP. Second, copula is capable of capturing all the characteristics from dependence structures. Beyond the linear dependence, it characterizes functional dependence structure and carries abundant dependence information.
Third, with the help of copula, it is easy to avoid the explosion of the number of
parameters when modeling the superposition of MMPPs, and it is computationally
tractable to calculate the temporal dependence of superposed MMPPs. Finally, the
invariant property of copula keeps the dependence measure stable even when MMPP
trace changes functionally. In this chapter, we build the theoretical copula to capture
the temporal dependence of both single and superposed MMPPs.
4.2 Related Work
The Markov Modulated Poisson Process (MMPP) was first applied in the network
domain in 1971 [86]. Since then, tremendous research efforts have been devoted to
MMPP. Early theoretical results and applications of MMPP were outlined in the
review [31] and references therein. In brief, the review includes the theoretical results
of the characterization of MMPP, the statistical moments of MMPP arrivals, and the
superposition of independent MMPPs. Afterwards, MMPP was further studied as
the arrival input of queueing systems. Furthermore, various learning algorithms for
parameter estimation of MMPP were proposed.
Among the literature on MMPP, the research related to temporal dependence modeling is summarized as follows. MMPP and other Markovian arrival processes were generalized into the versatile Markov point process in [61]. The covariance between arrival counts was derived for the versatile Markov point process. The closed form of the covariance of 2-state MMPP was given in [3]. The covariance was further derived into an asymptotic form and used for learning parameters. In [64, 16, 3], covariance was the evaluation metric for the goodness of fit test for MMPP, and it was computed empirically from simulated traces of the fitted MMPP. Different from the above works, our work in Chapter 4 derives theoretical results on the temporal dependence of MMPP in terms of copula, which represents functional dependence.
In the case of superposition of MMPPs, its mathematical form has been given in [31], but the parameter computation of superposed MMPPs is complex due to the explosion of the number of states. To reduce the computational complexity, recent efforts have been made to reduce the number of states and obtain an approximate solution [39, 88]. Our work in Chapter 4 focuses on the exact and tractable solution of temporal dependence in the superposition of MMPPs.
4.3 Preliminaries
4.3.1 Markov Modulated Poisson Process
We introduce the definition and key concepts of MMPP.
Definition 9. A Markov-modulated Poisson Process (MMPP) [31] is constructed by varying the arrival rate of a Poisson process according to an $m$-state irreducible continuous-time Markov chain (CTMC). In particular, when the Markov chain is in state $j$, the arrivals follow a Poisson process of rate $\lambda_j$. Therefore, an MMPP can be parameterized by the $Q$ matrix [73] of the CTMC and the $m$ Poisson arrival rates, $\Lambda = (\lambda_1, \ldots, \lambda_m)$.
We thus denote an MMPP by parameters $(Q, \Lambda)$.
Definition 10. Environment-stationarity of an MMPP [31]: An MMPP $(Q, \Lambda)$ is considered to be environment-stationary if its associated CTMC is stationary.
For an environment-stationary MMPP, the stationary distribution of the states, $\Pi = (\pi_1, \ldots, \pi_m)$, is determined by solving the equation $\Pi Q = 0$ subject to $\sum_{j} \pi_j = 1$. In our analysis of MMPP, we only consider the environment-stationary MMPP.
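For a small number of states, the stationary distribution can be computed directly from the generator matrix. The following sketch is illustrative only: the function name is hypothetical, the generator is assumed to be irreducible so that the solution is unique, and the balance equations are solved together with the normalization constraint by plain Gauss-Jordan elimination.

```python
def stationary_distribution(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible CTMC generator Q."""
    m = len(Q)
    # Rows 0..m-2: balance equations sum_i pi_i * Q[i][j] = 0 for j = 0..m-2.
    a = [[Q[i][j] for i in range(m)] + [0.0] for j in range(m - 1)]
    # Last row: normalization sum_i pi_i = 1.
    a.append([1.0] * m + [1.0])
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][m] / a[i][i] for i in range(m)]
```

For a two-state generator with off-diagonal rates $q_{12}$ and $q_{21}$, the result reduces to $(q_{21}, q_{12}) / (q_{12} + q_{21})$, which gives a quick way to validate the routine.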
Since the superposition of MMPPs is still an MMPP [31], to distinguish a regular MMPP from a superposition of MMPPs, in this thesis both the term single MMPP and the term MMPP refer to an MMPP not created from superposition. We introduce the following terms to refer to the superposition of MMPPs:
Definition 11. Superposition of independent homogeneous MMPPs: An MMPP is called HoMMPP if it is a superposition of multiple independent homogeneous MMPPs. All constituent MMPPs have the same parameter $(Q, \Lambda)$.
Definition 12. Superposition of independent heterogeneous MMPPs: An MMPP is called HeMMPP if it is a superposition of multiple independent heterogeneous MMPPs. The constituent MMPPs carry different parameters $({}^{1}Q, {}^{1}\Lambda), ({}^{2}Q, {}^{2}\Lambda), \ldots, ({}^{l}Q, {}^{l}\Lambda), \ldots$, where $({}^{l}Q, {}^{l}\Lambda)$ denotes the parameters of the $l$-th constituent MMPP.
Definition 13. Arrival counts of MMPP are a sequence of random variables representing the number of arrivals in disjoint equal-sized small time intervals, called time slots. Denote the sequence of time slots as $I_1, I_2, \ldots, I_n$, the random variable representing the arrival count of a single MMPP in $I_i$ as $A_i$, and that of the superposition of $l$ independent MMPPs in $I_i$ as $A_i^l$.
Remark 7. We denote the length of each time slot as $\Delta$. For MMPP modeling we assume $\Delta$ is short enough that state transitions of the MMPP within one time slot are negligible. To keep this assumption valid, we recommend that the length of a time slot be no larger than the smallest average time the MMPP stays in one state, i.e., $\Delta \le \frac{1}{\max_j |q_{jj}|}$, where $q_{jj}$ is the diagonal element in the j-th row of matrix Q. Under this condition, the number of state transitions in one time slot can be ignored and the arrival rate in one time slot is (approximately) stable. In other words, we assume that state transitions occur only at the boundaries of time slots. Experiments in [63] have shown that parameter estimation based on arrival counts becomes inaccurate when $\Delta > \frac{1}{\max_j |q_{jj}|}$, indicating that with a large $\Delta$ the arrival counts no longer carry enough information to recover the MMPP. This approximation has been used in previous research, e.g., in [63].
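The slotted approximation in Remark 7 can be sketched as a small simulator in which the state is frozen within a slot and may change only at slot boundaries. This is an illustration under stated assumptions: the slot length 0.1, the start in state 0 (rather than a draw from the stationary distribution), the first-order transition probabilities, and the helper names `poisson_sample` and `simulate_counts` are all choices of this sketch, not of the thesis.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method (small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(Q, rates, slot, n_slots, seed=0):
    """Arrival counts per slot; the state may change only at slot boundaries."""
    # Enforce the recommendation slot <= 1 / max_j |q_jj| from the remark.
    assert slot <= 1.0 / max(abs(Q[j][j]) for j in range(len(Q))), "slot too long"
    rng = random.Random(seed)
    state, counts = 0, []
    for _ in range(n_slots):
        counts.append(poisson_sample(rates[state] * slot, rng))
        # First-order transition probabilities: p_jk is about q_jk * slot for k != j.
        r, acc = rng.random(), 0.0
        for k, q in enumerate(Q[state]):
            if k != state:
                acc += q * slot
                if r < acc:
                    state = k
                    break
    return counts


counts = simulate_counts([[-0.1, 0.1], [1.0, -1.0]], [2.0, 200.0],
                         slot=0.1, n_slots=5000)
```

Knuth's sampler is adequate here because the per-slot means stay small; for large means a different Poisson sampler would be needed.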
4.3.2 Why Do Existing Results Not Suffice?
The strongest result so far that discloses the temporal dependence of MMPP is from [61], where the covariance or the autocorrelation of arrival counts over different time slots is given. Covariance, however, is only capable of capturing linear dependence. An MMPP trace may contain temporal dependence much more complex than linear dependence. To illustrate the pitfalls of covariance, we consider two MMPPs with parameters $({}_{1}Q, {}_{1}\Lambda)$ and $({}_{2}Q, {}_{2}\Lambda)$ shown below:
\[
{}_{1}Q = \begin{pmatrix} -0.1 & 0.1 \\ 1 & -1 \end{pmatrix}, \quad {}_{1}\Lambda = (2, 200);
\qquad
{}_{2}Q = \begin{pmatrix} -0.1 & 0.1 \\ 1 & -1 \end{pmatrix}, \quad {}_{2}\Lambda = (200, 2).
\]
We simulate traces from these two MMPPs: Trace 1 is from MMPP $({}_{1}Q, {}_{1}\Lambda)$; Trace 2 is from MMPP $({}_{2}Q, {}_{2}\Lambda)$. The traces are analysed by arrival counts. Specifically, the number of arrivals in the i-th time slot is denoted as $A_i$ ($i \in \mathbb{N}$). The arrival counts of the two traces are shown in Fig. 4.1. From the figure, the traces from the two MMPPs are very different. To study their dependence, we first analyse the covariances of the two MMPPs and then visualize their temporal dependence by the joint distribution of successive arrival counts.
Figure 4.1: Arrival counts of the two traces
[Plot omitted. Panels: (a) Trace 1, (b) Trace 2. Horizontal axis: No. of time slot; vertical axis: Arrival count.]
The theoretical form of the covariance of a 2-state MMPP is given by Eq.(3) in Section II of [3]. Based on the given covariance function, the covariances between $A_i$ and $A_{i+i'}$ (where $i' \in \mathbb{N}$ is the time lag) of the two MMPPs in this example are theoretically the same. In Fig. 4.2, the green plot shows that the theoretical covariances of the two MMPPs (from the theoretical analysis with Eq.(3) of [3]) are identical at every time lag. We also plot the empirical covariances calculated from the simulated traces. The covariances of the two traces are close, though they vary slightly from the theoretical results. So in terms of covariance, the two MMPPs show the same dependence structure.
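To see why swapping the two rates leaves the covariance unchanged, the following sketch evaluates the lag-k covariance under the slotted approximation, where counts are conditionally Poisson given the states. It is not Eq.(3) of [3]; the slot length 0.1, the closed-form 2-state transition matrix, and the function names are assumptions of this illustration.

```python
import math


def two_state_P(q12, q21, t):
    """Closed-form e^{Qt} of a 2-state CTMC, plus its stationary distribution."""
    s = q12 + q21
    p1, p2 = q21 / s, q12 / s
    e = math.exp(-s * t)
    P = [[p1 + p2 * e, p2 - p2 * e],
         [p1 - p1 * e, p2 + p1 * e]]
    return P, [p1, p2]


def slotted_covariance(q12, q21, rates, slot, lag):
    """Cov(A_i, A_{i+lag}) for lag >= 1 when the state is frozen within a slot."""
    P, pi = two_state_P(q12, q21, lag * slot)
    mean = sum(pi[j] * rates[j] * slot for j in range(2))
    cross = sum(pi[a] * P[a][b] * rates[a] * slot * rates[b] * slot
                for a in range(2) for b in range(2))
    return cross - mean ** 2


# Same generator, rates swapped, as in the two example MMPPs.
cov1 = [slotted_covariance(0.1, 1.0, (2.0, 200.0), 0.1, k) for k in range(1, 11)]
cov2 = [slotted_covariance(0.1, 1.0, (200.0, 2.0), 0.1, k) for k in range(1, 11)]
```

In this approximation the covariance reduces to a term proportional to the squared difference of the two rates, so it cannot tell which state carries the high rate, even though the two processes have very different means.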
Figure 4.2: Covariances of two MMPPs over different time lags
[Plot omitted. Horizontal axis: Lag; vertical axis: Covariance. Legend: From Trace 2, From theoretical analysis, From Trace 1. Value labels printed in the plot, in extraction order: 30.3, 27.7, 25.9, 23.2, 20.6, 19.8, 18.6, 17.0, 15.6, 14.7; 29.1, 26.0, 23.3, 20.9, 18.7, 16.8, 15.0, 13.5, 12.1, 10.8; 21.8, 19.2, 17.0, 14.9, 13.4, 11.9, 10.7, 9.8, 8.9, 8.1.]
To obtain the full view of the dependence structure, we visualize the joint behaviour of $A_i$ and $A_{i+1}$ by the scatter plots with marginal histograms in Fig. 4.3 and their bivariate frequency histograms with heat map in Fig. 4.4. From the two figures, we can observe that the joint behaviour of two successive arrival counts is quite different in the two MMPPs. Therefore, it is clear that the two MMPPs have different temporal dependence between $A_i$ and $A_{i+1}$.
Figure 4.3: Scatter plot with marginal histograms of $A_i$ and $A_{i+1}$ in two traces
[Plot omitted. Panels: (a) Trace 1, (b) Trace 2. Horizontal axis: $A_i$; vertical axis: $A_{i+1}$.]
Figure 4.4: Bivariate frequency histogram (upper layer) with its heat map (lower layer)
[Plot omitted. Panels: (a) Trace 1, (b) Trace 2. Axes: $A_i$, $A_{i+1}$, Frequency.]
In the above simple example, the two MMPPs have the same covariance theoretically. However, they generate traces with significantly different temporal dependence structures. Therefore, covariance, which measures only partial information about dependence, is not sufficient to represent MMPP dependence. This motivates us to seek a better dependence structure to characterize temporal dependence beyond the linear scope when modeling network traffic with MMPP. We tackle this challenge with copula analysis. We apply both a theoretical derivation (Section 4.4) and parametric copula modeling (Section 4.5) to construct the copula of MMPP and superposed MMPP.
4.4 Theoretical Copula Analysis for MMPP, HoMMPP and HeMMPP
4.4.1 Theoretical Copula Analysis for Single MMPP
We first study an m-state MMPP with parameters $(Q, \Lambda)$. Based on Definition 13 and Remark 7, the state in $I_i$ is considered stable, and is thus defined as a random variable $S_i$. Denote the transition matrix by $P(t) = [p_{j_1 j_2}(t)]$, where $p_{j_1 j_2}(t)$ is the probability that the CTMC switches from state $j_1$ to state $j_2$ after time t. $P(t) = e^{Qt}$ can be calculated with numerical methods such as those introduced in Chapter 6.8 of [73].
As $\Delta$ is small, $P(\Delta)$ relates to the Q matrix in the following way:
\[
\begin{aligned}
p_{j_1 j_2}(\Delta) &= 1 + q_{j_1 j_2}\Delta + o(\Delta), & j_1 &= j_2; \\
p_{j_1 j_2}(\Delta) &= q_{j_1 j_2}\Delta + o(\Delta), & j_1 &\neq j_2,
\end{aligned}
\tag{4.1}
\]
where $o(\Delta)$ is an infinitesimal of higher order than $\Delta$. Therefore, by a simple calculation, $P(\Delta)$ is approximately equal to the matrix $Q\Delta$ plus an identity matrix.
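The quality of the first-order approximation in (4.1) can be checked numerically. The sketch below computes $e^{Qt}$ with a truncated Taylor series combined with scaling and squaring (one simple numerical method among many, not necessarily the one in [73]) and compares it with the identity plus Q times the slot length. The slot length 0.1, the example generator, and the function names are assumptions of this illustration.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(Q, t, squarings=10, terms=12):
    """e^{Qt}: Taylor series on Qt / 2**squarings, followed by repeated squaring."""
    n = len(Q)
    scale = t / (2 ** squarings)
    A = [[Q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


Q = [[-0.1, 0.1], [1.0, -1.0]]
slot = 0.1  # hypothetical slot length
exact = mat_exp(Q, slot)
first_order = [[float(i == j) + Q[i][j] * slot for j in range(2)] for i in range(2)]
max_gap = max(abs(exact[i][j] - first_order[i][j])
              for i in range(2) for j in range(2))
```

For this generator and slot length the largest entrywise gap is on the order of half a percent, which gives a feel for what "approximately equal" means in practice.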
The MMPP traffic will be analysed in terms of arrival counts $A_i$, and the temporal dependence of MMPP will be the dependence between $A_i$ and $A_{i+i'}$ with time lag $i'$. Under environment-stationarity, the arrival counts of all time slots (i.e., $A_i$ for any i) share the same marginal distribution function, denoted as M. Similarly, the copula between $A_i$ and $A_{i+i'}$ does not depend on the time slot label i but only on the time lag $i'$, and is thus denoted as $C_{i'}$. In the following, we derive the marginal distribution function M in Theorem 10 and the copula $C_{i'}$ in Theorem 11.
Theorem 10. Let $x_i$ be the sample value of $A_i$. The marginal distribution of $A_i$ at $x_i$ is
\[
M(x_i) = \Pr(A_i \le x_i) = \sum_{j=1}^{m} p_j G_j(x_i) = \pi\, G(x_i)^{T} \tag{4.2}
\]
where
• $G_j(x_i) = \Pr(A_i \le x_i \mid S_i = j) = e^{-\lambda_j \Delta} \sum_{k=0}^{x_i} \frac{(\lambda_j \Delta)^k}{k!}$,
• $G(x_i) = [G_1(x_i), \cdots, G_m(x_i)]$ is a conditional marginal (row) vector.
Proof.
\[
M(x_i) = \Pr(A_i \le x_i) = \sum_{j=1}^{m} \Pr(A_i \le x_i \mid S_i = j)\Pr(S_i = j) = \sum_{j=1}^{m} p_j G_j(x_i) = \pi\, G(x_i)^{T}.
\]
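A minimal sketch of the marginal in Theorem 10, read as a cumulative distribution (a stationary mixture of Poisson CDFs), is given below. The slot length 0.1, the truncation at 60 values, and the function names are assumptions made for illustration; the stationary distribution is that of the example generator with off-diagonal rates 0.1 and 1.

```python
import math


def poisson_cdf(mean, x):
    """Pr(N <= x) for N ~ Poisson(mean); returns 0 for x < 0."""
    if x < 0:
        return 0.0
    term, total = math.exp(-mean), 0.0
    for k in range(x + 1):
        total += term
        term *= mean / (k + 1)
    return total


def marginal_cdf(pi, rates, slot, x):
    """M(x) = sum_j p_j * G_j(x), with G_j the Poisson CDF of mean rates[j] * slot."""
    return sum(p * poisson_cdf(lam * slot, x) for p, lam in zip(pi, rates))


pi = [10 / 11, 1 / 11]  # stationary distribution of the example generator
M = [marginal_cdf(pi, [2.0, 200.0], 0.1, x) for x in range(60)]
```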
Theorem 11. (Single MMPP copula) Let $u_i = M(x_i)$. The copula of any two arrival counts, $A_i$ and $A_{i+i'}$ ($i' \in \mathbb{N}$), can be calculated as:
\[
C_{i'}(u_i, u_{i+i'}) = G(M^{-1}(u_i))\,\mathrm{diag}(\pi)\,P(i'\Delta)\,G(M^{-1}(u_{i+i'}))^{T}, \tag{4.3}
\]
where
• $M^{-1}$ is the inverse function of M defined by (4.2),
• $\mathrm{diag}(\pi)$ is a square diagonal matrix with the elements of vector $\pi$ on the main diagonal,
• $G(M^{-1}(v))^{T}$ is the transpose of $G(M^{-1}(v))$.
Proof. The joint distribution of $A_i$ and $A_{i+i'}$ is derived as
\[
\begin{aligned}
F_{A_i A_{i+i'}}(x_i, x_{i+i'}) &= \Pr(A_i \le x_i, A_{i+i'} \le x_{i+i'}) \\
&= \sum_{j_2=1}^{m} \sum_{j_1=1}^{m} \Pr(A_i \le x_i, A_{i+i'} \le x_{i+i'} \mid S_i = j_1, S_{i+i'} = j_2)\,\Pr(S_i = j_1, S_{i+i'} = j_2) \\
&= \sum_{j_2=1}^{m} \sum_{j_1=1}^{m} \Pr(A_i \le x_i \mid S_i = j_1)\,\Pr(A_{i+i'} \le x_{i+i'} \mid S_{i+i'} = j_2)\,\Pr(S_i = j_1)\,\Pr(S_{i+i'} = j_2 \mid S_i = j_1) \\
&= \sum_{j_2=1}^{m} \sum_{j_1=1}^{m} G_{j_2}(x_{i+i'})\, p_{j_1 j_2}(i'\Delta)\, G_{j_1}(x_i)\, p_{j_1} \\
&= G(x_i)\,\mathrm{diag}(\pi)\,P(i'\Delta)\,G(x_{i+i'})^{T}.
\end{aligned}
\]
With the inverse method based on Theorem 4, the copula between $A_i$ and $A_{i+i'}$ is constructed as
\[
\begin{aligned}
C_{i'}(u_i, u_{i+i'}) &= F_{A_i A_{i+i'}}(M^{-1}(u_i), M^{-1}(u_{i+i'})) \\
&= G(M^{-1}(u_i))\,\mathrm{diag}(\pi)\,P(i'\Delta)\,G(M^{-1}(u_{i+i'}))^{T}.
\end{aligned}
\]
The copula $C_{i'}$ in Theorem 11 is the theoretical copula that models the dependence of two arrival counts in MMPP. The number of time slots between the two arrival counts is specified by the value of $i'$. We name this theoretical copula the single MMPP copula.
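The matrix product in Theorem 11 can be evaluated directly on the grid of observed counts, where the copula value at the pair of marginal values of x and y equals the joint CDF at (x, y). The sketch below does this for a 2-state chain using the closed-form transition matrix; the slot length 0.1, lag 1, the chosen evaluation points, and the function names are assumptions of this illustration.

```python
import math


def poisson_cdf(mean, x):
    """Pr(N <= x) for N ~ Poisson(mean), x >= 0."""
    term, total = math.exp(-mean), 0.0
    for k in range(x + 1):
        total += term
        term *= mean / (k + 1)
    return total


def joint_cdf(pi, P_lag, rates, slot, x, y):
    """F(x, y) = G(x) diag(pi) P(lag * slot) G(y)^T for one MMPP."""
    m = len(pi)
    G_x = [poisson_cdf(rates[j] * slot, x) for j in range(m)]
    G_y = [poisson_cdf(rates[j] * slot, y) for j in range(m)]
    return sum(G_x[a] * pi[a] * P_lag[a][b] * G_y[b]
               for a in range(m) for b in range(m))


# Example 2-state chain with q12 = 0.1, q21 = 1, lag 1, assumed slot 0.1.
s, slot, rates = 1.1, 0.1, [2.0, 200.0]
pi = [1.0 / s, 0.1 / s]
e = math.exp(-s * slot)
P_lag = [[pi[0] + pi[1] * e, pi[1] * (1 - e)],
         [pi[0] * (1 - e), pi[1] + pi[0] * e]]
F_low = joint_cdf(pi, P_lag, rates, slot, 3, 3)
F_marg = joint_cdf(pi, P_lag, rates, slot, 3, 200)  # second argument far in the tail
M_3 = sum(pi[j] * poisson_cdf(rates[j] * slot, 3) for j in range(2))
```

Pushing one argument far into the tail recovers the marginal at the other argument, which is a quick sanity check that the construction behaves like a joint distribution function.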
4.4.2 Theoretical Copula Analysis for HoMMPP
For a better understanding, we start from HoMMPP to explore the copula of superposition of MMPPs. HoMMPP is a good model for many network systems, such as Internet core routers, where the incoming traffic may be a superposition of multiple independent homogeneous MMPP traffic traces. We consider a HoMMPP with l constituent MMPPs ($l \in \mathbb{N}$), each of which has the same parameters $(Q, \Lambda)$. All notations related to the HoMMPP carry l as a superscript. That is, the HoMMPP has arrival count random variable $A^l_i$, marginal distribution $M^l$ of $A^l_i$, copula $C^l_{i'}$ between $A^l_i$ and $A^l_{i+i'}$, etc. Note that when l = 1, the HoMMPP reduces to a single MMPP, and the notations can omit l to be consistent with those defined in Section 4.4.1.
For HoMMPP, it is hard to derive the theoretical copula directly. As l gets large, the number of states of the CTMC associated with the HoMMPP explodes, and the joint distribution of $A^l_i$ and $A^l_{i+i'}$ can hardly be expressed in closed form. To tackle this difficulty, we derive the following two theorems (Theorems 12 and 13), which help to reveal the HoMMPP copula. These two theorems form the basis of an algorithmic approach to compute the HoMMPP copula, introduced later in Section 4.4.4.
Theorem 12. Let $x_i$ denote the sample value of $A^l_i$. The marginal of $A^l_i$ is
\[
M^l(x_i) = \sum_{l_1+\cdots+l_m=l} \binom{l}{l_1}\binom{l-l_1}{l_2}\cdots\binom{l-l_1-\cdots-l_{m-1}}{l_m}\; p_1^{l_1} p_2^{l_2}\cdots p_m^{l_m}\; Po\big((l_1\lambda_1+\cdots+l_m\lambda_m)\Delta,\; x_i\big), \tag{4.4}
\]
where $\binom{n}{k}$ is the number of ways of choosing k from n, and $Po(\lambda, x_i)$ represents the Poisson cumulative distribution at value $x_i$ with parameter $\lambda$.
Proof. Assume the number of MMPPs in state j in time slot $I_i$ is $l_j$ ($j = 1, 2, \ldots, m$). The probability that the HoMMPP is at the above allocation of states is $\binom{l}{l_1}\binom{l-l_1}{l_2}\cdots\binom{l-l_1-\cdots-l_{m-1}}{l_m}\, p_1^{l_1} p_2^{l_2}\cdots p_m^{l_m}$. Since the superposition of two Poisson processes with rates $\lambda_1$ and $\lambda_2$ is a Poisson process with rate $\lambda_1 + \lambda_2$, under the assumed state combination $A^l_i$ follows a Poisson distribution with parameter $(l_1\lambda_1+\cdots+l_m\lambda_m)\Delta$. Adding up all the possible allocations of states leads to the marginal form in Theorem 12.
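The sum over state allocations in Theorem 12 can be sketched by brute-force enumeration, which is practical only for small l and m. The slot length 0.1, the example rates and stationary distribution, and the function names are assumptions of this illustration.

```python
import math
from itertools import product


def poisson_cdf(mean, x):
    """Pr(N <= x) for N ~ Poisson(mean), x >= 0."""
    term, total = math.exp(-mean), 0.0
    for k in range(x + 1):
        total += term
        term *= mean / (k + 1)
    return total


def hommpp_marginal(pi, rates, slot, l, x):
    """M^l(x): mix Poisson CDFs over all allocations (l_1, ..., l_m) summing to l."""
    m, total = len(pi), 0.0
    for alloc in product(range(l + 1), repeat=m):
        if sum(alloc) != l:
            continue
        # Multinomial coefficient written as a product of binomials, as in (4.4).
        coef, remaining = 1, l
        for lj in alloc:
            coef *= math.comb(remaining, lj)
            remaining -= lj
        prob = coef * math.prod(p ** lj for p, lj in zip(pi, alloc))
        mean = sum(lj * lam for lj, lam in zip(alloc, rates)) * slot
        total += prob * poisson_cdf(mean, x)
    return total


pi, rates = [10 / 11, 1 / 11], [2.0, 200.0]
M3_at_20 = hommpp_marginal(pi, rates, 0.1, 3, 20)  # three superposed MMPPs
```

With l = 1 the function reduces to the single-MMPP marginal, which is a convenient consistency check.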
We define the copula gradients here for further analysis of the HoMMPP copula.
Definition 14. Single MMPP copula gradient $\nabla C_{i'}$ is defined as
\[
\begin{aligned}
\nabla C_{i'}(u_i, u_{i+i'}) &= \nabla C_{i'}(M(x_i), M(x_{i+i'})) \\
&= C_{i'}(M(x_i), M(x_{i+i'})) + C_{i'}(M(x_i - 1), M(x_{i+i'} - 1)) \\
&\quad - C_{i'}(M(x_i), M(x_{i+i'} - 1)) - C_{i'}(M(x_i - 1), M(x_{i+i'}));
\end{aligned}
\tag{4.5}
\]
HoMMPP/HeMMPP copula gradient $\nabla C^l_{i'}$ is defined as
\[
\begin{aligned}
\nabla C^l_{i'}(u_i, u_{i+i'}) &= \nabla C^l_{i'}(M^l(x_i), M^l(x_{i+i'})) \\
&= C^l_{i'}(M^l(x_i), M^l(x_{i+i'})) + C^l_{i'}(M^l(x_i - 1), M^l(x_{i+i'} - 1)) \\
&\quad - C^l_{i'}(M^l(x_i), M^l(x_{i+i'} - 1)) - C^l_{i'}(M^l(x_i - 1), M^l(x_{i+i'})).
\end{aligned}
\tag{4.6}
\]
Lemma 5. Single MMPP copula gradient can be simply regarded as $\nabla C_{i'}(M(x_i), M(x_{i+i'})) = \Pr(A_i = x_i, A_{i+i'} = x_{i+i'})$; similarly, $\nabla C^l_{i'}(M^l(x_i), M^l(x_{i+i'})) = \Pr(A^l_i = x_i, A^l_{i+i'} = x_{i+i'})$.
Proof. Based on the definition of single MMPP copula gradient and the fact that the arrival counts follow discrete marginal distributions, we have
\[
\begin{aligned}
\nabla C_{i'}(M(x_i), M(x_{i+i'}))
&= C_{i'}(M(x_i), M(x_{i+i'})) + C_{i'}(M(x_i - 1), M(x_{i+i'} - 1)) \\
&\quad - C_{i'}(M(x_i), M(x_{i+i'} - 1)) - C_{i'}(M(x_i - 1), M(x_{i+i'})) \\
&= \Pr(A_i \le x_i, A_{i+i'} \le x_{i+i'}) + \Pr(A_i \le x_i - 1, A_{i+i'} \le x_{i+i'} - 1) \\
&\quad - \Pr(A_i \le x_i, A_{i+i'} \le x_{i+i'} - 1) - \Pr(A_i \le x_i - 1, A_{i+i'} \le x_{i+i'}) \\
&= \Pr(A_i = x_i, A_{i+i'} = x_{i+i'}).
\end{aligned}
\tag{4.7}
\]
Similarly, $\nabla C^l_{i'}(M^l(x_i), M^l(x_{i+i'})) = \Pr(A^l_i = x_i, A^l_{i+i'} = x_{i+i'})$.
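Lemma 5 states that the four-term difference of a joint CDF recovers the joint PMF. A minimal numerical check, using a small hypothetical joint PMF (not data from the thesis) and 0-indexed tables, might look as follows.

```python
def gradient_from_cdf(C):
    """D[x][y] = C[x][y] + C[x-1][y-1] - C[x][y-1] - C[x-1][y], with C = 0 outside."""
    def at(x, y):
        return C[x][y] if x >= 0 and y >= 0 else 0.0

    n, k = len(C), len(C[0])
    return [[at(x, y) + at(x - 1, y - 1) - at(x, y - 1) - at(x - 1, y)
             for y in range(k)] for x in range(n)]


# Hypothetical joint PMF of two counts on {0, 1, 2} x {0, 1, 2}.
pmf = [[0.20, 0.10, 0.00],
       [0.10, 0.30, 0.05],
       [0.00, 0.05, 0.20]]
# Joint CDF obtained by cumulative summation in both directions.
cdf = [[sum(pmf[a][b] for a in range(x + 1) for b in range(y + 1))
        for y in range(3)] for x in range(3)]
recovered = gradient_from_cdf(cdf)
```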
Theorem 13. The HoMMPP copula has a recursive relationship between $C^l_{i'}$ and $C^{l-1}_{i'}$ as shown below:
\[
C^l_{i'}(M^l(x_i), M^l(x_{i+i'})) = \sum_{x=0}^{x_i} \sum_{y=0}^{x_{i+i'}} C^{l-1}_{i'}(M^{l-1}(x_i - x), M^{l-1}(x_{i+i'} - y)) \cdot \nabla C_{i'}(M(x), M(y)). \tag{4.8}
\]
Proof. Since the constituent MMPPs are mutually independent, the arrivals of l aggregated MMPPs can be divided into the arrivals of (l - 1) aggregated MMPPs plus the arrivals of a single MMPP, i.e., $A^l_i = A^{l-1}_i + A_i$. Following this idea, we have:
\[
\begin{aligned}
C^l_{i'}(M^l(x_i), M^l(x_{i+i'}))
&= \Pr(A^l_i \le x_i, A^l_{i+i'} \le x_{i+i'}) \\
&= \sum_{x=0}^{x_i} \sum_{y=0}^{x_{i+i'}} \Pr(A^l_i \le x_i, A^l_{i+i'} \le x_{i+i'} \mid A_i = x, A_{i+i'} = y) \cdot \Pr(A_i = x, A_{i+i'} = y) \\
&= \sum_{x=0}^{x_i} \sum_{y=0}^{x_{i+i'}} \Pr(A^{l-1}_i \le x_i - x, A^{l-1}_{i+i'} \le x_{i+i'} - y) \cdot \Pr(A_i = x, A_{i+i'} = y) \\
&= \sum_{x=0}^{x_i} \sum_{y=0}^{x_{i+i'}} C^{l-1}_{i'}(M^{l-1}(x_i - x), M^{l-1}(x_{i+i'} - y)) \cdot \nabla C_{i'}(M(x), M(y)).
\end{aligned}
\]
Even with Theorem 13, the closed form of the HoMMPP copula $C^l_{i'}$ can hardly be derived. However, the recursive relationship between $C^l_{i'}$ and $C^{l-1}_{i'}$ can be implemented as a recursive algorithm to calculate HoMMPP copula values numerically. The algorithm will be introduced in Section 4.4.4.
4.4.3 Theoretical Copula Analysis for HeMMPP
HeMMPP is very similar to HoMMPP in definition, except that the constituent MMPPs in HeMMPP are different rather than identical. Thus we have to differentiate the constituent MMPPs by numbering them. After shuffling, we obtain an arbitrary order of constituent MMPPs, i.e., $({}_{1}Q, {}_{1}\Lambda), ({}_{2}Q, {}_{2}\Lambda), \ldots, ({}_{l}Q, {}_{l}\Lambda)$, where $({}_{l}Q, {}_{l}\Lambda)$ represents the parameters of the l-th constituent MMPP. The notations for each constituent MMPP are labeled by the order value l as a left subscript; for instance, ${}_{l}A_i$, ${}_{l}C_{i'}$, ${}_{l}M$ are the arrival count, copula, and marginal of the l-th MMPP.
In HeMMPP, $A^l_i$, $C^l_{i'}$, $M^l$ denote the corresponding quantities for the superposition of the first l constituent MMPPs. Note that we introduce this ordering only for clarity of explanation and analysis.
We derive the following theorems to analyse the marginal distribution and the copula of HeMMPP:
Theorem 14. The HeMMPP marginal distribution function has a recursive relationship between $M^l$ and $M^{l-1}$ as
\[
M^l(x_i) = \sum_{x=0}^{x_i} M^{l-1}(x_i - x) \cdot {}_{l}p(x), \tag{4.9}
\]
where ${}_{l}p$ is the probability mass function of the arrival count from the l-th MMPP, ${}_{l}p(x) = {}_{l}M(x) - {}_{l}M(x - 1)$.
Proof. The key idea of the proof is to divide the arrivals from l MMPPs into the arrivals from the first l - 1 MMPPs plus those from the l-th MMPP, i.e., $A^l_i = A^{l-1}_i + {}_{l}A_i$. Thus, we have
\[
\begin{aligned}
M^l(x_i) = \Pr(A^l_i \le x_i) &= \sum_{x=0}^{x_i} \Pr(A^l_i \le x_i \mid {}_{l}A_i = x)\,\Pr({}_{l}A_i = x) \\
&= \sum_{x=0}^{x_i} \Pr(A^{l-1}_i \le x_i - x)\,\Pr({}_{l}A_i = x) = \sum_{x=0}^{x_i} M^{l-1}(x_i - x) \cdot {}_{l}p(x).
\end{aligned}
\]
Theorem 15. The HeMMPP copula has the recursive relationship between $C^l_{i'}$ and $C^{l-1}_{i'}$ as shown below:
\[
C^l_{i'}(M^l(x_i), M^l(x_{i+i'})) = \sum_{x=0}^{x_i} \sum_{y=0}^{x_{i+i'}} C^{l-1}_{i'}(M^{l-1}(x_i - x), M^{l-1}(x_{i+i'} - y)) \cdot \nabla {}_{l}C_{i'}({}_{l}M(x), {}_{l}M(y)), \tag{4.10}
\]
where $\nabla {}_{l}C_{i'}$ is the single MMPP copula gradient of the l-th MMPP.
Proof. The proof is omitted since it is similar to that of Theorem 13, on the basis of $A^l_i = A^{l-1}_i + {}_{l}A_i$.
4.4.4 An Algorithm to Compute HeMMPP Copula
In Sections 4.4.2 and 4.4.3, we introduced the recursive relationships among HoMMPP/HeMMPP copulas. Although the HoMMPP/HeMMPP copulas are not derived in closed form, they can be computed numerically with the recursive algorithm introduced in this section. Since both single MMPP and HoMMPP are special cases of HeMMPP, we describe how our algorithm works on HeMMPP as the general case.
Consider a HeMMPP with l heterogeneous constituent MMPPs. To limit the running time of the algorithm, we narrow down the range of interest of $A^l_i$ from its infinite domain to a finite range with an upper threshold $\hat{a}$. In other words, although $A^l_i$ ranges over all non-negative integers, we are only interested in computing marginal values $M^l(x_i)$ and copula values $C^l_{i'}(M^l(x_i), M^l(x_{i+i'}))$ for $x_i < \hat{a}$ and $x_{i+i'} < \hat{a}$. The selection of $\hat{a}$ is application dependent and can be set appropriately based on observations. Narrowing down the range of interest makes the computation feasible and still fulfills the demand of real applications, because the observations of arrival counts in real traffic flows always fall within a limited range.
On the range of interest $[0, \hat{a})$, we define three matrices in Table 4.1: $M^l$ to represent HeMMPP/HoMMPP marginal values, $C^l$ to represent HeMMPP/HoMMPP copula values, and $D^l$ to represent HeMMPP/HoMMPP copula gradient values. Essentially, these three matrices are look-up tables for HeMMPP on the range of interest $[0, \hat{a})$. For constituent MMPPs, their PMF, CDF marginal, copula and copula gradient values are represented by matrices ${}_{l}P$, ${}_{l}M$, ${}_{l}C$, ${}_{l}D$, where l is the order of the constituent MMPP.
Note that $C^l$ is defined for the copula between $A^l_i$ and $A^l_{i+i'}$ with the time lag $i'$ set to a certain constant. Similarly, $D^l$, ${}_{l}C$ and ${}_{l}D$ are defined under the condition that $i'$ is preset as a constant. To emphasize the matrices' dimensions, we mark them as a right subscript, such as $[M^l]_{\hat{a}}$, $[C^l]_{\hat{a}\times\hat{a}}$, etc. We also define notations for submatrices; for instance, $[C^l]_{x\times y}$ represents the submatrix of $[C^l]_{\hat{a}\times\hat{a}}$ with its first x rows and first y columns.
With HeMMPP parameters $({}_{1}Q, {}_{1}\Lambda), ({}_{2}Q, {}_{2}\Lambda), \ldots, ({}_{l}Q, {}_{l}\Lambda)$ and a properly set threshold value $\hat{a}$, we design Algorithm 2 (with time complexity $O(\hat{a} \times l)$, counting each rotate-multiply-sum step on a submatrix as one operation) to calculate the HeMMPP marginal matrix $[M^l]_{\hat{a}}$ and Algorithm 3 (with time complexity $O(\hat{a} \times \hat{a} \times l)$ under the same convention) to calculate the HeMMPP copula matrix $[C^l]_{\hat{a}\times\hat{a}}$. In Algorithm 2, the recursive relationship in Theorem 14 is implemented as the procedure MarginalMatrixCalc.
Some details in this procedure are expanded here:
• In line 6 and line 11, the marginal matrix is calculated for a single MMPP. Given the l-th constituent MMPP $({}_{l}Q, {}_{l}\Lambda)$, its stationary distribution ${}_{l}\pi$ and conditional marginal vector ${}_{l}G$ can be calculated from the parameters according to Theorem 10. Then the element of its marginal matrix is calculated as ${}_{l}M_x = {}_{l}\pi\, {}_{l}G(x - 1)^{T}$ as shown in Theorem 10;
• In line 12, $[{}_{l}P]_{\hat{a}}$ is computed from $[{}_{l}M]_{\hat{a}}$ by ${}_{l}P_x = {}_{l}M_x - {}_{l}M_{x-1}$ for any x (with ${}_{l}M_0 = 0$).
Table 4.1: Definition of Matrices
Matrix notation | Matrix name | Entry in row x (and column y)
$[M^l]_{\hat{a}}$ | HeMMPP marginal matrix | $M^l_x = M^l(x - 1) = \Pr(A^l_i \le x - 1)$
$[C^l]_{\hat{a}\times\hat{a}}$ | HeMMPP copula matrix | $C^l_{xy} = C^l_{i'}(M^l(x - 1), M^l(y - 1)) = \Pr(A^l_i \le x - 1, A^l_{i+i'} \le y - 1)$
$[D^l]_{\hat{a}\times\hat{a}}$ | HeMMPP copula gradient matrix | $D^l_{xy} = \nabla C^l_{i'}(M^l(x - 1), M^l(y - 1)) = \Pr(A^l_i = x - 1, A^l_{i+i'} = y - 1)$
$[{}_{l}M]_{\hat{a}}$ | l-th MMPP marginal matrix | ${}_{l}M_x = {}_{l}M(x - 1) = \Pr({}_{l}A_i \le x - 1)$
$[{}_{l}P]_{\hat{a}}$ | l-th MMPP PMF matrix | ${}_{l}P_x = {}_{l}p(x - 1) = \Pr({}_{l}A_i = x - 1)$
$[{}_{l}C]_{\hat{a}\times\hat{a}}$ | l-th MMPP copula matrix | ${}_{l}C_{xy} = {}_{l}C_{i'}({}_{l}M(x - 1), {}_{l}M(y - 1)) = \Pr({}_{l}A_i \le x - 1, {}_{l}A_{i+i'} \le y - 1)$
$[{}_{l}D]_{\hat{a}\times\hat{a}}$ | l-th MMPP copula gradient matrix | ${}_{l}D_{xy} = \nabla {}_{l}C_{i'}({}_{l}M(x - 1), {}_{l}M(y - 1)) = \Pr({}_{l}A_i = x - 1, {}_{l}A_{i+i'} = y - 1)$
Similarly, Algorithm 3 implements Theorem 15 via a recursive procedure called CopulaMatrixCalc:
• In line 6 and line 11, the copula matrix is calculated for a single MMPP. Given the l-th constituent MMPP $({}_{l}Q, {}_{l}\Lambda)$, its stationary distribution ${}_{l}\pi$, conditional marginal vector ${}_{l}G$ and transition matrix ${}_{l}P(i'\Delta)$ can be calculated from the parameters. Then the element of its copula matrix is calculated as ${}_{l}C_{xy} = {}_{l}G(x - 1)\,\mathrm{diag}({}_{l}\pi)\,{}_{l}P(i'\Delta)\,{}_{l}G(y - 1)^{T}$ according to Theorem 11;
• In line 12, $[{}_{l}D]_{\hat{a}\times\hat{a}}$ is computed from $[{}_{l}C]_{\hat{a}\times\hat{a}}$ by ${}_{l}D_{xy} = {}_{l}C_{xy} + {}_{l}C_{(x-1)(y-1)} - {}_{l}C_{(x-1)y} - {}_{l}C_{x(y-1)}$ for any x and y (entries with a zero index are taken as 0).
With Algorithms 2 and 3, the marginal matrix $[M^l]_{\hat{a}}$ and copula matrix $[C^l]_{\hat{a}\times\hat{a}}$ are calculated as the numerical results of the HeMMPP marginal distribution and its temporal copula, as summarized in the following theorem:
Theorem 16. (HeMMPP copula) Given a HeMMPP with marginal matrix $[M^l]_{\hat{a}}$ and copula matrix $[C^l]_{\hat{a}\times\hat{a}}$, its copula value $C^l_{i'}(u_i, u_{i+i'})$ for any $u_i \le M^l_{\hat{a}}$ and $u_{i+i'} \le M^l_{\hat{a}}$ is calculated in the following steps:
1. $x_i = (\arg\max_x M^l_x \le u_i) - 1$;
2. $x_{i+i'} = (\arg\max_x M^l_x \le u_{i+i'}) - 1$;
3. $C^l_{i'}(u_i, u_{i+i'}) = C^l_{(x_i+1)(x_{i+i'}+1)}$.
For short, $C^l_{i'}(u_i, u_{i+i'}) = C^l_{(\arg\max_x M^l_x \le u_i)(\arg\max_x M^l_x \le u_{i+i'})}$.
Algorithm 2 An algorithm to compute HeMMPP marginal matrix $M^l$
Require: HeMMPP parameters $({}_{1}Q, {}_{1}\Lambda), ({}_{2}Q, {}_{2}\Lambda), \ldots, ({}_{l}Q, {}_{l}\Lambda)$, the upper threshold $\hat{a}$
Ensure: $[M^l]_{\hat{a}}$
1: return MarginalMatrixCalc($[{}_{1}\Lambda, \ldots, {}_{l}\Lambda]$, $[{}_{1}Q, \ldots, {}_{l}Q]$, $\hat{a}$)
2: procedure MarginalMatrixCalc($[{}_{1}\Lambda, \ldots, {}_{l}\Lambda]$, $[{}_{1}Q, \ldots, {}_{l}Q]$, $\hat{a}$)
3:   $l \leftarrow$ the vector length of $[{}_{1}\Lambda, \ldots, {}_{l}\Lambda]$ or of $[{}_{1}Q, \ldots, {}_{l}Q]$
4:   // Base Case
5:   if l == 1 then
6:     $[M^1]_{\hat{a}} \leftarrow$ compute with parameters ${}_{1}\Lambda$ and ${}_{1}Q$ based on Theorem 10
7:     return $[M^1]_{\hat{a}}$
8:   end if
9:   // Inductive Step
10:   $[M^{l-1}]_{\hat{a}} \leftarrow$ MarginalMatrixCalc($[{}_{1}\Lambda, \ldots, {}_{l-1}\Lambda]$, $[{}_{1}Q, \ldots, {}_{l-1}Q]$, $\hat{a}$)
11:   $[{}_{l}M]_{\hat{a}} \leftarrow$ compute with parameters ${}_{l}\Lambda$ and ${}_{l}Q$ based on Theorem 10
12:   $[{}_{l}P]_{\hat{a}} \leftarrow$ compute from $[{}_{l}M]_{\hat{a}}$
13:   for $x \leftarrow 1, \hat{a}$ do
14:     Rotate matrix $[{}_{l}P]_{x}$ 180 degrees clockwise as $[{}_{l}P']_{x}$
15:     Calculate Hadamard product of $[M^{l-1}]_{x}$ and $[{}_{l}P']_{x}$ as $[T]_{x}$
16:     $M^l_x \leftarrow$ sum of all elements in matrix $[T]_{x}$
17:   end for
18:   return $[M^l]_{\hat{a}}$
19: end procedure
With the analysis in this section, we obtain a way to calculate the copula for HeMMPP, as shown in Theorem 16. Although it is not in closed form mathematically, the copula values can be computed effectively. Therefore, Algorithms 2 and 3 can be regarded as the theoretical analysis of the HeMMPP copula, and they offer the exact solution for the temporal dependence.
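Algorithm 3 An algorithm to compute HeMMPP copula matrix $C^l$
Require: HeMMPP parameters $({}_{1}Q, {}_{1}\Lambda), ({}_{2}Q, {}_{2}\Lambda), \ldots, ({}_{l}Q, {}_{l}\Lambda)$, the upper threshold $\hat{a}$
Ensure: $[C^l]_{\hat{a}\times\hat{a}}$
1: return CopulaMatrixCalc($[{}_{1}\Lambda, \ldots, {}_{l}\Lambda]$, $[{}_{1}Q, \ldots, {}_{l}Q]$, $\hat{a}$)
2: procedure CopulaMatrixCalc($[{}_{1}\Lambda, \ldots, {}_{l}\Lambda]$, $[{}_{1}Q, \ldots, {}_{l}Q]$, $\hat{a}$)
3:   $l \leftarrow$ the vector length of $[{}_{1}\Lambda, \ldots, {}_{l}\Lambda]$ or of $[{}_{1}Q, \ldots, {}_{l}Q]$
4:   // Base Case
5:   if l == 1 then
6:     $[C^1]_{\hat{a}\times\hat{a}} \leftarrow$ compute with parameters ${}_{1}\Lambda$ and ${}_{1}Q$ based on Theorem 11
7:     return $[C^1]_{\hat{a}\times\hat{a}}$
8:   end if
9:   // Inductive Step
10:   $[C^{l-1}]_{\hat{a}\times\hat{a}} \leftarrow$ CopulaMatrixCalc($[{}_{1}\Lambda, \ldots, {}_{l-1}\Lambda]$, $[{}_{1}Q, \ldots, {}_{l-1}Q]$, $\hat{a}$)
11:   $[{}_{l}C]_{\hat{a}\times\hat{a}} \leftarrow$ compute with parameters ${}_{l}\Lambda$ and ${}_{l}Q$ based on Theorem 11
12:   $[{}_{l}D]_{\hat{a}\times\hat{a}} \leftarrow$ compute from $[{}_{l}C]_{\hat{a}\times\hat{a}}$
13:   for $x \leftarrow 1, \hat{a}$ do
14:     for $y \leftarrow 1, \hat{a}$ do
15:       Rotate matrix $[{}_{l}D]_{x\times y}$ 180 degrees clockwise to be $[{}_{l}D']_{x\times y}$
16:       Calculate Hadamard product of $[C^{l-1}]_{x\times y}$ and $[{}_{l}D']_{x\times y}$ as $[T]_{x\times y}$
17:       $C^l_{xy} \leftarrow$ sum of all elements in matrix $[T]_{x\times y}$
18:     end for
19:   end for
20:   return $[C^l]_{\hat{a}\times\hat{a}}$
21: end procedure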
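The inductive steps of Algorithms 2 and 3 amount to truncated one- and two-dimensional convolutions. The sketch below writes them iteratively with 0-indexed tables (index x holds the value for count x, unlike the 1-indexed matrices of Table 4.1). The function names and the toy constituents, two independent Bernoulli(0.5) count processes with no temporal dependence, are assumptions chosen so that the result can be checked by hand.

```python
def convolve_marginal(M_prev, P_new):
    """Algorithm 2 step: M^l[x] = sum_k M^{l-1}[x - k] * p_l[k]."""
    return [sum(M_prev[x - k] * P_new[k] for k in range(x + 1))
            for x in range(len(M_prev))]


def convolve_copula(C_prev, D_new):
    """Algorithm 3 step: C^l[x][y] = sum_{a,b} C^{l-1}[x-a][y-b] * D_l[a][b]."""
    n = len(C_prev)
    return [[sum(C_prev[x - a][y - b] * D_new[a][b]
                 for a in range(x + 1) for b in range(y + 1))
             for y in range(n)] for x in range(n)]


def superpose(M_first, C_first, pmfs_rest, gradients_rest):
    """Fold constituents 2..l into constituent 1 (iterative form of the recursion)."""
    M, C = M_first, C_first
    for P_new, D_new in zip(pmfs_rest, gradients_rest):
        M, C = convolve_marginal(M, P_new), convolve_copula(C, D_new)
    return M, C


# Toy constituent on counts {0, 1, 2}: Bernoulli(0.5), independent across slots.
p = [0.5, 0.5, 0.0]                               # PMF table
M1 = [0.5, 1.0, 1.0]                              # CDF table
C1 = [[M1[x] * M1[y] for y in range(3)] for x in range(3)]   # joint CDF
D = [[p[x] * p[y] for y in range(3)] for x in range(3)]      # joint PMF
M2, C2 = superpose(M1, C1, [p], [D])
```

Written this way, the copula step costs on the order of the fourth power of the threshold per constituent in elementary operations, so the threshold should be kept as small as the application allows.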
4.5 Parametric Copula Modeling for MMPP trace
Parametric copula modeling fits a trace to well-known parametric copulas and chooses the best one for the application. The arrival count traces could be from any kind of MMPP: single MMPP, HoMMPP or HeMMPP. The marginal distribution is constructed empirically, and parametric copulas can be chosen according to the tail dependence. In general, we assume that the trace to fit, denoted as $\{x_i\}_{1 \le i \le n}$, is a sample trace from HeMMPP $\{A^l_i\}$. Our goal is to model the copula between $A^l_i$ and $A^l_{i+i'}$. We propose the following tail-dependence-based scheme to conduct parametric copula modeling:
1. Compute the tail dependence from data.
The upper tail dependence, as the limit of a function as u approaches 1, can be approximated by evaluating the function at a value of u close to 1 [83], say 0.99. Similarly, the lower tail dependence can be approximated by evaluating the function at a value of u close to 0, say 0.01, i.e.,
\[
\begin{aligned}
\lambda^{+}_{t} &\approx \Pr(X > F_X^{-1}(u) \mid Y > F_Y^{-1}(u))\big|_{u=0.99} \approx \frac{1 - 2u + C(u, u)}{1 - u}\bigg|_{u=0.99} \\
\lambda^{-}_{t} &\approx \Pr(X < F_X^{-1}(u) \mid Y < F_Y^{-1}(u))\big|_{u=0.01} \approx \frac{C(u, u)}{u}\bigg|_{u=0.01}.
\end{aligned}
\tag{4.11}
\]
The tail dependence between $A^l_i$ and $A^l_{i+i'}$ is estimated from the trace as follows:
\[
\begin{aligned}
\lambda^{+}_{t} &\approx \Pr(A^l_i > x^{+} \mid A^l_{i+i'} > x^{+}) = \frac{\sum_{i=1}^{n-i'} \mathbf{1}(x_i > x^{+},\, x_{i+i'} > x^{+})}{\sum_{i=1}^{n-i'} \mathbf{1}(x_{i+i'} > x^{+})} \\
\lambda^{-}_{t} &\approx \Pr(A^l_i < x^{-} \mid A^l_{i+i'} < x^{-}) = \frac{\sum_{i=1}^{n-i'} \mathbf{1}(x_i < x^{-},\, x_{i+i'} < x^{-})}{\sum_{i=1}^{n-i'} \mathbf{1}(x_{i+i'} < x^{-})},
\end{aligned}
\tag{4.12}
\]
where $x^{+}$ and $x^{-}$ are high and low quantile values such that $\hat{M}^l(x^{+}) = 0.99$ and $\hat{M}^l(x^{-}) = 0.01$, and $\hat{M}^l$ is the empirical marginal distribution of $A^l_i$.
2. Choose one candidate copula based on the tail dependence property.
Choose a proper copula from the candidate set according to the tail dependence. For instance, we could choose the Clayton copula if $\lambda^{+}_{t} \approx 0$ and $\lambda^{-}_{t} > 0$; choose the Gumbel copula if $\lambda^{+}_{t} > 0$ and $\lambda^{-}_{t} \approx 0$; choose the Frank copula if $\lambda^{+}_{t} \approx \lambda^{-}_{t}$; or use any mixture of these three copulas, since mixtures cover various tail dependences.
3. Fit data to determine the copula parameter.
Each observation of the sample trace is first evaluated in its marginal domain, that is, $u_i = \hat{M}^l(x_i)$. Then the pairs $\{(u_i, u_{i+i'})\}_{1 \le i \le n-i'}$ become the data used to determine the copula parameter $\theta$. The fitting is implemented by the maximum likelihood estimation method explained in detail in [12]. The parametric copula learned from the MMPP trace is denoted as $C(u_i, u_{i+i'}; \theta)$.
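Step 1 of the scheme, the empirical tail dependence estimate of (4.12), can be sketched as below. The quantile rule (smallest observed value whose empirical CDF reaches the level), the convention of returning 0 when no observation falls in a tail, and the function names are assumptions of this illustration; the synthetic trace and the quantile levels in the usage line are chosen only so that the result can be verified by hand.

```python
import math


def quantile(trace, level):
    """Smallest observed value whose empirical CDF reaches `level`."""
    ordered = sorted(trace)
    idx = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[idx]


def tail_dependence(trace, lag, upper_level=0.99, lower_level=0.01):
    """Empirical upper and lower tail dependence between x_i and x_{i+lag}."""
    x_hi, x_lo = quantile(trace, upper_level), quantile(trace, lower_level)
    pairs = list(zip(trace[:-lag], trace[lag:]))

    def ratio(hit):
        denom = sum(1 for _, b in pairs if hit(b))
        both = sum(1 for a, b in pairs if hit(a) and hit(b))
        # No observation in the tail: report 0 by convention (assumption).
        return both / denom if denom else 0.0

    return ratio(lambda v: v > x_hi), ratio(lambda v: v < x_lo)


# Synthetic trace: 50 slots with count 0, then 50 slots with count 10.
trace = [0] * 50 + [10] * 50
upper, lower = tail_dependence(trace, lag=1, upper_level=0.5, lower_level=0.75)
```

With heavily discrete counts the low quantile often coincides with the minimum observed value, in which case the lower estimate has an empty denominator; this is why the sketch needs an explicit convention for that case.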
4.6 Summary
This chapter theoretically derives the intricate temporal dependence structure in MMPPs with copula analysis. It presents the theoretical solution for modeling temporal dependence in both single MMPP and HoMMPP/HeMMPP. In addition, a parametric copula modeling scheme has been proposed for MMPP traces. In the next three chapters, we will apply the analytical results of this chapter in different scenarios.
Chapter 5
Application of MMPP Copulas for Network Traffic Prediction
In this chapter, we apply MMPP copulas discussed in Chapter 4 for prediction of
network traffic flows.
5.1 Introduction
Nowadays, people rely heavily on the Internet and various digital platforms supported
by enterprise cloud-computing capabilities, where data volume from online banking,
video broadcast, and social networking increases at an unprecedented pace. The huge
amount and diverse patterns of Internet traffic require large enterprises and service
providers to develop a new spectrum of technologies for serving their customers easily,
quickly and with guaranteed quality of service (QoS). To face the challenge, some large
enterprises have started to explore the power of predictive resource provisioning so
that resource allocation aligns well with the dynamic service demands [8]. A good
prediction on traffic flow will benefit the service provisioning.
The network traffic flows can be regarded as time series. The prediction of network traffic flows can be made with existing methods, for instance, linear predictive coding and the autoregressive model. Different from these existing methods, copula modeling characterizes the full temporal dependence among network traffic, and will benefit the prediction in several aspects:
• Copulas can capture various kinds of temporal dependence. With either the theoretical copula or the many parametric copulas to choose from, a variety of temporal dependencies can be modeled. In other words, copula modeling provides us with numerous choices of temporal dependence to model real-world network traffic;
• The invariance property of copulas keeps the copula model stable when functional changes occur in the network traffic. Without a re-modeling process, copula-based prediction remains as precise as before the change, while the other existing models cannot guarantee this.
In this chapter, we conduct prediction on MMPP traffic flows. Both the theoretical copulas derived and the parametric copula modeling proposed in Chapter 4 are used to build the temporal dependence and predict the future trend of traffic flows. With a large number of predictions on real-world traces and simulations, we show that copula-based prediction outperforms two classical prediction models, the linear predictive coding model and the autoregressive model.
5.2 Copula-based Prediction
The problem of traffic prediction can be posed in different forms. In our work, we focus on estimating the future arrival count $A_{i+i'}$ based on the current observation of arrival count $A_i$. The prediction is made by maximizing the conditional probability $\Pr(A_{i+i'} \mid A_i)$, i.e., $\hat{x}_{i+i'} = \arg\max_x \Pr(A_{i+i'} = x \mid A_i = x_i)$. When $i' = 1$, the prediction is made one step forward; when $i' > 1$, the prediction is made multiple steps forward.
In this section, we introduce the prediction method with theoretical copulas, followed by a discussion of prediction with parametric copulas.
5.2.1 Prediction Based on Theoretical Copulas
With the MMPP copula $C_{i'}$ for single MMPP and the theoretical copula $C^l_{i'}$ for HoMMPP/HeMMPP, Theorem 17 can be used to predict future arrivals.
Theorem 17. (1) Consider an MMPP having its copula $C_{i'}$ between $A_i$ and $A_{i+i'}$. If $A_i = x_i$ is the current observation from the arrival process and if the prediction is made by maximizing the conditional probability $\Pr(A_{i+i'} \mid A_i)$, the predicted arrival count $\hat{x}_{i+i'}$ is:
\[
\hat{x}_{i+i'} = \arg\max_x \nabla C_{i'}(M(x_i), M(x)). \tag{5.1}
\]
(2) Consider a HoMMPP/HeMMPP having theoretical copula $C^l_{i'}$ between $A^l_i$ and $A^l_{i+i'}$. If $A^l_i = x_i$ is the current observation from the arrival process and if the prediction is made by maximizing the conditional probability $\Pr(A^l_{i+i'} \mid A^l_i)$, the predicted arrival count $\hat{x}_{i+i'}$ is:
\[
\hat{x}_{i+i'} = \arg\max_x \nabla C^l_{i'}(M^l(x_i), M^l(x)). \tag{5.2}
\]
Proof. We only prove part (2), since part (1) is a special case of part (2). Since the
prediction is made by maximizing the conditional probability $Pr(A^l_{i+i'} \mid A^l_i)$, we have

$$\begin{aligned}
\hat{x}_{i+i'} &= \arg\max_x Pr(A^l_{i+i'} = x \mid A^l_i = x_i) \\
&= \arg\max_x \frac{Pr(A^l_i = x_i, A^l_{i+i'} = x)}{Pr(A^l_i = x_i)} \\
&= \arg\max_x Pr(A^l_i = x_i, A^l_{i+i'} = x) \\
&= \arg\max_x \nabla C^l_{i'}(M^l(x_i), M^l(x)).
\end{aligned}$$

The third equality holds because the denominator $Pr(A^l_i = x_i)$ does not depend on $x$.
According to the definitions in Table 4.1, the value of the function $\nabla C^l_{i'}$ is
represented by the HeMMPP/HoMMPP copula gradient matrix $D^l$. With this numerical
correspondence between the function and the matrix, the predicted arrival count is

$$\hat{x}_{i+i'} = \arg\max_x D^l_{(x_i+1)(x+1)} = \left(\arg\max_x D^l_{(x_i+1)x}\right) - 1.$$

That is, the predicted arrival count can be determined numerically as the column number
of the maximum value in the $(x_i + 1)$-th row of the matrix $D^l$, minus 1.
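The row lookup described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the copula gradient matrix $D^l$ has already been computed and is stored as a list of rows; the function name `predict_from_gradient_matrix` and the small 4x4 matrix are hypothetical and not taken from the dissertation.

```python
def predict_from_gradient_matrix(D, x_i):
    """Predict the next arrival count from a copula gradient matrix.

    D is assumed to be a list of rows in which, in the text's 1-based
    notation, entry ((x_i + 1), (x + 1)) holds the gradient value for the
    pair (x_i, x). With Python's 0-based lists that entry is D[x_i][x].
    """
    row = D[x_i]  # the (x_i + 1)-th row in 1-based terms
    # 1-based column number of the row maximum (first one on ties)
    col_1_based = max(range(len(row)), key=row.__getitem__) + 1
    return col_1_based - 1  # "column number minus 1"


# Hypothetical 4x4 matrix for arrival counts 0..3 (illustrative values only).
D_example = [
    [0.30, 0.10, 0.05, 0.01],
    [0.10, 0.25, 0.12, 0.03],
    [0.04, 0.11, 0.20, 0.09],
    [0.01, 0.03, 0.08, 0.15],
]
predictions = [predict_from_gradient_matrix(D_example, x) for x in range(4)]
```

With 0-based storage the "+1" and "-1" of the 1-based formula cancel, so the prediction is simply the 0-based index of the row maximum.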
5.2.2 Prediction Based on Parametric Copulas
Given a single MMPP or HoMMPP/HeMMPP trace $\{x_i\}$, parametric copula modeling is
conducted according to Section 4.5. The parametric copula is continuous on the domain
$[0, 1]$, whereas the marginal distribution is discrete. For this reason, we first study the
prediction problem on stochastic processes with continuous marginals (Theorem 18) and
then extend it to discrete distributions (Theorem 19).
Theorem 18. Consider a stochastic process $\{B_i\}$ that has a parametric copula
$C(u_i, u_{i+i'}; \theta)$ between $B_i$ and $B_{i+i'}$, continuous marginal distribution $F$, and
marginal probability density function (PDF) $f$. The conditional PDF is

$$f(B_{i+i'} = x_{i+i'} \mid B_i = x_i) = c(F(x_i), F(x_{i+i'}); \theta)\, f(x_{i+i'}), \qquad (5.3)$$

where $c(u, v; \theta) = \frac{\partial}{\partial u}\frac{\partial}{\partial v} C(u, v; \theta)$ is called the parametric copula density function.
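The form of Eq. (5.3) can be motivated by a short derivation from Sklar's theorem. The following is a standard argument given as a sketch, not a reproduction of the proof referenced in the text; the symbols $H$ (joint distribution function of $B_i$ and $B_{i+i'}$) and $h$ (joint density) are introduced here for illustration only.

```latex
\begin{aligned}
H(x_i, x_{i+i'}) &= C\big(F(x_i), F(x_{i+i'}); \theta\big)
  && \text{(Sklar's theorem)} \\
h(x_i, x_{i+i'}) &= \frac{\partial^2 H}{\partial x_i \, \partial x_{i+i'}}
  = c\big(F(x_i), F(x_{i+i'}); \theta\big)\, f(x_i)\, f(x_{i+i'})
  && \text{(chain rule)} \\
f(B_{i+i'} = x_{i+i'} \mid B_i = x_i) &= \frac{h(x_i, x_{i+i'})}{f(x_i)}
  = c\big(F(x_i), F(x_{i+i'}); \theta\big)\, f(x_{i+i'}).
\end{aligned}
```

The marginal density $f(x_i)$ cancels, which is why only the copula density and the marginal of the predicted variable remain.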
For discrete marginals, we revise Theorem 18 by relating the probability density
function (PDF) of a continuous distribution to the probability mass function (PMF) of a
discrete distribution.
Theorem 19. Consider a stochastic process $\{B_i\}$ that has a parametric copula
$C(u_i, u_{i+i'}; \theta)$ between $B_i$ and $B_{i+i'}$, discrete marginal distribution $F$, and
marginal probability mass function (PMF) $p$. The conditional PMF is

$$p(B_{i+i'} = x_{i+i'} \mid B_i = x_i) = c(F(x_i), F(x_{i+i'}); \theta)\, p(x_{i+i'}). \qquad (5.4)$$
The proofs of Theorems 18 and 19 are straightforward, using techniques similar to those
in [5]. With Theorem 19, prediction based on a parametric copula on an MMPP trace
is given by Theorem 20.
Theorem 20. (1) Consider an MMPP having parametric copula $C(u_i, u_{i+i'}; \theta)$
between $A_i$ and $A_{i+i'}$. If $A_i = x_i$ is the current observation from the arrival process
and if the prediction is made by maximizing the conditional probability $Pr(A_{i+i'} \mid A_i)$,
the predicted arrival count $\hat{x}_{i+i'}$ is:

$$\hat{x}_{i+i'} = \arg\max_x c(M(x_i), M(x); \theta)\,(M(x) - M(x - 1)); \qquad (5.5)$$

(2) Consider a HoMMPP/HeMMPP having parametric copula $C(u_i, u_{i+i'}; \theta)$ between
$A^l_i$ and $A^l_{i+i'}$. If $A^l_i = x_i$ is the current observation from the arrival process and
if the prediction is made by maximizing the conditional probability $Pr(A^l_{i+i'} \mid A^l_i)$, the
predicted arrival count $\hat{x}_{i+i'}$ is:

$$\hat{x}_{i+i'} = \arg\max_x c(M^l(x_i), M^l(x); \theta)\,(M^l(x) - M^l(x - 1)). \qquad (5.6)$$
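The argmax in Eq. (5.5) can be sketched numerically as follows. This is an illustration under stated assumptions rather than the implementation used in the experiments: the Frank copula density is used as one example of a parametric family, a Poisson distribution stands in for the discrete marginal $M$, and the names `frank_density`, `poisson_cdf`, `predict_parametric`, the parameter value 5.0 and the search bound `x_max` are all hypothetical.

```python
import math


def frank_density(u, v, theta):
    """Frank copula density c(u, v; theta) for theta != 0."""
    a = 1.0 - math.exp(-theta)
    num = theta * a * math.exp(-theta * (u + v))
    den = (a - (1.0 - math.exp(-theta * u)) * (1.0 - math.exp(-theta * v))) ** 2
    return num / den


def poisson_cdf(x, mean):
    """Stand-in discrete marginal M(x) (assumption); returns 0 for x < 0."""
    if x < 0:
        return 0.0
    term = math.exp(-mean)
    total = term
    for k in range(1, x + 1):
        term *= mean / k
        total += term
    return min(total, 1.0)


def predict_parametric(x_i, theta, marginal, x_max):
    """argmax over x in 0..x_max of c(M(x_i), M(x); theta) * (M(x) - M(x - 1))."""
    u = marginal(x_i)

    def score(x):
        pmf = marginal(x) - marginal(x - 1)
        return frank_density(u, marginal(x), theta) * pmf

    return max(range(x_max + 1), key=score)


def M(x):
    # Hypothetical marginal: Poisson with mean 10 arrivals per slot.
    return poisson_cdf(x, 10.0)


low = predict_parametric(3, 5.0, M, 40)    # current observation in the lower tail
high = predict_parametric(18, 5.0, M, 40)  # current observation in the upper tail
```

With positive dependence (theta > 0), a low current count pulls the prediction below the marginal mode and a high current count pushes it above, which is the behaviour one would expect from Eq. (5.5).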
5.3 Experimental Evaluation
We conduct experiments to show how the copula model can help traffic prediction.
We first give an overview of the methods used to evaluate the performance of
copula-based prediction. We then present case studies on single MMPP, HoMMPP and
HeMMPP traces.
5.3.1 Evaluation Methods
The theoretical and parametric copulas discussed in Chapter 4 are used for traffic
prediction according to Section 5.2. To evaluate the copula models, we implement two
classic prediction models for comparison: the autoregressive model (AR(1)) and linear
predictive coding (LPC(1)). The first-order AR and LPC models are used for a fair
comparison, because our copula-based prediction model is first order in the sense that
only the dependence between two successive arrival counts is considered each time.
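1. AR(1) model prediction
Consider a trace following an AR(1) model with parameters $\phi_1$, $\phi_2$ and white noise
$\epsilon_t$. If $A^l_i = x_i$ is the current observation, the prediction is made by:

$$\begin{aligned}
\hat{x}_{i+1} &= \phi_1 + \phi_2 x_i + \epsilon_{i+1}, \\
\hat{x}_{i+2} &= \phi_1 + \phi_2 \hat{x}_{i+1} + \epsilon_{i+2}, \\
&\;\;\vdots \\
\hat{x}_{i+i'} &= \phi_1 + \phi_2 \hat{x}_{i+i'-1} + \epsilon_{i+i'}.
\end{aligned}$$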
2. LPC(1) model prediction
Consider a trace following an LPC(1) model with parameter $s$. If $A^l_i = x_i$ is the
current observation, the prediction is made by:

$$\begin{aligned}
\hat{x}_{i+1} &= s\, x_i, \\
\hat{x}_{i+2} &= s\, \hat{x}_{i+1}, \\
&\;\;\vdots \\
\hat{x}_{i+i'} &= s\, \hat{x}_{i+i'-1}.
\end{aligned}$$
As a purely linear predictor, LPC(1) has a single parameter $s$ that is determined directly
by the autocorrelation of the arrival count sequence. Since the LPC(1) model predicts data
based only on dependence information in the form of autocorrelation, it is used as
the benchmark predictor to show how functional dependence modeling with copulas
improves over linear dependence. We also compare copula-based prediction with the
AR(1) model, since AR(1) is a popular statistical method for prediction.
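The two baseline recursions can be sketched as below. This is a minimal illustration, not the fitting code used in the experiments: it assumes an ordinary least-squares fit for AR(1) and the autocorrelation method (lag-1 over lag-0 autocorrelation) for LPC(1), and it replaces the white-noise term by its zero mean when forming point predictions. All function names are hypothetical.

```python
def fit_ar1(trace):
    """Least-squares fit of x[t+1] = phi1 + phi2 * x[t] (assumed fitting method)."""
    xs, ys = trace[:-1], trace[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    phi2 = sxy / sxx
    phi1 = my - phi2 * mx
    return phi1, phi2


def fit_lpc1(trace):
    """LPC(1) coefficient via the autocorrelation method: s = R(1) / R(0)."""
    r0 = sum(x * x for x in trace)
    r1 = sum(a * b for a, b in zip(trace, trace[1:]))
    return r1 / r0


def predict_ar1(phi1, phi2, x_i, steps=1):
    """Iterate the AR(1) recursion `steps` times, with the noise set to zero."""
    x = x_i
    for _ in range(steps):
        x = phi1 + phi2 * x
    return x


def predict_lpc1(s, x_i, steps=1):
    """Iterate x_hat = s * x_hat `steps` times, i.e. s**steps * x_i."""
    return (s ** steps) * x_i


# Toy trace generated exactly by x[t+1] = 2 + 0.5 * x[t] (illustrative only).
toy = [10.0]
for _ in range(9):
    toy.append(2.0 + 0.5 * toy[-1])
phi1_hat, phi2_hat = fit_ar1(toy)
```

On the noise-free toy trace the fit recovers the generating coefficients, which is a quick sanity check before applying the predictors to real arrival counts.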
When applying any of the prediction models to a traffic trace, the trace is divided
into two parts, the training set and the testing set. The training set consists of an
initial percentage of the trace data, and the rest of the trace constitutes the testing
set. For example, if the training percentage is 50%, the first half of the trace is
used to train a model, and the second half is used to test prediction accuracy.
The prediction accuracy is measured by the root-mean-square error (RMSE) across the
testing set, defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{x}_i - x_i)^2}, \qquad (5.7)$$

where $x_i$ is the $i$-th observed arrival count from the testing set, $\hat{x}_i$ denotes the
corresponding predicted value, and $n$ is the total number of time slots in the testing period.
For a prediction model, its average RMSE (aRMSE) over different experiment
scenarios represents its overall performance on MMPP traffic trace prediction. Its
performance improvement ratio (IMP RATIO) over the benchmark model (LPC(1)) is
defined in Eq. (5.8). The larger the value, the more the predictor improves over
the LPC(1) model.

$$\text{IMP RATIO} = \frac{\mathrm{aRMSE}_{benchmark} - \mathrm{aRMSE}}{\mathrm{aRMSE}_{benchmark}} \times 100\%. \qquad (5.8)$$
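Eqs. (5.7) and (5.8) translate directly into code. The sketch below is illustrative; the function names are hypothetical, and the two input lists are the per-percentage RMSE values reported in this chapter for the theoretical copula and for LPC(1) on the BCpAug89 trace (Table 5.2), used here only to check the arithmetic.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error over the testing set, Eq. (5.7)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def imp_ratio(armse_model, armse_benchmark):
    """Improvement ratio over the benchmark in percent, Eq. (5.8)."""
    return (armse_benchmark - armse_model) / armse_benchmark * 100.0


# RMSE values for training percentages 50%..90% as reported in Table 5.2.
theoretical_copula = [94.2411, 90.5974, 93.1982, 92.3244, 94.4256]
lpc1 = [110.3130, 106.1805, 108.5895, 105.2930, 108.0254]

armse_theoretical = sum(theoretical_copula) / len(theoretical_copula)
armse_lpc1 = sum(lpc1) / len(lpc1)
improvement = imp_ratio(armse_theoretical, armse_lpc1)
```

Averaging the five RMSE values and applying Eq. (5.8) reproduces the aRMSE and IMP RATIO entries of the table to the reported precision.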
5.3.2 Case Study on a Single Real-world MMPP Trace
The BCpAug89 trace, one of the Bellcore traces (available at
http://ita.ee.lbl.gov/html/contrib/BC.html), records the exact arrival times of 1,000,000
packets on an Ethernet at the Bellcore Morristown Research and Engineering facility.
Previous research has shown that the trace is well characterized by an MMPP [3, 62, 58].
We analyse the trace in terms of arrival counts per second, i.e., the length of the time
slot is set to $\Delta = 1$ (second), and $A_i$ denotes the random variable of the arrival count
in the $i$-th second. With the learning algorithm proposed in [39], this trace is modeled by a
12-state MMPP with parameters $({}^{A}Q, {}^{A}\Lambda)$ as shown in Eq. (5.9). In the case study, we
apply copulas to model its dependence structure and predict the trace. We also
transform the trace with a function to show that the copula-based
dependence model is much more stable than other models.
$${}^{A}Q = \begin{pmatrix}
-0.857 & 0.286 & 0.428 & 0.143 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.067 & -0.900 & 0.267 & 0.233 & 0.233 & 0.067 & 0.033 & 0 & 0 & 0 & 0 & 0 \\
0.023 & 0.078 & -0.837 & 0.336 & 0.203 & 0.103 & 0.078 & 0 & 0.016 & 0 & 0 & 0 \\
0 & 0.026 & 0.140 & -0.722 & 0.274 & 0.153 & 0.085 & 0.030 & 0.007 & 0.007 & 0 & 0 \\
0.002 & 0.008 & 0.051 & 0.173 & -0.651 & 0.244 & 0.122 & 0.041 & 0.006 & 0.002 & 0.002 & 0 \\
0 & 0.001 & 0.027 & 0.074 & 0.173 & -0.696 & 0.303 & 0.094 & 0.014 & 0.009 & 0.001 & 0 \\
0 & 0.001 & 0.004 & 0.019 & 0.099 & 0.233 & -0.617 & 0.200 & 0.048 & 0.012 & 0.001 & 0 \\
0 & 0 & 0.008 & 0.023 & 0.049 & 0.184 & 0.409 & -0.775 & 0.084 & 0.015 & 0.003 & 0 \\
0 & 0 & 0.008 & 0.015 & 0.015 & 0.120 & 0.301 & 0.218 & -0.805 & 0.113 & 0.015 & 0 \\
0 & 0.020 & 0 & 0 & 0.059 & 0.059 & 0.235 & 0.078 & 0.275 & -0.824 & 0.098 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.077 & 0.231 & 0.231 & 0.154 & 0.077 & -0.847 & 0.077 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & -1
\end{pmatrix},$$

$${}^{A}\Lambda = (782.069, 674.207, 574.345, 482.483, 398.621, 322.759, 254.897, 195.035, 143.173, 99.311, 63.449, 35.587). \qquad (5.9)$$
One-step Prediction on BCpAug89 trace
Given the learned MMPP parameters $({}^{A}Q, {}^{A}\Lambda)$, we first construct the copula for the
MMPP theoretically and empirically. Following the theoretical analysis in Section 4.4.1, the
MMPP copula for the learned MMPP is computed from the parameters $({}^{A}Q, {}^{A}\Lambda)$ based on
Theorem 11. The contour of the computed MMPP copula is shown in Fig. 5.1a. Dependence
measures between $A_i$ and $A_{i+1}$, including Kendall's tau $\rho_t$, Spearman's rho $\rho_s$, the
tail dependences $\lambda^+_t$ and $\lambda^-_t$, and the Pearson correlation coefficient $\rho$, are listed in
Table 5.1. The theoretical results of $\rho_t$, $\rho_s$ and $\lambda_t$ are calculated from the copula with
Eqs. (2.2)-(2.5). The Pearson coefficient is calculated based on the analysis in [61]. Except
for the Pearson coefficient, all dependence measures can be obtained from the copula,
indicating that the copula carries rich information about the dependence structure. In
addition, the comparison between the theoretical and empirical dependence measures shows
that the copula accurately captures the trace dependence.
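As an illustration of how tail dependence at a finite threshold can be read off a copula, the sketch below uses the common finite-threshold expressions $C(u,u)/u$ (lower tail) and $(1 - 2u + C(u,u))/(1-u)$ (upper tail). It is assumed, not verified here, that these correspond to the definitions referenced as Eqs. (2.2)-(2.5); the Frank copula and the parameter value 5.0 are example choices, and all function names are hypothetical.

```python
import math


def frank_copula(u, v, theta):
    """Frank copula C(u, v; theta) for theta != 0."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def lower_tail(copula, u):
    """Finite-threshold lower tail dependence, C(u, u) / u (e.g. u = 0.01)."""
    return copula(u, u) / u


def upper_tail(copula, u):
    """Finite-threshold upper tail dependence, (1 - 2u + C(u, u)) / (1 - u)."""
    return (1.0 - 2.0 * u + copula(u, u)) / (1.0 - u)


def independence(u, v):
    """Independence copula, used as a reference case."""
    return u * v


def frank5(u, v):
    # Frank copula with an arbitrary example parameter (assumption).
    return frank_copula(u, v, 5.0)


lam_minus = lower_tail(frank5, 0.01)
lam_plus = upper_tail(frank5, 0.99)
```

For the Frank copula the two values coincide because the family is radially symmetric, which is consistent with choosing it when the upper and lower tail dependences of a trace are close.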
Table 5.1: Dependence Measures of BCpAug89 Trace from Theoretical Analysis and
Empirical Analysis

              Kendall's tau   Spearman's rho   Upper tail (u=0.99)   Lower tail (u=0.01)   Pearson
Theoretical   0.4788          0.6150           0.4067                0.3359                0.7555
Empirical     0.4212          0.5897           0.3935                0.3248                0.6149
[Figure 5.1: Copula contours for MMPP learned from BCpAug89 trace. (a) Theoretical MMPP copula. (b) Parametric copula. Axes: U and V; contour levels from 0.1 to 0.9.]

Empirically, a parametric copula can also be chosen to model the temporal dependence
between $A_i$ and $A_{i+1}$. As shown in Table 5.1, the upper tail dependence $\lambda^+_t$ is close to
the lower tail dependence $\lambda^-_t$; we thus choose the Frank copula to model the BCpAug89
trace. The parameter of the Frank copula is determined by fitting a training set from the
trace, so different training percentages yield different parameter values. Fig. 5.1b shows the
contour of the parametric copula trained from 80% of the data in the BCpAug89 trace.
Fig. 5.1a and Fig. 5.1b have similar contour shapes. The similarity of two copulas can be
quantified by the discrete $L_2$ norm distance over the size of the discrete lattice [27], which
is 0.0173 in our case. This value is close to those obtained in the experiments of [27] for
two similar copulas, indicating that the parametric copula trained from data
accords well with the theoretical copula from analysis.
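One simple way to obtain a Frank copula parameter from data is to invert the known relation between the Frank parameter and Kendall's tau, $\tau = 1 - \frac{4}{\theta}\,(1 - D_1(\theta))$, where $D_1$ is the first-order Debye function. The sketch below does this by bisection. This moment-style fit is a stand-in chosen for brevity; it is not claimed to be the fitting procedure of Section 4.5, and the function names, integration step count and search interval are assumptions.

```python
import math


def debye1(theta, steps=2000):
    """D1(theta) = (1/theta) * integral_0^theta t / (e^t - 1) dt.

    Evaluated with the midpoint rule, which avoids the point t = 0.
    """
    h = theta / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += t / math.expm1(t)
    return total * h / theta


def frank_tau(theta):
    """Kendall's tau of the Frank copula with parameter theta > 0."""
    return 1.0 - 4.0 / theta * (1.0 - debye1(theta))


def fit_frank_theta(tau, lo=1e-6, hi=50.0, iters=100):
    """Solve frank_tau(theta) = tau by bisection (positive dependence only)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if frank_tau(mid) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Empirical Kendall's tau of the BCpAug89 trace as reported in Table 5.1.
theta_hat = fit_frank_theta(0.4212)
```

A likelihood-based fit on the training set would generally give a somewhat different value; the inversion above is only meant to show how a rank-based dependence measure pins down the single parameter of the family.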
[Figure 5.2: Prediction with theoretical copula on the testing set (last 20%) of BCpAug89 trace. Horizontal axis: No. of time slots; vertical axis: number of arrivals. Curves: the testing set of BCpAug89 trace; prediction on the testing set.]
Table 5.2: One-Step Prediction RMSE on BCpAug89 Trace with Different Training
Percentages.

Training Percentage   Theoretical Copula   Parametric Copula   AR(1)      LPC(1)
50%                   94.2411              88.9070             92.2850    110.3130
60%                   90.5974              87.5581             88.8550    106.1805
70%                   93.1982              91.7414             90.8745    108.5895
80%                   92.3244              88.1251             90.2618    105.2930
90%                   94.4256              93.2204             92.9318    108.0254
aRMSE                 92.95734             89.9104             91.04162   107.6803
IMP RATIO             13.67%               16.50%              15.45%     -

With the copulas constructed from the training set of the BCpAug89 trace, one-step
prediction is conducted on its testing set. Fig. 5.2 shows at a glance the prediction
with the theoretical copula on the last 20% of the BCpAug89 trace. To obtain multiple
prediction results for the aRMSE measurement, we vary the training percentage
from 50% to 90%. The RMSE of the copula-based predictions is listed in Table 5.2 and
compared with the AR(1) and LPC(1) models. From the table, we can infer that both the
MMPP copula and the parametric copula characterize the temporal dependence of the
BCpAug89 trace well, leading to good prediction. Copula-based predictions, with either the
theoretical or the parametric copula model, improve over the LPC(1) model by more than
10% (13.67% and 16.50%, respectively), showing the advantage of functional dependence
modeling (such as copulas) over linear dependence measurement (such as autocorrelation).
Copula-based predictions achieve accuracy similar to the classical AR(1) model (15.45%),
showing that the copula captures the dependence of a real-world MMPP trace effectively,
which in turn helps the prediction. In addition, copula-based predictions have other
benefits compared to the AR(1) model, as shown in the following subsection.
The Stability of Copula-based Model
Nowadays, a network flow may pass through many middleboxes, which may transform
the traffic with some (potentially unknown) functions. In some scenarios, we may need
to consider another counting process closely associated with the incoming traffic, e.g.,
the number of CPU resources or the size of cache space that should be (dynamically)
allocated for processing the traffic. In these cases, the traffic is transformed by some
function, or the new counting process can be viewed as the traffic transformed by
a function. In the following, we study a new process $A'_i = \log(A_i)$ as an example. We
note that the same conclusion can be drawn with other increasing transformation functions.
We call $A'_i$ an associated trace.
By the invariance property of copulas, the temporal dependence between $A'_i$ and
$A'_{i+1}$ in terms of the copula remains the same as that between $A_i$ and $A_{i+1}$. The
measures $\rho_t$, $\rho_s$ and $\lambda_t$ of the trace $A'_i$ also have the same theoretical values,
since all of them are derived from the copula. However, since Pearson correlation
is not invariant under the above transformation, $\rho$ of $A'_i$ is
not theoretically tractable and thus needs to be calculated from empirical statistics.
Table 5.3 shows the measures of the trace $A'_i$. Comparing Table 5.1 and Table 5.3, we can
see that the copula-based dependence measures are all unchanged while the Pearson
correlation varies (0.6149 versus 0.5916 empirically), indicating that the copula is much
more stable than Pearson correlation.
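The invariance argument can be checked on a toy example. In the sketch below, the lag-1 pairs of a short made-up trace are compared before and after a logarithmic transformation: the rank-based Kendall's tau is unchanged, while the Pearson correlation is not. The trace values and function names are hypothetical, and Kendall's tau is computed in its simple tau-a form.

```python
import math


def _sign(z):
    return (z > 0) - (z < 0)


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(xs)
    s = sum(
        _sign(xs[i] - xs[j]) * _sign(ys[i] - ys[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return s / (n * (n - 1) / 2)


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up arrival counts (illustrative only, not from the Bellcore traces).
trace = [120, 340, 310, 95, 60, 410, 520, 480, 150, 130, 700, 650, 90, 200, 260]
log_trace = [math.log2(a) for a in trace]

tau_raw = kendall_tau(trace[:-1], trace[1:])
tau_log = kendall_tau(log_trace[:-1], log_trace[1:])
rho_raw = pearson(trace[:-1], trace[1:])
rho_log = pearson(log_trace[:-1], log_trace[1:])
```

Because a strictly increasing transformation preserves the ordering of every pair of observations, the concordance counts behind Kendall's tau cannot change, whereas the Pearson coefficient depends on the actual magnitudes.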
Table 5.3: Dependence Measures of the Associated Trace from Theoretical Analysis
and Empirical Analysis

              Kendall's tau   Spearman's rho   Upper tail (u=0.99)   Lower tail (u=0.01)   Pearson
Theoretical   0.4788          0.6150           0.4067                0.3359                -
Empirical     0.4212          0.5897           0.3935                0.3248                0.5916
[Figure 5.3: Prediction with theoretical copula on the testing set (last 20%) of the associated trace. Horizontal axis: No. of time slots; vertical axis: log number of arrivals. Curves: the testing set of the associated trace; prediction on the testing set.]
Taking advantage of the invariance property, we do not need to rebuild the dependence
model to predict $A'_i$ with copulas: the same copula model for $A_i$ can be applied, and
the marginal function of $A'_i$ can be obtained from that of $A_i$ by $M_{A'}(x) = M_A(2^x)$
(the logarithm being taken to base 2). Therefore, all copula models for $A_i$ in Section 5.3.2
can be applied directly to predict $A'_{i+1}$ given the value of $A'_i$. Fig. 5.3 shows the
prediction on the last 20% of the associated trace using the same copula model as for
$A_i$. In contrast, without rebuilding the dependence model, the AR(1) and LPC(1)
models for $A_i$ applied to $A'_i$ lead to poor prediction performance. It is worth
noting that rebuilding a new model for $A'_i$ may be non-trivial due to the potentially
unknown transformation function and the need to collect and record historical
data of $A'_i$.
Table 5.4: One-Step Prediction RMSE on the Associated Trace with Different Training
Percentages.

Training Percentage   Theoretical Copula   Parametric Copula   AR(1)    LPC(1)
50%                   0.4788               0.4344              0.5918   0.6953
60%                   0.4653               0.4286              0.5868   0.6982
70%                   0.4710               0.4393              0.5863   0.7137
80%                   0.3955               0.3659              0.5471   0.6765
90%                   0.3780               0.3655              0.5324   0.7390
aRMSE                 0.4377               0.4068              0.5689   0.7045
IMP RATIO             37.87%               42.26%              19.26%   -
To test prediction performance without rebuilding a model, we apply the models trained
on $A_i$ (i.e., copula, AR(1), and LPC(1)) to predict the associated trace $A'_i$. The
one-step prediction RMSEs of the four methods on $A'_i$ are listed in Table 5.4.
Both the theoretical and the parametric copulas outperform AR(1) and LPC(1) significantly
(IMP RATIO of 37.87% and 42.26%, versus 19.26% for AR(1)).
The results indicate that copula-based prediction is much more stable in the presence
of traffic transformation, and that neither AR(1) nor LPC(1) can capture the dependence
in the associated trace accurately without re-modeling. Copulas take advantage of the
invariance property to avoid re-modeling whenever an increasing functional transformation
is imposed on the original traffic, leading to much better performance than the other models.
5.3.3 Case Study on HoMMPP Trace with Simulation
In the real world, the availability of HoMMPP traffic traces is limited, because it is not
easy to identify them with proper fitting and goodness-of-fit testing methods. We therefore
generate HoMMPP traces by simulation. We consider a scenario in which 3 independent
sources send traffic flows, with features similar to the BCpAug89 trace, to one
destination. The flow to the destination can be simulated as the aggregation of
3 independent MMPP traces generated with the parameters $({}^{A}Q, {}^{A}\Lambda)$ shown in Eq. (5.9).
The simulation lasts for 7200 seconds. We analyse the generated HoMMPP trace in terms of
arrival counts per second, i.e., $\Delta = 1$. On the HoMMPP trace, we perform both one-step
and two-step prediction and compare the copula models with the others.
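The aggregation of independent MMPP sources can be sketched with a small event-driven simulation. This is an illustrative sketch, not the simulator used for the experiments: it uses a hypothetical 2-state MMPP instead of the 12-state parameters of Eq. (5.9), starts each source in a uniformly random state rather than the stationary distribution, and all names (`simulate_mmpp_counts`, `Q_toy`, `rates_toy`) are assumptions.

```python
import random


def simulate_mmpp_counts(Q, rates, n_slots, rng, slot_len=1.0):
    """Simulate per-slot arrival counts of an MMPP.

    Q is the generator matrix of the modulating Markov chain and rates[s]
    is the Poisson arrival rate while the chain is in state s.
    """
    n_states = len(rates)
    horizon = n_slots * slot_len
    counts = [0] * n_slots
    state = rng.randrange(n_states)  # simplification: not the stationary law
    t = 0.0
    while t < horizon:
        exit_rate = -Q[state][state]
        sojourn = rng.expovariate(exit_rate) if exit_rate > 0 else horizon - t
        end = min(t + sojourn, horizon)
        # Poisson arrivals at rate rates[state] during [t, end)
        if rates[state] > 0:
            a = t + rng.expovariate(rates[state])
            while a < end:
                counts[min(int(a / slot_len), n_slots - 1)] += 1
                a += rng.expovariate(rates[state])
        t = end
        if t < horizon:  # jump to the next state
            weights = [Q[state][j] if j != state else 0.0 for j in range(n_states)]
            state = rng.choices(range(n_states), weights=weights)[0]
    return counts


# Hypothetical 2-state MMPP (illustrative values only).
Q_toy = [[-0.2, 0.2], [0.5, -0.5]]
rates_toy = [5.0, 50.0]

n_slots = 60
sources = [
    simulate_mmpp_counts(Q_toy, rates_toy, n_slots, random.Random(seed))
    for seed in (1, 2, 3)
]
# Aggregate trace of 3 independent, identically parameterized sources.
aggregate = [sum(slot) for slot in zip(*sources)]
```

Summing the per-slot counts of independent sources with identical parameters gives a homogeneous aggregate in the sense used above; using different `(Q, rates)` pairs per source would give a heterogeneous one.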
One-step Prediction on the HoMMPP Trace
When constructing the HoMMPP copula matrix, the threshold is set to $\hat{a} = 2000$ according
to the observed samples. Given the parameters $({}^{A}Q, {}^{A}\Lambda)$ in Eq. (5.9), the theoretical
copula of the HoMMPP is calculated based on Theorem 16. The contour of the theoretical
copula is shown in Fig. 5.4a. Note that even though we only compute the
theoretical copula for $A^l_i$ up to 2000, it is almost the complete copula, because the
threshold is large enough to make $C^l_{i'}(M^l(\hat{a}), M^l(\hat{a})) = 0.99998 \approx 1$ ($i' = 1$, $l = 3$ in this
case), meaning that the probability for arrival counts to exceed the threshold is
extremely small. With the theoretical copula $C^l_{i'}$ constructed, dependence measures
are calculated accordingly and compared with the empirical results from the trace data.
Table 5.5 shows the results, which indicate the accuracy of the copula in modeling the trace.
The theoretical Pearson correlation $\rho$ is missing since its value for an aggregate MMPP
is extremely hard to calculate when the underlying MMPP has a large number of
states.
Table 5.5: Dependence Measures of the HoMMPP Trace from Theoretical Analysis
and Empirical Analysis

              Kendall's tau   Spearman's rho   Upper tail (u=0.99)   Lower tail (u=0.01)   Pearson
Theoretical   0.5681          0.7329           0.3367                0.2484                -
Empirical     0.5500          0.7370           0.2857                0.2875                0.7566
[Figure 5.4: One-step copula contours for HoMMPP. (a) Theoretical HoMMPP copula. (b) Parametric copula. Axes: U and V; contour levels from 0.1 to 0.9.]

Since the trace has similar upper and lower tail dependencies as shown in Table 5.5,
we choose the Frank copula as the parametric copula. This indicates that the HoMMPP
inherits the tail dependence features of the single MMPP. The parameter of the Frank
copula is fitted on each training set. Fig. 5.4b shows the contour of the
parametric copula trained from the first 80% of the HoMMPP trace. The contours in
Fig. 5.4a and Fig. 5.4b are very similar. Their discrete $L_2$ norm distance [27] is 0.0100,
which is small enough to justify the similarity of the two copulas according to the results
in [27].
[Figure 5.5: Prediction with theoretical HoMMPP copula on the testing set (last 20%). Curves: the testing set of the HoMMPP trace; prediction on the testing set.]
Table 5.6: One-Step Prediction RMSE on the HoMMPP Trace with Different Training
Percentages.

Training Percentage   Theoretical Copula   Parametric Copula   AR(1)      LPC(1)
50%                   133.2363             135.5447            138.2753   197.2492
60%                   133.3298             137.0763            138.4576   199.7005
70%                   131.5002             135.3443            136.6052   199.2604
80%                   131.0944             132.6420            136.3443   198.3723
90%                   127.9317             128.9178            133.2959   198.4436
aRMSE                 131.4185             133.9050            136.5957   198.6052
IMP RATIO             33.82%               32.58%              31.22%     -

With both the theoretical copula and the parametric copula, we perform one-step
prediction on the HoMMPP trace. Fig. 5.5 shows the prediction with the theoretical HoMMPP
copula on the testing set consisting of the last 20% of the arrival counts. We vary the
training percentage from 50% to 90%, and the prediction results are shown in Table 5.6.
From the table, the copula-based predictions have the highest IMP RATIO over the benchmark
(33.82% and 32.58%, versus 31.22% for AR(1)) on the aggregate MMPP traffic, indicating that
copulas capture the temporal dependence of the HoMMPP best.
Two-step Prediction on the HoMMPP trace
We also experiment with two-step prediction on the HoMMPP trace. That is, for any
observation $A^l_i$ in the testing set, $A^l_{i+2}$ is predicted. To make the two-step prediction,
the two-step theoretical HoMMPP copula is constructed, as shown in Fig. 5.6a. Based
on the two-step copula, the dependence measures are computed and compared with the
empirical results in Table 5.7. Compared to the one-step dependencies in Table 5.5, the
two-step dependencies between $A^l_i$ and $A^l_{i+2}$ are smaller, because the dependence
decreases as the step increases.
Table 5.7: Two-step Dependence Measures of the HoMMPP Trace from Theoretical
Analysis and Empirical Analysis

              Kendall's tau   Spearman's rho   Upper tail (u=0.99)   Lower tail (u=0.01)   Pearson
Theoretical   0.2979          0.4189           0.1338                0.1077                -
Empirical     0.3372          0.4836           0.1690                0.1286                0.5104
[Figure 5.6: Two-step copula contours for HoMMPP. (a) Two-step theoretical HoMMPP copula. (b) Two-step parametric copula for HoMMPP trace. Axes: U and V; contour levels from 0.1 to 0.9.]

Based on a training set of the HoMMPP trace, the parametric copula between
$A^l_i$ and $A^l_{i+2}$ is trained accordingly. Fig. 5.6b shows the contour of the two-step
parametric copula trained from a training set consisting of the first 80% of the
HoMMPP trace. For two-step dependence, the theoretical copula and the parametric
copula are also close to each other (their discrete $L_2$ norm distance [27] is 0.0045).

[Figure 5.7: Two-step prediction with theoretical copula on the testing set (last 20%) of the HoMMPP trace. Curves: the testing set of the HoMMPP trace; prediction on the testing set.]

With different training percentages, two-step predictions are performed on the
HoMMPP trace. The prediction results of applying the theoretical copula to the
last 20% of the HoMMPP trace are shown in Fig. 5.7. Prediction errors in terms of
RMSE with different training percentages are shown in Table 5.8. Our copula models
achieve a significant improvement ratio (IMP RATIO) over the benchmark model for
the two-step predictions. Compared with AR(1), the copulas also perform much better
(about 30% versus 20.56% improvement over the benchmark), indicating that copulas can
better characterize the multi-step temporal dependence of arrival counts in MMPP.
Table 5.8: Two-Step Prediction RMSE on the HoMMPP Trace with Different Training
Percentages.

Training Percentage   Theoretical Copula   Parametric Copula   AR(1)      LPC(1)
50%                   173.3941             174.2526            196.5692   242.6005
60%                   174.9574             176.4601            198.0623   247.9247
70%                   173.3858             174.7150            197.3366   250.4306
80%                   173.2769             174.3444            196.9888   250.7354
90%                   169.6243             170.0910            194.5932   246.3572
aRMSE                 172.9277             173.9726            196.7100   247.6097
IMP RATIO             30.16%               29.74%              20.56%     -
5.3.4 Case Study on HeMMPP trace
Few real-world HeMMPP traces are suitable for a case study. Besides the BCpAug89 trace,
we add another Bellcore trace, the BCpOct89 trace, to the study.
The BCpOct89 trace records LAN traffic for about 1759.62 seconds. Analysing the traffic
arrivals per second, the BCpOct89 trace is fitted to a 13-state MMPP with
parameters $({}^{O}Q, {}^{O}\Lambda)$ listed in Eq. (5.10). Since the BCpAug89 and BCpOct89 traces
are modelled by heterogeneous MMPPs, their aggregation, after truncating them to
the same length, is suitable as a HeMMPP trace for prediction. Based on the observations
of the HeMMPP trace, the threshold for the marginal and copula matrix computation
is chosen as $\hat{a} = 1500$. The probability that the arrival count $A^l_i$ exceeds the threshold
is less than 0.01, resulting in very few observations beyond the threshold.
$${}^{O}Q = \begin{pmatrix}
-1.00 & 0.75 & 0.00 & 0.25 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.04 & -0.64 & 0.26 & 0.25 & 0.06 & 0.02 & 0.00 & 0.02 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.00 & 0.13 & -0.72 & 0.34 & 0.16 & 0.03 & 0.03 & 0.02 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.01 & 0.06 & 0.12 & -0.68 & 0.31 & 0.13 & 0.04 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.00 & 0.00 & 0.10 & 0.25 & -0.74 & 0.20 & 0.11 & 0.06 & 0.01 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.00 & 0.00 & 0.04 & 0.09 & 0.23 & -0.71 & 0.20 & 0.10 & 0.03 & 0.02 & 0.00 & 0.00 & 0.00 \\
0.00 & 0.00 & 0.00 & 0.03 & 0.06 & 0.31 & -0.68 & 0.16 & 0.08 & 0.02 & 0.01 & 0.00 & 0.00 \\
0.00 & 0.00 & 0.01 & 0.02 & 0.04 & 0.19 & 0.34 & -0.81 & 0.16 & 0.05 & 0.01 & 0.01 & 0.01 \\
0.00 & 0.00 & 0.00 & 0.01 & 0.04 & 0.09 & 0.23 & 0.29 & -0.83 & 0.14 & 0.04 & 0.00 & 0.00 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.03 & 0.02 & 0.07 & 0.22 & 0.28 & -0.80 & 0.13 & 0.05 & 0.00 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.13 & 0.21 & 0.33 & -0.71 & 0.04 & 0.00 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.17 & 0.50 & 0.17 & -0.83 & 0.00 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 1.00 & 0.00 & 0.00 & 0.00 & -1.00
\end{pmatrix},$$

$${}^{O}\Lambda = (1125.89, 995.67, 873.46, 759.24, 653.02, 554.81, 464.59, 382.37, 308.15, 241.94, 183.72, 133.50, 91.28). \qquad (5.10)$$
[Figure 5.8: Copula contours for HeMMPP. (a) One-step HeMMPP copula. (b) Two-step HeMMPP copula. Axes: U and V; contour levels from 0.1 to 0.9.]
The one-step and two-step HeMMPP copulas are constructed as shown in Fig. 5.8.
Based on these copulas, we conduct one-step and two-step predictions on the HeMMPP
trace. The training percentage is varied from 50% to 90%. The prediction accuracy
is shown in Tables 5.9 and 5.10. The comparison shows that copula-based prediction
greatly improves accuracy over the LPC(1) method. It also outperforms AR(1)
in both dependence modeling and trace prediction.
Table 5.9: One-Step Prediction RMSE on the HeMMPP Trace with Different Training
Percentages.

Training Percentage   Theoretical Copula   Parametric Copula   AR(1)      LPC(1)
50%                   175.7469             172.1645            177.5212   206.1258
60%                   179.5210             180.3826            181.8779   206.0690
70%                   184.5205             185.6766            187.2252   209.7664
80%                   185.9480             175.9524            188.0131   213.9332
90%                   201.4238             186.3535            203.8665   226.9021
aRMSE                 185.4320             180.1059            187.7008   212.5593
IMP RATIO             12.76%               15.27%              11.69%     -
Table 5.10: Two-Step Prediction RMSE on the HeMMPP Trace with Different Training
Percentages.

Training Percentage   Theoretical Copula   Parametric Copula   AR(1)      LPC(1)
50%                   187.3233             191.8553            204.7920   215.7492
60%                   187.0732             197.0069            205.4122   208.3386
70%                   188.6817             197.3415            207.6971   217.6215
80%                   187.6349             189.2811            207.2992   224.9221
90%                   200.2666             196.8840            220.5304   250.7282
aRMSE                 190.1959             194.4738            209.1463   223.4719
IMP RATIO             14.89%               12.98%              6.41%      -
5.4 Summary
Both real-world and simulated traffic traces are used to evaluate the copula
model and its application to traffic prediction. In Section 5.3.2, the BCpAug89 trace
is used to evaluate the accuracy of the MMPP copula in modeling a real-world
trace. Prediction on a transformed version of BCpAug89, also in Section 5.3.2, shows that
the copula characterizes dependence much better and predicts more accurately than
existing models in the presence of traffic transformation. The experiments in
Sections 5.3.3 and 5.3.4 show the copula's good performance on HoMMPP and HeMMPP
traces and in multi-step dependence modeling.
From all the experiments, the copula-based model has advantages over the other
dependence models (AR(1) and LPC(1)) in three aspects. First, it provides the theoretical
dependence structure of MMPP, including the one-step or multi-step dependence of a single
MMPP and of a superposition of MMPPs. Second, it provides more information on
dependence beyond the linear scope. Third, it is more stable than the other models: in the
presence of traffic transformation, the copula-based model does not require rebuilding the
dependence model yet still preserves accuracy.
Chapter 6
Application of MMPP Copulas in
Composite Cloud Service
Provisioning
In this chapter, we use the MMPP copulas of Chapter 4 and the copula-based prediction
of Chapter 5 to predict arrivals of cloud calls and design a dynamic service provisioning
policy accordingly.
6.1 Introduction
Service composition has been broadly used to aggregate a set of services that work
collaboratively to carry out a particular business task [45]. In the area of cloud
computing, service composition has fostered a large service brokerage market, where
cloud brokerages can deploy a new business service by integrating basic services from
a pool of cloud service providers [45]. These basic services are normally offered as
applications running in virtual machines in the cloud, and as such they are called
virtualized functions (VFs) in this context.
A conceptual model of service composition in cloud computing is shown in Fig. 6.1.
The conceptual model illustrates a large category of composite services. For example,
the composite service could be a service that helps a customer with travel planning,
where the VFs consist of flight searching, hotel booking, tour recommendation, payment
service, and so on. To distinguish service requests from end users to the cloud brokerage
from the requests from the cloud brokerage to VFs, we call the former tasks and the
latter calls, as shown in Fig. 6.1. As another example, Microsoft recently released
Azure Service Fabric [56], with which the functional parts making up a service are
split into small units that can be individually deployed, updated, distributed, and
scaled. While the smaller units run in containers rather than directly on VMs,
Azure Service Fabric adopts a similar composite service model.
[Figure 6.1: The conceptual diagram of service composition. End users send tasks to the composite service at the cloud brokerage, which in turn sends calls to the VFs of the basic service providers.]
It is critical that a composite service guarantees quality of service (QoS) to end
users. A QoS guarantee requires appropriate resource provisioning, and cloud computing
provides an opportunity to dynamically scale up service capacity to alleviate
the negative impact of bursts of service requests, and to scale down service capacity for
cost saving. Nevertheless, guaranteeing QoS for composite services poses great challenges
to the cloud brokerage for the following reasons. First, existing auto-scaling
techniques in commercial clouds, such as Amazon EC2, normally adjust the capacity
of virtual machines (VMs) based on their utilization [1]. This auto-scaling strategy
makes decisions based only on local VM utilization. Without a global view of the
workflow of a composite service, existing auto-scaling techniques [1, 37, 74] may shift
the bottleneck from one VF to another, leading to poor overall QoS in the presence
of high task demand. Second, when the volume of task demand changes, without
accurate modeling or prediction of the corresponding changes in the volume of calls
to individual VFs, the auto-scaling of individual VFs may be triggered sequentially.
This increases the delay of auto-scaling for the composite service.
A natural idea to tackle the above challenges is to orchestrate the auto-scaling of
VFs based on a global view of composite service at the cloud brokerage. We call this
idea collaborative auto-scaling, in the sense that VFs scale up/down their capacity
cooperatively to maintain QoS of the composite service.
There are several key challenges in designing effective collaborative auto-scaling.
First, how can we capture the dependence in the amount of calls to different VFs?
While a good dependence model lays the foundation for collaborative auto-scaling,
dependence modeling is difficult since a task to the composite service may trigger
different amount of calls at different VFs. In other words, when the arrival rate of
end users’ tasks becomes high, the triggered amount of calls at different VFs may scale
up differently. Second, with the dependence structure captured, how can we utilize
the dependence to predict future service calls to different VFs? Third, how can
we properly adjust the capacity of VFs to guarantee the QoS of a composite service,
taking into consideration the delay in scaling up/down the capacity of VFs and tiered
capacity levels in the cloud environment? This chapter makes use of MMPP copulas
to address the above questions and proposes a collaborative auto-scaling policy.
6.2 Related Work
Most Cloud providers, e.g., Amazon, Windows Azure, and Google, offer rule based
auto-scaling features to deal with time-variant application workloads. Related work in
designing auto-scaling mechanisms includes workload forecasting, performance modeling, and cost-optimal resource provisioning. In [74], a second-order autoregressive
moving average (ARMA) method is used to predict workload, and an analysis algorithm for response time is used to find the bottleneck server (the one with the highest
utilization) under the predicted workload. In [33], a queuing network is used to model
the relationship among response time, workload, and allocated resource. A Kalman filter is used to derive the parameters used in the queuing network model. Cost-optimal
resource allocation in auto-scaling can be found in [81, 37, 55]. Systematic design on
monitoring technique and scaling event handlers can be found in [84].
6.3 System Model
We first introduce our mathematical model to study the performance of composite
services. Following the conceptual diagram in Fig. 6.1, we have the following assumptions:
1. We assume that the total number of VFs involved in the composite service is m.
To fulfill a business task from end users, the composite service needs to make
a series of calls to VFs. The (call) arrivals to the ordered sequence of VFs in the
composite service are also referred to as the workflow of the task. As an example
shown in Fig. 6.2, the workflow of Task A is arrivals to VF1, arrivals to VF2,
and arrivals to VF3.
2. We assume that the workflow may pass through m VFs in an arbitrary order
and may pass a VF multiple times. The workflow may also skip a VF.
3. We assume that a task does not trigger parallel calls to VFs. Note that this assumption is needed to avoid the intricacy in modeling the degree of parallelism.
This assumption, however, has no impact on the analytical results, because we
can always decompose a task into sub-tasks such that each sub-task only makes
sequential calls to VFs. In this case, the task is the aggregate of sub-tasks, each
only making sequential calls to VFs.
4. We assume that each time the task's workflow passes VF j, the calls to VF
j follow a Poisson arrival process of mean rate $\lambda_j$. Note that this assumption is
made not only because it eases analysis but also because Poisson arrivals have
been used broadly as a good approximation for a variety of random arrivals [25].
Figure 6.2: A queueing model for composite service (the workflow of Task A passes through
VF1, VF2, and VF3, which are viewed together as a virtual queue with MMPP input)
Performance modeling of composite services is difficult, because the workflows of
different tasks may be different and a workflow may pass the same VF for multiple
times. While network of queues is a natural choice for modeling the spatially separated
queues, the non-deterministic order of queues and the multiple occurrences of the same
queue in the queue chain make the analysis challenging.
Intuitively, there are some similarities between a composite service and the widely used Markov Modulated Poisson Process (MMPP). MMPP assumes that a system
could be at different states and the arrivals to the system at different states may have
different arrival rates. If we treat a VF as a state, then MMPP is a good model for a
composite service. For a given task, only one VF, say VF j , works for the task at any
given time instance, implying that the MMPP is at the state j if each VF corresponds
to one state of the MMPP. With this intuition, we overcome the difficulty of modeling
a composite service by approximating the workflow of a task as a Markov Modulated
Poisson Process (MMPP).
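The MMPP intuition above can be illustrated with a small simulation. The following is a minimal sketch, not the implementation used in this thesis: the function name `simulate_mmpp_counts` and its arguments are hypothetical, each state of the modulating Markov chain stands for one VF, and the output is the number of call arrivals in each time slot.

```python
import random


def simulate_mmpp_counts(Q, rates, slot_len, n_slots, seed=0):
    """Simulate an MMPP and return the arrival count of every time slot.

    Q        -- generator matrix of the modulating CTMC (each row sums to 0)
    rates    -- Poisson arrival rate of each state (one state per VF)
    slot_len -- length of a time slot, in the same time unit as Q and rates
    """
    rng = random.Random(seed)
    horizon = slot_len * n_slots
    counts = [0] * n_slots
    state, t = 0, 0.0
    while t < horizon:
        exit_rate = -Q[state][state]
        # The sojourn time in a state is exponential with rate -Q[state][state];
        # a state with zero exit rate is held until the end of the simulation.
        sojourn = rng.expovariate(exit_rate) if exit_rate > 0 else horizon - t
        end = min(t + sojourn, horizon)
        # While the chain stays in this state, calls arrive as a Poisson process.
        if rates[state] > 0:
            arrival = t + rng.expovariate(rates[state])
            while arrival < end:
                slot = min(int(arrival // slot_len), n_slots - 1)
                counts[slot] += 1
                arrival += rng.expovariate(rates[state])
        t = end
        if t >= horizon:
            break
        # Jump to state k with probability Q[state][k] / exit_rate.
        weights = [Q[state][k] if k != state else 0.0 for k in range(len(rates))]
        state = rng.choices(range(len(rates)), weights=weights)[0]
    return counts
```

For example, `simulate_mmpp_counts([[-0.01, 0.01], [0.02, -0.02]], [1.0, 5.0], 10.0, 50)` produces 50 slot counts for a two-state workflow whose second state (VF) receives calls five times faster than the first.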
Remark 8. Instead of focusing on the workflow of individual tasks, we study the
long-term behavior of all tasks. Considering the aggregate workflows as a whole, it is
not easy to model the arrival process as an m-state MMPP any more. Thus we will
use copulas to model the aggregate workflows.
To summarize, the composite service is modeled as a single virtual queuing system with MMPP inputs, where each state of the MMPP corresponds to a VF. In the
sequel, we will answer the following critical questions in order to use the model for
collaborative auto-scaling of VFs: (1) what is the dependence structure of calls in
the composite service (Section 6.4)? (2) how can we estimate the total resources of
the virtual queue and accordingly decompose the total resource to that of individual
VFs (Section 6.5)?
6.4 A Copula Model for Latent Dependence Structure in Service Composition
In the system model introduced in Section 6.3, the workflow of one task through
the composite service can be approximated by a single MMPP. In real-world scenarios,
the workflows to a composite service system come from multiple tasks. Considering a
system serving multiple tasks, the dependence structure among VFs in the composite
service system is actually represented by the temporal dependence in call arrivals
from multiple tasks. As the system model is the same for all tasks, each task
follows the same MMPP, and the aggregate task workflow is a HoMMPP as defined
in Chapter 4. HoMMPP is analysed in terms of call arrival counts $A^l_i$ within small time
intervals $\Delta$. In the application discussed in this chapter, $A^l_i$ represents the number
of call arrivals from $l$ tasks to the system in the $i$-th time slot. With
either the theoretical copula or a parametric copula, the temporal dependence between $A^l_i$
and $A^l_{i+i'}$ can be modeled. As the theorems and numerous experiments in
Chapter 5 show, the value of $A^l_{i+i'}$ can be effectively predicted from an observation of $A^l_i$. The
predicted value gives the future call volume, and can be used to auto-scale the
service that the composite cloud system provides.
6.5 Collaborative Auto-Scaling of Virtualized Functions
6.5.1 Overview
Since the amount of calls in each time slot is considered as the aggregate of calls to
VFs, this copula model between call arrival counts implicitly captures the dependence
structure of VFs as well. With the help of the copula model, we introduce a strategy
for collaboratively auto-scaling the capacity of VFs. The strategy also guarantees
the utilization of each VF with individual auto-scaling embedded. Our method includes three main steps: (1) establishing the scaling matrix from the copula-based
auto-scaling; (2) establishing the scaling matrix from the utilization-based individual
auto-scaling; (3) collaboratively auto-scaling with the integrated scaling matrix. The
scaling matrix is defined as the amount of capacity to scale up/down. Specifically, a
positive value indicates scale-up and a negative value indicates scale-down.
To unify the measuring unit, both the workload and the capacity are measured
in terms of rate. That is, the workload to the composite service and the capacity of
a VF (or the virtual queue) are measured as the average rate (i.e., number of calls
arrived/served per second). Before introducing the auto-scaling policy, we describe
the following system parameters:
• Observation time interval $\Delta$: As defined in Section 6.4, we divide time
into time slots, and $\Delta$ is the length of a time slot. Each time slot is also the observation
time interval, at the end of which the scaling matrices are generated and the
collaborative scaling decision is made.
• Scaling delay: A VF needs time to scale up/down. To align with our previous
copula analysis, the scaling delay is measured in terms of time slots, i.e., the
scaling delay is set to $d\Delta$.
• Capacity unit $\beta$: The capacity of a VF is tiered at multiples of $\beta$. This is
because in practice people do not adjust the capacity of a VF by an infinitesimal
amount.
• Current capacity: $\nu$ is the current total capacity for all VFs, and $\nu_j$ is the
capacity for VF j.
6.5.2 Copula-based Scaling Matrix
The copula-based scaling matrix considers the predicted workload. As we discussed in
the above sections, the arrivals to the composite service system are considered as a
HoMMPP. With copula modeling, the temporal dependence structure of the aggregate
flows can be revealed and exploited to make inference on the future workload. Due
to the scaling delay, we should construct the copula between $A^l_i$ and $A^l_{i+d+1}$. Based
on the sample value $x_i$ of $A^l_i$, $A^l_{i+d+1}$ is predicted by the copula according to Theorem 17 or
Theorem 20. The prediction of $A^l_{i+d+1}$ is denoted as $\hat{x}$. The capacity needs to be
adjusted to $\hat{x}/\Delta$ to satisfy the expected workload. Below we outline the auto-scaling
procedure.
1. Scaling trigger: if $\hat{x}/\Delta - \nu > m\beta$, the virtual capacity needs to scale up; if
$\nu - \hat{x}/\Delta > m\beta$, the virtual capacity needs to scale down.
2. Scaling dispatcher: if copula-based scaling is triggered, the predicted call arrival
count is decomposed to VF j based on the stationary distribution of the CTMC
associated with the modeling MMPP as $\hat{x}(j) = \pi_j \hat{x}$. The capacity of VF j should
be adjusted to the level $\mu_j$ such that $(\mu_j - 1)\beta < \hat{x}(j)/\Delta \le \mu_j \beta$.
Overall, the copula-based scaling matrix for VF j is defined as
\[
S_c =
\begin{cases}
\mu_j \beta - \nu_j & \text{if } |\hat{x}/\Delta - \nu| > m\beta, \\
0 & \text{otherwise.}
\end{cases}
\tag{6.1}
\]
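The trigger and dispatcher steps behind Eq. (6.1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and argument names are hypothetical, the predicted count is assumed to be given (the copula-based prediction itself is not shown), and the stationary distribution is passed in as a plain list.

```python
import math


def copula_scaling(x_hat, slot_len, total_capacity, capacities, pi, beta):
    """Return the copula-based scaling amount of every VF, following Eq. (6.1).

    x_hat          -- predicted call arrival count for the target time slot
    slot_len       -- length of a time slot in seconds
    total_capacity -- current total capacity of all VFs (calls per second)
    capacities     -- current capacity of each VF (calls per second)
    pi             -- stationary distribution of the modulating CTMC
    beta           -- capacity unit
    """
    m = len(capacities)
    predicted_rate = x_hat / slot_len
    # Scaling trigger: act only if the gap exceeds m capacity units.
    if abs(predicted_rate - total_capacity) <= m * beta:
        return [0.0] * m
    scaling = []
    for pi_j, cap_j in zip(pi, capacities):
        # Scaling dispatcher: VF j receives the share pi_j of the predicted calls.
        rate_j = pi_j * x_hat / slot_len
        # Smallest level mu_j with (mu_j - 1) * beta < rate_j <= mu_j * beta.
        mu_j = math.ceil(rate_j / beta)
        scaling.append(mu_j * beta - cap_j)
    return scaling
```

A positive entry means that the corresponding VF should scale up by that amount, and a negative entry means that it should scale down.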
6.5.3 Utilization-based Individual Scaling Matrix
Utilization-based scaling is a traditional method for individual scaling [55]. The
utilization-based individual scaling matrix is defined to guarantee that the utilization
of each individual VF is not too high. This utilization-based scaling matrix carries
information of whether the VF has a high amount of backlogs. This scaling matrix
is also able to offset the prediction errors from the copula.
Given the utilization $\rho_j$ of VF j in the observation interval $\Delta$, the utilization-based individual scaling matrix is defined as
\[
S_u =
\begin{cases}
\beta & \text{if } \rho_j \text{ is high, e.g., } \rho_j > 0.9, \\
0 & \text{otherwise.}
\end{cases}
\tag{6.2}
\]
6.5.4 Integrated Scaling Matrix
For collaborative auto-scaling, we consider both the workload information from $S_c$ and
the historical backlog information from $S_u$. Integrating the two matrices, the final
collaborative scaling matrix $S_g$ for a VF is calculated based on Table 6.1. The main
idea is to scale up capacity when either the copula-based scaling matrix or the utilization-based scaling matrix is positive, and to scale according to the copula-based scaling matrix
otherwise. This collaborative method can quickly modify the capacity following the future
workload trend without causing bottlenecks or over-provisioning in individual VFs.
Table 6.1: Calculation of Collaborative Scaling Matrix $S_g$
              $S_c > 0$       $S_c = 0$    $S_c < 0$
$S_u > 0$     $S_c + S_u$     $S_u$        $S_u$
$S_u = 0$     $S_c$           0            $S_c$
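The rules of Eq. (6.2) and Table 6.1 are simple enough to write down directly. The sketch below is illustrative only; the function names are hypothetical, and the threshold 0.9 is the example value given in Eq. (6.2).

```python
def utilization_scaling(utilization, beta, threshold=0.9):
    """Utilization-based scaling amount S_u of one VF, following Eq. (6.2)."""
    return beta if utilization > threshold else 0.0


def integrated_scaling(s_c, s_u):
    """Collaborative scaling amount S_g of one VF, following Table 6.1."""
    if s_u > 0:
        # Utilization is high: never scale down, and add the copula-based
        # scale-up on top when the prediction also asks for more capacity.
        return s_c + s_u if s_c > 0 else s_u
    # Utilization is not high: follow the copula-based decision.
    return s_c
```

For instance, a VF whose copula-based amount is negative but whose utilization exceeds the threshold still scales up by one capacity unit, which is how the utilization-based matrix offsets prediction errors.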
6.6 Performance Evaluation
To the best of our knowledge, there is no trace data for composite services currently
available to the public. For this reason, we first study a real-world trace
of cloud requests, showing that MMPP can indeed be used to model the workflow
of cloud requests. With the parameters learned from the real-world trace, we
then generate synthetic data with multiple workflows so that the performance of
auto-scaling can be evaluated.
6.6.1 MMPP modeling of Real-world Cloud Trace
We first evaluate the effectiveness of MMPP modeling on Google cluster data [85],
which is widely used for cloud computing performance analysis. The Google cluster data
records arrival information for about 11,000 machines over a period of 29 days in
May 2011. The recorded data fields related to our modeling are listed here:
• Time Stamp - arrival time in seconds of tasks since the start of data collection,
• TaskID - unique identifier of the executing task,
• JobID - unique identifier of the job to which the task belongs.
In our framework, calls are equivalent to the tasks in the Google cluster data, and tasks are
equivalent to the jobs in the Google cluster data. To align the Google cluster data modeling
with our previous analysis, the tasks in the Google cluster data are hereinafter called
calls, and the jobs in the Google cluster data are called tasks. Using this terminology, each
task contains a series of calls to the Google cluster.
Recall that we model the workflow of a single task as an MMPP, and the workflow
is divided into small time slots for analysis. To match the model and analysis,
we set the length of a time slot to $\Delta = 300$ seconds (5 minutes) and pre-process the
Google cluster trace as follows:
1. count the number of call arrivals in every $\Delta$ seconds, denoted as $A_i(\text{call})$,
2. count the number of task arrivals in every $\Delta$ seconds, denoted as $A_i(\text{task})$,
3. normalize the call arrivals by the number of tasks, i.e., $A_i = A_i(\text{call}) / A_i(\text{task})$.
The normalized call arrivals $A_i$ can be regarded as the workflow of a single task to
the Google cluster. We choose the normalized call arrivals $A_i$ in the first 24 hours
for modeling. The Google trace $A_i$ is fitted to an MMPP model with the algorithm
proposed in [39]. The learned MMPP is a 7-state MMPP with parameters $(Q, \Lambda)$ as
shown in Eq. (6.3). The time unit of those parameters is the second.
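The three pre-processing steps can be sketched as follows. This is a minimal illustration rather than the script used for the thesis: the record layout `(timestamp, call_id, task_id)` and the function name are hypothetical, a task is assumed to arrive in the slot in which its first call is observed, and slots without any task arrival are assumed to yield 0.

```python
def normalized_call_counts(records, slot_len=300):
    """Return the normalized call arrivals A_i = A_i(call) / A_i(task) per slot.

    records  -- iterable of (timestamp_seconds, call_id, task_id) tuples
    slot_len -- length of a time slot in seconds
    """
    records = sorted(records)
    if not records:
        return []
    n_slots = int(records[-1][0] // slot_len) + 1
    call_counts = [0] * n_slots
    task_counts = [0] * n_slots
    seen_tasks = set()
    for timestamp, _call_id, task_id in records:
        slot = int(timestamp // slot_len)
        call_counts[slot] += 1               # step 1: call arrivals per slot
        if task_id not in seen_tasks:        # step 2: task arrivals per slot
            seen_tasks.add(task_id)          # (assumption: first call marks arrival)
            task_counts[slot] += 1
    # Step 3: normalize; slots without task arrivals are set to 0 (assumption).
    return [c / t if t > 0 else 0.0 for c, t in zip(call_counts, task_counts)]
```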
The common method to evaluate the goodness of fit of an MMPP to a real trace is to
compare a simulated trace from the learned MMPP with the real trace statistically. Thus
we generated a simulated trace for a duration of 24 hours according to the learned
parameters $(Q, \Lambda)$ in Eq. (6.3). The simulated arrivals are grouped into intervals of 300
seconds, and then compared with the Google trace in two statistical aspects: low-order
moments and distribution features. The low-order moments, namely the
mean value, the standard deviation (std), and the skewness, are compared in Table 6.2. The
moments of the two traces are quite close, indicating that these two
traces have similar statistical properties. Their distribution features are compared
with a Quantile-Quantile (Q-Q) plot in Fig. 6.3. Both the moment results and the Q-Q plot
show that the Google trace and the simulated trace have similar statistical
behaviour, which is consistent with them coming from the same distribution.
\[
Q =
\begin{pmatrix}
-0.0033 & 0 & 0 & 0.0008 & 0 & 0.0025 & 0 \\
0 & -0.0034 & 0.0008 & 0.0013 & 0.0013 & 0 & 0 \\
0 & 0.0002 & -0.0023 & 0.0014 & 0.0005 & 0.0002 & 0 \\
0.0001 & 0.0001 & 0.0002 & -0.0024 & 0.0014 & 0.0006 & 0 \\
0.0001 & 0.0001 & 0.0001 & 0.0007 & -0.0022 & 0.0012 & 0 \\
0 & 0.0001 & 0.0001 & 0.0004 & 0.0010 & -0.0016 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.0033 & -0.0033
\end{pmatrix},
\]
\[
\Lambda = (0.7594, 0.5715, 0.4102, 0.2756, 0.1677, 0.0865, 0.0319).
\tag{6.3}
\]
Table 6.2: Comparison of Low-Order Moments of Arrival Counts in Every 300 Seconds
                   Mean       Std        Skewness
Google trace       56.5882    37.9708    2.2016
Simulated trace    57.6021    37.5139    1.8323
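The statistics compared in Table 6.2 can be computed from a sequence of slot counts as sketched below. The text does not state which estimators were used, so the population forms (dividing by n) are an assumption here, and the function name is hypothetical.

```python
import math


def moment_summary(counts):
    """Return (mean, standard deviation, skewness) of a list of arrival counts."""
    n = len(counts)
    mean = sum(counts) / n
    # Second and third central moments (population form, divided by n).
    m2 = sum((x - mean) ** 2 for x in counts) / n
    m3 = sum((x - mean) ** 3 for x in counts) / n
    std = math.sqrt(m2)
    # Skewness is the third central moment normalized by std cubed.
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return mean, std, skewness
```

Applying the same function to the real trace and to the simulated trace gives the two rows to compare.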
6.6.2 Performance Evaluation with Synthetic Data
Our investigation on the Google trace data discloses that the workflow of a task
submitted to cloud can be modeled with an MMPP model. Nevertheless, there is
no trace data for composite services currently available to the public, and it is thus
unclear how, and whether or not, the calls correspond to composite services. To
overcome this problem, we evaluate the proposed collaborative auto-scaling method
with synthetic data, created with simulation that aggregates multiple homogeneous
workflows, each modeled with an MMPP with parameters learned from the Google
trace as shown in Eq. (5.9). The synthetic aggregate workflows last for 48 hours and
are equally split into two parts over time. The first half is used to train the copula
parameter so as to model the temporal dependence of the aggregate homogeneous
MMPPs. The second half is used as the input to a simulated composite service
system with the implementation of our proposed collaborative auto-scaling policy.
The parameters for the simulated composite system are listed in Table 6.3.
Figure 6.3: Q-Q plot of arrival counts in every 300 seconds (axes: simulated trace and
Google trace, both ranging from 0 to 300)
Table 6.3: Parameters of Simulated Composite System
Simulation duration                     24 hours
Number of workflows $l$                 50
Observation time interval $\Delta$      300 seconds
Scaling delay ($d = 1$)                 300 seconds
Capacity unit $\beta$                   0.01 per second
In order to implement and evaluate our solution to resource provisioning in cloud
composite service system, we first model the temporal dependence of aggregate workflows (equivalently the dependence between VFs) with parametric copula as discussed
in Section 4.5. With the parametric copula disclosing the dependence structure in
composite services, the collaborative auto-scaling described in Section 6.5 is implemented. The proposed collaborative scaling will be compared with the traditional
utilization-based individual scaling.
Copula modeling for Aggregate MMPPs
Considering the scaling delay, we need to construct the copula between $A^l_i$ and $A^l_{i+d+1}$,
where $A^l_i$ represents the call arrival counts of $l$ aggregate MMPP workflows in the $i$-th
time slot.
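We use the mixture of the Gumbel and Clayton copulas to model the temporal structure
of HoMMPP:
\[
C(u, v; \theta_1, \theta_2) = 0.5 \exp\!\left[-\left((-\log u)^{\theta_1} + (-\log v)^{\theta_1}\right)^{1/\theta_1}\right]
+ 0.5 \left(u^{-\theta_2} + v^{-\theta_2} - 1\right)^{-1/\theta_2}.
\tag{6.4}
\]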
The Gumbel copula is powerful in capturing upper tail dependence; the Clayton copula, on
the contrary, is used to model lower tail dependence. The chosen parametric
copula is thus able to characterize the sudden increases and decreases in the MMPP
workflows, and to model the temporal dependence of MMPPs well. The first half of the
synthetic aggregate workflows is fitted to the chosen copula to obtain the copula
parameters $\theta_1 = 1.4994$ and $\theta_2 = 1.1654$.
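The mixture copula can be evaluated numerically as sketched below. The sketch assumes the standard forms of the Gumbel and Clayton copulas (in particular the Clayton term carries the exponent $-1/\theta_2$), equal mixture weights of 0.5, and inputs in (0, 1]; the function names are hypothetical, and the default parameters are the fitted values reported above.

```python
import math


def gumbel_copula(u, v, theta):
    """Gumbel copula C(u, v) for u, v in (0, 1] and theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))


def clayton_copula(u, v, theta):
    """Clayton copula C(u, v) for u, v in (0, 1] and theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def mixture_copula(u, v, theta1=1.4994, theta2=1.1654):
    """Equal-weight mixture of the Gumbel and Clayton copulas."""
    return 0.5 * gumbel_copula(u, v, theta1) + 0.5 * clayton_copula(u, v, theta2)
```

A quick sanity check is the uniform-margin property: `mixture_copula(1.0, v)` should return `v` up to rounding error, since both components are copulas.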
Performance of collaborative auto-scaling
With the parametric copula built for HoMMPP, we can make inference on the arrival
trend of the synthetic aggregate workflows. That is, given an observation $A^l_i = x_i$,
we predict the future call arrival count $A^l_{i+d+1}$. The inference on the second half
of the synthetic aggregate workflows is shown in Fig. 6.4. The y-axis in the figure
represents the number of aggregate arrivals within one time slot (5 minutes). From
the figure, the predicted call arrival counts are close to the real call arrival counts.
The accuracy of the prediction is also quantified by the mean absolute percentage error
(MAPE), defined as
\[
\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|\hat{x}_i - x_i|}{x_i},
\tag{6.5}
\]
where $\hat{x}_i$ is the prediction of the arrival count in the $i$-th time slot, $x_i$ is the real observed
arrival count, and $n$ is the number of time slots in the prediction period. The MAPE of the
copula-based inference is 0.0613, demonstrating the power of the dependence structure
modeled by copulas.
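Eq. (6.5) translates directly into code. The following sketch uses a hypothetical function name and assumes that all observed counts are positive, as is the case for the aggregate arrival counts considered here.

```python
def mape(predicted, observed):
    """Mean absolute percentage error between predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average of |x_hat_i - x_i| / x_i over all time slots.
    errors = [abs(p - x) / x for p, x in zip(predicted, observed)]
    return sum(errors) / len(errors)
```

With this definition, a value of 0.0613 means that the predictions deviate from the observed counts by about 6% on average.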
The collaborative auto-scaling is implemented following the policy in Section 6.5.
We also implemented the traditional individual scaling algorithm for comparison.
With the individual scaling strategy, the capacity of each VF scales up by $\beta$ when its utilization is above 0.7, and scales down by $\beta$ when its utilization is below 0.2 [55]. We use
the following performance metrics to compare the two auto-scaling strategies:
• Average response time of calls in seconds (ART): the total time that calls spend in the
system, each measured from the arrival of a call to its departure, divided by the number of calls;
• Average cost (AC): the total number of capacity units over the number of time slots
in the whole simulation duration.
Figure 6.4: Copula-based inference on call arrival counts (x-axis: No. of time slot;
y-axis: number of arrivals; series: synthetic arrival counts and predicted arrival counts)
Table 6.4: Simulation results with initial capacity as $\nu_j = 1$
                            VF1      VF2     VF3     VF4       VF5       VF6      VF7     Virtual Queue
Collaborative Scaling  ART  1067.1   546.8   776.6   126.1     51.5      50.3     320.6   252.5
                       AC   64.9     93.2    122.2   309.6     307.2     175.6    2.2     1075.3
Individual Scaling     ART  160.4    72.1    110.1   13760.5   13952.2   1010.7   133.8   7809.6
                       AC   108.9    140.7   167.9   243.5     243.5     215.0    19.3    1138.9
Table 6.5: Simulation results with initial capacity as $\nu_j = 2$
                            VF1      VF2     VF3     VF4       VF5       VF6      VF7     Virtual Queue
Collaborative Scaling  ART  1067.5   547.3   756.6   124.8     46.7      50.1     320.6   248.5
                       AC   64.9     93.4    123.5   309.5     307.6     175.9    2.5     1077.6
Individual Scaling     ART  51.3     15.8    17.4    1234.1    1322.3    1.0      44.7    751.3
                       AC   157.3    187.3   212.4   331.7     329.6     255.3    70.5    1544.5
The simulation results are shown in Table 6.4 and Table 6.5. Without using any
prior knowledge, we initialize the capacity of the seven VFs equally. For the experiment of
Table 6.4, we choose small initial values, i.e., $\nu_j = 1$ ($j = 1, \ldots, 7$). For the experiment
of Table 6.5, we choose large initial values, i.e., $\nu_j = 2$ ($j = 1, \ldots, 7$). Using
these two experiments, we investigate whether auto-scaling can adjust the capacity
following the actual workload quickly, and at the same time keep the response time
and cost as small as possible. Tables 6.4 and 6.5 record the performance metrics
of each VF, as well as those of the virtual queue. Since the virtual queue is an abstract
concept for the integration of all the VFs, its performance metrics are, in fact,
the performance metrics of the whole composite service system. At the level of the
virtual queue, we can observe that the collaborative auto-scaling performs better than
individual scaling in both average response time and average cost (e.g., an ART of 252.5
versus 7809.6 seconds and an AC of 1075.3 versus 1138.9 in Table 6.4). The
results indicate that the copula modeling of the dependence structure is effective. The
collaborative auto-scaling makes good use of the prediction information from copulas
to reduce the total cost while maintaining a small response time.
6.7 Summary
We have presented a new collaborative auto-scaling algorithm based on the temporal
dependence of call arrivals in cloud-based composite services. A key insight in our work
is to model a task to the composite service as an MMPP. This, in turn, allows us to use
copula analysis of MMPPs for understanding the dependence structure between calls
to a composite service as well as predicting future calls. Our technical contributions
include applying parametric copula models for incoming call prediction. Using real-world trace data and synthetic data, we have demonstrated that our collaborative
auto-scaling method performs much better than the traditional auto-scaling method
in which each VF auto-scales its capacity independently based on its local view of
VF utilization.
Chapter 7
Application of MMPP Copulas in
Parameter Estimation
In this chapter, we apply the theoretical copula in Chapter 4 to develop an accurate
and fast parameter estimation method for MMPP.
7.1 Introduction
MMPP can capture a large range of traffic types, ranging from multimedia traffic,
Poisson traffic, to burst traffic [31, 41, 79]. For all the applications of MMPP, the
parameter estimation method is necessary for modeling.
The parameter estimation problem of MMPP has been studied for decades. Existing estimation methods can be broadly split into two categories. One category of work
is maximum likelihood (ML) estimation with its implementation via the expectation-maximization (EM) algorithm; the other is moment-based estimation. Among the existing work, most methods estimate MMPP parameters using data of inter-arrival times (or arrival times),
and very few estimation methods can deal with data of the number of arrivals over evenly-slotted time, which we call arrival counts in this thesis. In addition, none of the
existing methods has utilized the functional dependence structure in MMPP traffic,
which has the potential to further enhance the performance of parameter estimation.
In practice, there are some scenarios where arrival counts data is much easier
to capture and process. For instance, in Chapter 3, the aggregate of Skype flows is
studied in terms of arrival counts in order to estimate its queueing performance. To
give another example, arrival counts data is commonly used in performance monitoring
tools such as Windows Performance Monitor [63] to save memory resources, especially
for long-term recording. Since arrival counts are a more readily available form of
data from most performance monitoring tools and, in addition, given the high cost
of capturing and storing inter-arrival times, we are motivated to build an efficient
estimation method that uses only arrival counts data.
Nevertheless, the convenience of using arrival counts comes with a cost, since the
arrival counts group arrivals within a time slot and thus contain less information than
data of exact inter-arrival times. The loss of information makes estimation with arrival
counts much more challenging than that with inter-arrival times. Up till now, only
a few papers have proposed estimation methods that can learn MMPP parameters
from arrival counts [6, 39, 13, 63]. Compared with the extensive studies of MMPP
estimation based on inter-arrival times, MMPP estimation based on arrival counts is
a relatively new topic and, for reasons described above, a much harder problem.
We tackle this challenging problem by utilizing the MMPP copula derived in
Chapter 4. Traditional ML estimation emphasizes the likelihood, which is a quantity
representing the joint behavior of the whole process, leading to high computational cost.
In this chapter, we consider the joint behavior of successive arrival counts, i.e., $A_i$
and $A_{i+1}$. Taking advantage of the copula, the joint behavior of successive arrival
counts can be modeled by studying the marginal distribution of arrival counts and the copula
between $A_i$ and $A_{i+1}$ separately. Thus an estimation algorithm, termed MarCpa, is
proposed in this chapter to estimate MMPP parameters from arrival counts. MarCpa
is fast and accurate, and it only includes two basic steps: one for marginal matching
and one for copula matching.
7.2 Related Work
The parameter estimation problem of MMPP has been studied for several decades.
According to different fitting objectives, the traditional estimation algorithms can
be mainly categorized into two groups: the maximum likelihood estimation (MLE)
algorithms and the moment-based algorithms. The former type was shown to achieve
consistent results [75]. The MLE-based algorithms were implemented via the expectation-maximization (EM) algorithm in [76]. Rydén's EM algorithm for estimating MMPP
in [76] was further enhanced to ease the calculation of integrals [72, 28], and to
estimate parameters from observations of either arrival times or arrival counts [13,
28].
The moment-based algorithms learn MMPP parameters by matching moments,
such as marginal moments and autocovariance. Compared with the MLE-based algorithms, the moment-based algorithms are usually fast and place more emphasis on emulating specific dependence structures of real traces. For instance, moment-based
algorithms were broadly used to emulate the self-similarity or long range dependence
(LRD) of network traffic [2, 3, 87, 47, 77, 80]. The superposition of 2-state MMPPs
was shown to be capable of modeling the self-similarity [2]. A high dimensional
MMPP is constructed with superposition of 2-state MMPPs, because the moment of
a 2-state MMPP is easy to compute. Following this idea, the superposition of 2-state
MMPPs has been used to model the self-similarity or LRD of network traffic traces
by matching their asymptotic covariances [3] or exact variances over different time
scales [47, 77, 80, 87]. The learned MMPP parameters were integrated into queueing
theory to predict the queueing performance.
In addition to the above two main categories, other fitting algorithms have also
been developed. Algorithms were developed to fit IP traces into discrete MMPP by
assuming that the Poisson arrivals of each state fall into certain range of variation [6,
39]. A Bayesian learning algorithm based on the posterior probability was developed
to model and detect the bursty events [41]. The most recent algorithm learns MMPP
parameters by first detecting the points of state switching and then estimating the
arrival rates at the corresponding state [14].
Among all the literature on MMPP parameter estimation, most works utilize inter-arrival times, and only a few (e.g., [6, 39, 13, 63]) utilize arrival counts, which are
related to the work in this chapter. Our work also uses arrival counts but
differs significantly from the related works, since none of the existing works uses copulas to
analyse MMPP. We develop a two-step estimation method under this new analytical
framework.
7.3 Copula-based Parameter Estimation of MMPP
With the copula analysis of arrival counts in MMPP in Section 4.4.1, we develop an
estimation method, called MarCpa, which consists of two matching steps: 1) matching the
theoretical marginal distribution of arrival counts with the empirical marginal distribution
from traces to learn the parameters $\boldsymbol{\pi} = (\pi_1, \cdots, \pi_m)$ and $\Lambda = (\lambda_1, \cdots, \lambda_m)$; 2) after
$\boldsymbol{\pi}$ and $\Lambda$ are determined, matching the theoretical copula with the empirical copula from traces
to determine the remaining parameter $Q$. With the two steps, the proposed estimation
method will fully model the joint behavior of successive arrival counts. Moreover, our
method with two separate matching steps will keep the computational cost low. In the
rest of this section, we will explain the proposed estimation method step by step.
7.3.1 Matching Marginal Distribution
The goal of this step is to match the empirical distribution of arrival counts of the
sample trace with the theoretical distribution. Given a sample trace $\{x_i\}_{1 \le i \le n}$ with $n$
arrival counts observed, the empirical distribution value is calculated as
\[
\hat{u}_i = \hat{M}(x_i) = \frac{1}{n} \sum_{i'=1}^{n} \mathbf{1}(x_{i'} \le x_i), \quad \forall\, 1 \le i \le n.
\tag{7.1}
\]
The goal is to minimize the difference between the theoretical marginal distribution and the empirical marginal distribution, i.e., to minimize the following objective
function $W_1$:
\[
W_1 = \sum_{i=1}^{n} (u_i - \hat{u}_i)^2,
\tag{7.2}
\]
where $u_i = M(x_i)$ is calculated with Theorem 10.
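Eqs. (7.1) and (7.2) can be sketched as follows. The theoretical marginal $M$ of Theorem 10 is not reproduced here, so the sketch takes it as a caller-supplied function `marginal_cdf`; this argument and both function names are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf_values(samples):
    """Return u_hat_i = (1/n) * #{i' : x_i' <= x_i} for every sample, as in Eq. (7.1)."""
    ordered = sorted(samples)
    n = len(samples)
    # bisect_right gives the number of sorted samples that are <= x.
    return [bisect_right(ordered, x) / n for x in samples]


def w1_objective(samples, marginal_cdf):
    """Sum of squared gaps between theoretical and empirical CDF values, as in Eq. (7.2)."""
    u_hat = empirical_cdf_values(samples)
    return sum((marginal_cdf(x) - uh) ** 2 for x, uh in zip(samples, u_hat))
```

In an estimation loop, `marginal_cdf` would be rebuilt from the current parameter values at every iteration and `w1_objective` would be the quantity to minimize.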
The parameters involved in marginal distribution matching are $\boldsymbol{\pi}$ and $\Lambda$. Considering that $\pi_1 + \pi_2 + \cdots + \pi_m = 1$, $\pi_m$ can always be determined by $(\pi_1, \cdots, \pi_{m-1})$.
Thus there are only $2m - 1$ parameters to estimate in this step. These parameters
are combined into one vector as $T_1 = (\pi_1, \cdots, \pi_{m-1}, \lambda_1, \cdots, \lambda_m)$. The parameter
estimation in this step turns out to be an optimization problem of the following form:
\[
T_1 = \operatorname*{argmin}_{T_1} W_1,
\quad \text{subject to} \quad
\begin{cases}
0 \le \pi_1, \cdots, \pi_{m-1} \le 1, \\
0 \le \pi_1 + \cdots + \pi_{m-1} \le 1, \\
\lambda_1, \cdots, \lambda_m \ge 0.
\end{cases}
\tag{7.3}
\]
The optimization in Eq. (7.3) is a constrained non-linear problem. Existing
methods that deal directly with constrained non-linear optimization include geometric
programming, quadratic programming, gradient-based methods, and metaheuristic
methods such as genetic algorithms and simulated annealing [34]. Geometric programming and quadratic programming cannot be used here because the objective function
$W_1$, as a function of the parameters $T_1$, is much more complex than a geometric
or quadratic function. Gradient-based methods, which find local optima, are efficient in
memory and computation and can often find reasonably good solutions in a relatively
short time. Because of this, they have been widely used to solve non-linear optimization
problems in many applications [52]. Genetic algorithms and simulated annealing search for the
global optimum. As they iterate randomly, these two algorithms suffer from uncertain
outcomes and may find solutions very slowly. Therefore, we use a gradient-based
method, gradient descent, to solve the optimization in Eq. (7.3). The key steps of
the gradient descent method are parameter initialization, gradient derivation, choice of
step size, and stopping criteria, which are explained in detail below.
Parameter Initialization
The first step of our gradient descent method is to initialize the values of the parameters
$\boldsymbol{\pi}$ and $\boldsymbol{\lambda}$. $\boldsymbol{\lambda}$ is initialized as the local maxima of the frequency of the observed arrival rates.
Given a sample sequence of arrival counts $\{x_i\}_{1 \le i \le n}$, the arrival rate sequence is
$\{x_i/\Delta\}_{1 \le i \le n}$. Detecting the local maxima of the frequency of the arrival rate sequence
helps to locate the most frequent but distinct arrival rates appearing in the sample,
and these detected rates are reasonable initial values for $\boldsymbol{\lambda}$.
Fig. 7.1 shows an example where a 3-state MMPP is initialized with $\boldsymbol{\lambda}^{(0)} = (1, 8, 16)$
based on the detection of local maxima of the arrival rate frequency. The number of local maxima to detect can be treated as known or unknown, which means we
can either specify the number of states $m$ or leave it to be determined automatically
as the number of local maxima that the program finds. Thus the estimation
method is flexible about the choice of the number of MMPP states.
To determine the stationary distribution $\boldsymbol{\pi}$, we first need to initialize the state $S_i$ in
every time slot. The initial value of the state in the $i$-th time slot, $S_i$, is set to the state
whose arrival rate is closest to the observed arrival rate, i.e.,
\[
S_i^{(0)} = \operatorname*{argmin}_{j = 1, \cdots, m} \left| \lambda_j^{(0)} - x_i/\Delta \right|, \quad i = 1, 2, \cdots, n.
\]
Based on the initial values $S_i^{(0)}$, the stationary distribution is initialized as
\[
\pi_j^{(0)} = \frac{\sum_{i=1}^{n} \mathbf{1}(S_i^{(0)} = j)}{n}, \quad j = 1, 2, \cdots, m - 1.
\]
Figure 7.1: An example of the initialization of parameter $\boldsymbol{\lambda}$. (Frequency plotted against arrival rate value, with local maxima marked at 1, 8, and 16.)
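The initialization described above can be sketched as follows. This is a minimal illustration rather than the original implementation: the peak test (a bin that is at least as frequent as its left neighbour and strictly more frequent than its right neighbour) and all function names are assumptions, and real traces would likely need some smoothing of the histogram before peak detection.

```python
from collections import Counter


def init_rates(counts, delta, m=None):
    """Initial rates: local maxima of the arrival-count histogram, divided by delta.

    If m is given, only the m most frequent peaks are kept (hypothetical rule).
    """
    freq = Counter(counts)  # missing keys read as 0 without being inserted
    peaks = [v for v in range(max(counts) + 1)
             if freq[v] > 0 and freq[v] >= freq[v - 1] and freq[v] > freq[v + 1]]
    if m is not None:
        peaks = sorted(peaks, key=lambda v: freq[v], reverse=True)[:m]
    return [v / delta for v in sorted(peaks)]


def init_states(counts, delta, lam0):
    """Assign each slot to the state whose initial rate is closest to x_i / delta."""
    return [min(range(len(lam0)), key=lambda j: abs(lam0[j] - x / delta))
            for x in counts]


def init_stationary(states, m):
    """Initial stationary distribution: fraction of slots assigned to each state."""
    n = len(states)
    return [states.count(j) / n for j in range(m)]
```

States are indexed from 0 in this sketch, whereas the text indexes them from 1.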
Gradient of Parameters
The key step of the gradient descent method is to obtain the gradient with respect to the parameters.
In our problem, the parameter gradient $\frac{\partial W_1}{\partial T_1}$ consists of $\frac{\partial W_1}{\partial \pi_j}$ for $j = 1, 2, \cdots, m - 1$
and $\frac{\partial W_1}{\partial \lambda_j}$ for $j = 1, 2, \cdots, m$. The closed forms of the two gradients are derived in
Theorems 21 and 22:
Theorem 21. The gradient with respect to the distribution probability $\pi_j$ is
\[
\frac{\partial W_1}{\partial \pi_j} = \sum_{i=1}^{n} 2(u_i - \hat{u}_i)\bigl(G_j(x_i) - G_m(x_i)\bigr), \quad j = 1, \cdots, m - 1. \tag{7.4}
\]
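Proof. Based on the marginal $u_i$ given in Theorem 10 in Chapter 4, the gradient is
derived as follows:
\[
\begin{aligned}
\frac{\partial W_1}{\partial \pi_j}
&= \sum_{i=1}^{n} 2(u_i - \hat{u}_i) \frac{\partial u_i}{\partial \pi_j} \\
&= \sum_{i=1}^{n} 2(u_i - \hat{u}_i) \frac{\partial}{\partial \pi_j} \Bigl( \sum_{j'=1}^{m-1} \pi_{j'} G_{j'}(x_i) + \Bigl(1 - \sum_{j'=1}^{m-1} \pi_{j'}\Bigr) G_m(x_i) \Bigr) \\
&= \sum_{i=1}^{n} 2(u_i - \hat{u}_i) \frac{\partial}{\partial \pi_j} \Bigl( G_m(x_i) + \sum_{j'=1}^{m-1} \pi_{j'} \bigl(G_{j'}(x_i) - G_m(x_i)\bigr) \Bigr) \\
&= \sum_{i=1}^{n} 2(u_i - \hat{u}_i) \bigl(G_j(x_i) - G_m(x_i)\bigr).
\end{aligned}
\]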
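Theorem 22. The gradient with respect to the arrival rate $\lambda_j$ is
\[
\frac{\partial W_1}{\partial \lambda_j} = \sum_{i=1}^{n} -2 \pi_j \Delta (u_i - \hat{u}_i) \frac{e^{-\lambda_j \Delta} (\lambda_j \Delta)^{x_i}}{x_i!}, \quad j = 1, \cdots, m. \tag{7.5}
\]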
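Proof. Based on the marginal $u_i$ given in Theorem 10 in Chapter 4,
\[
\begin{aligned}
\frac{\partial W_1}{\partial \lambda_j}
&= \sum_{i=1}^{n} 2(u_i - \hat{u}_i) \frac{\partial u_i}{\partial \lambda_j} \\
&= \sum_{i=1}^{n} 2(u_i - \hat{u}_i) \frac{\partial}{\partial \lambda_j} \Bigl( \sum_{j'=1}^{m} \pi_{j'} G_{j'}(x_i) \Bigr) \\
&= \sum_{i=1}^{n} 2(u_i - \hat{u}_i)\, \pi_j \frac{\partial G_j(x_i)}{\partial \lambda_j},
\end{aligned}
\]
where
\[
\begin{aligned}
\frac{\partial G_j(x_i)}{\partial \lambda_j}
&= \frac{\partial}{\partial \lambda_j} \Bigl( e^{-\lambda_j \Delta} \sum_{x=0}^{x_i} \frac{(\lambda_j \Delta)^x}{x!} \Bigr) \\
&= \frac{\partial \bigl(e^{-\lambda_j \Delta}\bigr)}{\partial \lambda_j} + \sum_{x=1}^{x_i} \frac{\partial}{\partial \lambda_j} \Bigl( \frac{e^{-\lambda_j \Delta} (\lambda_j \Delta)^x}{x!} \Bigr) \\
&= -\Delta e^{-\lambda_j \Delta} + \Delta \sum_{x=1}^{x_i} \frac{\partial}{\partial (\lambda_j \Delta)} \Bigl( \frac{e^{-\lambda_j \Delta} (\lambda_j \Delta)^x}{x!} \Bigr) \\
&= -\Delta e^{-\lambda_j \Delta} - \Delta \sum_{x=1}^{x_i} \frac{e^{-\lambda_j \Delta} (\lambda_j \Delta)^x}{x!} + \Delta \sum_{x=1}^{x_i} \frac{e^{-\lambda_j \Delta}\, x (\lambda_j \Delta)^{x-1}}{x!} \\
&= -\Delta e^{-\lambda_j \Delta} - \Delta \sum_{x=1}^{x_i} \frac{e^{-\lambda_j \Delta} (\lambda_j \Delta)^x}{x!} + \Delta \sum_{x=1}^{x_i} \frac{e^{-\lambda_j \Delta} (\lambda_j \Delta)^{x-1}}{(x-1)!} \\
&= -\Delta e^{-\lambda_j \Delta} - \Delta \sum_{x=1}^{x_i} \frac{e^{-\lambda_j \Delta} (\lambda_j \Delta)^x}{x!} + \Delta \sum_{x=0}^{x_i - 1} \frac{e^{-\lambda_j \Delta} (\lambda_j \Delta)^x}{x!} \\
&= -\Delta e^{-\lambda_j \Delta} - \Delta \frac{e^{-\lambda_j \Delta} (\lambda_j \Delta)^{x_i}}{x_i!} + \Delta e^{-\lambda_j \Delta} \\
&= -\Delta \frac{e^{-\lambda_j \Delta} (\lambda_j \Delta)^{x_i}}{x_i!}.
\end{aligned}
\]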
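The closed forms in Theorems 21 and 22 are easy to sanity-check numerically. The sketch below evaluates Eqs. (7.4) and (7.5) and can be compared against central finite differences of $W_1$; it assumes the Poisson-mixture marginal used in the proofs, and the function names and the convention of passing only the first $m-1$ probabilities (`pi_free`) are illustrative choices, not part of the original work.

```python
import math


def poisson_pmf(x, mean):
    return math.exp(-mean) * mean ** x / math.factorial(x)


def poisson_cdf(x, mean):
    return sum(poisson_pmf(k, mean) for k in range(x + 1))


def marginal(x, pi, lam, delta):
    """Mixture marginal: sum_j pi_j * G_j(x)."""
    return sum(p * poisson_cdf(x, l * delta) for p, l in zip(pi, lam))


def w1(counts, u_hat, pi_free, lam, delta):
    """Objective of Eq. (7.2); the last probability is 1 minus the free ones."""
    pi = list(pi_free) + [1.0 - sum(pi_free)]
    return sum((marginal(x, pi, lam, delta) - uh) ** 2
               for x, uh in zip(counts, u_hat))


def grad_w1(counts, u_hat, pi_free, lam, delta):
    """Closed-form gradients of Eq. (7.4) (w.r.t. pi_j) and Eq. (7.5) (w.r.t. lambda_j)."""
    pi = list(pi_free) + [1.0 - sum(pi_free)]
    m = len(lam)
    err = [marginal(x, pi, lam, delta) - uh for x, uh in zip(counts, u_hat)]
    g_pi = [sum(2.0 * e * (poisson_cdf(x, lam[j] * delta)
                           - poisson_cdf(x, lam[m - 1] * delta))
                for x, e in zip(counts, err))
            for j in range(m - 1)]
    g_lam = [sum(-2.0 * pi[j] * delta * e * poisson_pmf(x, lam[j] * delta)
                 for x, e in zip(counts, err))
             for j in range(m)]
    return g_pi, g_lam
```

Perturbing one parameter by a small amount and differencing `w1` should reproduce the corresponding entry of `grad_w1` up to numerical error.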
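With the gradients derived in Theorems 21 and 22, the parameters in each iterative
step are updated to
\[
T_1^{(r+1)} = T_1^{(r)} - a^{(r)} \left. \frac{\partial W_1}{\partial T_1} \right|_{T_1 = T_1^{(r)}},
\]
where the specific updates are
\[
\begin{aligned}
\pi_j^{(r+1)} &= \pi_j^{(r)} - a^{(r)} \left. \frac{\partial W_1}{\partial \pi_j} \right|_{T_1 = T_1^{(r)}}, \quad j = 1, 2, \cdots, m - 1; \\
\lambda_j^{(r+1)} &= \lambda_j^{(r)} - a^{(r)} \left. \frac{\partial W_1}{\partial \lambda_j} \right|_{T_1 = T_1^{(r)}}, \quad j = 1, 2, \cdots, m.
\end{aligned}
\]
Note that $\pi_m^{(r+1)}$ is always determined by $\pi_m^{(r+1)} = 1 - \pi_1^{(r+1)} - \cdots - \pi_{m-1}^{(r+1)}$. In the above
updates, $a^{(r)}$ is the step-size of the $r$-th iterative step. The choice of step-size is discussed
next.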
Choice of Step-size
The step-size $a^{(r)}$ is a positive value, which can change as the iteration number $r$
increases. For the optimization problem in Eq. (7.3), since the constraints on the parameters
must be respected, the step-size is adjusted to guarantee
that the constraints are satisfied. As the initial parameters $T_1^{(0)}$ certainly satisfy the
constraints, we only need to guarantee that every $T_1^{(r+1)}$ obtained from $T_1^{(r)}$ satisfies
the constraints. Specifically, the step-size of the $r$-th iteration, $a^{(r)}$, is randomly chosen as
a positive number satisfying the constraints:
\[
\begin{cases}
0 \le \pi_j^{(r)} - a^{(r)} \left. \frac{\partial W_1}{\partial \pi_j} \right|_{T_1 = T_1^{(r)}} \le 1, & j = 1, 2, \cdots, m - 1; \\
0 \le \sum_{j=1}^{m-1} \pi_j^{(r)} - a^{(r)} \sum_{j=1}^{m-1} \left. \frac{\partial W_1}{\partial \pi_j} \right|_{T_1 = T_1^{(r)}} \le 1; \\
\lambda_j^{(r)} - a^{(r)} \left. \frac{\partial W_1}{\partial \lambda_j} \right|_{T_1 = T_1^{(r)}} \ge 0, & j = 1, 2, \cdots, m.
\end{cases}
\tag{7.6}
\]
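Because every constraint in Eq. (7.6) is linear in the step-size, the feasible step-sizes form an interval starting at zero, and a random feasible step can be drawn from it. The sketch below is one possible way to do this; the upper cap `cap`, the uniform draw, and the function names are assumptions made for illustration, since the text only states that a positive feasible value is chosen at random.

```python
import math
import random


def max_feasible_step(value, grad, lower, upper):
    """Largest a >= 0 such that lower <= value - a * grad <= upper."""
    if grad > 0:
        return (value - lower) / grad
    if grad < 0:
        return (upper - value) / (-grad)
    return math.inf


def choose_step(pi_free, lam, g_pi, g_lam, cap=1.0, rng=random):
    """Draw a random positive step-size that keeps Eq. (7.6) satisfied."""
    bounds = [cap]  # hypothetical cap so that the interval is always bounded
    for p, g in zip(pi_free, g_pi):
        bounds.append(max_feasible_step(p, g, 0.0, 1.0))
    bounds.append(max_feasible_step(sum(pi_free), sum(g_pi), 0.0, 1.0))
    for l, g in zip(lam, g_lam):
        bounds.append(max_feasible_step(l, g, 0.0, math.inf))
    a_max = min(bounds)
    if a_max <= 0:
        return 0.0  # already on a constraint boundary in the descent direction
    # 1 - random() lies in (0, 1], so the step lies in (0, a_max].
    return a_max * (1.0 - rng.random())
```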
Stopping Criteria
The iteration continues until it meets one of the predetermined criteria. Two stopping
criteria are considered in our gradient descent process: 1) the iteration number $r$ reaches a
predetermined maximum number of iterations $n_{Itr}$; 2) the decreasing ratio of the objective function $W_1$
drops to or below a preset threshold $th$, i.e.,
$\frac{W_1^{(r-1)} - W_1^{(r)}}{W_1^{(r-1)}} \le th$. Whenever either of the two stopping criteria is satisfied, the iteration stops and the values of
the parameters $\boldsymbol{\pi}$ and $\boldsymbol{\lambda}$ are returned as output.
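The two stopping criteria can be illustrated with a generic descent loop. This sketch deliberately uses a fixed step-size and an arbitrary objective instead of the randomly chosen feasible step and $W_1$ described above, so it shows only the stopping logic; all names are hypothetical.

```python
def descend(objective, gradient, x0, step, n_itr=1000, th=1e-6):
    """Gradient descent that stops after n_itr iterations or when the
    relative decrease of the objective is at most th.

    Returns the final point, its objective value, and the iterations used.
    """
    x = list(x0)
    w_prev = objective(x)
    for r in range(1, n_itr + 1):
        g = gradient(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        w = objective(x)
        # Criterion 2: relative decrease at most th (this also stops on an increase).
        if w_prev == 0 or (w_prev - w) / w_prev <= th:
            return x, w, r
        w_prev = w
    # Criterion 1: maximum number of iterations reached.
    return x, w_prev, n_itr
```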
7.3.2 Matching Copula
In the second step, the theoretical copula in Theorem 11 in Chapter 4 is matched to
the empirical copula calculated from the trace. Given a sample trace of arrival counts
$\{x_i\}_{1 \le i \le n}$, the empirical copula value of successive arrival counts is
\[
\hat{\omega}_i = \frac{1}{n - 1} \sum_{i'=1}^{n-1} \mathbf{1}(x_{i'} \le x_i,\ x_{i'+1} \le x_{i+1}), \quad \forall\, 1 \le i \le n - 1. \tag{7.7}
\]
The goal of the matching is to minimize the difference between the theoretical copula of
successive arrival counts and their empirical copula, as represented by $W_2$:
\[
W_2 = \sum_{i=1}^{n-1} (\omega_i - \hat{\omega}_i)^2, \tag{7.8}
\]
where $\omega_i$ is calculated from the theoretical copula given in Theorem 11, i.e., $\omega_i = C_1(M(x_i), M(x_{i+1}))$.
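The empirical copula values of successive arrival counts can be computed directly from their definition. The following is a minimal sketch with a hypothetical function name; it takes quadratic time in the trace length, which is acceptable for illustration but could be improved with sorting-based counting.

```python
def empirical_copula(counts):
    """For each successive pair (x_i, x_{i+1}), the fraction of successive
    pairs (x_i', x_i'+1) that are componentwise less than or equal to it."""
    n = len(counts)
    pairs = list(zip(counts[:-1], counts[1:]))
    return [sum(1 for a, b in pairs if a <= x and b <= y) / (n - 1)
            for x, y in pairs]
```

For the trace `[1, 3, 2, 4]` the successive pairs are (1, 3), (3, 2) and (2, 4), and only the last pair dominates another pair besides itself.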
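With the parameters $\boldsymbol{\pi}$ and $\boldsymbol{\lambda}$ determined, the parameters required to estimate $\omega_i$ are the
entries of the matrix $P(\Delta) = [p_{j_1 j_2}(\Delta)]_{m \times m}$. Thus, we obtain the following optimization
problem:
\[
P(\Delta) = \operatorname*{argmin}_{P(\Delta)} W_2, \quad \text{subject to} \quad
\begin{cases}
\boldsymbol{\pi} P(\Delta) = \boldsymbol{\pi}, \\
\sum_{j_2=1}^{m} p_{j_1 j_2}(\Delta) = 1, \quad j_1 = 1, 2, \cdots, m, \\
p_{j_1 j_2}(\Delta) \ge 0.
\end{cases}
\tag{7.9}
\]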
The optimization problem in Eq. (7.9) is a classical quadratic programming problem with linear constraints. To make this clearer, we now define several vectors and
matrices to illustrate how the problem is solved:
• Parameter vector $T_2$
$T_2$ is an $m^2 \times 1$ parameter vector reshaped from $P(\Delta)$ as follows:
\[
T_2 = \bigl(p_{11}(\Delta), p_{21}(\Delta), \cdots, p_{m1}(\Delta), p_{12}(\Delta), p_{22}(\Delta), \cdots, p_{m2}(\Delta), \cdots, p_{1m}(\Delta), p_{2m}(\Delta), \cdots, p_{mm}(\Delta)\bigr)^{T},
\]
i.e., the $k$-th element of $T_2$ is $p_{j_1 j_2}(\Delta)$, where $j_1 = (k - 1)\,\%\,m + 1$, $j_2 = [(k - 1)/m] + 1$,
$\%$ is the modulo operation, and $[\cdot]$ rounds values down to integers.
• Coefficient matrix $H$
$H$ is an $(n - 1) \times m^2$ matrix with elements
\[
h_{ik} = G_{j_2}(x_{i+1})\, G_{j_1}(x_i)\, \pi_{j_1},
\]
where $j_1 = (k - 1)\,\%\,m + 1$ and $j_2 = [(k - 1)/m] + 1$.
Based on Theorem 11, $\omega_i = h_i T_2$, where $h_i$ is the $i$-th row vector of $H$.
Moreover, we have
\[
\begin{pmatrix} \omega_1 \\ \vdots \\ \omega_{n-1} \end{pmatrix} = H T_2.
\]
• Constraints coefficient matrix $E$
$E$ is a $2m \times m^2$ matrix whose non-zero elements are defined as
$E_{ik} = \pi_j$ for $i = 1, \cdots, m$, $j = 1, \cdots, m$ and $k = (i - 1)m + j$,
$E_{ik} = 1$ for $i = m + 1, \cdots, 2m$, $j = 1, \cdots, m$ and $k = (j - 1)m + (i - m)$.
• Constraints vector $b$
The vector $b$ is a $2m \times 1$ vector defined as $b = (\pi_1, \pi_2, \cdots, \pi_m, 1, 1, \cdots, 1)^{T}$.
Example 4. Taking a 2-state MMPP as an example, the four vectors and matrices
defined above take the following forms:
\[
T_2 = \bigl(p_{11}(\Delta), p_{21}(\Delta), p_{12}(\Delta), p_{22}(\Delta)\bigr)^{T},
\]
\[
H = \begin{pmatrix}
\pi_1 G_1(x_1) G_1(x_2) & \pi_2 G_2(x_1) G_1(x_2) & \pi_1 G_1(x_1) G_2(x_2) & \pi_2 G_2(x_1) G_2(x_2) \\
\pi_1 G_1(x_2) G_1(x_3) & \pi_2 G_2(x_2) G_1(x_3) & \pi_1 G_1(x_2) G_2(x_3) & \pi_2 G_2(x_2) G_2(x_3) \\
\vdots & \vdots & \vdots & \vdots \\
\pi_1 G_1(x_{n-1}) G_1(x_n) & \pi_2 G_2(x_{n-1}) G_1(x_n) & \pi_1 G_1(x_{n-1}) G_2(x_n) & \pi_2 G_2(x_{n-1}) G_2(x_n)
\end{pmatrix},
\]
\[
E = \begin{pmatrix}
\pi_1 & \pi_2 & 0 & 0 \\
0 & 0 & \pi_1 & \pi_2 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1
\end{pmatrix},
\qquad
b = (\pi_1, \pi_2, 1, 1)^{T}.
\]
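The index mapping and the matrices $H$, $E$ and $b$ can be assembled mechanically. The sketch below follows the column-major ordering of the parameter vector and reproduces the pattern of Example 4 for $m = 2$; it assumes Poisson distribution functions $G_j$ with mean $\lambda_j \Delta$, and the function names are illustrative only.

```python
import math


def poisson_cdf(x, mean):
    return sum(math.exp(-mean) * mean ** k / math.factorial(k)
               for k in range(x + 1))


def index_to_states(k, m):
    """1-based position k in the parameter vector -> (j1, j2), column-major."""
    return (k - 1) % m + 1, (k - 1) // m + 1


def build_H(counts, pi, lam, delta):
    """One row per successive pair (x_i, x_{i+1}), one column per entry of P."""
    m = len(pi)
    rows = []
    for x, y in zip(counts[:-1], counts[1:]):
        row = []
        for k in range(1, m * m + 1):
            j1, j2 = index_to_states(k, m)
            row.append(pi[j1 - 1]
                       * poisson_cdf(x, lam[j1 - 1] * delta)
                       * poisson_cdf(y, lam[j2 - 1] * delta))
        rows.append(row)
    return rows


def build_E_b(pi):
    """Equality constraints E @ T2 = b: stationarity rows, then row-sum rows."""
    m = len(pi)
    E = [[0.0] * (m * m) for _ in range(2 * m)]
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            E[i - 1][(i - 1) * m + j - 1] = pi[j - 1]   # sum_j pi_j p_{ji} = pi_i
            E[m + i - 1][(j - 1) * m + i - 1] = 1.0     # sum_j p_{ij} = 1
    b = list(pi) + [1.0] * m
    return E, b
```

As a quick consistency check, any transition matrix whose stationary distribution is the given vector should satisfy the equality constraints exactly.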
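With $T_2$, $H$, $E$, and $b$ defined, $T_2$ fully represents $P(\Delta)$, $H$ expresses the objective
function $W_2$ in terms of $T_2$, and $E$ and $b$ characterize the constraints on $T_2$. The
optimization problem in Eq. (7.9) is reformulated as
\[
\begin{aligned}
T_2 &= \operatorname*{argmin}_{T_2} W_2 \\
&= \operatorname*{argmin}_{T_2} \sum_{i=1}^{n-1} \bigl(\omega_i^2 - 2 \hat{\omega}_i \omega_i\bigr) \\
&= \operatorname*{argmin}_{T_2}\ \begin{pmatrix} \omega_1 & \cdots & \omega_{n-1} \end{pmatrix} \begin{pmatrix} \omega_1 \\ \vdots \\ \omega_{n-1} \end{pmatrix} - 2 \begin{pmatrix} \hat{\omega}_1 & \cdots & \hat{\omega}_{n-1} \end{pmatrix} \begin{pmatrix} \omega_1 \\ \vdots \\ \omega_{n-1} \end{pmatrix} \\
&= \operatorname*{argmin}_{T_2}\ \frac{1}{2} T_2^{T} H^{T} H T_2 - \begin{pmatrix} \hat{\omega}_1 & \cdots & \hat{\omega}_{n-1} \end{pmatrix} H T_2,
\end{aligned}
\]
\[
\text{subject to} \quad
\begin{cases}
E T_2 = b, \\
T_2 \ge 0.
\end{cases}
\tag{7.10}
\]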
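The last equality in Eq. (7.10) can be checked with a short expansion. In the sketch below the empirical copula values are stacked into a column vector, written $\hat{\boldsymbol{\omega}}$ (a shorthand introduced only for this note), and the stacked theoretical values are replaced by $H T_2$:

```latex
\begin{aligned}
W_2 &= \lVert H T_2 - \hat{\boldsymbol{\omega}} \rVert^2 \\
    &= T_2^{T} H^{T} H T_2 \;-\; 2\,\hat{\boldsymbol{\omega}}^{T} H T_2 \;+\; \hat{\boldsymbol{\omega}}^{T} \hat{\boldsymbol{\omega}} .
\end{aligned}
```

The last term does not depend on $T_2$, and multiplying the remaining terms by $1/2$ leaves the minimizer unchanged. This gives the usual quadratic-programming form with quadratic term $H^{T} H$ and linear term $-H^{T} \hat{\boldsymbol{\omega}}$.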
The problem in Eq. (7.10) is now clearly a classic quadratic program
with linear constraints. We thus use quadprog, a solver from the Matlab Optimization
Toolbox, to obtain the optimal values of $T_2$. According to the mapping rule between
element indexes, $P(\Delta)$ is easily obtained from $T_2$.
The final step to complete the parameter estimation is to recover the rate matrix $Q$
from the transition probability matrix $P(\Delta)$. As $\Delta$ is small, the infinitesimal term $o(\Delta)$ is
negligible. Based on Eq. (4.1), $Q$ can be approximated from $P(\Delta)$ with Eq. (7.11):
\[
q_{j_1 j_2} =
\begin{cases}
(p_{j_1 j_2}(\Delta) - 1)/\Delta, & j_1 = j_2; \\
p_{j_1 j_2}(\Delta)/\Delta, & j_1 \ne j_2.
\end{cases}
\tag{7.11}
\]
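The recovery step in Eq. (7.11) amounts to a one-line transformation of the transition probability matrix. The following is a minimal sketch with a hypothetical function name, using nested lists for matrices.

```python
def recover_rate_matrix(P, delta):
    """Eq. (7.11): q = (p - 1) / delta on the diagonal, p / delta elsewhere."""
    m = len(P)
    return [[(P[i][j] - 1.0) / delta if i == j else P[i][j] / delta
             for j in range(m)]
            for i in range(m)]
```

For example, with a slot length of 0.01 and the first-order transition matrix `[[0.99, 0.01], [0.001, 0.999]]`, the recovered rates are approximately `[[-1, 1], [0.1, -0.1]]`, and each recovered row sums to zero because each row of the transition matrix sums to one.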
7.3.3 A Summary of MarCpa Algorithm
Matching marginal distributions in Section 7.3.1 and matching copula in Section 7.3.2
are combined into our proposed MMPP parameter estimation algorithm, the MarCpa
algorithm, which is sketched in Algorithm 4. The step of matching
marginal distributions solves a constrained non-linear optimization problem with gradient descent, with time complexity $O(m \times n \times n_{Itr})$. The step of matching copula
solves a quadratic program with linear constraints. Since we use the Matlab solver quadprog in this step, its time complexity depends on how Matlab implements its solver.
Although MarCpa uses existing algorithms, namely gradient descent and a Matlab solver, it
is the first time that the estimation process is decomposed into separate steps with
copula to ease analysis.
Algorithm 4 MarCpa Algorithm
Require: a sequence of arrival counts $\{x_i\}$, the length of time slots $\Delta$
Ensure: MMPP parameters $\boldsymbol{\lambda}$ and $Q$
1: // First step: matching marginal distributions
2: // Note that $n_{Itr}$ and $th$ are the maximum iteration number and threshold value defined in Section 7.3.1
3: Determine initial parameters $\boldsymbol{\pi}^{(0)}$ and $\boldsymbol{\lambda}^{(0)}$ according to Section 7.3.1, and compute the initial objective function $W_1^{(0)}$;
4: Initialize $\boldsymbol{\pi} = \boldsymbol{\pi}^{(0)}$, $\boldsymbol{\lambda} = \boldsymbol{\lambda}^{(0)}$, $W_1 = W_1^{(0)}$;
5: for $r \leftarrow 0 : n_{Itr} - 1$ do
6:     Choose a proper step-size $a^{(r)}$ according to Section 7.3.1;
7:     Update parameters $\boldsymbol{\pi}^{(r)}$ to $\boldsymbol{\pi}^{(r+1)}$, $\boldsymbol{\lambda}^{(r)}$ to $\boldsymbol{\lambda}^{(r+1)}$ based on Section 7.3.1, and compute the objective function $W_1^{(r+1)}$;
8:     if $W_1^{(r+1)} < W_1$ then
9:         $\boldsymbol{\pi} = \boldsymbol{\pi}^{(r+1)}$, $\boldsymbol{\lambda} = \boldsymbol{\lambda}^{(r+1)}$, $W_1 = W_1^{(r+1)}$;
10:     end if
11:     if $(W_1^{(r)} - W_1^{(r+1)})/W_1^{(r)} \le th$ then
12:         Break;
13:     end if
14: end for
15: // Second step: matching copulas
16: Construct matrices $H$ and $E$, vector $b$ based on their definitions in Section 7.3.2;
17: Obtain the optimal value of $T_2$ by inputting $H$, $\{\hat{\omega}_i\}_{1 \le i \le n-1}$, $E$, $b$, and $0$ in a proper form to the quadprog solver;
18: Reshape $T_2$ to $P(\Delta)$;
19: Recover parameter $Q$ from $P(\Delta)$ according to Eq. (7.11).
7.4 Performance Evaluation
In this section, the performance of MarCpa is evaluated with a large number of simulations. We first use one simulated sample trace as a concrete example that presents
the ground truth along with the estimated parameters. Comparing the estimated
parameters with the ground truth parameters illustrates how well MarCpa retrieves parameters from arrival counts. As a further step, the evaluation is conducted on multiple simulations. The average goodness-of-fit over multiple independent simulations
quantifies the performance of MarCpa under different parameter settings. We also
compare the performance of MarCpa with that of an existing Expectation-Maximization
(EM) learning algorithm (e.g., the one in [63]) and a non-EM learning algorithm (e.g., the one in [39]).
7.4.1 Performance Evaluation Based on Ground Truth
We consider a 2-state MMPP with parameters
\[
Q = \begin{pmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \end{pmatrix}
= \begin{pmatrix} -1 & 1 \\ 0.1 & -0.1 \end{pmatrix},
\qquad
\boldsymbol{\lambda} = (\lambda_1, \lambda_2) = (10, 1).
\]
A trace was generated by simulation according to the above parameters for a period
of 1000 units of time. We group the arrivals within every 1 unit of time, i.e., $\Delta = 1$.
The arrival counts of this trace are shown in Fig. 7.2.
Figure 7.2: Arrival counts of simulation trace. (Horizontal axis: index of time slot; vertical axis: number of arrivals.)
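A trace of this kind can be generated with a simple event-driven simulation. The sketch below is one possible simulator, not the one used for the reported experiments: it assumes the process starts in the first state, draws exponential sojourn times from the diagonal of $Q$, generates Poisson arrivals at the rate of the current state, and bins the arrivals into slots. The function name and the `seed` argument are illustrative.

```python
import random


def simulate_mmpp_counts(Q, lam, horizon, delta, seed=0):
    """Simulate an MMPP on [0, horizon) and return arrival counts per slot."""
    rng = random.Random(seed)
    m = len(lam)
    n_slots = int(horizon / delta)
    counts = [0] * n_slots
    state, t = 0, 0.0  # assumed initial state: the first state
    while t < horizon:
        leave_rate = -Q[state][state]
        sojourn = rng.expovariate(leave_rate) if leave_rate > 0 else float("inf")
        end = min(t + sojourn, horizon)
        # Poisson arrivals at rate lam[state] while the chain stays in this state.
        if lam[state] > 0:
            a = t + rng.expovariate(lam[state])
            while a < end:
                counts[min(int(a / delta), n_slots - 1)] += 1
                a += rng.expovariate(lam[state])
        t = end
        if t >= horizon:
            break
        # Jump to another state with probability proportional to the off-diagonal rates.
        weights = [Q[state][j] if j != state else 0.0 for j in range(m)]
        state = rng.choices(range(m), weights=weights)[0]
    return counts
```

With the 2-state parameters above, `simulate_mmpp_counts([[-1.0, 1.0], [0.1, -0.1]], [10.0, 1.0], 1000.0, 1.0)` returns 1000 arrival counts; the exact values depend on the seed.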
Table 7.1: Estimated parameters for the simulation trace.
                $q_{11}$   $q_{22}$   $\lambda_1$   $\lambda_2$
MarCpa          -1.0000    -0.0834    10.0000       1.0400
EM              -1.0650    -0.1070    10.4320       0.9508
non-EM          -0.5896    -0.0925     9.7500       1.2614
Ground truth    -1         -0.1       10            1
The parameters estimated with MarCpa are shown in Table 7.1. The table also
contains the results from the Expectation-Maximization (EM) learning algorithm
in [63] and those from the non-EM learning algorithm proposed in [39].
Among the three results, the parameters estimated by MarCpa look closest to the
ground truth parameters. To quantitatively compare the three estimation methods
and check their results against the ground truth, Kolmogorov-Smirnov (K-S) tests are performed. Essentially, the K-S test compares a fitted distribution with a sample in
terms of the cumulative distribution function. We extend the classic K-S test to measure
the difference of copulas, so that the goodness-of-fit of the temporal dependence is evaluated
as well. We use the following two distances for measurement:
\[
D_M = \max_{i=1}^{n} |u_i - \hat{u}_i|, \tag{7.12}
\]
\[
D_C = \max_{i=1}^{n-1} |\omega_i - \hat{\omega}_i|. \tag{7.13}
\]
The critical value of the K-S test, $D_{0.01}$, is determined by the sample size $n$. $D_{0.01}$
for $D_M$ is calculated as $D_{0.01} = 1.63/\sqrt{n}$, and for $D_C$ as $D_{0.01} = 1.63/\sqrt{n - 1}$. If the
sample statistics $D_M$ and $D_C$ are both equal to or smaller than the corresponding
critical values, the sample trace is accepted as one drawn from the estimated model; otherwise,
it is rejected. The K-S test results with the parameters
estimated by the three methods, plus the ground truth parameters, are listed in Table 7.2. Compared to the two state-of-the-art estimation methods, our proposed MarCpa
method has the values of $D_M$ and $D_C$ closest to those of the ground truth parameters, indicating that
its estimation result is the closest to the ground truth. Moreover, MarCpa is the
only method whose estimated parameters pass the K-S test, implying that MarCpa
is the most effective of the three algorithms at identifying the hidden MMPP from the trace. Therefore,
we conclude that the MMPP model estimated with our proposed MarCpa algorithm
performs best at recovering the ground truth and characterizing the sample trace.
Table 7.2: Kolmogorov-Smirnov test results on sample trace.
                $D_M$      $D_{0.01}$ (for $D_M$)   $D_C$      $D_{0.01}$ (for $D_C$)
MarCpa          0.04317    0.05153                  0.03975    0.05157
EM              0.05254    0.05153                  0.05834    0.05157
non-EM          0.09414    0.05153                  0.10163    0.05157
Ground truth    0.03734    0.05153                  0.04260    0.05157
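The acceptance rule behind Table 7.2 can be written as a short check. The sketch below uses the coefficient 1.63 and the two sample sizes $n$ and $n - 1$ stated in the text; the function names are hypothetical, and the computed critical values may differ from the tabulated ones in the last digit because of rounding.

```python
import math


def ks_distance(theoretical, empirical):
    """Largest absolute gap between theoretical and empirical values (Eqs. 7.12, 7.13)."""
    return max(abs(t - e) for t, e in zip(theoretical, empirical))


def ks_critical_value(sample_size, coefficient=1.63):
    """Critical value at the 0.01 level, as used in the text."""
    return coefficient / math.sqrt(sample_size)


def passes_ks(d_m, d_c, n):
    """Accept only if both the marginal and the copula distance are within bounds."""
    return d_m <= ks_critical_value(n) and d_c <= ks_critical_value(n - 1)
```

With $n = 1000$, the distances reported for MarCpa and for the ground truth parameters pass this check, while those reported for the EM and non-EM algorithms do not.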
The running times of the MarCpa, EM and non-EM algorithms are recorded in Table 7.3. All three estimation algorithms were run on a computer with an Intel Core i7-2600
CPU @ 3.40GHz and 4.00 GB RAM. Among the three, the non-EM algorithm is the fastest,
and the EM algorithm takes the longest time. While MarCpa is slightly slower than
the non-EM algorithm, MarCpa returns much more accurate results.
Table 7.3: Running time in seconds.
MarCpa    EM        non-EM
0.9305    2.0900    0.6119
7.4.2 Performance Evaluation Based on Average Goodness-of-Fit and Running Time
In this section, we evaluate the effectiveness and stability of the MarCpa method by
analysing its average performance over multiple independent experiments. We
first consider a 3-state MMPP with the following parameters:
\[
Q = \begin{pmatrix}
-1 & 0.5 & 0.5 \\
0.25 & -0.50 & 0.25 \\
0.05 & 0.05 & -0.1
\end{pmatrix},
\qquad
\boldsymbol{\lambda} = (10, 5, 1).
\]
Thirty independent traces are generated from this 3-state MMPP: ten
with a duration of 1000 units of time, ten with a duration of
5000 units of time, and the remaining ten with a duration of 10000 units of time. The
arrival counts are the numbers of arrivals in every 1 unit of time, i.e., $\Delta = 1$.
For each trace, the three methods, MarCpa, EM and non-EM, are applied to
learn the MMPP parameters. The performance evaluation covers both goodness-of-fit and running time. The goodness-of-fit is measured by the K-S distances $D_M$
and $D_C$. The running times are shown on a $\log_{10}$ scale for ease of illustration.
Figure 7.3: Performance in $D_M$ for 3-state MMPP traces. (Bars for MarCpa, EM, and non-EM; horizontal axis: trace length; vertical axis: $D_M$.)
Figure 7.4: Performance in $D_C$ for 3-state MMPP traces. (Bars for MarCpa, EM, and non-EM; horizontal axis: trace length; vertical axis: $D_C$.)
Fig. 7.3, Fig. 7.4 and Fig. 7.5 show the average performance results of the three
methods in $D_M$, $D_C$, and running time, respectively. In these figures, the results are
grouped by trace length. The ten results from traces of the same length are summarized by
their average and standard deviation; that is, each bar shows the average estimation
performance on the ten traces with the length indicated on the horizontal axis, and the error
bars represent the variation (in $\log_{10}$ for running time) across the ten independent
experiments.
Figure 7.5: Performance in running time for 3-state MMPP traces. (Bars for MarCpa, EM, and non-EM; horizontal axis: trace length; vertical axis: time in $\log_{10}$(s).)
The above experiments and evaluation are repeated on a 5-state MMPP with
parameters:
\[
Q = \begin{pmatrix}
-1 & 0.25 & 0.25 & 0.25 & 0.25 \\
0.2 & -0.8 & 0.2 & 0.2 & 0.2 \\
0.125 & 0.125 & -0.5 & 0.125 & 0.125 \\
0.075 & 0.075 & 0.075 & -0.3 & 0.075 \\
0.025 & 0.025 & 0.025 & 0.025 & -0.1
\end{pmatrix},
\qquad
\boldsymbol{\lambda} = (20, 15, 10, 5, 1).
\]
Figure 7.6: Performance in $D_M$ for 5-state MMPP traces. (Bars for MarCpa, EM, and non-EM; horizontal axis: trace length; vertical axis: $D_M$.)
Figure 7.7: Performance in $D_C$ for 5-state MMPP traces. (Bars for MarCpa, EM, and non-EM; horizontal axis: trace length; vertical axis: $D_C$.)
Figure 7.8: Performance in running time for 5-state MMPP traces. (Bars for MarCpa, EM, and non-EM; horizontal axis: trace length; vertical axis: time in $\log_{10}$(s).)
The estimation performance in $D_M$, $D_C$, and running time of the three methods
on the 5-state MMPP is compared in Fig. 7.6, Fig. 7.7 and Fig. 7.8, respectively.
Experiments on both the 3-state MMPP and the 5-state MMPP show that the proposed MarCpa algorithm has stable and effective performance. Compared with the
EM algorithm, MarCpa has competitive (even better for the 5-state MMPP) goodness-of-fit results while running more than 10 times faster. Compared with the non-EM
algorithm, it takes longer but fits much better. Table 7.4 shows the
number of times that the estimated parameters pass the K-S tests. Among the 30 independent experiments on either the 3-state or the 5-state MMPP, MarCpa is the best at
retrieving parameters that represent the MMPP trace well.
To conclude, the proposed MarCpa algorithm improves the learning accuracy over
both the EM and non-EM algorithms, and improves the time-efficiency significantly over
the EM algorithm.
Table 7.4: Ratio of experiments that pass K-S tests.
                            MarCpa    EM       non-EM
3-state MMPP experiments    14/30     13/30    0/30
5-state MMPP experiments    17/30     10/30    0/30
7.5 Summary
This chapter proposed a new learning algorithm, called MarCpa, to estimate MMPP
parameters from arrival count data. With copula theory, it is the first time that
functional dependence has been applied to estimate MMPP parameters. Extensive simulations show that our proposed method outperforms existing methods by
improving estimation accuracy while keeping the running time small.
Chapter 8
Conclusions and Future Work
In this thesis, we investigate copula theory, and apply copula theory to analyse the
contemporaneous dependence between traffic flows and the temporal dependence in
one traffic flow. Our analytical results are applied in several application scenarios in
computer networks. In this chapter, we summarize the contributions of this thesis
and discuss future research directions.
8.1 Contemporaneous Dependence Modeling
In Chapter 3, we apply copulas to model the contemporaneous dependence between
traffic flows. With a case study of Skype traffic flows, we show how to model the
contemporaneous dependence between network flows with copulas, and how copulas
disclose the dependence between flows in a novel way. With copula analysis, we
obtain tight and accurate models for aggregate flows, which further benefits statistical network calculus by tightening the performance bounds on network backlog and
queueing delay.
As the first work to explore copula analysis in stochastic network calculus (SNC), it is
expected to motivate a new spectrum of interest in extending SNC research and to
further enhance its impact in practice. Along these lines, many interesting research
problems deserve further investigation. These include copula structures with
sub-sampling techniques other than sliding windows, better dynamic scheduling and
multiplexing strategies aligned with the underlying changes of traffic flows, and new
types of copula structures tailored for specific network environments.
8.2 Temporal Dependence Modeling
In Chapter 4, we apply copulas to model the temporal dependence in a traffic flow
that is modeled as a Markov-modulated Poisson process (MMPP). We model the temporal
dependence of MMPP with copulas by deriving the theoretical copula between arrival
counts in different time slots. Recursive algorithms are developed to compute the theoretical copula of the superposition of multiple independent MMPPs. We also propose
parametric copula modeling steps to model the temporal dependence of MMPP.
In Chapter 5, the temporal dependence of MMPP is applied to traffic prediction.
With numerous case studies, we show that copula-based dependence works effectively
to predict future arrivals of a single MMPP flow and of the superposition of
homogeneous or heterogeneous MMPP flows. The accurate and stable traffic prediction
demonstrates the power of copula modeling of temporal dependence.
In Chapter 6, we combine the MMPP copula of Chapter 4 and the
copula-based prediction of Chapter 5, and design a service provisioning policy based
on the prediction of future call arrivals in the cloud. We study the call arrivals to a composite
cloud service system, approximated as MMPP. With the copula modeling of the temporal
dependence between call arrival counts and the prediction made by copula, we can predict
future service demand. A collaborative auto-scaling policy is proposed to fulfill future
service demand and keep the cost low at the same time. With simulations, we show
that our collaborative auto-scaling policy based on temporal dependence modeling
outperforms the traditional auto-scaling policy in which each component of the composite
cloud system scales its capacity independently.
Another application of the MMPP copula is investigated in Chapter 7. The temporal
dependence in terms of copula is applied to estimate the parameters of MMPP. On the
basis of the analytical results on the marginal distribution and copula of MMPP, we propose
a two-step matching algorithm to learn MMPP parameters from arrival counts. In
extensive evaluations, our proposed estimation method works better than the state-of-the-art methods in the sense that it improves estimation accuracy and keeps the running
time small.
8.3 Future Work
In the future, the results of this thesis can be extended in several directions.
First, we can use copula-based temporal dependence to solve various challenges
in network domains. In this thesis, we mainly capture the temporal dependence in a
specific traffic model, MMPP. However, copulas have the potential to characterize the
temporal dependence of different types of traffic. Studying the temporal dependence
of different traffic types will benefit applications that rely on traffic models.
Second, another direction for future work is to apply high-order copulas to model
temporal dependence in network traffic. This thesis uses 2-copulas to model the
dependence between arrival counts in two time slots. With high-order copulas, the
dependence among traffic in multiple time slots can be modeled, which would be more
powerful and general for many applications.
Third, we can explore the power of copula models in other network applications.
For instance, when a network is under attack, the temporal dependence of traffic may
change. By modeling temporal dependence with copulas, abnormal traffic could be
differentiated from normal traffic. Thus copula analysis will help to identify different
traffic as well as to detect network anomalies.
Bibliography
[1] Amazon. Auto scaling. http://aws.amazon.com/autoscaling/, Accessed in July
2015.
[2] Allan T Andersen and Bo Friis Nielsen. An application of superpositions of two
state markovian source to the modelling of self-similar behaviour. In Proceedings
of INFOCOM’97, pages 196–204, Kobe, Japan, 1997. IEEE.
[3] Allan T Andersen and Bo Friis Nielsen. A markovian approach for modeling
packet traffic with long-range dependence. IEEE Journal on Selected Areas in
Communications, 16(5):719–732, 1998.
[4] Tomasz Andrysiak and Łukasz Saganowski. Network anomaly detection based
on statistical models with long-memory dependence. In Theory and Engineering
of Complex Systems and Dependability, pages 1–10. Springer, 2015.
[5] Kazim Azam and Michael K Pitt. Bayesian inference for a semi-parametric
copula-based Markov chain, 2014. Working paper.
[6] Soshant Bali and Victor S Frost. An algorithm for fitting MMPP to IP traffic
traces. IEEE Communications Letters, 11(2):207–209, 2007.
[7] Jerry Banks, John S Carson, Barry L Nelson, and David Nicol. Discrete event
system simulation. Prentice hall, 2010.
[8] Aaron K Baughman, Richard Bogdany, Benjie Harrison, Brian O'Connell, Herbie Pearthree, Brandon Frankel, Cameron McAvoy, Sandy Sun, and Clay Upton. IBM predicts cloud computing demand for sports tournaments. Interfaces,
46(1):33–48, 2016.
[9] Michael A Beck, Sebastian A Henningsen, Simon B Birnbach, and Jens B
Schmitt. Towards a statistical network calculus–dealing with uncertainty in arrivals. In INFOCOM, 2014 Proceedings IEEE, pages 2382–2390. IEEE, 2014.
[10] Khalid Begain, Gunter Bolch, and Helmut Herold. Practical performance modeling: application of the MOSEL language. Springer Science & Business Media,
US, 2012.
[11] Vladislav Bína and Radim Jiroušek. A short note on multivariate dependence
modeling. Kybernetika, 49(3):420–432, 2013.
[12] Eric Bouyé, Valdo Durrleman, Ashkan Nikeghbali, Gaël Riboulet, and Thierry
Roncalli. Copulas for finance - a reading guide and some applications. Available
at SSRN 1032533, 2000.
[13] Lothar Breuer and Alfred Kume. An EM algorithm for markovian arrival processes observed at discrete times. In Measurement, Modelling, and Evaluation
of Computing Systems and Dependability and Fault Tolerance, pages 242–258.
Springer, Berlin Heidelberg, 2010.
[14] Yulia Burkatovskaya, Tatiana Kabanova, and Sergey Vorobeychikov. CUSUM
algorithms for parameter estimation in queueing systems with jump intensity of
the arrival process. In Information Technologies and Mathematical Modelling - Queueing Theory and Applications, pages 275–288. Springer, Switzerland, 2015.
[15] Cheng-Shang Chang. Performance guarantees in communication networks.
Springer, 2000.
[16] Tiberiu Chis and Peter G Harrison. Adapting hidden Markov models for online
learning. Electronic Notes in Theoretical Computer Science, 318:109–127, 2015.
[17] Doo Il Choi, Tae-Sung Kim, and Sangmin Lee. Analysis of an MMPP/G/1/K
queue with queue length dependent arrival rates, and its application to preventive congestion control in telecommunication networks. European Journal of
Operational Research, 187(2):652–659, 2008.
[18] Florin Ciucu, Almut Burchard, and Jörg Liebeherr. A network service curve
approach for the stochastic analysis of networks. In ACM SIGMETRICS Performance Evaluation Review, volume 33, pages 279–290. ACM, 2005.
[19] Florin Ciucu, Felix Poloczek, and Jens Schmitt. Sharp bounds in stochastic
network calculus. In Proceedings of the ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems, pages 367–368.
ACM, 2013.
[20] Florin Ciucu and Jens Schmitt. Perspectives on network calculus: no free lunch,
but still good value. In Proceedings of the ACM SIGCOMM 2012 Conference on
Applications, Technologies, Architectures, and Protocols for Computer Communication, pages 311–322. ACM, 2012.
[21] Mark E Crovella and Azer Bestavros. Self-similarity in world wide web traffic:
evidence and possible causes. IEEE/ACM Transactions on Networking, 5(6):835–846, 1997.
[22] Rene L Cruz. A calculus for network delay. I. network elements in isolation.
Information Theory, IEEE Transactions on, 37(1):114–131, Jan 1991.
[23] Rene L Cruz. A calculus for network delay. II. network analysis. Information
Theory, IEEE Transactions on, 37(1):132–141, Jan 1991.
[24] Tibor Csóka and Jaroslav Polec. Modeling Poisson error process on wireless
channels. International Journal of Communication Networks and Information
Security, 7(1):1–7, 2015.
[25] Anirban DasGupta. Poisson processes and applications. In Probability for Statistics and Machine Learning, pages 437–462. Springer, 2011.
[26] Jadran Dobrić and Friedrich Schmid. Testing goodness of fit for parametric
families of copulas: application to financial data. Communications in Statistics–
Simulation and Computation, 34(4):1053–1068, 2005.
[27] Valdo Durrleman, Ashkan Nikeghbali, Thierry Roncalli, et al. Which copula is
the right one, 2000. Working paper.
[28] Robert J Elliott and W Paul Malcolm. Discrete-time expectation maximization algorithms for Markov-modulated Poisson processes. IEEE Transactions on
Automatic Control, 53(1):247–256, 2008.
[29] Paul Embrechts, Filip Lindskog, and Alexander McNeil. Modelling dependence with copulas. Rapport technique, Département de mathématiques, Institut
Fédéral de Technologie de Zurich, Zurich, 2001.
[30] Markus Fidler and Jens B Schmitt. On the way to a distributed systems calculus: An end-to-end network calculus with data scaling. In ACM SIGMETRICS
Performance Evaluation Review, volume 34, pages 287–298. ACM, 2006.
[31] Wolfgang Fischer and Kathleen Meier-Hellstern. The Markov-modulated Poisson
process (MMPP) cookbook. Performance Evaluation, 18(2):149–171, 1993.
[32] Henry J Fowler and Will E Leland. Local area network characteristics, with
implications for broadband network congestion management. IEEE Journal on
Selected Areas in Communications, 9(7):1139–1149, 1991.
[33] Anshul Gandhi, Parijat Dube, Alexei Karve, Andrzej Kochut, and Li Zhang.
Adaptive, model-driven autoscaling for cloud applications. In Proceedings of
ICAC ’14, pages 57–64. USENIX Association, 2014.
[34] Amir Hossein Gandomi, Xin-She Yang, Siamak Talatahari, and Amir Hossein
Alavi. Metaheuristic applications in structures and infrastructures. Newnes,
2013.
[35] Christian Genest and Anne-Catherine Favre. Everything you always wanted
to know about copula modeling but were afraid to ask. Journal of Hydrologic
Engineering, 12(4):347–368, 2007.
[36] Christian Genest, Bruno Rémillard, and David Beaudoin. Goodness-of-fit tests
for copulas: A review and a power study. Insurance: Mathematics and Economics,
44(2):199–213, 2009.
[37] Hamoun Ghanbari, Bradley Simmons, Marin Litoiu, Cornel Barna, and Gabriel
Iszlai. Optimal autoscaling in an IaaS cloud. In Proceedings of ICAC ’12, pages
173–178. ACM, 2012.
[38] Mahmood Mollaei Gharehajlu, Saadan Zokaei, and Yousef Darmani. Statistical analysis of different traffic types effect on QoS of wireless ad hoc networks.
Journal of Information Systems & Telecommunication, 3(1(9)):7–15, 2015.
[39] Daniel P Heyman and David Lucantoni. Modeling multiple IP traffic streams
with rate limits. IEEE/ACM Transactions on Networking, 11(6):948–958, 2003.
[40] Ling Hu. Dependence patterns across financial markets: a mixed copula approach. Applied Financial Economics, 16(10):717–729, 2006.
[41] Alexander Ihler, Jon Hutchins, and Padhraic Smyth. Learning to detect events
with Markov-modulated Poisson processes. ACM Transactions on Knowledge
Discovery from Data, 1(3):13, 2007.
[42] Y.M. Jiang. Network calculus and queueing theory: Two sides of one coin. In
Proceedings of VALUETOOLS 2009, Pisa, Italy, Oct. 2009.
[43] Yuming Jiang. Stochastic network calculus for performance analysis of Internet
networks–an overview and outlook. In Computing, Networking and Communications (ICNC), 2012 International Conference on, pages 638–644. IEEE, 2012.
[44] Yuming Jiang and Yong Liu. Stochastic network calculus. Springer, 2008.
[45] Amin Jula, Elankovan Sundararajan, and Zalinda Othman. Cloud computing
service composition: A systematic literature review. Expert Systems with Applications, 41(8):3809–3824, 2014.
[46] Thomas Karagiannis, Mart Molle, and Michalis Faloutsos. Long-range dependence: ten years of Internet traffic modeling. IEEE Internet Computing, 8(5):57–
64, 2004.
[47] Shoji Kasahara. Internet traffic modeling: Markovian approach to self-similar
traffic and prediction of loss probability for finite queues. IEICE Transactions
on Communications, 84(8):2134–2141, 2001.
[48] Pradeeban Kathiravelu and Luis Veiga. An expressive simulator for dynamic network flows. In Cloud Engineering (IC2E), 2015 IEEE International Conference
on, pages 311–316. IEEE, 2015.
[49] Krishna H Koirala, Ashok K Mishra, Jeremy M D’Antoni, and Joey E Mehlhorn.
Energy prices and agricultural commodity prices: Testing correlation using copulas method. Energy, 81:430–436, 2015.
[50] Krishna H Koirala, Ashok K Mishra, Joey Mehlhorn, et al. Using copula to
test dependency between energy and agricultural commodities. In 2014 Annual
Meeting, July 27-29, 2014, Minneapolis, Minnesota. Agricultural and Applied
Economics Association, 2014.
[51] Jean-Yves Le Boudec and Patrick Thiran. Network calculus: a theory of deterministic queuing systems for the Internet. Springer, 2001.
[52] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature,
521(7553):436–444, 2015.
[53] Ian WC Lee and Abraham O Fapojuwo. Stochastic processes for computer network traffic modeling. Computer Communications, 29(1):1–23, 2005.
[54] Chengzhi Li, Almut Burchard, and Jörg Liebeherr. A network calculus with
effective bandwidth. IEEE/ACM Transactions on Networking, 15(6):1442–1453,
2007.
[55] Ming Mao and Marty Humphrey. Auto-scaling to minimize cost and meet application deadlines in cloud workflows. In Proceedings of SC 2011, pages 1–12.
ACM, 2011.
[56] Microsoft. Azure service fabric. http://azure.microsoft.com/en-us/campaigns/service-fabric/, Accessed in July 2015.
[57] Andrew W Moore and Konstantina Papagiannaki. Toward the accurate identification of network applications. In International Workshop on Passive and Active
Network Measurement, pages 41–54. Springer, 2005.
[58] Luca Muscariello, Marco Mellia, Michela Meo, M Ajmone Marsan, and R Lo
Cigno. Markov models of Internet traffic and a new hierarchical MMPP model.
Computer Communications, 28(16):1835–1851, 2005.
[59] Roger B. Nelsen. An introduction to copulas. Springer, New York, 2006.
[60] David Neuhäuser, Christian Hirsch, Catherine Gloaguen, and Volker Schmidt.
A parametric copula approach for modelling shortest-path trees in telecommunication networks. In Analytical and Stochastic Modeling Techniques and Applications, pages 324–336. Springer, 2013.
[61] Marcel F Neuts. Structured stochastic matrices of M/G/1 type and their applications. Taylor & Francis, New York, USA, 1989.
[62] António Nogueira, Paulo Salvador, Rui Valadas, and António Pacheco. Modeling
self-similar traffic through Markov modulated Poisson processes over multiple
time scales. In High-Speed Networks and Multimedia Communications, pages
550–560. Springer, Berlin Heidelberg, 2003.
[63] Hiroyuki Okamura, Tadashi Dohi, and Kishor S Trivedi. Markovian arrival process parameter estimation with group data. IEEE/ACM Transactions on Networking, 17(4):1326–1339, 2009.
[64] Sergio Pacheco-Sanchez, Giuliano Casale, Bryan Scotney, Sally McClean, Gerard Parr, and Stephen Dawson. Markovian workload characterization for QoS
prediction in the cloud. In 2011 IEEE International Conference on CLOUD,
pages 147–154, Washington, D.C., USA, 2011. IEEE.
[65] Andrew Patton. Copula methods for forecasting multivariate time series. Handbook of Economic Forecasting, 2:899–960, 2012.
[66] Andrew J Patton. Copula-based models for financial time series. In Handbook
of Financial Time Series, pages 767–785. Springer, 2009.
[67] Andrew John Patton. Applications of copula theory in financial econometrics.
PhD thesis, University of California, San Diego, 2002.
[68] Felix Poloczek and Florin Ciucu. A martingale-envelope and applications. ACM
SIGMETRICS Performance Evaluation Review, 41(3):43–45, 2014.
[69] Ali Rajabi and Johnny W Wong. MMPP characterization of web application
traffic. In 2012 IEEE 20th International Symposium on MASCOTS, pages 107–
114, Washington, D.C., USA, 2012. IEEE.
[70] Ali Rajabi and Johnny W Wong. Provisioning of computing resources for web
applications under time-varying traffic. In 2014 IEEE 22nd International Symposium on MASCOTS, pages 152–157, Paris, France, 2014. IEEE.
[71] Bruno Rémillard, Nicolas Papageorgiou, and Frédéric Soustra. Copula-based
semiparametric models for multivariate time series. Journal of Multivariate Analysis, 110:30–42, 2012.
[72] William JJ Roberts, Yariv Ephraim, and Elvis Dieguez. On Rydén’s EM algorithm for estimating MMPPs. IEEE Signal Processing Letters, 13(6):373–376,
2006.
[73] Sheldon M. Ross. Introduction to probability models. Academic Press, Burlington,
2003.
[74] Nilabja Roy, Abhishek Dubey, and Aniruddha Gokhale. Efficient autoscaling in
the cloud using predictive models for workload forecasting. In Proceedings of
CLOUD, pages 500–507. IEEE, 2011.
[75] Tobias Rydén. Parameter estimation for Markov modulated Poisson processes.
Stochastic Models, 10(4):795–829, 1994.
[76] Tobias Rydén. An EM algorithm for estimation in Markov-modulated Poisson
processes. Computational Statistics & Data Analysis, 21(4):431–447, 1996.
[77] Paulo Salvador, Rui Valadas, and António Pacheco. Multiscale fitting procedure
using Markov modulated Poisson processes. Telecommunication Systems, 23(1-2):123–148, 2003.
[78] Mischa Schwartz. Broadband integrated networks. Prentice Hall PTR New Jersey,
1996.
[79] Steven L Scott. Detecting network intrusion using a Markov modulated nonhomogeneous Poisson process. Available online, 2001.
[80] Shou-Kuo Shao, Malla Reddy Perati, Meng-Guang Tsai, Hen-Wai Tsao, and
Jingshown Wu. Generalized variance-based Markovian fitting for self-similar
traffic modelling. IEICE Transactions on Communications, 88(4):1493–1502,
2005.
[81] Upendra Sharma, Prashant Shenoy, Sambit Sahu, and Anees Shaikh. A cost-aware elasticity provisioning system for the cloud. In Proceedings of ICDCS,
pages 559–570. IEEE, 2011.
[82] David Starobinski and Moshe Sidi. Stochastically bounded burstiness for communication networks. IEEE Trans. Information Theory, 46(1):206–212, Jan.
2000.
[83] Maarten RC Van Oordt and Chen Zhou. The simple econometrics of tail dependence. Economics Letters, 116(3):371–373, 2012.
[84] Luis M Vaquero, Luis Rodero-Merino, and Rajkumar Buyya. Dynamically scaling applications in the cloud. ACM SIGCOMM Computer Communication Review, 41(1):45–52, 2011.
[85] John Wilkes. More Google cluster data. Google Research Blog, 2011.
[86] Ury Yechiali and Pinhas Naor. Queuing problems with heterogeneous arrivals
and service. Operations Research, 19(3):722–734, 1971.
[87] Tadafumi Yoshihara, Shoji Kasahara, and Yutaka Takahashi. Practical time-scale fitting of self-similar traffic with Markov-modulated Poisson process.
Telecommunication Systems, 17(1-2):185–211, 2001.
[88] Ming Yu and Mengchu Zhou. A model reduction method for traffic described by
MMPP with unknown rate limit. IEEE Communications Letters, 10(4):302–304,
2006.
[89] Xinggong Zhang, Yang Xu, Hao Hu, Yong Liu, Zongming Guo, and Yao Wang.
Profiling Skype video calls: Rate control and video quality. In INFOCOM, 2012
Proceedings IEEE, pages 621–629. IEEE, 2012.