Random effects mixture models for clustering time series




Coke, Geoffrey Bryan




In this thesis, we study cluster analysis of time series and develop a new mixture model that is effective for such analysis. Our study is motivated by a real-life problem: clustering time series of electricity load (demand) for BC Hydro customers. BC Hydro collects electricity load data for selected customers in order to group them into classes that are homogeneous in terms of load. Such homogeneous classes, or clusters, are useful for rate setting and long-term generation capacity planning. The BC Hydro data set used in this thesis contains 923 load series, one for each of 923 BC Hydro customers. Each load series consists of repeated hourly load measurements over a one-year period and is therefore a long time series.

The literature offers a number of methods for clustering general multivariate data, but these are not effective for such long time series. This is because time series such as the BC Hydro customers' load series typically have high dimensions and special covariance structures, and existing clustering methods are not designed to accommodate these characteristics.

The contributions of this thesis are the following. First, we develop a mixture-model-based clustering method for time series which not only handles their high dimensions but also makes effective use of their special covariance structures. Our method is based on the random effects mixture model, a mixture model which we develop specifically for time series. We devise a special EM algorithm, based on the AECM algorithm of Meng and van Dyk (1997), to handle the computation of the random effects mixture model. Once the model is computed, we assign individual time series to clusters according to their posterior probabilities of belonging to the components of the mixture model. Then, to demonstrate the application of our method, we apply it to the BC Hydro data and obtain a new clustering of the BC Hydro sample which is superior to the existing clustering in terms of relevance and interpretability.
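
The final step described in the abstract, assigning each series to the component with the highest posterior probability, can be illustrated with a minimal sketch. This is not the random effects mixture model of the thesis: it assumes, purely for illustration, Gaussian components with a single noise variance each and parameters that are already known. The function names, parameter names, and toy values are all hypothetical, and the AECM fitting step is not shown.

```python
import numpy as np

def posterior_probabilities(series, weights, means, variances):
    """Posterior probability of each mixture component for each series.

    series:    (n, T) array, one time series per row
    weights:   (K,) mixing proportions summing to 1
    means:     (K, T) component mean curves
    variances: (K,) one noise variance per component (assumed simplification)
    """
    n, T = series.shape
    # Squared distance from every series to every component mean: shape (n, K)
    sq_dist = ((series[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    # log(weight * Gaussian density), kept in log space because T is large
    log_joint = (np.log(weights)
                 - 0.5 * T * np.log(2 * np.pi * variances)
                 - 0.5 * sq_dist / variances)
    # Subtract the row maximum before exponentiating to avoid underflow
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def assign_clusters(post):
    """Label each series with its most probable component."""
    return post.argmax(axis=1)

# Toy data (made up): two 24-point series, one near level 0 and one near level 5
rng = np.random.default_rng(0)
T = 24
means = np.vstack([np.zeros(T), np.full(T, 5.0)])
series = means + 0.1 * rng.standard_normal((2, T))
weights = np.array([0.5, 0.5])
variances = np.array([1.0, 1.0])

post = posterior_probabilities(series, weights, means, variances)
labels = assign_clusters(post)
```

Working in log space matters for series as long as those described here, because the product of thousands of density terms would otherwise underflow to zero.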



Time series, Electricity