PICS: Probabilistic Inference for ChIP-seq

Date

2011

Authors

Zhang, Xuekui
Robertson, G.
Krzywinski, M.
Droit, A.
Jones, S.
Gottardo, R.
Ning, Kaida

Publisher

Biometrics

Abstract

ChIP-seq, which combines chromatin immunoprecipitation with massively parallel short-read sequencing, can profile in vivo genome-wide transcription factor-DNA association with higher sensitivity, specificity and spatial resolution than ChIP-chip. While it presents new opportunities for research, ChIP-seq poses new challenges for statistical analysis that derive from the complexity of the biological systems characterized and the variability and biases in its digital sequence data. We propose a method called PICS (Probabilistic Inference for ChIP-seq) for extracting information from ChIP-seq aligned-read data in order to identify regions bound by transcription factors. PICS identifies enriched regions by modeling local concentrations of directional reads, and uses DNA fragment length prior information to discriminate closely adjacent binding events via a Bayesian hierarchical t-mixture model. Its per-event fragment length estimates also allow it to remove from analysis regions that have atypical lengths. PICS uses pre-calculated, whole-genome read mappability profiles and a truncated t-distribution to adjust binding event models for reads that are missing due to local genome repetitiveness. It estimates uncertainties in model parameters that can be used to define confidence regions on binding event locations and to filter estimates. Finally, PICS calculates a per-event enrichment score relative to a control sample, and can use a control sample to estimate a false discovery rate. We compared PICS to the alternative methods MACS, QuEST, and CisGenome, using published GABP and FOXA1 data sets from human cell lines, and found that PICS’ predicted binding sites were more consistent with computationally predicted binding motifs.

Keywords

Bayesian hierarchical model, ChIP-seq, EM algorithm, Mappability, Missing values, Mixture model, Transcription factor, Truncated data, t-distribution

Citation

Zhang, X., Robertson, G., Krzywinski, M., Ning, K., Droit, A., Jones, S., & Gottardo, R. (2011). PICS: Probabilistic inference for ChIP-seq. Biometrics, 67(1):151-63. https://doi.org/10.1111/j.1541-0420.2010.01441.x