Heterogeneous Continuous Dynamic Bayesian Networks with Flexible Structure and Inter-Time Segment Information Sharing
Frank Dondelinger
[email protected]
Biomathematics and Statistics Scotland, JCMB, EH9 3JZ, Edinburgh, UK
Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh

Sophie Lèbre
[email protected]
Université de Strasbourg, LSIIT - UMR 7005, 67412 Illkirch, France

Dirk Husmeier
Biomathematics and Statistics Scotland, JCMB, EH9 3JZ, Edinburgh, UK
Abstract

Classical dynamic Bayesian networks (DBNs) are based on the homogeneous Markov assumption and cannot deal with heterogeneity and non-stationarity in temporal processes. Various approaches to relax the homogeneity assumption have recently been proposed. The present paper aims to address the shortcomings of three recent versions of heterogeneous DBNs along the following lines: (i) avoiding the need for data discretization, (ii) increasing the flexibility over a time-invariant network structure, and (iii) avoiding over-flexibility and overfitting by introducing a regularization scheme based on inter-time segment information sharing. The improved method is evaluated on synthetic data and compared with alternative published methods on gene expression time series from Drosophila melanogaster.
(Appearing in Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 2010. Copyright 2010 by the author(s)/owner(s).)

1. Introduction

There has recently been considerable interest in structure learning of dynamic Bayesian networks (DBNs), with a variety of applications in signal processing and computational biology; see e.g. (Robinson & Hartemink, 2009) and (Grzegorczyk & Husmeier, 2009). The standard assumption underlying DBNs is that time-series
have been generated from a homogeneous Markov process. This assumption is too restrictive in many applications and can potentially lead to erroneous conclusions. In the recent past, various research efforts have therefore addressed this issue and proposed more flexible models. (Robinson & Hartemink, 2009) proposed a discrete heterogeneous DBN, which allows for different structures in different segments of the time series, with a regularization term penalizing differences among the structures. (Grzegorczyk & Husmeier, 2009) proposed a continuous heterogeneous DBN, in which only the parameters are allowed to vary, with a common network structure providing information sharing among the time series segments. (Lèbre, 2007) proposed an alternative continuous heterogeneous DBN, which is more flexible in that it allows the network structure to vary among the segments. The model proposed in (Ahmed & Xing, 2009) and (Kolar et al., 2009) can be regarded as a heterogeneous DBN where inference is based on sparse L1-regularized regression (LASSO) of the interaction parameters, and a second L1 regularization term penalizes differences between networks associated with different segments. Parameter estimation in (Ahmed & Xing, 2009) and (Kolar et al., 2009) is based on penalized maximum likelihood for fixed regularization parameters, which are optimized using BIC or cross-validation. In the present paper, we follow (Robinson & Hartemink, 2009), (Grzegorczyk & Husmeier, 2009) and (Lèbre, 2007) to infer the network structure, the interaction parameters, as well as the number and location of changepoints in a Bayesian framework by sampling
them from the posterior distribution with RJMCMC (Green, 1995). The objective of our paper is to propose a model that addresses the principled shortcomings of the three Bayesian methods mentioned above. Unlike (Robinson & Hartemink, 2009), our model is continuous and therefore avoids the information loss inherent in a discretization of the data. Unlike (Grzegorczyk & Husmeier, 2009), our model allows the network structure to change among segments, leading to greater model flexibility. As an improvement on (Lèbre, 2007), our model introduces information sharing among time series segments, which provides an essential regularization effect.
2. Background

This section summarizes briefly the heterogeneous DBN proposed in (Lèbre, 2007). The model is based on the first-order Markov assumption. This assumption is not critical, though, and a generalization to higher orders is straightforward. The value that a node in the graph takes on at time $t$ is determined by the values that the node's parents take on at the previous time point, $t-1$, as well as the time series segment. More specifically, the conditional probability of the observation associated with a node at a given time point is a conditional Gaussian distribution, where the conditional mean is a linear weighted sum of the parent values at the previous time point, and the weights themselves depend on the time series segment. The latter dependence adds extra flexibility to the model and thereby relaxes the homogeneity assumption. The interaction weights, the variance parameters, the number of potential parents, the location of changepoints demarcating the time series segments, and the number of changepoints are given (conjugate) prior distributions in a hierarchical Bayesian model. For inference, all these quantities are sampled from the posterior distribution with RJMCMC. Note that a complete specification of all node-parent configurations determines the structure of a regulatory network: each node receives incoming directed edges from each node in its parent set. In what follows, we will refer to nodes as genes and to the network as a gene regulatory network. The method is not restricted to molecular systems biology, though.

2.1. Model

Multiple changepoints. Let $p$ be the number of observed genes, whose expression values $y = \{y_i(t)\}_{1 \le i \le p,\, 1 \le t \le N}$ are measured at $N$ time points. $\mathcal{M}$ represents a directed graph, i.e. the network defined by a set of directed edges among the $p$ genes. $\mathcal{M}_i$ is
the subnetwork associated with target gene $i$, determined by the set of its parents (nodes with a directed edge feeding into gene $i$). The regulatory relationships among the genes, defined by $\mathcal{M}$, may vary across time, which we model with a multiple changepoint process. For each target gene $i$, an unknown number $k_i$ of changepoints define $k_i + 1$ non-overlapping segments. Segment $h = 1, \dots, k_i + 1$ starts at changepoint $\xi_i^{h-1}$ and stops before $\xi_i^h$, where $\xi_i = (\xi_i^0, \dots, \xi_i^{h-1}, \xi_i^h, \dots, \xi_i^{k_i+1})$ with $\xi_i^{h-1} < \xi_i^h$. To delimit the bounds, $\xi_i^0 = 2$ and $\xi_i^{k_i+1} = N + 1$. Thus vector $\xi_i$ has length $|\xi_i| = k_i + 2$. The set of changepoints is denoted by $\xi = \{\xi_i\}_{1 \le i \le p}$. This changepoint process induces a partition of the time series, $y_i^h = (y_i(t))_{\xi_i^{h-1} \le t < \xi_i^h}$, with different network structures $\mathcal{M}_i^h$ associated with the different segments $h$.

Regression model. Within segment $h$, the expression of gene $i$ depends on the expression of its parents at the previous time point through a regression model defined by (a) a parent set $\mathcal{M}_i^h$ of size $s_i^h$ and (b) a set of parameters $((a_{ij}^h)_{0 \le j \le p}, \sigma_i^h)$, with $a_{ij}^h \in \mathbb{R}$ and $\sigma_i^h > 0$. For all $j \neq 0$, $a_{ij}^h = 0$ if $j \notin \mathcal{M}_i^h$. For all genes $i$, for all time points $t$ in segment $h$ ($\xi_i^{h-1} \le t < \xi_i^h$), the random variable $X_i(t)$ representing the expression of gene $i$ at time $t$ depends on the $p$ variables $\{X_j(t-1)\}_{1 \le j \le p}$ according to

$$X_i(t) = a_{i0}^h + \sum_{j \in \mathcal{M}_i^h} a_{ij}^h X_j(t-1) + \varepsilon_i(t) \qquad (1)$$
where the noise $\varepsilon_i(t)$ is assumed to be Gaussian with mean 0 and variance $(\sigma_i^h)^2$, $\varepsilon_i(t) \sim N(0, (\sigma_i^h)^2)$.

2.2. Prior

The $k_i + 1$ segments are delimited by $k_i$ changepoints, where $k_i$ is distributed a priori as a truncated Poisson random variable with mean $\lambda$ and maximum $\bar{k} = N - 2$:¹

$$P(k_i \mid \lambda) \propto \frac{\lambda^{k_i}}{k_i!}\, \mathbb{1}_{\{k_i \le \bar{k}\}}.$$

Conditional on $k_i$ changepoints, the changepoint positions vector $\xi_i = (\xi_i^0, \xi_i^1, \dots, \xi_i^{k_i+1})$ takes non-overlapping integer values, which we take to be uniformly distributed a priori. There are $(N - 2)$ possible positions for the $k_i$ changepoints, thus vector $\xi_i$ has prior density $P(\xi_i \mid k_i) = 1 / \binom{N-2}{k_i}$. For all genes $i$ and all segments $h$, the number $s_i^h$ of parents for node $i$
¹ A restrictive Poisson prior encourages sparsity of the network, and is therefore comparable to a sparse exponential prior, or an approach based on the LASSO.
follows a truncated Poisson distribution with mean $\Lambda$ and maximum $\bar{s} = 5$:

$$P(s_i^h \mid \Lambda) \propto \frac{\Lambda^{s_i^h}}{s_i^h!}\, \mathbb{1}_{\{s_i^h \le \bar{s}\}}. \qquad (2)$$
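Both truncated Poisson priors introduced above, on the number of changepoints $k_i$ and on the number of parents $s_i^h$ in eq. (2), share the same form and can be sketched as follows. This is a minimal illustration, not the authors' implementation: the number of time points $N = 20$ and the hyperparameter values $\lambda$, $\Lambda$ are arbitrary, and we take the support of the parent-count prior to be $\{1, \dots, 5\}$ (whether $s_i^h = 0$ is allowed is not explicit in the text).

```python
from math import comb, factorial

def truncated_poisson(k, mean, support):
    """P(k) proportional to mean^k / k!, restricted (and normalized) on support."""
    if k not in support:
        return 0.0
    norm = sum(mean ** j / factorial(j) for j in support)
    return (mean ** k / factorial(k)) / norm

N = 20              # number of time points (arbitrary for this sketch)
lam, Lam = 1.0, 2.0  # Poisson means for changepoints / parents (arbitrary)

# Prior on the number of changepoints k_i: support {0, ..., N - 2}.
prior_k = {k: truncated_poisson(k, lam, range(N - 1)) for k in range(N - 1)}

# Prior on the number of parents s_i^h (eq. 2): support {1, ..., 5} here.
prior_s = {s: truncated_poisson(s, Lam, range(1, 6)) for s in range(1, 6)}

# Given k_i changepoints, the positions are uniform over the C(N-2, k_i)
# possible non-overlapping placements.
def prior_positions(k, N):
    return 1.0 / comb(N - 2, k)
```

Normalizing over the truncated support makes both priors proper, which the RJMCMC acceptance ratios implicitly rely on.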
Conditional on $s_i^h$, the prior for the parent set $\mathcal{M}_i^h$ is a uniform distribution over all parent sets with cardinality $s_i^h$: $P(\mathcal{M}_i^h \mid |\mathcal{M}_i^h| = s_i^h) = 1/\binom{p}{s_i^h}$. The overall prior on the network structures is given by marginalization:

$$P(\mathcal{M}_i^h \mid \Lambda) = \sum_{s=1}^{\bar{s}} P(\mathcal{M}_i^h \mid s_i^h = s)\, P(s_i^h = s \mid \Lambda) \qquad (3)$$
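Because $P(\mathcal{M}_i^h \mid s_i^h)$ is nonzero only when $s_i^h = |\mathcal{M}_i^h|$, a single term of the marginalization sum survives, so the structure prior can be computed in closed form. The following sketch (an illustration, with arbitrary $p$ and $\Lambda$, and the parent-count support taken as $\{1, \dots, 5\}$) verifies that the resulting prior is normalized over all candidate parent sets.

```python
from math import comb, factorial
from itertools import combinations

S_MAX = 5  # maximum number of parents, as in eq. (2)

def prior_num_parents(s, Lam):
    """Truncated Poisson prior P(s | Lambda) on the support {1, ..., S_MAX}."""
    if not 1 <= s <= S_MAX:
        return 0.0
    norm = sum(Lam ** j / factorial(j) for j in range(1, S_MAX + 1))
    return (Lam ** s / factorial(s)) / norm

def prior_structure(parent_set, p, Lam):
    """P(M | Lambda) via the marginalization in eq. (3): P(M | s) = 1/C(p, s)
    is nonzero only for s = |M|, so only that term of the sum survives."""
    s = len(parent_set)
    return prior_num_parents(s, Lam) / comb(p, s)

# Sanity check: summing over every candidate parent set of each allowed
# cardinality recovers a normalized prior.
p, Lam = 8, 2.0
total = sum(prior_structure(M, p, Lam)
            for s in range(1, S_MAX + 1)
            for M in combinations(range(p), s))
```

Note that all parent sets of the same cardinality receive identical prior mass; only the cardinality is penalized, which is how the prior encourages sparsity.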
Conditional on the parent set $\mathcal{M}_i^h$ of size $s_i^h$, the $s_i^h + 1$ regression coefficients, denoted by $a_{\mathcal{M}_i^h} = (a_{i0}^h, (a_{ij}^h)_{j \in \mathcal{M}_i^h})$, are assumed zero-mean multivariate Gaussian with covariance matrix $(\sigma_i^h)^2 \Sigma_{\mathcal{M}_i^h}$,

$$P(a_{\mathcal{M}_i^h} \mid \mathcal{M}_i^h, \sigma_i^h) = \left|2\pi(\sigma_i^h)^2 \Sigma_{\mathcal{M}_i^h}\right|^{-\frac{1}{2}} \exp\left(-\frac{a_{\mathcal{M}_i^h}^\dagger \Sigma_{\mathcal{M}_i^h}^{-1} a_{\mathcal{M}_i^h}}{2(\sigma_i^h)^2}\right),$$

where the symbol $\dagger$ denotes matrix transposition, $\Sigma_{\mathcal{M}_i^h}^{-1} = \delta^{-2} D_{\mathcal{M}_i^h}(y)^\dagger D_{\mathcal{M}_i^h}(y)$, and $D_{\mathcal{M}_i^h}(y)$ is the $(\xi_i^h - \xi_i^{h-1}) \times (s_i^h + 1)$ matrix whose first column is a vector of 1's (for the constant in model (1)) and each $(j+1)$th column contains the observed values $(y_j(t))_{\xi_i^{h-1} \le t < \xi_i^h}$
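As an illustration of these definitions, the sketch below builds the design matrix $D_{\mathcal{M}_i^h}(y)$ for a short synthetic segment with a single parent gene, forms the Gram matrix $D^\dagger D$, and evaluates the implied prior covariance $\delta^2 (D^\dagger D)^{-1}$ of the regression coefficients. The single-parent case keeps the matrices $2 \times 2$ so the inverse is closed-form; the expression values and $\delta^2$ are arbitrary, and this is only a sketch of the construction, not the authors' code.

```python
# Synthetic expression values of the single parent gene over one segment;
# the segment length n plays the role of xi^h - xi^{h-1}.
segment_parent_values = [0.5, -1.2, 0.3, 2.0, -0.7]  # arbitrary numbers
n = len(segment_parent_values)

# Design matrix D: first column all ones (for the constant a_{i0}^h in
# model (1)), second column the parent's observed values on the segment.
D = [[1.0, x] for x in segment_parent_values]

# Gram matrix G = D^T D (2 x 2 here).
G = [[sum(D[t][r] * D[t][c] for t in range(n)) for c in range(2)]
     for r in range(2)]

# Prior covariance of the regression coefficients (up to the factor
# (sigma_i^h)^2): Sigma = delta^2 * (D^T D)^{-1}, via the closed-form
# 2 x 2 inverse.
delta2 = 4.0  # delta^2, arbitrary hyperparameter value
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
Sigma = [[ delta2 * G[1][1] / det, -delta2 * G[0][1] / det],
         [-delta2 * G[1][0] / det,  delta2 * G[0][0] / det]]
```

This is a Zellner-style construction: the prior covariance shrinks where the observed parent values carry more signal, which keeps the conjugate posterior over the coefficients analytically tractable.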