## José M. Bernardo, M. J. Bayarri, James O. Berger, A. P. Dawid, David Heckerman, Adrian F. M. Smith, and Mike West

Print publication date: 2011

Print ISBN-13: 9780199694587

Published to Oxford Scholarship Online: January 2012

DOI: 10.1093/acprof:oso/9780199694587.001.0001

Show Summary Details
Page of

PRINTED FROM OXFORD SCHOLARSHIP ONLINE (oxford.universitypressscholarship.com). (c) Copyright Oxford University Press, 2021. All Rights Reserved. An individual user may print out a PDF of a single chapter of a monograph in OSO for personal use. Subscriber: null; date: 10 May 2021

# Parameter Inference for Stochastic Kinetic Models of Bacterial Gene Regulation: A Bayesian Approach to Systems Biology

Chapter:
(p.679) Parameter Inference for Stochastic Kinetic Models of Bacterial Gene Regulation: A Bayesian Approach to Systems Biology
Source:
Bayesian Statistics 9
Publisher:
Oxford University Press
DOI:10.1093/acprof:oso/9780199694587.003.0023

# Abstract and Keywords

Bacteria are single‐celled organisms which often display heterogeneous behaviour, even among populations of genetically identical cells in uniform environmental conditions. Markov process models arising from the theory of stochastic chemical kinetics are often used to understand the genetic regulation of the behaviour of individual bacterial cells. However, such models often contain uncertain parameters which need to be estimated from experimental data. Parameter estimation for complex high‐dimensional Markov process models using diverse, partial, noisy and poorly calibrated time‐course experimental data is a challenging inferential problem, but a computationally intensive Bayesian approach turns out to be effective. The utility and added‐ value of the approach is demonstrated in the context of a stochastic model of a key cellular decision made by the gram‐positive bacterium Bacillus subtilis, using quantitative data from single‐cell fluorescence microscopy and flow cytometry experiments.

Summary

Bacteria are single‐celled organisms which often display heterogeneous behaviour, even among populations of genetically identical cells in uniform environmental conditions. Markov process models arising from the theory of stochastic chemical kinetics are often used to understand the genetic regulation of the behaviour of individual bacterial cells. However, such models often contain uncertain parameters which need to be estimated from experimental data. Parameter estimation for complex high‐dimensional Markov process models using diverse, partial, noisy and poorly calibrated time‐course experimental data is a challenging inferential problem, but a computationally intensive Bayesian approach turns out to be effective. The utility and added‐ value of the approach is demonstrated in the context of a stochastic model of a key cellular decision made by the gram‐positive bacterium Bacillus subtilis, using quantitative data from single‐cell fluorescence microscopy and flow cytometry experiments.

Keywords and Phrases: Bacillus subtilis; GENETIC REGULATION; GFP; LIKELIHOOD‐FREE MCMC; MOTILITY; TIME‐LAPSE FLUORESCENCE MICROSCOPY.

# 1. Introduction

Bacteria are single‐celled prokaryotic organisms. Despite being relatively simple organisms, they often display complex heterogeneous behaviour, even among populations of genetically identical cells in uniform environmental conditions (Wilkinson 2009). Markov process models arising from the theory of stochastic chemical kinetics (Wilkinson 2006) are often used to understand the genetic regulation of the behaviour of individual bacterial cells. However, such models often contain uncertain parameters which need to be estimated from experimental data. Parameter estimation for complex high‐dimensional Markov process models using diverse, partial, noisy and poorly calibrated time‐course experimental data is a challenging (p.680) inferential problem, but several previous studies have demonstrated that progress is possible (Golightly and Wilkinson 2005, 2006, Boys et al., 2008, Henderson et al., 2009). It will be demonstrated here that a computationally intensive Bayesian approach can, in principle, be effective for understanding the information in the data regarding plausible parameter values. The utility and added‐value of the approach will be demonstrated in the context of a stochastic model of a key cellular decision, the decision to become motile, made by the gram‐positive bacterium Bacillus subtilis. The inferential issues will be illustrated using simulated data based on single‐cell fluorescence microscopy and flow cytometry experiments.

# 2. Bacterial Gene Regulation

## 2.1. Bacillus Subtilis

Bacillus subtilis (Sonenshein et al., 2002) is the most widely studied model gram positive bacterium. It is relatively easy to culture in the lab, and is highly genetically tractable, being naturally competent for genetic transformation (Dubnau 1991). It was the first gram positive bacterium to be sequenced, and its genome is relatively well characterized (Moszer et al., 2002). B. subtilis has a relatively interesting life cycle, and must make expensive cellular decisions on the basis of the information it has regarding its environment. The default behaviour for a B. subtilis cell in a rich nutrient environment is to grow and divide, but in response to certain stresses it may choose to become competent for genetic transformation (Dubnau 1991), sporulate (Errington 1993), or become motile (Kearns and Losick 2005).

## 2.2. Motility Regulation

One of the key decisions a B. subtilis cell must make is whether or not to grow flagella and become motile (Kearns and Losick 2005), leading to the possibility of swimming away from its current location to a new and better environment. Like most other decision systems in living organisms, the precise details of how this decision is made is extremely complex. In this paper we will focus on one small aspect of this problem, in order to illustrate the important concepts without getting lost in biological complexity.

Bacteria typically use special proteins called σ factors in order to regulate transcription. Most genes cannot be transcribed (are turned off) unless an appropriate σ factor is available. The B. subtilis sigma factor σ D is key for the regulation of motility. Many of the genes and operons encoding motility‐related proteins are governed by this σ factor, and so understanding its regulation is key to understanding the motility decision. The gene for σ D is embedded in a large operon containing several other motility‐related genes, known as the fla/che operon. The fla/che operon itself is under the control of another σ factor, σ A, but is also regulated by other proteins. In particular, transcription of the operon is strongly repressed by the protein CodY, which is encoded upstream of fla/che. CodY inhibits transcription by binding to the fla/che promoter. Since CodY is upregulated in good nutrient conditions, this is thought to be a key mechanism for motility regulation.

As previously mentioned, many motility‐related genes are under the control of σ D. For simplicity we focus here on one such gene, hag, which encodes the protein flagellin (or Hag), the key building block of the flagella. It so happens that hag is also directly repressed by CodY. The regulation structure can be illustrated using the simple schematic given in Figure 1. It should be emphasized that this is (p.681)

Figure 1: A small component of the regulation of motility in B. subtilis

only one small component of the regulation of motility, and that a great deal more is known about the complex regulation of motility than is presented here. However, the aspect presented here is sufficient to illustrate the essential statistical issues.

# 3. Modelling And Inference

## 3.1. Stochastic Kinetic Models

Computational systems biology (Kitano 2002) is concerned with developing dynamic simulation models of biological processes such as the motility regulation network model previously described. Such models are useful for developing a quantitative understanding of the process, for testing current understanding of the mechanisms, and to allow in silico experimentation that would be difficult or time consuming to carry out on the real system in the lab. Traditionally, continuous deterministic models were developed, typically using an assumption of mass‐action chemical kinetics leading to systems of ordinary differential equations. However, in recent years there has been increasing recognition of the importance of modelling intrinsic stochasticity in intra‐cellular biological processes, not captured by the traditional approaches (Wilkinson 2009). The theory of stochastic chemical kinetics forms the basis of a more realistic class of models, which models cellular dynamics using a Markov jump process (Wilkinson 2006).

For mass‐action stochastic kinetic models, it is assumed that the state of the system at a given time is represented by the number of molecules of each reacting chemical “species” present in the system at that time, and that the state of the system is changed at discrete times according to one or more reaction “channels”. We assume there are u species denoted 𝒳1,…,𝒳u, and v reactions, ℛ1,…,ℛv. Each reaction ℛi is of the form

$Display mathematics$

Here p ij denotes the number of molecules of Xj that will be consumed by reaction ℛi, and q ij the number of molecules produced. Let P be the v × u matrix formed from the p ij and Q be the corresponding matrix of the q ij. We can write the entire reaction system in matrix/vector form as

$Display mathematics$

The matrices P and Q are typically sparse, and this fact can be exploited in computational algorithms. The u × v matrix S = (QP) is the stoichiometry matrix of the system, and is especially important in computational analysis of stochastic kinetic models, as its columns encode the change of state in the system caused by the different reaction events. Let X jt denote the number of molecules of X j at time t (p.682) and X t = (X 1t,…,X ut). We assume that reaction ℛi has hazard (or rate law, or propensity) h i(X t, c i), where c i is a rate parameter. We put c = (c 1,…, c v) and h(X t, c) = (h 1(X t, c 1),…, h v(X t, c v)) in order to simplify notation. Under certain assumptions (Gillespie 1992), it can be shown that the system evolves as a Markov jump process with independent reaction hazards for each reaction channel. Further, for mass‐action stochastic kinetics, the algebraic form of each rate law is given as

$Display mathematics$

Hence, given a reaction network structure, the vector of reaction rate constants, c, determines the stochastic behaviour of the system.

A mathematical representation of this Markov jump process can be constructed, known as the random time change representation (Kurtz 1972), which turns out to be very helpful for mathematical analysis of the system. Let R it denote the number of reactions of type ℛi in the time window (0,t], and then define R t = (R 1t, …, R vt). It should be clear that X tX 0 = SR t (this is known as the state updating equation). Now for i = 1,…, v, define N i(t) to be the count functions for v independent unit Poisson processes. Then

$Display mathematics$

Putting N(t 1,…, t v) = (N 1(t 1),…, N v(t v)), we can write

$Display mathematics$

to get

$Display mathematics$

the random time‐change representation of the Markov jump process. See Ball et al. (2006) for applications of this representation to analysis of approximate system dynamics.

This process is typically nonlinear with unbounded state space. Consequently the models are generally analytically intractable, but realizations of the model can be simulated exactly using a computer, using a discrete event simulation algorithm, known in this context as the Gillespie algorithm (Gillespie 1977). The inference, or inverse problem, is to determine plausible values for the rate constants, c, from partial, discrete and noisy observations of the system state.

## 3.2. Bayesian Inference for Complex Markov Process Models

Concepts and notation. At some level, inference for complex Markov process models is not fundamentally more difficult than for many other high‐dimensional nonlinear statistical models. Given complete information about the trajectory of the process over a given fixed time window, the likelihood of the process can be computed exactly. If we observe the process x = {x(t) : t ∈ [0,T]} where x(t) represents (p.683) the values of X t for one particular (observed) realization of the stochastic process, we can determine from the reaction structure the time (t i) and type (ν t) of the n reaction events occurring in the time interval (0,T]. Suppose that the ith reaction event is (t i, ν i), i = 1,… n. Also define t 0 = 0, t n+1 = T. Let r j be the total number type j events occurring $( so n= ∑ j = 1 v r j )$. Then the complete‐data likelihood for the observed sample path is

$Display mathematics$

See Chapter 10 of Wilkinson (2006) for further details. Note that the integral occurring in the above equation is just a finite sum, so there are no computational issues associated with evaluating it (though as usual, it is numerically advantageous to actually work with the log of the likelihood).

There are further simplifications which arise for rate laws of the form h i(x, c i) = c i g i (x) (true for basic mass‐action stochastic kinetic models), as then the complete‐ data likelihood factorises as

$Display mathematics$

These component likelihoods are semi‐conjugate to priors of the form c j ~ Γ(a j, b j) and hence can be combined to get full‐conditional posterior distributions of the form

$Display mathematics$

All of the inferential complications arise from the fact that, in practise, we cannot hope to observe the system perfectly over any finite time window. Observations of the system state will typically occur at discrete times, will usually be partial (not all species in the model will be measured), and will often be subject to measurement error. This data‐poor scenario leads to a challenging missing‐data problem. Consider first the best‐case scenario—perfect observation of the system at discrete times. Conditional on discrete‐time observations, the Markov process breaks up into a collection of independent bridge processes that appear not to be analytically tractable. We can attempt to use MCMC to explore sample paths consistent with the end‐points of the random intervals. Considering just one interval, we need to explore r t consistent with x t+1x t= Sr t. Both reversible jump and block‐updating strategies are possible—see Boys et al. (2008) for details, but these standard MCMC techniques do not scale well to large, complex models with very large numbers of reaction events.

One way forward is to approximate the true Markov jump process by a diffusion process, known in this context as the chemical Langevin equation (CLE) (Gillespie 2000). Then techniques for Bayesian estimation of stochastic differential equation models can be applied (Golightly and Wilkinson 2005, 2006, 2008), but this approach (p.684) too is far from straightforward, and for many interesting problems the diffusion approximation will be unsatisfactory.

Likelihood‐free MCMC. One of the problems with the above approaches to inference in realistic data‐poor scenarios is the difficulty of developing algorithms to explore a huge (discrete) state space with a complex likelihood structure that makes conditional simulation difficult. Such problems arise frequently, and in recent years interest has increasingly turned to methods which avoid some of the complexity of the problem by exploiting the fact that we are easily able to forward‐simulate realizations of the process of interest. Methods such as likelihood‐free MCMC (LF‐ MCMC) (Marjoram et al., 2003) and Approximate Bayesian Computation (ABC) (Beaumont et al., 2002) are now commonly used to tackle problems which would be extremely difficult to solve otherwise.

A likelihood‐free approach to this problem can be constructed as follows. Let π(x ǀ c) denote the (complex) likelihood of the simulation model. Let π(Ɗǀx,τ) denote the (simple) measurement error model, giving the probability of observing the data Ɗ given the output of the stochastic process and some additional parameters, τ. Put θ = (c, τ), and let π(θ) be the prior for the model parameters. Then the joint density can be written

$Display mathematics$

Suppose that interest lies in the posterior distribution π(θ,xǀƊ). A Metropolis‐ Hastings scheme can be constructed by proposing a joint update for θ and x as follows. Supposing that the current state of the Markov chain is (θ, x), first sample a proposed new value for θ, θ*, by sampling from some (essentially) arbitrary proposal distribution f(θθ). Then, conditional on this newly proposed value, sample a proposed new sample path, x* by forwards simulation from the model π(x* θ*). Together the newly proposed pair (θ*,x*) is accepted with probability min{1, A}, where

$Display mathematics$

Crucially, the potentially problematic likelihood term, π(x ǀ θ) does not occur in the acceptance probability, due to the fact that a sample from it was used in the construction of the proposal. Note that choosing an independence proposal of the form f(θ* ǀ θ) = π(θ*) leads to the simpler acceptance ratio

$Display mathematics$

This “canonical” choice of proposal also lends itself to more elaborate schemes, as we will consider shortly.

This “vanilla” LF‐MCMC scheme should perform reasonably well provided that Ɗ is not high‐dimensional, and there is sufficient “noise” in the measurement process to make the probability of acceptance non‐negligible. However, in practice Ɗ is often of sufficiently large dimension that the overall acceptance rate of the scheme is intolerably low. In this case it is natural to try and “bridge” between the prior and the posterior with a sequence of intermediate distributions. There are several ways to do this, but here it is most natural to exploit the Markovian nature of the process and consider the sequence of posterior distributions obtained as each (p.685) additional time point is observed. For notational simplicity consider equispaced observations at integer times and define the data up to time t as Ɗt = {d 1, …, d t}. Similarly, define sample paths xt ≡ {x s ǀt ‐1 〈 st}, t = 1,2,…, so that x = {x1,x2,…}. The posterior at time t can then be computed inductively as follows.

1. (i) Assume at time t we have a (large) sample from π(θ,x tǀƊt) (for time 0, initialise with sample from prior).

2. (ii) Run an MCMC algorithm which constructs a proposal in two stages:

1. (a) First sample $( θ ∗ , x t ∗ ) ∼ π ( θ , x t ∣ D t )$ by picking at random and perturbing θ* slightly (sampling from a kernel density estimate of the distribution).

2. (b) Next sample $x t + 1 ∗$ by forward simulation from $π ( x t + 1 ∗ | θ ∗ , x t ∗ )$.

3. (c) Accept/reject $( θ ∗ , x t + 1 ∗ )$ with probability min{1, A} where

$Display mathematics$

3. (iii) Output the sample from π(θ, x t+1 ǀ Ɗt+1), put t : = t + 1, return to step 2.

Consequently, for each observation d t, an MCMC algorithm is run which takes as input the current posterior distribution prior to observation of d t and outputs the posterior distribution given all observations up to d t. As d t is typically low‐ dimensional, this strategy usually leads to good acceptance rates.

It is worth emphasizing the generality of this algorithm. Although we are here applying it to stochastic kinetic models, it is applicable to any Markov process discretely observed with error. It is also trivially adaptable to non‐uniform observations, and to observation of multiple independent time courses (the posterior distribution from one time course can be used to form the prior distribution for the next). It is also adaptable to data from multiple models which share many parameters—an important scenario in systems biology, as we shall see later.

CaliBayes. The sequential likelihood‐free algorithm described above can be implemented in a reasonably generic manner. The resulting algorithms are very powerful, but exceptionally computationally intensive. It is therefore natural to want to exploit powerful remote computing resources connected to a local machine via the Internet. CaliBayes (http://www.calibayes.ncl.ac.uk) is an example of such a remote facility. Simulation models (either deterministic or stochastic) are encoded using the Systems Biology Markup Language (SBML) (Hucka et al., 2003), and these are sent to the remote server together with a large sample from the prior distribution and the experimental data. When the computations are completed, a large sample from the posterior distribution is returned to the user. The CaliBayes system uses a service‐oriented architecture (SOA), and makes use of modern web‐ service technology—further details are provided in Chen et al. (2010). The forward simulation of SBML models is carried out using third‐party simulators such as CO‐ PASI (Hoops et al., 2006), FERN (Erhard et al., 2008) or BASIS (Kirkwood et al., 2003), and these may be specified by the user. An R package (calibayesR) which provides a user‐friendly interface to most of the CaliBayes services is available from R‐forge (http://r-forge.r-project.org).

(p.686) Approximate Bayesian computation. There is a close connection between LF‐ MCMC methods and those of approximate Bayesian computation (ABC). Consider first the case of a perfectly observed system, so that there is no measurement error model. Then there are model parameters θ described by a prior π(θ), and a forwards‐ simulation model for the data Ɗ, defined by π (Ɗ ǀ θ). It is clear that a simple algorithm for simulating from the desired posterior π(θ ǀ Ɗ) can be obtained as follows. First simulate from the joint distribution π(θ,Ɗ) by simulating θ* ~ π(θ) and then Ɗ* ~ π(Ɗ ǀ θ*). This gives a sample (θ*,Ɗ*) from the joint distribution. A simple rejection algorithm which rejects the proposed pair unless Ɗ* matches the true data Ɗ clearly gives a sample from the required posterior distribution. However, in many problems this will lead to an intolerably high rejection rate. The “approximation” is to accept values provided that Ɗ* is “sufficiently close” to Ɗ. In the simplest case, this is done by forming a (vector of) summary statistic(s), s(Ɗ*) (ideally a sufficient statistic), and accepting provided that ǀs(Ɗ*) − s(Ɗ)ǀ 〈 ε for some suitable choice of metric and ε (Beaumont et al., 2002). However, in certain circumstances this “tolerance”, ε, can be interpreted as a measurement error model (Wilkinson 2008), and for problems involving large amount of data, ABC may be applied sequentially (Sisson et al., 2007). Sequential ABC approaches have been applied to systems biology problems by Toni et al. (2009). Further, it is well known that ABC approaches can be combined with MCMC to get approximate LF‐MCMC schemes (Marjoram et al., 2003).

# 4. Motility Regulation Model

## 4.1. Model Structure

The essential relationships central to the model for motility regulation depicted in Figure 1 can be translated into a set of biochemical reactions as given in Table 1.

Table 1: Basic reaction structure for the motility regulation model.

 codY →codY + CodY CodY →∅ flache → flache+ SigD SigD →∅ SigD_hag → SigD + hag + Hag Hag →∅ SigD + hag → SigD_hag SigD_hag → SigD + hag CodY + flache → CodY flache CodY_flache → CodY + flache CodY + hag → CodY_hag CodY_hag → CodY + hag

The usual convention of starting names of genes with lower case letters and the corresponding proteins with upper case letters has been adopted. Again note that for illustrative purposes, many simplifications have been made in this model. In particular, the processes of transcription, translation, folding and protein maturation have been collapsed into a single reaction step.

(p.687) Given specification of the initial conditions of the system and all reaction rate constants, it is straightforward to simulate realizations from the associated Markov jump process model using the Gillespie algorithm. A typical trajectory starting from zero protein molecules is given in Figure 2. We can use simulated trajectories of this nature in order to understand the associated inferential problem. Again, to keep the problem as simple as possible, we will assume that just three rate constants are uncertain, and that these are the object of inference, using appropriate time course data. The three “unknowns” and their corresponding true values are

$Display mathematics$

They correspond to the maximal rate of production of SigD, and the binding and unbinding of CodY to the fla/che operon, respectively. These are plausibly the parameters of greatest scientific interest in the context of this model. The specification of sensible prior distributions for rate constants is a non‐trivial problem (Liebermeister and Klipp, 2005), but here we will adopt independent finite uniform priors on the log scale, as these have proven to be useful in applied work (Henderson et al., 2010):

$Display mathematics$

These priors cover two orders of magnitude either side of the true value, and hence represent very vague prior knowledge.

Figure 2: A typical realization of the motility model.

## 4.2. Single‐cell time course data

Observation of σ D. We will start by assuming that it is possible to directly observe the number of molecules of σ D in a single cell over time. Observations will be (p.688) made every 5 minutes (300 seconds) for 2 hours (7,200 seconds) giving a total of 24 observations. We make the simplifying (and unrealistic) assumption that the initial state of the cell is known. We assume that the measurements are subject to a small amount of measurement error that is I.I.D. Gaussian with a known standard deviation of 10 molecules.

It is straightforward to apply the LF‐MCMC algorithm described in Section 3.2 to this problem. Here, 1,000,000 particles were used, together with a burn‐in of 1,000 iterations and a thin of 5, so that, in total, 5,001,000 MCMC iterations are performed per observation. These figures were sufficient to give adequate coverage and low autocorrelations in the particle chain. The marginal posterior distributions for the three parameters of interest are shown in Figure 3 (top). The [5%, 50%, 95%] quan‐ tiles of the marginal distributions for kSigDprod, kflacherep and kflacheunrep are [−0.13, 0.90, 2.66], [−5.93, −1.97, 0.45] and [−4.86, −1.72, 1.07], respectively. It is clear that there is a great deal of information in the data regarding the likely value of kSigDprod, the maximum rate of production of σ D, but apparently much less about the other two parameters. This is somewhat misleading, as the two parameters are partially confounded and have high posterior correlation as shown in Figure 3 (middle). The data therefore clearly contains a reasonable amount of information about all three parameters. Figure 3 (bottom) shows in grey 90% equitailed pointwise posterior predictive probability intervals for the key model species, with the (unknown, unobserved) true values overlaid. Clearly the interval for σ D is tight around the true values, as this is the observed species, but the other two species are also reasonably well identified by the observed data (and the model). Note that if further information is required, pooling observations from multiple cells is straightforward, as the parameter posterior from one cell can be used as the parameter prior for the next in a natural sequential manner.

Observation of Hag. It turns out not to be completely straightforward to observe levels of σ D directly, partly because the σ D gene is embedded in the middle of the large fla/che operon. Before examining in detail exactly how measurements are typically made, it is instructive to consider observation of Hag, which has its own promoter, and is strongly activated by σ D. We consider the same observation protocol as above, but this time use (noisy) measurements of Hag levels in order to make inferences about the three key unknowns.

The marginal posterior distributions for the three parameters of interest given data on Hag are shown in Figure 4 (top). The [5%, 50%, 95%] quantiles of the marginals are [0.29, 1.76, 3.61], [−6.32, −2.26, 0.41] and [−6.58, −4.01, −0.32], respectively. These inferences are broadly consistent with the inference obtained by observing σ D, but there is less information in the Hag data than in the corresponding data for σ D.

Time‐lapse microscopy and GFP reporters. In fact, it turns out not to be straightforward to accurately measure any native protein directly. To observe and track gene expression in single living cells over time, some kind of reporter system is typically employed. Although there are alternatives, fluorescent reporters are often used, with green fluorescent protein GFP being the most common. GFP was originally isolated from a jellyfish, and can be detected in single living cells with a fluorescence camera attached to a powerful microscope if the cells are first exposed to UV light. The gene for GFP, gfp, has to be integrated into the host genome in such a way as to try to make the levels of mature GFP correlate strongly with the levels of the target protein of interest. This often turns out to be technically (p.689)

Figure 3: Top: Marginal posterior distributions for the (log of the) three parameters of interest, based on 24 observations of σ D. True value shown as a vertical line. Middle: Contour plot of the bivariate posterior distribution of the (log of the) fla/che binding and unbinding constants. True value shown as the intersection of the two lines. Bottom: Predictive distributions for the key model species (90%, equitailed, pointwise) in grey, with true (unknown) values overlaid.

difficult, and less‐than‐perfect alternatives are often employed.

In the case of σ D, the standard strategy is to form a fusion of the promoter of hag, P hag to gfp, to get P haggfp, and then integrate this construct into a convenient (p.690)

Figure 4: Top: Marginal posterior distributions for the (log of the) three parameters of interest, based on 24 observations of Hag. Middle: Contour plot of the bivariate posterior distribution of the (log of the) fla/che binding and unbinding constants. Bottom: Predictive distributions for the key model species, with true (unknown) values overlaid.

place in the genome, which is often at the locus known as amyE. The genotype of the resulting mutant is typically written amyE::P haggfp (Kearns and Losick 2005). The rationale behind this construction is that P hag is strongly activated by σ D, and so when levels of σ D are high, the production rate of GFP should also be high. Note (p.691) however, that there is absolutely no reason to suppose a linear relationship between the levels of σ D and the level of GFP, and hence the measured levels of fluorescence. There are several additional sources of discrepancy, including the fact that GFP is a relatively stable protein, and therefore decays more slowly than most other proteins. Additionally, since the amyE locus is close to the origin of replication, there will typically be two copies of this gene per cell, whereas the hag and σ D genes are far from the origin, and hence will typically be single‐copy only. Although there clearly is a relationship between the levels of σ D and GFP, this relationship must be explicitly modelled in a quantitative way. Some actual time lapse microscopy images of cells of this genotype are shown in Figure 5. Images such as these must be analysed to track individual cells over time, and to quantify the levels of GFP fluorescence in each cell at each time point. Specialist image analysis algorithms (Wang et al., 2010) can be used to automate this process.

Figure 5: Time‐lapse microscopy images of growing and dividing B. subtilis cells with genotype amyE::P hag‐gfp. Experiment conducted by the author using a DeltaVision microscopy system during a visit to the lab of Dr. Leendert Hamoen (Newcastle).

The additional species and reactions can be added into the model considered previously, and the SBML‐shorthand (Wilkinson 2006) corresponding to the full resulting model is given in the appendix. A typical realization from this full model is shown in Figure 6, showing the relationship between a few of the key species. Note the less‐than‐perfect relationship between the levels of σ D and GFP.

Inference for this enlarged model can be carried out using the same LF‐MCMC algorithm as previously described. Again, assuming 24 measurements of GFP levels (subjecting the cells to UV light more than once every 5 minutes is toxic), inference for the three key unknowns can proceed as before. (p.692)

Figure 6: A typical realization of the motility model, including the GFP reporter.

The marginal posterior distributions obtained using this extended model for the three parameters of interest are shown in Figure 7 (top). The [5%, 50%, 95%] quantiles of the marginals are [0.14, 1.35, 3.31], [−6.91, −2.69, 0.37] and [−6.23, −3.14, 0.23], respectively.

Although the GFP data is not quite as informative about the model parameters as direct observations of levels of σ D would be, considerable information can still be gained. See Finkenstadt et al. (2008) for related work based on a linear noise approximation. It is natural to wonder whether it is worth the effort of modelling GFP levels explicitly as we have done here, rather than simply assuming that the GFP levels correspond to levels of σ D. We can examine this question by re‐running our inferential procedure for measurements on σ D, but using the actual measured levels of GFP.

The marginal posterior distributions for the three parameters of interest are shown in Figure 8 (top). The [5%, 50%, 95%] quantiles of the marginals are [−0.36, −0.08, 0.22], [−5.88, −3.62, −1.91] and [−1.81, 0.36, 2.11], respectively. This (incorrect) posterior distribution is potentially misleading. There appears to be very strong information regarding kSigDprod—more information than we really have. It so happens in this case that the posterior contains the true value, but that is simply a consequence of the fact that the rates of production of GFP and σ D are assumed to be the same in this model. Further, the posteriors for the other two parameters are not correctly centred on the true parameter values—the true parameter values are very unlikely according to this posterior distribution. The filtering distributions are also (obviously) very badly calibrated. Thus, quantitative modelling of the relationship between measured GFP levels and the target protein of interest is clearly worthwhile.

There is a further potential complication with the use of fluorescence (and luminescence) data that has not yet been discussed. Although there is reason to believe that the measured fluorescence intensity will be in direct proportion to the number of molecules of mature GFP, often the data is uncalibrated in the sense that the (p.693)

Figure 7: Top: Marginal posterior distributions for the (log of the) three parameters of interest, based on 24 observations of GFP. Middle: Contour plot of the bivariate posterior distribution of the (log of the) fla/che binding and unbinding constants. Bottom: Predictive distributions for the key model species, with true (unknown) values overlaid.

constant of proportionality is (at least partially) unknown. Often it is possible to get a good handle on it using calibration data, but in general it will be desirable to (p.694)

Figure 8: Top: Marginal posterior distributions for the (log of the) three parameters of interest, based on 24 observations of GFP treated (incorrectly) as observations of σ D. Middle: Contour plot of the bivariate posterior distribution of the (log of the) fla/che binding and unbinding constants. Bottom: Predictive distributions for the key model species, with true (unknown) values overlaid.

include this constant as a further model parameter—see Henderson et al. (2010) for an example. Furthermore, it is not even completely clear that the measured fluo‐ (p.695) rescence is in fact directly proportional to the number of GFP molecules, as there is some suggestion that at high concentration the GFP molecules form aggregates which are not fluorescent (Iafolla et al., 2008).

## 4.3. Population data and knock‐out variants

Ultimately, obtaining just one read‐out on one particular protein is inevitably going to be limited in terms of the information that can be obtained. There are several obvious strategies to improve this situation. The first is to use multiple reporters in the same cells. This can be accomplished by using different coloured fluorescent reporters for different proteins of interest. In principle it is possible to use up to around four such reporters within a cell using current technology, but in practice it seems to be technically difficult to use more than two reliably. Another useful technique is to obtain data from cells with key genes knocked out. Provided that the gene is non‐essential, it is easy to construct the model corresponding to the knockout, and this new model will have many parameters in common with the original. Data from multiple models can be combined sequentially by taking the posterior for relevant parameters from one model as priors for the next.

Time‐lapse microscopy is currently the only practical way to track expression in individual cells over time. However, there are other technologies, such as flow cytometry, which can take measurements on thousands of individual cells at a given time. This technology can be used to monitor how the distribution of expression in a population changes over time (and in different knock‐outs). This data too is informative for model parameters, and is an effective alternative to time‐lapse microscopy in certain situations. There are several ways that such population level data can be used for model parameter inference. Perhaps the simplest (but computationally intensive) method is to use the ABC techniques described in Section 3.2 in conjunction with ensemble forward simulations from the model, conditioning by checking whether the simulated distribution of measurements is sufficiently close to the observed distribution, under some suitable metric on empirical distributions.

# 5. Summary

This paper has shown how Markov process models can be used to understand the stochastic dynamics of bacterial gene regulation. Inference for model parameters from time‐course measurements of system state is an important problem, and computationally intensive Bayesian algorithms such as LF‐MCMC and ABC have been shown to be useful here due to their inherent flexibility. Explicit quantitative modelling of the measurement process (including the relationship between fluorescent reporters and their target proteins) has been shown to be an important and non‐ ignorable aspect of the modelling process. There is clearly still a long way to go before such techniques can be routinely used in practice as part of a systems biology approach. Combining time‐lapse data from multiple experiments, mutants and conditions, together with similar data from flow cytometry experiments, for parameter estimation and model comparison, is still technically challenging, and the experimental systems themselves require improvement and calibration in order to be suitable for fully quantitative analysis. Integrating these single‐cell analyses with other molecular biology technologies such as microarrays and RNA‐sequencing data is a further challenge. However, many of the issues to be faced are fundamentally statistical in nature, and so it seems that statisticians have an important role to play in advancing current biological knowledge.

# (p.696) Acknowledgements

This work was funded by the Biotechnology and Biological Sciences Research Council through grants BBF0235451, BBSB16550 and BBC0082001. The author would also like to thank Dr Leendert Hamoen and members of his lab for hosting a visit by the author during the first half of 2009.

# References

Bibliography references:

Ball, K., Kurtz, T. G., Popovic, L. and Rempala, G. (2006). Asymptotic analysis of multiscale approximations to reaction networks. Ann. Prob. 16, 1925–1961.

Beaumont, M. A., Zhang, W. and Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics 162, 2025–2035.

Boys, R. J., Wilkinson, D. J. and Kirkwood, T. B. L. (2008). Bayesian inference for a discretely observed stochastic kinetic model. Statist. Computing 18, 125–135.

Chen, Y., Lawless, C., Gillespie, C. S., Wu, J., Boys, R. J. and Wilkinson, D. J. (2010). CaliBayes and BASIS: integrated tools for the calibration, simulation and storage of biological simulation models. Briefings in Bioinformatics 11, 278–289.

Dubnau, D. (1991). Genetic competence in Bacillus subtilis. Microbiology and Molecular Biology Reviews 55, 395–424.

Erhard, F., Friedel, C. C. and Zimmer, R. (2008). FERN – a Java framework for stochastic simulation and evaluation of reaction networks. BMC Bioinformatics 9, 356. doi:10.1186/1471‐2105‐9‐356.

Errington, J. (1993). Bacillus subtilis sporulation: regulation of gene expression and control of morphogenesis. Microbiology and Molecular Biology Reviews 57, 1–33.

Finkenstadt, B., Heron, E. A., Komorowski, M., Edwards, K., Tang, S., Harper, C. V., Davis, J. R., White, M. R., Millar, A. J. and Rand, D. A. (2008). Reconstruction of transcriptional dynamics from gene reporter data using differential equations. Bioinformatics 24, 2901–2907.

Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. J. Physical Chemistry 81, 2340–2361.

Gillespie, D. T. (1992). A rigorous derivation of the chemical master equation. Physica A 188, 404–425.

Gillespie, D. T. (2000). The chemical Langevin equation. J. Chemical Physics 113, 297–306.

Gillespie, D. T. (2001). Approximate accelerated stochastic simulation of chemically reacting systems. J. Chemical Physics 115, 1716–1733.

Golightly, A. and Wilkinson, D. J. (2005). Bayesian inference for stochastic kinetic models using a diffusion approximation. Biometrics 61, 781–788.

Golightly, A. and Wilkinson, D. J. (2006). Bayesian sequential inference for stochastic kinetic biochemical network models. J. Computational Biology 13, 838–851.

Golightly, A. and Wilkinson, D. J. (2008). Bayesian inference for nonlinear multivariate diffusion models observed with error. Comput. Statist. Data Anal. 52, 1674–1693.

Henderson, D. A., Boys, R. J., Krishnan, K. J., Lawless, C. and Wilkinson, D. J. (2009). Bayesian emulation and calibration of a stochastic computer model of mitochondrial DNA deletions in substantia nigra neurons. J. Appl. Statist. 104, 76–87.

Henderson, D. A., Boys, R. J., Proctor, C. J. and Wilkinson, D. J. (2010). Linking systems biology models to data: a stochastic kinetic model of p53 oscillations. The Oxford Handbook of Applied Bayesian Analysis. (A. O'Hagan and M. West, eds.). Oxford: Oxford University Press, 155–187.

(p.697) Hoops, S., Sahle, S., Gauges, R., Lee, C., Pahle, J., Simus, N., Singhal, M., Xu, L., Mendes, P. and Kummer, U. (2006). COPASI—a complex pathway simulator. Bioinformatics 22(24), 3067–3074.

Hucka, M., Finney, A., Sauro, H. M., Bolouri, H., Doyle, J. C., Kitano, H., Arkin, A. P., Bornstein, B. J., Bray, D., Cornish‐Bowden, A., Cuellar, A. A., Dronov, S., Gilles, E. D., Ginkel, M., Gor, V., Goryanin, I. I., Hedley, W. J., Hodgman, T. C., Hofmeyr, J.‐H., Hunter, P. J., Juty, N. S., Kasberger, J. L., Kremling, A., Kummer, U., Novere, N. L., Loew, L. M., Lucio, D., Mendes, P., Minch, E., Mjolsness, E. D., Nakayama, Y., Nelson, M. R., Nielsen, P. F., Sakurada, T., Schaff, J. C., Shapiro, B. E., Shimizu, T. S., Spence, H. D., Stelling, J., Takahashi, K., Tomita, M., Wagner, J. and Wang, J. (2003). The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19, 524–531.

Iafolla, M. A., Mazumder, M., Sardana, V., Velauthapillai, T., Pannu, K. and McMillen, D. R. (2008). Dark proteins: effect of inclusion body formation on quantification of protein expression. Proteins 72, 1233–1242.

Kearns, D. B. and Losick, R. (2005). Cell population heterogeneity during growth of Bacillus subtilis. Genes and Development 19, 3083–3094. 10.1101/gad.1373905.

Kirkwood, T. B. L., Boys, R. J., Gillespie, C. S., Proctor, C. J., Shanley, D. P. and Wilkinson, D. J. (2003). Towards an e‐biology of ageing: integrating theory and data. Nature Reviews Molecular Cell Biology 4, 243–249.

Kitano, H. (2002). Computational systems biology. Nature 420, 206–210.

Kurtz, T. G. (1972). The relationship between stochastic and deterministic models for chemical reactions. J. Chemical Physics 57, 2976–2978.

Liebermeister, W. and Klipp, E. (2005). Biochemical networks with uncertain parameters', IEE Systems Biology 152, 97–107.

Marjoram, P., Molitor, J., Plagnol, V. and Tavare, S. (2003). Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. U.S.A. 100, 15324–15328.

Moszer, I., Jones, L. M., Moreira, S., Fabry, C. and Danchin, A. (2002). Subtilist: the reference database for the Bacillus subtilis genome. Nucleic acids research 30, 62–65. 10.1093/nar/30.1.62.

Sisson, S. A., Fan, Y. and Tanaka, M. M. (2007). Sequential Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. U.S.A. 104, 1760–1765.

Sonenshein, A. L., Hoch, J. A. and Losick, R., eds (2002). Bacillus Subtilis and its Closest Relatives. New York: ASM Press.

Toni, T., Welch, D., Strelkowa, N., Ipsen, A. and Stumpf, M. P. H. (2009). Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems', J. R. Soc. Interface 6, 187–202.

Wang, Q., Niemi, J., Tan, C. M., You, L. and West, M. (2010). Image segmentation and dynamic lineage analysis in single‐cell fluorescence microscopy. Cytometry A 77, 101–110.

Wilkinson, D. J. (2006). Stochastic Modelling for Systems Biology. London: Chapman and Hall.

Wilkinson, D. J. (2009). Stochastic modelling for quantitative description of heterogeneous biological systems. Nature Reviews Genetics 10, 122–133. 10.1038/nrg2509.

Wilkinson, R. D. (2008). Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Tech. Rep., Sheffield University, UK.

# (p.698) Appendix

## Motility Model

The full SBML‐shorthand (Wilkinson 2006) for the model considered in this paper is given below. This can be converted to full SBML (Hucka at al. 2003) using the tools available from:

• @model:2.1.1=BSMod02 “Bacillus subtilis motility with GFP”

• @units

• substance=item

• @compartments

• Cell=1

• @species

• Cell:codY=1 s

• Cell:CodY=0 s

• Cell:flache=1 s

• Cell:SigD=0 s

• Cell:hag=1 s

• Cell:Hag=0 s

• Cell:CodY_flache=0 s

• Cell:CodY_hag=0 s

• Cell:SigD_hag=0 s

• Cell:Phag_gfp=2 s

• Cell:SigD_Phag_gfp=0 s

• Cell:CodY_Phag_gfp=0 s

• Cell:GFP=0 s

• @parameters

• kProtDeg=0.0002

• kCodOn=0.02

• kCodOff=0.1

• kProdSigD=1

• @reactions

• @r=CodYprod

• codY‐〉codY+CodY

• k*codY : k=0.1

• @r=CodYdeg

• CodY‐〉

• kProtDeg*CodY

• @r=SigDprod

• flache‐〉flache+SigD

• kProdSigD*flache

• @r=SigDdeg

• SigD‐〉

• kProtDeg*SigD

• @r=Hagprod

• SigD_hag‐〉SigD+hag+Hag

• k*SigD_hag : k=1

• @r=Hagdeg

• Hag‐〉

• kProtDeg*Hag

• @r=hagact

• SigD+hag‐〉SigD_hag

• k*SigD*hag : k=0.01

• @r=haginact

• SigD_hag‐〉SigD+hag

• k*SigD_hag : k=0.1

• @r=flacherep

• CodY+flache‐〉CodY_flache

• kCodOn*CodY*flache

• @r=flacheunrep

• CodY_flache‐〉CodY+flache

• (p.699) kCodOff*CodY_flache

• @r=hagrep

• CodY+hag‐〉CodY_hag

• k*CodY*hag : k=0.01

• @r=hagunrep

• CodY_hag‐〉CodY+hag

• k*CodY_hag : k=0.1

• @r=GFPprod

• SigD_Phag_gfp‐〉SigD+Phag_gfp+GFP

• k*SigD_Phag_gfp : k=1

• @r=GFPdeg

• GFP‐〉

• 0.5*kProtDeg*GFP

• @r=Phag_gfpact

• SigD+Phag_gfp‐〉SigD_Phag_gfp

• k*SigD*Phag_gfp : k=0.01

• @r=Phag_gfpinact

• SigD_Phag_gfp‐〉SigD+Phag_gfp

• k*SigD_Phag_gfp : k=0.1

• @r=Phag_gfprep

• CodY+Phag_gfp‐〉CodY_Phag_gfp

• k*CodY*Phag_gfp : k=0.01

• @r=Phag_gfpunrep

• CodY_Phag_gfp‐〉CodY+Phag_gfp

• k*CodY_Phag_gfp : k=0.1

# Discussion

## Samuel Kou (Harvard University, USA)*

I would like to thank Dr. Wilkinson for presenting an interesting and stimulating paper. I strongly agree that statistics, particularly Bayesian statistics, plays and will play a crucial role in biological sciences. A major reason, I believe, is that in modern biology stochastic models are more and more widely applied and accepted, as technological breakthroughs have enabled scientists to investigate biological systems in ever‐increasing detail.

For example, advances in nanotechnology have made it possible for scientists to study biological systems at the single‐molecule level (Kou, 2009). It is well known that at the cellular level, biological processes are stochastic. For instance, placing genetically identical E. Coli bacteria in identical lab environment, one observes random and distinct outcomes (cf. Choi et al. 2008).

Along with opportunities and great potentials, the wide use of stochastic models also brings new challenges for statisticians. The first main challenge is model calibration. For a given stochastic model, typically there are unknown parameters. Statistical inference of the parameters is often complicated by (i) the high dimensionality of the model, and (ii) the large amount of missing data. It is not uncommon that a gene regulatory network involves more than fifty or a hundred reactions. Furthermore, in biological experiments, of all the species (such as proteins, enzymes or genes) involved in the reaction network, typically only one or two are observed (via fluorescence tags, such as GFP); the rest are entirely unrecorded. This challenging missing‐data issue, together with the high dimensionality, makes statistical inference computationally intensive. Consequently, for routine use, we need to develop (p.700) methods that are capable of handling the large dimension and missing data in real time (e.g., within a few hours). We are now still in the early stages of this task.

The second main challenge facing statisticians is model uncertainty. The great George Box once said, “All models are wrong, but some are useful.” This statement is quite pertinent to the stochastic models in biology. Often biologists are uncertain about the chemical kinetics. The models used by biologists are commonly chosen for (both theoretical and experimental) convenience. (a) It is quite possible that some reaction intermediates are ignored in the model (a biologist might write X+YZ, but the real reaction could be X+YW, WZ). As a concrete example, in reaction networks protein degradation is often modeled as a single step: X→ ∅. But in reality (e.g., in the ubiquitin‐proteasome pathway), protein degradation involves ATP binding to the protein; the energy from ATP is used to tag an unwanted protein with a chain of ubiquitins marking it for destruction; the protein is then hydrolyzed into small peptide fragments by the proteasome. Thus, the seemingly simple X→ ∅ actually consists of at least three steps (protein → ATP binding → ubiquitins tagging → degradation by proteasome). (b) It is quite possible that some reactions are missing in the (gene regulatory network) model due to insufficient scientific understanding. (c) The conformational dynamics of enzymes and proteins is often ignored, but the conformational fluctuation might in fact have important physiological implications for a living cell (Min et al. 2005; Kou, 2008).

Given these uncertainties, an immediate issue is model assessment. Is the current model adequate for the experimental data? If not, how should we improve it? Among all the reactions (or equations), which one is the most crucial to improve on? Which one is the most sensitive? Sometimes there are competing models available. Then one wants to know if the experimental observations are capable of discriminating the competing models. If so, which one is preferred? Like in model calibration, model assessment and selection are complicated by the high dimensionality, large amount of missing data and intensive computing.

Next, I will comment in some detail the parameter inference considered in Dr. Wilkinson's paper. Likelihood‐free MCMC (LFMCMC) was discussed and shown to be effective for a small system. LFMCMC relies on the exact simulation of the stochastic process in each step of the computation. It is this exact simulation that enables the likelihood to appear in both the denominator and numerator of the Metropolis–Hastings ratio, leading to a convenient cancellation. The reliance on exact simulation, on the other hand, creates two serious problems. First, if the experiments are carefully planned and implemented and yield high quality data with little noise, then the LFMCMC algorithm becomes exceedingly slow; in the extreme case of noise‐free experiments—an ideal for biologists—the LFMCMC algorithm would not be able to function at all. This is because in order to update the parameter values from θ to θ′, LFMCMC first draws θ′ and then generates x′ from θ′; in the small noise situation the only way that θ′ can be accepted is if x′ is very close to x, but this is a rare probability event, making LFMCMC incapable of moving. The approximate Bayesian computation (ABC) method somewhat alleviates this problem but not totally (for example, how to compare x′ to x, how to find the summarizing statistics, how close is close enough, how the approximation affects the quality of Bayesian inference, etc.); more research is clearly needed. Second, many reaction networks are multiscaled: some reactions in the network are much slower than others. These slow reactions are often the crucial ones that drive the whole system. An exact simulation of this type of system spends most of its computing time on the fast but unimportant reactions, wasting most resources on irrelevant (p.701) details and leading to excessively slow inference. Interestingly, the multiscale problem has attracted lots of attention from the applied mathematics and probability communities (see, for example, E et al. 2007, Ball et al., 2006, Gillespie, 2001). Integrating these multiscale approaches with Bayesian inference, in my opinion, presents a promising research direction.

Understanding stochastic dynamics is of significant scientific interest. As Dr. Wilkinson's paper demonstrates, statistics, particularly Bayesian statistics, plays an important role in the analysis of experimental data. Many problems lie ahead, ranging from model calibration, model assessment and selection, to model construction. The ultimate statistical goal is to develop fast and efficient (Bayesian) inference methods, which biologists can routinely use. We are in the early stages, but I hope that Dr. Wilkinson's stimulating paper and this discussion help generate interest in this important and challenging problem.

## Nicolas Chopin (Crest, France) and Christian P. Robert (Université Dauphine, France)*

In this discussion, we reflect on the links between the likelihood‐free method of the author and of recent developments by Mϕller et al. (2006) and Friel and Pettitt (2008), as well as the ABC literature (Beaumont et al., 2002).

While very much impressed by the scope of the chemical reaction models handled by Professor Wilkinson, we will (presumably predictably!) focus on the simulation aspects of his paper.

First, the solution proposed by the author to overcome the difficulties of handling the complex likelihood π(xǀθ) reminds us of the auxiliary completion of Mϕller et al. (2006), who created (as well) an auxiliary duplicate of the data x and a pseudo‐posterior on the duplicate to overcome computing the normalizing constant in π(xǀθ). As pointed out by Cucala et al. (2009), the choice of the completion distribution in Mϕller et al. (2006) may be detrimental to the convergence of the algorithm and we wonder if the same happens to the likelihood‐free algorithm of the author.

Second, the dismissal of ABC (Approximate Bayesian Computation, see, e.g., Grelaud et al., 2009) as being difficult to calibrate and to automatize is slightly unfair in that the summary statistics used in ABC are generally suggested by practitioners. Sequential ABC has been studied by Beaumont et al. (2009) as well, bringing a correction to Sisson et al. (2007) and building up a population Monte Carlo scheme for the approximation of π(θ,xǀƊ).

Third, when considering the sequential solution of Professor Wilkinson, we wonder about the approximation effects due to (a) the use of a kernel at each time t and (b) the lack of correction of the paths up to time t when given the new data d t+1, because this particle approach is bound to diverge quite rapidly from the true distribution.

Specifically, consider the same algorithm, but (a) with a fixed parameter θ (no estimation), and (b) no “slight perturbation” (Step 2a). Add a re‐weighting step, where the current weight of each sampled trajectory is multiplied by the partial observation likelihood Pt+1 ǀ Ɗt, x t+1) (which should depend in most cases on (p.702) the part of the trajectory sampled between time t and t + 1). Then one obtains (at negligible extra cost), a valid sequential Monte Carlo (SMC) algorithm for a continuous‐time hidden Markov model, as in e.g., Chopin and Varini (2007). If θ is included, then the algorithm remains valid (in the sense that the Monte Carlo error goes to zero as the number of trajectories goes to infinity), but it is likely to diverge over time (in the sense that the asymptotic variance typically grows quickly over time, see also our dicussions on the paper by Lopes et al. in this volume). The PMCMC approach of Andrieu et al. (2010), while expensive, may be a more reasonable approach in this case.

## Paul Fearnhead, Vasileios Giagos and Chris Sherlock (Lancaster University, UK)

We would like to thank the author for an interesting paper, and we applaud the aim of producing generic software which should be suitable for making inference on a range of stochastic kinetic models. We have a concern with the use of uniform priors, and we would also like to point out an alternative to ABC for fast approximate inference: the Linear Noise Approximation.

The choice of uniform priors on the log scale for stochastic rate constants makes reasonable assumptions on the range of the parameter values but implies, perhaps unintentionally, prior knowledge about the ratio of any two rate constants, since the log of this quantity has a Triangular distribution. In the case of reversible reactions, this quantity expresses the propensity of the forward reaction and its posterior estimates have been shown to be (consistently) more accurate (e.g., Golightly and Wilkinson 2005, 2006) compared to the estimates of the rate constants for the forward and backward reactions. It is reasonable to question whether this prior information reflects an expert opinion, and if not, it would certainly be of interest to see the sensitivity of inference to other choices of joint prior which preserve the range but lead to a flatter prior for the ratio.

Table 2: Number of datasets (out of 100 in each case) where the difference in log‐likelihood between the MLE and the true value is significant at the 5% level. The counts concern the auto‐regulatory example considering three different system sizes and three different freqencies of observation.

Observation Interval

Counts

Small

Medium

Large

0.1

40

7

4

0.5

12

5

3

1

6

5

5

The author mentions the Chemical Langevin Equation (CLE), an SDE approximation to the Markov jump process, which is reasonable provided that the number of molecules is not too small. The Linear Noise Approximation (LNA) (Kurtz, 1972) is a linear SDE which can be viewed as the linearisation of the CLE about the deterministic solution to the drift part of the equation. Solutions to the LNA are Gaussian, and the mean and variance at any time point can be ascertained by numerical integration of ODEs. Since our interest is to employ the LNA for inferential purposes, we first investigated to what degree the approximation to the system dynamics is satisfactory. We considered a series of simulation experiments using the (p.703) auto‐regulatory gene network of Golightly and Wilkinson (2005) and we compared the transition densities of the exact Markov jump process (SSA), the CLE, and the LNA. The LNA compares well (see Figure 9) for all (small, medium, large) system sizes, i.e., systems with initial populations of (34, 340, 3400) molecules respectively.

Figure 9: Kernel density estimates of the marginal (DNA) transition densities of the Markov jump process (solid black), the CLE (dashed black) and the LNA (solid grey) using three different system sizes. The lines at the small system indicate the relative frequencies of the discrete states.

We then looked at inference for the rate parameters when the concentration of molecules is observed for all species, at discrete time intervals, with no measurement error. The resulting multivariate normal likelihood is straightforward to calculate by solving the ODEs and inference is extremely fast.

Table 2 shows results from a simulation study of the previous gene network. Results are good for medium and large‐sized systems, though large discrepancies from the expected number of significant likelihood ratio tests are observed for small‐ sized systems. Additionally, the LNA method can be extended to partially observed data with measurement error by using it within the sequential methods considered in this paper.

Finally, we are currently developing an R package (Iner) to automate the approximate inference for the stochastic kinetic constants using the LNA.

First let me thank Dr. Kou and the other discussants for their insightful comments.

Dr. Kou began his discussion with some general comments about the application of complex stochastic modelling and Bayesian inference to difficult problems in bioscience research. In particular, he emphasized the importance of the assessment of model adequacy and model choice in complex, data‐poor scenarios. This is absolutely true, as biological models are inevitably a simplification of reality. Missing reactions and intermediaries (deliberately or otherwise) are definitely an issue to be aware of, not to mention more fundamental assumptions of the modelling paradigm (here, for example, that of rapid spatial mixing). Comparison of two or more competing models is a conceptually straightforward extension of the methodology, though computationally challenging, and likelihood‐free schemes for computing Bayes factors have been adopted elsewhere in the literature; see, for example, Toni et al. (2009), for an ABC approach. Formal direct quantification of the adequacy of a (p.704) single given model is conceptually more difficult, but informal/graphical diagnostics such as assessment of posterior predictive fit is straightforward, natural and useful. Sensitivity analysis can be a useful way to understand which parts of the model are most important to improve/get right, and whilst the techniques supporting this are better developed for deterministic models, sensitivity analysis for stochastic models has been studied (Thattai & van Oudenaarden, 2001).

The remainder of Dr. Kou's comments relate to limitations of the particular LF‐ MCMC algorithm presented in the paper. The algorithm does indeed break down in low‐noise scenarios, and more conventional ABC algorithms may be preferable in this case. It also breaks down as the number of time steps increases (the “particle degeneracy” problem), and as the number of parameters increases (the usual “curse of dimensionality”). However, these latter issues also apply to (sequential) ABC algorithms, as well. Speed of exact simulation for multiscale models can also be an issue, but there are now many fast approximate “hybrid” stochastic simulators which can be used in place of the exact algorithm in such cases, and even more crude approximations, such as the diffusion approximation (Golightly & Wilkinson, 2006) and stochastic model emulators (Henderson et al., 2009), have been shown to be generally satisfactory, in practice.

Chopin and Robert point out the connections between the LF‐MCMC algorithm and other similar ideas that have been presented recently in the literature. I certainly had no intention of “dismissing” ABC—it has clearly been shown to be a generally useful and widely applicable technique. Instead my aim was simply to point out that there are other ways of approaching challenging inference problems, and that in some circumstances it may be desirable to do so. It is true that the LF‐MCMC algorithm requires a slightly undesirable kernel tuning parameter which is in some ways analogous to the similarly undesirable “tolerance” parameter, ϵ, used in ABC schemes.

More generally, the LF‐MCMC and ABC algorithms are just two possible approaches to developing inference algorithms for complex models which have the property that He et al. (2010) refer to as “plug‐and‐play”; that is, algorithms which rely only on the ability to forward simulate from the model, and do not require the ability to evaluate likelihoods associated with it. In a non‐Bayesian context, “iterative filtering” (He et al., 2010) is another such approach. Such algorithms are likely to become increasingly important as we become more ambitious about the kinds of models we wish to analyse. However, until recently we were in the rather unsatisfactory situation of not having a genuinely “exact” method for Bayesian inference in such contexts. The recently proposed Particle MCMC methods (mentioned by Chopin and Robert) do indeed offer an exact approach applicable to models of this type, and therefore represent a considerable advance in this area. Since presenting my paper at Valencia, we (joint work with A. Golightly, Newcastle) have applied the PMMH PMCMC algorithm to the model from this paper. Comfortingly, we obtain very similar results. The PMCMC algorithm does (unsurprisingly) seem to be more numerically stable than the LF‐MCMC algorithm (when using a sufficiently large number of particles), but this increased numerical stability comes at a considerable additional computational cost. Which algorithm is most “computationally efficient” (for a required level of accuracy) is likely to be problem‐specific, and represents an interesting area of future research.

Fearnhead et al. first query the use of uniform priors (on the log‐parameter scale). I do not share their concerns, as plots such as Figure 3 (middle) in the paper clearly (p.705) show that the likelihood is strongly concentrated along the diagonal, indicating strong information in the data regarding the ratio of parameters (difference on the log scale), as the prior is flat over the region shown. The marginal posterior for the log‐ratio is much more concentrated than the (almost) triangular implied prior. That said, I am not particularly advocating routine use of log‐uniform priors in such models. In applied problems it is always desirable to carefully elicit expert knowledge for the parameters, and in fact most experts usually are more certain about the ratio of forward and backward constants than their individual values. It seems to be the case that the log‐scale is the natural scale for thinking about rate constants, and that experts are often better able to specify bounds on parameters than distributions. In the context of a simulation study it is also desirable to be clear about the information that has come from the data.

Fearnhead et al. then discuss the use of the LNA as a further approximation to the standard diffusion approximation (the CLE). We have found (e.g., Boys et al., 2008) that in most practical problems, making the CLE approximation does not have a particularly adverse impact on inferences for models of this type, even in cases where the approximation would be unsatisfactory for forward simulation, and it is interesting to see that the further approximation suggested (the LNA) also gives satisfactory results, as it lends support to previous applications of this technique (Komorowski et al., 2009). I look forward to experimenting with the proposed R package.

To finish, I would like once again to thank the discussants for taking the time to comment on this paper, and hope that their contributions give further encouragement to Bayesians to engage both with the methodological and computational challenges of inference for analytically intractable models, and the applied scientific problems associated with modern bioscience research.

# Additional References in the Discussion

Bibliography references:

Andrieu, C., Doucet, A. and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. J. Roy. Statist. Soc. B 72, 269–342.

Ball, K., Kurtz, T. G., Popovic, L. and Rempala, G. (2006). Asymptotic analysis of multiscale approximations to reaction networks. Annals of Applied Probability 16, 1925–1961.

Beaumont, M., Cornuet, J.‐M. Marin, J.‐M. and Robert, C. (2009). Adaptive approximate Bayesian Computation. Biometrika 96, 983–990.

Choi, P. J., Cai, L., Frieda, K. and Xie, X. S. (2008). A stochastic single‐molecule event triggers phenotype switching of a bacterial cell. Science 322, 442–446.

Chopin, N. and Varini, E. (2007). Particle filtering for continuous‐time hidden Markov models. ESAIM: Proceedings 19, 12–17.

Cucala, L., Marin, J.‐M. Robert, C. and Titterington, D. (2009). Bayesian inference in k‐nearest‐neighbour classification models. J. Amer. Statist. Assoc. 104, 263–273.

Friel, N. and Pettitt, A. (2008). Marginal likelihood estimation via power posteriors. J. Roy. Statist. Soc. B 70, 589–607.

Grelaud, A., Marin, J.‐M., Robert, C., Rodolphe, F. and Tally, F. (2009). Likelihood free methods for model choice in Gibbs random fields. Bayesian Analysis 3, 427–442.

He, D., Ionides, E. L. and King, A. A. (2010). Plug‐and‐play inference for disease dynamics: measles in large and small populations as a case study. J. Roy. Statist. Soc. B 7, 271–283.

(p.706) Komorowski, M., Finkenstadt, B., Harper, C. V. and Rand, D. A. (2009). Bayesian inference of biochemical kinetic parameters using the linear noise approximation. BMC Bioinformatics 10, 343.

Kou, S. C. (2008). Stochastic networks in nanoscale biophysics: modeling enzymatic reaction of a single protein. J. Amer. Statist. Assoc. 103, 961–975.

Kou, S. C. (2009). A selective view of stochastic inference and modeling problems in nanoscale biophysics. Science in China, A 52, 1181–1211.

Min, W., English, B., Luo, G., Cherayil, B., Kou, S. C. and Xie, X. S. (2005). Fluctuating enzymes: lessons from single‐molecule studies. Acc. Chem. Res. 38, 923–931.

Mϕller, J., Pettitt, A., Reeves, R. and Berthelsen, K. (2006). An eficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika 93, 451–458.

Sisson, S. A., Fan, Y. and Tanaka, M. (2007). Sequential Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA 104, 1760–1765.

Thattai, M. and van Oudenaarden, A. (2001). Intrinsic noise in gene regulatory networks. Proc. Nat. Acad. Sci. USA 98, 8614–8619.

Weinan, E., Engquist, B., Li, X., Ren, W. and Vanden‐Eijnden, E. (2007). Heterogeneous multiscale methods: A review. Comm. Comput. Phys 2, 367–450.

## Notes:

Samuel Kou's research is supported in part by the NSF grant DMS‐0449204 and NIH/NIGMS grant R01GM090202‐01.

N. Chopin and C. Robert are supported by the 2007–2010 grant ANR‐07‐BLAN‐0237‐01 “SP Bayes”.