
Differential Evolution Markov Chain (DEMC)
Acceptance Rate | The optimal acceptance rate is based on the multivariate normality of the marginal posterior distributions, and ranges from 44% for one parameter down to 23.4% for five or more parameters. The observed acceptance rate may be suitable in the interval [15%, 50%]. |
Applications | This is a widely applicable, general-purpose algorithm. |
Difficulty | This algorithm is moderate in difficulty. It has a few algorithm specifications that are easy to specify, but proficiency comes with practice. This algorithm requires an additional matrix of initial values, and the number of chains depends on the number of parameters. |
Final Algorithm? | No. |
Proposal | Multivariate. |
The original Differential Evolution Markov Chain (DEMC) (Ter Braak 2006), referred to in the literature as DE-MC, combined a Metropolis step with a genetic algorithm called Differential Evolution, run with multiple chains that learn from each other. It can be considered as parallel adaptive direction sampling (ADS) with the Gibbs sampling (Gibbs) step replaced by a Metropolis step, or as a non-parametric form of Random-Walk Metropolis (RWM). However, for a model with J parameters, the original DEMC required at least 2J chains, and often as many as 20J. Hence, from 2J to 20J model evaluations were required per iteration, whereas a typical multivariate sampler such as Adaptive-Mixture Metropolis (AMM) requires one evaluation per iteration, and a componentwise sampler such as Adaptive Metropolis-within-Gibbs (AMWG) requires J.
The version included here was presented in Ter Braak and Vrugt (2008), in which the required number of chains is drastically reduced by adapting based on historical, thinned samples (the parallel direction move) and periodically using a snooker move instead. In the article, three chains were used to update as many as 50-100 parameters. Larger models may require blocks (as suggested in the article, though blocking is not included here) or more chains (see below). In testing here, a few 200-dimensional models have been solved with 5-10 chains.
DEMC has four algorithm specifications: Nc is the number of chains (at least 3), Z is a T x J matrix or T x J x Nc array of T thinned samples for J parameters and Nc chains, gamma is a scale parameter, and w is the probability of a snooker move for each iteration. When gamma=NULL, the scale parameter defaults to the recommended 2.381204 / sqrt(2J), though for snooker moves it is 2.381204 / sqrt(2) regardless of the algorithm specification. The default, recommended probability of a snooker move is w = 0.1.
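To make the specifications concrete, here is a minimal sketch of calling LaplacesDemon with DEMC on a toy target of J = 10 independent standard normal parameters. The toy Model function, its data list, and the iteration settings are assumptions for illustration, not part of the text above.

```r
library(LaplacesDemon)

## Toy target (an assumption for illustration): J independent
## standard normal parameters.
J <- 10
MyData <- list(J=J,
  PGF=function(Data) rnorm(Data$J),        # prior-generating function, used by GIV
  mon.names="LP",
  parm.names=paste0("theta[", 1:J, "]"))

Model <- function(parm, Data) {
  LP <- sum(dnorm(parm, 0, 1, log=TRUE))   # joint log-density of the toy target
  list(LP=LP, Dev=-2*LP, Monitor=LP, yhat=0, parm=parm)
}

Initial.Values <- GIV(Model, MyData, PGF=TRUE)

Fit <- LaplacesDemon(Model, Data=MyData, Initial.Values,
  Iterations=20000, Status=1000, Thinning=10,
  Algorithm="DEMC",
  Specs=list(Nc=3, Z=NULL, gamma=NULL, w=0.1)) # the four specifications above
```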
The parallel direction move consists of a multivariate proposal for each chain, in which two randomly selected previous iterations from two randomly selected other chains are differenced and scaled, and a small jitter, drawn from U(-0.001, 0.001) for each of the J parameters, is added. The snooker move instead differences these past samples along the line through the current state and another randomly selected (current) chain in the current iteration, with a fixed scale. Another variation is to use the snooker move with past chain time-periods. The snooker move facilitates mode-jumping behavior for multimodal posteriors, as sketched below.
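The two moves can be sketched as follows for a single chain. This is a schematic reconstruction from the description above and Ter Braak and Vrugt (2008), not the package's internal code; the function name demc.propose is hypothetical, and the snooker move's extra acceptance-ratio correction factor is omitted.

```r
## Schematic sketch of one DEMC proposal for one chain (not the
## package's internal code). x: current state (length J); Z: T x J
## matrix of thinned past samples; X: Nc x J matrix of current
## chain states.
demc.propose <- function(x, Z, X, w=0.1, gamma=NULL) {
  J <- length(x)
  if (runif(1) >= w) {
    ## Parallel direction move: difference two randomly selected past
    ## (thinned) samples, scale by gamma, and add a small jitter.
    if (is.null(gamma)) gamma <- 2.381204 / sqrt(2 * J)
    z <- Z[sample(nrow(Z), 2), , drop=FALSE]
    x + gamma * (z[1, ] - z[2, ]) + runif(J, -0.001, 0.001)
  } else {
    ## Snooker move: project the difference of two past samples onto
    ## the line through x and a randomly selected current chain, with
    ## the fixed scale 2.381204/sqrt(2). (The Metropolis acceptance
    ## for this move includes a correction factor, omitted here.)
    e <- x - X[sample(nrow(X), 1), ]
    e <- e / sqrt(sum(e^2))
    z <- Z[sample(nrow(Z), 2), , drop=FALSE]
    x + (2.381204 / sqrt(2)) * sum((z[1, ] - z[2, ]) * e) * e
  }
}
```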
For the first update, Z is usually NULL. Internally, LaplacesDemon populates Z with GIV, and it is strongly recommended that PGF is specified by the user. As the sampler iterates, Z is used for adaptation, and elements of Z are replaced with thinned samples. A short, first run is recommended to obtain thinned samples, such as in Fit$Posterior1. For consecutive updates, this posterior matrix is used as Z, as in the example below.
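Continuing the sketch above, a consecutive update can pass the thinned samples from the first run back in as Z, with as.initial.values() extracting the latest posterior sample as the new initial values; the iteration settings are again illustrative.

```r
## Consecutive update: reuse the thinned posterior matrix as Z.
Initial.Values <- as.initial.values(Fit)
Fit2 <- LaplacesDemon(Model, Data=MyData, Initial.Values,
  Iterations=50000, Status=1000, Thinning=10,
  Algorithm="DEMC",
  Specs=list(Nc=3, Z=Fit$Posterior1, gamma=NULL, w=0.1))
```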
In this implementation, samples are returned only for the main chain, for which Initial.Values are specified. Samples for the other chains (associated with PCIV) are not returned, but are used to assist the main chain. The authors note that too many chains can be problematic when an outlier chain exists. Here, samples of other chains are not returned, outlier or not. If an outlier chain exists, it simply does not help the main chain much and wastes computational resources, but it does not negatively affect the main chain.
An attractive property is that DEMC does not require a proposal covariance matrix, but instead adapts based on past (thinned) samples. In the output of LaplacesDemon, the thinned samples are also stored in CovarDHis, though they are thinned samples rather than the history of the diagonal of the covariance matrix. This is done so the absolute differences can be observed for diminishing adaptation, as in the sketch below. Another attractive property is that the chains may be parallelized, such as across CPUs, in the future, though the current version is not parallelized.
DEMC has been considered to perform like a version of Metropolis-within-Gibbs (MWG) that updates by chain, rather than by component. DEMC may be one form of a compromise between a one-evaluation multivariate proposal and J componentwise proposals. Since it is adaptive, DEMC is not recommended as a final algorithm.
References
- Ter Braak C (2006). "A Markov Chain Monte Carlo Version of the Genetic Algorithm Differential Evolution: Easy Bayesian Computing for Real Parameter Spaces." Statistics and Computing, 16, 239-249.
- Ter Braak C, Vrugt J (2008). "Differential Evolution Markov Chain with Snooker Updater and Fewer Chains." Statistics and Computing, 18(4), 435-446.
