ensemble (c41b2)

ENSEMBLE averaging / replica exchange

Robert Best
0-K String method

Victor Ovchinnikov

The ENSEMBLE module of CHARMM permits one to start a number of
copies of CHARMM, communicating using MPI, with some small amount
of information being shared between the copies. There are a number
of applications of this:
(i) to average restraints over an ensemble (especially
useful for NOE's/spin labels in unfolded states (1,2).
(ii) to perform replica-exchange (parallel tempering) (3)
simulations at a number of temperatures to enhance
sampling.
(iii) to do replica exchange between different energy functions.
(e.g. between different umbrella windows) (4).
(iv) exponential averaging of different force-fields (5).
(v) to find a minimum energy path (MEP) between two
conformations of a molecule (0-K String method)

Many other applications can be envisaged.

This feature is still quite new and it is advisable to stick
closely to the test cases to start with.

References:
-------------
1. R. B. Best & M. Vendruscolo, JACS, 126, 8090-8091 (2004) + supp info.
2. R. B. Best, J. Clarke & M. Karplus, J. Mol. Biol., 349, 185-203 (2005).
3. K. Lindorff-Larsen, R. B. Best, M. A. DePristo, C. M. Dobson &
M. Vendruscolo, Nature, 433, 128-132 (2005).
4. R. B. Best & G. Hummer, unpublished.
5. R. B. Best, Y-G. Chen and G. Hummer, Structure, 13, 1755-1763 (2005).

The zero-temperature string method is described in:

1. E, W., Ren, W. & Vanden-Eijnden, E.
Simplified and improved string method for computing the minimum energy
paths in barrier-crossing events. J. Chem. Phys. 126, 164103-164103-8 (2007)

------------------------------------------------------------------
NOTES ON BUILDING THE ENSEMBLE CODE:

For gnu compilers:
$> ./install.com gnu medium E M

------------------------------------------------------------------

* Syntax | Syntax of the ENSEMBLE command
* General Description | General info on I/O and other practical matters
* Replica Exchange | Using replica exchange to swap between different
force-fields and/or temperatures
* Ensemble Restraints | Using HQBM and ENSEMBLE to average restraints
* Force-field Averaging | Combining two potentials with exponential averaging
(e.g. for "multi-go" models)
* 0-K String method | Finding minimum energy paths (MEP) between two
conformations of a molecule
* Test Cases | Description of c33 and c34 tests

Top
Initialize ENSEMBLE:
--------------------

ENSEMBLE NENSEM integer

Replica exchange commands:
--------------------------

ENSEMBLE EXCH T2REp integer REP2t integer FREQ integer MAPU integer -
[ AUTO LOWT real TGRAD real PSWAP real | TEMP1 TEMP2 ... TEMPN ]

ENSEMBLE [SWON | SWOFF]

ENSEMBLE INFO

ENSEMBLE STRING

General commands:
-----------------

ENSEMBLE SYNC

ENSEMBLE SEED [ROOT integer]

Force-field averaging (used for "multi-Go", for example)
--------------------------------------------------------

ENSEMBLE EXPAvg BETA real [ UNIT real ] -
OFFSet real_1 real_2 ... real_nensem

Top
The following section describes the keywords of the ENSEMBLE command.

General Description
===================

Ensemble enabled executables run exactly the same as normal parallel
command is. A charmm script will be processed in the normal parallel
way until the ensemble initializing command is given. Once ensemble is
initialized for N replicas, there are N replicas of charmm running
where the total processors are split evenly among the replicas. (***
The total number of processors must be evenly divisable by the number
of replicas. ***)

Once running in ensemble mode, each replica runs independently at
first, each reading the default input stream. Each replica produces
its default output to an output file charmm.out.xxx, where xxx is the
replica number (excepting the 0th replica which still goes to stdout)
from 1 to N-1.

Each replica can open and close files independently. Take care to not
open the same file for writing (such as dcd or restart files) on
different replicas, see example below.

Each replica can stream a new input file, allowing independent
simulations to run out of the a large number of processors in the same
batch run.

Initializing ENSEMBLE
-----------------------

The command

ensemble nensem 4

will break the processors into 4 replicas of the current state of the
charmm run, giving 1/4 of the processors to each replica. The replicas
are numbered 0,1,2,3. The output for rep 0 still goes to stdout, while
the others go to charmm.out.001, charmm.out.002, charmm.out.003. The
replicas keep reading the input file (though independently) unless
directed to stream another input file. The identity of the replica can
be determined from ?whoiam and the total number of replicas can be
determined from ?nensem.

set numrep ?nensem
set myrep ?whoiam

These are useful for giving different file names to different nodes or
used in conditionals to process the input stream differently for each
rep, for example:

open unit 20 write form name rest@myrep.rst
stream newinput_@myrep.inp
if @myrep .eq. 000 then
do some stuff
endif
ensemble sync

This implementation differs from other implementations of replica
exchange (excluding those based on external scripting), for example
the closely related REPDstr function in CHARMM, or that in GROMACS, in
that all processes can take input from the same file rather than
reading different files. Reading from a single file requires a few
additional commands, but has the advantage that all simulation input
is contained in one place. Alternatively one could use separate input
files as noted above.

Each node reads the input file itself and each node maintains a
completely independent copy of all data. This allows dynamics to be
run much as usual, with all nodes happily unaware of the others, apart
from the communication entailed in replica-exchange or force-field
averaging. A few points about I/O.

Files that really should be opened with unique names for each rep:
------------------------------------------------------------------
trajectories (coord/velocities/..)
restart files
energy files from dynamics runs
any other dynamics output which will differ between replicas
coordinate writing
experimental data files for HQBM

Files opened for reading by all reps will be opened by each rep in
read-only mode, each rep opening the file and reading it
independently. This may cause io delays for huge numbers of reps.

Some Initialization notes
-------------------------
For most ensemble averaged restraints, starting all replicas with the same
coordinates and velocities this is a waste of time (and is one pathological
case where an N-replica simulation will behave exactly like a single replica).
Thusly, one should assign either or both different random seeds (see below) and
different starting coordinates to different replicas. Bear in mind that not all
integration schemes in CHARMM actually use a random seed from the dyna command
(e.g. NOSE does not, but LEAP VERLET does).

e.g. for assigning different seeds
if ?whoiam .eq. 0 set seed 23832
if ?whoiam .eq. 1 set seed 9375283
etc...
Then use "dyna start ... iseed @seed ..."

Top
Replica Exchange
================

NOTE: THE REPLICA EXCHANGE FEATURE IS STILL NOT THOROUGHLY TESTED AND
SHOULD THEREFORE BE USED WITH CAUTION.

At present, only the main dynamics integrator in charmm (that is, the
three-step verlet in dynamc.src) is fully supported by this command. Thus
'DYNA NOSE' etc. will not work, but 'DYNA LEAP' will.

An earlier version of this code required a deconvolution of coordinates
written at different tempertures. However, since the overhead for
coordinate swapping is so low, it is easier to do it during the run
and that is how it is done at present. A record of swaps is still
written out for information.

NEW IN C34:
- Constant pressure MD
- Support for VV2 integrator (incl. constant pressure)

The idea will not be described here, see Sugita & Okamoto, Chem. Phys. Lett.
314, 141-151 (1999), for example.

When starting off, replica exchange is turned off. To turn it on and
set up temperatures use:

ENSEMBLE EXCH T2REp integer REP2t integer FREQ integer MAPU integer -
[ RULEs integer ] -
[ AUTO LOWT real TGRAD real PSWAP real | TEMP1 TEMP2 ... TEMPN ]
T2RE integer: unit to write map of replica(T) as sim progresses
REP2 integer: unit to write map of T(replica) as sim progresses
(yes, this is redundant!)
FREQ integer: frequency in MD timesteps for attempting swaps
##deprecated: MAPU integer: file to read a final temperature map
in order to restart dynamics ##
RULEs integer: number of unit to read allowed swaps from.
The format of this file is
----------8<--------------8<-------------
NRULE
I_1 J_1
I_2 J_2
...
I_NRULE J_NRULE
----------8<--------------8<-------------
where NRULE is the number of allowed swaps and subsequent
lines detail the pairs of nodes that are allowed to swap
Nodes are numbered from 1...NENSEM
AUTO LOWT real TGRAD real PSWAP real: this is the first way to
set up replica temperatures. Just specify the lowest
temperature you want, the gradient of potential energy
as a function of T (determined from a few trial
simulations), and the desired probability of
swapping replicas. This assumes a delta function for
the energy distributions, which is clearly incorrect.
TEMP1 TEMP2 ... TEMPN: specify temperatures manually - must give
as many as there are replicas!

ENSEMBLE [SWON | SWOFF]: turn replica exchange on/off. Can be useful to
have it off for initial equilibration.

## deprecated: ENSEMBLE WRITE UNIT integer: write temperature map to unit
for restart purposes (read in using MAPU in ENSE EXCH). ##

ENSEMBLE INFO: print out info about replica temperatures, etc.

Top
Ensemble restraints
=====================

This is mostly documented in hqbm.doc. The only relevant commands
are the 'general' ones above. Note comments about random seeds!

Top
The "ENSEmble EXPAvg" command invokes exponential averaging of different
force-fields. Each node reads a different force-field (by using node-dependent
names for the force-field files, for example), and the different potentials
are averaged with the following function:

exp(-beta_mix * E(R)) = exp(-beta_mix * {E_1(R) + off_1}) + ...
+ exp(-beta_mix * {E_nensem(R) + off_nensem})

(see Structure paper reference in intro) beta_mix is analogous to the
standard beta = 1/kT but need not correspond to the temperature at which
simulations are run.

All nodes propagate exactly the same dynamics, but each evaluates only one
energy function, and the forces and energies are subsequently shared at each
time step to calculate the average.

The meaning of the various parts of the command:

ENSEMBLE EXPAvg BETA real [ UNIT integer ] -
OFFSet real_1 real_2 ... real_nensem

BETA: specifies beta_mix
UNIT: specifies a formatted unit to write energies every NPRINT steps
during MD.
OFFSet: specifies offsets off_1 ... off_nensem. This allows the relative
energies of the different force-fields to be tuned, e.g.
to match experimental data

File: Ensemble, Node: 0-K String method, Previous: Force-field Averaging, Up: Top, Next: Test Cases

0-K String method
=====================

The O-K (zero-temperature) String method is fully documented elsewhere» stringm

Top
TESTCASES:
=============================================

To run ensemble tests for architecture "arch", use the following
command in the test directory:
./test.com E arch
in this case the optional fourth command specifying target will be ignored.

This will run four processes for each test case; the
following test cases (all names ending "_ens.inp") will
be run.

c33test:
--------
hqbm_rc3_ens.inp: } Ensemble-averaged restraints
hqbm_rc4_ens.inp: } » hqbm
hqbm_rc8_ens.inp: }

rex_ens.inp: Simple example of replica exchange with different temperatures

c34test:
--------
hexrex_ens.inp: Simple example of replica exchange with different force-fields
rex2_ens.inp: Example of 2D replica exchange with a custom rules for swapping
multi_ens.inp: Exponential averaging of some simple harmonic potentials