adumb (c38b1)

Adaptive Umbrella Sampling Module

This module sets up adaptive umbrella potentials. The currently supported
umbrella potentials are functions of dihedral angles and functions of the
potential energy of the system (energy sampling). The module also supports
umbrella potentials that are functions of arbitrary reaction coordinates
defined with the RXNCOR commands (» umbrel ).

WARNING: The module is still being developed and some details are likely
to change in future versions.
Please report problems to Christian Bartels at cb@brel.u-strasbg.fr
Please report problems with the interface to the RXNCOR commands
to Justin Spiriti at jspiriti@usf.edu and/or Arjan van der Vaart
at avandervaart@usf.edu.

REFERENCES:
C. Bartels & M. Karplus, J. Comp. Chem. 18 (1997) 1450-
C. Bartels & M. Karplus, J. Phys. Chem. 102 (1998) 865-
M. Schaefer, C. Bartels, & M. Karplus, J. Mol. Biol. (1998)

* Syntax | Syntax of the ADUMB commands
* Function | Purpose of each of the commands
* Examples | Usage examples of the ADUMB module

Syntax

[SYNTAX ADUMB functions]

Syntax:

ADUMb CORR DIST UNIT int SELE...END SELE...END (atom selection x 2)

ADUMb CORR RMSD COR1

ADUMb CORR RMSD COR2

ADUMb CORR RMSD SETUp NATOms int NSTRuctures int

ADUMb CORR RMSD SUBStructure UNT1 int UNT2 int (1 to 3 atom selections) -
      [ORIEnt] [SYMMetry 4X(atom-spec) FOLD int]

ADUMb DIHE NRES int TRIG int POLY int 4X(atom-spec)

ADUMb ENER NRES int TRIG int POLY int
MAXE real MINE real [MAXT real] [MINT real]

ADUMb INIT NSIM int [UPDA int] [EQUI int] [TEMP real]
[AGIN real] [NEXT int] [THRE real]
[UCUN int] [WUNI int] [RUNI int] [FREQ int]
[WCUN int] [RCUN int] [CPFR int] [MAXB real]

ADUMb PROB UCUN int [TEMP real] [PUNI int] [TUNI int]

ADUMb RXNCor NRES int TRIG int POLY int NAME name MIN real MAX real

ADUMb STON

ADUMb STOFf

where: atom-spec ::= { segid resid iupac }
                     { resnumber iupac }

Function

0. Introduction

The module provides commands to define degrees of freedom along which
adaptive umbrella potentials are applied in molecular dynamics (MD)
simulations. Statistics on the sampling of these degrees of freedom are
recorded during the MD simulations and periodically used to update the
umbrella potential so that uniform sampling of the degrees of freedom can be
expected. Currently, dihedral angles, the potential energy, and reaction
coordinates defined with the RXNCOR commands are supported as degrees of
freedom.
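The idea behind an update can be illustrated with a minimal Python sketch for a single set of statistics. It assumes that the umbrella potential U is added to the potential energy and that energies are in kcal/mol (the constant `KB` and the function name `update_umbrella` are illustrative, not part of CHARMM). The module itself combines several sets of statistics with the WHAM equations, fits the result to basis functions and extrapolates unsampled bins; none of that is shown here.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K) (assumed energy units)

def update_umbrella(counts, umbrella_old, temp):
    """One idealized adaptive update from a single histogram.

    Assumes the biased energy is E + U(x), so counts ~ exp(-(W + U)/kT),
    where W is the potential of mean force (PMF).
    Returns (pmf, umbrella_new); bins without samples are NaN.
    """
    kt = KB * temp
    counts = np.asarray(counts, dtype=float)
    umbrella_old = np.asarray(umbrella_old, dtype=float)
    sampled = counts > 0
    pmf = np.full(counts.shape, np.nan)
    # Undo the bias of the umbrella potential that was active during sampling.
    pmf[sampled] = -kt * np.log(counts[sampled]) - umbrella_old[sampled]
    pmf -= np.nanmin(pmf)  # shift so the lowest sampled PMF value is zero
    # U_new = -W makes the expected biased histogram flat.
    return pmf, -pmf

# Usage: a hypothetical 3-bin PMF sampled without bias at 300 K.
w_true = np.array([0.0, 0.5, 1.0])
counts = 1e6 * np.exp(-w_true / (KB * 300.0))
pmf, u_new = update_umbrella(counts, np.zeros(3), 300.0)
```

Sampling again with `u_new` would give equal expected counts in every bin, which is the uniform sampling the update aims for.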

If several degrees of freedom are defined, multidimensional adaptive
umbrella sampling is performed.

Two kinds of input/output files are used by the module. The "umbrella"
files contain the umbrella potentials that were used in the simulations,
together with the statistics of the sampling of the bins during the
simulations. From this information the potential of mean force can be
calculated and the umbrella potential expected to lead to uniform sampling
can be determined. The second kind of file contains the values of the
umbrella coordinates (the degrees of freedom used for adaptive umbrella
sampling) for each time step in which coordinates were saved to the
trajectory files. The umbrella coordinates are normalized to the range 0 to 1,
independent of the degrees of freedom used. From the saved umbrella
coordinates, the weighting factors needed to calculate average properties of
the unbiased system can be computed (see ADUMb PROB).

The ADUMb DIHE and ADUMb RXNCor options have been updated to work with
domain decomposition (» umbrel ).

1. ADUMb DIHE

Define a dihedral angle as a degree of freedom for adaptive umbrella
sampling. To record the statistics, the degree of freedom is partitioned
into NRES bins. The umbrella potential is represented as a linear combination
of 2*TRIG trigonometric functions and POLY polynomial functions of degree
0 to POLY-1. Repeating the command adds further dimensions, resulting in a
multidimensional adaptive umbrella potential.
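A linear least-squares fit to such a basis can be sketched as follows. The exact basis functions ADUMB uses are not given here, so the sketch assumes cos(2*pi*k*u) and sin(2*pi*k*u) for k = 1..TRIG plus the monomials u**j for j = 0..POLY-1, all evaluated at the normalized bin centers u; `design_matrix` and `fit_umbrella` are hypothetical names.

```python
import numpy as np

def design_matrix(u, trig, poly):
    """Basis functions at normalized coordinates u in [0, 1] (assumed basis).

    Columns: cos(2*pi*k*u), sin(2*pi*k*u) for k = 1..TRIG (2*TRIG columns),
    then u**j for j = 0..POLY-1 (POLY columns).
    """
    u = np.asarray(u, dtype=float)
    cols = []
    for k in range(1, trig + 1):
        cols.append(np.cos(2.0 * np.pi * k * u))
        cols.append(np.sin(2.0 * np.pi * k * u))
    for j in range(poly):
        cols.append(u ** j)
    return np.column_stack(cols)

def fit_umbrella(target, nres, trig, poly):
    """Least-squares fit of a per-bin target potential; returns fitted values."""
    centers = (np.arange(nres) + 0.5) / nres  # normalized bin centers
    a = design_matrix(centers, trig, poly)
    coef, *_ = np.linalg.lstsq(a, np.asarray(target, dtype=float), rcond=None)
    return a @ coef

# Usage with the NRES/TRIG/POLY values of Example (1) and a made-up target.
nres, trig, poly = 36, 6, 1
centers = (np.arange(nres) + 0.5) / nres
target = 2.0 * np.cos(2.0 * np.pi * 2 * centers) + 0.7
fitted = fit_umbrella(target, nres, trig, poly)
```

A target that lies in the span of the basis is reproduced exactly; anything else (for example a sharp barrier) is smoothed, which is the purpose of fitting.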

The coordinates written to the umbrella coordinates file are normalized
to the range 0 to 1 with 0 corresponding to -180 degrees and 1 corresponding
to +180 degrees.
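The normalization and the assignment to one of the NRES bins can be written out as below. Equal-width bins and placing +180 degrees in the last bin are assumptions made for illustration; the function names are hypothetical.

```python
def dihedral_to_umbrella(phi_deg):
    """Normalized umbrella coordinate for a dihedral in [-180, 180] degrees.

    -180 degrees maps to 0 and +180 degrees maps to 1.
    """
    return (phi_deg + 180.0) / 360.0

def bin_index(u, nres):
    """0-based index of the bin containing u, assuming NRES equal-width bins."""
    # u = 1.0 (phi = +180) is placed in the last bin instead of overflowing.
    return min(int(u * nres), nres - 1)

u = dihedral_to_umbrella(0.0)
b = bin_index(u, 36)
```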

2. ADUMb ENER

Define the potential energy as a degree of freedom for adaptive umbrella
sampling. NRES, TRIG and POLY have the same meaning as in ADUMb DIHE.
MINE and MAXE specify the potential energy range: statistics on the sampling
are recorded in the range MINE-0.5*(MAXE-MINE) to MAXE+0.5*(MAXE-MINE).
Outside the range MINE to MAXE, the umbrella potential is kept constant
to prevent the system from leaving the range in which statistics are recorded.
MINT and MAXT (default values: 273 K and 1000 K, respectively) are the minimal
and maximal temperatures; they restrict sampling to the relevant temperature
range. To set up a system, get a rough estimate of the potential energy of the
system at the desired MINT and MAXT (from short unbiased simulations at
MINT and MAXT). Set MINE and MAXE to the values determined minus and plus a
small tolerance, respectively.

The coordinates written to the umbrella coordinates file are normalized
to the range 0 to 1 with 0 corresponding to MINE-0.5*(MAXE-MINE) and 1
corresponding to MAXE+0.5*(MAXE-MINE).
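The recording window and the normalization follow directly from these definitions. This short Python sketch (hypothetical helper names) uses the MINE and MAXE values of Example (2).

```python
def energy_window(mine, maxe):
    """(lower, upper) limits of the range in which statistics are recorded."""
    half_width = 0.5 * (maxe - mine)
    return mine - half_width, maxe + half_width

def energy_to_umbrella(energy, mine, maxe):
    """Normalized umbrella coordinate: 0 at the lower and 1 at the upper limit."""
    lower, upper = energy_window(mine, maxe)
    return (energy - lower) / (upper - lower)

# MINE = -50 kcal/mol and MAXE = 100 kcal/mol, as in Example (2).
print(energy_window(-50.0, 100.0))              # (-125.0, 175.0)
print(energy_to_umbrella(-50.0, -50.0, 100.0))  # 0.25
```

Because the window extends by half the width of MINE..MAXE on each side, MINE always maps to 0.25 and MAXE to 0.75 on the normalized scale, whatever values are chosen.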

3. ADUMb INIT

Defines or redefines the parameters for adaptive umbrella sampling and
initializes the umbrella potential. The umbrella potential is updated
every UPDA steps. After each update, no statistics are recorded for
EQUI steps. For the remaining UPDA - EQUI steps, statistics on the sampling
of the umbrella coordinates are recorded; each set of statistics is stored
separately from previous ones, together with the umbrella potential that was
active while it was recorded. Up to NSIM sets of statistics are kept in memory.
If the number of updates performed in a run exceeds NSIM, the oldest
statistics are discarded to make room for the most recent ones.
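Keeping at most NSIM sets of statistics behaves like a bounded queue. This sketch models it with Python's `collections.deque(maxlen=...)`; the class name and the stored tuple layout are hypothetical and do not describe the module's internal data structures.

```python
from collections import deque

class StatisticsStore:
    """Holds at most NSIM sets of statistics; appending to a full store
    silently drops the oldest set (deque with maxlen)."""

    def __init__(self, nsim):
        self.sets = deque(maxlen=nsim)

    def record_update(self, histogram, umbrella):
        # Each set keeps the histogram together with the umbrella potential
        # that was active while the histogram was collected.
        self.sets.append((tuple(histogram), tuple(umbrella)))

def recorded_steps_per_update(upda, equi):
    """Steps per update interval that contribute statistics (UPDA - EQUI)."""
    return upda - equi

store = StatisticsStore(nsim=2)
for k in range(3):
    store.record_update([k, k], [0.0, 0.0])
```

With UPDA 10000 and EQUI 1000, as in Example (1), 9000 steps per update interval contribute statistics.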

After each update the umbrella potential and the statistics are written
to standard output (the log file). The written table contains, from left
to right, the number of the bin, the number of integration time steps in
which the system was in the bin since the last update, the potential of mean
force calculated with the WHAM equations, the negative of the updated
umbrella potential (potential of mean force modified to restrict sampling if
necessary and fitted to the set of trigonometric and polynomial functions),
the total number of times the bin was visited in the entire simulation, and
the umbrella coordinates of the center of the bin.

The temperature TEMP should be set to the temperature used in
the simulations. It is used to calculate the umbrella potentials from
the sampling statistics and to restrict sampling if potential energy
sampling is performed.

Umbrella coordinates are written to unit UCUN. At each update,
the statistics are written to unit WUNI together with the umbrella potential
active when recording the statistics. Statistics from previous runs can
be read from unit RUNI. The statistics read must be from adaptive umbrella
sampling simulations with the same parameters as the present one, in
particular, the same degrees of freedom have to be used as umbrella
coordinates. If adaptive umbrella sampling of the potential energy is
used, umbrella potentials from runs at different temperatures can
be read by repeating the ADUMb INIT command with RUNI set to the unit
containing the statistics of each of the runs and TEMP set to the temperature
of the run.

To define the umbrella potential of bins for which no statistics
have been acquired so far, the umbrella potential has to be extrapolated.
In the current implementation (this may change in future versions),
the umbrella potential of all bins that were not sampled is set to
the same value (ext-cons). To determine ext-cons, the potential of the bins
that were sampled is linearly extrapolated for NEXT bins, and the maximal
value of the linearly extrapolated potentials (max-extrapolated) is
determined. Then the minimal value of the potentials of the sampled bins
(min-sampled) is determined, and ext-cons is set to min-sampled or
max-extrapolated, whichever is smaller.
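The ext-cons rule is easier to follow as code. This is a 1-D sketch on a non-periodic coordinate; how the slope at each edge of the sampled region is estimated is not specified above, so using the two outermost sampled bins is an assumption, as is the function name.

```python
import numpy as np

def extrapolation_constant(potential, sampled, next_bins):
    """ext-cons for unsampled bins (1-D, non-periodic sketch).

    Hypothetical detail: at each edge of the sampled region the slope is
    taken from the two outermost sampled bins. Needs two or more sampled bins.
    """
    potential = np.asarray(potential, dtype=float)
    idx = np.flatnonzero(sampled)
    min_sampled = potential[idx].min()
    extrapolated = []
    # (edge bin, neighbouring sampled bin) on the low and on the high side.
    for edge, inner in ((idx[0], idx[1]), (idx[-1], idx[-2])):
        slope = (potential[edge] - potential[inner]) / abs(edge - inner)
        for m in range(1, next_bins + 1):
            extrapolated.append(potential[edge] + slope * m)
    max_extrapolated = max(extrapolated)
    # ext-cons is whichever of the two values is smaller.
    return min(min_sampled, max_extrapolated)

nan = float("nan")
ext = extrapolation_constant([nan, 3.0, 5.0, 3.0, nan],
                             [False, True, True, True, False], next_bins=1)
```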

A few statistics that differ significantly from the rest of the
statistics can be due to convergence problems caused by the
extrapolation or to the occurrence of rare events. In the former case,
outliers should occur only in the first few simulations, and it is advantageous
to eliminate them. By default, the module eliminates statistics that
differ from the averaged statistics by more than THRE times the average
deviation. To prevent statistics from being eliminated, set THRE to a value
larger than NSIM: no single deviation can exceed the sum of all deviations,
so it is never more than NSIM times the average deviation. At each update,
the deviations of the statistics from the averaged statistics are printed to
standard output (log file), e.g.,

0 Deviation of simulation 1 : 0.955
0 Deviation of simulation 2 : 0.513E-01
0 Deviation of simulation 3 : 0.787E-01
0 Deviation of simulation 4 : 0.292
0 Deviation of simulation 5 : 0.170
0 Deviation of simulation 6 : 0.201
0 Deviation of simulation 7 : 0.933
0 Deviation of simulation 8 : 0.208
0 Deviation of simulation 9 : 0.270
0 Deviation of simulation 10 : 0.131
0 Deviation of simulation 11 : 0.394
0 Deviation of simulation 12 : 1.52
0 Deviation of simulation 13 : 0.969
0 Deviation of simulation 14 : 0.502
0 Deviation of simulation 15 : 1.47
0 Deviation of simulation 16 : 2.97
-1 Deviation of simulation 17 : 210.
0 Deviation of simulation 18 : 0.695E-01
0 Deviation of simulation 19 : 0.160
0 Deviation of simulation 20 : 0.450

The 0 or -1 on each line indicates whether the statistics of a particular
simulation are used (0) or were discarded (-1) based on the THRE criterion.
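The THRE criterion can be sketched in Python with the deviations from the log above. The sketch assumes that "average deviation" means the arithmetic mean of the printed values and that outliers are removed in a single pass; both are assumptions, and the function name is hypothetical.

```python
def flag_statistics(deviations, thre):
    """0 = statistics used, -1 = discarded, mirroring the flags in the log.

    Assumes the average deviation is the arithmetic mean of the printed
    deviations and that a single pass is made (hypothetical details).
    """
    average = sum(deviations) / len(deviations)
    return [-1 if d > thre * average else 0 for d in deviations]

# Deviations copied from the example log (simulations 1 to 20).
logged = [0.955, 0.513e-01, 0.787e-01, 0.292, 0.170, 0.201, 0.933, 0.208,
          0.270, 0.131, 0.394, 1.52, 0.969, 0.502, 1.47, 2.97, 210.0,
          0.695e-01, 0.160, 0.450]
flags = flag_statistics(logged, thre=10.0)
```

With an assumed THRE of 10, only simulation 17 (deviation 210) exceeds the cutoff of about 110.9, consistent with the -1 flag in the log. With THRE equal to the number of statistics (20 here), nothing is discarded, since 210 is below 20 times the mean (about 221.8).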

For complex systems, there might be no umbrella potential that
enables the system to diffuse rapidly along the umbrella coordinate. In
such cases it has been found advantageous to give a higher weight
to the most recent statistics. This is implemented using the AGINg factor:
for an umbrella potential calculated from n sets of statistics, the i-th set
(i = 1, 2, ..., n, with n the most recent) is weighted by AGINg**(n-i).
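The aging weights form a geometric sequence. A one-function Python sketch (hypothetical name) makes the ordering explicit: the most recent set of statistics always gets weight 1.

```python
def aging_weights(n, aging):
    """Weights AGINg**(n-i) for statistics i = 1..n (i = n is the most recent)."""
    return [aging ** (n - i) for i in range(1, n + 1)]

weights = aging_weights(4, 0.5)
```

AGINg = 1 weights all statistics equally; smaller values forget older statistics faster.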

The FREQ keyword specifies how often the dynamics trajectory is sampled
when compiling the umbrella potential statistics; FREQ 2, for example, means
that every other point along the trajectory is sampled.

WCUNit and RCUNit specify the units to which the accumulators for the
correlated structural variables are written and from which they are read,
for the purpose of restarting trajectories. The accumulators, along with the
updated average results, are written every CPFRequency updates of the umbrella
potential (see also ADUMb CORR). If WCUNit and RCUNit are omitted, the
accumulator statistics are not written.

The MAXBias option limits the magnitude of the umbrella potential by
applying the formula

U_bias(q) = -beta^-1 * ln(exp(-beta*F(q)) + exp(-beta*U_max))

where F(q) is the free energy surface and U_max is the maximum bias. This
is primarily intended for use with the roll angle reaction coordinate, where a
cap on the biasing potential is needed to prevent numerical instabilities (see
Spiriti and van der Vaart, JCTC 8, 2145 (2012)).
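The MAXBias formula can be evaluated without overflow using NumPy's `logaddexp`. The sketch assumes energies in kcal/mol and a hypothetical constant `KB`; the function name is illustrative.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K) (assumed energy units)

def capped_bias(free_energy, u_max, temp):
    """U_bias(q) = -kT * ln(exp(-F(q)/kT) + exp(-U_max/kT)), evaluated stably."""
    beta = 1.0 / (KB * temp)
    f = np.asarray(free_energy, dtype=float)
    # logaddexp(a, b) = ln(exp(a) + exp(b)) without overflow or underflow.
    return -np.logaddexp(-beta * f, -beta * u_max) / beta

f = np.array([-5.0, 0.0, 5.0, 50.0])
u = capped_bias(f, u_max=10.0, temp=300.0)
```

The result never exceeds min(F(q), U_max) and lies at most kT ln 2 below it, so it acts as a smooth cap.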

4. ADUMb PROB

Average properties of the unbiased system can be obtained by weighting
the conformations of an adaptive umbrella sampling run by appropriate
factors. The ADUMb PROB command calculates these weighting factors from
the umbrella coordinates read from unit UCUN and writes them to unit PUNI.
For the command to work, the umbrella potentials and statistics from the
run must have been read with the ADUMb INIT command. If the potential
energy was used as the umbrella coordinate, TEMP specifies the temperature
at which properties of the unbiased system are calculated.
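How such weighting factors can be computed is sketched below for the simplest case. The sketch assumes the biased energy was E + U (U being the umbrella potential of the frame's bin) and, for energy sampling, reweights from the simulation temperature to a target temperature such as the TEMP of ADUMb PROB. The function name and arguments are hypothetical, and the module's own calculation, which uses the WHAM-combined statistics, may differ.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K) (assumed energy units)

def unbiasing_weights(umbrella, temp_sim, energy=None, temp_target=None):
    """Normalized per-frame weights, assuming frames were sampled with E + U.

    Target/biased ratio: exp(beta_sim*U) for the same temperature, and
    exp((beta_sim - beta_target)*E + beta_sim*U) when reweighting in T.
    """
    beta = 1.0 / (KB * temp_sim)
    log_w = beta * np.asarray(umbrella, dtype=float)
    if energy is not None and temp_target is not None:
        beta_t = 1.0 / (KB * temp_target)
        log_w += (beta - beta_t) * np.asarray(energy, dtype=float)
    log_w -= log_w.max()  # avoid overflow before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

# Two hypothetical frames; the second had an umbrella potential of kT ln 2.
w = unbiasing_weights([0.0, KB * 300.0 * np.log(2.0)], 300.0)
```

An unbiased average of a property A is then `np.sum(w * a)` over the frames.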

5. ADUMb RXNCor

Use a coordinate that has been previously defined with the RXNCOR DEFIne
and RXNCOR SET commands (» umbrel ) as a reaction
coordinate for adaptive umbrella sampling. NRES, TRIG and POLY have the
same meaning as in ADUMb DIHE. The NAME option specifies the name of the
coordinate to be used. The MIN and MAX options specify the minimum and
maximum values of the coordinate; if the coordinate moves outside this range
during the simulation, an error is produced. This command
may be used to perform adaptive umbrella sampling on the roll angle reaction
coordinate (» umbrel ).

6. ADUMb STON
ADUMb STOFf

By default, statistics on the sampling of the umbrella coordinates are
recorded in each call to the energy routines. The ADUMb STOFf command
prevents statistics from being recorded, and ADUMb STON turns recording back
on. Switching statistics off might be useful when doing a minimization or
running an MD simulation with an umbrella potential that should not change
during the simulation.

7. ADUMb CORRelations

The CORRelations keyword allows the running calculation of the
average values of specified structural variables over the course of the
trajectories as a function of the reaction coordinates. It is intended
as a tool for examining correlations between the reaction coordinates
and other structural variables in the system. It is currently
implemented for interatomic distances and substructure RMSDs. The average
values of the specified variables (distances or RMSDs) are written to
a file (or to standard output) every CPFR times the umbrella potential is
updated, where CPFR is a keyword specified in the ADUMb INIT command.

Correlated Distances:
*********************
The UMBR CORR DIST command sets up the calculation of an average interatomic
distance between the atoms specified with a double atom selection.
Only one atom may be specified in each atom selection. The UMBR CORR DIST
command must be given once for each interatomic distance to be calculated.
The UNIT keyword is followed by the unit number to which the
results are to be written. If no unit number is specified, the results
for the correlated distances are written to standard output. If a
unit number is specified for any distance, unit numbers must be specified
for all distances. The average distance results are written every
CPFRequency updates of the umbrella potential (see ADUMb INIT).
Up to 100 distances can be specified.

EXAMPLE:
umbrella corr dist unit 17 sele atom1 end -
sele atom2 end

This results in the calculation of the running average of the
distance between atom1 and atom2. (Selecting more or fewer than exactly
one atom in either selection results in an error.)

The output is formatted as follows:

Average vals of distance fr 17 to 6 at step 500
2 1 -1.00000000 7.25430918
2 2 -1.00000000 6.89725628
2 3 -1.00000000 6.69046274
2 4 6.38194491 6.41586493
2 5 5.92699253 5.84622204

The first line is a header that describes the variable.
The first column gives the assigned number of the distance variable.
The second column gives the position (bin) of the reaction coordinate
(the same as in the free energy output). The third column gives the average
value of the distance over the last trajectory, and the fourth column gives
the cumulative average over all trajectories. A value of -1 indicates
that the reaction coordinate position has not been visited.
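The per-bin bookkeeping behind this output can be sketched as an accumulator that keeps one average for the last trajectory and one cumulative average, reporting -1 for bins without samples. The class is hypothetical, and resetting the last-trajectory sums at every umbrella update is an assumption.

```python
import numpy as np

class BinAccumulator:
    """Per-bin averages of a structural variable (e.g. a distance) versus the
    reaction-coordinate bin: last-trajectory and cumulative averages, with
    -1 reported for bins that have no samples."""

    def __init__(self, nbins):
        self.cum_sum = np.zeros(nbins)
        self.cum_n = np.zeros(nbins, dtype=int)
        self.reset_last()

    def reset_last(self):
        # Assumed to be called at every umbrella update (new "trajectory").
        self.last_sum = np.zeros_like(self.cum_sum)
        self.last_n = np.zeros_like(self.cum_n)

    def add(self, bin_index, value):
        self.last_sum[bin_index] += value
        self.last_n[bin_index] += 1
        self.cum_sum[bin_index] += value
        self.cum_n[bin_index] += 1

    @staticmethod
    def _average(sums, counts):
        out = np.full(sums.shape, -1.0)
        visited = counts > 0
        out[visited] = sums[visited] / counts[visited]
        return out

    def report(self):
        """Rows of (bin number from 1, last-trajectory avg, cumulative avg)."""
        last = self._average(self.last_sum, self.last_n)
        cum = self._average(self.cum_sum, self.cum_n)
        return [(i + 1, last[i], cum[i]) for i in range(len(cum))]

acc = BinAccumulator(3)
acc.add(0, 7.0)
acc.add(0, 8.0)
acc.reset_last()  # umbrella update: a new trajectory starts
acc.add(1, 6.0)
rows = acc.report()
```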

Correlated Substructure RMSD's:
*******************************
The UMBR CORR RMSD commands allow the calculation of the running
average of the RMSDs, as a function of the reaction coordinates,
of specified parts of the system relative to 2 reference structures.

UMBR CORR RMSD COR1 !saves the current coords as reference structure #1.

UMBR CORR RMSD COR2 !saves the current coords as reference structure #2.

UMBR CORR RMSD SETUp NATOms int NSTRuctures int WCUNit int RCUNit int

This command specifies the memory requirements: NATOms is the
total number of atoms that will be selected in all UMBR CORR RMSD
calculations, and NSTRuctures is the number of sets of substructures
for which RMSD calculations are to be carried out. WCUNit and RCUNit
specify the units to which the accumulators are written and from which they
are read, for the purpose of restarting trajectories. The accumulator values
are written every CPFRequency updates of the umbrella potential (see also
ADUMb INIT).

The above three commands must be issued before the last command
(UMBR CORR RMSD SUBStructure), which specifies the atoms involved in
the RMSD calculations:

UMBR CORR RMSD SUBStructure UNT1 int UNT2 int (1 to 3 atom selections) -
      [ORIEnt] [SYMMetry 4X(atom-spec) FOLD int]

UNT1 and UNT2 are the unit numbers for the output (average RMSDs
relative to reference structures 1 and 2, respectively). If no unit
numbers are specified, the results are written to standard output. Unit
numbers must be specified for either all UMBR CORR RMSD SUBS commands or
none of them (i.e., either all results are written to files or all are
written to standard output).
The three atom selections specify the following (in order):
1) the atoms whose RMSD is to be calculated,
2) the atoms relative to which the system is reoriented prior to the
   calculation of the RMSD, and
3) the atoms involved in a symmetry operation that is performed prior to
   the calculation of the RMSD.

The ORIEnt keyword invokes a reorientation of the system.
The SYMMetry keyword invokes the symmetry operation, which is a dihedral
angle rotation specified by 4 atoms. The FOLD keyword specifies the
multiplicity of the symmetry. The final RMSD is the lowest one
calculated for any of the symmetric positions. (Currently, only one symmetry
operation is allowed per RMSD calculation.) Since the positions
of the atoms in the third selection are initialized and rebuilt from
the internal coordinates of the initialized fragment and the Cartesian
coordinates of the rest of the structure, the selection should be chosen
so that the initialized fragment is not too large.
If the ORIEnt keyword is specified and only one atom selection is given,
the reorientation (as well as the RMSD calculation) is done relative
to this selection. If only one or two atom selections are given, no
symmetry operation is performed (regardless of whether reorientation
is requested).

The UMBR CORR RMSD SUBStructure command must be invoked once for each set of
RMSD substructure calculations to be done during the dynamics.
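A NumPy sketch of a substructure RMSD with optional reorientation and an n-fold symmetry search is shown below. It substitutes a rigid rotation of the symmetry atoms about the central bond of the four-atom dihedral (atoms 2 and 3) for the module's rebuild from internal coordinates, and uses a standard Kabsch superposition for ORIEnt; these choices and all function names are assumptions.

```python
import numpy as np

def kabsch(mobile, reference):
    """Rotation r and centroids mc, rc so that (x - mc) @ r.T + rc fits reference."""
    mc, rc = mobile.mean(axis=0), reference.mean(axis=0)
    h = (mobile - mc).T @ (reference - rc)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid an improper rotation (reflection)
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, mc, rc

def rotate_about_axis(x, origin, axis, angle):
    """Rodrigues rotation of points x (N, 3) about a line through origin."""
    k = axis / np.linalg.norm(axis)
    v = x - origin
    c, s = np.cos(angle), np.sin(angle)
    return origin + v * c + np.cross(k, v) * s + np.outer(v @ k, k) * (1.0 - c)

def substructure_rmsd(coords, ref, rmsd_sel, orient_sel=None,
                      sym_sel=None, axis_atoms=None, fold=1):
    """Lowest RMSD of coords[rmsd_sel] from ref[rmsd_sel] over FOLD images."""
    x = np.array(coords, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if orient_sel is not None:  # ORIEnt: superimpose on the orientation atoms
        r, mc, rc = kabsch(x[orient_sel], ref[orient_sel])
        x = (x - mc) @ r.T + rc
    n_images = fold if sym_sel is not None else 1
    best = np.inf
    for m in range(n_images):
        y = x.copy()
        if sym_sel is not None:  # SYMMetry: rotate about the central dihedral bond
            b, c = axis_atoms
            y[sym_sel] = rotate_about_axis(x[sym_sel], x[b], x[c] - x[b],
                                           2.0 * np.pi * m / fold)
        diff = y[rmsd_sel] - ref[rmsd_sel]
        best = min(best, float(np.sqrt((diff ** 2).sum(axis=1).mean())))
    return best

# Toy 2-fold case: the two symmetric atoms (rows 2 and 3) are swapped.
ref = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 0.5], [-1, 0, 0.5]], dtype=float)
coords = ref[[0, 1, 3, 2]]
plain = substructure_rmsd(coords, ref, [0, 1, 2, 3])
with_symmetry = substructure_rmsd(coords, ref, [0, 1, 2, 3], sym_sel=[2, 3],
                                  axis_atoms=(0, 1), fold=2)
```

When ORIEnt is given with a single selection, the same index list would be passed as both `rmsd_sel` and `orient_sel`.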

Example
UMBRELLA CORR RMSD SUBS UNT1 27 UNT2 28 SELE (phe residue) END -
SELE (phe backbone) END -
SELE (phe sidechain) END -
ORIE SYMM 2 CA 2 CB 2 CG 2 CD1 FOLD 2

This command specifies that the RMSD will be calculated for
the "phe residue" atoms. Reorientation will be done relative to the
"phe backbone" atoms prior to the RMSD calculation. A 2-fold symmetry
operation will be carried out on the "phe sidechain" atoms, involving
a rotation about the dihedral defined by 2 CA 2 CB 2 CG 2 CD1. The
RMSD relative to reference structure 1 will be written to unit 27 and
that relative to reference structure 2 will be written to unit 28.

Example
UMBRELLA CORR RMSD SUBS UNT1 27 UNT2 28 SELE (phe residue) END -
SELE (phe backbone) END -
ORIE

This results in the same calculation as above, but without the symmetry
operation.

The output is formatted as follows:
Average RMSDs from Ref #1 for set 35 at step 500000
35 1 -1.00000000 1.11056063
35 2 -1.00000000 1.09866706
35 3 -1.00000000 1.05065449
35 4 -1.00000000 1.09327534
35 5 -1.00000000 1.07153876
35 6 -1.00000000 -1.00000000
35 7 -1.00000000 -1.00000000

The first column gives the number of the substructure (numbered
serially from 1 with each UMBR CORR RMSD command). The second column
gives the reaction coordinate grid point. The third column gives
the average RMSD over the last trajectory, and the fourth column gives
the average RMSD over all trajectories.

NOTE that specification of any correlated variables must be followed
by an UMBRella INIT command, prior to the start of dynamics.
In addition, the specification of any correlated variables will reset
the umbrella potential, causing the previously accumulated statistics
to be discarded. This is to ensure exact correspondence between the
statistical ensembles that are sampled for the free energy surface
and the structural variables.

In adaptive umbrella sampling without structural correlations,
trajectories (sampling runs between updates) that deviate more than a
specified tolerance from the average trajectory are removed from the
statistics. This filtering feature is disabled when structural
correlations are invoked, due to the large memory requirements.

The "aging" option, which lets the user weight older trajectories less
heavily than more recent ones, is preserved for the free energy surfaces
when structural correlations are invoked, but it is not implemented for the
structural correlations themselves, again because of the large memory
requirements. Hence aging the trajectories may result in free energy surfaces
and structural correlations that are derived from different statistical
distributions.

Examples

These examples are meant as a partial guide to setting up
an input file for ADUMB. There are three test files: adumb-phichi.inp,
adumb-enum.inp and ace2.inp.

Example (1)
-----------
Set up and run an adaptive umbrella sampling simulation using two dihedral
angles as umbrella coordinates.

! define the chi1 and phi dihedral angles as the two umbrella coordinates
umbrella dihe nresol 36 trig 6 poly 1 pept 1 N pept 1 CA pept 1 CB pept 1 OG1
umbrella dihe nresol 36 trig 6 poly 1 pept 1 CY pept 1 N pept 1 CA pept 1 C

umbrella init nsim 100 update 10000 equi 1000 thresh 10 temp 300 -
ucun 10 wuni 11

! perform adaptive umbrella sampling md simulation
dynamics nose tref 300 qref 20 start -
nstep 20000 timestep 0.001 -
ihbfrq 0 inbfrq 10 ilbfrq 5 -
iseed 12 -
nprint 1000 iprfreq 1000 -
isvfrq 1000 iunwrite -1 iunread -1 -
wmin 1.2

Example(2)
----------
Set up and run an adaptive umbrella sampling simulation using the potential
energy as umbrella coordinate (=energy sampling, multicanonical simulation,
entropic sampling).

! set up umbrella; the range of relevant potential energies is assumed to
! extend from -50 kcal/mol to 100 kcal/mol.
umbrella ener nresol 200 trig 20 poly 5 mine -50 maxe 100.0 mint 280 maxt 2000

open write formatted unit 9 name @9enum.umb
open write formatted unit 10 name @9enum.uco
open write unformatted unit 11 name @9enum.cor

umbrella init nsim 100 update 10000 equi 1000 temp 1000 thres 100 -
wuni 9 ucun 10

! energy sampling simulation
dynamics langevin start -
nstep 50000 timestep 0.001 -
inbfrq 10 ilbfrq 10 rbuffer 0.0 tbath 1000 -
iseed 12 -
nprint 1000 iprfreq 1000 -
isvfrq 1000 iunwrite -1 iunread -1 -
nsavc 100 iuncrd 11 -
wmin 1.2

Example(3)
----------
Determine the weighting factors to calculate properties of the
unbiased system.

! define the umbrella coordinates
umbrella ener nresol 200 trig 20 poly 5 mine -50 maxe 100.0 mint 280 maxt 2000

open read formatted unit 10 name ../scr/@n.umb
umbrella init nsim 100 update 10000 equi 1000 runi 10 temp 1000 thres 200

! translate umbrella coordinates into probability factors at 300K
open read formatted unit 11 name ../scr/@n.uco
open write formatted unit 12 name ../scr/@nT300K.pfa

umbrella prob ucun 11 puni 12 temp 300

! translate umbrella coordinates into probability factors at 1000K
open read formatted unit 11 name ../scr/@n.uco
open write formatted unit 12 name ../scr/@nT1000K.pfa

umbrella prob ucun 11 puni 12 temp 1000

Example(4)
----------
Set up and run an adaptive umbrella sampling simulation using an interatomic
distance as a reaction coordinate.