cfti (c38b2)

CFTI: conformational energy/free energy calculations

* Constraints | Note on constrained optimization implementation

* CFTINT | Description and syntax of standard conformational
free energy thermodynamic integration

* CFTIM | Description and syntax of multidimensional onformational
free energy thermodynamic integration

Top
Constraints:

Energy minimization with holonomic constraints has been implemented.
There are no special commands for this option.
Charlie Brook's TSM module allows for MD simulations with constrained
values of selected conformational coordinates - distances, atoms,
dihedrals.
This has been expanded to also allow energy minimization using several
algorithms. The method is an alternative to using harmonic restraints
in generating structures of flexible molecules with desired properties,
or generating adiabatic profiles.

To use this option, simply enter the 'TSM' module and give set
of 'FIX' commands to define set of fixed internal coordinates
(» perturb for details). Next specify an energy minimization
(» minmiz ).

Algorithms that work: SD, CONJ, POWE
(ABNR works also, for reasons unclear to me, KK)

Top
CFTI: standard (one-dimensional) conformational thermodynamic integration

Description of method

Method expands the capabilities of the TSM module.
The TSM module employs the Thermodynamic Perturbation (TP) approach
to conformational free energy simulations. The basis of the
calculation is a MD simulation with a constrained value of a
conformational coordinate. With minimal
modifications, the alternative Thermodynamic Integration (TI) method
is added on. In the modified code the user has the option of using
TP only (as previously) or activating TI, in which case the same
simulation and data files are used to give both TP and TI results.

[SYNTAX CFTI]

All commands are parsed by the TSM command parser, so should be
within a 'TSM ... END' block.

CFTI
command activates the thermodynamic integration calculation
the context of use should be the same as for a thermodynamic
perturbation run, i.e. some coordinates should be fixed by
'FIX ,,,', saving data to a disk file should be specified by
'SAVI ...', and one perturbation should be defined by 'MOVE ...'
The derivative dA/dx is calculated for the coordinate defined
in the 'MOVE ...' statement. This coordinate has to be also
fixed with 'FIX ...'; other coordinates may also be fixed if
desired. The formula for dA/dx involves only averaging over
the corresponding derivative of the potential energy U, omitting
the 'Jacobian term' realted to changes in phase space volume:
dA/dx = <dU/dx>
dU/dx = Sum(j=1,3N) (dU/dy_j)(dy_j/dx)
y_j, j=1,...,3N - atomic Cartesian coordinates
Notes:
1) The formatted data file generated by 'SAVI ...' may be read
by both TI postprocessing command (CFTJ) and TP postprocessing
(POST). The SAVI 'NWIN' keyword has meaning only for TP, it can
be set to an arbitrary value if TI only is to be used.
2) For consistency with TP, the 'BY <real>' part of the 'MOVE'
command was retained. The <real> value has meaning for TP only,
it can be set to an arbitrary number if TI only is to be used.

CFTJ [TEMP <real>] [UICP <int>] [CONT <int>]
Command to calculate the conformational free energy derivative
dA/dx = <dU/dx>
as well as the energy-entropy components: d<U>/dx, -TdS/dx
Data is read in from the formatted file generated by the
'SAVI ...' command
TEMP - specifies temperature, needed for energy-entropy components
UICP - specifies unit with data
CONT - defines length of data block for error analysis
e.g. if data file has 1000 entries, 'CONT 100' will
divide data into 10 blocks and calculate the standard
deviation of the mean of the block averages

CFTA [FIRSt <int>] [NUNIt <int>] [BEGIn <int>] [STOP <int>] [SKIP <int>]
[CONT <int>] [TEMP <real>]
Command activates analysis of CFTI-generated trajectory.
Trajectory coordinate file(s) should be in consecutive units
FIRST, NUNI, BEGIN, STOP, SKIP - define trajectory reading
CONT, TEMP - as in CFTJ

Examples of usage : see test cases: cftidist.inp, cftiangl.inp, cftidihe.inp
These test cases also compare the TI to TP results, showing the small size
of the 'Jacobian term'.

Top

CFTM: multidimensional conformational thermodynamic integration

Description of method

This is a new approach. MD simulations are performed with several
conformational coordinates simultaneously constrained to fixed values.
The partial derivatives of the conformational free energy with
respect to all the coordinates in the fixed set are calculated
from this one simulation. The free energy gradient may be used
in different ways to explore conformational free energy surfaces
of flexible molecules.

Method expands the capabilities of the TSM module.
Only TI calculations possible, no corresponding TP analysis
possible.

[SYNTAX CFTM]

All commands are parsed by the TSM command parser, so should be
within a 'TSM ... END' block.

CFTM
command activates the multidimensional TI method
the context of use should be the same as for a thermodynamic
perturbation run, i.e. several coordinates should be fixed by
'FIX ,,,', saving data to a disk file should be specified by
'SAVI ...'. and a perturbation should be defined by a 'MOVE ...'
statement for each of the fixed coordinates.
Only the average of the derivatives of the potential energy U
are calculated, the 'Jacobian term' is ignored - see notes below
and test cases.
dA/dx_k = <dU/dx_k> x_k, k=1,...,m - fixed coordinates
dU/dx_k = Sum(j=1,3N) (dU/dy_j)(dy_j/dx_k)
y_j, j=1,...,3N - atomic Cartesian coordinates
Notes:
1) The formated data file defined by 'SAVI ...' has a different
format under CFTM than under CFTI. This file is only useful
for CFTM post-processing.
2) For consistency with TP, the 'BY <real>' part of the 'MOVE'
command was retained. The <real> value has no meaning in CFTM.
'INTE' keyword has to be specified within the 'MOVE' command.

CFTC [TEMP <real>] [UICP <int>] [CONT <int>]
Command to calculate the conformational free energy derivatives
dA/dx_i = <dU/dx_i>
as well as the energy-entropy components: d<U>/dx_i, -TdS/dx_i
Data is read in from the formatted file generated by the
'SAVI ...' command
TEMP - specifies temperature, needed for energy-entropy components
UICP - specifies unit with data
CONT - defines length of data block for error analysis
e.g. if data file has 1000 entries, 'CONT 100' will
divide data into 10 blocks and calculate the standard
deviation of the mean of the block averages
Output includes all individual partial derivatives, and
optionally their analysis into groups. The derivative with
respect to a path direction is also calculated.

CFTB [FIRSt <int>] [NUNIt <int>] [BEGIn <int>] [STOP <int>] [SKIP <int>]
[CONT <int>] [TEMP <real>]
Command activates analysis of CFTM-generated trajectory.
Trajectory coordinate file(s) should be in consecutive units
FIRST, NUNI, BEGIN, STOP, SKIP - define trajectory reading
CONT, TEMP - as in CFTJ
Output is the free energy gradient with respect to the set
of fixed coordinates, the derivative along a specified direction
(see DIRE) and optionally a group contribution analysis.

CFTS [FIRSt <int>] [NUNIt <int>] [BEGIn <int>] [STOP <int>] [SKIP <int>]
[CONT <int>] [TEMP <real>] [DUNI <int>]
Analogous to CFTB, additionally writes out potential energy and
dU/dx_i to a disk file specified by DUNI.

NCOR NUMB <int>
NUMB specifies the number of internal coordinates involved
(=NICP). Used in calculating the path derivative.

DIRE LAMB <int>
<real, real, ... , real>
The LAMB value specifies number of step (progress along reaction
path). The following line(s) contain NICP real numbers defining
a path vector. The vector will be normalized automatically.
The unit vector will be used to calculate derivatives of dA/dl,
d<U>/dl, -TdS/dl along the path from the gradients.
The real numbers correspond to weights of the fixed coordinates.
Note: the vector components are read in free format

CFTG NGRUp <int>
<int, int, ..., int>
<string,string,...,string>
Define groups for group contribution analysis to free energy
NGRUP is the number of groups.
The following line(s) contain the integer group numbers of the
coordinates (LGRUP(J),J=1,NICP) in free format
After that follow line(s) with group symbols (i.e. tags that
will be used to denote the groups) in (20A4) format
(GSYM(J),J=1,NGRUP)
Example of usage:
The system is a decapeptide, we calculate derivatives with
respect to all phi and psi backbone dihedrals (NICP=18).
In the 18 'MOVE ...' commands we specify the 9 phi first
and the 9 psi at the end. The following will calculate and
print out an aggregate of all phi and all psi contributions
labelled by the tags 'PHI' and 'PSI':

cftg ngrup 2
1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2
PHI PSI
cfts

Examples of usage: see test cases cftmala10.inp, cftmtst1.inp

Checks that 'Jacobian term' is small: cftmtst2.inp, cftmtst3.inp,
cftmtst4.inp, cftmtst5.inp

>NOTE on sign of derivatives:

In both CFTI and CFTM it is possible to obtain a derivative value
with incorrect sign by cleverly manipulating the atom selections in the
'MOVE ...' command. A simple way of checking the sign is to run
a 1-D test case using both TI and TP postprocessing (see test cases
cftidist.inp, cftiangl.inp, cftidihe.inp).
A general rule is to think about how the coordinate
is defined and how motions of fragments influence it.

E.g. for a distance between atoms A and B, the coordinate is the
length of the vector from A to B. Perturbations (TP) involve
actual displacements of A and B along the =vector= from A to B;
Derivative calculations (TI) do not involve actual motions of atoms,
but rather predictions of how atomic positions will vary with
infinitesimal coordinate changes.
Moving B along this coordinate by delta > 0
will increase the coordinate, while moving A by delta will decrease
the coordinate. Alternatively, we can distort the bond by delta by moving
A by -delta/2 and B by + delta/2.
To get correct sign of derivative you have to specify B as the moving part
or specify both B and A, but maintaining that order (B first, A next).
This is illustrated schematically below:

Correct scheme 1:
FIX DIST <spec atom A> <spec atom B>
MOVE DIST <spec atom A> <spec atom B> BY 1.0 INTE -
sele <atom B> end

Correct scheme 2:
FIX DIST <spec atom A> <spec atom B>
MOVE DIST <spec atom A> <spec atom B> BY 1.0 INTE -
sele <atom B> end sele <atom A> end

Both give the same result (I tested this, KK).
See test case cftidist.inp.

The same holds true for a dihedral defined by atoms I-J-K-L.
Mentally divide the molecule into two parts by cutting through
the J-K bond. Atoms before the cut (I, J and all atoms bound to them
except K) for the first part, the rest of the atoms form the second
part. To distort the dihedral, we can either rotate second half by
delta around J-K axis, or rotate first half by -delta/2 and second
half by +delta/2. To get correct derivative either define the second
part as moving or define both parts but in correct order: (second,
first). Here is an example for the alanine dipeptide. The following
defines the atoms (in toph19, see cftmtst1.inp):

1 1 ACE CH3 3.06258 0.64613 1.42088 ALA 1 0.00000
2 1 ACE C 2.33541 -0.68685 1.35313 ALA 1 0.00000
3 1 ACE O 2.01413 -1.29380 2.37725 ALA 1 0.00000
4 2 ALA N 2.07725 -1.18175 0.14371 ALA 2 0.00000
5 2 ALA H 2.45870 -0.76210 -0.65152 ALA 2 0.00000
6 2 ALA CA 1.35635 -2.43045 -0.00242 ALA 2 0.00000
7 2 ALA CB 0.69707 -2.49721 -1.37506 ALA 2 0.00000
8 2 ALA C 2.38192 -3.54475 0.11749 ALA 2 0.00000
9 2 ALA O 3.17467 -3.80389 -0.78914 ALA 2 0.00000
10 3 CBX N 2.41984 -4.12094 1.31507 ALA 3 0.00000
11 3 CBX H 1.92248 -3.68658 2.04150 ALA 3 0.00000
12 3 CBX CA 3.28397 -5.30373 1.59827 ALA 3 0.00000

The following is a correct set-up for a phi-psi gradient calculation
using the single-selection variant:

tsm
fix dihe ala 1 c ala 2 n ala 2 ca ala 2 c toli 1.0e-5
fix dihe ala 2 n ala 2 ca ala 2 c ala 3 n toli 1.0e-5
maxi 100
cftm
move dihe ala 1 c ala 2 n ala 2 ca ala 2 c by 1.0 -
inte sele bynum 6:12 end
move dihe ala 2 n ala 2 ca ala 2 c ala 3 n by 1.0 -
inte sele bynum 8:12 end
end

And here is the correct alternative double-selection variant:

tsm
fix dihe ala 1 c ala 2 n ala 2 ca ala 2 c toli 1.0e-5
fix dihe ala 2 n ala 2 ca ala 2 c ala 3 n toli 1.0e-5
maxi 100
cftm
move dihe ala 1 c ala 2 n ala 2 ca ala 2 c by 1.0 -
inte sele bynum 6:12 end sele bynum 1:5 end
move dihe ala 2 n ala 2 ca ala 2 c ala 3 n by 1.0 -
inte sele bynum 8:12 end sele bynum 1:7 end
end

See test cases cftidihe.inp, cftmala10.inp.

>NOTE on integrators:

With CHARMM c30a2x I have tested LEAP, NOSE and NOSE VVER aproaches,
which worked fine.

The LANGevin integrator LED TO INCORRECT FORCES stored in the formatted
data file (defined with 'SAVI ...'). Thus, post-processing using
the 'CFTA' or 'CFTB' approaches worked fine, as this method re-reads
the trajectory and re-calculates derivatives. The 'CFTJ' and/or 'CFTC' gave
incorrect results! This is probably related to incorrect placement of
the 'CALL DYNICT' and 'CALL DYNICM' commands within the dynamics files
so that the energy gradient DX,DY,DZ does not agree with the coordinates
X,Y,Z. I will look into this at a later date. KK