# dcor (c48b2)

Distance Correlation (DCOR)

The distance correlation coefficient is an alternative to the standard measures of correlation, based on theory developed by Székely, Rizzo and Bakirov [1]. The main result is that a single statistic, DCOR(X,Y), can be used to assess whether two random vectors X and Y, possibly of different dimensions, are dependent, linearly or non-linearly, given an independent and identically distributed (i.i.d.) sample.

* Commands | Invoking DCOR

* Background | An introduction to distance correlation

* References | Relevant citations

* Example | Outline of an example input script


DCOR Commands

DCOR can be invoked from CORREL using the keyword DCOR to calculate the dependence between two time series, which may have different dimensions. The time series can contain any variable. The only requirement is that both time series have the same length.

setup trajectory file

correl maxt ... maxa ... maxs ...

setup first time series

setup second time series

traj trajectory specifications

dcor time-series1 time-series2

end

DCOR can also be invoked from CORMAN as part of "COORdinates COVAriance", using the keyword DCOR (or DCOV), to calculate the distance correlation (or distance covariance) between the positional fluctuations of two selections of atoms.

COORdinates COVAriance traj-spec 2x(atom_selection) [UNIT_for_output int] -

[RESIdue_average_nsets integer] [MATRix] -

[ENTRopy [TEMP <real>] [DIAG] [RESI] [SCHL] ]-

[DCOR] [DCOV]

If DCOR or DCOV is requested together with ENTRopy, the ENTRopy keyword is ignored. If both DCOR and DCOV are requested, DCOR is ignored.

Parallel:

With CORREL, distance correlation calculations can be parallelized by submitting multiple serial jobs. With COORdinates COVAriance, distance correlations between the positional fluctuations of multiple atoms can be calculated using the parallel version of CHARMM.

Benchmark for 352x352 atoms, 500 steps of dynamics, on a 32-core AMD box (DCOR, CHARMM c40):

CPUs   time (sec)   speedup   efficiency
   1     1443.0       1.00       100%
   2      750.0       1.92        96%
   4      366.6       3.94        98%
   8      189.6       7.61        95%
  16       93.9      15.37        96%

For better parallelization of DCOR with COORdinates COVAriance, put the larger set of atoms in the second selection. The time required for a DCOR calculation between two time series grows as N*N, where N is the length of the time series.
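
The speedup and efficiency columns of the benchmark follow from the timings: speedup is the one-CPU time divided by the n-CPU time, and efficiency is the speedup divided by n. The short Python sketch below recomputes them and also expresses the N*N scaling as a cost ratio. The function names and the series lengths in the scaling example are hypothetical and chosen only for illustration.

```python
# Wall-clock times (sec) from the benchmark, keyed by CPU count.
TIMINGS = {1: 1443.0, 2: 750.0, 4: 366.6, 8: 189.6, 16: 93.9}


def scaling_rows(timings):
    """Return (cpus, speedup, efficiency in percent) for each CPU count."""
    serial = timings[1]
    rows = []
    for cpus in sorted(timings):
        speedup = serial / timings[cpus]
        rows.append((cpus, round(speedup, 2), round(100.0 * speedup / cpus)))
    return rows


def relative_cost(n_new, n_ref):
    """Cost ratio of two series lengths under the stated N*N scaling."""
    return (n_new / n_ref) ** 2


for row in scaling_rows(TIMINGS):
    print(row)
print(relative_cost(10000, 5000))  # doubling N quadruples the cost: 4.0
```

Under this scaling, halving the number of frames read from the trajectory would be expected to cut the DCOR cost to roughly a quarter.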


Introduction to Distance Correlation

Among Pearson's correlation coefficient (PCC), a generalized correlation coefficient (GCC) and the distance correlation coefficient (DCOR), DCOR is the most appropriate parameter for finding association in atomic motions because it is the least sensitive to angular dependence while still reflecting variability in covariance [2]. Calculating DCOR between two vector series is straightforward. Let {A} and {B} be two vector series with m entries each, with the ith entry of {A} denoted A_i. To calculate the distance covariance between {A} and {B}, the following five steps are applied to {A}.

1) Calculate the m x m matrix a from {A}, where a_ij is the Euclidean distance between the ith and jth entries of {A}: a_ij = a_ji = |A_i - A_j|

2) Average the rows of a: a_i. = 1/m sum_j(a_ij)

3) Average the columns of a: a_.j = 1/m sum_i(a_ij)

4) Average all elements of a: a_.. = 1/(m*m) sum_ij(a_ij)

5) Build the m x m matrix alpha from a, where alpha_ij = a_ij - a_i. - a_.j + a_..

Then the distance covariance is

DCOV(A, B) ≡ sqrt(1/(m*m) sum_ij(alpha_ij * beta_ij))

where beta_ij is defined similarly from B.

The distance correlation coefficient, DCOR, is defined as

DCOR(A, B) = DCOV(A, B) / sqrt(DCOV(A, A) * DCOV(B, B))

DCOR was found to capture both linear and non-linear correlation between positional vectors, and it revealed long-distance concerted motions in a protein that were not revealed by PCC or GCC [2].
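
The five steps and the two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the CHARMM implementation: the function names (centered_distances, dcov, dcor) are hypothetical, each series is assumed to be a list of equal-length coordinate tuples, and returning 0 for a constant series is a choice made for this sketch.

```python
import math


def centered_distances(series):
    """Steps 1-5: pairwise distances followed by double centering."""
    m = len(series)
    # Step 1: a_ij = |A_i - A_j| (Euclidean distance between entries)
    a = [[math.dist(p, q) for q in series] for p in series]
    # Step 2: row averages a_i.
    row_avg = [sum(row) / m for row in a]
    # Step 3: column averages a_.j
    col_avg = [sum(a[i][j] for i in range(m)) / m for j in range(m)]
    # Step 4: average over all elements a_..
    all_avg = sum(row_avg) / m
    # Step 5: alpha_ij = a_ij - a_i. - a_.j + a_..
    return [[a[i][j] - row_avg[i] - col_avg[j] + all_avg
             for j in range(m)] for i in range(m)]


def dcov(series_a, series_b):
    """Distance covariance as defined above (square root included)."""
    if len(series_a) != len(series_b):
        raise ValueError("both series must have the same length")
    m = len(series_a)
    alpha = centered_distances(series_a)
    beta = centered_distances(series_b)
    total = sum(alpha[i][j] * beta[i][j] for i in range(m) for j in range(m))
    # Clamp tiny negative round-off before taking the square root.
    return math.sqrt(max(total / (m * m), 0.0))


def dcor(series_a, series_b):
    """Distance correlation: DCOV(A,B) / sqrt(DCOV(A,A) * DCOV(B,B))."""
    denom = math.sqrt(dcov(series_a, series_a) * dcov(series_b, series_b))
    # Assumption of this sketch: report 0 when either series is constant.
    return dcov(series_a, series_b) / denom if denom > 0.0 else 0.0


# Illustrative data: a 1-D series and a purely quadratic function of it.
x = [(-1.0 + 0.1 * k,) for k in range(21)]
y = [(p[0] ** 2,) for p in x]
print(dcor(x, x))  # identical series: 1 up to round-off
print(dcor(x, y))  # non-linear dependence: clearly above 0
```

For this symmetric grid the Pearson coefficient between x and x*x vanishes, while the distance correlation stays well above zero, which is the kind of non-linear dependence the text refers to. The two series may also have different dimensions, for example 3-D positions against a scalar, as long as they have the same number of entries.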


References

[1] G. J. Székely, M. L. Rizzo and N. K. Bakirov, Measuring and testing dependence by correlation of distances, The Annals of Statistics 35 (6), 2769-2794 (2007).

[2] Amitava Roy and Carol Beth Post, Detection of Long-Range Concerted Motions in Protein by a Distance Covariance, J. Chem. Theory Comput. 8 (9), 3009-3014 (2012).


Input File

1) CORREL input for DCOR between the positional fluctuations of two CA atoms:

open read file unit 31 name traj1_file

open read file unit 32 name traj2_file

correl maxt 5000 maxa 4 maxs 6

enter a atom XYZ sele ires 1 .and. type CA end

enter b atom XYZ sele ires 20 .and. type CA end

traj firstu 31 nunit 2

dcor a b

end

Output:

CORREL> dcor a b

DCOR> VAR1 = 19.868 VAR2 = 18.500 COVAR = 14.777 CORR = 0.771

CORREL> end

VAR1, VAR2 - distance variances

COVAR - distance covariance between time series "a" and "b"

CORR - distance correlation between time series "a" and "b"
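
As a consistency check, the printed CORR can be recovered from the other three numbers, assuming VAR1 and VAR2 correspond to DCOV(a,a) and DCOV(b,b) in the definition of DCOR. A small Python sketch using the values from the output above (the variable names are illustrative):

```python
import math

# Values copied from the DCOR> output line.
var1, var2, covar = 19.868, 18.500, 14.777

# DCOR = DCOV(a,b) / sqrt(DCOV(a,a) * DCOV(b,b))
corr = covar / math.sqrt(var1 * var2)
print(f"{corr:.3f}")  # 0.771
```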

2) COORdinates COVAriance input for DCOR between the positional fluctuations of all CA atoms:

open read file unit 31 name traj1_file

open read file unit 32 name traj2_file

open write card unit 41 name matrix_file

coor cova firstu 31 nunit 2 sele type CA end sele type CA end unit 41 dcor

Output options are identical to the COOR COVA output options.
