# gnn (c37b2)

Genetic Neural Network

A genetic neural network (GNN) method is provided for efficient determination

of quantitative structure property relationships. See the references given

below for a description of the GNN and its application. Some details specific

to the CHARMM implementation follow.

The GNN keyword must be included in pref.dat for the code to be compiled.

The input and output vectors of the data set are internally scaled to take

values between 0.1 and 0.9. The format of the data file is described in the

examples section.

Steepest descent back-propagation neural network is used to evaluate model

predictive quality. Jackknife cross-validation residual rms errors are

reported if no test data are specified. Only one hidden layer is employed.

Exhaustive enumeration and two genetic algorithm variants, genetic function

approximation (GFA) and evolutionary programming (EP), are available for

selecting models (sets of descriptors). The stochastic reminder method and

elitism are included for GFA reproduction.

* Syntax | Syntax required to invoke GNN

* Description | Description of GNN specific keywords

* Examples | Examples

* References | References

A genetic neural network (GNN) method is provided for efficient determination

of quantitative structure property relationships. See the references given

below for a description of the GNN and its application. Some details specific

to the CHARMM implementation follow.

The GNN keyword must be included in pref.dat for the code to be compiled.

The input and output vectors of the data set are internally scaled to take

values between 0.1 and 0.9. The format of the data file is described in the

examples section.

Steepest descent back-propagation neural network is used to evaluate model

predictive quality. Jackknife cross-validation residual rms errors are

reported if no test data are specified. Only one hidden layer is employed.

Exhaustive enumeration and two genetic algorithm variants, genetic function

approximation (GFA) and evolutionary programming (EP), are available for

selecting models (sets of descriptors). The stochastic reminder method and

elitism are included for GFA reproduction.

* Syntax | Syntax required to invoke GNN

* Description | Description of GNN specific keywords

* Examples | Examples

* References | References

Top

Syntax required to invoke GNN

GNN [ data-spec ] [ nn-spec ] [ ga-spec ]

data-spec ::= [ NDATa 1 ] [ NPROd 0 ] [ NPARa 1 ] [ UNIT -1 ] [ SEED 123 ]

nn-spec ::= [ NDES 1 ] [ NHIDden 2 ] [ NTARg 1 ] [ NSWEep 100 ] [ MU 0.5 ] [ ETA 0.5 ]

ga-spec ::= [ EXHAust ] [ GFA ] [ EP ] [ NPOPu 500 ] [ NGEN 200 ] [ FITNess 5.0 ]

Syntax required to invoke GNN

GNN [ data-spec ] [ nn-spec ] [ ga-spec ]

data-spec ::= [ NDATa 1 ] [ NPROd 0 ] [ NPARa 1 ] [ UNIT -1 ] [ SEED 123 ]

nn-spec ::= [ NDES 1 ] [ NHIDden 2 ] [ NTARg 1 ] [ NSWEep 100 ] [ MU 0.5 ] [ ETA 0.5 ]

ga-spec ::= [ EXHAust ] [ GFA ] [ EP ] [ NPOPu 500 ] [ NGEN 200 ] [ FITNess 5.0 ]

Top

Description of GNN specific keywords

NDATa Number of data points in the training set.

NPROd Number of data points in the test set.

NPARa Number of candidate descriptors.

UNIT Unit number from which data are imported.

SEED Seed for random number generator.

NDES Number of descriptors for the neural network.

NHIDden Number of nodes in the hidden layer.

NTARg Number of target parameters to predict.

NSWEep Number of sweeps training.

MU Momentum constant.

ETA Learning rate.

EXHAust Exhuastive enumeration.

GFA Genetic function approximation.

EP Evolutionary programming.

NPOPu Number of individual models in reproduction pool.

NGEN Number of generations to reproduce.

FITNess Average fitness of models in the reproduction pool before terminating

genetic algorithms. Fitness is defined as the reciprocal of the

residual rms error.

Description of GNN specific keywords

NDATa Number of data points in the training set.

NPROd Number of data points in the test set.

NPARa Number of candidate descriptors.

UNIT Unit number from which data are imported.

SEED Seed for random number generator.

NDES Number of descriptors for the neural network.

NHIDden Number of nodes in the hidden layer.

NTARg Number of target parameters to predict.

NSWEep Number of sweeps training.

MU Momentum constant.

ETA Learning rate.

EXHAust Exhuastive enumeration.

GFA Genetic function approximation.

EP Evolutionary programming.

NPOPu Number of individual models in reproduction pool.

NGEN Number of generations to reproduce.

FITNess Average fitness of models in the reproduction pool before terminating

genetic algorithms. Fitness is defined as the reciprocal of the

residual rms error.

Top

Examples

[ Data File (represented here symbolically) ]

P1(1) P2(1) P3(1) P4(1) P5(1)

P1(2) P2(2) P3(2) P4(2) P5(2)

P1(3) P2(3) P3(3) P4(3) P5(3)

P1(4) P2(4) P3(4) P4(4) P5(4)

P1(5) P2(5) P3(5) P4(5) P5(5)

Note: Descriptors P1, P2 and P3, target parameters P4 and P5, training set

with data points (1), (2), and (3), and test set with data points (4) and (5).

[ Input ]

For example, file gnn.dat has 53 lines and 6 columns.

open read card unit 18 name gnn.dat

1. Exhaustive enumeration + Cross-validation + 1-Descriptor network

gnn ndata 53 nprod 0 npara 5 unit 18 seed 123 -

ndes 1 nhidden 2 ntarg 1 nsweep 100 mu 0.5 eta 0.5 -

exhaust

2. GFA + Test set residual rms error evaluated + 2-Descriptor network

gnn ndata 30 nprod 23 npara 4 unit 18 seed 123 -

ndes 2 nhidden 3 ntarg 2 nsweep 100 mu 0.5 eta 0.5 -

gfa npopu 2 ngen 10 fitness 5.0

Note: ndata + nprod = 53 (number of lines),

npara + ntarg = 6 (number of columns).

Examples

[ Data File (represented here symbolically) ]

P1(1) P2(1) P3(1) P4(1) P5(1)

P1(2) P2(2) P3(2) P4(2) P5(2)

P1(3) P2(3) P3(3) P4(3) P5(3)

P1(4) P2(4) P3(4) P4(4) P5(4)

P1(5) P2(5) P3(5) P4(5) P5(5)

Note: Descriptors P1, P2 and P3, target parameters P4 and P5, training set

with data points (1), (2), and (3), and test set with data points (4) and (5).

[ Input ]

For example, file gnn.dat has 53 lines and 6 columns.

open read card unit 18 name gnn.dat

1. Exhaustive enumeration + Cross-validation + 1-Descriptor network

gnn ndata 53 nprod 0 npara 5 unit 18 seed 123 -

ndes 1 nhidden 2 ntarg 1 nsweep 100 mu 0.5 eta 0.5 -

exhaust

2. GFA + Test set residual rms error evaluated + 2-Descriptor network

gnn ndata 30 nprod 23 npara 4 unit 18 seed 123 -

ndes 2 nhidden 3 ntarg 2 nsweep 100 mu 0.5 eta 0.5 -

gfa npopu 2 ngen 10 fitness 5.0

Note: ndata + nprod = 53 (number of lines),

npara + ntarg = 6 (number of columns).

Top

References

The GNN method was originally introduced in:

Sung-Sau So and Martin Karplus, Evolutionary optimization in quantitative

structure-activity relationship: An application of genetic neural networks,

J. Med. Chem., 39:1521-1530 (1996).

Sung-Sau So and Martin Karplus, Genetic Neural Networks for Quantitative

Structure-Activity Relationships: Improvements and application of

benzodiazepine affinity for benzodiazepine/GABA_A receptors, J. Med. Chem.,

39:5246-5256 (1996).

Jie Hu and Aaron Dinner implemented the version in CHARMM. It differs from

the HIPPO program of So and Karplus primarily in that steepest descents rather

than scaled conjugate gradients optimization is used to train the neural

networks. Performance is comparable for the data in:

Jie Hu, Ao Ma, and Aaron R. Dinner, A two-step nucleotide flipping mechanism

enable kinetic discrimination of DNA lesions by AGT, Proc. Natl. Acad. Sci.

USA, in press (2008).

In addition to the above studies, papers using the CHARMM GNN method should

cite the introduction of the use of the GNN (and, more generally, informatic

methods) for determination of reaction coordinates:

Ao Ma and Aaron R. Dinner, Automatic method for identifying reaction

coordinates in complex systems, J. Phys. Chem. B, 109:6769-6779 (2005).

For a review of the GNN in other contexts, see:

Aaron R. Dinner, Sung-Sau So, and Martin Karplus, Statistical analysis of

protein folding kinetics, Adv. Chem. Phys., 120:1-34 (2002).

For a more detailed discussion of the neural network component used in the

Jure Zupan and Johann Gasteiger, Neural Networks for Chemists: An

Introduction, VCH, New York (1993).

References

The GNN method was originally introduced in:

Sung-Sau So and Martin Karplus, Evolutionary optimization in quantitative

structure-activity relationship: An application of genetic neural networks,

J. Med. Chem., 39:1521-1530 (1996).

Sung-Sau So and Martin Karplus, Genetic Neural Networks for Quantitative

Structure-Activity Relationships: Improvements and application of

benzodiazepine affinity for benzodiazepine/GABA_A receptors, J. Med. Chem.,

39:5246-5256 (1996).

Jie Hu and Aaron Dinner implemented the version in CHARMM. It differs from

the HIPPO program of So and Karplus primarily in that steepest descents rather

than scaled conjugate gradients optimization is used to train the neural

networks. Performance is comparable for the data in:

Jie Hu, Ao Ma, and Aaron R. Dinner, A two-step nucleotide flipping mechanism

enable kinetic discrimination of DNA lesions by AGT, Proc. Natl. Acad. Sci.

USA, in press (2008).

In addition to the above studies, papers using the CHARMM GNN method should

cite the introduction of the use of the GNN (and, more generally, informatic

methods) for determination of reaction coordinates:

Ao Ma and Aaron R. Dinner, Automatic method for identifying reaction

coordinates in complex systems, J. Phys. Chem. B, 109:6769-6779 (2005).

For a review of the GNN in other contexts, see:

Aaron R. Dinner, Sung-Sau So, and Martin Karplus, Statistical analysis of

protein folding kinetics, Adv. Chem. Phys., 120:1-34 (2002).

For a more detailed discussion of the neural network component used in the

Jure Zupan and Johann Gasteiger, Neural Networks for Chemists: An

Introduction, VCH, New York (1993).