ISDB: setting up a Metadynamics Metainference simulation

Aims

The aim of this tutorial is to introduce the users to the ISDB module and in particular to Metadynamics Metainference [18] [19] ensemble determination. We will reproduce the setup of the simulation for a simple system [69] . For a general overview of the problem of ensembles determination please read [20] .

Objectives

Once this tutorial is completed students will be able to:

  • Setup their own PLUMED-ISDB simulation.

Resources

The TARBALL for this project contains the following files:

  • charmm36-eef1sb.ff: the force-field files for gromacs (not needed)
  • system: a folder with reference files for gromacs (not needed)
  • reference-impl: a folder to perform a simple implicit solvent simulation
  • reference-impl-pbmetad: a folder to perform a pbmetad implicit solvent simulation
  • m_and_m: a folder to perform a metadynamics metainference simulation

This tutorial has been tested on a pre-release version of version 2.4.

Introduction

Molecular dynamics simulations are the ideal tool to determine at atomistic resolution the behavior of complex molecules. This great resolution power comes at the cost of approximations that affects the agreement with actual experimental observables. At the same time experimental data alone are generally speaking not enough to determine a structural ensemble due the inverse nature of the problem, that is to go from few observables to many atoms in many different configurations. Furthermore, experimental data are affected by errors of multiple nature, from noise, systematic errors and errors in their atomistic interpretation. Most important experimental data are the result of the averaging over the ensemble of structure so it is not trivial to deconvolve this signal. One possibility is that of employing MD simulations together with experimental data to generate simulations already corrected for the data themselves. With METAINFERENCE this is done on-the-fly by adding an additional energy to the system that takes into account the agreement with the experimental data considering the multiple sources of errors.

Run a reference simulation

The system we use is the EGAAWAASS peptide used in ref. [69] . First of all we will run a simulation in implicit solvent using the EEF1-SB CHARMM36 force field. EEF1-SB includes a correction to the standard backbone torsion potential of CHARMM36, an electrostatic interaction with a distance dependent dielectric constant and a simple gaussian form for the solvation energy. The first two terms are implemented in the force field and using table potentials while the latter is implemented as a collective variable in PLUMED, EEFSOLV .

# this is optional and tell to VIM that this is a PLUMED file
# vim: ft=plumed
# see comments just below this input file
MOLINFO MOLTYPE=protein STRUCTURE=egaawaass.pdb
WHOLEMOLECULES ENTITY0=1-111

# EEF1SB Implicit solvation
protein-h: GROUP NDX_FILE=index.ndx NDX_GROUP=Protein-H
solv: EEFSOLV ATOMS=protein-h NOPBC NL_STRIDE=10 NL_BUFFER=0.1
bias: BIASVALUE ARG=solv

This can be run using gromacs (unfortunately recent versions of gromacs do not support Verlet groups with table potentials, so performances are currently sub-optimal on the gromacs side)

gmx_mpi mdrun -s run.tpr -table table.xvg -tablep table.xvg -plumed plumed-eef1.dat -v

In order to have a converged sampling for this reference ensemble calculation it is useful to setup a Metadynamics calculation. In particular we will use PBMETAD because it is then a natural choice for Metadynamics Metainference later. The following input file is meant to be appended to the former.

# CVs, Psi9, Phi1 are not defined
psi1: TORSION ATOMS=@psi-1 NOPBC
psi2: TORSION ATOMS=@psi-2 NOPBC
psi3: TORSION ATOMS=@psi-3 NOPBC
psi4: TORSION ATOMS=@psi-4 NOPBC
psi5: TORSION ATOMS=@psi-5 NOPBC
psi6: TORSION ATOMS=@psi-6 NOPBC
psi7: TORSION ATOMS=@psi-7 NOPBC
psi8: TORSION ATOMS=@psi-8 NOPBC

phi2: TORSION ATOMS=@phi-2 NOPBC
phi3: TORSION ATOMS=@phi-3 NOPBC
phi4: TORSION ATOMS=@phi-4 NOPBC
phi5: TORSION ATOMS=@phi-5 NOPBC
phi6: TORSION ATOMS=@phi-6 NOPBC
phi7: TORSION ATOMS=@phi-7 NOPBC
phi8: TORSION ATOMS=@phi-8 NOPBC
phi9: TORSION ATOMS=@phi-9 NOPBC

ahc:  ALPHARMSD RESIDUES=all TYPE=OPTIMAL LESS_THAN={RATIONAL R_0=0.12}

# Bulky Trp residue dihedral
dihtrp_cacb: TORSION ATOMS=67,47,49,52 NOPBC
dihtrp_cbcg: TORSION ATOMS=47,49,52,53 NOPBC

protein-ca: GROUP NDX_FILE=index.ndx NDX_GROUP=C-alpha
gyr: GYRATION TYPE=RADIUS ATOMS=protein-ca NOPBC

# PBMetaD
PBMETAD ...
    LABEL=pb
    ARG=phi2,phi3,phi4,phi5,phi6,phi7,phi8,phi9,psi1,psi2,psi3,psi4,psi5,psi6,psi7,psi8,dihtrp_cacb,dihtrp_cbcg,ahc.lessthan
    SIGMA=1000 
    SIGMA_MIN=0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.001
    SIGMA_MAX=0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.2
    ADAPTIVE=DIFF
    HEIGHT=0.5
    PACE=200
    BIASFACTOR=40
    GRID_MIN=-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,0
    GRID_MAX=pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,5
    GRID_WSTRIDE=5000
    WALKERS_MPI
... PBMETAD

PRINT FILE=COLVAR ARG=phi2,phi3,phi4,phi5,phi6,phi7,phi8,phi9,psi1,psi2,psi3,psi4,psi5,psi6,psi7,psi8,dihtrp_cacb,dihtrp_cbcg,ahc.lessthan STRIDE=200
PRINT FILE=ENERGY ARG=bias.bias,pb.bias STRIDE=200

In this case we are running a multiple-replica simulation where the sampling is used to parallelize the Metadynamics time-dependent potential through the use of multiple walkers.

mpiexec -np 14 gmx_mpi mdrun -s topolnew -multi 14 -plumed plumed-eef1-pbmetad.dat -table table.xvg -tablep table.xvg >& log.out &

Metadynamics Metainference

The former simulations should provide a converged (check for this) ensemble for the peptide. As shown in [69] the agreement with the multiple available NMR experimental data is not perfect. In order to generate an ensemble compatible with most of the available experimental data it is possible to include them in the simulation using METAINFERENCE . To do so the forward models for the data sets should be defined in the input file. In this case we have backbone chemical shifts, CS2BACKBONE ; residual dipolar couplings for two bonds, RDC ; and J-couplings for multiple atoms, JCOUPLING. Once the forward models are defined for the data sets, the calculated data together with the corresponding experimental values can be used to calculate the metainference score. The metainference score is additive so it can be split into multiple METAINFERENCE entries. In this case we are using two metainference entries for the two sets of RDCs because these are compared with the experimental data modulo a constant that should be unique each data set. Then we use one metainference for all the jcouplings and another one for the chemical shifts. In this latter case we use a different noise model, i.e. NOISE=MOUTLIERS because the forward model for chemical shifts can result in systematic errors for some of them.

The following input file is meant to be appended to the former files.

# EXPERIMENTAL DATA SECTION

# RDCs (Grzesiek et al.)
# xGAAWAASS
RDC ...
    ADDCOUPLINGS
    GYROM=-72.5388
    SCALE=0.0001
    NOPBC
    ATOMS1=18,19 COUPLING1=-5.4
    ATOMS2=25,26 COUPLING2=-1.26
    ATOMS3=35,36 COUPLING3=-5.22
    ATOMS4=45,46 COUPLING4=-0.91
    ATOMS5=69,70 COUPLING5=2.33
    ATOMS6=79,80 COUPLING6=-2.88
    ATOMS7=89,90 COUPLING7=-8.37
    ATOMS8=100,101 COUPLING8=-3.78
    LABEL=nh
... RDC

# ExAAWAASx
RDC ...
    ADDCOUPLINGS
    GYROM=179.9319
    SCALE=0.0001
    NOPBC
    ATOMS1=5,6 COUPLING1=12.95
    ATOMS2=27,28 COUPLING2=11.5
    ATOMS3=37,38 COUPLING3=21.42
    ATOMS4=47,48 COUPLING4=-9.37
    ATOMS5=71,72 COUPLING5=10.01
    ATOMS6=81,82 COUPLING6=15.01
    ATOMS7=91,92 COUPLING7=15.73
    LABEL=caha
... RDC

# xGxAWxASx
JCOUPLING ...
    ADDCOUPLINGS
    TYPE=HAN
    NOPBC
    ATOMS1=@psi-2 COUPLING1=-0.49
    ATOMS2=@psi-4 COUPLING2=-0.54
    ATOMS3=@psi-5 COUPLING3=-0.53
    ATOMS4=@psi-7 COUPLING4=-0.39
    ATOMS5=@psi-8 COUPLING5=-0.39
    LABEL=jhan
... JCOUPLING

# xxAAWAASS
JCOUPLING ...
    ADDCOUPLINGS
    TYPE=HAHN
    NOPBC
    ATOMS1=@phi-2 COUPLING1=6.05
    ATOMS2=@phi-3 COUPLING2=5.95
    ATOMS3=@phi-4 COUPLING3=6.44
    ATOMS4=@phi-5 COUPLING4=6.53
    ATOMS5=@phi-6 COUPLING5=5.93
    ATOMS6=@phi-7 COUPLING6=6.98
    ATOMS7=@phi-8 COUPLING7=7.16
    LABEL=jhahn
... JCOUPLING

# xxxxWxxxx
JCOUPLING ...
    ADDCOUPLINGS
    TYPE=CCG
    NOPBC
    ATOMS1=67,47,49,52 COUPLING1=1.59
    LABEL=jccg
... JCOUPLING

# xxxxWxxxx
JCOUPLING ...
    ADDCOUPLINGS
    TYPE=NCG
    NOPBC
    ATOMS1=47,49,52,53 COUPLING1=1.21
    LABEL=jncg
... JCOUPLING
 
# Chemical shifts
cs: CS2BACKBONE ATOMS=1-111 NRES=9 DATA=data TEMPLATE=egaawaass.pdb NOPBC

# metainference entries

#RDCS
METAINFERENCE ...
    ARG=(nh\.rdc_.*),pb.bias
    PARARG=(nh\.exp_.*)
    REWEIGHT 
    NOISETYPE=MGAUSS
    OPTSIGMAMEAN=SEM AVERAGING=200
    SCALEDATA SCALE_PRIOR=GAUSSIAN SCALE0=8.0 DSCALE=0.5
    SIGMA0=5.0 SIGMA_MIN=0.0001 SIGMA_MAX=15.0 DSIGMA=0.1
    WRITE_STRIDE=10000
    LABEL=byrdcnh
... METAINFERENCE

#RDCS
METAINFERENCE ...
    ARG=(caha\.rdc_.*),pb.bias
    PARARG=(caha\.exp_.*)
    REWEIGHT
    NOISETYPE=MGAUSS
    OPTSIGMAMEAN=SEM AVERAGING=200
    SCALEDATA SCALE_PRIOR=GAUSSIAN SCALE0=9.0 DSCALE=0.5
    SIGMA0=5.0 SIGMA_MIN=0.0001 SIGMA_MAX=15.0 DSIGMA=0.1
    WRITE_STRIDE=10000
    LABEL=byrdccaha
... METAINFERENCE

#JC
METAINFERENCE ...
    ARG=(jhan\.j_.*),(jhahn\.j_.*),(jccg\.j.*),(jncg\.j.*),pb.bias
    PARARG=(jhan\.exp_.*),(jhahn\.exp_.*),(jccg\.exp.*),(jncg\.exp.*)
    REWEIGHT
    NOISETYPE=MGAUSS
    OPTSIGMAMEAN=SEM AVERAGING=200
    SIGMA0=5.0 SIGMA_MIN=0.0001 SIGMA_MAX=15.0 DSIGMA=0.1
    WRITE_STRIDE=10000
    LABEL=byj
... METAINFERENCE

#CS
METAINFERENCE ...
    ARG=(cs\.ca_.*),(cs\.cb_.*),pb.bias
    PARARG=(cs\.expca.*),(cs\.expcb.*)
    REWEIGHT
    NOISETYPE=MOUTLIERS
    OPTSIGMAMEAN=SEM AVERAGING=200
    SIGMA0=5.0 SIGMA_MIN=0.0001 SIGMA_MAX=15.0 DSIGMA=0.1
    WRITE_STRIDE=10000
    LABEL=bycs
... METAINFERENCE

# output from METAINFERENCE

PRINT ARG=byrdcnh.*   STRIDE=200 FILE=BAYES.RDC.NH
PRINT ARG=byrdccaha.* STRIDE=200 FILE=BAYES.RDC.CAHA
PRINT ARG=byj.*       STRIDE=200 FILE=BAYES.J
PRINT ARG=bycs.*      STRIDE=200 FILE=BAYES.CS

# the following are usefull for the analysis on-the-fly of the quality of the agreement with the experimentl data
ENSEMBLE ...
    ARG=(nh\.rdc_.*),(caha\.rdc_.*),(jhan\.j_.*),(jhahn\.j_.*),(jccg\.j_.*),(jncg\.j_.*),(cs\..._.*),pb.bias REWEIGHT
    LABEL=ens
... ENSEMBLE

STATS ...
    ARG=(ens\.nh\.rdc_.*) PARARG=(nh\.exp_.*)
    LABEL=nhst
... STATS

STATS ...
    ARG=(ens\.caha\.rdc_.*) PARARG=(caha\.exp_.*)
    LABEL=cahast
... STATS

STATS ...
    ARG=(ens\.cs\..._.*) PARARG=(cs\.exp.*)
    LABEL=csst
... STATS

STATS ...
    ARG=(ens\.jhan\.j_.*) PARARG=(jhan\.exp_.*)
    LABEL=jhanst
... STATS

STATS ...
    ARG=(ens\.jhahn\.j_.*) PARARG=(jhahn\.exp_.*)
    LABEL=jhahnst
... STATS

STATS ...
    ARG=(ens\.jccg\.j.*),(ens\.jccg\.j.*) PARARG=(jccg\.exp_.*),(jccg\.exp_.*)
    SQDEVSUM
    LABEL=jw5ccyst
... STATS

STATS ...
    ARG=(ens\.jncg\.j.*),(ens\.jncg\.j.*) PARARG=(jncg\.exp_.*),(jncg\.exp_.*)
    SQDEVSUM
    LABEL=jw5ncyst
... STATS

#output from STATS
PRINT ARG=nhst.*      STRIDE=2000 FILE=ST.RDC.NH
PRINT ARG=cahast.*    STRIDE=2000 FILE=ST.RDC.CAHA
PRINT ARG=csst.*      STRIDE=2000 FILE=ST.CS
PRINT ARG=jhanst.*,jhahnst.*,jw5ccyst.*,jw5ncyst.* STRIDE=2000 FILE=ST.J

As for the former case we are running a multiple-replica simulation where in addition to multiple-walker metadynamics we are also coupling the replicas through Metainference. The use of multiple-walkers metadynamics is here key in order to have the same bias defined for all the replicas. This allows us to calculate a weighted average of the experimental observables where the weights are defined univocally from the bias [19] .

mpiexec -np 14 gmx_mpi mdrun -s topolnew -multi 14 -plumed plumed-eef1-pbmetad-m_m.dat -table table.xvg -tablep table.xvg >& log.out &