The aim of this tutorial is to introduce the users to the ISDB module and in particular to Metadynamics Metainference [18] [19] ensemble determination. We will reproduce the setup of the simulation for a simple system [69] . For a general overview of the problem of ensembles determination please read [20] .
Once this tutorial is completed students will be able to:
The TARBALL for this project contains the following files:
This tutorial has been tested on a pre-release version of version 2.4.
Molecular dynamics simulations are the ideal tool to determine at atomistic resolution the behavior of complex molecules. This great resolution power comes at the cost of approximations that affects the agreement with actual experimental observables. At the same time experimental data alone are generally speaking not enough to determine a structural ensemble due the inverse nature of the problem, that is to go from few observables to many atoms in many different configurations. Furthermore, experimental data are affected by errors of multiple nature, from noise, systematic errors and errors in their atomistic interpretation. Most important experimental data are the result of the averaging over the ensemble of structure so it is not trivial to deconvolve this signal. One possibility is that of employing MD simulations together with experimental data to generate simulations already corrected for the data themselves. With METAINFERENCE this is done on-the-fly by adding an additional energy to the system that takes into account the agreement with the experimental data considering the multiple sources of errors.
The system we use is the EGAAWAASS peptide used in ref. [69] . First of all we will run a simulation in implicit solvent using the EEF1-SB CHARMM36 force field. EEF1-SB includes a correction to the standard backbone torsion potential of CHARMM36, an electrostatic interaction with a distance dependent dielectric constant and a simple gaussian form for the solvation energy. The first two terms are implemented in the force field and using table potentials while the latter is implemented as a collective variable in PLUMED, EEFSOLV .
# this is optional and tell to VIM that this is a PLUMED file # vim: ft=plumed # see comments just below this input file MOLINFO MOLTYPE=protein STRUCTURE=egaawaass.pdb WHOLEMOLECULES ENTITY0=1-111 # EEF1SB Implicit solvation protein-h: GROUP NDX_FILE=index.ndx NDX_GROUP=Protein-H solv: EEFSOLV ATOMS=protein-h NOPBC NL_STRIDE=10 NL_BUFFER=0.1 bias: BIASVALUE ARG=solv
This can be run using gromacs (unfortunately recent versions of gromacs do not support Verlet groups with table potentials, so performances are currently sub-optimal on the gromacs side)
gmx_mpi mdrun -s run.tpr -table table.xvg -tablep table.xvg -plumed plumed-eef1.dat -v
In order to have a converged sampling for this reference ensemble calculation it is useful to setup a Metadynamics calculation. In particular we will use PBMETAD because it is then a natural choice for Metadynamics Metainference later. The following input file is meant to be appended to the former.
# CVs, Psi9, Phi1 are not defined psi1: TORSION ATOMS=@psi-1 NOPBC psi2: TORSION ATOMS=@psi-2 NOPBC psi3: TORSION ATOMS=@psi-3 NOPBC psi4: TORSION ATOMS=@psi-4 NOPBC psi5: TORSION ATOMS=@psi-5 NOPBC psi6: TORSION ATOMS=@psi-6 NOPBC psi7: TORSION ATOMS=@psi-7 NOPBC psi8: TORSION ATOMS=@psi-8 NOPBC phi2: TORSION ATOMS=@phi-2 NOPBC phi3: TORSION ATOMS=@phi-3 NOPBC phi4: TORSION ATOMS=@phi-4 NOPBC phi5: TORSION ATOMS=@phi-5 NOPBC phi6: TORSION ATOMS=@phi-6 NOPBC phi7: TORSION ATOMS=@phi-7 NOPBC phi8: TORSION ATOMS=@phi-8 NOPBC phi9: TORSION ATOMS=@phi-9 NOPBC ahc: ALPHARMSD RESIDUES=all TYPE=OPTIMAL LESS_THAN={RATIONAL R_0=0.12} # Bulky Trp residue dihedral dihtrp_cacb: TORSION ATOMS=67,47,49,52 NOPBC dihtrp_cbcg: TORSION ATOMS=47,49,52,53 NOPBC protein-ca: GROUP NDX_FILE=index.ndx NDX_GROUP=C-alpha gyr: GYRATION TYPE=RADIUS ATOMS=protein-ca NOPBC # PBMetaD PBMETAD ... LABEL=pb ARG=phi2,phi3,phi4,phi5,phi6,phi7,phi8,phi9,psi1,psi2,psi3,psi4,psi5,psi6,psi7,psi8,dihtrp_cacb,dihtrp_cbcg,ahc.lessthan SIGMA=1000 SIGMA_MIN=0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.001 SIGMA_MAX=0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.2 ADAPTIVE=DIFF HEIGHT=0.5 PACE=200 BIASFACTOR=40 GRID_MIN=-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,-pi,0 GRID_MAX=pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,pi,5 GRID_WSTRIDE=5000 WALKERS_MPI ... PBMETAD PRINT FILE=COLVAR ARG=phi2,phi3,phi4,phi5,phi6,phi7,phi8,phi9,psi1,psi2,psi3,psi4,psi5,psi6,psi7,psi8,dihtrp_cacb,dihtrp_cbcg,ahc.lessthan STRIDE=200 PRINT FILE=ENERGY ARG=bias.bias,pb.bias STRIDE=200
In this case we are running a multiple-replica simulation where the sampling is used to parallelize the Metadynamics time-dependent potential through the use of multiple walkers.
mpiexec -np 14 gmx_mpi mdrun -s topolnew -multi 14 -plumed plumed-eef1-pbmetad.dat -table table.xvg -tablep table.xvg >& log.out &
The former simulations should provide a converged (check for this) ensemble for the peptide. As shown in [69] the agreement with the multiple available NMR experimental data is not perfect. In order to generate an ensemble compatible with most of the available experimental data it is possible to include them in the simulation using METAINFERENCE . To do so the forward models for the data sets should be defined in the input file. In this case we have backbone chemical shifts, CS2BACKBONE ; residual dipolar couplings for two bonds, RDC ; and J-couplings for multiple atoms, JCOUPLING. Once the forward models are defined for the data sets, the calculated data together with the corresponding experimental values can be used to calculate the metainference score. The metainference score is additive so it can be split into multiple METAINFERENCE entries. In this case we are using two metainference entries for the two sets of RDCs because these are compared with the experimental data modulo a constant that should be unique each data set. Then we use one metainference for all the jcouplings and another one for the chemical shifts. In this latter case we use a different noise model, i.e. NOISE=MOUTLIERS because the forward model for chemical shifts can result in systematic errors for some of them.
The following input file is meant to be appended to the former files.
# EXPERIMENTAL DATA SECTION # RDCs (Grzesiek et al.) # xGAAWAASS RDC ... ADDCOUPLINGS GYROM=-72.5388 SCALE=0.0001 NOPBC ATOMS1=18,19 COUPLING1=-5.4 ATOMS2=25,26 COUPLING2=-1.26 ATOMS3=35,36 COUPLING3=-5.22 ATOMS4=45,46 COUPLING4=-0.91 ATOMS5=69,70 COUPLING5=2.33 ATOMS6=79,80 COUPLING6=-2.88 ATOMS7=89,90 COUPLING7=-8.37 ATOMS8=100,101 COUPLING8=-3.78 LABEL=nh ... RDC # ExAAWAASx RDC ... ADDCOUPLINGS GYROM=179.9319 SCALE=0.0001 NOPBC ATOMS1=5,6 COUPLING1=12.95 ATOMS2=27,28 COUPLING2=11.5 ATOMS3=37,38 COUPLING3=21.42 ATOMS4=47,48 COUPLING4=-9.37 ATOMS5=71,72 COUPLING5=10.01 ATOMS6=81,82 COUPLING6=15.01 ATOMS7=91,92 COUPLING7=15.73 LABEL=caha ... RDC # xGxAWxASx JCOUPLING ... ADDCOUPLINGS TYPE=HAN NOPBC ATOMS1=@psi-2 COUPLING1=-0.49 ATOMS2=@psi-4 COUPLING2=-0.54 ATOMS3=@psi-5 COUPLING3=-0.53 ATOMS4=@psi-7 COUPLING4=-0.39 ATOMS5=@psi-8 COUPLING5=-0.39 LABEL=jhan ... JCOUPLING # xxAAWAASS JCOUPLING ... ADDCOUPLINGS TYPE=HAHN NOPBC ATOMS1=@phi-2 COUPLING1=6.05 ATOMS2=@phi-3 COUPLING2=5.95 ATOMS3=@phi-4 COUPLING3=6.44 ATOMS4=@phi-5 COUPLING4=6.53 ATOMS5=@phi-6 COUPLING5=5.93 ATOMS6=@phi-7 COUPLING6=6.98 ATOMS7=@phi-8 COUPLING7=7.16 LABEL=jhahn ... JCOUPLING # xxxxWxxxx JCOUPLING ... ADDCOUPLINGS TYPE=CCG NOPBC ATOMS1=67,47,49,52 COUPLING1=1.59 LABEL=jccg ... JCOUPLING # xxxxWxxxx JCOUPLING ... ADDCOUPLINGS TYPE=NCG NOPBC ATOMS1=47,49,52,53 COUPLING1=1.21 LABEL=jncg ... JCOUPLING # Chemical shifts cs: CS2BACKBONE ATOMS=1-111 NRES=9 DATA=data TEMPLATE=egaawaass.pdb NOPBC # metainference entries #RDCS METAINFERENCE ... ARG=(nh\.rdc_.*),pb.bias PARARG=(nh\.exp_.*) REWEIGHT NOISETYPE=MGAUSS OPTSIGMAMEAN=SEM AVERAGING=200 SCALEDATA SCALE_PRIOR=GAUSSIAN SCALE0=8.0 DSCALE=0.5 SIGMA0=5.0 SIGMA_MIN=0.0001 SIGMA_MAX=15.0 DSIGMA=0.1 WRITE_STRIDE=10000 LABEL=byrdcnh ... METAINFERENCE #RDCS METAINFERENCE ... ARG=(caha\.rdc_.*),pb.bias PARARG=(caha\.exp_.*) REWEIGHT NOISETYPE=MGAUSS OPTSIGMAMEAN=SEM AVERAGING=200 SCALEDATA SCALE_PRIOR=GAUSSIAN SCALE0=9.0 DSCALE=0.5 SIGMA0=5.0 SIGMA_MIN=0.0001 SIGMA_MAX=15.0 DSIGMA=0.1 WRITE_STRIDE=10000 LABEL=byrdccaha ... METAINFERENCE #JC METAINFERENCE ... ARG=(jhan\.j_.*),(jhahn\.j_.*),(jccg\.j.*),(jncg\.j.*),pb.bias PARARG=(jhan\.exp_.*),(jhahn\.exp_.*),(jccg\.exp.*),(jncg\.exp.*) REWEIGHT NOISETYPE=MGAUSS OPTSIGMAMEAN=SEM AVERAGING=200 SIGMA0=5.0 SIGMA_MIN=0.0001 SIGMA_MAX=15.0 DSIGMA=0.1 WRITE_STRIDE=10000 LABEL=byj ... METAINFERENCE #CS METAINFERENCE ... ARG=(cs\.ca_.*),(cs\.cb_.*),pb.bias PARARG=(cs\.expca.*),(cs\.expcb.*) REWEIGHT NOISETYPE=MOUTLIERS OPTSIGMAMEAN=SEM AVERAGING=200 SIGMA0=5.0 SIGMA_MIN=0.0001 SIGMA_MAX=15.0 DSIGMA=0.1 WRITE_STRIDE=10000 LABEL=bycs ... METAINFERENCE # output from METAINFERENCE PRINT ARG=byrdcnh.* STRIDE=200 FILE=BAYES.RDC.NH PRINT ARG=byrdccaha.* STRIDE=200 FILE=BAYES.RDC.CAHA PRINT ARG=byj.* STRIDE=200 FILE=BAYES.J PRINT ARG=bycs.* STRIDE=200 FILE=BAYES.CS # the following are usefull for the analysis on-the-fly of the quality of the agreement with the experimentl data ENSEMBLE ... ARG=(nh\.rdc_.*),(caha\.rdc_.*),(jhan\.j_.*),(jhahn\.j_.*),(jccg\.j_.*),(jncg\.j_.*),(cs\..._.*),pb.bias REWEIGHT LABEL=ens ... ENSEMBLE STATS ... ARG=(ens\.nh\.rdc_.*) PARARG=(nh\.exp_.*) LABEL=nhst ... STATS STATS ... ARG=(ens\.caha\.rdc_.*) PARARG=(caha\.exp_.*) LABEL=cahast ... STATS STATS ... ARG=(ens\.cs\..._.*) PARARG=(cs\.exp.*) LABEL=csst ... STATS STATS ... ARG=(ens\.jhan\.j_.*) PARARG=(jhan\.exp_.*) LABEL=jhanst ... STATS STATS ... ARG=(ens\.jhahn\.j_.*) PARARG=(jhahn\.exp_.*) LABEL=jhahnst ... STATS STATS ... ARG=(ens\.jccg\.j.*),(ens\.jccg\.j.*) PARARG=(jccg\.exp_.*),(jccg\.exp_.*) SQDEVSUM LABEL=jw5ccyst ... STATS STATS ... ARG=(ens\.jncg\.j.*),(ens\.jncg\.j.*) PARARG=(jncg\.exp_.*),(jncg\.exp_.*) SQDEVSUM LABEL=jw5ncyst ... STATS #output from STATS PRINT ARG=nhst.* STRIDE=2000 FILE=ST.RDC.NH PRINT ARG=cahast.* STRIDE=2000 FILE=ST.RDC.CAHA PRINT ARG=csst.* STRIDE=2000 FILE=ST.CS PRINT ARG=jhanst.*,jhahnst.*,jw5ccyst.*,jw5ncyst.* STRIDE=2000 FILE=ST.J
As for the former case we are running a multiple-replica simulation where in addition to multiple-walker metadynamics we are also coupling the replicas through Metainference. The use of multiple-walkers metadynamics is here key in order to have the same bias defined for all the replicas. This allows us to calculate a weighted average of the experimental observables where the weights are defined univocally from the bias [19] .
mpiexec -np 14 gmx_mpi mdrun -s topolnew -multi 14 -plumed plumed-eef1-pbmetad-m_m.dat -table table.xvg -tablep table.xvg >& log.out &