.. _protein_electrolyte-label:
Protein in electrolyte
**********************
.. container:: hatnote
Simulating a protein solvated in water and ions
.. figure:: ../figures/level1/protein-in-electrolyte/protein-light.webp
:alt: Protein and Na Cl ions visualized with VMD
:class: only-light
:height: 300
:align: right
.. figure:: ../figures/level1/protein-in-electrolyte/protein-dark.webp
:alt: Protein and Na Cl ions visualized with VMD
:class: only-dark
:height: 300
:align: right
.. container:: justify
The goal of this tutorial is to use GROMACS and perform a simple
molecular dynamics simulation of a protein solvated in an electrolyte. The
protein is downloaded from the Protein Data Bank (PDB) :cite:`bank1971protein`
and solvated in an electrolyte made of water molecules and Na+ Cl- ions.
.. container:: justify
This tutorial covers some of the basic uses of GROMACS, including system
preparation, force field selection, input file preparation, and data
analysis.
.. include:: ../../non-tutorials/recommand-salt.rst
.. include:: ../../non-tutorials/needhelp.rst
.. include:: ../../non-tutorials/GROMACS2024.2.rst
Convert the PDB file
====================
.. container:: justify
Download the *.pdb* file from the |ProteinDataBank|,
or simply click |1cta.pdb|. The protein is a calcium-binding peptide from site III
of chicken troponin-C that has been determined using 1H-NMR spectroscopy
:cite:`shaw1992determination`.
.. |ProteinDataBank| raw:: html
Protein Data Bank
.. |1cta.pdb| raw:: html
here
.. container:: justify
We first need to create the *.gro* file, i.e. a GROMACS structure file,
from the *.pdb* file. This can be done using *gmx trjconv*:
.. code-block:: bash
gmx trjconv -f 1cta.pdb -s 1cta.pdb -o 1cta.gro -center -box 5 5 5
.. container:: justify
Choose the group *System* for the centering, and the group *System* as well
for the output. A file named *1cta.gro* is created. The generated *.gro*
file contains 666 atoms, each atom corresponding to one line:
.. code-block:: bw
TROPONIN C SITE III - SITE III HOMODIMER
666
0ACE C 1 2.662 4.131 2.701
0ACE O 2 2.714 4.036 2.646
0ACE CH3 3 2.651 4.147 2.853
(...)
35NH2 HN2 664 2.417 3.671 3.192
69CA CA 665 3.016 2.279 1.785
70CA CA 666 1.859 2.046 1.838
5.00000 5.00000 5.00000
.. container:: justify
The last line is the box dimensions in nanometer, which was requested
in the *gmx trjconv* command by the *-box 5 5 5* option. All the options
of *trjconv* can be found on the corresponding page of the GROMACS
|trjconv-documentation|.
.. |trjconv-documentation| raw:: html
documentation
Choose the force field
======================
.. container:: justify
Let us select the force field that will control the interactions between
the different atoms. This can be done using the *gmx pdb2gmx* command:
.. code-block:: bash
gmx pdb2gmx -f 1cta.gro -water spce -v -ignh -o unsolvated.gro
.. container:: justify
Here, the *-ignh* option is used to ignore the hydrogen atoms that are in
the coordinate file. The *-water spce* open is used to
specify the water model; the extended simple point charge
model (spce) :cite:`berendsen1987missing`.
.. container:: justify
When running *gmx pdb2gmx*, choose the *AMBER03 protein, nucleic
AMBER94* force field :cite:`duan2003point`. A new *gro* file named
*unsolvated.gro* was created, as well as a topology *.top* file named
*topol.top*.
Solvate the protein
===================
.. container:: justify
The protein is now ready to be solvated. Let us first immerse it in pure
water using *gmx solvate*:
.. code-block:: bash
gmx solvate -cs spc216.gro -cp unsolvated.gro -o solvated.gro -p topol.top
.. container:: justify
Here, *spc216.gro* is a pre-equilibrated water configuration that is provided
by GROMACS. After running *gmx solvate*, a number :math:`N = 3719` of
water molecules, or SOL (for solvent), is created in the new GRO file
named *solvated.gro* next to the protein. The number :math:`N` may slightly
differ in your case. A new line must also appear at the end of the
*topol.top* file:
.. code-block:: bash
(...)
[ molecules ]
; Compound #mols
Protein 1
SOL 3719
Run an energy minimization
==========================
.. container:: justify
Although *gmx solvate* creates molecules without overlap with the
protein, it is safer to perform a short energy minimization to ensure that
the distances between the atoms are reasonable.
.. container:: justify
To do so, create a new folder named *inputs/*, and create a file named
*mininimize.mdp* into it. Copy the following lines into *mininimize.mdp*:
.. code-block:: bw
integrator = steep
nsteps = 50
nstxout = 10
cutoff-scheme = Verlet
nstlist = 10
ns_type = grid
couple-intramol = yes
vdw-type = Cut-off
rvdw = 1.0
coulombtype = pme
fourierspacing = 0.1
pme-order = 4
rcoulomb = 1.0
.. container:: justify
Here, the speepest-descent method is used, with a maximum number of steps
of 50 :cite:`debye1909naherungsformeln`. The trajectory is printed every 10
step, as specified by the *nstxout* option. The other commands control the
interactions and cut-offs.
.. container:: justify
Prepare the energy minimization using *gmx grompp*:
.. code-block:: bash
gmx grompp -f inputs/mininimize.mdp -c solvated.gro -p topol.top -o min -pp min -po min -maxwarn 1
gmx mdrun -v -deffnm min
.. container:: justify
The *-maxwarn 1* is required here, because the system is not charge neutral
and GROMACS will return a WARNING. The charge neutrality will be enforced
later on. Finally, run the simulation using *gmx mdrun*:
.. code-block:: bash
gmx mdrun -v -deffnm min
.. container:: justify
Thanks to the steepest-descent algorithm, the potential energy of the
system decreases rapidly and becomes large and negative, which is usually
a good sign. The potential energy can be extracted using *gmx energy*:
.. code-block:: bash
gmx energy -f min.edr -o potential-energy-minimization.xvg
.. container:: justify
and choose *Potential*. The generated *.xvg* files contain the value of the
potential energy (in kJ/mol) as a function of the simulation steps. The potential
energy decreases from :math:`-3 \mathrm{e}-4~\text{kJ}/\text{mol}` to
:math:`-1.8 \mathrm{e}-5~\text{kJ}/\text{mol}`.
.. figure:: ../figures/level1/protein-in-electrolyte/potential-energy-minimization-light.png
:alt: potential energy extracted using Gromacs
:class: only-light
.. figure:: ../figures/level1/protein-in-electrolyte/potential-energy-minimization-dark.png
:alt: potential energy extracted using Gromacs
:class: only-dark
.. container:: figurelegend
Figure: Potential energy :math:`E_\text{p}` of the system as a function of the
number of steps :math:`N_\text{step}` during energy minimization.
Add the salt
============
.. container:: justify
Let us add some ions to the system so that the (1) total charge is 0, and (2)
that the salt concentration is :math:`c_\text{s} \approx 1~\text{mol/L}`.
This is done using the *gmx genion* command,
.. code-block:: bash
gmx genion -s min.tpr -p topol.top -conc 1 -neutral -o salted.gro
.. container:: justify
Select the group *SOL* as the continuous group of solvent molecules. GROMACS
will replace some of the *SOL* residues with ions.
.. container:: justify
As can be seen from the *topol.top* file, some sodium (Na+) and chloride
(Cl-) ions were added, and the number :math:`N` of water molecules is
reduced compared to the previous step:
.. code-block:: bash
[ molecules ]
; Compound #mols
Protein 1
SOL 3563
NA 81
CL 75
.. container:: justify
Out of safety, let us run a new energy minimization starting from the
*salted.gro* configuration. The *-maxwarn* option is not necessary
as the system is charge-neutral.
.. code-block:: bash
gmx grompp -f inputs/mininimize.mdp -c salted.gro -p topol.top -o min-s -pp min-s -po min-s
gmx mdrun -v -deffnm min-s
.. container:: justify
As previously, one can have a look at the potential energy using *gmx energy*:
.. code-block:: bash
gmx energy -f min-s.edr -o potential-energy-minimization-s.xvg
.. figure:: ../figures/level1/protein-in-electrolyte/potential-energy-minimization-s-light.png
:alt: potential energy extracted using Gromacs
:class: only-light
.. figure:: ../figures/level1/protein-in-electrolyte/potential-energy-minimization-s-dark.png
:alt: potential energy extracted using Gromacs
:class: only-dark
.. container:: figurelegend
Figure: Potential energy :math:`E_\text{p}` of the system as a function of the
number of steps :math:`N_\text{step}` during energy minimization.
.. container:: justify
The system can also be visualized using VMD using:
.. code-block:: bash
vmd min.gro min.trr
.. figure:: ../figures/level1/protein-in-electrolyte/protein-solvated.png
:alt: Gromacs tutorial protein in water and salt
.. container:: figurelegend
Figure: Protein solvated in water. On the left, only the water molecules that are
near the protein are represented. On the right, the entire system is shown.
Water molecules are represented as red and white sticks, and ions are
represented as spheres.
Run the molecular dynamics
==========================
.. container:: justify
Create a new input file called *nvt.mdp* and placed into the *inputs/*
folder, and copy the following lines into it:
.. code-block:: bw
integrator = md
nsteps = 20000
dt = 0.001
comm_mode = linear
comm_grps = system
gen-vel = yes
gen-temp = 300
cutoff-scheme = Verlet
nstlist = 10
ns_type = grid
nstxout-compressed = 1000
vdw-type = Cut-off
rvdw = 1.0
couple-intramol = yes
coulombtype = pme
fourierspacing = 0.1
pme-order = 4
rcoulomb = 1.0
constraint-algorithm = lincs
constraints = hbonds
tcoupl = v-rescale
ld-seed = 48456
tc-grps = system
tau-t = 0.5
ref-t = 300
.. container:: justify
Here, the *v-rescale* thermostat is used to impose a temperature
of :math:`T = 300~\text{K}`
with a characteristic time of :math:`0.5~\text{ps}`.
The *v-rescale* thermostat
corresponds to the Berendsen thermostat with an additional stochastic
term :cite:`bussi2007canonical`, and is known to give proper canonical ensemble.
.. container:: justify
The *LINCS* algorithm is used to constrain the hydrogen bonds, allowing us
to use a timestep of :math:`1~\text{fs}`. Without such constraint, the fast
vibration of the hydrogen bonds would impose the use of a smaller timestep,
which makes the computation more computationally expensive.
.. container:: justify
Run the NVT simulation:
.. container:: justify
gmx grompp -f inputs/nvt.mdp -c min-s.gro -p topol.top -o nvt -pp nvt -po nvt
gmx mdrun -v -deffnm nvt
.. container:: justify
Let us observe the potential energy. Let us also observe the temperature
and the pressure of the system. Run the *gmx energy* command 3 times,
and select successively *potential*, *temperature*, and *pressure*:
.. code-block:: bash
gmx energy -f nvt.edr -o potential-energy-nvt.xvg
gmx energy -f nvt.edr -o temperature-nvt.xvg
gmx energy -f nvt.edr -o pressure-nvt.xvg
.. container:: justify
After an initial spike, the energy, the temperature, and the pressure
all stabilize. For the temperature, the desired value of :math:`T = 300~\text{K}`
is reached, and for the pressure, a negative value of about :math:`- 700~\text{bar}`
can be observed.
.. figure:: ../figures/level1/protein-in-electrolyte/potential-energy-nvt-light.png
:alt: potential energy extracted using Gromacs
:class: only-light
.. figure:: ../figures/level1/protein-in-electrolyte/potential-energy-nvt-dark.png
:alt: potential energy extracted using Gromacs
:class: only-dark
.. container:: figurelegend
Figure: Potential energy :math:`E_\text{p}` (a),
temperature :math:`T` (b),
and pressure :math:`p` (c) as a function of the
time :math:`t` during the NVT molecular dynamics.
.. container:: justify
The negative pressure indicates that the volume is slightly too large. This
can be rectified by performing a short NPT simulation, during which the
volume of the box will adjust until a desired pressure is reached. Create
a new file called *npt.mdp* in the *inputs/* folder. Copy the same lines as
in *nvt.mdp*, and add the following lines to it:
.. code-block:: bw
pcoupl = c-rescale
Pcoupltype = isotropic
tau_p = 1.0
ref_p = 1.0
compressibility = 4.5e-5
.. container:: justify
Here, the isotropic c-rescale pressure
coupling with a target pressure of 1 bar is used.
Run it starting from the end of the previous *nvt* run:
.. container:: justify
gmx grompp -f inputs/npt.mdp -c nvt.gro -p topol.top -o npt -pp npt -po npt
gmx mdrun -v -deffnm npt
.. container:: justify
As the simulation progresses, the volume of the box decreases and better
adjust to the fluid content of the box, as can be seen using *gmx energy*
and extracting the *volume*, and/or by extracting the *density*:
.. code-block:: bash
gmx energy -f npt.edr -o volume-npt.xvg
gmx energy -f npt.edr -o density-npt.xvg
.. figure:: ../figures/level1/protein-in-electrolyte/potential-energy-npt-light.png
:alt: potential energy extracted using Gromacs
:class: only-light
.. figure:: ../figures/level1/protein-in-electrolyte/potential-energy-npt-dark.png
:alt: potential energy extracted using Gromacs
:class: only-dark
.. container:: figurelegend
Figure: Box volume :math:`V` (a) and
density :math:`\rho` (b) as a function of the
time :math:`t` during the NPT molecular dynamics.
.. include:: ../../non-tutorials/accessfile.rst
Going further with exercises
============================
Extract radial distribution functions
-------------------------------------
.. container:: justify
Use *gmx rdf* and extract the radial distribution functions (RDFs) between
the ions and the water molecules.