Parameterisation¶

To parameterise a small molecule for simulation, import to right Parameteriser. Here we showcase using the SolutionParameteriser, which when given a SMILES string of the molecule to be simulated, it gives back a parameterised system with one copy of this molecule solvated in water. We use benzene as example here.

[1]:

benzene_smiles = "c1ccccc1"

Depending on the toolkit at hand, parameterisation can either be done using via_rdkit(), which uses open-sourced RDKit or via_openeye(), which is commercial.

The parameterised system is stored as a Parmed object.

[2]:

from mdfptools import Parameteriser
#RDKit parameterisation
rdk_pmd = Parameteriser.SolutionParameteriser.via_rdkit(benzene_smiles)
rdk_pmd

[2]:

<Structure 2622 atoms; 871 residues; 1752 bonds; PBC (orthogonal); parametrized>

[3]:

#OpenEye alternative
oe_pmd = Parameteriser.SolutionParameteriser.via_openeye(benzene_smiles)
oe_pmd

[3]:

<Structure 2652 atoms; 881 residues; 1772 bonds; PBC (orthogonal); parametrized>

When using RDKit, by default the partial charge assignment to the small molecule is done via antechamber to yield AM1-BCC charges.

We also developed a machine-learned alternative partial charge assignment scheme called mlddec. Once this package is installed, one can charge the system using:

[4]:

Parameteriser.SolutionParameteriser.load_ddec_models()
Parameteriser.SolutionParameteriser.via_rdkit(benzene_smiles)

  0%|          | 0/10 [00:00<?, ?it/s]

Loading models...

/home/shuwang/miniconda3/lib/python3.7/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator DecisionTreeRegressor from version 0.21.1 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
100%|██████████| 10/10 [02:53<00:00, 17.30s/it]

[4]:

<Structure 2631 atoms; 874 residues; 1758 bonds; PBC (orthogonal); parametrized>

After one is finished with using Parameteriser to prepare all the systems one wishes to subsequently simulate, the ddec models should be unloaded as they occupy quite some memory.

[5]:

Parameteriser.SolutionParameteriser.unload_ddec_models()

The parameterised systems as parmed objects can be stored to disk and reloaded into memory using pickle:

[6]:

import pickle
#store to disk
pickle.dump(rdk_pmd, open("./benzene.pickle", "wb"))

# Load the pickled object back to memory:
pickle.load(open("./benzene.pickle", "rb"))

[6]:

<Structure 2622 atoms; 871 residues; 1752 bonds; PBC (orthogonal); parametrized>

Visualisation (Optional)¶

You can have a look at the parameterised parmed system inside Jupyter notebook (Jupyter lab does not seem to work) using nglview.

[7]:

import nglview as nv
view = nv.show_parmed(rdk_pmd)
view.add_licorice()
view

Simulation¶

To simulate, just import the right simulator and call the via_openmm() class method, which as the name implies runs MD using OpenMM under the hood.

We plan to include another python handle in GROMACS which will enable via_gromacs() in the future.

The default simulation length is 5 ns, with trajectory frame stored every 10 ps (so 1 frame is stored after every 5000 steps, totally 500 frames). The simulation will take some time to run.

[8]:

from mdfptools.Simulator import SolutionSimulator
SolutionSimulator.via_openmm(rdk_pmd, file_name = "benzene", file_path = "./",
                             platform = "CUDA", num_steps = 5000 * 500)

[8]:

'/home/shuwang/Documents/Modelling/MDFP/Codes/mdfptools/examples/benzene.h5'

Obtain Molecular Dynamics Fingerprint (MDFP)¶

Once the simulation has finished, one can extract the relevant properties using the right Composer (SolutionComposer here).

[9]:

#First load in the simulated trajectory
import mdtraj as md
traj = md.load("./benzene.h5")

[13]:

from mdfptools.Composer import SolutionComposer
mdfp = SolutionComposer.run(traj, rdk_pmd)
print(mdfp)

{'2d_counts': [6, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'water_intra_crf': [8.447469879840682, 0.07754269775241597, 8.448101852311458], 'water_intra_lj': [14.698494638650427, 0.35388130931658185, 14.62080608686442], 'water_total_crf': [-20.59946506906263, 8.513642293598433, -18.724968586184893], 'water_total_lj': [-20.978276995376827, 4.06594223290863, -22.954899317973993], 'water_intra_ene': [23.14596451849111, 0.3528157869181612, 23.10431903412804], 'water_total_ene': [-41.57774206443945, 6.698043493291234, -39.76306852550783], 'water_rgyr': [1.4897907974950872, 0.0031876909007386768, 1.4891081418027634], 'water_sasa': [2.433884, 0.007920546, 2.4333215]}

The returned object from a Composer is a MDFP object. As can be seen from above, it contains more information. To use it for the subsequent machine learning tasks, call the get_mdfp() method to get the feature vectors (i.e. just the values not the keys)

[11]:

mdfp.get_mdfp()

[11]:

[6,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 8.447469879840682,
 0.07754269775241597,
 8.448101852311458,
 14.698494638650427,
 0.35388130931658185,
 14.62080608686442,
 -20.59946506906263,
 8.513642293598433,
 -18.724968586184893,
 -20.978276995376827,
 4.06594223290863,
 -22.954899317973993,
 23.14596451849111,
 0.3528157869181612,
 23.10431903412804,
 -41.57774206443945,
 6.698043493291234,
 -39.76306852550783,
 1.4897907974950872,
 0.0031876909007386768,
 1.4891081418027634,
 2.433884,
 0.007920546,
 2.4333215]