Parameterisation¶
To parameterise a small molecule for simulation, import to right Parameteriser. Here we showcase using the SolutionParameteriser, which when given a SMILES string of the molecule to be simulated, it gives back a parameterised system with one copy of this molecule solvated in water. We use benzene as example here.
[1]:
benzene_smiles = "c1ccccc1"
Depending on the toolkit at hand, parameterisation can either be done using via_rdkit(), which uses open-sourced RDKit or via_openeye(), which is commercial.
The parameterised system is stored as a Parmed object.
[2]:
from mdfptools import Parameteriser
#RDKit parameterisation
rdk_pmd = Parameteriser.SolutionParameteriser.via_rdkit(benzene_smiles)
rdk_pmd
[2]:
<Structure 2622 atoms; 871 residues; 1752 bonds; PBC (orthogonal); parametrized>
[3]:
#OpenEye alternative
oe_pmd = Parameteriser.SolutionParameteriser.via_openeye(benzene_smiles)
oe_pmd
[3]:
<Structure 2652 atoms; 881 residues; 1772 bonds; PBC (orthogonal); parametrized>
When using RDKit, by default the partial charge assignment to the small molecule is done via antechamber to yield AM1-BCC charges.
We also developed a machine-learned alternative partial charge assignment scheme called mlddec. Once this package is installed, one can charge the system using:
[4]:
Parameteriser.SolutionParameteriser.load_ddec_models()
Parameteriser.SolutionParameteriser.via_rdkit(benzene_smiles)
0%| | 0/10 [00:00<?, ?it/s]
Loading models...
/home/shuwang/miniconda3/lib/python3.7/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator DecisionTreeRegressor from version 0.21.1 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
100%|██████████| 10/10 [02:53<00:00, 17.30s/it]
[4]:
<Structure 2631 atoms; 874 residues; 1758 bonds; PBC (orthogonal); parametrized>
After one is finished with using Parameteriser to prepare all the systems one wishes to subsequently simulate, the ddec models should be unloaded as they occupy quite some memory.
[5]:
Parameteriser.SolutionParameteriser.unload_ddec_models()
The parameterised systems as parmed objects can be stored to disk and reloaded into memory using pickle:
[6]:
import pickle
#store to disk
pickle.dump(rdk_pmd, open("./benzene.pickle", "wb"))
# Load the pickled object back to memory:
pickle.load(open("./benzene.pickle", "rb"))
[6]:
<Structure 2622 atoms; 871 residues; 1752 bonds; PBC (orthogonal); parametrized>
Visualisation (Optional)¶
You can have a look at the parameterised parmed system inside Jupyter notebook (Jupyter lab does not seem to work) using nglview.
[7]:
import nglview as nv
view = nv.show_parmed(rdk_pmd)
view.add_licorice()
view
Simulation¶
To simulate, just import the right simulator and call the via_openmm() class method, which as the name implies runs MD using OpenMM under the hood.
We plan to include another python handle in GROMACS which will enable via_gromacs() in the future.
The default simulation length is 5 ns, with trajectory frame stored every 10 ps (so 1 frame is stored after every 5000 steps, totally 500 frames). The simulation will take some time to run.
[8]:
from mdfptools.Simulator import SolutionSimulator
SolutionSimulator.via_openmm(rdk_pmd, file_name = "benzene", file_path = "./",
platform = "CUDA", num_steps = 5000 * 500)
[8]:
'/home/shuwang/Documents/Modelling/MDFP/Codes/mdfptools/examples/benzene.h5'
Obtain Molecular Dynamics Fingerprint (MDFP)¶
Once the simulation has finished, one can extract the relevant properties using the right Composer (SolutionComposer here).
[9]:
#First load in the simulated trajectory
import mdtraj as md
traj = md.load("./benzene.h5")
[13]:
from mdfptools.Composer import SolutionComposer
mdfp = SolutionComposer.run(traj, rdk_pmd)
print(mdfp)
{'2d_counts': [6, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'water_intra_crf': [8.447469879840682, 0.07754269775241597, 8.448101852311458], 'water_intra_lj': [14.698494638650427, 0.35388130931658185, 14.62080608686442], 'water_total_crf': [-20.59946506906263, 8.513642293598433, -18.724968586184893], 'water_total_lj': [-20.978276995376827, 4.06594223290863, -22.954899317973993], 'water_intra_ene': [23.14596451849111, 0.3528157869181612, 23.10431903412804], 'water_total_ene': [-41.57774206443945, 6.698043493291234, -39.76306852550783], 'water_rgyr': [1.4897907974950872, 0.0031876909007386768, 1.4891081418027634], 'water_sasa': [2.433884, 0.007920546, 2.4333215]}
The returned object from a Composer is a MDFP object. As can be seen from above, it contains more information. To use it for the subsequent machine learning tasks, call the get_mdfp() method to get the feature vectors (i.e. just the values not the keys)
[11]:
mdfp.get_mdfp()
[11]:
[6,
0,
0,
0,
0,
0,
0,
0,
0,
0,
8.447469879840682,
0.07754269775241597,
8.448101852311458,
14.698494638650427,
0.35388130931658185,
14.62080608686442,
-20.59946506906263,
8.513642293598433,
-18.724968586184893,
-20.978276995376827,
4.06594223290863,
-22.954899317973993,
23.14596451849111,
0.3528157869181612,
23.10431903412804,
-41.57774206443945,
6.698043493291234,
-39.76306852550783,
1.4897907974950872,
0.0031876909007386768,
1.4891081418027634,
2.433884,
0.007920546,
2.4333215]