============================================== Quick Start: 1. Build Workflow ============================================== EnzyHTP is a python library which means it doesn't have any direct executables but instead provides modular functions as building blocks of any customized workflow from users. In other word, user needs to writing a **main script** calling EnzyHTP functions to furnish their automated workflow. 1. Modify from a Template ============================================== As a quick-start, we will modify from a main script template provided under the ``/template`` folder of EnzyHTP. | In this tutorial, I will - Explain functions used in the template - Talk about how you can be modify them to match your own need. **This will result in a main script. Running this main script will perform a HTP workflow addressing your research goal.** .. note:: | Creating workflow from a template is a simple but limited way. | See `this advanced tutorial `_ to learn how to create workflow from scratch. .. note:: The code below are sections from ``template/template_main.py`` 1.1 Workflow Configurations ------------------------------------ This part of the code configure the workflow with some settings, I will break it down into chunks and explain each of them. You can modify this part of the template to change the configuration of the workflow. .. panels:: :column: col-lg-12 col-md-12 col-sm-12 col-xs-12 p-2 text-left .. code:: python import pickle from enzy_htp.preparation import protonate_stru, remove_hydrogens from enzy_htp.mutation import assign_mutant, mutate_stru from enzy_htp.geometry import equi_md_sampling from enzy_htp.quantum import single_point from enzy_htp.analysis import bond_dipole, ele_field_strength_at_along, ele_stab_energy_of_bond from enzy_htp import interface import enzy_htp.structure.structure_constraint as stru_cons from enzy_htp.structure import PDBParser from enzy_htp.chemical.level_of_theory import QMLevelOfTheory from enzy_htp.core.clusters.accre import Accre This import EnzyHTP functions/classes to this main script. We won't change anything here in our quick-start. ------------------------ :column: col-lg-12 col-md-12 col-sm-12 col-xs-12 p-2 text-left .. code:: python # workflow config # I/O path wt_pdb_path = "KE_07_R7_2_S.pdb" result_path = "ke_test_result.pickle" ligand_chrg_spin_mapper = {"H5J" : (0,1)} # define the charge spin for ligands and modified AAs This part configures input and output file path of the workflow. .. dropdown:: :fa:`eye,mr-1` Click to see code explanation ``wt_pdb_path`` the WT pdb path. This structure needs to be a structure that contains **no missing parts (except for the hydrogens), no wrong parts, and no redundant parts (except water).** (There will be less requirements for this input in the next version of EnzyHTP with the docking module) ``result_path`` the path of the result file in .pickle format (this file will be generated in this path after the workflow) ``ligand_chrg_spin_mapper`` we ask users to assign the charge and spin for ligands and modified amino acids in the protein. they are necessary for the model. .. note:: **Where to modify? (Examples)** - change path of the PDB file - assign a different charge spin to model a radical cation .. code:: ligand_chrg_spin_mapper = {"H5J" : (1,2)} ------------------------ :column: col-lg-12 col-md-12 col-sm-12 col-xs-12 p-2 text-left .. code:: python # HPC job resources md_hpc_job_config = { "cluster" : Accre(), "res_keywords" : { "account" : "your_account", "partition" : "your_partition" } } qm_hpc_job_config = { "cluster" : Accre(), "res_keywords" : { "account" : "your_account", "partition" : "your_partition", 'walltime' : '1-00:00:00', } } result_dict = {} This part configures the resource on the HPC you want to use. .. dropdown:: :fa:`eye,mr-1` Click to see code explanation There will be MD simulation and QM calculation in this workflow. We configure the resource and the HPC environment for them here. ``md_hpc_job_config`` ``qm_hpc_job_config`` the resource for MD and QM See `here `_ for more details about the config format ``result_dict`` a place holder for collect the results of the workflow. no need to change this. .. note:: **Where to modify? (Examples)** - Support your local cluster by changing ``"cluster" : Name_of_your_cluster()`` (`Check the Tutorial of supporting your local cluster. `_) - For Accre user, use a real account by changing ``'account':'your_real_account_name'`` - Change the number of cores and memory .. code:: python qm_hpc_job_config = { "cluster" : Accre(), "res_keywords" : { "account" : "your_account", "partition" : "your_partition", "walltime" : "1-00:00:00", "node_cores" : "24", "mem_per_core" : "3G", } } 1.2 Workflow Body ------------------------------------ This following parts assemble EnzyHTP functions to a workflow, loops through mutants, and calculate their properties .. panels:: :column: col-lg-12 col-md-12 col-sm-12 col-xs-12 p-2 text-left .. code:: python # 1. create Structure() wt_stru = PDBParser().get_structure(wt_pdb_path) # 2. prepare remove_hydrogens(wt_stru, polypeptide_only=True) protonate_stru(wt_stru, protonate_ligand=False) This 1st & 2nd part create the ``Structure()`` and prepares the enzyme. We will not modify this part in this quick start. .. dropdown:: :fa:`eye,mr-1` Click to see code explanation ``wt_stru`` | In EnzyHTP, ``Structure`` serve as a center class that describes the structure of a protein. Most functions in EnzyHTP are operations of Structure (i.e.: change a Residue() to mutate or add Atom() to fixs missing hydrogens, etc.) | A ``Structure()`` object is created by reading from a PDB file here using ``PDBParser``. See also other ways to obtain a ``Structure()`` `here `_ ``remove_hydrogens`` this function remove hydrogens from the WT structure. See more details `here `_ ``protonate_stru`` this function protonate the structure with the correct titration state. ------------------------ :column: col-lg-12 col-md-12 col-sm-12 col-xs-12 p-2 text-left .. code:: python # 3. create mutant library mutant_pattern = "WT, r:2[resi 254 around 4 and not resi 101: all not self]*2" mutants = assign_mutant(wt_stru, mutant_pattern) for i, mut in enumerate(mutants): mutant_result = [] mutant_dir = f"mutant_{i}" # 4. mutate Structure() mutant_stru = mutate_stru(wt_stru, mut, engine="pymol") mutant_stru.assign_ncaa_chargespin(ligand_chrg_spin_mapper) remove_hydrogens(mutant_stru, polypeptide_only=True) protonate_stru(mutant_stru, protonate_ligand=False) This 3rd & 4th part create a target mutant library. .. dropdown:: :fa:`eye,mr-1` Click to see code explanation ``assign_mutant`` this powerful function allows you to create mutant library in a flexible manner. You can apply mutation strageties such as site-saturation mutagenesis, mutating residues from large to small, random mutation, etc. by writing different mutant_pattern. (See syntax `here `_) After creating the library, we use a for loop in Python to study each mutant in the library in each loop. For each loop, a ``mutant_result`` is initiated as a place holder for the result for this mutant. The ``mutant_dir`` is determined as the sub-directory that contains MD files for this mutant. For each mutant, ``mutate_stru`` this function mutate the Struture() and create the corresponding mutant Structure() (See details `here `_) We then do another round of protonation to consider a perturbed protonation state of the protein mutant. .. note:: **Where to modify? (Examples)** - Study a specific mutant by changing the ``mutant_pattern`` (See syntax `here `_) .. code:: python mutant_pattern = "R154W" ------------------------ :column: col-lg-12 col-md-12 col-sm-12 col-xs-12 p-2 text-left .. code:: python # 5. sampling with MD param_method = interface.amber.build_md_parameterizer( ncaa_param_lib_path=f"ncaa_lib", force_fields=[ "leaprc.protein.ff14SB", "leaprc.gaff2", "leaprc.water.tip3p", ], ) mut_constraints = [ stru_cons.create_distance_constraint("B.254.H2", "A.101.OE2", 2.4, mutant_stru), stru_cons.create_angle_constraint("B.254.CAE", "B.254.H2", "A.101.OE2", 180.0, mutant_stru), ] md_result = equi_md_sampling( stru = mutant_stru, param_method = param_method, cluster_job_config = md_hpc_job_config, job_check_period=10, prod_constrain=mut_constraints, prod_time= 0.1, #ns work_dir=f"{mutant_dir}/MD/" ) for replica_esm in md_result: replica_result = [] This 5th part sample a geometrical ensemble for the enzyme. (still in the loop) .. dropdown:: :fa:`eye,mr-1` Click to see code explanation ``param_method`` | this defines the force field and the MD engine you want to use. | Here we used Amber with ff14SB, TIP3P, and GAFF2 ``mut_constraints`` | this defines the constraint added in the MD. | here we add distance and angle constraint between the 2 atoms of the forming bond to create a TS-analog ``equi_md_sampling`` | this function runs the MD simulation. It will submit a job to the HPC. | ``prod_time`` specifies the length of the MD simulation. We use 0.1 ns here for demo purpose The result of the MD simulation are replica trajectories from the production run. (by default 3 replica) We loop through it to calculate mutant properties for each replica. .. note:: **Where to modify? (Examples)** - change ``prod_time = 100.0`` to change the length of the MD - change the force field by changing the names in ``force_field`` .. code:: python param_method = interface.amber.build_md_parameterizer( ncaa_param_lib_path=f"ncaa_lib", force_fields=[ "leaprc.protein.ff19SB", "leaprc.gaff2", "leaprc.water.tip4p", ], ) - apply customized ligand parameters by adding files under the ncaa_param_lib_path See `section 2.1 <#configure-the-working-dir>`_ for example - apply a different constraint by changing the atom key and the constraint value .. code:: python mut_constraints = [ stru_cons.create_distance_constraint("B.254.H2", "A.101.OE2", 1.8, mutant_stru), ] # format: chain_id.residue_idx.atom_name - run only 1 replica of the MD by default 3 replicas are run. You can set it to only run 1 by set ``parallel_runs = 1`` in ``equi_md_sampling`` ------------------------ :column: col-lg-12 col-md-12 col-sm-12 col-xs-12 p-2 text-left .. code:: python # 6. electronic structure qm_results = single_point( stru=replica_esm, engine="gaussian", method=QMLevelOfTheory( basis_set="3-21G", method="hf" ), regions=["resi 101+254"], cluster_job_config=qm_hpc_job_config, job_check_period=60, job_array_size=20, work_dir=f"{mutant_dir}/QM_SPE/", ) This 6th part calculate wavefunction for active site of the enzyme using QM cluster. (still in the loop) .. dropdown:: :fa:`eye,mr-1` Click to see code explanation ``replica_result`` this is the place holder of each replica ``single_point`` this function calculates wavefunction for each frame in a trajectory (as a ``StructureEnsemble()``) using QM. (See `here `_ for more details) .. note:: **Where to modify? (Examples)** - change QM region by changing ``regions = ['resi 123+456+789']`` (this follows PyMol selection syntax) - change QM level of theory by changing ``method=QMLevelOfTheory( basis_set="def2-svp", method="b3lyp-d3" )``. - You can also remove this whole section if you don't want to do QM. ------------------------ :column: col-lg-12 col-md-12 col-sm-12 col-xs-12 p-2 text-left .. code:: python # 7. analysis for ele_stru in qm_results: this_frame_stru = ele_stru.geometry.topology atom_1 = this_frame_stru.get("B.254.CAE") atom_2 = this_frame_stru.get("B.254.H2") # bond dipole dipole = bond_dipole( ele_stru, atom_1, atom_2, work_dir=f"{mutant_dir}/bond_dipole/" ) # EF field_strength = ele_field_strength_at_along( this_frame_stru, atom_1, atom_2, region_pattern="chain A and (not resi 101)" ) # dGele dg_ele = ele_stab_energy_of_bond(dipole[0], field_strength) replica_result.append((dg_ele, dipole, field_strength)) mutant_result.append(replica_result) result_dict[tuple(mut)] = mutant_result This 7th part calculate properties for mutants. (still in the loop) .. dropdown:: :fa:`eye,mr-1` Click to see code explanation With the wavefunctions generated by QM and trajectories generated by MM, we calculate the reacting bond dipole moment (``bond_dipole``), enzyme's internal electric field strength (``ele_field_strength_at_along``), and electrostatic stablization energy (``ele_stab_energy_of_bond``). Most of the code are self-explaining in this part. I will explain for: ``region_pattern`` This defines the region that EnzyHTP use to calculate the electric field strength. .. note:: **Where to modify? (Examples)** - keep only functions that calculate the properties your want. - calcualte EF and dipole for a different bond by change ``atom_1`` and ``atom_2``. - add functions to calculate other properties like .. code:: python # MMPBSA # TODO update this ligand_mask = ":902" mmpbsa_result_dict = pdb_obj.get_mmpbsa_binding( ligand_mask, cluster=Accre(), res_setting = {'account':'yang_lab'}) ------------------------ :column: col-lg-12 col-md-12 col-sm-12 col-xs-12 p-2 text-left .. code:: python # save the result with open(result_path, "wb") as of: pickle.dump(result_dict, of) This last part save our results for each mutant to the output file. 2. Running the Workflow ============================================== Now we finished customizing the workflow. It is the time for launching it. 2.1 Configure the working dir ------------------------------------ Here is what your working directory should look like before the launching: .. code:: bash . ├── template_main.py ├── template_hpc_submission.sh ├── your_target_wt_enzyme.pdb └── ncaa_lib # (optional) add this when you customize ligand parameters ├── XYZ_AM1BCC-GAFF2.frcmod # XYZ is the ligand 3-letter code └── XYZ_AM1BCC-GAFF2.prepin # the "AM1BCC-GAFF2" is used to identify the method (upper case) (``template/template_wk_dir`` give an example pdb file) ``template_hpc_submission.sh`` is the job submission script for our workflow main script (``template_main.py``). This main script runs only requires 1 CPU and 6GB memory. It will submit computationally intensive jobs during the runtime to other computing nodes. (e.g.: MD and QM) The walltime for the main script should cover the maximum time span of your workflow. .. dropdown:: :fa:`eye,mr-1` **Do this** if you are NOT in Vanderbilt... You may also need to modify the ``template_hpc_submission.sh`` to match with your local cluster. Here are some instructions: In ``template_hpc_submission.sh``: 1. Change ``line 1-10`` (resource settings) to match your local cluster's scheduler syntax. (checkout the submission script you normally use) 2. Change ``line 12-22`` (environment settings) to match your local environmental setting (e.g.: how you normally load Gaussian, AmberTool, and Multiwfn) .. dropdown:: :fa:`eye,mr-1` **Do this** if you are in Vanderbilt... In ``template_hpc_submission.sh``: 1. Change ``xxx`` in ``line 3`` to a valid value. (e.g.: yang_lab) 2. Change ``EFdesMD`` in ``line 2`` to a customized name for your workflow 3. Change the path of conda in ``line 22`` to match your own paths ``ncaa_lib`` This is the folder defined in ``interface.amber.build_md_parameterizer`` above. It is designed to reuse parameter files for the same ligand to be reused to save time during the workflow. It also **allows user to supply custom parameter files** for ligands. .. dropdown:: :fa:`eye,mr-1` Click to see how to supply custom parameter files for ligands You can do it by place files following this name scheme: .. code:: XYZ_AAA-BBB.frcmod - XYZ is the 3-letter name of the ligand/modified AA - AAA is the name of the atomic charge model in uppercase - BBB is the name of the force field in uppercase allowed AAA-BBB combinations can be found in ``enzy_htp/_interface/ncaa_library.py::PARM_METHOD_LIST`` 2.2 Submit! ------------------------------------ Submit the main script under this working directory. Here is an example command for submission on ACCRE @Vanderbilt: .. code:: bash sbatch template_hpc_submission.sh **Now wait for results and enjoy the power of automation of EnzyHTP!** Author: QZ Shao