Quick Start: 1. Build Workflow

EnzyHTP is a python library which means it doesn’t have any direct executables but instead provides modular functions as building blocks of any customized workflow from users. In other word, user needs to writing a main script calling EnzyHTP functions to furnish their automated workflow.

1. Modify from a Template

As a quick-start, we will modify from a main script template provided under the /template folder of EnzyHTP.

In this tutorial, I will

Explain functions used in the template
Talk about how you can be modify them to match your own need.

This will result in a main script. Running this main script will perform a HTP workflow addressing your research goal.

Note

Creating workflow from a template is a simple but limited way.

See this advanced tutorial to learn how to create workflow from scratch.

Note

The code below are sections from template/template_main.py

1.1 Workflow Configurations

This part of the code configure the workflow with some settings, I will break it down into chunks and explain each of them. You can modify this part of the template to change the configuration of the workflow.

import pickle
from enzy_htp.preparation import protonate_stru, remove_hydrogens
from enzy_htp.mutation import assign_mutant, mutate_stru
from enzy_htp.geometry import equi_md_sampling
from enzy_htp.quantum import single_point
from enzy_htp.analysis import bond_dipole, ele_field_strength_at_along, ele_stab_energy_of_bond
from enzy_htp import interface
import enzy_htp.structure.structure_constraint as stru_cons
from enzy_htp.structure import PDBParser
from enzy_htp.chemical.level_of_theory import QMLevelOfTheory
from enzy_htp.core.clusters.accre import Accre

This import EnzyHTP functions/classes to this main script. We won’t change anything here in our quick-start.

# workflow config
# I/O path
wt_pdb_path = "KE_07_R7_2_S.pdb"
result_path = "ke_test_result.pickle"
ligand_chrg_spin_mapper = {"H5J" : (0,1)} # define the charge spin for ligands and modified AAs

This part configures input and output file path of the workflow.

Note

Where to modify? (Examples)

change path of the PDB file
assign a different charge spin to model a radical cation
ligand_chrg_spin_mapper = {"H5J" : (1,2)}

# HPC job resources
md_hpc_job_config = {
    "cluster" : Accre(),
    "res_keywords" : {
        "account" : "your_account",
        "partition" : "your_partition"
    }
}
qm_hpc_job_config = {
    "cluster" : Accre(),
    "res_keywords" : {
        "account" : "your_account",
        "partition" : "your_partition",
        'walltime' : '1-00:00:00',
    }
}
result_dict = {}

This part configures the resource on the HPC you want to use.

Note

Where to modify? (Examples)

Support your local cluster by changing "cluster" : Name_of_your_cluster() (Check the Tutorial of supporting your local cluster.)
For Accre user, use a real account by changing 'account':'your_real_account_name'

Change the number of cores and memory

qm_hpc_job_config = {
    "cluster" : Accre(),
    "res_keywords" : {
        "account" : "your_account",
        "partition" : "your_partition",
        "walltime" : "1-00:00:00",
        "node_cores" : "24",
        "mem_per_core" : "3G",
    }
}

1.2 Workflow Body

This following parts assemble EnzyHTP functions to a workflow, loops through mutants, and calculate their properties

# 1. create Structure()
wt_stru = PDBParser().get_structure(wt_pdb_path)

# 2. prepare
remove_hydrogens(wt_stru, polypeptide_only=True)
protonate_stru(wt_stru, protonate_ligand=False)

This 1st & 2nd part create the Structure() and prepares the enzyme. We will not modify this part in this quick start.

# 3. create mutant library
mutant_pattern = "WT, r:2[resi 254 around 4 and not resi 101: all not self]*2"
mutants = assign_mutant(wt_stru, mutant_pattern)

for i, mut in enumerate(mutants):
    mutant_result = []
    mutant_dir = f"mutant_{i}"

# 4. mutate Structure()
    mutant_stru = mutate_stru(wt_stru, mut, engine="pymol")
    mutant_stru.assign_ncaa_chargespin(ligand_chrg_spin_mapper)

    remove_hydrogens(mutant_stru, polypeptide_only=True)
    protonate_stru(mutant_stru, protonate_ligand=False)

This 3rd & 4th part create a target mutant library.

Note

Where to modify? (Examples)

Study a specific mutant by changing the mutant_pattern (See syntax here)
mutant_pattern = "R154W"

# 5. sampling with MD
    param_method = interface.amber.build_md_parameterizer(
        ncaa_param_lib_path=f"ncaa_lib",
        force_fields=[
            "leaprc.protein.ff14SB",
            "leaprc.gaff2",
            "leaprc.water.tip3p",
        ],
    )
    mut_constraints = [
        stru_cons.create_distance_constraint("B.254.H2", "A.101.OE2", 2.4, mutant_stru),
        stru_cons.create_angle_constraint("B.254.CAE", "B.254.H2", "A.101.OE2", 180.0, mutant_stru),
    ]

    md_result = equi_md_sampling(
        stru = mutant_stru,
        param_method = param_method,
        cluster_job_config = md_hpc_job_config,
        job_check_period=10,
        prod_constrain=mut_constraints,
        prod_time= 0.1, #ns
        work_dir=f"{mutant_dir}/MD/"
    )

    for replica_esm in md_result:
        replica_result = []

This 5th part sample a geometrical ensemble for the enzyme. (still in the loop)

The result of the MD simulation are replica trajectories from the production run. (by default 3 replica) We loop through it to calculate mutant properties for each replica.

Note

Where to modify? (Examples)

change prod_time = 100.0 to change the length of the MD

change the force field by changing the names in force_field

param_method = interface.amber.build_md_parameterizer(
    ncaa_param_lib_path=f"ncaa_lib",
    force_fields=[
        "leaprc.protein.ff19SB",
        "leaprc.gaff2",
        "leaprc.water.tip4p",
    ],
)

apply customized ligand parameters by adding files under the ncaa_param_lib_path

See section 2.1 for example

apply a different constraint by changing the atom key and the constraint value

mut_constraints = [
    stru_cons.create_distance_constraint("B.254.H2", "A.101.OE2", 1.8, mutant_stru),
]
# format: chain_id.residue_idx.atom_name

run only 1 replica of the MD

by default 3 replicas are run. You can set it to only run 1 by set parallel_runs = 1 in equi_md_sampling

# 6. electronic structure
        qm_results = single_point(
            stru=replica_esm,
            engine="gaussian",
            method=QMLevelOfTheory( basis_set="3-21G", method="hf" ),
            regions=["resi 101+254"],
            cluster_job_config=qm_hpc_job_config,
            job_check_period=60,
            job_array_size=20,
            work_dir=f"{mutant_dir}/QM_SPE/",
        )

This 6th part calculate wavefunction for active site of the enzyme using QM cluster. (still in the loop)

Note

Where to modify? (Examples)

change QM region by changing regions = ['resi 123+456+789'] (this follows PyMol selection syntax)
change QM level of theory by changing method=QMLevelOfTheory( basis_set="def2-svp", method="b3lyp-d3" ).
You can also remove this whole section if you don’t want to do QM.

# 7. analysis
        for ele_stru in qm_results:
            this_frame_stru = ele_stru.geometry.topology
            atom_1 = this_frame_stru.get("B.254.CAE")
            atom_2 = this_frame_stru.get("B.254.H2")

        # bond dipole
            dipole = bond_dipole(
                ele_stru, atom_1, atom_2,
                work_dir=f"{mutant_dir}/bond_dipole/"
            )
        # EF
            field_strength = ele_field_strength_at_along(
                this_frame_stru, atom_1, atom_2, region_pattern="chain A and (not resi 101)"
            )
        # dGele
            dg_ele = ele_stab_energy_of_bond(dipole[0], field_strength)

            replica_result.append((dg_ele, dipole, field_strength))

        mutant_result.append(replica_result)

    result_dict[tuple(mut)] = mutant_result

This 7th part calculate properties for mutants. (still in the loop)

Note

Where to modify? (Examples)

keep only functions that calculate the properties your want.
calcualte EF and dipole for a different bond by change atom_1 and atom_2.

add functions to calculate other properties like

# MMPBSA # TODO update this
ligand_mask = ":902"
mmpbsa_result_dict = pdb_obj.get_mmpbsa_binding(
    ligand_mask,
    cluster=Accre(),
    res_setting = {'account':'yang_lab'})

# save the result
with open(result_path, "wb") as of:
    pickle.dump(result_dict, of)

This last part save our results for each mutant to the output file.

2. Running the Workflow

Now we finished customizing the workflow. It is the time for launching it.

2.1 Configure the working dir

Here is what your working directory should look like before the launching:

.
├── template_main.py
├── template_hpc_submission.sh
├── your_target_wt_enzyme.pdb
└── ncaa_lib # (optional) add this when you customize ligand parameters
    ├── XYZ_AM1BCC-GAFF2.frcmod # XYZ is the ligand 3-letter code
    └── XYZ_AM1BCC-GAFF2.prepin # the "AM1BCC-GAFF2" is used to identify the method (upper case)

(template/template_wk_dir give an example pdb file)

template_hpc_submission.sh

is the job submission script for our workflow main script (template_main.py). This main script runs only requires 1 CPU and 6GB memory. It will submit computationally intensive jobs during the runtime to other computing nodes. (e.g.: MD and QM) The walltime for the main script should cover the maximum time span of your workflow.

ncaa_lib

This is the folder defined in interface.amber.build_md_parameterizer above. It is designed to reuse parameter files for the same ligand to be reused to save time during the workflow. It also allows user to supply custom parameter files for ligands.

2.2 Submit!

Submit the main script under this working directory. Here is an example command for submission on ACCRE @Vanderbilt:

sbatch template_hpc_submission.sh

Now wait for results and enjoy the power of automation of EnzyHTP!

Author: QZ Shao <shaoqz@icloud.com>