Quick Start: 3. Parallelize the Workflow
Overview
You may have noticed a problem in the workflow used in Quick Start 1: the mutants in the for loop are modeled serially, that is, each mutant can only start after the previous one finishes. However, those mutants are independent of one another, so they can be modeled in parallel.
Doing so also lets us make full use of the available resources and speed up our research.
This tutorial introduces the shrapnel workflow, in which a mother script creates multiple child EnzyHTP working directories, each containing a main workflow script like the one in quick_start_1. It works just like a shrapnel bullet.
Note
template/template_shrapnel_main.py
template/template_wk_dir_shrapnel/
Create the template Workflow
template/template_wk_dir_shrapnel/template_child_main.py
XXX in line 24 and "YYY" in line 25 are placeholders. The shrapnel script replaces them with the actual values for each child directory.
You can modify this example file to create this main script. See Quick Start 1 for details.
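The placeholder mechanism can be pictured as plain text substitution. The snippet below is only a minimal sketch of that idea, not the shrapnel script's own code: the helper name `fill_placeholders`, the two template lines, and the substituted values are all hypothetical, and the real lines 24 and 25 of the template may look different.

```python
def fill_placeholders(template_text, replacements):
    """Return template_text with every placeholder token replaced by its value."""
    for placeholder, value in replacements.items():
        template_text = template_text.replace(placeholder, value)
    return template_text


# Hypothetical two-line excerpt standing in for lines 24-25 of the child template.
template_text = 'wt_pdb = XXX\nmutants = "YYY"\n'

# Hypothetical per-child values; repr() turns them into valid Python literals.
child_text = fill_placeholders(
    template_text,
    {
        "XXX": repr("KE_07_R7_2_S.pdb"),
        '"YYY"': repr(["mutant_a", "mutant_b"]),
    },
)
print(child_text)
```

Because the substitution is purely textual, the placeholder tokens must stay exactly as written in the template when you adapt the rest of the child script.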
Create the submission script
template/template_wk_dir_shrapnel/template_hpc_submission.sh
You can modify this example script to create this submission script. See here for details.
Use the shrapnel script
You may comment or uncomment the function calls in main() to change its behavior. Here are directions for using each function.
Generate children workdirs
At line 124, in the main() function, generate_sub_wkdirs() generates the child working directories, each of which contains an EnzyHTP main script.
num_group = 5 # the number of groups
child_script="template_child_main.py"
submission_script="template_hpc_submission.sh"
data_rel_path="Mutation.dat"
# == generate sub-directories ==
with open("mutant_list.pickle", "rb") as f:
    mutants = pickle.load(f)
generate_sub_wkdirs("KE_07_R7_2_S.pdb",
                    mutants,
                    child_script,
                    submission_script,
                    num_group)
Configure the function by setting these variables:
- num_group
  The number of child working directories you want to generate. Each child working directory is a normal EnzyHTP working directory that can be submitted individually. The mutants are divided into groups based on this number, and each group is modeled in its own child directory. As a rule of thumb, set this value to the total number of GPUs you plan to use.
- child_script
  The path of the child main script. This script is copied into each child working directory.
- submission_script
  Same as above, but for the submission script.
- data_rel_path
  The path of the output data file in each child working directory.
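To get a feel for how num_group shapes the workload, here is a minimal sketch that divides a mutant list into groups. It assumes a round-robin split; the tutorial does not say which strategy generate_sub_wkdirs() actually uses, and both the helper name `split_into_groups` and the mutant names are hypothetical.

```python
def split_into_groups(items, num_group):
    """Distribute items round-robin so that group sizes differ by at most one."""
    return [items[i::num_group] for i in range(num_group)]


# Placeholder mutant identifiers; real entries follow make_pickle_mutant.py.
mutants = [f"mutant_{i}" for i in range(12)]
groups = split_into_groups(mutants, 5)

for idx, group in enumerate(groups):
    print(f"group_{idx}: {len(group)} mutants")
```

With 12 mutants and 5 groups, no group receives more than 3 mutants, so the slowest child directory bounds the total wall time.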
Set your target mutants
In the template, we load a Python list of mutants from a pickle file. The list is exactly what is shown in template/tool/make_pickle_mutant.py. That file also shows how to save the list as a pickle file. Defining the mutants directly in the shrapnel_main.py script works as well.
Set your target wild-type
Change "KE_07_R7_2_S.pdb" to the path of your wild-type PDB. The requirements are the same as mentioned here.
Leave this function as the only uncommented function in main() and run the script
This will generate a directory named after your wt_pdb that contains all the child directories. Browse into them to get the idea.
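For the mutant list described under "Set your target mutants", the pickle round trip can be sketched as follows. This is an illustration under assumptions rather than a copy of template/tool/make_pickle_mutant.py: the entries are placeholders, and the file is written to a temporary directory so that running the snippet does not overwrite a real mutant_list.pickle.

```python
import os
import pickle
import tempfile

# Placeholder entries; use the mutant format shown in make_pickle_mutant.py.
mutants = ["mutant_a", "mutant_b", "mutant_c"]

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "mutant_list.pickle")

    # Write the list in binary mode, which pickle requires.
    with open(pickle_path, "wb") as f:
        pickle.dump(mutants, f)

    # Read it back the same way the shrapnel script does.
    with open(pickle_path, "rb") as f:
        loaded = pickle.load(f)

print(loaded == mutants)
```

In practice you would write mutant_list.pickle next to the shrapnel script so that the open("mutant_list.pickle", "rb") call in main() finds it.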
Submit children workdirs
At line 124, in the main() function, submit_jobs() submits the generated child directories.
submit_jobs(range(0,5),
            sub_script_rel_path=submission_script)
Set the target
The first positional argument of the function specifies the indices of the child directories you want to submit (e.g., range(0,5) gives [0,1,2,3,4] and will submit group_0 to group_4).
Leave this function as the only uncommented function in main() and run the script
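As a quick check before submitting, you can preview which child directories a given range selects. The helper `group_names` below is hypothetical and only mirrors the group_<index> naming mentioned above; it does not call submit_jobs().

```python
def group_names(indices):
    """Map child-directory indices to the group_<index> names used in this tutorial."""
    return [f"group_{i}" for i in indices]


# All five groups, as in the example call.
print(group_names(range(0, 5)))

# Only the last three groups, e.g. to submit a subset.
print(group_names(range(2, 5)))
```

Note that range excludes its upper bound, so range(0,5) stops at group_4.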
Other functions
This script also contains other functions that let you assign partitions to a subset of the child working directories, check which groups are completed, and gather the output. However, they are currently limited to ACCRE at Vanderbilt. You can look into the function definitions and modify them for your own case.
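If you need a scheduler-independent starting point for the completion check, one option is to look for the output data file in each child directory. The sketch below is an assumption, not the script's ACCRE-specific logic: the helper `find_finished_groups` is hypothetical, and it treats a non-empty data file as "finished", which does not prove that every mutant in the group completed.

```python
import tempfile
from pathlib import Path


def find_finished_groups(parent_dir, num_group, data_rel_path="Mutation.dat"):
    """Return indices of group_<i> directories that contain a non-empty data file."""
    finished = []
    for i in range(num_group):
        data_file = Path(parent_dir) / f"group_{i}" / data_rel_path
        if data_file.is_file() and data_file.stat().st_size > 0:
            finished.append(i)
    return finished


# Demo on a throwaway directory tree in which only group_1 has produced output.
with tempfile.TemporaryDirectory() as tmp_dir:
    for i in range(3):
        (Path(tmp_dir) / f"group_{i}").mkdir()
    (Path(tmp_dir) / "group_1" / "Mutation.dat").write_text("placeholder output\n")
    finished = find_finished_groups(tmp_dir, 3)

print(finished)
```

The default file name matches the data_rel_path value used earlier; adjust it if your child script writes its output elsewhere.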