PDBParser
Briefs
PDBParser facilitates the conversion between PDB files and Structure objects. It primarily offers two functions: get_structure() for converting a PDB file into a Structure object, and save_structure() for performing the inverse operation.
Input/Output
PDBParser.get_structure()
Input
path: Specifies the file path of the PDB file.
How to obtain
PDB files can be downloaded from the Protein Data Bank or acquired from experimental results.
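As a minimal sketch of the download route, the snippet below builds a download URL for a PDB entry and saves the file locally. The URL pattern (https://files.rcsb.org/download/<ID>.pdb) and the helper names rcsb_pdb_url and download_pdb are assumptions made for illustration; they are not part of EnzyHTP, and some entries may not be offered in legacy PDB format.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumed RCSB file-download endpoint (not defined anywhere in EnzyHTP).
RCSB_DOWNLOAD_BASE = "https://files.rcsb.org/download"


def rcsb_pdb_url(pdb_id: str) -> str:
    """Build the download URL for a PDB-format entry from its 4-character ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"expected a 4-character PDB ID, got {pdb_id!r}")
    return f"{RCSB_DOWNLOAD_BASE}/{pdb_id}.pdb"


def download_pdb(pdb_id: str, out_dir: str = ".") -> Path:
    """Download the entry into out_dir (hypothetical helper, needs network access)."""
    out_path = Path(out_dir) / f"{pdb_id.strip().lower()}.pdb"
    urlretrieve(rcsb_pdb_url(pdb_id), out_path)
    return out_path


# Usage (requires network access):
#   pdb_path = download_pdb("8K4Z")   # saves ./8k4z.pdb
```

The returned path can then be passed as the path argument of get_structure().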
Output
Structure(): Returns the constructed Structure object.
Tip
PDBParser fixes missing chain identifiers. For example, polymer PDB files generated by AMBER often lack chain IDs.
Arguments
path: Path to the PDB file. (See PDBParser.get_structure() section)
model: Index of the model to select if the file contains multiple models. The index starts from 0 (default: 0). “MODEL” tags are assumed to appear in order in the file. To determine whether a file contains multiple models, look for “MODEL {number}” and “ENDMDL” tags in the PDB file, or use a protein visualization tool such as PyMOL or UCSF Chimera.
add_solvent_list: List of additional residue names to categorize as solvents. For example, add_solvent_list=["DMS", "ACT"].
Solvents are recognized within non-polypeptide chains by matching residue names against this list:
chem.RD_SOLVENT_LIST + add_solvent_list
RD_SOLVENT_LIST includes common water aliases such as HOH and WAT. (RD_SOLVENT_LIST: List[str] = ["HOH", "WAT"])
add_ligand_list: List of additional residue names to categorize as ligands. This parameter only affects the original RD_NON_LIGAND_LIST. For example, add_ligand_list=["EDO", "FAD", "NAD"].
Ligands are identified within non-polypeptide chains as residues whose names match neither of these lists:
1. chem.RD_SOLVENT_LIST + add_solvent_list
RD_SOLVENT_LIST includes common water aliases such as HOH and WAT. (RD_SOLVENT_LIST: List[str] = ["HOH", "WAT"])
2. chem.RD_NON_LIGAND_LIST - add_ligand_list
RD_NON_LIGAND_LIST includes common co-crystallized ligands found in solvents, including CL (chloride ion), EDO (1,2-ethanediol), GOL (glycerol), and EOH (ethanol). (RD_NON_LIGAND_LIST: List[str] = ["CL", "EDO", "GOL", "EOH"])
*The solvent list has higher priority.
remove_trash: Option to remove trash ligands, defined by chem.RD_NON_LIGAND_LIST - add_ligand_list. Can be set to either remove_trash=True or remove_trash=False; the default value is True.
give_idx_map: Option to return a tuple of (Structure, idx_change_mapper). Can be set to either give_idx_map=True or give_idx_map=False; the default value is False. The mapping is a dictionary: {(old_chain_id, old_residue_id): (new_chain_id, new_residue_id), ...}
allow_multichain_in_atom: Used for resolving chain IDs. Can be set to either allow_multichain_in_atom=True or allow_multichain_in_atom=False; the default value is False. When set to True, it allows multiple chain IDs to appear within the same chain that consists of ATOM records. Although this conflicts with the standard PDB file format definition, it is useful for resolving chain IDs of multi-chain PDB files exported by PyMOL2.
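The categorization rules described for add_solvent_list, add_ligand_list, and remove_trash can be sketched as plain set operations. This is an illustration of the rules as stated above, not EnzyHTP's actual implementation: the function classify_residue and its return labels are hypothetical, the two lists are copied from the values quoted in this section, and the rules are assumed to apply only to residues in non-polypeptide chains.

```python
from typing import List, Optional

# Values as quoted in this section for chem.RD_SOLVENT_LIST and chem.RD_NON_LIGAND_LIST.
RD_SOLVENT_LIST: List[str] = ["HOH", "WAT"]
RD_NON_LIGAND_LIST: List[str] = ["CL", "EDO", "GOL", "EOH"]


def classify_residue(name: str,
                     add_solvent_list: Optional[List[str]] = None,
                     add_ligand_list: Optional[List[str]] = None) -> str:
    """Classify a residue name from a non-polypeptide chain (hypothetical helper)."""
    # Solvent names: RD_SOLVENT_LIST + add_solvent_list
    solvents = set(RD_SOLVENT_LIST) | set(add_solvent_list or [])
    # Trash names: RD_NON_LIGAND_LIST - add_ligand_list
    trash = set(RD_NON_LIGAND_LIST) - set(add_ligand_list or [])
    if name in solvents:  # checked first: the solvent list has higher priority
        return "solvent"
    if name in trash:     # these residues are dropped when remove_trash=True
        return "trash"
    return "ligand"       # anything matching neither list


print(classify_residue("CL"))                           # trash
print(classify_residue("CL", add_solvent_list=["CL"]))  # solvent
print(classify_residue("CL", add_ligand_list=["CL"]))   # ligand
```

Checking the solvent set first reproduces the stated priority: a name that appears in both add_solvent_list and add_ligand_list ends up as a solvent.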
Example Code
Generate a Structure object from a simple PDB file
In this example, we use PDBParser to process a single-chain, single-model PDB file. We aim to import the PDB file as a Structure object.
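To confirm that a file really is single-model before relying on the default model index, one option is to count its MODEL records. The helper below is a small sketch for that check; count_models is a hypothetical name and is not part of EnzyHTP.

```python
from typing import Iterable


def count_models(pdb_lines: Iterable[str]) -> int:
    """Count MODEL records; a file without any MODEL record holds one implicit model."""
    n_model_tags = sum(1 for line in pdb_lines if line[:6].strip() == "MODEL")
    return max(n_model_tags, 1)


# Usage with a file on disk (the path is this example's input file):
#   with open("./8k4z.pdb") as handle:
#       n_models = count_models(handle)
```

A result greater than 1 means the model argument of get_structure() decides which model is loaded.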
How input is prepared
For PDBParser.get_structure(), you need to prepare:
path: The file path of the PDB file. In this example, the PDB file was downloaded from the Protein Data Bank with the ID “8K4Z” and is named "./8k4z.pdb". (See PDBParser.get_structure() section)
add_solvent_list: According to the PDB file, if you want to prevent the chloride ion from being discarded as trash, you can categorize the chloride ion (named “CL” in the PDB file) as a solvent.
add_ligand_list: According to the PDB file, if you want to prevent the chloride ion from being discarded as trash, you can instead categorize the chloride ion (named “CL” in the PDB file) as a ligand.
from enzy_htp.structure import PDBParser

test_A = "./8k4z.pdb"
test_A_struc1 = PDBParser.get_structure(path=test_A,
                                        add_solvent_list=["CL"],  # CL is categorized as a solvent, so it is not treated as trash.
                                        remove_trash=True)
test_A_struc2 = PDBParser.get_structure(path=test_A,
                                        add_ligand_list=["CL"],  # CL is categorized as a ligand, so it is not treated as trash.
                                        remove_trash=True)
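When give_idx_map=True is passed, get_structure() is documented above to return a (Structure, idx_change_mapper) tuple. The sketch below shows one way such a mapping could be used to translate residue selections made against the original file. The dictionary contents, the key types (string chain IDs, integer residue IDs), and the helper translate_residues are assumptions for illustration only; only the dictionary shape comes from the Arguments list.

```python
from typing import Dict, List, Tuple

ResidueKey = Tuple[str, int]  # assumed key type: (chain_id, residue_id)

# Hypothetical mapping for a file whose chain IDs were missing and then assigned.
idx_change_mapper: Dict[ResidueKey, ResidueKey] = {
    ("", 1): ("A", 1),
    ("", 2): ("A", 2),
    ("", 101): ("B", 101),
}


def translate_residues(keys: List[ResidueKey],
                       mapper: Dict[ResidueKey, ResidueKey]) -> List[ResidueKey]:
    """Map old (chain_id, residue_id) keys to new ones; unmapped keys pass through."""
    return [mapper.get(key, key) for key in keys]


selection_old = [("", 2), ("", 101)]
selection_new = translate_residues(selection_old, idx_change_mapper)
print(selection_new)  # [('A', 2), ('B', 101)]
```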
PDBParser.save_structure()
Input
outfile: Path for saving the Structure() object, given as a string.
How to obtain
Define the save path as a string, e.g., outfile='./save_pro.pdb'.
stru: The Structure() object to be saved.
How to obtain
(See PDBParser.get_structure() section)
Output
str(): Path where the Structure() was saved, returned as a string.
Arguments
outfile: Path for saving the Structure() object, given as a string. (See PDBParser.save_structure() section)
stru: The Structure() object to be saved. (See PDBParser.save_structure() section)
if_renumber: Determines whether to renumber atoms starting from 1. Can be set to either if_renumber=True or if_renumber=False; the default value is True.
if_fix_atomname: Determines whether atom names should be adjusted to conform to PDB conventions. Can be set to either if_fix_atomname=True or if_fix_atomname=False. The default value is True, which ensures that atom names are automatically converted to the standard PDB format.
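To make the effect of if_renumber concrete, the sketch below renumbers atom serial numbers in raw PDB text so that they start from 1. It is not EnzyHTP's implementation: renumber_atoms is a hypothetical helper that relies only on the fixed-column PDB layout (atom serial in columns 7-11 of ATOM and HETATM records) and ignores TER and CONECT records.

```python
from typing import Iterable, List


def renumber_atoms(pdb_lines: Iterable[str]) -> List[str]:
    """Rewrite the serial field of ATOM/HETATM records as 1, 2, 3, ... in file order."""
    renumbered = []
    serial = 0
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            serial += 1
            # Columns 7-11 (string indices 6:11) hold the right-justified serial number.
            # Serials above 99999 would not fit in this field and are not handled here.
            line = f"{line[:6]}{serial:>5d}{line[11:]}"
        renumbered.append(line)
    return renumbered


example = [
    "ATOM     17  N   ALA A   1",
    "ATOM     18  CA  ALA A   1",
    "HETATM  950  O   HOH A 201",
    "END",
]
for new_line in renumber_atoms(example):
    print(new_line)
```

Records other than ATOM and HETATM are passed through unchanged, so any CONECT records would still refer to the old serial numbers in this simplified version.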
Example Code
Save a Structure object to a PDB file
In this example, we use PDBParser to export a Structure object as a new PDB file.
How input is prepared
For PDBParser.save_structure(), you need to prepare:
outfile: The path, given as a string, where the Structure object is saved. In this example, we save the structure as "./2v7m_new.pdb".
stru: The Structure() object obtained from PDBParser.get_structure(). (See PDBParser.get_structure() section)
from enzy_htp.structure import PDBParser

test_A = "./2v7m.pdb"
test_A_struc = PDBParser.get_structure(path=test_A)
test_A_saved_path = PDBParser.save_structure(outfile="./2v7m_new.pdb",
                                             stru=test_A_struc)
print(test_A_saved_path)  # ./2v7m_new.pdb
Author: Xingyu Ouyang <ouyangxingyu913@gmail.com>