==============================================
PDBParser
==============================================

Briefs
==============================================
``PDBParser`` facilitates the conversion between PDB files and ``Structure`` objects. It primarily offers two functions: ``get_structure()`` for converting a PDB file into a ``Structure`` object, and ``save_structure()`` for performing the inverse operation.

Input/Output
==============================================

.. panels::

    :column: col-lg-12 col-md-12 col-sm-12 col-xs-12 p-2 text-left

    .. image:: ../../figures/PDBParser_io.svg
        :width: 100%
        :alt: PDBParser_io

PDBParser.get_structure()
==============================================

Input
------------------------------------------------

``path``
    Specifies the file path of the PDB file.

    .. admonition:: How to obtain

        PDB files can be downloaded from the `Protein Data Bank <https://www.rcsb.org/>`_ or acquired from experimental results.
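
To fetch the file from a script instead, you can use the RCSB download service, which serves PDB-format files at ``https://files.rcsb.org/download/<ID>.pdb``. The sketch below uses only the Python standard library. The helpers ``rcsb_pdb_url`` and ``download_pdb`` are hypothetical and are not part of EnzyHTP. Very large entries are distributed only in mmCIF format, so for those no ``.pdb`` file is available.

.. code:: python

    import urllib.request
    from pathlib import Path


    def rcsb_pdb_url(pdb_id: str) -> str:
        """Build the RCSB download URL for a 4-character PDB ID (hypothetical helper)."""
        pdb_id = pdb_id.strip().upper()
        if len(pdb_id) != 4 or not pdb_id.isalnum():
            raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
        return f"https://files.rcsb.org/download/{pdb_id}.pdb"


    def download_pdb(pdb_id: str, out_dir: str = ".") -> str:
        """Download <id>.pdb into out_dir and return the local path (hypothetical helper)."""
        url = rcsb_pdb_url(pdb_id)  # validates the ID before touching the network
        out_path = Path(out_dir) / f"{pdb_id.strip().lower()}.pdb"
        urllib.request.urlretrieve(url, out_path)
        return str(out_path)

The returned path can then be passed as ``path`` to ``PDBParser.get_structure()``, e.g. ``download_pdb("8K4Z")`` saves ``8k4z.pdb`` in the current directory.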

Output
------------------------------------------------

``Structure()``
    Returns the constructed ``Structure`` object.

    .. tip::

        ``PDBParser`` automatically fixes missing chain identifiers. This matters because some PDB files lack chain IDs; for example, polymer PDB files generated by AMBER often do.
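
To check in advance whether a file is affected, look at column 22 of its ``ATOM``/``HETATM`` records, which holds the chain identifier in the PDB format. The standard-library sketch below lists the residues whose chain ID is blank. ``residues_missing_chain_id`` is a hypothetical helper, and the sketch does not reproduce how ``PDBParser`` assigns the new chain IDs.

.. code:: python

    from typing import List, Tuple


    def residues_missing_chain_id(pdb_text: str) -> List[Tuple[str, int]]:
        """Return (residue name, residue number) for ATOM/HETATM records with a blank chain ID."""
        missing: List[Tuple[str, int]] = []
        seen = set()
        for line in pdb_text.splitlines():
            if not line.startswith(("ATOM  ", "HETATM")):
                continue
            line = line.ljust(80)  # pad short lines so the fixed columns exist
            if line[21].strip():  # column 22: chain identifier is present
                continue
            # columns 18-20: residue name, columns 23-26: residue sequence number
            key = (line[17:20].strip(), int(line[22:26]))
            if key not in seen:  # report each residue once, not once per atom
                seen.add(key)
                missing.append(key)
        return missing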

Arguments
------------------------------------------------

.. dropdown:: :fa:`eye,mr-1` Click to see more details

    ``path``
        Path to the PDB file.
        (See `PDBParser.get_structure() <#pdbparser-get-structure>`_ section)

    ``model``
        | Index of the model to select if the file contains multiple models. The index starts from 0 (default: 0). The "MODEL" records are assumed to appear in order in the file.
        | To check whether a file contains multiple models, look for "MODEL {number}" and "ENDMDL" tags in the PDB file, or open it in a protein visualization tool such as `PyMOL <https://pymol.org>`_ or `UCSF Chimera <https://www.cgl.ucsf.edu/chimera/>`_.

    ``add_solvent_list``
        | Additional residue names to be categorized as solvent. For example, ``add_solvent_list=["DMS", "ACT"]``.
        | Within non-polypeptide chains, a residue is recognized as solvent if its name is in:
        | ``chem.RD_SOLVENT_LIST + add_solvent_list``
        | ``RD_SOLVENT_LIST`` includes common water aliases such as HOH and WAT (``RD_SOLVENT_LIST: List[str] = ["HOH", "WAT"]``).

    ``add_ligand_list``
        | Additional residue names to be categorized as ligands. Its only effect is to remove these names from ``RD_NON_LIGAND_LIST``.
        | For example, ``add_ligand_list=["EDO", "FAD", "NAD"]``.
        | Within non-polypeptide chains, a residue is identified as a ligand if its name matches neither of the following lists:

            | 1. ``chem.RD_SOLVENT_LIST + add_solvent_list``
            | ``RD_SOLVENT_LIST`` includes common water aliases such as HOH and WAT (``RD_SOLVENT_LIST: List[str] = ["HOH", "WAT"]``).
            | 2. ``chem.RD_NON_LIGAND_LIST - add_ligand_list``
            | ``RD_NON_LIGAND_LIST`` includes common co-crystallized small molecules and ions that come from the solvent, such as CL (chloride ion), EDO (1,2-ethanediol), GOL (glycerol), and EOH (ethanol) (``RD_NON_LIGAND_LIST: List[str] = ["CL", "EDO", "GOL", "EOH"]``).
            | Note: the solvent list has higher priority, so a name that appears in both lists is treated as solvent.

    ``remove_trash``
        | Option to remove trash ligands, i.e., residues whose names are in ``chem.RD_NON_LIGAND_LIST - add_ligand_list``. It can be set to either ``remove_trash=True`` or ``remove_trash=False``; the default value is ``True``.

    ``give_idx_map``
        | Option to return a tuple of ``(Structure, idx_change_mapper)`` instead of only the ``Structure``. It can be set to either ``give_idx_map=True`` or ``give_idx_map=False``; the default value is ``False``.
        | The mapping is a dictionary from each original residue to its new identifiers: ``{(old_chain_id, old_residue_id): (new_chain_id, new_residue_id), ...}``

    ``allow_multichain_in_atom``
        | Used for resolving chain IDs. It can be set to either ``allow_multichain_in_atom=True`` or ``allow_multichain_in_atom=False``; the default value is ``False``.
        | When set to ``True``, a single chain consisting of ATOM records is allowed to contain multiple chain IDs. Although this conflicts with the standard PDB file format definition, it is useful for resolving the chain IDs of multi-chain PDB files exported by PyMOL2.
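
The residue categorization controlled by ``add_solvent_list``, ``add_ligand_list`` and ``remove_trash`` can be restated as a small decision function. The sketch below applies that rule to residues in non-polypeptide chains, using the default lists quoted above. It is not EnzyHTP's implementation; ``classify_residue`` is a hypothetical name, and the hard-coded lists are assumptions that may differ from ``chem.RD_SOLVENT_LIST`` and ``chem.RD_NON_LIGAND_LIST`` in your installed version.

.. code:: python

    from typing import Optional, Sequence

    # Default lists as quoted on this page (assumed; the real values live in the chem module).
    RD_SOLVENT_LIST = ["HOH", "WAT"]
    RD_NON_LIGAND_LIST = ["CL", "EDO", "GOL", "EOH"]


    def classify_residue(resname: str,
                         add_solvent_list: Optional[Sequence[str]] = None,
                         add_ligand_list: Optional[Sequence[str]] = None) -> str:
        """Return 'solvent', 'trash' or 'ligand' for a residue in a non-polypeptide chain."""
        solvents = set(RD_SOLVENT_LIST) | set(add_solvent_list or [])
        trash = set(RD_NON_LIGAND_LIST) - set(add_ligand_list or [])
        if resname in solvents:  # checked first: the solvent list has higher priority
            return "solvent"
        if resname in trash:  # removed from the Structure when remove_trash=True
            return "trash"
        return "ligand"  # any other name is kept as a ligand

With the defaults, ``classify_residue("CL")`` gives ``"trash"``, while passing ``add_solvent_list=["CL"]`` or ``add_ligand_list=["CL"]`` gives ``"solvent"`` or ``"ligand"`` respectively, which is the idea behind the example that follows.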

Example Code
------------------------------------------------

- Generate a ``Structure`` object from a simple PDB file

In this example, we use ``PDBParser`` to process a single-chain, single-model PDB file and import it as a ``Structure`` object.

.. admonition:: How input is prepared
    
    For ``PDBParser.get_structure()``, prepare the following:

    ``path``
        The file path of the PDB file. In this example, the PDB file was downloaded from the Protein Data Bank with the ID "8K4Z" and is saved as ``"./8k4z.pdb"``.
        (See `PDBParser.get_structure() <#pdbparser-get-structure>`_ section)

    ``add_solvent_list``
        This PDB file contains chloride ions (named "CL" in the PDB file). Because "CL" is in ``RD_NON_LIGAND_LIST``, the chloride ions are discarded as trash by default (``remove_trash=True``). To keep them, you can categorize "CL" as a solvent.

    ``add_ligand_list``
        Alternatively, you can keep the chloride ions by categorizing "CL" as a ligand.

.. code:: python

    from enzy_htp.structure import PDBParser
    
    test_A = "./8k4z.pdb"

    # Option 1: categorize CL as a solvent, so it is not removed as trash.
    test_A_struc1 = PDBParser.get_structure(path=test_A,
                                            add_solvent_list=["CL"],
                                            remove_trash=True)

    # Option 2: categorize CL as a ligand, so it is not removed as trash.
    test_A_struc2 = PDBParser.get_structure(path=test_A,
                                            add_ligand_list=["CL"],
                                            remove_trash=True)
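
Before choosing ``model``, ``add_solvent_list`` or ``add_ligand_list``, it can help to see how many models a file contains and which hetero residues it holds (for example, to spot the chloride ions mentioned above). The standard-library sketch below scans ``MODEL`` and ``HETATM`` records. ``summarize_pdb`` is a hypothetical helper, not an EnzyHTP function.

.. code:: python

    from collections import Counter
    from typing import Dict, Tuple


    def summarize_pdb(path: str) -> Tuple[int, Dict[str, int]]:
        """Return (number of models, {HETATM residue name: number of residues}) for a PDB file."""
        n_models = 0
        het_residues = set()
        with open(path) as fh:
            for line in fh:
                line = line.rstrip("\n").ljust(80)  # pad so fixed columns always exist
                if line.startswith("MODEL"):
                    n_models += 1
                elif line.startswith("HETATM"):
                    # Columns 18-20: residue name, 22: chain ID, 23-26: residue number.
                    # The set keeps one entry per residue even if it recurs in several models.
                    het_residues.add((line[17:20].strip(), line[21], line[22:26].strip()))
        counts = Counter(name for name, _chain, _resseq in het_residues)
        # A file without MODEL records holds a single model.
        return max(n_models, 1), dict(counts)

For instance, ``summarize_pdb("./8k4z.pdb")`` returns the model count together with a dictionary of hetero residue names, from which you can decide which names to pass to ``add_solvent_list`` or ``add_ligand_list``.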
    

PDBParser.save_structure()
==============================================


Input
------------------------------------------------

``outfile``
    Output file path, given as a string, where the ``Structure()`` object will be saved.

    .. admonition:: How to obtain

        Define the save path as a string, e.g., ``outfile='./save_pro.pdb'``.

``stru``
    The ``Structure()`` object to be saved.

    .. admonition:: How to obtain

        (See `PDBParser.get_structure() <#pdbparser-get-structure>`_ section)


Output
------------------------------------------------

``str()``
    Path where the ``Structure()`` was saved, returned as a string.

Arguments
------------------------------------------------

.. dropdown:: :fa:`eye,mr-1` Click to see more details

    ``outfile``
        Output file path, given as a string, where the ``Structure()`` object will be saved.
        (See `PDBParser.save_structure() <#pdbparser-save-structure>`_ section)
    
    ``stru``
        The ``Structure()`` object to be saved.
        (See `PDBParser.save_structure() <#pdbparser-save-structure>`_ section)
    
    ``if_renumber``
        Determines whether to renumber atoms starting from 1. It can be set to either ``if_renumber=True`` or ``if_renumber=False``; the default value is ``True``.
    
    ``if_fix_atomname``
        Determines whether atom names are adjusted to conform to PDB conventions. It can be set to either ``if_fix_atomname=True`` or ``if_fix_atomname=False``; the default value is ``True``, which automatically converts atom names to the standard PDB format.
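
The two options above concern fixed-column fields of the PDB format: atom serial numbers occupy columns 7-11, and atom names occupy columns 13-16, where a name whose element symbol is a single letter conventionally starts in column 14 (e.g. ``" CA "`` for an alpha carbon) and a two-letter element or a four-character name starts in column 13. The standard-library sketch below illustrates these two conventions only. The function names are hypothetical, and this is not how EnzyHTP implements ``if_renumber`` or ``if_fix_atomname``.

.. code:: python

    from typing import List


    def renumber_atoms(pdb_lines: List[str]) -> List[str]:
        """Rewrite ATOM/HETATM serial numbers (columns 7-11) as 1, 2, 3, ... (no >99999 handling)."""
        out: List[str] = []
        serial = 0
        for line in pdb_lines:
            if line.startswith(("ATOM  ", "HETATM")):
                serial += 1
                line = f"{line[:6]}{serial:>5}{line[11:]}"
            out.append(line)  # other records (TER, END, ...) are copied unchanged here
        return out


    def pad_atom_name(name: str, element: str) -> str:
        """Place an atom name in the 4-character atom-name field (columns 13-16)."""
        if len(name) > 4:
            raise ValueError(f"atom name longer than 4 characters: {name!r}")
        if len(name) == 4 or len(element.strip()) == 2:
            return name.ljust(4)  # starts in column 13, e.g. "HD11" or "FE  "
        return f" {name}".ljust(4)  # one-letter element: starts in column 14, e.g. " CA "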
    
    
Example Code
------------------------------------------------

- Save a ``Structure`` object to a PDB file

In this example, we use ``PDBParser`` to export a ``Structure`` object as a new PDB file.

.. admonition:: How input is prepared

    For ``PDBParser.save_structure()``, prepare the following:

    ``outfile``
        The path, given as a string, where the ``Structure`` object will be saved. In this example, we save the structure as ``"./2v7m_new.pdb"``.

    ``stru``
        The ``Structure()`` object obtained from ``PDBParser.get_structure()``.
        (See `PDBParser.get_structure() <#pdbparser-get-structure>`_ section)

.. code:: python

    from enzy_htp.structure import PDBParser
    
    test_A = "./2v7m.pdb"
    test_A_struc = PDBParser.get_structure(path=test_A)
    test_A_saved_path = PDBParser.save_structure(outfile="./2v7m_new.pdb",
                                                 stru=test_A_struc)
    print(test_A_saved_path)  # ./2v7m_new.pdb

=========================================================================================

Author: Xingyu Ouyang <ouyangxingyu913@gmail.com>