A Python package to parse PDBx files into Pandas DataFrames.

Project description

pdbx2df

Parse a PDBx file (an mmCIF file such as pdb_id.cif) into a Python dict whose keys are PDBx category names and whose values are the contents of those categories. Each category's content is parsed into a Pandas DataFrame whose columns are the category's attribute names. Conversely, a dict of Pandas DataFrames can be written out in PDBx format: the dict keys become category names, the DataFrame column names become attribute names, and the DataFrame rows become the corresponding records.
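To make the dict-of-DataFrames layout concrete, here is a minimal sketch of how an mmCIF `loop_` block and single key-value items could map onto it. It is not pdbx2df's implementation: `parse_mmcif_text` is a hypothetical name, all values are kept as strings, quoting is approximated with `shlex`, and semicolon-delimited multi-line text fields are not handled.

```python
import shlex

import pandas as pd


def parse_mmcif_text(text):
    """Map each mmCIF category to a DataFrame (simplified sketch).

    Only single-line values are supported; semicolon-delimited
    multi-line text fields are NOT handled. Values stay strings.
    """
    tokens = []
    for line in text.splitlines():
        if line.startswith(("#", "data_")):
            continue  # skip comments and the data block header
        tokens.extend(shlex.split(line))  # approximates '...' and "..." quoting

    categories, singles = {}, {}
    i = 0
    while i < len(tokens):
        if tokens[i] == "loop_":
            i += 1
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i])  # e.g. '_atom_site.Cartn_x'
                i += 1
            values = []
            while (i < len(tokens) and tokens[i] != "loop_"
                   and not tokens[i].startswith("_")):
                values.append(tokens[i])
                i += 1
            width = len(names)
            rows = [values[k:k + width] for k in range(0, len(values), width)]
            category = names[0].split(".", 1)[0]  # '_atom_site'
            columns = [name.split(".", 1)[1] for name in names]  # 'Cartn_x', ...
            categories[category] = pd.DataFrame(rows, columns=columns)
        else:
            # Non-loop item '_category.attribute value': one column of a one-row table.
            category, attribute = tokens[i].split(".", 1)
            singles.setdefault(category, {})[attribute] = tokens[i + 1]
            i += 2
    for category, items in singles.items():
        categories[category] = pd.DataFrame([items])
    return categories
```

Each key of the returned dict is a category name and each value a DataFrame whose columns are that category's attribute names, which is the shape described above.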

Parsing a PDB file (pdb_id.pdb) into a Python dict of Pandas DataFrames is also supported. Currently, only lines starting with 'ATOM', 'HETATM', and 'TER' are read; they are placed in a category named '_atom_site', mirroring the category of the same name in an mmCIF file.
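A PDB file is fixed-width, so each field of an 'ATOM', 'HETATM', or 'TER' line sits at a known column range. The sketch below shows one way such lines could be collected into an '_atom_site' DataFrame. Only the column name 'record_name' also appears in this README's examples; the function name `parse_atom_lines` and every other column name are hypothetical choices, not necessarily what read_pdb produces.

```python
import pandas as pd

# 0-based slices of the 80-column PDB ATOM/HETATM record.
# Column names other than 'record_name' are hypothetical.
FIELDS = {
    "record_name": slice(0, 6),
    "atom_number": slice(6, 11),
    "atom_name": slice(12, 16),
    "alt_loc": slice(16, 17),
    "residue_name": slice(17, 20),
    "chain_id": slice(21, 22),
    "residue_number": slice(22, 26),
    "insertion": slice(26, 27),
    "x": slice(30, 38),
    "y": slice(38, 46),
    "z": slice(46, 54),
    "occupancy": slice(54, 60),
    "b_factor": slice(60, 66),
    "element": slice(76, 78),
    "charge": slice(78, 80),
}
NUMERIC = {"atom_number": int, "residue_number": int,
           "x": float, "y": float, "z": float,
           "occupancy": float, "b_factor": float}


def parse_atom_lines(lines):
    """Collect ATOM/HETATM/TER records into an '_atom_site' DataFrame (sketch)."""
    rows = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM", "TER")):
            continue  # skip HEADER, CONECT, END, ...
        row = {}
        for name, cols in FIELDS.items():
            text = line[cols].strip()
            cast = NUMERIC.get(name)
            # Blank fields (e.g. coordinates on a TER line) become None/NaN.
            row[name] = cast(text) if cast and text else (text or None)
        rows.append(row)
    return {"_atom_site": pd.DataFrame(rows, columns=list(FIELDS))}
```

Because the record name is stripped, a filter such as `df.record_name == 'ATOM'` works directly on the result.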

Requirements

Pandas (>=1.0)

Install

pip install pdbx2df

Usage examples

If you want to read the 3D coordinates for PDB 1vii into a Pandas DataFrame, and you have downloaded the 1vii.cif file to your current working directory ./, you can:

from pdbx2df import read_pdbx
pdbx_file = './1vii.cif'
pdbx = read_pdbx(pdbx_file, category_names=['_atom_site'])
atoms_df = pdbx['_atom_site']
# 'atoms_df' is a Pandas DataFrame containing the '_atom_site' category which has the detailed 3D coordinates for each atom.
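Once you have atoms_df, the coordinates can be selected by attribute name. This sketch assumes the columns carry the standard mmCIF '_atom_site' attribute names (Cartn_x, Cartn_y, Cartn_z, label_atom_id) and that values may come back as text, so `pd.to_numeric` is applied defensively. The small DataFrame is a stand-in, not real 1vii data, and `coordinates` is a hypothetical helper.

```python
import pandas as pd


def coordinates(atoms_df):
    """Return the Cartesian coordinates as a float DataFrame (x, y, z)."""
    xyz = atoms_df[["Cartn_x", "Cartn_y", "Cartn_z"]]
    return xyz.apply(pd.to_numeric)  # harmless if the values are already numeric


# Stand-in for pdbx['_atom_site'] with values stored as strings.
atoms_df = pd.DataFrame({
    "label_atom_id": ["N", "CA", "C"],
    "Cartn_x": ["1.0", "3.0", "5.0"],
    "Cartn_y": ["2.0", "4.0", "6.0"],
    "Cartn_z": ["0.0", "2.0", "4.0"],
})

centroid = coordinates(atoms_df).mean()  # mean position of all atoms
ca_xyz = coordinates(atoms_df[atoms_df["label_atom_id"] == "CA"])  # alpha carbons only
```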

If you want to read the FASTA sequence of 1vii, you can:

from pdbx2df import read_pdbx
pdbx_file = './1vii.cif'
pdbx = read_pdbx(pdbx_file, category_names=['_entity_poly'])
fasta_df = pdbx['_entity_poly']
fasta = fasta_df['pdbx_seq_one_letter_code_can'].to_list()[0]  # 1vii only has one sequence
# fasta == 'MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLF'
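The value read above is only the sequence string. To save it as a FASTA record, a small formatter is enough. This is a sketch: `to_fasta` is a hypothetical helper, the header text is your choice, and whitespace is stripped defensively because mmCIF files often store long sequences as multi-line text (whether pdbx2df already removes those line breaks is not stated here).

```python
def to_fasta(header, sequence, width=60):
    """Format a one-letter sequence as a FASTA record (hypothetical helper)."""
    seq = "".join(sequence.split())  # drop line breaks/spaces from multi-line values
    chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(chunks) + "\n"


print(to_fasta("1VII", "MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLF"), end="")
# >1VII
# MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLF
```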

You can read both categories simultaneously in a single call:

from pdbx2df import read_pdbx
pdbx_file = './1vii.cif'
pdbx = read_pdbx(pdbx_file, category_names=['_entity_poly', '_atom_site'])
atoms_df = pdbx['_atom_site']
fasta_df = pdbx['_entity_poly']

If you pass a list of category names as category_names, you get back every listed category that is present in the PDBx file.
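Since a requested category that is not in the file appears to be simply left out of the result, indexing it directly could raise a KeyError. A defensive pattern, assuming the return value behaves like a plain dict; `split_found` is a hypothetical helper and the dict below is a stand-in for real read_pdbx output:

```python
import pandas as pd


def split_found(pdbx, requested):
    """Separate requested categories into those returned and those missing."""
    found = {name: pdbx[name] for name in requested if name in pdbx}
    missing = [name for name in requested if name not in pdbx]
    return found, missing


# Stand-in for a read_pdbx() result that lacks '_struct_conf'.
pdbx = {"_atom_site": pd.DataFrame(), "_entity_poly": pd.DataFrame()}
found, missing = split_found(pdbx, ["_atom_site", "_struct_conf"])
# missing lists the categories you asked for but did not get back.
```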

You can parse every category in the file by passing ['all']:

from pdbx2df import read_pdbx
pdbx_file = './1vii.cif'
pdbx = read_pdbx(pdbx_file, category_names=['all'])
atoms_df = pdbx['_atom_site']
fasta_df = pdbx['_entity_poly']
# and more

To write categories back to a PDBx file:

from pdbx2df import read_pdbx, write_pdbx
pdbx_file = './1vii.cif'
pdbx = read_pdbx(pdbx_file, category_names=['all'])
keep = ['_atom_site', '_entity_poly']  # suppose we only want to keep the FASTA sequence and 3D coordinates.
pdbx_keep = {k: v for k, v in pdbx.items() if k in keep}
write_pdbx(pdbx_keep, '1vii_save.cif')
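The dict-to-file mapping described at the top (dict key to category, column to attribute, row to record) can be illustrated with a minimal `loop_` writer. This is a simplified sketch, not write_pdbx's actual code: `format_category` and `format_value` are hypothetical names, every category is written as a `loop_` even when it has a single row, and the quoting rules cover only common cases (missing values become '?', the mmCIF marker for an unknown value).

```python
import pandas as pd


def format_value(value):
    """Render one cell as an mmCIF token (simplified quoting rules)."""
    if not isinstance(value, str) and pd.isna(value):
        return "?"  # None/NaN written as mmCIF 'unknown'
    text = str(value)
    if text == "":
        return "?"
    if any(ch.isspace() for ch in text) or text[0] in "_#$'\"[];":
        quote = "'" if "'" not in text else '"'
        return f"{quote}{text}{quote}"
    return text


def format_category(name, df):
    """Render one DataFrame as an mmCIF loop_ block."""
    lines = ["loop_"]
    lines += [f"{name}.{column}" for column in df.columns]  # attribute names
    for row in df.itertuples(index=False, name=None):  # one record per row
        lines.append(" ".join(format_value(value) for value in row))
    lines.append("#")
    return "\n".join(lines) + "\n"


df = pd.DataFrame({"id": [1, 2], "label_atom_id": ["N", "CA"], "Cartn_x": [1.5, None]})
print(format_category("_atom_site", df), end="")
```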

To read the atomic records from the PDB file 1vii.pdb:

from pdbx2df import read_pdb
pdb_file = './1vii.pdb'
pdb = read_pdb(pdb_file, category_names=['_atom_site'])  # We use '_atom_site' here to mirror the mmCIF format
atoms_df = pdb['_atom_site']
# 'atoms_df' is a Pandas DataFrame containing the '_atom_site' category which has the detailed 3D coordinates for each atom.

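Suppose we want to keep only the standard residue atoms ('ATOM' records) in 5u8l.pdb and drop the hetero atoms ('HETATM' records). Note that this filter also drops the 'TER' records: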

from pdbx2df import read_pdb, write_pdb
pdb_file = './5u8l.pdb'
pdb = read_pdb(pdb_file, category_names=['_atom_site'])
df = pdb['_atom_site']
df = df[df.record_name == 'ATOM']
pdb['_atom_site'] = df
write_pdb(pdb, '5u8l_nohetero.pdb')
# The '5u8l_nohetero.pdb' file contains only the protein residues.
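Writing a PDB file means padding every field back into its fixed column range. The sketch below formats a single ATOM/HETATM line following the PDB coordinate-record layout; `format_atom_record` is a hypothetical helper, not write_pdb's implementation. It uses the common convention that atom names shorter than four characters start in column 14, and it does not handle overflow such as serial numbers above 99999.

```python
def format_atom_record(record_name, serial, atom_name, res_name, chain_id,
                       res_seq, x, y, z, occupancy=1.0, b_factor=0.0,
                       element="", alt_loc="", i_code="", charge=""):
    """Return one 80-column ATOM/HETATM line (simplified sketch)."""
    # Names shorter than 4 characters are shifted right by one column.
    name = atom_name if len(atom_name) == 4 else f" {atom_name:<3}"
    return (f"{record_name:<6}{serial:>5} {name:<4}{alt_loc:1}{res_name:>3} "
            f"{chain_id:1}{res_seq:>4}{i_code:1}   "             # columns 1-30
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"  # 31-66
            f"{'':10}{element:>2}{charge:2}")                    # 67-80


line = format_atom_record("ATOM", 1, "N", "MET", "A", 1, 27.34, 24.43, 2.614,
                          1.0, 30.0, "N")
```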

Release history

0.6.7: Sep 22, 2023
0.6.6: Sep 21, 2023
0.6.5: Sep 19, 2023
0.6.4: Sep 15, 2023
0.6.3: Sep 13, 2023
0.6.2: Sep 11, 2023
0.6.1: Sep 6, 2023
0.6.0: Sep 4, 2023
0.5.6: Sep 1, 2023
0.5.5: Aug 31, 2023
0.5.4: Aug 30, 2023
0.5.3: Aug 29, 2023 (this version)
0.5.2: Aug 29, 2023
0.5.1: Aug 27, 2023
0.5.0: Aug 23, 2023
0.4.1: Aug 22, 2023
0.4.0: Aug 21, 2023
0.3.0: May 6, 2023
0.2.3: May 5, 2023
0.2.2: May 5, 2023
0.2.1: Jan 27, 2023
0.1.0: Jan 6, 2023

Download files

Source distribution: pdbx2df-0.5.3.tar.gz (14.5 kB), uploaded Aug 29, 2023

Built distribution: pdbx2df-0.5.3-py3-none-any.whl (15.6 kB, Python 3), uploaded Aug 29, 2023

Hashes for pdbx2df-0.5.3.tar.gz

Algorithm	Hash digest
SHA256	`4d2c952a4daff23e684a41eb2a3c199703be829487f760cc1e9216e3b5c186a4`
MD5	`5e8194cac01db35ebb36b656a0e1cca4`
BLAKE2b-256	`158fc8525c3862a458bc4b7bc7f30ba2e1554fa7b3bd35d07a70ab3fe1bcd2b6`

Hashes for pdbx2df-0.5.3-py3-none-any.whl

Algorithm	Hash digest
SHA256	`f9e5701af157096ada72313edf4b6186a0b671dddfc673df5ba436e512e3e5c7`
MD5	`6f47e6af1ba11afee29437c0aef6bd30`
BLAKE2b-256	`d5ecae92c9abcba83c92c63585d687bbb6249b7e497170043275363b78124674`
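To check a downloaded file against the SHA256 digests listed above, you can hash it with Python's standard `hashlib` module. A minimal sketch; `sha256_of` and `verify` are hypothetical helper names, and the file name in the commented usage line assumes the sdist sits in your current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 digest matches the published one."""
    return sha256_of(path) == expected_hex.lower()


# verify("pdbx2df-0.5.3.tar.gz",
#        "4d2c952a4daff23e684a41eb2a3c199703be829487f760cc1e9216e3b5c186a4")
```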