A Python package to parse PDBx files into Pandas DataFrames.
pdbx2df
Parse a PDBx file (an mmCIF file, e.g. pdb_id.cif) into a Python dict whose keys are PDBx category names and whose values are the contents of each category. Each category is parsed into a Pandas DataFrame whose columns are the category's attribute names.
Requirements
- Pandas (>=1.0)
Install
pip install pdbx2df
Usage examples
- If you want to read the 3D coordinates for PDB 1vii into a Pandas DataFrame, and you have downloaded the 1vii.cif file to your current working directory ./, you can:
from pdbx2df import read_pdbx
pdbx_file = './1vii.cif'
pdbx = read_pdbx(pdbx_file, category_names=['_atom_site'])
atoms_df = pdbx['_atom_site']
# 'atoms_df' is a Pandas DataFrame containing the '_atom_site' category which has the detailed 3D coordinates for each atom.
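Once the '_atom_site' table is loaded, a common next step is to pull out a numeric coordinate array. The sketch below is only an illustration: it builds a small stand-in DataFrame instead of calling read_pdbx, uses the standard mmCIF attribute names (group_PDB, label_atom_id, Cartn_x, Cartn_y, Cartn_z), and assumes the values may arrive as strings, so it converts them explicitly. The helper name ca_coordinates is hypothetical and not part of pdbx2df.

```python
import pandas as pd

# Stand-in for pdbx['_atom_site']; a real table has many more rows and columns.
# Values are written as strings on the assumption that the parser may not cast them.
atoms_df = pd.DataFrame(
    {
        "group_PDB": ["ATOM", "ATOM", "HETATM"],
        "label_atom_id": ["N", "CA", "O"],
        "Cartn_x": ["1.0", "2.5", "9.0"],
        "Cartn_y": ["0.0", "1.5", "9.0"],
        "Cartn_z": ["-1.0", "0.5", "9.0"],
    }
)


def ca_coordinates(df):
    """Return an (n, 3) float array with the C-alpha coordinates (hypothetical helper)."""
    is_ca = (df["group_PDB"] == "ATOM") & (df["label_atom_id"] == "CA")
    xyz = df.loc[is_ca, ["Cartn_x", "Cartn_y", "Cartn_z"]]
    # Convert each column to numbers; harmless if they are already numeric.
    return xyz.apply(pd.to_numeric).to_numpy(dtype=float)


coords = ca_coordinates(atoms_df)
```

With a real file, passing the DataFrame returned for '_atom_site' in place of the stand-in should work the same way, provided those columns are present.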
- If you want to read the FASTA sequence of 1vii, you can:
from pdbx2df import read_pdbx
pdbx_file = './1vii.cif'
pdbx = read_pdbx(pdbx_file, category_names=['_entity_poly'])
fasta_df = pdbx['_entity_poly']
fasta = fasta_df['pdbx_seq_one_letter_code_can'].to_list()[0] # 1vii only has one sequence
# fasta == 'MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLF'
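To save the sequence string as a FASTA record, something like the following sketch could be used. It does not depend on pdbx2df: to_fasta is a hypothetical helper, the header text is arbitrary, and the whitespace stripping is a precaution in case a multi-line mmCIF text field keeps its line breaks (whether the parser already removes them is not stated here).

```python
def to_fasta(header, sequence, width=60):
    """Format one sequence as a FASTA record with fixed-width lines (hypothetical helper)."""
    # Drop any spaces or line breaks that may survive from a multi-line text field.
    cleaned = "".join(sequence.split())
    # Slice the sequence into chunks of at most `width` residues.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


record = to_fasta("1vii", "MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLF")
```

The resulting string can be written to disk with an ordinary open(path, "w") call.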
- You can read them simultaneously:
from pdbx2df import read_pdbx
pdbx_file = './1vii.cif'
pdbx = read_pdbx(pdbx_file, category_names=['_entity_poly', '_atom_site'])
atoms_df = pdbx['_atom_site']
fasta_df = pdbx['_entity_poly']
Pass a list of category names as category_names; each requested category is returned if it is present in the PDBx file.
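Because a requested category is only returned when the file contains it, code that consumes the result may want to guard against missing keys. The sketch below assumes that an absent category is simply left out of the returned dict; that is an assumption about pdbx2df's behavior, not something confirmed here. It uses a hand-built stand-in dict, and get_category is a hypothetical helper.

```python
import pandas as pd

# Stand-in for the dict returned by read_pdbx; only one category is "present".
pdbx = {"_atom_site": pd.DataFrame({"Cartn_x": ["1.0"]})}


def get_category(parsed, name):
    """Return the DataFrame for `name`, or an empty DataFrame if it is missing."""
    df = parsed.get(name)
    return pd.DataFrame() if df is None else df


atoms_df = get_category(pdbx, "_atom_site")
fasta_df = get_category(pdbx, "_entity_poly")  # absent in the stand-in, so empty
```

Returning an empty DataFrame keeps downstream code free of None checks; raising a KeyError instead would be equally reasonable when a missing category should be treated as an error.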
- You can parse the whole file by using 'all':
from pdbx2df import read_pdbx
pdbx_file = './1vii.cif'
pdbx = read_pdbx(pdbx_file, category_names=['all'])
atoms_df = pdbx['_atom_site']
fasta_df = pdbx['_entity_poly']
# and more
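After parsing the whole file, it can help to see which categories were found and how large each table is. The sketch below relies only on the result being a dict of category names to DataFrames, as described above. It runs on a small stand-in dict, and summarize is a hypothetical helper rather than a pdbx2df function.

```python
import pandas as pd

# Stand-in for the dict returned with category_names=['all'].
pdbx = {
    "_entity_poly": pd.DataFrame({"entity_id": ["1"], "type": ["polypeptide(L)"]}),
    "_atom_site": pd.DataFrame({"Cartn_x": ["1.0", "2.5"]}),
}


def summarize(parsed):
    """Map each category name to its (rows, columns) shape, sorted by name."""
    return {name: df.shape for name, df in sorted(parsed.items())}


summary = summarize(pdbx)
for name, (n_rows, n_cols) in summary.items():
    print(f"{name}: {n_rows} rows x {n_cols} columns")
```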