RCSB Python I/O Utility Classes
Introduction
This module contains a collection of utility classes for performing I/O operations on common file formats encountered in the PDB data repository.
Installation
Download the library source software from the project repository:
git clone --recurse-submodules https://github.com/rcsb/py-rcsb_utils_io.git
Optionally, run the test suite (Python versions 2.7 and 3.9) using setuptools or tox:
python setup.py test
or simply run
tox
Install the package from PyPI with pip:
pip install rcsb.utils.io
or from the local repository:
pip install .
Usage
The MarshalUtil class offers an easy way to read and write files in various formats, including CSV, JSON, pickle, mmCIF, bcif (BinaryCIF), fasta, and "list" files (plain text files in which each row is a list item).
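To make the "list" file format concrete, here is a minimal sketch of the layout using only the Python standard library. It does not use MarshalUtil, and the helper names `writeListFile` and `readListFile` are hypothetical, introduced only for illustration; skipping blank rows is likewise an assumption of this sketch.

```python
import os
import tempfile


def writeListFile(filePath, itemList):
    """Write one item per row (hypothetical helper, not part of rcsb.utils.io)."""
    with open(filePath, "w", encoding="utf-8") as ofh:
        for item in itemList:
            ofh.write("%s\n" % item)


def readListFile(filePath):
    """Read the rows back as a list of strings, skipping blank rows (an assumption)."""
    with open(filePath, "r", encoding="utf-8") as ifh:
        return [line.rstrip("\n") for line in ifh if line.strip()]


# Round-trip a short list of identifiers through a temporary file
with tempfile.TemporaryDirectory() as tmpDir:
    listPath = os.path.join(tmpDir, "ids.list")
    writeListFile(listPath, ["4HHB", "1CBS", "2HYY"])
    idList = readListFile(listPath)
```

Each row of the file maps to exactly one element of the resulting Python list, which is the shape described above.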
Reading files
Let's say you have a JSON file, "data.json". You can read it in with:
from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil(workDir=".")
dataD = mU.doImport("data.json", fmt="json")
The same method works even if the file is compressed (e.g., "data.json.gz"):
dataD = mU.doImport("data.json.gz", fmt="json")
Note that this automatic handling of gzip-compressed files applies to any input format.
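The idea of transparent gzip handling can be sketched with the standard library alone. This is not the library's implementation: the helpers `openMaybeGzip` and `readJson` are hypothetical, and choosing the opener from the ".gz" file extension is an assumption made for this illustration.

```python
import gzip
import json
import os
import tempfile


def openMaybeGzip(filePath, mode="rt"):
    """Open with gzip when the path ends in .gz, otherwise as a plain text file.

    Hypothetical helper; extension-based detection is an assumption of this sketch.
    """
    if filePath.endswith(".gz"):
        return gzip.open(filePath, mode, encoding="utf-8")
    return open(filePath, mode, encoding="utf-8")


def readJson(filePath):
    """Parse JSON from a plain or gzip-compressed file with the same call."""
    with openMaybeGzip(filePath) as ifh:
        return json.load(ifh)


# Write the same payload uncompressed and compressed, then read both back
with tempfile.TemporaryDirectory() as tmpDir:
    plainPath = os.path.join(tmpDir, "data.json")
    gzPath = os.path.join(tmpDir, "data.json.gz")
    payload = {"entry": "EXAMPLE", "chains": ["A", "B"]}
    with open(plainPath, "w", encoding="utf-8") as ofh:
        json.dump(payload, ofh)
    with gzip.open(gzPath, "wt", encoding="utf-8") as ofh:
        json.dump(payload, ofh)
    plainD = readJson(plainPath)
    gzD = readJson(gzPath)
```

Because the parser only ever sees a text stream, the same pattern extends to other formats without changing the parsing code.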
You can also import remote files directly by passing a URL in place of a local path, e.g.:
dataD = mU.doImport("https://files.rcsb.org/pub/pdb/holdings/current_file_holdings.json.gz", fmt="json")
To read in a pickle file, "data.pic":
from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil()
dataD = mU.doImport("data.pic", fmt="pickle")
To read in and parse an mmCIF file, "4hhb.cif.gz":
from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil()
# Read all data containers from the mmCIF file into `dataContainerList`
dataContainerList = mU.doImport("https://files.rcsb.org/pub/pdb/data/structures/divided/mmCIF/hh/4hhb.cif.gz", fmt="mmcif")
# Get the first dataContainer (in most cases, there will only be one container in the file)
dataContainer = dataContainerList[0]
# Print the name of the container
eName = dataContainer.getName()
print(eName)
# Get the list of categories
catNameList = dataContainer.getObjNameList()
print(catNameList)
# Iterate over all the categories and attributes and store them in a new dictionary
cifDataD = {}
for catName in catNameList:
    if not dataContainer.exists(catName):
        continue
    dObj = dataContainer.getObj(catName)
    for ii in range(dObj.getRowCount()):
        dD = dObj.getRowAttributeDict(ii)
        cifDataD.setdefault(eName, {}).setdefault(catName, []).append(dD)
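The chained `setdefault` call above builds a dictionary nested as container name, then category name, then a list of row dictionaries. The sketch below reproduces that shape without reading an mmCIF file. The container name "EXAMPLE" and all category rows are made-up placeholders standing in for the values that `getName()` and `getRowAttributeDict()` would return; they are not real PDB data.

```python
# Placeholder rows, keyed by category name (made-up values, not real PDB data)
rowsByCategory = {
    "entity": [{"id": "1", "type": "polymer"}, {"id": "2", "type": "polymer"}],
    "cell": [{"length_a": "10.0", "length_b": "20.0"}],
}

eName = "EXAMPLE"  # stands in for dataContainer.getName()
cifDataD = {}
for catName, rowList in rowsByCategory.items():
    for dD in rowList:
        # Create the container and category levels on first use, then append the row
        cifDataD.setdefault(eName, {}).setdefault(catName, []).append(dD)

# Address a single value by container, category, row index, and attribute
firstEntityType = cifDataD["EXAMPLE"]["entity"][0]["type"]
entityRowCount = len(cifDataD["EXAMPLE"]["entity"])
```

Keeping each category as a list of row dictionaries preserves row order, so row `ii` in the category object matches index `ii` in the list.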
For more examples, see testMarshalUtil.py.
Writing files
You can use MarshalUtil to write out the following data structures into the corresponding file formats:
Object | Output `fmt`
-------------------------------------
list | list
dict | json or pickle
DataContainerList | mmcif or bcif
For example, if you have a dictionary, dataD, you can export it via:
from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil()
dataD = {"name": "John Doe", "age": "33"}
mU.doExport("data.json", dataD, fmt="json", indent=2)
# Or, to export and compress as gzip:
mU.doExport("data.json.gz", dataD, fmt="json", indent=2)
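When choosing between json and pickle for a dictionary, it may help to recall how the two serializations differ on round trip. The sketch below uses the standard `json` and `pickle` modules directly rather than MarshalUtil; whether the library's "json" and "pickle" formats behave identically is an assumption, not something stated above.

```python
import json
import pickle

# A dictionary with a tuple value and a non-string key
dataD = {"name": "John Doe", "age": 33, "scores": (1, 2), 7: "seven"}

# JSON round trip: tuples come back as lists and keys come back as strings
jsonD = json.loads(json.dumps(dataD))

# Pickle round trip: Python types are preserved as-is
pickleD = pickle.loads(pickle.dumps(dataD))

jsonKeepsTuple = isinstance(jsonD["scores"], tuple)
pickleKeepsTuple = isinstance(pickleD["scores"], tuple)
```

JSON is the more portable choice for data shared with other tools, while pickle is better suited to Python-only caches where exact types matter. Pickle files should only be loaded from trusted sources.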