RCSB Python I/O Utility Classes

Introduction

This module contains a collection of utility classes for performing I/O operations on common file formats encountered in the PDB data repository.

Installation

Download the library source software from the project repository:

git clone --recurse-submodules https://github.com/rcsb/py-rcsb_utils_io.git

Optionally, run the test suite (Python versions 2.7 and 3.9) using setuptools or tox:

python setup.py test

or simply run

tox

Install the package with pip:

pip install rcsb.utils.io

or from the local repository:

pip install .

Usage

The MarshalUtil class offers an easy way to read and write files in various formats, including CSV, JSON, pickle, mmCIF, bcif (BinaryCIF), fasta, and "list" files (plain-text files in which each row is a list item).
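To make the "list" format concrete, the sketch below writes and reads such a file using only the Python 3 standard library. It illustrates the file layout only and is not MarshalUtil's implementation; the helper names `write_list_file` and `read_list_file`, the file name, and the sample items are all hypothetical.

```python
import os
import tempfile


def write_list_file(path, items):
    """Write one item per line (hypothetical helper, standard library only)."""
    with open(path, "w", encoding="utf-8") as fh:
        for item in items:
            fh.write(item + "\n")


def read_list_file(path):
    """Read the file back into a list, one item per non-empty line."""
    with open(path, "r", encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in fh if line.strip()]


# Hypothetical file name inside a temporary directory.
list_path = os.path.join(tempfile.mkdtemp(), "ids.list")
write_list_file(list_path, ["4hhb", "1cbs", "2xyz"])

id_list = read_list_file(list_path)
print(id_list)
```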

Reading files

Let's say you have a JSON file, "data.json". You can read it with:

from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil(workDir=".")

dataD = mU.doImport("data.json", fmt="json")

The same method works even if the file is compressed (e.g., "data.json.gz"):

dataD = mU.doImport("data.json.gz", fmt="json")

Note that this automatic handling of gzip-compressed files applies to every input format.
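For readers curious about what transparent gzip handling involves, here is a minimal standard-library sketch of the idea for JSON. How MarshalUtil detects compression internally is not described here, so the extension check in the hypothetical `read_json` helper is an assumption made for illustration.

```python
import gzip
import json
import os
import tempfile


def read_json(path):
    """Load JSON from a plain or gzip-compressed file, chosen by extension.

    Hypothetical helper; the ".gz" suffix test is an assumption of this sketch.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


# Create one plain and one compressed copy of the same record.
tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "data.json")
gz_path = os.path.join(tmp_dir, "data.json.gz")
record = {"name": "John Doe", "age": "33"}

with open(plain_path, "w", encoding="utf-8") as fh:
    json.dump(record, fh)
with gzip.open(gz_path, "wt", encoding="utf-8") as fh:
    json.dump(record, fh)

# The same call reads both files.
plain_data = read_json(plain_path)
gz_data = read_json(gz_path)
print(plain_data == gz_data)
```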

You can also import remote files directly by passing a URL in place of a local path, e.g.:

dataD = mU.doImport("https://files.rcsb.org/pub/pdb/holdings/current_file_holdings.json.gz", fmt="json")

To read in a pickle file, "data.pic":

from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil()

dataD = mU.doImport("data.pic", fmt="pickle")

To read in and parse an mmCIF file, "4hhb.cif.gz":

from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil()

# Read all data containers from the mmCIF file into `dataContainerList`
dataContainerList = mU.doImport("https://files.rcsb.org/pub/pdb/data/structures/divided/mmCIF/hh/4hhb.cif.gz", fmt="mmcif")

# Get the first dataContainer (in most cases, there will only be one container in the file)
dataContainer = dataContainerList[0]

# Print the name of the container
eName = dataContainer.getName()
print(eName)

# Get the list of categories
catNameList = dataContainer.getObjNameList()
print(catNameList)

# Iterate over all the categories and attributes and store them in a new dictionary
cifDataD = {}
for catName in catNameList:
    if not dataContainer.exists(catName):
        continue
    dObj = dataContainer.getObj(catName)
    for ii in range(dObj.getRowCount()):
        dD = dObj.getRowAttributeDict(ii)
        cifDataD.setdefault(eName, {}).setdefault(catName, []).append(dD)
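The chained `setdefault` calls in the loop above build a nested dictionary keyed by container name, then category name, with a list of row dictionaries at the leaves. The standalone sketch below shows the resulting shape. The rows and the container name "EXAMPLE" are made up for illustration and are not actual values from 4hhb.

```python
# Made-up rows standing in for per-row attribute dictionaries;
# these are NOT real values from any mmCIF file.
rows_by_category = {
    "cell": [{"length_a": "1.0", "length_b": "2.0"}],
    "entity": [{"id": "1", "type": "polymer"}, {"id": "2", "type": "water"}],
}

entry_name = "EXAMPLE"  # hypothetical container name
cif_data = {}
for cat_name, rows in rows_by_category.items():
    for row in rows:
        # Create the container and category levels on first use, then append the row.
        cif_data.setdefault(entry_name, {}).setdefault(cat_name, []).append(row)

# Access pattern: container name -> category name -> list of rows.
entity_types = [row["type"] for row in cif_data[entry_name]["entity"]]
print(entity_types)
```

With this layout, every category maps to a list even when it holds a single row, so downstream code can iterate uniformly.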

For more examples, see testMarshalUtil.py.

Writing files

You can use the MarshalUtil to write out the following data structures into the corresponding file formats:

 Object            |  Output `fmt`
-------------------------------------
 list              |  list
 dict              |  json or pickle
 DataContainerList |  mmcif or bcif

For example, if you have a dictionary, dataD, you can export it via:

from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil()

dataD = {"name": "John Doe", "age": "33"}

mU.doExport("data.json", dataD, fmt="json", indent=2)

# Or, to export and compress as gzip:
mU.doExport("data.json.gz", dataD, fmt="json", indent=2)
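If you want to confirm that a file with a `.gz` name really is gzip-compressed, one option is to check for the two-byte gzip magic number (0x1f 0x8b) at the start of the file. The sketch below uses only the standard library and creates its own test files so that it runs without the package; `is_gzip_file` is a hypothetical helper, not part of MarshalUtil.

```python
import gzip
import json
import os
import tempfile


def is_gzip_file(path):
    """Return True if the file starts with the gzip magic number 0x1f 0x8b."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"\x1f\x8b"


# Write the same dictionary once as plain JSON and once gzip-compressed.
out_dir = tempfile.mkdtemp()
plain_out = os.path.join(out_dir, "data.json")
gz_out = os.path.join(out_dir, "data.json.gz")
data = {"name": "John Doe", "age": "33"}

with open(plain_out, "w", encoding="utf-8") as fh:
    json.dump(data, fh, indent=2)
with gzip.open(gz_out, "wt", encoding="utf-8") as fh:
    json.dump(data, fh, indent=2)

plain_is_gzip = is_gzip_file(plain_out)
gz_is_gzip = is_gzip_file(gz_out)
print(plain_is_gzip, gz_is_gzip)
```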
