RCSB Python I/O Utility Classes

Introduction

This module contains a collection of utility classes for performing I/O operations on common file formats encountered in the PDB data repository.

Installation

Download the library source software from the project repository:

git clone --recurse-submodules https://github.com/rcsb/py-rcsb_utils_io.git

Optionally, run the test suite (Python versions 2.7 and 3.9) using setuptools or tox:

python setup.py test

or simply run

tox

Install the package from PyPI with pip:

pip install rcsb.utils.io

or from the local repository:

pip install .

Usage

The MarshalUtil class provides a simple way to read and write files in various formats, including CSV, JSON, pickle, mmCIF, bcif (BinaryCIF), fasta, and "list" files (plain-text files in which each row is a list item).
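To make the "list" file layout concrete, here is a minimal standard-library sketch of a one-item-per-line text file being written and read back. The helper names `writeListFile` and `readListFile` are hypothetical and are not part of this package; how MarshalUtil itself treats whitespace or blank lines is not covered here and may differ.

```python
import os
import tempfile


def writeListFile(filePath, itemList):
    """Write one item per line (hypothetical stand-in for a "list" file)."""
    with open(filePath, "w", encoding="utf-8") as ofh:
        for item in itemList:
            ofh.write("%s\n" % item)


def readListFile(filePath):
    """Read one item per line, dropping line endings and skipping blank lines."""
    with open(filePath, "r", encoding="utf-8") as ifh:
        return [line.rstrip("\r\n") for line in ifh if line.strip()]


# Illustrative identifiers only
idList = ["4HHB", "1CBS", "2HHB"]
listPath = os.path.join(tempfile.mkdtemp(), "ids.list")

writeListFile(listPath, idList)
roundTripList = readListFile(listPath)
print(roundTripList)
```

A file in this shape is what the text above calls a "list" file: each row becomes one element of a Python list.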

Reading files

Suppose you have a JSON file, "data.json". You can read it as follows:

from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil(workDir=".")

dataD = mU.doImport("data.json", fmt="json")

The same method works even if the file is compressed (e.g., "data.json.gz"):

dataD = mU.doImport("data.json.gz", fmt="json")

Note that this automatic handling of gzip-compressed files applies to every input format.
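For readers curious about what transparent gzip handling involves, the following is a rough standard-library sketch that picks an opener from the file extension. The function `readJson` is a hypothetical illustration, not the package's implementation, and it assumes that compression can be inferred from a ".gz" suffix.

```python
import gzip
import json
import os
import tempfile


def readJson(filePath):
    """Load JSON from a plain or gzip-compressed file, chosen by file extension."""
    opener = gzip.open if filePath.endswith(".gz") else open
    with opener(filePath, "rt", encoding="utf-8") as ifh:
        return json.load(ifh)


tmpDir = tempfile.mkdtemp()
plainPath = os.path.join(tmpDir, "data.json")
gzPath = os.path.join(tmpDir, "data.json.gz")
sampleD = {"name": "John Doe", "age": "33"}

# Write the same content once uncompressed and once gzip-compressed
with open(plainPath, "w", encoding="utf-8") as ofh:
    json.dump(sampleD, ofh)
with gzip.open(gzPath, "wt", encoding="utf-8") as ofh:
    json.dump(sampleD, ofh)

plainD = readJson(plainPath)
gzD = readJson(gzPath)
print(plainD == gzD)
```

Both calls return the same dictionary, which is the behavior the caller sees from doImport() regardless of whether the input is compressed.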

You can also import remote files directly by passing a URL instead of a local path, e.g.:

dataD = mU.doImport("https://files.rcsb.org/pub/pdb/holdings/current_file_holdings.json.gz", fmt="json")

To read in a pickle file, "data.pic":

from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil()

dataD = mU.doImport("data.pic", fmt="pickle")

To read in and parse an mmCIF file, "4hhb.cif.gz":

from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil()

# Read all data containers from the mmCIF file into `dataContainerList`
dataContainerList = mU.doImport("https://files.rcsb.org/pub/pdb/data/structures/divided/mmCIF/hh/4hhb.cif.gz", fmt="mmcif")

# Get the first dataContainer (in most cases, there will only be one container in the file)
dataContainer = dataContainerList[0]

# Print the name of the container
eName = dataContainer.getName()
print(eName)

# Get the list of categories
catNameList = dataContainer.getObjNameList()
print(catNameList)

# Iterate over all the categories and attributes and store them in a new dictionary 
cifDataD = {}
for catName in catNameList:
    if not dataContainer.exists(catName):
        continue
    dObj = dataContainer.getObj(catName)
    for ii in range(dObj.getRowCount()):
        dD = dObj.getRowAttributeDict(ii)
        cifDataD.setdefault(eName, {}).setdefault(catName, []).append(dD)
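The nested `setdefault` call in the loop above builds a dictionary keyed first by container name and then by category name, with a list of row dictionaries at the bottom. The sketch below shows the resulting shape using plain Python dictionaries as stand-ins for the mmCIF rows; the category contents are invented for illustration and are not real values from the 4HHB entry.

```python
# Stand-in rows: each category maps to a list of attribute dictionaries,
# similar in shape to the row dictionaries collected in the loop above.
# The values here are made up for illustration.
fakeRowsByCategory = {
    "entry": [{"id": "4HHB"}],
    "entity": [
        {"id": "1", "type": "polymer"},
        {"id": "2", "type": "polymer"},
    ],
}

entryName = "4HHB"
nestedD = {}
for catName, rowList in fakeRowsByCategory.items():
    for rowD in rowList:
        # Create the intermediate dict and list on first use, then append the row
        nestedD.setdefault(entryName, {}).setdefault(catName, []).append(rowD)

# Access pattern: container name -> category name -> row index -> attribute
firstEntityType = nestedD["4HHB"]["entity"][0]["type"]
print(sorted(nestedD[entryName].keys()))
print(firstEntityType)
```

Because every level is a plain dict or list, a structure of this shape can be passed to a JSON export without further conversion.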

For more examples, see testMarshalUtil.py in the project repository.

Writing files

You can use the MarshalUtil to write out the following data structures into the corresponding file formats:

 Object            |  Output `fmt`
-------------------------------------
 list              |  list
 dict              |  json or pickle
 DataContainerList |  mmcif or bcif

For example, if you have a dictionary, dataD, you can export it via:

from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil()

dataD = {"name": "John Doe", "age": "33"}

mU.doExport("data.json", dataD, fmt="json", indent=2)

# Or, to export and compress as gzip:
mU.doExport("data.json.gz", dataD, fmt="json", indent=2)
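The object-to-format table above can be turned into a small dispatch function when the export format should follow from the data type. The helper `pickExportFormat` and its `preferPickle` parameter are hypothetical and not part of this package; the sketch only covers dictionaries and lists of strings, since a list of data containers is also a Python list and has to be distinguished by the caller.

```python
def pickExportFormat(obj, preferPickle=False):
    """Hypothetical helper: map a Python object to a `fmt` string from the table above."""
    if isinstance(obj, dict):
        return "pickle" if preferPickle else "json"
    if isinstance(obj, list) and all(isinstance(item, str) for item in obj):
        return "list"
    raise TypeError("No export format chosen for type %s" % type(obj).__name__)


jsonFmt = pickExportFormat({"name": "John Doe", "age": "33"})
listFmt = pickExportFormat(["4HHB", "1CBS"])
pickleFmt = pickExportFormat({"name": "John Doe"}, preferPickle=True)
print(jsonFmt, listFmt, pickleFmt)
```

The returned string could then be supplied as the `fmt` argument of doExport(), as in the export example above.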
