Skip to main content

RCSB Python I/O Utility Classes

Project description

RCSB Python I/O Utility Classes

Build Status

Introduction

This module contains a collection of utility classes for performing I/O operations on common file formats encountered in the PDB data repository.

Installation

Download the library source software from the project repository:

git clone --recurse-submodules https://github.com/rcsb/py-rcsb_utils_io.git

Optionally, run test suite (Python versions 3.9) using tox:

tox

Installation is via the program pip.

pip install rcsb.utils.io

or from the local repository:

pip install .

Usage

The MarshalUtil offers an easy way for reading in and writing out files in various formats, including CSV, JSON, pickle, mmCIF, bcif (BinaryCIF), fasta , and "list" files (plain text file in which each row is a list item).

Reading files

Let's say you have a JSON file, "data.json". You can read this in by:

from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil(workDir=".")

dataD = mU.doImport("data.json", fmt="json")

The same method works even if the file is compressed (e.g., "data.json.gz"):

dataD = mU.doImport("data.json.gz", fmt="json")

Note that this automatic handling of compressed gzip files applies to any type of input format.

You can also import remote files directly from the command line, e.g.:

dataD = mU.doImport("https://files.rcsb.org/pub/pdb/holdings/current_file_holdings.json.gz", fmt="json")

To read in a pickle file, "data.pic":

from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil()

dataD = mU.doImport("data.pic", fmt="pickle")

To read in and parse an mmCIF file, "4hhb.cif.gz":

from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil()

# Read all data containers from the mmCIF file into `dataContainerList`
dataContainerList = mU.doImport("https://files.rcsb.org/pub/pdb/data/structures/divided/mmCIF/hh/4hhb.cif.gz", fmt="mmcif")

# Get the first dataContainer (in most cases, there will only be one container in the file)
dataContainer = dataContainerList[0]

# Print the name of the container
eName = dataContainer.getName()
print(eName)

# Get the list of categories
catNameList = dataContainer.getObjNameList()
print(catNameList)

# Iterate over all the categories and attributes and store them in a new dictionary 
cifDataD = {}
for catName in catNameList:
    if not dataContainer.exists(catName):
        continue
    dObj = dataContainer.getObj(catName)
    for ii in range(dObj.getRowCount()):
        dD = dObj.getRowAttributeDict(ii)
        cifDataD.setdefault(eName, {}).setdefault(catName, []).append(dD)

For more examples, see testMarshallUtil.py.

Writing files

You can use the MarshalUtil to write out the following data structures into the corresponding file formats:

 Object            |  Output `fmt`
-------------------------------------
 list              |  list
 dict              |  json or pickle
 DataContainerList |  mmcif or bcif

For example, if you have a dictionary, dataD, you can export it via:

from rcsb.utils.io.MarshalUtil import MarshalUtil
mU = MarshalUtil()

dataD = {"name": "John Doe", "age": "33"}

mU.doExport("data.json", dataD, fmt="json", indent=2)

# Or, to export and compress as gzip:
mU.doExport("data.json.gz", dataD, fmt="json", indent=2)

Project details


Release history Release notifications | RSS feed

This version

1.53

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rcsb_utils_io-1.53.tar.gz (60.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

rcsb_utils_io-1.53-py3-none-any.whl (50.9 kB view details)

Uploaded Python 3

File details

Details for the file rcsb_utils_io-1.53.tar.gz.

File metadata

  • Download URL: rcsb_utils_io-1.53.tar.gz
  • Upload date:
  • Size: 60.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for rcsb_utils_io-1.53.tar.gz
Algorithm Hash digest
SHA256 c8991f428f6f282ee5642923b069304b4bd095a258bfe62fb1534e5392f085e1
MD5 8766217726be543030701b882f241354
BLAKE2b-256 85124c285c8508263502d9b03f20d6a9a061f7d376c15f06fa026834ae1cec4f

See more details on using hashes here.

File details

Details for the file rcsb_utils_io-1.53-py3-none-any.whl.

File metadata

  • Download URL: rcsb_utils_io-1.53-py3-none-any.whl
  • Upload date:
  • Size: 50.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for rcsb_utils_io-1.53-py3-none-any.whl
Algorithm Hash digest
SHA256 42925abc69c3cc41198d0a9a5cd48f2dfb5e1681df53bb9b603aa41689248fb8
MD5 0f78d3464c7734d694a6817d94a7a1d6
BLAKE2b-256 5dbbeccf43d98d835bd6339b3377ec723a73eaa95d64075d17fe15be70859b74

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page