Manipulate macromolecular coordinate data using data frames.

Macromolecular Data Frames

Macromol Dataframe is a library for processing macromolecular coordinate data, e.g. mmCIF files downloaded from the Protein Data Bank (PDB). The key idea behind this library is that the best way to work with such data is by using data frames, specifically polars.DataFrame. The advantages of this approach are:

  • Flexibility: Data frames are general-purpose data processing tools that support filtering, grouping, joining, and aggregating, so they can accommodate most kinds of analysis.

  • Performance: Data frames are meant for processing huge quantities of data, and are accordingly well-optimized. Polars in particular achieves very good performance by using techniques such as execution planning, SIMD instructions, and multi-threading.

  • Familiarity: Data scientists work with data frames all the time, so using them here lowers the learning curve and makes this library easy to get started with. There's not much to learn!

Here's an example showing how to load a specific biological assembly from an mmCIF file:

>>> import macromol_dataframe as mmdf
>>> df = mmdf.read_biological_assembly('6uad.cif.gz', model_id='1', assembly_id='1')
>>> df.select('seq_id', 'comp_id', 'atom_id', 'x', 'y', 'z')
shape: (2_312, 6)
┌────────┬─────────┬─────────┬───────────┬──────────┬──────────┐
│ seq_id ┆ comp_id ┆ atom_id ┆ x         ┆ y        ┆ z        │
│ ---    ┆ ---     ┆ ---     ┆ ---       ┆ ---      ┆ ---      │
│ i64    ┆ str     ┆ str     ┆ f64       ┆ f64      ┆ f64      │
╞════════╪═════════╪═════════╪═══════════╪══════════╪══════════╡
│ 2      ┆ ASN     ┆ N       ┆ -9.89268  ┆ 25.4788  ┆ 9.32073  │
│ 2      ┆ ASN     ┆ CA      ┆ -11.30656 ┆ 25.42029 ┆ 8.91019  │
│ 2      ┆ ASN     ┆ C       ┆ -12.19303 ┆ 26.2788  ┆ 9.79681  │
│ 2      ┆ ASN     ┆ O       ┆ -12.48258 ┆ 25.8771  ┆ 10.91766 │
│ 2      ┆ ASN     ┆ CB      ┆ -11.82931 ┆ 23.99427 ┆ 8.9393   │
│ …      ┆ …       ┆ …       ┆ …         ┆ …        ┆ …        │
│ null   ┆ HOH     ┆ O       ┆ -41.101   ┆ 23.389   ┆ 7.03     │
│ null   ┆ HOH     ┆ O       ┆ -4.60757  ┆ 22.48844 ┆ 9.93407  │
│ null   ┆ HOH     ┆ O       ┆ -22.48104 ┆ 27.68223 ┆ -4.26327 │
│ null   ┆ HOH     ┆ O       ┆ -38.8232  ┆ 17.99957 ┆ 9.24767  │
│ null   ┆ HOH     ┆ O       ┆ -40.22527 ┆ 15.63538 ┆ 7.88049  │
└────────┴─────────┴─────────┴───────────┴──────────┴──────────┘

