Manipulate macromolecular coordinate data using data frames.

Macromolecular Data Frames

Macromol Dataframe is a library for processing macromolecular coordinate data, e.g. mmCIF files downloaded from the Protein Data Bank (PDB). The key idea behind it is that such data is best handled with data frames, specifically polars.DataFrame. The advantages of this approach are:

  • Flexibility: Data frames are general-purpose data processing tools, and can accommodate a wide range of analyses.

  • Performance: Data frames are meant for processing huge quantities of data, and are accordingly well-optimized. Polars in particular achieves very good performance by using techniques such as execution planning, SIMD instructions, and multi-threading.

  • Familiarity: Data scientists work with data frames all the time, so using them here lowers the learning curve and makes this library easy to get started with. There's not much to learn!

Here's an example showing how to load a specific biological assembly from an mmCIF file:

>>> import macromol_dataframe as mmdf
>>> df = mmdf.read_biological_assembly('6uad.cif.gz', model_id='1', assembly_id='1')
>>> df.select('seq_id', 'comp_id', 'atom_id', 'x', 'y', 'z')
shape: (2_312, 6)
┌────────┬─────────┬─────────┬───────────┬──────────┬──────────┐
│ seq_id ┆ comp_id ┆ atom_id ┆ x         ┆ y        ┆ z        │
│ ---    ┆ ---     ┆ ---     ┆ ---       ┆ ---      ┆ ---      │
│ i64    ┆ str     ┆ str     ┆ f64       ┆ f64      ┆ f64      │
╞════════╪═════════╪═════════╪═══════════╪══════════╪══════════╡
│ 2      ┆ ASN     ┆ N       ┆ -9.89268  ┆ 25.4788  ┆ 9.32073  │
│ 2      ┆ ASN     ┆ CA      ┆ -11.30656 ┆ 25.42029 ┆ 8.91019  │
│ 2      ┆ ASN     ┆ C       ┆ -12.19303 ┆ 26.2788  ┆ 9.79681  │
│ 2      ┆ ASN     ┆ O       ┆ -12.48258 ┆ 25.8771  ┆ 10.91766 │
│ 2      ┆ ASN     ┆ CB      ┆ -11.82931 ┆ 23.99427 ┆ 8.9393   │
│ …      ┆ …       ┆ …       ┆ …         ┆ …        ┆ …        │
│ null   ┆ HOH     ┆ O       ┆ -41.101   ┆ 23.389   ┆ 7.03     │
│ null   ┆ HOH     ┆ O       ┆ -4.60757  ┆ 22.48844 ┆ 9.93407  │
│ null   ┆ HOH     ┆ O       ┆ -22.48104 ┆ 27.68223 ┆ -4.26327 │
│ null   ┆ HOH     ┆ O       ┆ -38.8232  ┆ 17.99957 ┆ 9.24767  │
│ null   ┆ HOH     ┆ O       ┆ -40.22527 ┆ 15.63538 ┆ 7.88049  │
└────────┴─────────┴─────────┴───────────┴──────────┴──────────┘


Download files

Source Distribution

macromol_dataframe-0.9.0.tar.gz (17.6 kB)

Uploaded Source

Built Distribution

macromol_dataframe-0.9.0-py3-none-any.whl (18.9 kB)

Uploaded Python 3

File details

Details for the file macromol_dataframe-0.9.0.tar.gz.

File metadata

  • Download URL: macromol_dataframe-0.9.0.tar.gz
  • Size: 17.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for macromol_dataframe-0.9.0.tar.gz
Algorithm Hash digest
SHA256 0a24f85d0b4686060e4f4ef06b9c8d1b10c362278372ef7803e931246686828b
MD5 b1e31762415ade4e0a20d40fad74a932
BLAKE2b-256 11eb0652a0a3dc1e629e62706327065d573e16a6366f32aea86752bc6549b089

Provenance

The following attestation bundles were made for macromol_dataframe-0.9.0.tar.gz:

Publisher: release.yml on kalekundert/macromol_dataframe

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file macromol_dataframe-0.9.0-py3-none-any.whl.

File hashes

Hashes for macromol_dataframe-0.9.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6a8984a226ac3acdec9a3e5c2093cbbb859f32a86894136786736251f6ccbe10
MD5 7ae3719ed3212f11a8e683fa621deaa4
BLAKE2b-256 de18017b3d7c2cac642de6927ae885045adc6cb22c48be87a15ad50d3a0d0630

Provenance

The following attestation bundles were made for macromol_dataframe-0.9.0-py3-none-any.whl:

Publisher: release.yml on kalekundert/macromol_dataframe

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
