Manipulate macromolecular coordinate data using data frames.
Macromolecular Data Frames
Macromol Dataframe is a library meant to help with processing macromolecular
coordinate data, e.g. mmCIF files downloaded from the Protein Data Bank (PDB).
The key idea behind this library is that the best way to work with such data is
by using data frames, specifically polars.DataFrame. The advantages of
this approach are:
- Flexibility: Data frames are general-purpose data processing tools, and are more than capable of accommodating any kind of analysis.
- Performance: Data frames are meant for processing huge quantities of data, and are accordingly well-optimized. Polars in particular achieves very good performance by using techniques such as execution planning, SIMD instructions, and multi-threading.
- Familiarity: Data scientists work with data frames all the time, so using them here lowers the learning curve and makes this library easy to get started with. There's not much to learn!
Here's an example showing how to load a specific biological assembly from an mmCIF file:
>>> import macromol_dataframe as mmdf
>>> df = mmdf.read_biological_assembly('6uad.cif.gz', model_id='1', assembly_id='1')
>>> df.select('seq_id', 'comp_id', 'atom_id', 'x', 'y', 'z')
shape: (2_312, 6)
┌────────┬─────────┬─────────┬───────────┬──────────┬──────────┐
│ seq_id ┆ comp_id ┆ atom_id ┆ x         ┆ y        ┆ z        │
│ ---    ┆ ---     ┆ ---     ┆ ---       ┆ ---      ┆ ---      │
│ i64    ┆ str     ┆ str     ┆ f64       ┆ f64      ┆ f64      │
╞════════╪═════════╪═════════╪═══════════╪══════════╪══════════╡
│ 2      ┆ ASN     ┆ N       ┆ -9.89268  ┆ 25.4788  ┆ 9.32073  │
│ 2      ┆ ASN     ┆ CA      ┆ -11.30656 ┆ 25.42029 ┆ 8.91019  │
│ 2      ┆ ASN     ┆ C       ┆ -12.19303 ┆ 26.2788  ┆ 9.79681  │
│ 2      ┆ ASN     ┆ O       ┆ -12.48258 ┆ 25.8771  ┆ 10.91766 │
│ 2      ┆ ASN     ┆ CB      ┆ -11.82931 ┆ 23.99427 ┆ 8.9393   │
│ …      ┆ …       ┆ …       ┆ …         ┆ …        ┆ …        │
│ null   ┆ HOH     ┆ O       ┆ -41.101   ┆ 23.389   ┆ 7.03     │
│ null   ┆ HOH     ┆ O       ┆ -4.60757  ┆ 22.48844 ┆ 9.93407  │
│ null   ┆ HOH     ┆ O       ┆ -22.48104 ┆ 27.68223 ┆ -4.26327 │
│ null   ┆ HOH     ┆ O       ┆ -38.8232  ┆ 17.99957 ┆ 9.24767  │
│ null   ┆ HOH     ┆ O       ┆ -40.22527 ┆ 15.63538 ┆ 7.88049  │
└────────┴─────────┴─────────┴───────────┴──────────┴──────────┘
File details
Details for the file macromol_dataframe-0.9.0.tar.gz.
File metadata
- Download URL: macromol_dataframe-0.9.0.tar.gz
- Upload date:
- Size: 17.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0a24f85d0b4686060e4f4ef06b9c8d1b10c362278372ef7803e931246686828b |
| MD5 | b1e31762415ade4e0a20d40fad74a932 |
| BLAKE2b-256 | 11eb0652a0a3dc1e629e62706327065d573e16a6366f32aea86752bc6549b089 |
Provenance
The following attestation bundles were made for macromol_dataframe-0.9.0.tar.gz:
Publisher: release.yml on kalekundert/macromol_dataframe
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: macromol_dataframe-0.9.0.tar.gz
- Subject digest: 0a24f85d0b4686060e4f4ef06b9c8d1b10c362278372ef7803e931246686828b
- Sigstore transparency entry: 192542004
- Sigstore integration time:
- Permalink: kalekundert/macromol_dataframe@63a4fa6d25e6eec455855a6a1bb28c57b642e702
- Branch / Tag: refs/heads/master
- Owner: https://github.com/kalekundert
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@63a4fa6d25e6eec455855a6a1bb28c57b642e702
- Trigger Event: workflow_dispatch
File details
Details for the file macromol_dataframe-0.9.0-py3-none-any.whl.
File metadata
- Download URL: macromol_dataframe-0.9.0-py3-none-any.whl
- Upload date:
- Size: 18.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6a8984a226ac3acdec9a3e5c2093cbbb859f32a86894136786736251f6ccbe10 |
| MD5 | 7ae3719ed3212f11a8e683fa621deaa4 |
| BLAKE2b-256 | de18017b3d7c2cac642de6927ae885045adc6cb22c48be87a15ad50d3a0d0630 |
Provenance
The following attestation bundles were made for macromol_dataframe-0.9.0-py3-none-any.whl:
Publisher: release.yml on kalekundert/macromol_dataframe
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: macromol_dataframe-0.9.0-py3-none-any.whl
- Subject digest: 6a8984a226ac3acdec9a3e5c2093cbbb859f32a86894136786736251f6ccbe10
- Sigstore transparency entry: 192542008
- Sigstore integration time:
- Permalink: kalekundert/macromol_dataframe@63a4fa6d25e6eec455855a6a1bb28c57b642e702
- Branch / Tag: refs/heads/master
- Owner: https://github.com/kalekundert
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@63a4fa6d25e6eec455855a6a1bb28c57b642e702
- Trigger Event: workflow_dispatch