Manipulate macromolecular coordinate data using data frames.
Macromolecular Data Frames
Macromol Dataframe is a library meant to help with processing macromolecular
coordinate data, e.g. mmCIF files downloaded from the Protein Data Bank (PDB).
The key idea behind this library is that the best way to work with such data is
by using data frames, specifically polars.DataFrame. The advantages of
this approach are:
- Flexibility: Data frames are general-purpose data processing tools, and are more than capable of accommodating any kind of analysis.
- Performance: Data frames are meant for processing huge quantities of data, and are accordingly well-optimized. Polars in particular achieves very good performance by using techniques such as execution planning, SIMD instructions, and multi-threading.
- Familiarity: Data scientists work with data frames all the time, so using them here lowers the learning curve and makes this library easy to get started with. There's not much to learn!
Here's an example showing how to load a specific biological assembly from an mmCIF file:
>>> import macromol_dataframe as mmdf
>>> df = mmdf.read_biological_assembly('6uad.cif.gz', model_id='1', assembly_id='1')
>>> df.select('seq_id', 'comp_id', 'atom_id', 'x', 'y', 'z')
shape: (2_312, 6)
┌────────┬─────────┬─────────┬───────────┬──────────┬──────────┐
│ seq_id ┆ comp_id ┆ atom_id ┆ x         ┆ y        ┆ z        │
│ ---    ┆ ---     ┆ ---     ┆ ---       ┆ ---      ┆ ---      │
│ i64    ┆ str     ┆ str     ┆ f64       ┆ f64      ┆ f64      │
╞════════╪═════════╪═════════╪═══════════╪══════════╪══════════╡
│ 2      ┆ ASN     ┆ N       ┆ -9.89268  ┆ 25.4788  ┆ 9.32073  │
│ 2      ┆ ASN     ┆ CA      ┆ -11.30656 ┆ 25.42029 ┆ 8.91019  │
│ 2      ┆ ASN     ┆ C       ┆ -12.19303 ┆ 26.2788  ┆ 9.79681  │
│ 2      ┆ ASN     ┆ O       ┆ -12.48258 ┆ 25.8771  ┆ 10.91766 │
│ 2      ┆ ASN     ┆ CB      ┆ -11.82931 ┆ 23.99427 ┆ 8.9393   │
│ …      ┆ …       ┆ …       ┆ …         ┆ …        ┆ …        │
│ null   ┆ HOH     ┆ O       ┆ -41.101   ┆ 23.389   ┆ 7.03     │
│ null   ┆ HOH     ┆ O       ┆ -4.60757  ┆ 22.48844 ┆ 9.93407  │
│ null   ┆ HOH     ┆ O       ┆ -22.48104 ┆ 27.68223 ┆ -4.26327 │
│ null   ┆ HOH     ┆ O       ┆ -38.8232  ┆ 17.99957 ┆ 9.24767  │
│ null   ┆ HOH     ┆ O       ┆ -40.22527 ┆ 15.63538 ┆ 7.88049  │
└────────┴─────────┴─────────┴───────────┴──────────┴──────────┘