Python wrapper around [noodles](https://github.com/zaeleus/noodles).
bionemo-noodles
bionemo-noodles is a Python wrapper around noodles that supports memory-mapped (memmap) file I/O for FASTA files.
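To illustrate what memmap-based FASTA access involves, here is a minimal pure-Python sketch of the general technique. It builds a samtools-style .fai index (record length, byte offset of the first base, bases per line, bytes per line), then memory-maps the file and converts base coordinates into byte offsets. This is not how bionemo-noodles is implemented (the package wraps the Rust noodles library), and the helper names `build_fai`, `fetch`, and `read_region` are hypothetical. It assumes each record uses a fixed line width, as the .fai format requires.

```python
import mmap


def build_fai(fasta_path):
    """Build a samtools-style .fai index in memory.

    Returns {name: (length, offset, line_bases, line_width)}, where offset is the
    byte offset of the record's first base, line_bases is the number of bases per
    full line, and line_width is the number of bytes per full line (newline included).
    """
    index = {}
    name = None
    offset = pos = 0
    length = line_bases = line_width = 0
    with open(fasta_path, "rb") as f:
        for line in f:
            if line.startswith(b">"):
                if name is not None:
                    index[name] = (length, offset, line_bases, line_width)
                # The record ID is the first whitespace-delimited word of the header.
                name = line[1:].split()[0].decode()
                offset = pos + len(line)
                length = line_bases = line_width = 0
            else:
                bases = len(line.rstrip(b"\r\n"))
                if line_bases == 0:
                    # The first sequence line defines the fixed line layout.
                    line_bases, line_width = bases, len(line)
                length += bases
            pos += len(line)
    if name is not None:
        index[name] = (length, offset, line_bases, line_width)
    return index


def fetch(mm, entry, start, end):
    """Return bases [start, end) of one record from a memory-mapped FASTA."""
    length, offset, line_bases, line_width = entry
    start, end = max(0, start), min(end, length)
    if start >= end:
        return ""
    # Base i lives on line i // line_bases at column i % line_bases.
    first = offset + (start // line_bases) * line_width + start % line_bases
    last = offset + ((end - 1) // line_bases) * line_width + (end - 1) % line_bases
    # Slice only the needed bytes, then drop the embedded line breaks.
    raw = mm[first:last + 1]
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()


def read_region(fasta_path, name, start, end):
    """Index the file, memory-map it read-only, and fetch one region."""
    index = build_fai(fasta_path)
    with open(fasta_path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return fetch(mm, index[name], start, end)
```

Once an index exists, each lookup only pages in the part of the file that holds the requested region rather than reading the whole file into memory.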
Installation
To install from PyPI, execute the following command:
pip install bionemo-noodles
Compatibility
bionemo-noodles provides pre-built wheels for CPython 3.10, 3.11, and 3.12 on x86_64 Linux, compatible with manylinux_2_28 (glibc 2.28 or newer).
If you need a build configuration that is not currently published on PyPI, contact bionemofeedback@nvidia.com.
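pip picks a matching wheel automatically. The hypothetical helper below only spells out what the published wheel tags require (CPython 3.10 to 3.12, Linux on x86_64, glibc 2.28 or newer), which may help when diagnosing why no wheel is found on a given machine. It uses only the standard-library `platform` and `sys` modules.

```python
import platform
import sys

# Values taken from the Compatibility section and the published wheel tags.
SUPPORTED_PYTHONS = {(3, 10), (3, 11), (3, 12)}
MIN_GLIBC = (2, 28)


def has_prebuilt_wheel(version=None, system=None, machine=None, libc=None, implementation=None):
    """Return True if the target matches one of the published wheels.

    Every argument defaults to the running interpreter; pass values explicitly
    to check a different target.
    """
    version = tuple(version or sys.version_info[:2])
    system = system or platform.system()
    machine = machine or platform.machine()
    lib, lib_version = libc or platform.libc_ver()
    implementation = implementation or platform.python_implementation()

    # Wheels are tagged cp310/cp311/cp312, i.e. CPython only.
    if implementation != "CPython" or version not in SUPPORTED_PYTHONS:
        return False
    # manylinux_2_28_x86_64 means 64-bit x86 Linux with glibc >= 2.28.
    if system != "Linux" or machine != "x86_64" or lib != "glibc":
        return False
    try:
        glibc = tuple(int(part) for part in lib_version.split(".")[:2])
    except ValueError:
        return False
    return glibc >= MIN_GLIBC
```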
Usage
An example torch.utils.data.Dataset using NvFaidx / bionemo-noodles:
```python
import json
from pathlib import Path

import torch

from bionemo.noodles.nvfaidx import NvFaidx


class SimpleFastaDataset(torch.utils.data.Dataset):
    """A map-style dataset that yields one tokenized FASTA record per index."""

    def __init__(self, fasta_path: Path, tokenizer):
        """Initialize the dataset."""
        super().__init__()
        # NvFaidx gives memmap-backed, dict-like access to FASTA records by sequence ID.
        self.fasta = NvFaidx(fasta_path)
        # Sort the IDs so the index-to-record assignment is deterministic.
        self.seqids = sorted(self.fasta.keys())
        self.tokenizer = tokenizer

    def write_idx_map(self, output_dir: Path):
        """Write the sequence-ID-to-index map to output_dir/seq_idx_map.json."""
        with open(output_dir / "seq_idx_map.json", "w") as f:
            json.dump({seqid: idx for idx, seqid in enumerate(self.seqids)}, f)

    def __len__(self):
        """Get the length of the dataset."""
        return len(self.seqids)

    def __getitem__(self, idx: int) -> dict[str, torch.Tensor]:
        """Get an item from the dataset."""
        # Read the full record and uppercase any soft-masked (lowercase) bases.
        sequence = self.fasta[self.seqids[idx]].sequence().upper()
        tokenized_seq = self.tokenizer.text_to_ids(sequence)
        tokens = torch.tensor(tokenized_seq, dtype=torch.long)
        return {
            "tokens": tokens,
            "position_ids": torch.arange(len(tokenized_seq), dtype=torch.long),
            "seq_idx": torch.tensor(idx, dtype=torch.long),
            # Every token contributes to the loss.
            "loss_mask": torch.ones_like(tokens),
        }
```
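The dataset above only assumes that `tokenizer` exposes a `text_to_ids(str)` method returning a list of integer token IDs. For a quick local smoke test, a stand-in such as the hypothetical `CharTokenizer` below can be passed in. The second helper, `load_idx_map` (also hypothetical), reads back the `seq_idx_map.json` written by `write_idx_map` and inverts it, so a batch's `seq_idx` can be traced to its FASTA record ID.

```python
import json
from pathlib import Path


class CharTokenizer:
    """Hypothetical stand-in tokenizer: one token per character."""

    def text_to_ids(self, text: str) -> list[int]:
        # Each character maps to its ASCII code, e.g. "A" -> 65.
        return list(text.encode("ascii"))


def load_idx_map(output_dir: Path) -> dict[int, str]:
    """Invert seq_idx_map.json into {seq_idx: seqid}."""
    with open(Path(output_dir) / "seq_idx_map.json") as f:
        seqid_to_idx = json.load(f)
    return {idx: seqid for seqid, idx in seqid_to_idx.items()}
```

Because records usually differ in length, PyTorch's default collate function cannot stack the `tokens` tensors of a multi-sequence batch; use `batch_size=1` or supply a padding `collate_fn`.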
BioNeMo Framework Ecosystem Development
To install this sub-package locally (with --editable):
pip install -e .
To run unit tests, execute:
pytest -v .
To build wheels for other combinations of Python version, Linux platform, and system architecture, run the BioNeMo Sub-Package GitHub Actions workflow (bionemo-subpackage-ci.yml).
Built Distributions
File details
Details for the file bionemo_noodles-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: bionemo_noodles-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 278.0 kB
- Tags: CPython 3.12, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b32a6a49c96b1ef409ae726239bcfecadee49de91fdbf681dd8bcd3082de233b |
| MD5 | 8a40719b0fb2fa870c20c3b5c0207e57 |
| BLAKE2b-256 | ae2ad8900358391bf85f77886e81e103d4ab75f1398b4977960a7b81415884a7 |
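The digests in these tables can be used to check a downloaded wheel before installing it. Below is a minimal standard-library sketch (the helper names `file_digest_hex` and `verify_sha256` are hypothetical) that streams the file through `hashlib` and compares the result with the published SHA256 value.

```python
import hashlib


def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sha256(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return file_digest_hex(path, "sha256") == expected_hex.strip().lower()


# Example, assuming the cp312 wheel was downloaded to the current directory:
# verify_sha256(
#     "bionemo_noodles-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl",
#     "b32a6a49c96b1ef409ae726239bcfecadee49de91fdbf681dd8bcd3082de233b",
# )
```

pip can enforce the same check itself in hash-checking mode, by listing the package with a `--hash=sha256:<digest>` option in a requirements file.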
Provenance
The following attestation bundles were made for bionemo_noodles-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl:
Publisher: bionemo-subpackage-ci.yml on NVIDIA/bionemo-framework
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bionemo_noodles-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl
- Subject digest: b32a6a49c96b1ef409ae726239bcfecadee49de91fdbf681dd8bcd3082de233b
- Sigstore transparency entry: 193582297
- Sigstore integration time:
- Permalink: NVIDIA/bionemo-framework@21157c40ea302351c1e27ab50d1d38a854943172
- Branch / Tag: refs/heads/cye/ml-subpackage-ci
- Owner: https://github.com/NVIDIA
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: bionemo-subpackage-ci.yml@21157c40ea302351c1e27ab50d1d38a854943172
- Trigger Event: workflow_dispatch
File details
Details for the file bionemo_noodles-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: bionemo_noodles-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 275.5 kB
- Tags: CPython 3.11, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5460157d45c830edca04a22a17b7b5a66f528e57279d672bba4f0068914c0f98 |
| MD5 | e756aa7bd76d5b97f5ffda5fa87c6cd1 |
| BLAKE2b-256 | c40cff22aeca872d48695c81d752a4b8dd10a1543aacefc04c9473564b4c1806 |
Provenance
The following attestation bundles were made for bionemo_noodles-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl:
Publisher: bionemo-subpackage-ci.yml on NVIDIA/bionemo-framework
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bionemo_noodles-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl
- Subject digest: 5460157d45c830edca04a22a17b7b5a66f528e57279d672bba4f0068914c0f98
- Sigstore transparency entry: 193581991
- Sigstore integration time:
- Permalink: NVIDIA/bionemo-framework@21157c40ea302351c1e27ab50d1d38a854943172
- Branch / Tag: refs/heads/cye/ml-subpackage-ci
- Owner: https://github.com/NVIDIA
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: bionemo-subpackage-ci.yml@21157c40ea302351c1e27ab50d1d38a854943172
- Trigger Event: workflow_dispatch
File details
Details for the file bionemo_noodles-0.1.2-cp310-cp310-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: bionemo_noodles-0.1.2-cp310-cp310-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 275.5 kB
- Tags: CPython 3.10, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f0f8a4c1a314782fff136738dbd0e04d04498e50c26a6ee9b0bfd7236f4141c4 |
| MD5 | d98bb740a66ddf8986907db853fb7278 |
| BLAKE2b-256 | 81f9ee586c4056e48fd74e2d38856d01e9aaa64ce5e4334ea4aed6be0ecc0c68 |
Provenance
The following attestation bundles were made for bionemo_noodles-0.1.2-cp310-cp310-manylinux_2_28_x86_64.whl:
Publisher: bionemo-subpackage-ci.yml on NVIDIA/bionemo-framework
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bionemo_noodles-0.1.2-cp310-cp310-manylinux_2_28_x86_64.whl
- Subject digest: f0f8a4c1a314782fff136738dbd0e04d04498e50c26a6ee9b0bfd7236f4141c4
- Sigstore transparency entry: 193582009
- Sigstore integration time:
- Permalink: NVIDIA/bionemo-framework@21157c40ea302351c1e27ab50d1d38a854943172
- Branch / Tag: refs/heads/cye/ml-subpackage-ci
- Owner: https://github.com/NVIDIA
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: bionemo-subpackage-ci.yml@21157c40ea302351c1e27ab50d1d38a854943172
- Trigger Event: workflow_dispatch