Python wrapper around [noodles](https://github.com/zaeleus/noodles).

bionemo-noodles

bionemo-noodles is a Python wrapper of noodles that supports memmap-based file I/O for FASTA files.
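
To illustrate the general idea behind memory-mapped, index-based FASTA access, here is a minimal sketch using only the Python standard library. It is not how bionemo-noodles or noodles is implemented. The names `build_index`, `fetch`, and `demo` are hypothetical, and the sketch assumes that every sequence line in a record has the same length except possibly the last (the same assumption a samtools-style .fai index makes).

```python
import mmap
import os
import tempfile


def build_index(path):
    """Scan a FASTA file once and record where each sequence's bases live.

    Returns {name: (length, offset, line_bases, line_width)}, mirroring the
    columns of a samtools-style .fai index.
    """
    index = {}
    name, length, offset, line_bases, line_width = None, 0, 0, 0, 0
    pos = 0  # byte position of the start of the current line
    with open(path, "rb") as handle:
        for raw in handle:
            if raw.startswith(b">"):
                if name is not None:
                    index[name] = (length, offset, line_bases, line_width)
                name = raw[1:].split()[0].decode()
                length, line_bases, line_width = 0, 0, 0
                offset = pos + len(raw)  # first base follows the header line
            else:
                stripped = raw.rstrip(b"\r\n")
                if line_bases == 0:
                    line_bases, line_width = len(stripped), len(raw)
                length += len(stripped)
            pos += len(raw)
    if name is not None:
        index[name] = (length, offset, line_bases, line_width)
    return index


def fetch(mm, index, name, start, end):
    """Return bases [start, end) of `name` (0-based) from a memory-mapped FASTA."""
    length, offset, line_bases, line_width = index[name]
    end = min(end, length)
    if start >= end:
        return ""
    # Translate base coordinates to byte coordinates, skipping line terminators.
    first = offset + (start // line_bases) * line_width + start % line_bases
    last = offset + ((end - 1) // line_bases) * line_width + (end - 1) % line_bases
    chunk = mm[first:last + 1]
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


def demo():
    """Write a toy FASTA file, index it, and read two slices through mmap."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.fa")
        with open(path, "w", newline="\n") as out:
            out.write(">chr1 toy record\nACGTAC\nGTTTGA\nCC\n>chr2\nGGGG\n")
        index = build_index(path)
        with open(path, "rb") as handle:
            mm = mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                return fetch(mm, index, "chr1", 4, 9), fetch(mm, index, "chr2", 0, 4)
            finally:
                mm.close()


print(demo())
```

Because only the requested byte range is touched, a slice of a very large sequence can be read without loading the whole file into memory.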

Installation

To install from PyPI, execute the following command:

pip install bionemo-noodles

Compatibility

bionemo-noodles has pre-built wheels for CPython 3.10, 3.11, and 3.12, targeting manylinux_2_28 (glibc 2.28 or newer) on x86_64.
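
As a rough way to check whether an environment matches the published wheel tags (cp310, cp311, cp312 on manylinux_2_28_x86_64), the following sketch inspects the running interpreter. The function name `has_prebuilt_wheel` is hypothetical and not part of bionemo-noodles, and the glibc 2.28+ requirement is deliberately not checked here; pip remains the authority on whether a wheel is installable.

```python
import platform
import sys

# Python versions with wheels listed for this release.
SUPPORTED_PYTHONS = {(3, 10), (3, 11), (3, 12)}


def has_prebuilt_wheel(version=None, system=None, machine=None, implementation=None):
    """Return True if the described interpreter matches the published wheel tags.

    Any argument left as None is filled in from the running interpreter.
    """
    version = tuple(version or sys.version_info[:2])
    system = system or platform.system()
    machine = machine or platform.machine()
    implementation = implementation or platform.python_implementation()
    return (
        implementation == "CPython"
        and version[:2] in SUPPORTED_PYTHONS
        and system == "Linux"
        and machine == "x86_64"
    )


if __name__ == "__main__":
    print("pre-built wheel expected:", has_prebuilt_wheel())
```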

For a custom build configuration that is not currently supported on PyPI, reach out to: bionemofeedback@nvidia.com

Usage

An example torch.utils.data.Dataset using NvFaidx / bionemo-noodles:

import json
from pathlib import Path

import torch

from bionemo.noodles.nvfaidx import NvFaidx

class SimpleFastaDataset(torch.utils.data.Dataset):

    def __init__(self, fasta_path: Path, tokenizer):
        """Initialize the dataset."""
        super().__init__()
        self.fasta = NvFaidx(fasta_path)
        self.seqids = sorted(self.fasta.keys())
        self.tokenizer = tokenizer

    def write_idx_map(self, output_dir: Path):
        """Write the index map to the output directory."""
        with open(output_dir / "seq_idx_map.json", "w") as f:
            json.dump({seqid: idx for idx, seqid in enumerate(self.seqids)}, f)

    def __len__(self):
        """Get the length of the dataset."""
        return len(self.seqids)

    def __getitem__(self, idx: int) -> dict[str, torch.Tensor]:
        """Get an item from the dataset."""
        sequence = self.fasta[self.seqids[idx]].sequence().upper()
        tokenized_seq = self.tokenizer.text_to_ids(sequence)
        tokens = torch.tensor(tokenized_seq, dtype=torch.long)
        return {
            "tokens": tokens,
            "position_ids": torch.arange(len(tokenized_seq), dtype=torch.long),
            "seq_idx": torch.tensor(idx, dtype=torch.long),
            # Every position contributes to the loss.
            "loss_mask": torch.ones_like(tokens),
        }
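
The dataset above only requires that the tokenizer expose a `text_to_ids` method returning a list of integers. For quick experiments, a stand-in such as the following could be passed in. `ByteTokenizer` is a hypothetical class written for this illustration, not something shipped with bionemo-noodles or BioNeMo, and the FASTA path in the trailing comment is a placeholder.

```python
class ByteTokenizer:
    """Hypothetical stand-in tokenizer: one token id per UTF-8 byte."""

    def text_to_ids(self, text: str) -> list[int]:
        # "ACGT" becomes the byte values of its characters.
        return list(text.encode("utf-8"))

    def ids_to_text(self, ids: list[int]) -> str:
        # Inverse of text_to_ids for ids in the range 0..255.
        return bytes(ids).decode("utf-8")


tokenizer = ByteTokenizer()
ids = tokenizer.text_to_ids("ACGT")
print(ids, tokenizer.ids_to_text(ids))

# With torch and bionemo-noodles installed, the dataset could then be built as:
#   dataset = SimpleFastaDataset(Path("genome.fa"), ByteTokenizer())
#   item = dataset[0]  # dict with "tokens", "position_ids", "seq_idx", "loss_mask"
```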

BioNeMo Framework Ecosystem Development

To install this sub-package locally (with --editable):

pip install -e .

To run unit tests, execute:

pytest -v .

To build wheels for other Python versions, Linux distributions, and system architectures, run the BioNeMo Sub-Package GitHub Actions workflow (bionemo-subpackage-ci.yml).

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

bionemo_noodles-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl (278.0 kB)

CPython 3.12, manylinux: glibc 2.28+ x86-64

bionemo_noodles-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl (275.5 kB)

CPython 3.11, manylinux: glibc 2.28+ x86-64

bionemo_noodles-0.1.2-cp310-cp310-manylinux_2_28_x86_64.whl (275.5 kB)

CPython 3.10, manylinux: glibc 2.28+ x86-64
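
The wheel file names listed here follow the standard dash-separated layout (distribution, version, optional build tag, Python tag, ABI tag, platform tag). The sketch below splits a file name into those parts. `parse_wheel_name` is a hypothetical helper written for illustration; it does no validation beyond counting fields.

```python
def parse_wheel_name(filename):
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_name("bionemo_noodles-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl")
print(info["python"], info["platform"])
```

Here `cp312` identifies CPython 3.12 and `manylinux_2_28_x86_64` identifies Linux with glibc 2.28 or newer on x86-64.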

File details

Details for the file bionemo_noodles-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for bionemo_noodles-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 b32a6a49c96b1ef409ae726239bcfecadee49de91fdbf681dd8bcd3082de233b
MD5 8a40719b0fb2fa870c20c3b5c0207e57
BLAKE2b-256 ae2ad8900358391bf85f77886e81e103d4ab75f1398b4977960a7b81415884a7
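
A downloaded wheel can be checked against a published SHA256 digest with the standard library. This is a minimal sketch: `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the local file path in the trailing comment is a placeholder for wherever the wheel was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example with the cp312 wheel digest listed for this release:
#   matches_published_hash(
#       "bionemo_noodles-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl",
#       "b32a6a49c96b1ef409ae726239bcfecadee49de91fdbf681dd8bcd3082de233b",
#   )
```

pip can perform the same verification at install time through its hash-checking mode when hashes are pinned in a requirements file.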

Provenance

The following attestation bundles were made for bionemo_noodles-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl:

Publisher: bionemo-subpackage-ci.yml on NVIDIA/bionemo-framework

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file bionemo_noodles-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for bionemo_noodles-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 5460157d45c830edca04a22a17b7b5a66f528e57279d672bba4f0068914c0f98
MD5 e756aa7bd76d5b97f5ffda5fa87c6cd1
BLAKE2b-256 c40cff22aeca872d48695c81d752a4b8dd10a1543aacefc04c9473564b4c1806

Provenance

The following attestation bundles were made for bionemo_noodles-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl:

Publisher: bionemo-subpackage-ci.yml on NVIDIA/bionemo-framework

File details

Details for the file bionemo_noodles-0.1.2-cp310-cp310-manylinux_2_28_x86_64.whl.

File hashes

Hashes for bionemo_noodles-0.1.2-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 f0f8a4c1a314782fff136738dbd0e04d04498e50c26a6ee9b0bfd7236f4141c4
MD5 d98bb740a66ddf8986907db853fb7278
BLAKE2b-256 81f9ee586c4056e48fd74e2d38856d01e9aaa64ce5e4334ea4aed6be0ecc0c68

Provenance

The following attestation bundles were made for bionemo_noodles-0.1.2-cp310-cp310-manylinux_2_28_x86_64.whl:

Publisher: bionemo-subpackage-ci.yml on NVIDIA/bionemo-framework
