Data processing for Dogma

Project description

Dogma Data

Dogma Data is a Python package for fast parsing of FASTA files, designed for high-performance computing workloads. It uses multi-threading to parse in parallel across all available system threads, and it can export the parsed data to the HDF5 file format for storage and later access.
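
To make the parsing step concrete, here is a minimal single-threaded sketch in pure Python of what turning FASTA records into integer tokens can look like. It is not Dogma Data's implementation: the function name `tokenize_fasta`, the lowercasing of residues, and the fallback to a default token for unknown characters are all assumptions made for illustration.

```python
def tokenize_fasta(text, vocab, default_token):
    """Parse FASTA-formatted text into (headers, token_lists).

    Hypothetical helper for illustration only; not part of dogma_data.
    """
    headers, records = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # A header line starts a new record.
            headers.append(line[1:])
            records.append([])
        elif records:
            # Sequence lines may wrap; append to the current record.
            # Assumption: characters missing from vocab map to default_token.
            records[-1].extend(vocab.get(ch, default_token) for ch in line.lower())
    return headers, records


example = ">seq1\nACGT\nNN\n>seq2\nttga\n"
vocab = {'a': 0, 'g': 1, 'c': 2, 't': 3}
headers, records = tokenize_fasta(example, vocab, default_token=vocab['a'])
print(headers)
print(records)
```

A real parser for large files would avoid per-character Python loops and per-record lists; the sketch only shows the mapping from characters to token ids.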

Installation

To install Dogma Data, you can use pip:

pip install dogma-data
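
After installing, one way to confirm which version pip resolved is to query the installed package metadata through the standard library. The helper name `installed_version` below is hypothetical; `importlib.metadata.version` is part of Python's standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the installed version of dogma-data, or None if it is missing.
print(installed_version("dogma-data"))
```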

Usage

import dogma_data

vocab = {
    'a': 0,
    'g': 1,
    'c': 2,
    't': 3,
    # ...
}

mapping = dogma_data.FastaMapping(vocab, vocab['a'])
(tokens, sequences, (taxons)) = dogma_data.parse_fasta('input_path.fa', dogma_data.HeaderType.TaxonId, mapping)

header_info = {"taxons": taxons}

dogma_data.export_hdf5(
    'output_path.h5',
    dogma_data.Splitter(
        train_prop=0.95,
        val_prop=0.025,
        test_prop=0.025,
        length=len(sequences) - 1,
    ),
    tokens,
    sequences,
    header_info,
    mapping
)
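
The `Splitter` arguments above are proportions that sum to 1 (0.95 + 0.025 + 0.025). As a rough illustration of how such proportions translate into item counts, the sketch below computes split sizes for a given length. `split_sizes` is a hypothetical helper, not part of dogma_data, and the package's own rounding, ordering, or shuffling may differ.

```python
def split_sizes(length, train_prop, val_prop, test_prop):
    """Turn split proportions into integer counts that add up to length.

    Hypothetical helper for illustration only; not part of dogma_data.
    """
    if abs(train_prop + val_prop + test_prop - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    n_train = round(length * train_prop)
    n_val = round(length * val_prop)
    # Assumption: the test split takes whatever remains, so no item is dropped.
    n_test = length - n_train - n_val
    return n_train, n_val, n_test


sizes = split_sizes(1000, train_prop=0.95, val_prop=0.025, test_prop=0.025)
print(sizes)
```

Giving the remainder to the last split keeps the three counts consistent with `length` even when the proportions do not divide it evenly.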

Requirements

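  • Python 3.10, 3.11, or 3.12 (the built distributions for release 0.2.19 target these CPython versions on x86-64 Linux with glibc 2.17+)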

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For any questions, feel free to reach out:

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

dogma_data-0.2.19-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (498.0 kB)

Uploaded for CPython 3.12, manylinux (glibc 2.17+), x86-64

dogma_data-0.2.19-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (498.7 kB)

Uploaded for CPython 3.11, manylinux (glibc 2.17+), x86-64

dogma_data-0.2.19-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (498.9 kB)

Uploaded for CPython 3.10, manylinux (glibc 2.17+), x86-64

File details

Details for the file dogma_data-0.2.19-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for dogma_data-0.2.19-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

  • SHA256: fd3bead441a36aa4759ecbc8a7d1d0c5ef4a340970925e6311da2ae8058ec4ea
  • MD5: 1164291cabe1868812a7af6b52d4c669
  • BLAKE2b-256: 1dbd44a028a602d65c2cb2874e706b4ca62d68bfe8f950ce1478c7a454ba58b0

Provenance

The following attestation bundles were made for dogma_data-0.2.19-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: deploy.yml on marcelroed/dogma-data

File details

Details for the file dogma_data-0.2.19-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for dogma_data-0.2.19-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

  • SHA256: 877785e212ae028020f45f3bb8bffd9d1be25ffa39d474a363defdd381157c48
  • MD5: d2f08fe33ce26c3ef524817395505513
  • BLAKE2b-256: 093a66da3e5b4b658089f8b982129bb4932123fdb1c6c367845989ec8572e347

Provenance

The following attestation bundles were made for dogma_data-0.2.19-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: deploy.yml on marcelroed/dogma-data

File details

Details for the file dogma_data-0.2.19-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for dogma_data-0.2.19-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

  • SHA256: c529509cbdac6c84680e8c6ea45561645afcb176414c12917d61f71a36828e7b
  • MD5: f6100245b01655160e77a64389fef7e0
  • BLAKE2b-256: 5e8643c7f73f48e94ce7acf3926b39ae6bd3057bafea7c90aca57977e5a341da

Provenance

The following attestation bundles were made for dogma_data-0.2.19-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: deploy.yml on marcelroed/dogma-data
