Data processing for Dogma

Dogma Data

Dogma Data is a Python package for fast parsing of FASTA files, aimed at high-performance computing workloads. It parses in parallel, using multi-threading across all available system threads. It can also export the parsed data to the HDF5 file format for storage and later access.
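
To make the parsing step concrete, here is a minimal pure-Python sketch of turning FASTA records into integer tokens with a thread pool. It is not the package's implementation: the names `VOCAB`, `FALLBACK`, `read_fasta`, `tokenize`, and `parse` are hypothetical, and the use of a fallback token for unknown characters is an assumption. Python threads will not give a real speedup for this CPU-bound work; the sketch only illustrates how the work is divided.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical vocabulary for illustration; not taken from the package.
VOCAB = {"a": 0, "g": 1, "c": 2, "t": 3}
FALLBACK = VOCAB["a"]  # assumed token for characters outside the vocabulary


def read_fasta(text):
    """Split FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def tokenize(sequence):
    """Map each character to its integer token, using FALLBACK for unknowns."""
    return [VOCAB.get(ch, FALLBACK) for ch in sequence.lower()]


def parse(text, workers=4):
    """Tokenize every record on a thread pool; return headers and token lists."""
    records = read_fasta(text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tokens = list(pool.map(tokenize, (seq for _, seq in records)))
    return [header for header, _ in records], tokens


headers, tokens = parse(">seq1\nACGT\nGGN\n>seq2\nTTAC\n")
```

Here the two-line record `seq1` is joined into one sequence before tokenization, and the `N` falls back to the token for `a`.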

Installation

To install Dogma Data, you can use pip:

pip install dogma-data

Usage

import dogma_data

# Map each FASTA character to an integer token.
vocab = {
    'a': 0,
    'g': 1,
    'c': 2,
    't': 3,
    # ...
}

# Build the mapping and parse the FASTA file, reading taxon IDs from the headers.
mapping = dogma_data.FastaMapping(vocab, vocab['a'])
(tokens, sequences, (taxons)) = dogma_data.parse_fasta('input_path.fa', dogma_data.HeaderType.TaxonId, mapping)

header_info = {"taxons": taxons}

# Export to HDF5 with a 95% / 2.5% / 2.5% train/validation/test split.
dogma_data.export_hdf5(
    'output_path.h5',
    dogma_data.Splitter(
        train_prop=0.95,
        val_prop=0.025,
        test_prop=0.025,
        length=len(sequences) - 1,
    ),
    tokens,
    sequences,
    header_info,
    mapping
)
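
The proportions passed to `Splitter` above can be read as fractions of the total number of sequences. How `Splitter` actually assigns sequences to each split is not documented here, so the following is only a sketch under the assumption of contiguous index ranges; the function name `split_ranges` is hypothetical.

```python
def split_ranges(length, train_prop, val_prop, test_prop):
    """Return contiguous (start, stop) index ranges for train, val, and test."""
    total = train_prop + val_prop + test_prop
    if abs(total - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    train_end = round(length * train_prop)
    val_end = train_end + round(length * val_prop)
    # The test split takes whatever remains, so rounding never drops an index.
    return {
        "train": (0, train_end),
        "val": (train_end, val_end),
        "test": (val_end, length),
    }


ranges = split_ranges(1000, 0.95, 0.025, 0.025)
```

With 1000 sequences this gives 950 for training and 25 each for validation and test.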

Requirements

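  • Python 3.10, 3.11, or 3.12 (release 0.2.19 ships prebuilt wheels for these CPython versions on Linux x86-64 with glibc 2.17+)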

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For any questions, feel free to reach out:

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • dogma_data-0.2.19-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (498.0 kB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • dogma_data-0.2.19-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (498.7 kB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • dogma_data-0.2.19-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (498.9 kB): CPython 3.10, manylinux (glibc 2.17+), x86-64
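
The wheel file names above follow the standard wheel naming scheme: distribution, version, an optional build tag, Python tag, ABI tag, and platform tag, separated by hyphens. As a small illustration, the sketch below splits one of these names into its fields. The function name `parse_wheel_name` is hypothetical, and the sketch does no validation beyond counting fields.

```python
def parse_wheel_name(filename):
    """Split a wheel filename into its hyphen-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of fields")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        # Several platform tags can be compressed into one field, joined by dots.
        "platforms": platform_tag.split("."),
    }


info = parse_wheel_name(
    "dogma_data-0.2.19-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
```

For this file, the `cp312` tags indicate CPython 3.12, and the two platform tags both describe x86-64 Linux with glibc 2.17 or newer.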

File details

Details for the file dogma_data-0.2.19-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for dogma_data-0.2.19-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm    Hash digest
SHA256       fd3bead441a36aa4759ecbc8a7d1d0c5ef4a340970925e6311da2ae8058ec4ea
MD5          1164291cabe1868812a7af6b52d4c669
BLAKE2b-256  1dbd44a028a602d65c2cb2874e706b4ca62d68bfe8f950ce1478c7a454ba58b0

Provenance

The following attestation bundles were made for dogma_data-0.2.19-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: deploy.yml on marcelroed/dogma-data

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dogma_data-0.2.19-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for dogma_data-0.2.19-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm    Hash digest
SHA256       877785e212ae028020f45f3bb8bffd9d1be25ffa39d474a363defdd381157c48
MD5          d2f08fe33ce26c3ef524817395505513
BLAKE2b-256  093a66da3e5b4b658089f8b982129bb4932123fdb1c6c367845989ec8572e347

Provenance

The following attestation bundles were made for dogma_data-0.2.19-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: deploy.yml on marcelroed/dogma-data

File details

Details for the file dogma_data-0.2.19-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for dogma_data-0.2.19-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm    Hash digest
SHA256       c529509cbdac6c84680e8c6ea45561645afcb176414c12917d61f71a36828e7b
MD5          f6100245b01655160e77a64389fef7e0
BLAKE2b-256  5e8643c7f73f48e94ce7acf3926b39ae6bd3057bafea7c90aca57977e5a341da

Provenance

The following attestation bundles were made for dogma_data-0.2.19-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: deploy.yml on marcelroed/dogma-data
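
The SHA256 digests listed above can be checked against a downloaded wheel before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches` are hypothetical, and the path you pass in is wherever you saved the file.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches(wheel_path, "fd3bead441a36aa4759ecbc8a7d1d0c5ef4a340970925e6311da2ae8058ec4ea")` should return True for an intact copy of the cp312 wheel. pip can also enforce digests itself through its hash-checking mode.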
