
PyO3 bindings and Python interface to PAPASMURF, a Platform-Accelerated Package for Alignment-free SMURF analysis.


🧙‍♂️ PAPASMURF

A Platform-Accelerated Package for Alignment-free SMURF analysis.


🗺️ Overview

SMURF (Short MUltiple Region Framework) is a method proposed by Fuks et al.[1] in 2018 for taxonomic profiling of 16S sequencing data. It uses several PCR-amplified regions inside the 16S rRNA gene to reach high taxonomic resolution despite the use of short read sequencing.

PAPASMURF is a from-scratch Rust reimplementation of the SMURF method. It does not aim to be a 1-to-1 port of the original MATLAB implementation; instead, it gives more control over the parameters used in the original, in order to support sequencing data of lesser quality.

This is the Python version; a Rust crate is available as well.

🔧 Installing

If you have to compile the package from source, all the required Rust libraries are vendored in the source distribution, and a Rust compiler will be set up automatically if there is none on the host machine.
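After installing, it can be useful to confirm which version the interpreter actually sees. The following is a minimal sketch using only the standard library; it assumes the distribution is registered under the name `papasmurf` (as in the wheel file names), and the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("papasmurf")
if version is None:
    print("papasmurf is not installed in this environment")
else:
    print("papasmurf", version)
```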

💡 Example

Use Biopython to generate a database from a file containing 16S gene sequences in FASTA format, for instance the Greengenes database:

import gzip

import Bio.SeqIO
import papasmurf

# Create a database builder with the two given primer pairs
builder = papasmurf.Builder([
    ("CCTACGGGNGGCWGCAG", "GACTACHVGGGTATCTAATCC"),  # V3-V4 primers
    ("GTGYCAGCMGCCGCGGTAA", "CCGYCAATTYMTTTRAGTTT"), # V4-V5 primers
])

# Extract k-mers from the reference sequences
with gzip.open("gg_13_5.fasta.gz", "rt") as reader:
    for record in Bio.SeqIO.parse(reader, "fasta"):
        builder.add(record.id, str(record.seq))

# Build and index the database
database = builder.to_database()

# Save the database in JSON format
database.dump("gg.json", format="json")
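The primers passed to the builder contain IUPAC degenerate bases such as `N`, `W`, `H` or `Y`. To illustrate what matching such a primer pair against a reference sequence can look like, here is a minimal pure-Python sketch based on regular expressions. It is not PAPASMURF's implementation: the function names are hypothetical, it requires exact matches (no mismatch tolerance), and it excludes the primers from the returned region, all of which are assumptions made for this example.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_to_regex(primer):
    """Translate a degenerate primer into a regular expression pattern."""
    parts = []
    for base in primer.upper():
        bases = IUPAC[base]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def reverse_complement(sequence):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def extract_amplicon(sequence, forward, reverse):
    """Return the region between both primers, or None if either is missing."""
    fwd = re.search(primer_to_regex(forward), sequence)
    if fwd is None:
        return None
    # The reverse primer anneals to the opposite strand, so it appears
    # reverse-complemented on the reference strand.
    rev_pattern = re.compile(primer_to_regex(reverse_complement(reverse)))
    rev = rev_pattern.search(sequence, fwd.end())
    if rev is None:
        return None
    return sequence[fwd.end():rev.start()]


print(primer_to_regex("CCTACGGGNGGCWGCAG"))
```

Applied to a toy sequence, `extract_amplicon("GGACGTTTTTTCGAACC", "ACGN", "TTYG")` returns the five bases located between the two primer sites.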

Then use the database to map reads from a sample:

# Load database and create a new mapper
database = papasmurf.Database.load("gg.json", format="json")
mapper = papasmurf.Mapper(database)

# Map reads to the k-mers database
with gzip.open("data/Example_L001_R1_001.fastq.gz", "rt") as f1:
    with gzip.open("data/Example_L001_R2_001.fastq.gz", "rt") as f2:
        for r1, r2 in zip(Bio.SeqIO.parse(f1, "fastq"), Bio.SeqIO.parse(f2, "fastq")):
            mapper.add(str(r1.seq), str(r2.seq))
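Note that `zip` stops silently at the end of the shorter input, so a truncated R2 file would go unnoticed in the loop above. The sketch below shows one way to iterate over read pairs while checking that both files stay in sync. It uses a deliberately simple four-lines-per-record FASTQ reader instead of Biopython, and it assumes that mates share the same first whitespace-delimited token in their header; both helpers are hypothetical and not part of PAPASMURF.

```python
import itertools


def read_fastq(handle):
    """Yield (name, sequence) tuples from a four-lines-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or trailing blank line)
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored in this sketch
        yield header[1:].split()[0], sequence


def paired_reads(handle1, handle2):
    """Yield (sequence1, sequence2) pairs, failing loudly on desynchronized files."""
    pairs = itertools.zip_longest(read_fastq(handle1), read_fastq(handle2))
    for record1, record2 in pairs:
        if record1 is None or record2 is None:
            raise ValueError("the two FASTQ files contain different numbers of reads")
        if record1[0] != record2[0]:
            raise ValueError(f"read name mismatch: {record1[0]!r} vs {record2[0]!r}")
        yield record1[1], record2[1]
```

With two handles opened as in the example above, each pair yielded by `paired_reads(f1, f2)` can be passed to `mapper.add`.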

Once all the reads have been mapped, compute the final bacterium frequencies:

# Obtain partial mapping result
result = mapper.finish()

# Run the iterative procedure 10 times to estimate the read proportion vector
result.refine(10)

# Print the names of the reference sequences with >5% relative abundance
for (j, name) in enumerate(database.names):
    if result.frequencies[j] > 0.05:
        print(name, result.frequencies[j])
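To give an intuition for what an iterative estimate of a proportion vector can look like, here is a generic expectation-maximization update for mixture proportions, written in plain Python. This is only an illustration under stated assumptions: the function `em_refine` and its dense list-of-lists input are made up for this example, and the procedure actually run by `result.refine` may differ in its details.

```python
def em_refine(likelihoods, iterations=10):
    """Estimate taxon proportions from per-read likelihoods.

    likelihoods[i][j] is the (hypothetical) probability of observing
    read i given that it originates from reference sequence j.
    """
    n_taxa = len(likelihoods[0])
    # Start from a uniform proportion vector.
    proportions = [1.0 / n_taxa] * n_taxa
    for _ in range(iterations):
        updated = [0.0] * n_taxa
        n_used = 0
        for row in likelihoods:
            total = sum(p * x for p, x in zip(row, proportions))
            if total == 0.0:
                continue  # read not explained by any reference, skip it
            n_used += 1
            # E-step: share the read between taxa according to the current estimate.
            for j, p in enumerate(row):
                updated[j] += p * proportions[j] / total
        if n_used == 0:
            break
        # M-step: the new proportions are the average responsibilities.
        proportions = [u / n_used for u in updated]
    return proportions


# Three reads match only taxon 0, one read matches only taxon 1.
print(em_refine([[1, 0], [1, 0], [1, 0], [0, 1]]))
```

When one read is ambiguous between two taxa and another read supports only the first, as in `em_refine([[1, 0], [1, 1]])`, successive iterations shift the ambiguous read towards the taxon that already has unambiguous support.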

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the open-source GPLv3 license.

This project is in no way affiliated with, sponsored, or otherwise endorsed by the original SMURF authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team, with support and testing from Fabian Springer.

All brand names and product names used in this material are trademarks or registered trademarks of their respective owners. The author/owner is not affiliated with, endorsed by, or sponsored by any product, organization, or company mentioned. Smurf is a registered trademark of Studio Peyo S.A.

📚 References

  • [1] Fuks, Garold, Michael Elgart, Amnon Amir, Amit Zeisel, Peter J. Turnbaugh, Yoav Soen, and Noam Shental. ‘Combining 16S rRNA Gene Variable Regions Enables High-Resolution Microbial Community Profiling’. Microbiome 6 (26 January 2018): 17. doi:10.1186/s40168-017-0396-x.
  • [2] Gustavson, Fred G. ‘Two Fast Algorithms for Sparse Matrices: Multiplication and Permuted Transposition’. ACM Transactions on Mathematical Software 4, no. 3 (September 1978): 250–69. doi:10.1145/355791.355796.

Download files


Source Distributions

No source distribution files available for this release.

Built Distributions


  • papasmurf-0.1.0-cp314-cp314t-win_amd64.whl (415.3 kB): CPython 3.14t, Windows x86-64
  • papasmurf-0.1.0-cp314-cp314t-manylinux_2_28_x86_64.whl (587.7 kB): CPython 3.14t, manylinux: glibc 2.28+ x86-64
  • papasmurf-0.1.0-cp314-cp314t-manylinux_2_28_aarch64.whl (567.8 kB): CPython 3.14t, manylinux: glibc 2.28+ ARM64
  • papasmurf-0.1.0-cp314-cp314t-macosx_12_0_x86_64.whl (541.6 kB): CPython 3.14t, macOS 12.0+ x86-64
  • papasmurf-0.1.0-cp314-cp314t-macosx_11_0_arm64.whl (515.4 kB): CPython 3.14t, macOS 11.0+ ARM64
  • papasmurf-0.1.0-cp38-abi3-win_amd64.whl (420.3 kB): CPython 3.8+, Windows x86-64
  • papasmurf-0.1.0-cp38-abi3-manylinux_2_28_x86_64.whl (598.3 kB): CPython 3.8+, manylinux: glibc 2.28+ x86-64
  • papasmurf-0.1.0-cp38-abi3-manylinux_2_28_aarch64.whl (575.9 kB): CPython 3.8+, manylinux: glibc 2.28+ ARM64
  • papasmurf-0.1.0-cp38-abi3-macosx_12_0_x86_64.whl (549.0 kB): CPython 3.8+, macOS 12.0+ x86-64
  • papasmurf-0.1.0-cp38-abi3-macosx_11_0_arm64.whl (525.0 kB): CPython 3.8+, macOS 11.0+ ARM64

File details

The attestation bundles for every file listed below were made by the publisher python.yml on althonos/papasmurf. Attestation values reflect the state when the release was signed and may no longer be current.

papasmurf-0.1.0-cp314-cp314t-win_amd64.whl

  • Size: 415.3 kB
  • Tags: CPython 3.14t, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7
  • SHA256: f04e092a8b93c52e59662a68aa71f0d41b10d136a30bd8a4138e039b9b8ec624
  • MD5: 6e3cc60842060bafaacba05dc754e3b2
  • BLAKE2b-256: 4937081317a45e0daf479ef450ef299b024032ef556f4d82daec598ed56228cb

papasmurf-0.1.0-cp314-cp314t-manylinux_2_28_x86_64.whl

  • SHA256: 4a5696942def54070dd2d251b27fe05c2584624c4e001db13c4a90520ce14063
  • MD5: acefac5b8adc1a1d31271fe0175295b1
  • BLAKE2b-256: 8c6a8990f0c0e2dccb282588e69994f061cd6c0dbdbf6f1dd56ede9439c18b56

papasmurf-0.1.0-cp314-cp314t-manylinux_2_28_aarch64.whl

  • SHA256: 2f6bbfe9465a495522e151164110c2539a60d1ada30b03bc1aa3ad6bbb824b32
  • MD5: 71e324e4525cd2bc5c456bbfd66a4660
  • BLAKE2b-256: d628805e41e586aab6d5e4843a65341fed72ee77a9e439d19c45cc81c1b32723

papasmurf-0.1.0-cp314-cp314t-macosx_12_0_x86_64.whl

  • SHA256: a8203b3a4e0e64360c417e908ffc403bb398dc485fc3d49cb3fb3d0a5b5f761f
  • MD5: 6df82ab52563f9e59bf66d0e94a54de1
  • BLAKE2b-256: baeea97b29192b48dd40aa205ebf08d04907b4be28e305368cdc045af043c0e3

papasmurf-0.1.0-cp314-cp314t-macosx_11_0_arm64.whl

  • SHA256: 8f03ee79ff7a4466de519b24fc746484e209c1b2ec80ef8b77cdf9b2f1fa75cb
  • MD5: 0d6062e0dcf4b2a890e4de1a6fba4c81
  • BLAKE2b-256: 9d5cc3f0c14d5a4e1c943cd8aeeb3ce48fed8eeed1c10f3d6724d622fca775f8

papasmurf-0.1.0-cp38-abi3-win_amd64.whl

  • Size: 420.3 kB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7
  • SHA256: 94f102c60796080cdfcc3bc3d82d7d83d950070b7bfda1b0c65ad75047c41dc2
  • MD5: 9d0191a9741bd0089917efef8cd58431
  • BLAKE2b-256: 3099f07503fad361e800a9f6076733d1bf7fb155c5cd98f7542853bcdc140349

papasmurf-0.1.0-cp38-abi3-manylinux_2_28_x86_64.whl

  • SHA256: 5f619c810ab9ffc4edbe2db245d36c119db956cd2196236b7cbf3e4e38f279d4
  • MD5: e64f24c612d627524f6ad37e79613f62
  • BLAKE2b-256: 827fa59547e78f73cac9ae6ad1a6f7e5951aa7d904efb407b544b5470e38ecfe

papasmurf-0.1.0-cp38-abi3-manylinux_2_28_aarch64.whl

  • SHA256: 0f562de2e3ee19d6aa20c23fe656f9ef38a819af6211ddeb74d06584cd95b68e
  • MD5: e7a75ac99ce8e2c20238182ac8200594
  • BLAKE2b-256: 69a7e87ef9a04020671bf2027c9e7e81cca01ebcae35bc7abb20aa6ca721458c

papasmurf-0.1.0-cp38-abi3-macosx_12_0_x86_64.whl

  • SHA256: 3732eb57b3cb17a5950d826d5020f079793dda51cc5612205b9f63f7c645511a
  • MD5: 5bdf0394935caa81cd4874954f5e8367
  • BLAKE2b-256: 441c17ab67c9fdd86fdb4e55ec2139c9d82de1a79321cdce582622ff1a6c359a

papasmurf-0.1.0-cp38-abi3-macosx_11_0_arm64.whl

  • SHA256: 3ac8031c6d8a4837a20b98199ac61fc48861bef31945b434eebd2f5edbc76777
  • MD5: 5effcc3ba04b41a31063a19250c0b383
  • BLAKE2b-256: 1c105aeaaa872d59f14a5adcfddbd2f73a8ef64adb190902314d1e0130985946
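The SHA256 digests listed for each file can be used to check a manually downloaded wheel before installing it. Below is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `verify_download` are made up for this example, and the local path in the usage comment is an assumption about where the file was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the hexadecimal SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file at `path` has the expected SHA256 digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Usage, assuming the wheel was saved in the current directory:
# verify_download(
#     "papasmurf-0.1.0-cp38-abi3-manylinux_2_28_x86_64.whl",
#     "5f619c810ab9ffc4edbe2db245d36c119db956cd2196236b7cbf3e4e38f279d4",
# )
```

`pip` can perform the same check automatically when a requirements file pins hashes and installation is run in hash-checking mode.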
