PyO3 bindings and Python interface to PAPASMURF, a Platform-Accelerated Package for Alignment-free SMURF analysis.

🧙‍♂️ PAPASMURF

A Platform-Accelerated Package for Alignment-free SMURF analysis.

🗺️ Overview

SMURF (Short MUltiple Region Framework) is a method proposed by Fuks et al.[1] in 2018 for taxonomic profiling of 16S sequencing data. It combines several PCR-amplified regions of the 16S rRNA gene to reach high taxonomic resolution despite the use of short-read sequencing.

PAPASMURF is a from-scratch Rust reimplementation of the SMURF method. It does not aim to be a 1-to-1 port of the original MATLAB implementation; instead, it exposes more control over the parameters used in the original, in order to support sequencing data of lower quality.

This is the Python version; a Rust crate is available as well.

🔧 Installing

If you have to compile the package from source, all the required Rust libraries are vendored in the source distribution, and a Rust compiler will be set up automatically if none is found on the host machine.

💡 Example

Use Biopython to read a file containing 16S gene sequences in FASTA format, for instance the Greengenes database, and build a PAPASMURF database from it:

import gzip

import Bio.SeqIO
import papasmurf

# Create a database builder with the two given primer pairs
builder = papasmurf.Builder([
    ("CCTACGGGNGGCWGCAG", "GACTACHVGGGTATCTAATCC"),  # V3-V4 primers
    ("GTGYCAGCMGCCGCGGTAA", "CCGYCAATTYMTTTRAGTTT"), # V4-V5 primers
])

# Extract k-mers from the reference sequences
with gzip.open("gg_13_5.fasta.gz", "rt") as reader:
    for record in Bio.SeqIO.parse(reader, "fasta"):
        builder.add(record.id, str(record.seq))

# Build and index the database
database = builder.to_database()

# Save the database in JSON format
database.dump("gg.json", format="json")
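The primers above contain IUPAC ambiguity codes such as N, W, Y and M, each standing for a set of possible bases. As a rough illustration of what a degenerate primer matches, the sketch below expands a primer into a regular expression. The helper name `primer_to_regex` is hypothetical and this is not how PAPASMURF locates primers internally; it only shows the meaning of the ambiguity codes.

```python
import re

# Standard IUPAC nucleotide ambiguity codes, mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_to_regex(primer):
    """Compile a degenerate primer into a regular expression (illustrative helper)."""
    return re.compile("".join(
        IUPAC[base] if len(IUPAC[base]) == 1 else "[" + IUPAC[base] + "]"
        for base in primer.upper()
    ))

# Forward V3-V4 primer from the example above, searched in a made-up sequence
pattern = primer_to_regex("CCTACGGGNGGCWGCAG")
match = pattern.search("TTCCTACGGGAGGCAGCAGTT")
print(match.start() if match else None)  # → 2
```

Unambiguous bases are emitted as literals and ambiguous ones as character classes, so the resulting pattern stays readable.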

Then use the database to map reads from a sample:

# Load database and create a new mapper
database = papasmurf.Database.load("gg.json", format="json")
mapper = papasmurf.Mapper(database)

# Map reads to the k-mers database
with gzip.open("data/Example_L001_R1_001.fastq.gz", "rt") as f1:
    with gzip.open("data/Example_L001_R2_001.fastq.gz", "rt") as f2:
        for r1, r2 in zip(Bio.SeqIO.parse(f1, "fastq"), Bio.SeqIO.parse(f2, "fastq")):
            mapper.add(str(r1.seq), str(r2.seq))
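Note that the built-in `zip` stops silently at the shorter input, so a truncated R2 file would go unnoticed in the loop above. A minimal sketch of a stricter pairing helper follows; `strict_pairs` is a hypothetical name, not part of PAPASMURF or Biopython, and it works on any two iterables, so it could wrap the two `Bio.SeqIO.parse` iterators.

```python
import itertools

_MISSING = object()  # sentinel marking an exhausted iterator

def strict_pairs(forward, reverse):
    """Yield (r1, r2) pairs, raising ValueError if one input runs out first."""
    for r1, r2 in itertools.zip_longest(forward, reverse, fillvalue=_MISSING):
        if r1 is _MISSING or r2 is _MISSING:
            raise ValueError("forward and reverse inputs have different lengths")
        yield r1, r2

# Toy usage with plain strings standing in for parsed FASTQ records
pairs = list(strict_pairs(["ACGT", "TTGA"], ["CGTA", "GGCA"]))
print(pairs)
```

On Python 3.10 and later, `zip(f1, f2, strict=True)` gives the same guarantee; the helper above also covers the older versions that the wheels support.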

Once all the reads have been mapped, compute the final bacterial frequencies:

# Obtain partial mapping result
result = mapper.finish()

# Run the iterative procedure 10 times to estimate the read proportion vector
result.refine(10)

# Print the names of the reference sequences with >5% relative abundance
for j, name in enumerate(database.names):
    if result.frequencies[j] > 0.05:
        print(name, result.frequencies[j])
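To give an intuition for what an iterative refinement of a proportion vector can look like, here is a generic expectation-maximization (EM) update for mixture proportions in pure Python. This is an assumption about the general family of procedure, not PAPASMURF's actual implementation; the function `em_refine` and its `likelihoods` matrix (one row per read, one column per reference) are hypothetical.

```python
def em_refine(likelihoods, iterations=10):
    """Estimate mixture proportions from a reads x references likelihood matrix."""
    n_refs = len(likelihoods[0])
    freqs = [1.0 / n_refs] * n_refs  # start from a uniform guess
    for _ in range(iterations):
        totals = [0.0] * n_refs
        for row in likelihoods:
            # E-step: posterior probability that this read came from each reference
            weights = [p * f for p, f in zip(row, freqs)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # read is incompatible with every reference
            for j, w in enumerate(weights):
                totals[j] += w / norm
        # M-step: new proportions are the normalized expected read counts
        assigned = sum(totals)
        if assigned == 0.0:
            break
        freqs = [t / assigned for t in totals]
    return freqs

# Toy data: three reads, two references; the last read is ambiguous
toy = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5]]
print(em_refine(toy, iterations=10))
```

In this toy case the ambiguous read is progressively attributed to the first reference, because the two unambiguous reads already support it.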

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the open-source GPLv3 license.

This project is in no way affiliated with, sponsored, or otherwise endorsed by the original SMURF authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team with support and testing from Fabian Springer.

All brand names and product names used in this material are trademarks or registered trademarks of their respective owners. The author/owner is not affiliated with, endorsed by, or sponsored by any product, organization, or company mentioned. Smurf is a registered trademark of Studio Peyo S.A.

📚 References

  • [1] Fuks, Garold, Michael Elgart, Amnon Amir, Amit Zeisel, Peter J. Turnbaugh, Yoav Soen, and Noam Shental. ‘Combining 16S rRNA Gene Variable Regions Enables High-Resolution Microbial Community Profiling’. Microbiome 6 (26 January 2018): 17. doi:10.1186/s40168-017-0396-x.
  • [2] Gustavson, Fred G. ‘Two Fast Algorithms for Sparse Matrices: Multiplication and Permuted Transposition’. ACM Transactions on Mathematical Software 4, no. 3 (September 1978): 250–69. doi:10.1145/355791.355796.
