Vizibridge

This module is a maturin bridge between the Rust crate vizicomp, which contains compiled code for efficient genomic data manipulation, and the Python module Vizitig.

How to install vizibridge

The simplest way is to use pip, as vizibridge is deployed on PyPI:

pip install vizibridge

Alternatively, download the wheel from the latest release on GitLab.

If no wheel is available for your architecture or system, you can also compile the module locally as follows.

First install the Rust toolchain, then run:

cargo install maturin
maturin build --release

Then install the module into your Python environment:

pip install target/wheels/vizibridge**.whl

replacing ** with the name of the wheel file generated in that folder.

Publication to PyPI through CI/CD

The CI/CD pipeline takes care of compiling everything, so you can simply push your changes to build a new compiled module. To publish to PyPI, push a release tag:

git tag -a vx -m "Some description of the release to broadcast"
git push origin vx

Here vx is the version number, which should match the version declared in Cargo.toml.

Manual publication to PyPI

First, you need:

  • Docker installed
  • A PyPI token allowed to push vizibridge

Then, from the root directory of vizibridge, run:

docker build --build-arg PYPI_TOKEN="YOUR_PYPI_TOKEN" .

What belongs here

Actual computation should never be implemented in this repo. It belongs either in the vizicomp repo or in another repo whose functionality we want to expose to the Python ecosystem. This repo is dedicated solely to the bridge, so it does not pollute the efficient standalone Rust tooling.

Quick documentation of the Python API

The Python interface is composed of several components:

Base types

DNA type (vizibridge.DNA)

DNA is a Python class wrapping vizibridge.rust_DNA, a Rust encoding of DNA strings. The underlying data layout simply uses 2 bits per nucleotide. Building a DNA object from a string runs at roughly 1 GB/s.

Its main purpose is to enumerate k-mers efficiently through the enum_kmer and enum_canonical_kmer methods. It can also be converted back to a string.

from vizibridge import DNA
dna = DNA("ACGT"*10)
print(dna)
# display: ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
print(repr(next(dna.enum_canonical_kmer(4))))
# display: 4-mer(ACGT)
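The following pure-Python sketch works on plain strings and shows the difference between enum_kmer and enum_canonical_kmer. It assumes that the canonical form of a k-mer is the lexicographically smaller of the k-mer and its reverse complement. That assumption is consistent with the example above, since ACGT is its own reverse complement. The functions are hypothetical analogues, not vizibridge's implementation.

```python
from typing import Iterator

# Complement of each nucleotide; reversing the complemented string gives the reverse complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def enum_kmer(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer of `seq`, left to right (hypothetical pure-Python analogue)."""
    for i in range(len(seq) - k + 1):
        yield seq[i : i + k]


def enum_canonical_kmer(seq: str, k: int) -> Iterator[str]:
    """Yield the canonical form of each k-mer: the smaller of it and its reverse complement."""
    for kmer in enum_kmer(seq, k):
        yield min(kmer, reverse_complement(kmer))


print(list(enum_kmer("ACGT", 2)))            # ['AC', 'CG', 'GT']
print(list(enum_canonical_kmer("ACGT", 2)))  # ['AC', 'CG', 'AC']
```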

Kmer types (vizibridge.Kmers)

Two kinds of types are defined in the Rust code: ShortKmer and LongKmer. ShortKmer encodes a k-mer in a 64-bit integer, while LongKmer uses a 128-bit integer. Each k-mer size comes with its own compiled code, so there are 64 different Rust-based Kmer types, one per size from 1 to 64.
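The count of 64 follows from the 2-bit encoding. A 64-bit integer holds at most 32 nucleotides, and a 128-bit integer holds at most 64. The short sketch below spells out that arithmetic. The helper backing_bits is hypothetical. The split it assumes (ShortKmer for k ≤ 32, LongKmer for 33 ≤ k ≤ 64) is inferred from the integer widths, not read from the Rust code.

```python
BITS_PER_NUCLEOTIDE = 2


def backing_bits(k: int) -> int:
    """Smallest integer width (64 or 128 bits) that can hold a k-mer at 2 bits per base.

    Hypothetical helper mirroring the ShortKmer (64-bit) / LongKmer (128-bit) split.
    """
    if not 1 <= k <= 64:
        raise ValueError("k must be between 1 and 64")
    return 64 if k * BITS_PER_NUCLEOTIDE <= 64 else 128


# One compiled type per k: 32 sizes fit in 64 bits, 32 more in 128 bits.
widths = [backing_bits(k) for k in range(1, 65)]
print(widths.count(64), widths.count(128), len(widths))  # 32 32 64
```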

The Python module provides a class wrapper with a common interface, vizibridge.Kmer. A Kmer must be built through the DNA class.

from vizibridge import Kmer, DNA
kmer = Kmer.from_DNA(DNA("ACGT")) # a 4-Kmer
print(repr(kmer))
# display: 4-mer(ACGT)

The underlying integer is stored in the data field, which is also used to compute the Kmer's hash value.

print(kmer.data)
# display: 27 
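The value 27 is what you get from a 2-bit encoding with A=0, C=1, G=2, T=3, first nucleotide in the high bits: ACGT becomes 0b00011011. Below is a pure-Python sketch of that encoding and its inverse. The nucleotide-to-code mapping and the bit order are assumptions that happen to reproduce the printed value. They are not a documented guarantee of vizibridge.

```python
# Assumed 2-bit codes; this mapping reproduces kmer.data == 27 for "ACGT".
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(kmer: str) -> int:
    """Pack a k-mer into an integer, 2 bits per nucleotide, first base in the high bits."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def decode(value: int, k: int) -> str:
    """Inverse of encode; k is needed because leading A's are encoded as zero bits."""
    return "".join(BASES[(value >> (2 * (k - 1 - i))) & 0b11] for i in range(k))


print(encode("ACGT"))  # 27
print(decode(27, 4))   # ACGT
```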

Careful: the hash is based solely on the integer content, so two k-mers of different sizes can have the same integer encoding and thus the same hash.

kmer2 = Kmer.from_DNA(DNA("AAACGT"))
assert hash(kmer) == hash(kmer2)
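Both k-mers encode to 27 because leading A's contribute only zero bits. If you handle the raw integers directly, for example values taken from the data field, the size is lost and ACGT and AAACGT become indistinguishable. A simple remedy is to carry the size along with the integer. The sketch below uses the same assumed encoding (A=0, C=1, G=2, T=3). kmer_key is a hypothetical helper, not part of vizibridge.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit codes


def encode(kmer: str) -> int:
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def kmer_key(kmer: str) -> tuple[int, int]:
    """Hypothetical key that keeps the k-mer size, so ACGT and AAACGT stay distinct."""
    return (len(kmer), encode(kmer))


print(encode("ACGT") == encode("AAACGT"))      # True: same integer, hence same hash
print(kmer_key("ACGT") == kmer_key("AAACGT"))  # False: (4, 27) vs (6, 27)
```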

Kmers can be converted to strings and are hashable, and a new Kmer can be built by adding a nucleotide on the left or on the right.

print(repr(kmer))
# display: 4-mer(ACGT)
# prepend A; the rightmost nucleotide is dropped so the size stays 4
kmer3 = kmer.add_left_nucleotid('A')
print(repr(kmer3))
# display: 4-mer(AACG)
kmer4 = kmer3.add_left_nucleotid('A')
print(repr(kmer4))
# display: 4-mer(AAAC)
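Under the same assumed encoding (A=0, C=1, G=2, T=3, first base in the high bits), both operations reduce to bit shifts. Adding a nucleotide on the left shifts the integer right by 2 bits, which drops the last base, and writes the new base into the top 2 bits. Adding on the right shifts left and masks back to 2k bits. The helpers below are a hypothetical sketch, not vizibridge's code.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit codes
BASES = "ACGT"


def encode(kmer: str) -> int:
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def decode(value: int, k: int) -> str:
    return "".join(BASES[(value >> (2 * (k - 1 - i))) & 0b11] for i in range(k))


def add_left(value: int, base: str, k: int) -> int:
    """Prepend `base` and drop the rightmost nucleotide, keeping the size k."""
    return (CODE[base] << (2 * (k - 1))) | (value >> 2)


def add_right(value: int, base: str, k: int) -> int:
    """Append `base` and drop the leftmost nucleotide, keeping the size k."""
    mask = (1 << (2 * k)) - 1
    return ((value << 2) | CODE[base]) & mask


x = encode("ACGT")
print(decode(add_left(x, "A", 4), 4))                    # AACG
print(decode(add_left(add_left(x, "A", 4), "A", 4), 4))  # AAAC
print(decode(add_right(x, "A", 4), 4))                   # CGTA
```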

Index types

An index is either:

  • KmerIndex: a sorted array of (Kmer, integer) pairs, where the integers are 32-bit unsigned integers.
  • KmerSet: a sorted array of Kmers.

Both filter out duplicates: KmerSet has set semantics and KmerIndex has mapping semantics. Each must be given a path to a file where the index is stored. Under the hood, an index is simply a sorted array stored in a memory-mapped file. Careful: memory maps are not always portable (check carefully on Windows).
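With a sorted array in a memory-mapped file, a lookup is a binary search over fixed-size records read straight from the mapping, with no loading step. The minimal pure-Python sketch below assumes a hypothetical record layout: a little-endian unsigned 64-bit k-mer integer followed by an unsigned 32-bit value. vizibridge's actual on-disk format is not documented here and may differ. Keeping the last value for a duplicated k-mer is also an assumption.

```python
import mmap
import struct
import tempfile
from pathlib import Path

# Hypothetical fixed-size record: u64 k-mer integer + u32 value, little-endian.
RECORD = struct.Struct("<QI")


def write_index(pairs, path: Path) -> None:
    """Write (kmer_int, value) pairs sorted by k-mer; a duplicated k-mer keeps its last value."""
    dedup = dict(pairs)
    with path.open("wb") as f:
        for key in sorted(dedup):
            f.write(RECORD.pack(key, dedup[key]))


def lookup(path: Path, key: int):
    """Binary-search the memory-mapped file; return the value or None if absent."""
    if path.stat().st_size == 0:  # mmap cannot map an empty file
        return None
    with path.open("rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        lo, hi = 0, len(mm) // RECORD.size
        while lo < hi:
            mid = (lo + hi) // 2
            k, v = RECORD.unpack_from(mm, mid * RECORD.size)
            if k == key:
                return v
            if k < key:
                lo = mid + 1
            else:
                hi = mid
    return None


# Usage: AC = 1 and CG = 6 under the assumed 2-bit encoding; AC appears twice.
path = Path(tempfile.mkdtemp()) / "index.bin"
write_index([(1, 0), (6, 1), (1, 2)], path)
print(lookup(path, 1), lookup(path, 6), lookup(path, 3))  # 2 1 None
```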

Two methods exist to build a KmerIndex:

  • build: takes an iterator over KmerIndexEntry objects (a dataclass with two fields, kmer and value).
  • build_dna: takes an iterator over DNAIndexEntry objects (a dataclass with two fields, dna and value).

build_dna unfolds each DNA value into k-mers through the enum_canonical_kmer method. It is more efficient when all k-mers of a DNA sequence can be associated with the same value. In addition, build_dna takes two integers used to filter out k-mers according to their value modulo a given number. This is useful when dispatching k-mers among several shards.
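One way to picture the modulo filter: each shard keeps only the k-mers whose integer encoding falls in its residue class, so the shards are disjoint and together cover every k-mer. The sketch below follows that reading. Several details are assumptions, not build_dna's documented signature: the parameter names shard and nb_shards, their order, the use of the k-mer integer as the filtered value, and the 2-bit encoding A=0, C=1, G=2, T=3.

```python
from typing import Iterable, Iterator

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit codes
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def encode(kmer: str) -> int:
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def canonical_kmers(seq: str, k: int) -> Iterator[str]:
    """Yield min(kmer, reverse complement) for each k-mer of `seq`."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        yield min(kmer, kmer.translate(COMPLEMENT)[::-1])


def shard_entries(entries: Iterable[tuple[str, int]], k: int, shard: int, nb_shards: int):
    """Unfold (dna, value) entries into (kmer, value) pairs kept only for this shard.

    Hypothetical analogue of build_dna's filtering: keep kmers with encode(kmer) % nb_shards == shard.
    """
    for dna, value in entries:
        for kmer in canonical_kmers(dna, k):
            if encode(kmer) % nb_shards == shard:
                yield kmer, value


entries = [("ACGT", 7)]
print(list(shard_entries(entries, 2, 0, 2)))  # [('CG', 7)]
print(list(shard_entries(entries, 2, 1, 2)))  # [('AC', 7), ('AC', 7)]
```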

Here is a small usage example.

from vizibridge import DNA
some_kmer = list(DNA("ACGT").enum_canonical_kmer(2))
d = { kmer: i for i, kmer in enumerate(some_kmer) }
print(d)
# display: {2-mer(AC): 2, 2-mer(CG): 1}
# Only two keys: the canonical 2-mer AC occurs twice, at position 0 (AC) and at position 2 (GT, whose reverse complement is AC), and the dict keeps the last index

from vizibridge import KmerIndex, KmerIndexEntry
from pathlib import Path
index = KmerIndex.build((KmerIndexEntry(kmer=k, val=i) for i,k in enumerate(some_kmer)), Path("/tmp/some_path"), 2) # 2 is the kmer-size
print(index[some_kmer[1]])
# display: 1

print(dict(index))
# display: {2-mer(AC): 2, 2-mer(CG): 1}

KmerSet follows the same principle, except that it has set semantics (hence no value is associated with a Kmer):

s = set(some_kmer)
print(s)
# display: {2-mer(AC), 2-mer(CG)}

from vizibridge import KmerSet
from pathlib import Path
index = KmerSet.build(iter(some_kmer), Path("/tmp/some_other_path"), 2) # 2 is the kmer-size
print(some_kmer[1] in index)
# display: True

print(set(index))
# display: {2-mer(AC), 2-mer(CG)}

TODO

  • Add Windows and macOS compilation to the CI/CD
  • Integrate ggcat bindings
  • Other tools?

Download files

Source Distribution

vizibridge-0.6.0.tar.gz (36.3 kB)

Uploaded Source

Built Distributions

  • vizibridge-0.6.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.5 MB): CPython 3.13, manylinux glibc 2.17+, x86-64
  • vizibridge-0.6.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.5 MB): CPython 3.13, manylinux glibc 2.17+, ARM64
  • vizibridge-0.6.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.5 MB): CPython 3.12, manylinux glibc 2.17+, x86-64
  • vizibridge-0.6.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.5 MB): CPython 3.12, manylinux glibc 2.17+, ARM64
  • vizibridge-0.6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.4 MB): CPython 3.11, manylinux glibc 2.17+, x86-64
  • vizibridge-0.6.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.5 MB): CPython 3.11, manylinux glibc 2.17+, ARM64
  • vizibridge-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.4 MB): CPython 3.10, manylinux glibc 2.17+, x86-64
  • vizibridge-0.6.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.5 MB): CPython 3.10, manylinux glibc 2.17+, ARM64
  • vizibridge-0.6.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.4 MB): CPython 3.9, manylinux glibc 2.17+, x86-64
  • vizibridge-0.6.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.5 MB): CPython 3.9, manylinux glibc 2.17+, ARM64

File details

Details for the file vizibridge-0.6.0.tar.gz.

File metadata

  • Download URL: vizibridge-0.6.0.tar.gz
  • Upload date:
  • Size: 36.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.8.2

File hashes

Hashes for vizibridge-0.6.0.tar.gz
Algorithm Hash digest
SHA256 7a24f1a096ccb6ee683138894a0596136fc2033af4f8ed944f01f303df662d80
MD5 94736bcb9d40c94378c598728b9148d6
BLAKE2b-256 e94aaab3bcd021f301c6a38da588ac8be22c8e0a92da8c0c6ce05e26ddd9f7a0

Hashes for vizibridge-0.6.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 bead4cf509e2ef56ec1cd1122ef75c1fcde5f32dd44a8a073d5d4a89bf5f9b64
MD5 c5e6c38086e721677af145a8046ea160
BLAKE2b-256 49b7b484b04fc4414280ed52ba1e6c1a93c5ba0325ad72b1d09b7235bfadb396

Hashes for vizibridge-0.6.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 99bd0eb6cfd81da10e7a29e85400b70ebeacac4005716afef4beb3bf12020cb8
MD5 2eaa313bdd52a61977389b5d5f14b4aa
BLAKE2b-256 813bd41197922cf23e0eab0654a04de19e3031207930ec5a20f6fd5c60073612

Hashes for vizibridge-0.6.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 7e4c5590164404832199dc16f8d6ed9a9589e5f38704ef6cadfbb7d2a849b8e4
MD5 33b02516c592d4c79fc30b9ddff66a56
BLAKE2b-256 6961660fbcaf5a07c4d7af7ac48b91031d0d03a464f11139ba673a4169ae3ce1

Hashes for vizibridge-0.6.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 dec64d1e68d0610552ba20dd7a4bfc1b908e3e60c9c6701a4e595e5233ae30f2
MD5 134d9c2c44767d21cb234d99bd6c0b2a
BLAKE2b-256 9467f19afecd5bc45df5facc0b573ba48babc9fed8532e5a5ca46125f8ce8be3

Hashes for vizibridge-0.6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e1b48632558f026a864da2ff8127f196899b3958ace8a6af5e6a25269afcac1b
MD5 3767392eda85c44ec6c425e106884ae6
BLAKE2b-256 f085d85dfe294e31eed1ff5a3d30597ce553939ad45475cb8a41b0862ad176db

Hashes for vizibridge-0.6.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 391eba16d1adf5c369a2b48b9e2e50de4f858734b6e73905dd7853b89f28dd61
MD5 a079881c2d5092fb3f6e514e23b92419
BLAKE2b-256 9ffeda2a44f522afe4240ae88fdfd5e7e0e4fabf17402efd1ea6c272e39a2d3b

Hashes for vizibridge-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 fa08dc18da783629cbd5102e7a2077dbf07c70905675f683889c66eb82983715
MD5 2be106fa89bde206fbb9b39c11ef7f53
BLAKE2b-256 965ee32f3fb617cb7d67d3e0697370d113a04d7bdab06e0c36c058184fc6cd66

Hashes for vizibridge-0.6.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 34edda49418442e3acbfd027f6240dc6acf7bed8b8c96a510c75a35cdbfca072
MD5 c7069b7d146fb968a117fdb12f7ab095
BLAKE2b-256 ad01591c3636b96dcc19580527e0ae21d36d47de1fe08cc31c3b5f39df895f0e

Hashes for vizibridge-0.6.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 06bf5d7d5eb54c0585be5109cde1d52bb6c8810acfe76560e04b1cb97b68fbd4
MD5 a0b60c25f4ea5032fdf655f158f2d018
BLAKE2b-256 2915ac0c86b0b16b370af29946dc29623c0d83fd31ff2c595ed82d152d28598f

Hashes for vizibridge-0.6.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 810a4538a2664d356623b4ac5c5927ab69b8da5a7f1233a42b4f3da1afe3edfc
MD5 d54a5a0419372746b6e51a3848f04a8e
BLAKE2b-256 18fc81d81103d85fbb7b0c791f8b916fc13966177efae12d2078159e2de1e1bb
