Vizibridge

This module is a maturin bridge between the Rust crate vizicomp, which contains compiled code for efficient genomic data manipulation, and the Python module Vizitig.

How to install vizibridge

The simplest way is to use pip, as vizibridge is published on PyPI:

pip install vizibridge

Alternatively, download the wheel from the latest release on GitLab.

If no wheel is available for your architecture or system, you can compile the module locally as follows.

First install the Rust toolchain, then run:

cargo install maturin
maturin build --release

To install the module in your Python environment, run:

pip install target/wheels/vizibridge**.whl

replacing ** by the appropriate name generated in the folder.

Publication to PyPI through CI/CD:

The CI/CD takes care of compiling everything, so you can simply push your changes to build a new compiled module. To publish to PyPI, push a release tag:

git tag -a vx -m "Some description of the release to broadcast"
git push origin vx 

Here vx is the version number, which should be kept in sync with the version declared in Cargo.toml.

Manual publication to PyPI:

First you need:

  • Docker installed
  • A token to push vizibridge to PyPI.

Then from the main directory of vizibridge run:

docker build --build-arg PYPI_TOKEN="YOUR_PYPI_TOKEN" .

What should be here

The actual computation should never be implemented in this repo. It belongs either in the vizicomp repo or in another repo that we would like to expose to the Python ecosystem. This repo is solely dedicated to the bridge, so that the efficient standalone Rust tooling is not polluted.

Quick documentation of Python API

The Python interface is composed of several components:

Base type

DNA type (vizibridge.DNA)

DNA is a Python class wrapper around vizibridge.rust_DNA, which is an encoding of a DNA string in Rust. The underlying data layout simply uses 2 bits per nucleotide. Building a DNA value from a string runs at roughly 1 GB/s.

Its main purpose is to provide a way to enumerate Kmers efficiently through the enum_kmer and enum_canonical_kmer methods. It can also be converted back to a string.

from vizibridge import DNA
dna = DNA("ACGT"*10)
print(dna)
# display: ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
print(repr(next(dna.enum_canonical_kmer(4))))
# display: 4-mer(ACGT)
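To make the notion of a canonical Kmer concrete, here is a pure-Python sketch that works on plain strings. It assumes that the canonical form is the lexicographically smaller of a Kmer and its reverse complement. The helper names are hypothetical, and vizibridge itself does this work in compiled Rust.

```python
# Pure-Python illustration only, not the vizibridge implementation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def enum_canonical_kmer_sketch(seq: str, k: int):
    """Yield the canonical form of every k-long window of seq (hypothetical helper)."""
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        # Assumption: canonical = smaller of the Kmer and its reverse complement.
        yield min(kmer, reverse_complement(kmer))


print(list(enum_canonical_kmer_sketch("ACGT", 2)))
# display: ['AC', 'CG', 'AC']
```

The last window, GT, is reported as AC because AC is its reverse complement.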

Kmer types (vizibridge.Kmers)

Two kinds of types are defined in the Rust code: ShortKmer and LongKmer. ShortKmer encodes a Kmer on a 64-bit integer, while LongKmer uses a 128-bit integer. Each type comes with its own compiled code, so there are 64 different Rust-based Kmer types.

The Python module provides a class wrapper with a common interface: vizibridge.Kmer. A Kmer must be built through the DNA class.

from vizibridge import Kmer, DNA
kmer = Kmer.from_DNA(DNA("ACGT")) # a 4-Kmer
print(repr(kmer))
# display: 4-mer(ACGT)

The underlying integer is available in the data field, which is used to compute the hash value of the Kmer.

print(kmer.data)
# display: 27 
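The value 27 can be reproduced by hand with a 2-bit packing. The sketch below assumes the mapping A=0, C=1, G=2, T=3 with the first nucleotide in the most significant bits. This mapping was picked because it yields 27 for ACGT; the actual Rust layout may differ in detail. The helper names are hypothetical.

```python
# Assumed mapping, chosen because it reproduces the value 27 for ACGT.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTERS = "ACGT"


def encode_2bit(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per nucleotide (hypothetical helper)."""
    value = 0
    for base in seq:
        value = (value << 2) | CODES[base]
    return value


def decode_2bit(value: int, k: int) -> str:
    """Unpack the k nucleotides stored in value, most significant pair first."""
    return "".join(LETTERS[(value >> (2 * (k - 1 - i))) & 3] for i in range(k))


print(encode_2bit("ACGT"))
# display: 27
print(decode_2bit(27, 4))
# display: ACGT
```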

Careful: the hash is based solely on the integer content, so two Kmers of distinct sizes can have the same integer encoding and thus the same hash.

kmer2 = Kmer.from_DNA(DNA("AAACGT"))
assert hash(kmer) == hash(kmer2)
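One way to see why this happens is to read the sequence as a base-4 number. The sketch below assumes A is encoded as the digit 0, so leading A nucleotides add nothing to the integer. The helper as_integer is hypothetical and only illustrates the arithmetic.

```python
# Assumed digit mapping: A=0, C=1, G=2, T=3 (reproduces 27 for ACGT).
TO_BASE4 = str.maketrans("ACGT", "0123")


def as_integer(seq: str) -> int:
    """Read a DNA string as a base-4 number (hypothetical helper)."""
    return int(seq.translate(TO_BASE4), 4)


print(as_integer("ACGT"), as_integer("AAACGT"))
# display: 27 27

# Pairing the size with the integer tells the two sequences apart.
print((4, as_integer("ACGT")) == (6, as_integer("AAACGT")))
# display: False
```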

Kmers can be converted to strings, are hashable, and can be used to build another Kmer by adding a nucleotide on the left or on the right.

print(repr(kmer))
# display: 4-mer(ACGT)
kmer3 = kmer.add_left_nucleotid('A')
print(repr(kmer3))
# display: 4-mer(AACG)
kmer4 = kmer3.add_left_nucleotid('A')
print(repr(kmer4))
# display: 4-mer(AAAC)
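On a 2-bit packed integer, adding a nucleotide while keeping the size fixed comes down to a shift and a mask. The sketch below assumes the mapping A=0, C=1, G=2, T=3. The functions add_left, add_right and to_string are hypothetical stand-ins, and the right-hand variant is assumed to mirror the left-hand one.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed mapping
LETTERS = "ACGT"


def add_left(value: int, k: int, base: str) -> int:
    """Prepend base and drop the rightmost nucleotide (hypothetical helper)."""
    return (CODES[base] << (2 * (k - 1))) | (value >> 2)


def add_right(value: int, k: int, base: str) -> int:
    """Append base and drop the leftmost nucleotide (assumed symmetric behaviour)."""
    mask = (1 << (2 * k)) - 1  # keep only the lowest 2*k bits
    return ((value << 2) | CODES[base]) & mask


def to_string(value: int, k: int) -> str:
    """Decode a packed integer back into its k nucleotides."""
    return "".join(LETTERS[(value >> (2 * (k - 1 - i))) & 3] for i in range(k))


acgt = 27  # ACGT under the assumed mapping
print(to_string(add_left(acgt, 4, "A"), 4))
# display: AACG
print(to_string(add_right(acgt, 4, "A"), 4))
# display: CGTA
```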

Index types

An index is either:

  • KmerIndex: a sorted array of (Kmer, integer) pairs, where the integers are 32-bit unsigned integers.
  • KmerSet: a sorted array of Kmers.

Both filter out duplicates: KmerSet has set semantics and KmerIndex has mapping semantics. They must be given a path to a file in which the index is stored. Under the hood, the indexes are simply sorted arrays backed by memory-mapped files. Careful: memory maps are not always portable (check carefully on Windows).
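The sorted-array idea can be sketched in memory with the standard bisect module. The class SortedIndexSketch below is hypothetical: it keeps plain Python lists instead of a memory-mapped file, uses packed integers as keys, and assumes that the last value wins when a key is repeated.

```python
from bisect import bisect_left


class SortedIndexSketch:
    """In-memory stand-in for a sorted-array index (hypothetical, not the vizibridge API)."""

    def __init__(self, pairs):
        # Deduplicate keys (assumption: the last value wins), then sort once.
        deduped = dict(pairs)
        self.keys = sorted(deduped)
        self.values = [deduped[key] for key in self.keys]

    def _find(self, key):
        """Binary search: return the position of key, or -1 if absent."""
        pos = bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return pos
        return -1

    def __contains__(self, key):
        return self._find(key) >= 0

    def __getitem__(self, key):
        pos = self._find(key)
        if pos < 0:
            raise KeyError(key)
        return self.values[pos]


# 1 and 6 are AC and CG under a 2-bit packing with A=0, C=1, G=2, T=3.
index = SortedIndexSketch([(1, 0), (6, 1), (1, 2)])
print(index[1])
# display: 2
print(27 in index)
# display: False
```

Lookups cost a binary search, and the sorted keys can be written to disk as one contiguous array, which is what makes this layout a natural fit for memory mapping.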

Two methods exist to build a KmerIndex:

  • build: takes an iterator over KmerIndexEntry (a dataclass with two fields, kmer and value).
  • build_dna: takes an iterator over DNAIndexEntry (a dataclass with two fields, dna and value).

The build_dna method unfolds each DNA value into Kmers through the enum_canonical_kmer method. It is more efficient when all Kmers of a DNA sequence can be associated with the same value. In addition, build_dna takes two integers that filter out some Kmers according to their value modulo a given number. This is useful when dispatching Kmers among several shards.
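The modulo filter can be pictured as follows. This sketch works on plain packed integers, and the names filter_for_shard, shard_index and shard_count are hypothetical; the exact meaning and order of the two integers expected by build_dna should be checked against the library.

```python
def filter_for_shard(kmer_values, shard_index: int, shard_count: int):
    """Keep the values whose remainder modulo shard_count equals shard_index (hypothetical helper)."""
    return [value for value in kmer_values if value % shard_count == shard_index]


values = [1, 6, 27, 108]  # packed Kmer integers, for illustration
shards = [filter_for_shard(values, i, 2) for i in range(2)]
print(shards)
# display: [[6, 108], [1, 27]]
```

Every value lands in exactly one shard, so building one index per shard covers the whole input without overlap.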

Here is a small usage example.

from vizibridge import DNA
some_kmer = list(DNA("ACGT").enum_canonical_kmer(2))
d = { kmer: i for i, kmer in enumerate(some_kmer) }
print(d)
# display: {2-mer(AC): 2, 2-mer(CG): 1}
# We have only two Kmers because AC occurs twice, at positions 0 and 2

from vizibridge import KmerIndex, KmerIndexEntry
from pathlib import Path
index = KmerIndex.build((KmerIndexEntry(kmer=k, val=i) for i,k in enumerate(some_kmer)), Path("/tmp/some_path"), 2) # 2 is the kmer-size
print(index[some_kmer[1]])
# display: 1

print(dict(index))
# display: {2-mer(AC): 2, 2-mer(CG): 1}

KmerSet follows the same principle, except that it has set semantics (hence no value is associated with a Kmer).

s = set(some_kmer)
print(s)
# display: {2-mer(AC), 2-mer(CG)}

from vizibridge import KmerSet
from pathlib import Path
index = KmerSet.build(iter(some_kmer), Path("/tmp/some_other_path"), 2) # 2 is the kmer-size
print(some_kmer[1] in index)
# display: True

print(set(index))
# display: {2-mer(AC), 2-mer(CG)}

TODO

  • Add Windows and macOS builds to the CI/CD
  • Integrate ggcat binding
  • Other tools?
