Vizibridge

This module is a maturin bridge between the Rust crate vizicomp, which contains compiled code for efficient genomic data manipulation, and the Python module Vizitig.

How to install vizibridge

The simplest way is to use pip, as vizibridge is deployed on PyPI:

pip install vizibridge

Alternatively, download the wheel from the latest release on GitLab.

If your architecture or system is not covered, you can compile the module locally as follows.

First install the Rust toolchain, then run:

cargo install maturin
maturin build --release

To install the module in your Python environment, then run:

pip install target/wheels/vizibridge**.whl

replacing ** with the appropriate name generated in the folder.

Publication to PyPI through CI/CD:

The CI/CD takes care of compiling everything, so pushing your changes is enough to build a new compiled module. To publish to PyPI, push a release tag:

git tag -a vx -m "Some description of the release to broadcast"
git push origin vx

Here vx is the version number, which should be kept in sync with the version declared in Cargo.toml.

Publication to PyPI:

First you need:

  • Docker installed
  • A token to push vizibridge to PyPI.

Then from the main directory of vizibridge run:

docker build --build-arg PYPI_TOKEN="YOUR_PYPI_TOKEN" .

What should be here

The actual computation should never be implemented within this repo, but always either in the vizicomp repo or in another repo that we would like to expose to the Python ecosystem. This repo is solely dedicated to the bridge, so that the efficient standalone Rust tooling is not polluted.

Quick documentation of Python API

The Python interface is composed of several components:

Base type

DNA type (vizibridge.DNA)

DNA is a Python class wrapper around vizibridge.rust_DNA, which is an encoding of a DNA string in Rust. The underlying data layout simply uses 2 bits per nucleotide. Building a DNA value from a string runs at roughly 1 GB/s.
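To make the 2-bit layout concrete, here is a pure-Python sketch of packing a DNA string into an integer. It does not use vizibridge; the code table (A=0, C=1, G=2, T=3) and the choice of putting the first nucleotide in the highest bits are assumptions, chosen because they reproduce the value 27 shown for ACGT in the Kmer section of this document.

```python
# Assumed code table: A=0, C=1, G=2, T=3 (not confirmed by the Rust source).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTERS = "ACGT"


def encode(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per nucleotide."""
    value = 0
    for nucleotide in seq:
        value = (value << 2) | CODE[nucleotide]
    return value


def decode(value: int, length: int) -> str:
    """Unpack `length` nucleotides from an integer built by `encode`."""
    letters = []
    for _ in range(length):
        letters.append(LETTERS[value & 0b11])  # lowest 2 bits = last nucleotide
        value >>= 2
    return "".join(reversed(letters))


print(encode("ACGT"))  # 27
print(decode(27, 4))   # ACGT
```

Note that the length has to be stored alongside the integer, since leading A nucleotides encode to zero bits.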

Its main purpose is to provide a way to enumerate Kmers efficiently through the enum_kmer and enum_canonical_kmer methods. It can also be converted back to a string.

from vizibridge import DNA
dna = DNA("ACGT"*10)
print(dna)
# display: ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
print(repr(next(dna.enum_canonical_kmer(4))))
# display: 4-mer(ACGT)
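For readers unfamiliar with canonical k-mers, the following pure-Python sketch shows one common definition: each k-mer is replaced by the smaller of itself and its reverse complement. The function names are hypothetical, and the lexicographic tie-breaking rule is an assumption; it happens to agree with the small examples in this document but may not be how the Rust code decides.

```python
from typing import Iterator

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every nucleotide, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(seq: str, k: int) -> Iterator[str]:
    """Yield the canonical form of every k-mer of `seq`, left to right."""
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        # Assumption: canonical = lexicographic minimum of the two strands.
        yield min(kmer, reverse_complement(kmer))


print(list(canonical_kmers("ACGT", 2)))  # ['AC', 'CG', 'AC']
```

The last 2-mer, GT, maps to AC because AC is its reverse complement, which is why a sequence can yield fewer distinct canonical k-mers than positions.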

Kmer types (vizibridge.Kmers)

Two kinds of types are defined in the Rust code: ShortKmer and LongKmer. ShortKmer encodes a Kmer on a 64-bit integer, while LongKmer uses a 128-bit integer. Each type comes with its own compiled code, so we have 64 different Rust-based KmerType.
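As a rough guide to why two integer widths are needed, the sketch below computes how many bits a k-mer takes at 2 bits per nucleotide and which of the two widths can hold it. This is plain arithmetic under that assumption; the exact cutoff used by the Rust code between ShortKmer and LongKmer is not stated here and may differ.

```python
def bits_needed(k: int) -> int:
    """Bits required for a k-mer at 2 bits per nucleotide."""
    return 2 * k


def smallest_width(k: int) -> int:
    """Smallest of the two integer widths (64 or 128 bits) that fits a k-mer."""
    for width in (64, 128):
        if bits_needed(k) <= width:
            return width
    raise ValueError(f"a {k}-mer does not fit in 128 bits")


print(smallest_width(32))  # 64
print(smallest_width(33))  # 128
print(smallest_width(64))  # 128
```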

The Python module helps with a class wrapper that provides a common interface: vizibridge.Kmer. A Kmer must be built from a DNA value.

from vizibridge import Kmer, DNA
kmer = Kmer.from_DNA(DNA("ACGT")) # a 4-Kmer
print(repr(kmer))
# display: 4-mer(ACGT)

The underlying integer can be found in the data field, which is used to compute the hash value of the Kmer.

print(kmer.data)
# display: 27 

Careful: the hash is based solely on the integer content, so two Kmers of distinct sizes can have the same integer encoding and thus the same hash.

kmer2 = Kmer.from_DNA(DNA("AAACGT"))
assert hash(kmer) == hash(kmer2)
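The collision comes from leading A nucleotides contributing only zero bits. The pure-Python sketch below reproduces it without vizibridge, assuming the code table A=0, C=1, G=2, T=3, and shows one possible workaround when sizes must be told apart: pairing the size with the integer. The composite key is a suggestion for user code, not something the library is known to do.

```python
# Assumed code table: A=0, C=1, G=2, T=3.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per nucleotide."""
    value = 0
    for nucleotide in seq:
        value = (value << 2) | CODE[nucleotide]
    return value


short_seq, long_seq = "ACGT", "AAACGT"

# Leading A's add only zero bits, so both strings encode to the same integer.
print(encode(short_seq), encode(long_seq))  # 27 27

# A (size, integer) pair distinguishes the two.
composite_keys = {(len(s), encode(s)) for s in (short_seq, long_seq)}
print(len(composite_keys))  # 2
```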

Kmers can be converted to strings, are hashable, and we can build another Kmer by appending a nucleotide on the left or on the right.

print(repr(kmer))
# display: 4-mer(ACGT)
kmer3 = kmer.add_left_nucleotid('A')
print(repr(kmer3))
# display: 4-mer(AACG)
kmer4 = kmer3.add_left_nucleotid('A')
print(repr(kmer4))
# display: 4-mer(AAAC)
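On a 2-bit encoding, appending a nucleotide while keeping the size fixed is a shift plus a mask. The sketch below is a pure-Python illustration of that idea; the function names `add_left` and `add_right` are hypothetical stand-ins, and the code table (A=0, C=1, G=2, T=3, first nucleotide in the highest bits) is an assumption rather than a description of the Rust implementation.

```python
# Assumed code table: A=0, C=1, G=2, T=3.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTERS = "ACGT"


def add_left(data: int, k: int, nucleotide: str) -> int:
    """Drop the rightmost nucleotide and insert a new one on the left."""
    return (CODE[nucleotide] << (2 * (k - 1))) | (data >> 2)


def add_right(data: int, k: int, nucleotide: str) -> int:
    """Drop the leftmost nucleotide and insert a new one on the right."""
    mask = (1 << (2 * k)) - 1  # keep only the lowest 2*k bits
    return ((data << 2) | CODE[nucleotide]) & mask


def decode(data: int, k: int) -> str:
    """Turn the integer back into a string of k nucleotides."""
    return "".join(LETTERS[(data >> (2 * i)) & 0b11] for i in reversed(range(k)))


acgt = 27  # encoding of ACGT under the assumed table
print(decode(add_left(acgt, 4, "A"), 4))                    # AACG
print(decode(add_left(add_left(acgt, 4, "A"), 4, "A"), 4))  # AAAC
print(decode(add_right(acgt, 4, "A"), 4))                   # CGTA
```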

Index types

Indexes are either:

  • KmerIndex: sorted arrays of pairs (Kmer, integer), where the integers are 32-bit unsigned integers.
  • KmerSet: sorted arrays of Kmer.

They filter out duplicates: KmerSet has set semantics and KmerIndex has mapping semantics. They must be given a path to a file in which the index is stored. Under the hood, the indexes are simply sorted arrays in memory-mapped files. Careful: memory maps are not always portable (check carefully on Windows).
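The sorted-array idea can be sketched in a few lines of pure Python: deduplicate, sort the keys once, then answer lookups by binary search. This leaves out the memory-mapped file entirely and works on plain integers instead of Kmer objects. The helper names are hypothetical, and the rule that the last value wins for a duplicate key is an assumption.

```python
from bisect import bisect_left
from typing import Iterable


def build_sorted_index(pairs: Iterable[tuple[int, int]]) -> tuple[list[int], list[int]]:
    """Return parallel sorted arrays (keys, values) with duplicate keys removed."""
    deduplicated = dict(pairs)  # assumption: the last value seen for a key wins
    keys = sorted(deduplicated)
    values = [deduplicated[key] for key in keys]
    return keys, values


def lookup(keys: list[int], values: list[int], key: int) -> int:
    """Binary search for `key`; raise KeyError if it is absent."""
    position = bisect_left(keys, key)
    if position < len(keys) and keys[position] == key:
        return values[position]
    raise KeyError(key)


keys, values = build_sorted_index([(6, 0), (27, 1), (6, 2)])
print(keys, values)            # [6, 27] [2, 1]
print(lookup(keys, values, 6)) # 2
```

A set-like index follows the same pattern with only the `keys` array, membership being the `keys[position] == key` test.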

Two methods exist to build a KmerIndex:

  • build: takes an iterator over KmerIndexEntry (a dataclass with two fields, kmer and value)
  • build_dna: takes an iterator over DNAIndexEntry (a dataclass with two fields, dna and value).

build_dna unfolds each DNA value into Kmers through the enum_canonical_kmer method. It is more efficient when all Kmers of a DNA sequence can be associated with one value. On top of that, build_dna takes two integers to filter out some Kmers according to their value modulo some number. This is useful when dispatching Kmers among several shards.
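The modulo-based dispatch can be pictured as follows. This is a guess at the intent rather than the actual build_dna signature: the sketch assumes the two integers are a shard index and a shard count, and that the test is applied to the integer encoding of each Kmer. All names are hypothetical.

```python
def belongs_to_shard(kmer_data: int, shard_index: int, shard_count: int) -> bool:
    """Keep a k-mer only if its integer encoding falls in the given shard."""
    # Assumption: sharding is `encoding modulo shard_count == shard_index`.
    return kmer_data % shard_count == shard_index


# Integer encodings of a few k-mers (plain integers for the example).
encoded_kmers = [1, 6, 27, 108]

shard_count = 2
shards = {
    index: [data for data in encoded_kmers if belongs_to_shard(data, index, shard_count)]
    for index in range(shard_count)
}
print(shards)  # {0: [6, 108], 1: [1, 27]}
```

Each k-mer lands in exactly one shard, so building one index per shard index covers the whole input without overlap.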

Here is a small example of usage.

from vizibridge import DNA
some_kmer = list(DNA("ACGT").enum_canonical_kmer(2))
d = { kmer: i for i, kmer in enumerate(some_kmer) }
print(d)
# display: {2-mer(AC): 2, 2-mer(CG): 1}
# We have only two Kmers because the canonical Kmer AC occurs twice, at positions 0 and 2

from vizibridge import KmerIndex, KmerIndexEntry
from pathlib import Path
index = KmerIndex.build((KmerIndexEntry(kmer=k, val=i) for i,k in enumerate(some_kmer)), Path("/tmp/some_path"), 2) # 2 is the kmer-size
print(index[some_kmer[1]])
# display: 1

print(dict(index))
# display: {2-mer(AC): 2, 2-mer(CG): 1}

KmerSet follows the same principle, except that it has set semantics (hence no value is associated with each Kmer).

s = set(some_kmer)
print(s)
# display: {2-mer(AC), 2-mer(CG)}

from vizibridge import KmerSet
from pathlib import Path
index = KmerSet.build(iter(some_kmer), Path("/tmp/some_other_path"), 2) # 2 is the kmer-size
print(some_kmer[1] in index)
# display: True

print(set(index))
# display: {2-mer(AC), 2-mer(CG)}

TODO

  • Add Windows and macOS compilation to the CI/CD
  • Integrate ggcat binding
  • Other tools?
