Vizibridge

This module is a maturin bridge between the Rust crate vizicomp, which contains compiled code for efficient genomic data manipulation, and the Python module Vizitig.

How to install vizibridge

The simplest way is to use pip, as vizibridge is deployed on PyPI:

pip install vizibridge

Alternatively, download the wheel from the latest release on GitLab: https://gitlab.inria.fr/cpaperma/vizibridge/-/releases/permalink/latest

If your architecture or system is not covered, you can also compile the module locally as follows.

First install the Rust toolchain, then run:

cargo install maturin
maturin build --release

To install the module in your Python environment, run:

pip install target/wheels/vizibridge**.whl

replacing ** with the appropriate name generated in the folder.

How to modify this module

The CI/CD takes care of compiling everything, so you can simply push your changes to create a new compiled module. To publish to PyPI, simply push a release tag:

git tag -a vx -m "Some description of the release to broadcast"
git push origin vx

Here vx is the version number, which should be kept in sync with the version declared in Cargo.toml.
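Since the tag and the Cargo.toml version have to agree, a small check before pushing can help. The following is only a sketch: the function names are hypothetical, the sample Cargo.toml content is made up for the illustration, and it assumes the tag is the version prefixed with v.

```python
import re

def cargo_version(cargo_toml: str) -> str:
    """Return the version declared in the [package] table of a Cargo.toml text."""
    # Isolate the [package] table so dependency versions are ignored.
    package = re.search(r"^\[package\](.*?)(?=^\[|\Z)", cargo_toml, re.S | re.M)
    if package is None:
        raise ValueError("no [package] table found")
    version = re.search(r'^version\s*=\s*"([^"]+)"', package.group(1), re.M)
    if version is None:
        raise ValueError("no version declared in [package]")
    return version.group(1)

def tag_matches(tag: str, cargo_toml: str) -> bool:
    """Compare a tag such as 'v0.4.4' with the Cargo.toml version."""
    return tag.removeprefix("v") == cargo_version(cargo_toml)

# Made-up Cargo.toml content, for illustration only.
SAMPLE = """\
[package]
name = "vizibridge"
version = "0.4.4"

[dependencies]
somecrate = { version = "1.0" }
"""

print(tag_matches("v0.4.4", SAMPLE))
# display: True
print(tag_matches("v0.4.5", SAMPLE))
# display: False
```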

What should be here

The actual computation should never be implemented in this repo, but always either in the vizicomp repo or in another repo that we would like to expose to the Python ecosystem. This repo is solely dedicated to providing the bridge, without polluting efficient standalone Rust tooling.

Quick documentation of Python API

The Python interface is composed of several components:

Base type

DNA type (vizibridge.DNA)

DNA is a Python class wrapper around vizibridge.rust_DNA, which is an encoding of a DNA string in Rust. The underlying data layout simply uses 2 bits per nucleotide. Building a DNA value from a string runs at roughly 1 GB/s.

Its main purpose is to provide a way to enumerate Kmers efficiently through the enum_kmer and enum_canonical_kmer methods. It can also be converted back to a string.

from vizibridge import DNA
dna = DNA("ACGT"*10)
print(dna)
# display: ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
print(repr(next(dna.enum_canonical_kmer(4))))
# display: 4-mer(ACGT)
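To make the notion of canonical Kmer concrete, here is a pure-Python sketch of what such an enumeration computes. It is not the vizibridge implementation (which lives in compiled Rust), and it assumes that "canonical" means the lexicographically smaller of a Kmer and its reverse complement, which is the usual convention.

```python
# Pure-Python illustration only; the helper names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def enum_canonical_kmer(seq: str, k: int):
    """Yield the canonical form of every k-mer of seq, from left to right."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: canonical = the lexicographically smaller strand.
        yield min(kmer, reverse_complement(kmer))

print(list(enum_canonical_kmer("ACGT", 2)))
# display: ['AC', 'CG', 'AC']
```

The last 2-mer, GT, is reported as AC because AC is its reverse complement and sorts first.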

Kmer types (vizibridge.Kmers)

Two kinds of types are defined in the Rust code: ShortKmer and LongKmer. ShortKmer encodes a Kmer on a 64-bit integer, while LongKmer uses a 128-bit integer. Each type comes with its own compiled code, so we have 64 different Rust-based Kmer types.

The Python module helps with a class wrapper that provides a common interface: vizibridge.Kmer. A Kmer must be built through the DNA class.

from vizibridge import Kmer, DNA
kmer = Kmer.from_DNA(DNA("ACGT")) # a 4-Kmer
print(repr(kmer))
# display: 4-mer(ACGT)

The underlying integer can be found in the data field, which is used to compute the hash value of the Kmer.

print(kmer.data)
# display: 27 
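The value 27 is what a 2-bit packing gives if the nucleotides are coded A=0, C=1, G=2, T=3 and the first nucleotide occupies the highest bits. That code table is an assumption chosen to reproduce the value above, not something read from the Rust source. A minimal pure-Python sketch:

```python
# Assumed code table, chosen because it reproduces the value 27 for "ACGT".
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per nucleotide."""
    data = 0
    for nucleotide in seq:
        # Shift what is already packed and append the new 2-bit code.
        data = (data << 2) | CODE[nucleotide]
    return data

print(encode("ACGT"))
# display: 27
print(format(encode("ACGT"), "08b"))
# display: 00011011
```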

Careful: the hash is based solely on the integer content, so two Kmers of distinct sizes can have the same integer encoding and thus the same hash.

kmer2 = Kmer.from_DNA(DNA("AAACGT"))
assert hash(kmer) == hash(kmer2)
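The collision comes from leading A nucleotides contributing only zero bits, so the integer alone does not determine the size. The sketch below decodes the same integer with two different sizes, again assuming the A=0, C=1, G=2, T=3 code table. Keying on a (size, data) pair is one hypothetical way to keep Kmers of different sizes apart in your own containers.

```python
LETTERS = "ACGT"  # assumed mapping: A=0, C=1, G=2, T=3

def decode(data: int, k: int) -> str:
    """Unpack k nucleotides from an integer, 2 bits each, leftmost first."""
    return "".join(
        LETTERS[(data >> (2 * (k - 1 - i))) & 0b11] for i in range(k)
    )

print(decode(27, 4))
# display: ACGT
print(decode(27, 6))
# display: AAACGT

# Hypothetical workaround when mixing sizes: key on (size, data).
keys = {(4, 27), (6, 27)}
print(len(keys))
# display: 2
```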

Kmers can be converted to strings, are hashable, and we can build another Kmer by appending a nucleotide on the left or on the right.

print(repr(kmer))
# display: 4-mer(ACGT)
kmer3 = kmer.add_left_nucleotid('A')
print(repr(kmer3))
# display: 4-mer(AACG)
kmer4 = kmer3.add_left_nucleotid('A')
print(repr(kmer4))
# display: 4-mer(AAAC)
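On a 2-bit packed integer, adding a nucleotide on one side while keeping the size fixed is a shift plus a mask. The sketch below illustrates the idea in pure Python. The function names and the code table (A=0, C=1, G=2, T=3) are assumptions, and this is not the Rust implementation.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit code table
LETTERS = "ACGT"

def add_left(data: int, k: int, nucleotide: str) -> int:
    """Prepend a nucleotide and drop the rightmost one (size stays k)."""
    return (data >> 2) | (CODE[nucleotide] << (2 * (k - 1)))

def add_right(data: int, k: int, nucleotide: str) -> int:
    """Append a nucleotide and drop the leftmost one (size stays k)."""
    mask = (1 << (2 * k)) - 1  # keep only the lowest 2*k bits
    return ((data << 2) | CODE[nucleotide]) & mask

def decode(data: int, k: int) -> str:
    """Unpack k nucleotides from an integer, leftmost first."""
    return "".join(
        LETTERS[(data >> (2 * (k - 1 - i))) & 0b11] for i in range(k)
    )

acgt = 27  # "ACGT" under the assumed code table
aacg = add_left(acgt, 4, "A")
print(decode(aacg, 4))
# display: AACG
print(decode(add_left(aacg, 4, "A"), 4))
# display: AAAC
print(decode(add_right(acgt, 4, "A"), 4))
# display: CGTA
```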

Index types

Indexes are either:

  • KmerIndex: sorted arrays of (Kmer, integer) pairs, where the integers are 32-bit unsigned integers.
  • KmerSet: sorted arrays of Kmers.

Both filter out duplicates: KmerSet has set semantics and KmerIndex has mapping semantics. They must be given a path to a file in which the index is stored. Under the hood, the indexes are simply sorted arrays backed by memory-mapped files. Careful: memory maps are not always portable (check carefully on Windows).
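To illustrate the "sorted array plus memory map" idea, here is a self-contained sketch using only the Python standard library. The record layout (a 64-bit key followed by a 32-bit unsigned value, little-endian), the function names, and the choice that the last value wins on duplicate keys are all assumptions made for the example. The actual on-disk format of vizibridge may differ.

```python
import mmap
import struct
import tempfile
from pathlib import Path

# Hypothetical record layout: 64-bit key, 32-bit unsigned value, no padding.
RECORD = struct.Struct("<QI")

def build(entries, path: Path) -> None:
    """Write (key, value) pairs sorted by key; the last value of a key wins."""
    unique = dict(entries)  # mapping semantics removes duplicate keys
    with path.open("wb") as f:
        for key in sorted(unique):
            f.write(RECORD.pack(key, unique[key]))

def lookup(buf, key: int):
    """Binary search over fixed-size records in a memory-mapped buffer."""
    lo, hi = 0, len(buf) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        found, value = RECORD.unpack_from(buf, mid * RECORD.size)
        if found == key:
            return value
        if found < key:
            lo = mid + 1
        else:
            hi = mid
    return None  # key is absent

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "index.bin"
    build([(1, 0), (6, 1), (1, 2)], path)
    with path.open("rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        print(lookup(buf, 1))
        # display: 2
        print(lookup(buf, 6))
        # display: 1
        print(lookup(buf, 5))
        # display: None
```

Because the records are sorted and have a fixed size, a lookup only touches a logarithmic number of pages of the file, and nothing has to be loaded into memory up front.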

Two methods exist to build a KmerIndex:

  • build: takes an iterator over KmerIndexEntry (a dataclass with two fields, kmer and value)
  • build_dna: takes an iterator over DNAIndexEntry (a dataclass with two fields, dna and value).

build_dna unfolds each DNA value into Kmers through the enum_canonical_kmer method. It is more efficient when all the Kmers of a DNA sequence can be associated with one value. On top of that, build_dna takes two integers to filter out some Kmers with respect to their value modulo something. This is useful when dispatching Kmers among several shards.
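The modulo filter can be pictured as follows. This is only a sketch of the general sharding idea: it assumes the filter is applied to the integer encoding of each Kmer, and the parameter names shard_index and shard_count are hypothetical, not the actual build_dna arguments.

```python
def belongs_to_shard(kmer_data: int, shard_index: int, shard_count: int) -> bool:
    """Keep a Kmer only if its integer encoding falls in the given residue class."""
    return kmer_data % shard_count == shard_index

# Integer encodings under the assumed 2-bit code table A=0, C=1, G=2, T=3.
kmers = {"AC": 1, "CG": 6, "GT": 11, "TA": 12}

shards = [
    [name for name, data in kmers.items() if belongs_to_shard(data, i, 2)]
    for i in range(2)
]
print(shards)
# display: [['CG', 'TA'], ['AC', 'GT']]
```

Every Kmer lands in exactly one shard, so each shard can build its own index file independently.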

Here is a small example of usage.

from vizibridge import DNA
some_kmer = list(DNA("ACGT").enum_canonical_kmer(2))
d = { kmer: i for i, kmer in enumerate(some_kmer) }
print(d)
# display: {2-mer(AC): 2, 2-mer(CG): 1}
# Only two Kmers remain because the canonical Kmer AC occurs twice, at positions 0 and 2

from vizibridge import KmerIndex, KmerIndexEntry
from pathlib import Path
index = KmerIndex.build((KmerIndexEntry(kmer=k, val=i) for i,k in enumerate(some_kmer)), Path("/tmp/some_path"), 2) # 2 is the kmer-size
print(index[some_kmer[1]])
# display: 1

print(dict(index))
# display: {2-mer(AC): 2, 2-mer(CG): 1}

KmerSet follows the same principle, except that it has set semantics (hence no value is associated with a Kmer).

s = set(some_kmer)
print(s)
# display: {2-mer(AC), 2-mer(CG)}

from vizibridge import KmerSet
from pathlib import Path
index = KmerSet.build(iter(some_kmer), Path("/tmp/some_other_path"), 2) # 2 is the kmer-size
print(some_kmer[1] in index)
# display: True

print(set(index))
# display: {2-mer(AC), 2-mer(CG)}

TODO

  • Add Windows and macOS compilation to the CI/CD
  • Integrate ggcat binding
  • Other tools?


Built Distributions

  • vizibridge-0.4.4-cp313-cp313-manylinux_2_34_x86_64.whl (1.3 MB): CPython 3.13, manylinux (glibc 2.34+), x86-64
  • vizibridge-0.4.4-cp312-cp312-manylinux_2_34_x86_64.whl (1.3 MB): CPython 3.12, manylinux (glibc 2.34+), x86-64
  • vizibridge-0.4.4-cp311-cp311-manylinux_2_34_x86_64.whl (1.3 MB): CPython 3.11, manylinux (glibc 2.34+), x86-64
  • vizibridge-0.4.4-cp310-cp310-manylinux_2_34_x86_64.whl (1.3 MB): CPython 3.10, manylinux (glibc 2.34+), x86-64
  • vizibridge-0.4.4-cp39-cp39-manylinux_2_34_x86_64.whl (1.3 MB): CPython 3.9, manylinux (glibc 2.34+), x86-64
