Vizibridge

This module is a maturin bridge between the Rust crate vizicomp, which contains compiled code for efficient genomic data manipulation, and the Python module Vizitig.

How to install vizibridge

The simplest way is to use pip, as vizibridge is deployed on PyPI:

pip install vizibridge

Alternatively, download the wheel from the latest release on GitLab.

If your architecture or system is not covered, you can also compile the module locally as follows.

First install the Rust toolchain, then run:

cargo install maturin
maturin build --release

To install the module in your Python environment, then run:

pip install target/wheels/vizibridge**.whl

replacing ** with the appropriate name generated in the folder.

Publication to PyPI through CI/CD:

The CI/CD takes care of compiling everything, so you can simply push the content to create a new compiled module. To publish to PyPI, push a release tag:

git tag -a vx -m "Some description of the release to broadcast"
git push origin vx

Here vx is the version number, which should be kept in sync with the version declared in Cargo.toml.

Publication to PyPI:

First, you need:

  • Docker installed
  • A token to push vizibridge to PyPI.

Then from the main directory of vizibridge run:

docker build --build-arg PYPI_TOKEN="YOUR_PYPI_TOKEN" .

What should be here

The actual computation should never be performed within this repo, but always either in the vizicomp repo or in another repo that we would like to expose to the Python ecosystem. This repo is solely dedicated to providing the bridge without polluting efficient standalone Rust tooling.

Quick documentation of Python API

The Python interface is composed of several components:

Base type

DNA type (vizibridge.DNA)

DNA is a Python class wrapper around vizibridge.rust_DNA, which is an encoding of a DNA string in Rust. The underlying data layout simply uses 2 bits per nucleotide. Building a DNA object from a string runs at roughly 1 GB/s.

Its main purpose is to provide a way to enumerate k-mers efficiently through the enum_kmer and enum_canonical_kmer methods. It can also be converted back to a string.

from vizibridge import DNA
dna = DNA("ACGT"*10)
print(dna)
# display: ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
print(repr(next(dna.enum_canonical_kmer(4))))
# display: 4-mer(ACGT)
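To make the notion of canonical k-mer concrete, here is a minimal pure-Python sketch that does not use vizibridge. It assumes that the canonical form is the lexicographically smaller of a k-mer and its reverse complement, which is consistent with the outputs shown in this document. The function names are hypothetical, and the real enumeration is done in compiled Rust.

```python
# Pure-Python illustration only; vizibridge does this work in compiled Rust.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every nucleotide, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def enum_canonical_kmer_sketch(seq: str, k: int):
    """Yield the canonical form of each k-mer of seq, from left to right.

    Assumption: canonical means the lexicographically smaller of the
    k-mer and its reverse complement.
    """
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        yield min(kmer, reverse_complement(kmer))


print(list(enum_canonical_kmer_sketch("ACGT", 2)))
# ['AC', 'CG', 'AC']  (GT is replaced by its reverse complement AC)
```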

Kmer types (vizibridge.Kmers)

Two kinds of types are defined in the Rust code: ShortKmer and LongKmer. ShortKmer encodes a k-mer on a 64-bit integer, while LongKmer uses a 128-bit integer. Each type comes with its own compiled code, so there are 64 different Rust-based Kmer types.
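As a rough illustration of why two integer widths are needed, the sketch below computes the smallest width that can hold k nucleotides at 2 bits each. The function name is hypothetical, and the exact size limits used by the Rust code are an assumption here: the crate may reserve bits or use different cutoffs.

```python
def kmer_storage_bits(k: int) -> int:
    """Smallest integer width (64 or 128) that fits k nucleotides at 2 bits each."""
    needed = 2 * k
    if k < 1 or needed > 128:
        raise ValueError(f"k={k} does not fit in a 128-bit integer")
    return 64 if needed <= 64 else 128


for k in (4, 31, 32, 33, 64):
    print(k, kmer_storage_bits(k))
```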

The Python module helps with a class wrapper that provides a common interface: vizibridge.Kmer. A Kmer must be built through the DNA class.

from vizibridge import Kmer, DNA
kmer = Kmer.from_DNA(DNA("ACGT")) # a 4-Kmer
print(repr(kmer))
# display: 4-mer(ACGT)

The underlying integer can be found in the data field, which is used to compute the hash value of the Kmer.

print(kmer.data)
# display: 27

Careful: the hash is based solely on the integer content, so two k-mers of distinct sizes can have the same integer encoding and thus the same hash.

kmer2 = Kmer.from_DNA(DNA("AAACGT"))
assert hash(kmer) == hash(kmer2)
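The collision can be reproduced without vizibridge. The sketch below packs a string at 2 bits per nucleotide; the mapping A=0, C=1, G=2, T=3 and the choice of putting the first nucleotide in the highest bits are assumptions, picked because they reproduce the value 27 displayed above for ACGT.

```python
# Assumed mapping, chosen because it reproduces kmer.data == 27 for "ACGT".
NUCLEOTIDE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_2bit(seq: str) -> int:
    """Pack a nucleotide string into an integer, first nucleotide in the highest bits."""
    value = 0
    for nucleotide in seq:
        value = (value << 2) | NUCLEOTIDE_CODE[nucleotide]
    return value


print(encode_2bit("ACGT"))    # 27  (binary 00 01 10 11)
print(encode_2bit("AAACGT"))  # 27  (leading A's only add zero bits)
```

Under this encoding, any number of leading A's leaves the integer unchanged, which is why the size must be compared separately when it matters.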

Kmers can be converted to strings, are hashable, and we can build another Kmer by adding a nucleotide on the left or on the right.

print(repr(kmer))
# display: 4-mer(ACGT)
kmer3 = kmer.add_left_nucleotid('A')
print(repr(kmer3))
# display: 4-mer(AACG)
kmer4 = kmer3.add_left_nucleotid('A')
print(repr(kmer4))
# display: 4-mer(AAAC)
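On the integer encoding, adding a nucleotide while keeping the size fixed amounts to a shift and a mask. The following sketch shows one way this could work; all names are hypothetical, the 2-bit mapping is assumed, and the right-hand variant is an assumption since only the left-hand method is shown above.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit mapping
LETTERS = "ACGT"


def encode(seq: str) -> int:
    """Pack a nucleotide string, first nucleotide in the highest bits."""
    value = 0
    for nucleotide in seq:
        value = (value << 2) | CODES[nucleotide]
    return value


def decode(value: int, k: int) -> str:
    """Read k nucleotides back out, highest bits first."""
    return "".join(LETTERS[(value >> (2 * i)) & 3] for i in reversed(range(k)))


def add_left(value: int, k: int, nucleotide: str) -> int:
    """Drop the rightmost nucleotide and place the new one in the top slot."""
    return (value >> 2) | (CODES[nucleotide] << (2 * (k - 1)))


def add_right(value: int, k: int, nucleotide: str) -> int:
    """Drop the leftmost nucleotide and append the new one in the lowest slot."""
    mask = (1 << (2 * k)) - 1
    return ((value << 2) | CODES[nucleotide]) & mask


acgt = encode("ACGT")
print(decode(add_left(acgt, 4, "A"), 4))   # AACG
print(decode(add_right(acgt, 4, "A"), 4))  # CGTA
```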

Index types

Indexes are either:

  • KmerIndex: sorted arrays of (Kmer, integer) pairs, where the integers are 32-bit unsigned integers.
  • KmerSet: sorted arrays of Kmer.

Both filter out duplicates: KmerSet has set semantics and KmerIndex has mapping semantics. They must be given a path to a file in which the index is stored. Under the hood, the indexes are simply sorted arrays stored in memory-mapped files. Careful: memory maps are not always portable (check carefully on Windows).
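The idea of a sorted array in a memory-mapped file can be sketched with the Python standard library alone. This is not the vizibridge file format: the record layout (64-bit key followed by a 32-bit unsigned value), the function names, and the rule that a later duplicate key overwrites an earlier one are all assumptions made for illustration.

```python
import mmap
import struct
import tempfile
from pathlib import Path

RECORD = struct.Struct("<QI")  # assumed layout: 64-bit key, 32-bit unsigned value


def build_index(pairs, path: Path) -> None:
    """Write (key, value) pairs sorted by key; a later duplicate key wins."""
    unique = dict(pairs)  # mapping semantics: one value per key
    with open(path, "wb") as handle:
        for key in sorted(unique):
            handle.write(RECORD.pack(key, unique[key]))


def lookup(path: Path, key: int):
    """Binary-search the memory-mapped file; return the value or None."""
    with open(path, "rb") as handle, mmap.mmap(
        handle.fileno(), 0, access=mmap.ACCESS_READ
    ) as view:
        low, high = 0, len(view) // RECORD.size
        while low < high:
            middle = (low + high) // 2
            found_key, value = RECORD.unpack_from(view, middle * RECORD.size)
            if found_key == key:
                return value
            if found_key < key:
                low = middle + 1
            else:
                high = middle
    return None


with tempfile.TemporaryDirectory() as folder:
    index_path = Path(folder) / "index.bin"
    build_index([(27, 0), (6, 1), (27, 2)], index_path)
    print(lookup(index_path, 27))  # 2
    print(lookup(index_path, 6))   # 1
    print(lookup(index_path, 5))   # None
```

Because the records are sorted and have a fixed size, a lookup touches only a logarithmic number of pages of the file, and nothing has to be loaded into memory up front.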

Two methods exist to build a KmerIndex:

  • build: takes an iterator over KmerIndexEntry objects (a dataclass with two fields, kmer and value)
  • build_dna: takes an iterator over DNAIndexEntry objects (a dataclass with two fields, dna and value).

The build_dna method unfolds each DNA value into k-mers through the enum_canonical_kmer method. It is more efficient when all the k-mers of a DNA sequence can be associated with one value. On top of that, build_dna takes two integers that are used to filter out some k-mers according to their value modulo a given integer. This is useful when dispatching k-mers among several shards.
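The modulo filter can be pictured as follows. This is a sketch under assumptions: the function and parameter names are hypothetical, the integer values are made-up encodings, and it assumes that the filter is applied to the integer encoding of each k-mer. The actual argument names and order of build_dna are not given here.

```python
def belongs_to_shard(kmer_data: int, shard_index: int, shard_count: int) -> bool:
    """Keep a k-mer only if its integer encoding falls in this shard's residue class."""
    return kmer_data % shard_count == shard_index


kmer_values = [6, 27, 108, 177]  # hypothetical integer encodings
shard_count = 2
shards = [
    [value for value in kmer_values if belongs_to_shard(value, index, shard_count)]
    for index in range(shard_count)
]
print(shards)  # [[6, 108], [27, 177]]
```

Each k-mer lands in exactly one shard, so the shards can be built independently and a lookup only needs to query the shard given by the same modulo.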

Here is a small usage example.

from vizibridge import DNA
some_kmer = list(DNA("ACGT").enum_canonical_kmer(2))
d = { kmer: i for i, kmer in enumerate(some_kmer) }
print(d)
# display: {2-mer(AC): 2, 2-mer(CG): 1}
# Only two k-mers remain because AC occurs twice, at positions 0 and 2 (the canonical form of GT is AC)

from vizibridge import KmerIndex, KmerIndexEntry
from pathlib import Path
index = KmerIndex.build((KmerIndexEntry(kmer=k, val=i) for i,k in enumerate(some_kmer)), Path("/tmp/some_path"), 2) # 2 is the kmer-size
print(index[some_kmer[1]])
# display: 1

print(dict(index))
# display: {2-mer(AC): 2, 2-mer(CG): 1}

KmerSet follows the same principle, except that it has set semantics (hence no value is associated with each Kmer).

s = set(some_kmer)
print(s)
# display: {2-mer(AC), 2-mer(CG)}

from vizibridge import KmerSet
from pathlib import Path
index = KmerSet.build(iter(some_kmer), Path("/tmp/some_other_path"), 2) # 2 is the kmer-size
print(some_kmer[1] in index)
# display: True

print(set(index))
# display: {2-mer(AC), 2-mer(CG)}

TODO

  • Add Windows and macOS compilation to the CI/CD
  • Integrate ggcat binding
  • Other tools?
