
Perfect hash based index for genome data.


aindex: perfect hash based index for genomic data
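A perfect hash function maps every k-mer of a fixed, known set to a distinct integer. A minimal one maps n k-mers onto exactly 0..n-1, so per-k-mer data such as frequencies can be stored in flat arrays indexed by that id. The Python API below exposes such ids through get_kid_by_kmer and get_kmer_by_kid. The toy sketch that follows shows the idea using the simple hash-and-displace construction, with BLAKE2b as the hash function. It is an assumption-level illustration, not aindex's actual implementation. The names build_mphf, mphf_lookup and frequency are hypothetical.

```python
# Toy minimal perfect hash (hash-and-displace) for a static k-mer set.
# Illustration only: not aindex's implementation; all names are hypothetical.
import hashlib
from typing import List


def seeded_hash(key: str, seed: int) -> int:
    """64-bit BLAKE2b hash of key; a different seed gives an unrelated hash."""
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def build_mphf(keys: List[str]) -> List[int]:
    """Build a displacement table for distinct keys.

    Entry >= 1: seed used to rehash the keys of that bucket.
    Entry < 0: the bucket's single key goes straight to slot -(entry + 1).
    """
    n = len(keys)
    buckets: List[List[str]] = [[] for _ in range(n)]
    for key in keys:
        buckets[seeded_hash(key, 0) % n].append(key)

    disp = [0] * n
    taken = [False] * n
    # Place the largest buckets first, while most slots are still free.
    order = sorted(range(n), key=lambda b: len(buckets[b]), reverse=True)
    for b in order:
        bucket = buckets[b]
        if len(bucket) <= 1:
            break  # every remaining bucket holds at most one key
        seed = 1
        while True:
            slots = [seeded_hash(key, seed) % n for key in bucket]
            if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                break
            seed += 1
        for s in slots:
            taken[s] = True
        disp[b] = seed

    # Single-key buckets take the leftover free slots directly.
    free = [s for s in range(n) if not taken[s]]
    for b in order:
        if len(buckets[b]) == 1:
            disp[b] = -free.pop() - 1
    return disp


def mphf_lookup(disp: List[int], key: str) -> int:
    """Return the id in [0, n) of a key from the build set."""
    n = len(disp)
    d = disp[seeded_hash(key, 0) % n]
    return -d - 1 if d < 0 else seeded_hash(key, d) % n


# Index the 3-mers of two tiny reads.
reads = ["ACGTACGT", "TTACGTAA"]
k = 3
kmers = sorted({r[i:i + k] for r in reads for i in range(len(r) - k + 1)})
disp = build_mphf(kmers)

kmer_by_id = [""] * len(kmers)  # lets us reject k-mers that were never indexed
tf = [0] * len(kmers)           # k-mer frequencies, indexed by k-mer id
for kmer in kmers:
    kmer_by_id[mphf_lookup(disp, kmer)] = kmer
for r in reads:
    for i in range(len(r) - k + 1):
        tf[mphf_lookup(disp, r[i:i + k])] += 1


def frequency(kmer: str) -> int:
    kid = mphf_lookup(disp, kmer)
    return tf[kid] if kmer_by_id[kid] == kmer else 0


print(frequency("ACG"), frequency("GGG"))  # 3 0
```

A minimal perfect hash returns some id for any input, including k-mers that were never indexed. That is why the sketch stores each k-mer at its id and checks it before returning a frequency.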


Installation

Requirements:

jellyfish 2

(easy to install with apt install jellyfish or with conda install bioconda::jellyfish)
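Because Jellyfish 2 is a requirement, it can help to check that a Jellyfish binary is on PATH before you build an index. The sketch below is a hypothetical helper, not part of aindex. It assumes that jellyfish --version prints a string such as "jellyfish 2.3.0".

```python
# Hypothetical helper (not part of aindex): look for a Jellyfish 2 binary on PATH.
import shutil
import subprocess
from typing import Optional


def jellyfish_version(executable: str = "jellyfish") -> Optional[str]:
    """Return the text printed by `<executable> --version`, or None if not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    version = jellyfish_version()
    if version is None:
        print("jellyfish was not found on PATH")
    elif not version.startswith("jellyfish 2"):  # assumes output like "jellyfish 2.3.0"
        print(f"expected Jellyfish 2, found: {version}")
    else:
        print(f"OK: {version}")
```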

Installation with pip:

pip install aindex2

To install the package from source, or if no prebuilt package is available for your system, run the following commands:

git clone https://github.com/ad3002/aindex.git
cd aindex
make
pip install .

This will create the necessary executables in the bin directory.

To uninstall:

pip uninstall aindex2

To clean up the compiled files, run:

make clean

Mac Compilation Command

The Makefile does not currently support macOS, but you can try compiling the Python wrapper manually with the following command:

g++ -c -std=c++11 -fPIC python_wrapper.cpp -o python_wrapper.o && g++ -c -std=c++11 -fPIC kmers.cpp debrujin.cpp hash.cpp read.cpp settings.cpp && g++ -shared -Wl,-install_name,python_wrapper.so -o python_wrapper.so python_wrapper.o kmers.o debrujin.o hash.o read.o settings.o

Usage

Compute all binary index arrays for the paired-end test reads:

FASTQ1=./tests/raw_reads.101bp.IS350bp25_1.fastq
FASTQ2=./tests/raw_reads.101bp.IS350bp25_2.fastq
OUTPUT_PREFIX=./tests/raw_reads.101bp.IS350bp25

compute_aindex.py -i $FASTQ1,$FASTQ2 -t fastq -o $OUTPUT_PREFIX --lu 2 -P 30
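If you drive index construction from a Python pipeline, you can assemble the same command as an argument list. The sketch below is a hypothetical wrapper that uses only the flags from the example above. The parameter names lu and p simply mirror the --lu and -P flags, and the sketch does not attempt to describe what those flags mean.

```python
# Hypothetical wrapper around the compute_aindex.py command shown above.
# Only the flags from that example are used; their semantics are defined
# by compute_aindex.py itself, not here.
import shlex
import subprocess
from typing import List, Sequence


def compute_aindex_command(fastq_files: Sequence[str], output_prefix: str,
                           lu: int = 2, p: int = 30) -> List[str]:
    """Build the argument list; FASTQ paths are joined with commas for -i."""
    return [
        "compute_aindex.py",
        "-i", ",".join(fastq_files),
        "-t", "fastq",
        "-o", output_prefix,
        "--lu", str(lu),
        "-P", str(p),
    ]


def run_compute_aindex(fastq_files: Sequence[str], output_prefix: str) -> None:
    """Run the command; raises CalledProcessError if it exits with an error."""
    subprocess.run(compute_aindex_command(fastq_files, output_prefix), check=True)


if __name__ == "__main__":
    cmd = compute_aindex_command(
        ["./tests/raw_reads.101bp.IS350bp25_1.fastq",
         "./tests/raw_reads.101bp.IS350bp25_2.fastq"],
        "./tests/raw_reads.101bp.IS350bp25",
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))  # dry run: print, don't execute
```

Passing a list to subprocess.run avoids shell quoting problems with unusual paths. A comma inside a FASTQ path would still be ambiguous, however, because -i takes a comma-separated list.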

Usage from Python

You can run demo.py, or use the library directly:

import aindex

prefix_path = "tests/raw_reads.101bp.IS350bp25"
kmer2tf = aindex.get_aindex(prefix_path)

kmer = "A"*23
rkmer = "T"*23
kid = kmer2tf.get_kid_by_kmer(kmer)
print(kmer2tf.get_kmer_info_by_kid(kid))
print(kmer2tf[kmer], kid, kmer2tf.get_kmer_by_kid(kid), len(kmer2tf.pos(kmer)), kmer2tf.get_strand(kmer), kmer2tf.get_strand(rkmer))
kmer = kmer2tf.get_read(0, 23, 0)
pos = kmer2tf.pos(kmer)[0]
print(pos)

print(kmer2tf.get_kid_by_kmer(kmer), kmer2tf.get_kid_by_kmer(rkmer))

print(kmer2tf.get_hash_size())

print(kmer2tf.get_read(0, 123, 0))

print(kmer2tf.get_read(0, 123, 1))


k = 23
for p in kmer2tf.pos(kmer):
    print(kmer2tf.get_read(p, p + k))

test_kmer = "TAAGTTATTATTTAGTTAATACT"
right_kmer = "AGTTAATACTTTTAACAATATTA"

print(kmer2tf[kmer])

sequence = kmer2tf.get_read(0, 1023, 0)

print("Task 1. Get kmer frequency")
for i, (kmer, tf) in enumerate(kmer2tf.iter_sequence_kmers(sequence)):
    print(f"Position {i} kmer {kmer} freq = {tf}")
  
print("Task 2. Iter reads one by one, print the first 20 reads")
for rid, read in kmer2tf.iter_reads():
    if rid == 20:
        break
    print(rid, read)

print("Task 3. Iter reads by kmer, returns (read id, position in read, read, all_positions)")
for rid, pos, read, poses in aindex.iter_reads_by_kmer(test_kmer, kmer2tf):
    print(read[pos:pos + k])


print("Task 4. Iter reads by sequence, returns (read, position in read, read, all_positions)")
sequence = "AATATTATTAAGGTATTTAAAAAATACTATTATAGTATTTAACATA"
for read in aindex.iter_reads_by_sequence(sequence, kmer2tf):
    print(read)

print("Task 5. Iter reads by sequence within a Hamming distance, returns (read, position in read, read, all_positions, hamming distance). Note that the first kmer is used as the seed.")
sequence = "AATATTATTAAGGTATTTAAAAAATACTATTATAGTATTTAACATA"
for read in aindex.iter_reads_by_sequence(sequence, kmer2tf, hd=10):
    print(read)

print("Task 6. Iter reads by sequence within a Hamming or edit distance, returns (read, position in read, read, all_positions, hamming distance). Note that the first kmer is used as the seed.")
sequence = "AATATTATTAAGGTATTTAAAAAATACTATTATAGTATTTAACATA"
for read in aindex.iter_reads_by_sequence(sequence, kmer2tf, hd=10):
    print(read)

for read in aindex.iter_reads_by_sequence(sequence, kmer2tf, ed=10):
    print(read)


print("Task 7. Get distances in reads for two kmers, returns (rid, start, end, length, fragment, is_gapped, is_reversed) tuples.")
for rid, start, end, length, fragment, is_gapped, is_reversed in aindex.get_left_right_distances(test_kmer, right_kmer, kmer2tf):
    print(rid, start, end, length, fragment, is_gapped, is_reversed)

print("Task 8. Get layout for kmer, returns (max_pos, reads, lefts, rights, rids, starts), for details see source code")
max_pos, reads, lefts, rights, rids, starts = aindex.get_layout_from_reads(right_kmer, kmer2tf)
print("Central layout:")
for read in reads:
    print(read)
print("Left flanks:")
print(lefts)
print("Right flanks:")
print(rights)
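Tasks 1 and 5 rest on two simple ideas. The first is sliding a k-length window along a sequence. The second is using the first k-mer of a query as a seed, then checking the full alignment by Hamming distance. The pure-Python sketch below shows both on an in-memory list of reads. It is an illustration of the concept under stated assumptions: it ignores reverse-complement matches and does not reproduce the exact tuples that aindex returns. The function names are hypothetical.

```python
# Hypothetical seed-and-verify sketch; not aindex's implementation.
from typing import Iterator, List, Tuple


def sliding_kmers(sequence: str, k: int) -> Iterator[Tuple[int, str]]:
    """Yield (position, k-mer) for every k-length window of the sequence."""
    for i in range(len(sequence) - k + 1):
        yield i, sequence[i:i + k]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def reads_within_hamming(sequence: str, reads: List[str], k: int,
                         hd: int) -> List[Tuple[int, int, int]]:
    """Find reads containing the first k-mer of `sequence` (the seed), align
    `sequence` at each seed hit, and keep hits within Hamming distance `hd`.
    Returns (read id, start in read, distance) tuples; forward strand only."""
    seed = sequence[:k]
    hits = []
    for rid, read in enumerate(reads):
        start = read.find(seed)
        while start != -1:
            window = read[start:start + len(sequence)]
            if len(window) == len(sequence):  # skip hits too close to the read end
                d = hamming(window, sequence)
                if d <= hd:
                    hits.append((rid, start, d))
            start = read.find(seed, start + 1)
    return hits


reads = ["GGACGTTTCA", "ACGTATCAGG", "TTTTTTTTTT"]
print(list(sliding_kmers("ACGTA", 3)))  # [(0, 'ACG'), (1, 'CGT'), (2, 'GTA')]
print(reads_within_hamming("ACGTTTCA", reads, k=4, hd=1))  # [(0, 2, 0), (1, 0, 1)]
```

Seeding on one exact k-mer keeps the search cheap, because only reads that share the seed are verified. The trade-off is that a mismatch inside the seed itself hides an otherwise close read.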



Download files

Source Distribution

aindex2-1.1.3.tar.gz (13.6 kB)

Uploaded Source

Built Distributions

aindex2-1.1.3-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (884.9 kB)

Uploaded PyPy, manylinux: glibc 2.17+ x86-64

aindex2-1.1.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (884.9 kB)

Uploaded PyPy, manylinux: glibc 2.17+ x86-64

aindex2-1.1.3-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (884.9 kB)

Uploaded PyPy, manylinux: glibc 2.17+ x86-64

aindex2-1.1.3-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (884.9 kB)

Uploaded PyPy, manylinux: glibc 2.17+ x86-64

aindex2-1.1.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (884.8 kB)

Uploaded CPython 3.12, manylinux: glibc 2.17+ x86-64

aindex2-1.1.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (884.8 kB)

Uploaded CPython 3.11, manylinux: glibc 2.17+ x86-64

aindex2-1.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (884.8 kB)

Uploaded CPython 3.10, manylinux: glibc 2.17+ x86-64

aindex2-1.1.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (884.8 kB)

Uploaded CPython 3.9, manylinux: glibc 2.17+ x86-64

aindex2-1.1.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (884.8 kB)

Uploaded CPython 3.8, manylinux: glibc 2.17+ x86-64

aindex2-1.1.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (884.8 kB)

Uploaded CPython 3.7m, manylinux: glibc 2.17+ x86-64

aindex2-1.1.3-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (518.8 kB)

Uploaded CPython 3.6m, manylinux: glibc 2.17+ x86-64

File details

Details for the file aindex2-1.1.3.tar.gz.

File metadata

  • Download URL: aindex2-1.1.3.tar.gz
  • Size: 13.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.10

File hashes

Hashes for aindex2-1.1.3.tar.gz
Algorithm Hash digest
SHA256 ca939b7bd0ad14dd47c5818b61727e11f63d01711675419a48a020ad29e2e628
MD5 ab9b9cffd5eff9dc21676c81b3846ae2
BLAKE2b-256 22d63359291a6c751fd4338b04605d23b92b4cda1900ad50dd60181f5d71a479

File details

Details for the file aindex2-1.1.3-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for aindex2-1.1.3-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d589e1fdeb8e1648b896f30ba510cc75b49410502e56a7a0c7e607bc6a0164fa
MD5 83aeab22c78b4c073e9808b9b806aec4
BLAKE2b-256 4ff2526e480f82ecb1e4659ddc6816499e9c318374c45ade8e8b75b1d3c74b6e

File details

Details for the file aindex2-1.1.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for aindex2-1.1.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 bfc15146fc7709e5e022483bb30668281ace0aae30fc18df71fa9318a1b14354
MD5 10e38db86b5fc7674634b25fcf1c0244
BLAKE2b-256 12ba68160efe7e7dff8f49f248a5e01d0e52d0ff7f7edc053f4684cfd85f51f2

File details

Details for the file aindex2-1.1.3-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for aindex2-1.1.3-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 555715cff740ae4ca24312696777b47b61f8149f3d4421fa6f4192fdd05f1547
MD5 78f43d865a6d554f3d4960083e22fdd0
BLAKE2b-256 3d115912edc33aed3a9badeb7751c5e84adfd0d4a24b8135387c8134cee149b0

File details

Details for the file aindex2-1.1.3-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for aindex2-1.1.3-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 dd21ba536ad3fd4d6377eaef68a362cb0d400632cc424b024718c84e79a4b8f3
MD5 3405db1f11b00bdcaa63dcc619114b0b
BLAKE2b-256 7bd7246341d9949da08eec858b47fab52d1ac9da2c0baafd2a6f6f0567cea9eb

File details

Details for the file aindex2-1.1.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for aindex2-1.1.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c7ca35c56f86b2b4b7b5eb7addf6b399fbdafef4ebe7ed3a5de20ead5577a32a
MD5 ae1a975ab59266a5406cccfd476884df
BLAKE2b-256 e1e25683552951b70d49e72897f02dfb1de868feccd6b81d5ff06dc7d9e5717f

File details

Details for the file aindex2-1.1.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for aindex2-1.1.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b8a05d2ee48acd1aa06dab82003aee367d94e695199579d7da27564d8f514739
MD5 8855e74a80e6381494f04f2da42b48b6
BLAKE2b-256 3c8367a4293f47daf09416869b28696712e144369fa3dfd32c1855333046e21c

File details

Details for the file aindex2-1.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for aindex2-1.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6c37415133915112c5c49060025f05d7286946f811a649953f30fd7fc8e347f9
MD5 9e066efc87e2159e086b331deb2c9d64
BLAKE2b-256 fed23c971d6d650a2decdf267083e34df3d6633a20576c9cfda4ea37861310cc

File details

Details for the file aindex2-1.1.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for aindex2-1.1.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 f280774f301d426083a982567204b432f43b6bdd41f3741defbcfbaddf69826f
MD5 615af5afa0913e84d5465a1665f4d381
BLAKE2b-256 242bb847047cf172a9ff2849f6393315a9ff6f14606adb11ce546e00958e54fc

File details

Details for the file aindex2-1.1.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for aindex2-1.1.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a8e726c6d7435e641934892c0c29ead3777399f0f84095818809292a1c40d6ad
MD5 be91910428f8d10a58b3c10ded5abf28
BLAKE2b-256 47df8c5d9f0f4cefac526aa9e64fd8f3c52e9ec6c945b8aedaa8f1f4504759ec

File details

Details for the file aindex2-1.1.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for aindex2-1.1.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a0defddbd0a79fb3d998ab083972115226cff2ce0e6376d2e45d223f0e5e5830
MD5 04d60770ac7a1783e3d3ce983cc41731
BLAKE2b-256 2dec737c6c72c71ced0c6f35f39a429cb1f24ef0a871f341631a26f2c4080acd

File details

Details for the file aindex2-1.1.3-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for aindex2-1.1.3-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a87b684766b7227139c0920cd036a6a6be8d0726dc456ce2974b689d38d4b118
MD5 ea73f3e48951f0673508a49a7d823e8c
BLAKE2b-256 ba2bbf3566501c44b2cfcebe8492225dafb1b3cb7b41d7a920e5c479a84477f9
