Perfect hash based index for genome data.
aindex: perfect hash based index for genomic data
Installation
Requirements:
jellyfish (easy to install with apt install jellyfish or with conda install bioconda::jellyfish)
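Before installing, it can help to confirm that the jellyfish executable is actually visible on your PATH. The snippet below is a minimal sketch using only the Python standard library; the helper name find_jellyfish is hypothetical and is not part of aindex.

```python
import shutil


def find_jellyfish(executable="jellyfish"):
    """Return the full path of the executable, or None if it is not on PATH.

    Hypothetical helper for illustration; aindex does not provide it.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_jellyfish()
    if path is None:
        print("jellyfish not found; install it with apt or conda first")
    else:
        print(f"jellyfish found at {path}")
```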
Installation with pip:
pip install aindex2
If you want to install the package from source, or no pip build is available for your system, run the following commands:
git clone https://github.com/ad3002/aindex.git
cd aindex
make
pip install .
This will create the necessary executables in the bin directory.
To uninstall:
pip uninstall aindex2
To clean up the compiled files, run:
make clean
Mac Compilation Command
Building with the Makefile is currently unsupported on macOS, but you can try to compile the Python wrapper manually with the following command:
g++ -c -std=c++11 -fPIC python_wrapper.cpp -o python_wrapper.o && \
g++ -c -std=c++11 -fPIC kmers.cpp kmers.hpp debrujin.cpp debrujin.hpp hash.cpp hash.hpp read.cpp read.hpp settings.hpp settings.cpp && \
g++ -shared -Wl,-install_name,python_wrapper.so -o python_wrapper.so python_wrapper.o kmers.o debrujin.o hash.o read.o settings.o
Usage
Compute all binary arrays:
FASTQ1=./tests/raw_reads.101bp.IS350bp25_1.fastq
FASTQ2=./tests/raw_reads.101bp.IS350bp25_2.fastq
OUTPUT_PREFIX=./tests/raw_reads.101bp.IS350bp25
compute_aindex.py -i $FASTQ1,$FASTQ2 -t fastq -o $OUTPUT_PREFIX --lu 2 -P 30
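If you drive the indexing step from a Python pipeline, the same command line can be assembled programmatically. This is only a sketch: the function name build_compute_aindex_command and its parameter names are hypothetical. Only the flags shown in the command above are used, and because the meaning of --lu and -P is not described here, the values are simply passed through.

```python
import shlex


def build_compute_aindex_command(fastq_files, output_prefix, lu=2, big_p=30):
    """Assemble the compute_aindex.py argument list shown in the usage example.

    Hypothetical helper: it only mirrors the flags from the example command
    (-i, -t, -o, --lu, -P) and does not validate their values.
    """
    return [
        "compute_aindex.py",
        "-i", ",".join(fastq_files),  # input files are joined with commas
        "-t", "fastq",
        "-o", output_prefix,
        "--lu", str(lu),
        "-P", str(big_p),
    ]


if __name__ == "__main__":
    cmd = build_compute_aindex_command(
        ["./tests/raw_reads.101bp.IS350bp25_1.fastq",
         "./tests/raw_reads.101bp.IS350bp25_2.fastq"],
        "./tests/raw_reads.101bp.IS350bp25",
    )
    # Print the command for inspection; pass the list to subprocess.run(cmd, check=True)
    # to execute it once compute_aindex.py is installed.
    print(shlex.join(cmd))
```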
Usage from Python
You can simply run demo.py or:
import aindex

# Load the index files written by compute_aindex.py under this prefix
prefix_path = "tests/raw_reads.101bp.IS350bp25"
kmer2tf = aindex.get_aindex(prefix_path)

kmer = "A"*23
rkmer = "T"*23  # reverse complement of kmer

# Look up the k-mer id, then query the index by id and by k-mer
kid = kmer2tf.get_kid_by_kmer(kmer)
print(kmer2tf.get_kmer_info_by_kid(kid))
print(kmer2tf[kmer], kid, kmer2tf.get_kmer_by_kid(kid), len(kmer2tf.pos(kmer)), kmer2tf.get_strand(kmer), kmer2tf.get_strand(rkmer))

# Take a k-mer straight from the indexed reads and find its first position
kmer = kmer2tf.get_read(0, 23, 0)
pos = kmer2tf.pos(kmer)[0]
print(pos)
print(kmer2tf.get_kid_by_kmer(kmer), kmer2tf.get_kid_by_kmer(rkmer))
print(kmer2tf.get_hash_size())
print(kmer2tf.get_read(0, 123, 0))
print(kmer2tf.get_read(0, 123, 1))

# Print the k-mer at every position where it occurs
k = 23
for p in kmer2tf.pos(kmer):
    print(kmer2tf.get_read(p, p+k))

test_kmer = "TAAGTTATTATTTAGTTAATACT"
right_kmer = "AGTTAATACTTTTAACAATATTA"
print(kmer2tf[kmer])

sequence = kmer2tf.get_read(0, 1023, 0)

print("Task 1. Get kmer frequency")
for i, (kmer, tf) in enumerate(kmer2tf.iter_sequence_kmers(sequence)):
    print(f"Position {i} kmer {kmer} freq = {tf}")

print("Task 2. Iter read by read, print the first 20 reads")
for rid, read in kmer2tf.iter_reads():
    if rid == 20:
        break
    print(rid, read)

print("Task 3. Iter reads by kmer, returns (read id, position in read, read, all_positions)")
for rid, pos, read, poses in aindex.iter_reads_by_kmer(test_kmer, kmer2tf):
    print(read[pos:pos+k])

print("Task 4. Iter reads by sequence, returns (read, position in read, read, all_positions)")
sequence = "AATATTATTAAGGTATTTAAAAAATACTATTATAGTATTTAACATA"
for read in aindex.iter_reads_by_sequence(sequence, kmer2tf):
    print(read)

print("Task 5. Iter reads by sequence over hamming distance, returns (read, position in read, read, all_positions, hamming distance). Note that the first kmer is used as the seed.")
sequence = "AATATTATTAAGGTATTTAAAAAATACTATTATAGTATTTAACATA"
for read in aindex.iter_reads_by_sequence(sequence, kmer2tf, hd=10):
    print(read)

print("Task 6. Iter reads by sequence over hamming distance or edit distance, returns (read, position in read, read, all_positions, hamming distance). Note that the first kmer is used as the seed.")
sequence = "AATATTATTAAGGTATTTAAAAAATACTATTATAGTATTTAACATA"
for read in aindex.iter_reads_by_sequence(sequence, kmer2tf, hd=10):
    print(read)
for read in aindex.iter_reads_by_sequence(sequence, kmer2tf, ed=10):
    print(read)

print("Task 7. Get distances in reads for two kmers, returns a list of (rid, start, end, length, fragment, is_gapped, is_reversed) tuples.")
for rid, start, end, length, fragment, is_gapped, is_reversed in aindex.get_left_right_distances(test_kmer, right_kmer, kmer2tf):
    print(rid, start, end, length, fragment, is_gapped, is_reversed)

print("Task 8. Get layout for kmer, returns (max_pos, reads, lefts, rights, rids, starts), for details see source code")
max_pos, reads, lefts, rights, rids, starts = aindex.get_layout_from_reads(right_kmer, kmer2tf)
print("Central layout:")
for read in reads:
    print(read)
print("Left flanks:")
print(lefts)
print("Right flanks:")
print(rights)
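The example above relies on three ideas: a k-mer and its reverse complement ("A"*23 versus "T"*23), sliding a window of k = 23 across a sequence, and Hamming distance between sequences. The following pure-Python sketch illustrates those ideas without touching the index. The helper names are hypothetical and are not part of the aindex API, and aindex's own implementations may differ.

```python
# Translation table for DNA complement; N is kept as N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def iter_kmers(sequence, k=23):
    """Yield (position, kmer) for every window of length k in sequence."""
    for i in range(len(sequence) - k + 1):
        yield i, sequence[i:i + k]


def hamming_distance(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    sequence = "AATATTATTAAGGTATTTAAAAAATACTATTATAGTATTTAACATA"
    for pos, kmer in iter_kmers(sequence, k=23):
        print(pos, kmer, reverse_complement(kmer))
```

With a loaded index, each k-mer produced by iter_kmers could be passed to kmer2tf[kmer] to look up its frequency, as in Task 1 above.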