Perfect hash based index for genome data.
aindex
Perfect hash based index for text data.
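This page does not describe how aindex builds its perfect hash. As a rough illustration of the idea only, the sketch below uses a generic hash-and-displace construction, a textbook technique that is not necessarily the one aindex uses. It gives every distinct k-mer its own slot in a dense array that stores the k-mer's frequency. The names `build_mphf`, `slot_of`, `build_kmer_index` and `frequency`, and the choice of BLAKE2b as the seeded hash, are assumptions of this sketch.

```python
import hashlib
from collections import Counter


def seeded_hash(seed: int, key: str, m: int) -> int:
    """Deterministic hash of key into range(m); different seeds give different hashes."""
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % m


def build_mphf(keys):
    """Hash-and-displace: pick one seed per bucket so all keys land in distinct slots 0..n-1."""
    n = len(keys)
    r = max(1, n // 2)  # number of buckets
    buckets = [[] for _ in range(r)]
    for key in keys:
        buckets[seeded_hash(0, key, r)].append(key)
    displacement = [0] * r
    taken = [False] * n
    # Place large buckets first, while most slots are still free.
    for b in sorted(range(r), key=lambda i: len(buckets[i]), reverse=True):
        if not buckets[b]:
            break  # the remaining buckets are empty too
        d = 1
        while True:  # try seeds until the whole bucket fits without collisions
            slots = [seeded_hash(d, key, n) for key in buckets[b]]
            if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                break
            d += 1
        for s in slots:
            taken[s] = True
        displacement[b] = d
    return displacement


def slot_of(key: str, displacement, n: int) -> int:
    """Slot for a key from the build set; any other key maps to some arbitrary slot."""
    d = displacement[seeded_hash(0, key, len(displacement))]
    return seeded_hash(d, key, n)


def build_kmer_index(sequence: str, k: int):
    """Count k-mers and store each count at the slot given by the perfect hash."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    keys = list(counts)
    displacement = build_mphf(keys)
    kmers = [""] * len(keys)  # slot -> k-mer, used to reject absent k-mers
    freqs = [0] * len(keys)   # slot -> frequency
    for kmer, count in counts.items():
        s = slot_of(kmer, displacement, len(keys))
        kmers[s], freqs[s] = kmer, count
    return displacement, kmers, freqs


def frequency(kmer: str, displacement, kmers, freqs) -> int:
    """Frequency of kmer, or 0 if it is not in the index."""
    if not kmers:
        return 0
    s = slot_of(kmer, displacement, len(kmers))
    return freqs[s] if kmers[s] == kmer else 0


displacement, kmers, freqs = build_kmer_index("ACGTACGTTT", 3)
print(frequency("ACG", displacement, kmers, freqs), frequency("AAA", displacement, kmers, freqs))  # prints: 2 0
```

Because the hash is perfect for the stored k-mers, a lookup is one bucket read plus one slot read with no collision chains. Keeping the k-mer itself in its slot is what lets the sketch answer 0 for k-mers that were never indexed.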
Installation
With pip:
pip install aindex2
If you want to install the package from source, or no pip package is available for your system, run:
git clone https://github.com/ad3002/aindex.git
cd aindex
make
pip install .
This will create the necessary executables in the bin directory.
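To confirm the installation from Python, a check along the following lines can help. It assumes the distribution name is `aindex2` (the name used with pip above) and the import name is `aindex` (the name used in the Python usage example); `check_installed` is a hypothetical helper, not part of the package.

```python
import importlib.metadata
import importlib.util


def check_installed(dist_name: str = "aindex2", module_name: str = "aindex"):
    """Return the installed version of dist_name, or None if it is missing or its module cannot be found."""
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None
    # The distribution metadata may exist even if the module itself is not importable.
    if importlib.util.find_spec(module_name) is None:
        return None
    return version


print(check_installed())
```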
To uninstall:
pip uninstall aindex2
To clean up the compiled files, run:
make clean
Mac Compilation Command
The Makefile does not currently support macOS, but you can try to compile the Python wrapper manually with the following command:
g++ -c -std=c++11 -fPIC python_wrapper.cpp -o python_wrapper.o && g++ -c -std=c++11 -fPIC kmers.cpp debrujin.cpp hash.cpp read.cpp settings.cpp && g++ -shared -Wl,-install_name,python_wrapper.so -o python_wrapper.so python_wrapper.o kmers.o debrujin.o hash.o read.o settings.o
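If you prefer to script the manual macOS build, the sketch below assembles the same three steps as the command above (compile the wrapper, compile the other sources, link the shared library) and runs them with `subprocess`. Header files are only included by the sources, so they are not passed to the compiler here. The source list is copied from the command; `build_commands`, `build_python_wrapper` and the `dry_run` flag are hypothetical, and the commands are assumed to run in the directory that contains the sources.

```python
import shlex
import subprocess

SOURCES = ["kmers.cpp", "debrujin.cpp", "hash.cpp", "read.cpp", "settings.cpp"]
CXXFLAGS = ["-c", "-std=c++11", "-fPIC"]


def build_commands(cxx: str = "g++"):
    """Return the compile and link commands for python_wrapper.so, in order."""
    objects = ["python_wrapper.o"] + [s.replace(".cpp", ".o") for s in SOURCES]
    return [
        [cxx, *CXXFLAGS, "python_wrapper.cpp", "-o", "python_wrapper.o"],
        [cxx, *CXXFLAGS, *SOURCES],  # writes one .o per source into the current directory
        [cxx, "-shared", "-Wl,-install_name,python_wrapper.so", "-o", "python_wrapper.so", *objects],
    ]


def build_python_wrapper(cxx: str = "g++", dry_run: bool = True):
    """Print each command and, unless dry_run is set, run it, stopping at the first failure like &&."""
    for cmd in build_commands(cxx):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


build_python_wrapper(dry_run=True)
```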
Usage
Compute all binary arrays (run from the repository root, because the paths below are relative):
FASTQ1=./tests/raw_reads.101bp.IS350bp25_1.fastq
FASTQ2=./tests/raw_reads.101bp.IS350bp25_2.fastq
OUTPUT_PREFIX=tests/raw_reads.101bp.IS350bp25
time python scripts/compute_aindex.py -i $FASTQ1,$FASTQ2 -t fastq -o $OUTPUT_PREFIX --lu 2 -P 30
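The options of `compute_aindex.py` are not documented on this page. Conceptually, building a k-mer index starts by counting k-mers across all reads; the pure-Python sketch below does that for plain four-line FASTQ records. It is an illustration only: `read_fastq`, `count_kmers`, the skipping of k-mers containing `N`, and the `min_count` cutoff are assumptions of this sketch and are not claimed to correspond to `--lu` or any other option of the script.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[str]:
    """Yield sequences from a FASTQ stream of 4-line records (header, sequence, '+', quality)."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield seq


def count_kmers(reads: Iterable[str], k: int = 23, min_count: int = 1) -> Counter:
    """Count every k-mer of every read, skipping k-mers with N, and drop counts below min_count."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return Counter({kmer: c for kmer, c in counts.items() if c >= min_count})


demo = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTNACG\n+\nIIIIIIII\n")
print(count_kmers(read_fastq(demo), k=3))
```

For real files, open each FASTQ and chain the two `read_fastq` generators before counting.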
Usage from Python
You can run demo.py, or use the index directly:
import aindex

# Load the index built by compute_aindex.py with this output prefix
prefix_path = "tests/raw_reads.101bp.IS350bp25"
index = aindex.get_aindex(prefix_path)

kmer = "A"*23
rkmer = "T"*23  # reverse complement of kmer
kid = index.get_kid_by_kmer(kmer)  # k-mer id
print(index.get_kmer_info_by_kid(kid))
# frequency, id, k-mer recovered from the id, number of positions, strand of each k-mer
print(index[kmer], kid, index.get_kmer_by_kid(kid), len(index.pos(kmer)), index.get_strand(kmer), index.get_strand(rkmer))
pos = index.pos(kmer)[0]
print(pos)
print(index.get_kid_by_kmer(kmer), index.get_kid_by_kmer(rkmer))
print(index.get_hash_size())
print(index.get_read(0, 123, 0))
print(index.get_read(0, 123, 1))

k = 23
# Print the text at the first position of this k-mer only
for p in index.pos("GCAGCTCAGCAGGACGGCCAACC"):
    print(index.get_read(p, p+k))
    break
print(index["GCAGCTCAGCAGGACGGCCAACC"])

# Print the first k-mer of a stored sequence and its frequency
sequence = index.get_read(0, 1023, 0)
for kmer, tf in index.iter_sequence_kmers(sequence):
    print(kmer, tf)
    break
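The example above queries both `kmer` and its reverse complement `rkmer` and asks for the strand of each. Assuming the index, like many k-mer tools, treats a k-mer and its reverse complement as one canonical entry (this page does not say so explicitly), the helpers below show how a canonical form and a strand flag can be computed. `reverse_complement`, `canonical` and the "+"/"-" strand convention are hypothetical and may not match what `get_strand` returns.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str):
    """Return (canonical_kmer, strand): the lexicographically smaller of kmer and its
    reverse complement, with '+' if kmer itself was chosen and '-' otherwise."""
    rc = reverse_complement(kmer)
    return (kmer, "+") if kmer <= rc else (rc, "-")


kmer = "A" * 23
rkmer = reverse_complement(kmer)
print(rkmer == "T" * 23, canonical(kmer)[1], canonical(rkmer)[1])  # prints: True + -
```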
k = 23
sequence = ("TAAGTTATTATTTAGTTAATACTTTTAACAATATTATTAAGGTATTTAAAAAATACTATTATAGTATTTAACATAGTTAAATACCTTCCTTAATACTGTTAAATTATATTCAATCAATACATATATAATATTATTAAAATACTTGATAAGTATTATTTAGATATTAGACAAATACTAATTTTATATTGCTTTAATACTTAATAAATACTACTTATGTATTAAGTAAATATTACTGTAATACTAATAACAATATTATTACAATATGCTAGAATAATATTGCTAGTATCAATAATTACTAATATAGTATTAGGAAAATACCATAATAATATTTCTACATAATACTAAGTTAATACTATGTGTAGAATAATAAATAATCAGATTAAAAAAATTTTATTTATCTGAAACATATTTAATCAATTGAACTGATTATTTTCAGCAGTAATAATTACATATGTACATAGTACATATGTAAAATATCATTAATTTCTGTTATATATAATAGTATCTATTTTAGAGAGTATTAATTATTACTATAATTAAGCATTTATGCTTAATTATAAGCTTTTTATGAACAAAATTATAGACATTTTAGTTCTTATAATAAATAATAGATATTAAAGAAAATAAAAAAATAGAAATAAATATCATAACCCTTGATAACCCAGAAATTAATACTTAATCAAAAATGAAAATATTAATTAATAAAAGTGAATTGAATAAAATTTTGAAAAAAATGAATAACGTTATTATTTCCAATAACAAAATAAAACCACATCATTCATATTTTTTAATAGAGGCAAAAGAAAAAGAAATAAACTTTTATGCTAACAATGAATACTTTTCTGTCAAATGTAATTTAAATAAAAATATTGATATTCTTGAACAAGGCTCCTTAATTGTTAAAGGAAAAATTTTTAACGATCTTATTAATGGCATAAAAGAAGAGATTATTACTATTCAAGAAAAAGATCAAACACTTTTGGTTAAAACAAAAAAAACAAGTATTAATTTAAACACAATTAATGTGAATGAATTTCCAAGAATAAGGTTTAATGAAAAAAACGATTTAAGTGAATTTAATCAATTCAAAATAAATTATTCACTTTTAGTAAAAGGCATTAAAAAAATTTTTCACTCAGTTTCAAATAATCGTGAAATATCTTCTAAATTTAATGGAGTAAATTTCAATGGATCCAATGGAAAAGAAATATTTTTAGAAGCTTCTGACACTTATAAACTATCTGTTTTTGAGATAAAGCAAGAAACAGAACCATTTGATTTCATTTTGGAGAGTAATTTACTTAGTTTCATTAATTCTTTTAATCCTGAAGAAGATAAATCTATTGTTTTTTATTACAGAAAAGATAATAAAGATAGCTTTAGTACAGAAATGTTGATTTCAATGGATAACTTTATGATTAGTTACACATCGGTTAATGAAAAATTTCCAGAGGTAAACTACTTTTTTGAATTTGAACCTGAAACTAAAATAGTTGTTCAAAAAAATGAATTAAAAGATGCACTTCAAAGAATTCAAACTTTGGCTCAAAATGAAAGAACTTTTTTATGCGATATGCAAATTAACAGTTCTGAATTAAAAATAAGAGCTATTGTTAATAATATCGGAAATTCTCTTGAGGAAATTTCTTGTCTTAAATTTGAAGGTTATAAACTTAATATTTCTTTTAACCCAAGTTCTCTATTAGATCACATAGAGTCTTTTGAATCAAATGAAATAAATTTTGATTTCCAAGGAAATAGTAAGTATTTTTTGATAACCTCTAAAAGTGAACCTGAACTTAAGCAAATATTGGTTCCTTCAAGATAATGAATCTTTACGATCTTTTAGAACTACCAACTACAGCATCAATAAAAGAAATAAAAATTGCTTATAAAAGATTAGCAAAGCGTTATCACCCTGATGTAAATAAATTAGGTTCGCAAACTTTTGTTGAAATTAATAATGCTTATTCAATATTAAGTGATCC"
            "TAACCAAAAGGAAAAATATGATTCAATGCTGAAAGTTAATGATTTTCAAAATCGCATCAAAAATTTAGATATTAGTGTTAGATGACATGAAAATTTCATGGAAGAACTCGAACTTCGTAAGAACTGAGAATTTGATTTTTTTTCATCTGATGAAGATTTCTTTTATTCTCCATTTACAAAAA")
test_kmer = "TAAGTTATTATTTAGTTAATACT"
right_kmer = "AGTTAATACTTTTAACAATATTA"
# Assumption: these helper functions are importable from the aindex package.
from aindex import iter_reads_by_kmer, get_left_right_distances, get_layout_for_kmer, iter_reads_by_sequence

print("Task 1. Get k-mer frequency")
input("\nReady?")
for i in range(len(sequence) - k + 1):
    kmer = sequence[i:i+k]
    print("Position %s kmer %s freq = %s" % (i, kmer, index[kmer]))

print("Task 2. Iterate over reads and print the first 20")
input("\nReady?")
for i, read in enumerate(index.iter_reads()):
    if i == 20:
        break
    print(i, read)

print("Task 3. Iterate over reads containing a k-mer; returns (start, next_read_start, read, pos_if_uniq|None, all_poses)")
input("\nReady?")
for read in iter_reads_by_kmer(test_kmer, index):
    print(read)

print("Task 4. Get distances in reads for two k-mers; returns a list of (rid, left_kmer_pos, right_kmer_pos) tuples")
input("\nReady?")
print(get_left_right_distances(test_kmer, right_kmer, index))

print("Task 5. Get the layout for a k-mer; returns (max_pos, reads, lefts, rights, rids, starts), see the source code for details")
input("\nReady?")
max_pos, reads, lefts, rights, rids, starts = get_layout_for_kmer(right_kmer, index)
print("Central layout:")
for read in reads:
    print(read)
print("Left flanks:")
print(lefts)
print("Right flanks:")
print(rights)

print("Task 6. Iterate over reads by sequence; returns (start, next_read_start, read, pos_if_uniq|None, all_poses)")
input("\nReady?")
sequence = "AATATTATTAAGGTATTTAAAAAATACTATTATAGTATTTAACATA"
for read in iter_reads_by_sequence(sequence, index):
    print(read)
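Tasks 3 and 4 report read starts, next-read starts and per-read k-mer positions, which suggests that reads are stored back to back in one text and that global positions are mapped back to reads. The toy model below is an assumption made for illustration, not aindex's storage format: it joins reads with a hypothetical `~` separator, finds the read covering a global position with binary search (`bisect`), and builds (rid, left_pos, right_pos) tuples in the spirit of Task 4. The class `ReadStore` and its methods are hypothetical, and a linear text scan stands in for the hash lookup.

```python
from bisect import bisect_right


class ReadStore:
    """Toy read store: reads joined into one string; starts[i] is the offset of read i."""

    SEP = "~"  # hypothetical separator, assumed never to occur inside a read

    def __init__(self, reads):
        reads = list(reads)
        self.starts = []
        offset = 0
        for read in reads:
            self.starts.append(offset)
            offset += len(read) + len(self.SEP)
        self.text = self.SEP.join(reads)

    def read_at(self, pos):
        """(rid, start, next_read_start, read) for the read covering global position pos."""
        rid = bisect_right(self.starts, pos) - 1
        start = self.starts[rid]
        end = self.text.find(self.SEP, start)
        if end == -1:  # the last read runs to the end of the text
            end = len(self.text)
        return rid, start, end + len(self.SEP), self.text[start:end]

    def iter_reads(self):
        """Yield reads in storage order."""
        for start in self.starts:
            yield self.read_at(start)[3]

    def positions(self, kmer):
        """Global start positions of kmer (linear scan standing in for the index)."""
        hits, i = [], self.text.find(kmer)
        while i != -1:
            hits.append(i)
            i = self.text.find(kmer, i + 1)
        return hits

    def left_right_distances(self, left_kmer, right_kmer):
        """(rid, left_pos, right_pos) for each read containing both k-mers (first hit of each, read coordinates)."""
        result = []
        for rid, read in enumerate(self.iter_reads()):
            left, right = read.find(left_kmer), read.find(right_kmer)
            if left != -1 and right != -1:
                result.append((rid, left, right))
        return result


store = ReadStore(["ACGTTGCA", "TTTT", "GCAACGTT"])
print(store.read_at(10))                           # prints: (1, 9, 14, 'TTTT')
print(store.left_right_distances("ACG", "GCA"))  # prints: [(0, 0, 5), (2, 3, 0)]
```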
Built Distributions
File details
Details for the file aindex2-1.0.3-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: aindex2-1.0.3-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 475.3 kB
- Tags: PyPy, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c598cbcbade18b0986f9abd3d18d917c518c3f90c6ba1e1d79d8b5f602d8fe7f |
| MD5 | 9bf41e665a0b46470fbc2dce73a4edde |
| BLAKE2b-256 | 70940e19dd8572a4cd00dfdf75f8deaaf6041420a5be1be4f5421b6fae26a81a |
File details
Details for the file aindex2-1.0.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: aindex2-1.0.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 475.3 kB
- Tags: PyPy, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7bed70f4c1d96f8114b9c2a54d24c15ab2750077523638bddaa98e25f04b4893 |
| MD5 | 6c5229ec7b0f0e8e78142525b8e0811f |
| BLAKE2b-256 | 1420c14fc8d26b96b690383b352eabde41ad321495dab62754f751f35323fc70 |
File details
Details for the file aindex2-1.0.3-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: aindex2-1.0.3-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 475.3 kB
- Tags: PyPy, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 66c14abc641000acb467b3751b9e050bee0b016010c8ddeed4e3c91e965ebf30 |
| MD5 | dbcb0205df60941724423073f295753c |
| BLAKE2b-256 | 1e8d6b7188b6ec0281aa475a577b24b3b3f03d3a85ee235be3a9c0fcf916a904 |
File details
Details for the file aindex2-1.0.3-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: aindex2-1.0.3-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 475.3 kB
- Tags: PyPy, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7552ece6f5a22059f5a9ef6e64113f8cf52c347998e233e86dbccffed42e4362 |
| MD5 | 125c4b0a76eec2f4726873b66c881538 |
| BLAKE2b-256 | 150e5d2bcba2eca2824185ea306b71594d225127c5f84f06ab9dc1e4bcc7fa1c |
File details
Details for the file aindex2-1.0.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: aindex2-1.0.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 475.3 kB
- Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e51380bcf3b8404d3f3970bc05fb485c4a002a08f5877538d365ec072f30b49a |
| MD5 | 5e24d07e46a417327aa9f811e7fb3e87 |
| BLAKE2b-256 | ec7c7febf72c6081315a900e312384b884d5139db9c422d593e21839dd57cb88 |
File details
Details for the file aindex2-1.0.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: aindex2-1.0.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 475.3 kB
- Tags: CPython 3.11, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8ea4fd26af1d77bf3f5cfa5ac6677b6f238e1cff6497167e613d0e9b37904662 |
| MD5 | a85d6e353dc0c7521e9e2909bac2d45b |
| BLAKE2b-256 | db0ec6c93c60071c1fe116ea8e0e140c30028dcc26a234f8300c9b3067e4e543 |
File details
Details for the file aindex2-1.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: aindex2-1.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 475.3 kB
- Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 57107c5f1d04dcbcfcfa7c026f8122a90ba22309651e5281a78861f4ef47fb7e |
| MD5 | e18926339dfaa3be7af33875e010fbd4 |
| BLAKE2b-256 | c0f78d42fb8b721c9097fd8581ac0d0d63bde341f80edd9f1dd78f88af4aaa12 |
File details
Details for the file aindex2-1.0.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: aindex2-1.0.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 475.3 kB
- Tags: CPython 3.9, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a2b88aa14116b32f0d7bf9121066b0c1e6656f2f698daaa10a22185f5b06b299 |
| MD5 | 48597269c75e9556a31972bf29b247eb |
| BLAKE2b-256 | 9faef081a6793624347ab760124512e15da77887274d4efa265f36089f11babb |
File details
Details for the file aindex2-1.0.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: aindex2-1.0.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 475.3 kB
- Tags: CPython 3.8, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 24c58730b771ce2007cd7ee921d4fefc6d9f61cd6cacf991ad38bbdff0e4f209 |
| MD5 | cee980c7a3463b32cdd0a908c6f0b5bd |
| BLAKE2b-256 | 95ba04319af6596603241ee53dc9eb3f7f912f9a4f14040346fa4e07d2bbf2e3 |
File details
Details for the file aindex2-1.0.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: aindex2-1.0.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 475.3 kB
- Tags: CPython 3.7m, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 49659be9a635a6e261ef926b2be6b1601534187e05fbc948d95626638adb2764 |
| MD5 | 5bb677a082dfab8708d0d7767b4b0a26 |
| BLAKE2b-256 | 3eac4530c42ecb89e1fb61c459fe051b47d2cf1bd1345c9d074a863d906c5eac |
File details
Details for the file aindex2-1.0.3-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: aindex2-1.0.3-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 475.3 kB
- Tags: CPython 3.6m, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 021bb143dcc4744d7bae50b5e82764a7dff0acbed8a0deeb0a26cd7b086190aa |
| MD5 | c5edc025b07c9f4819b2d80a4b969961 |
| BLAKE2b-256 | 34c164c2d01ed519b7ffa74c506f8d3c5e4fb92305cb63ce567cf8808eed640c |
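To check a downloaded wheel against the digests listed above, you can hash it locally with the standard `hashlib` module. The helper `digest_of_file` is hypothetical; it reads the file in chunks and accepts "sha256", "md5" or "blake2b-256" (BLAKE2b with a 32-byte digest).

```python
import hashlib


def digest_of_file(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hex digest of a file, read in chunks so large files need not fit in memory."""
    if algorithm == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)  # 256-bit BLAKE2b
    else:
        h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Example: the expected value is the SHA256 entry listed above for the CPython 3.10 wheel.
# expected = "57107c5f1d04dcbcfcfa7c026f8122a90ba22309651e5281a78861f4ef47fb7e"
# assert digest_of_file("aindex2-1.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl") == expected
```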