Perfect hash based index for genome data.

Project description

aindex

Perfect hash based index for text data.
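As a rough illustration of what "perfect hash" means here, the sketch below brute-forces a seed under which a small, fixed set of k-mers maps to distinct slots of a flat array. It is a generic toy, not aindex's actual algorithm: the seeded SHA-256 hash, the `slot` and `find_perfect_seed` names, and the example k-mers are all assumptions made for the illustration.

```python
import hashlib

def slot(key, seed, n):
    """Map a key to a slot in [0, n) with a seeded hash (illustrative choice)."""
    digest = hashlib.sha256(("%d:%s" % (seed, key)).encode("ascii")).digest()
    return int.from_bytes(digest[:8], "little") % n

def find_perfect_seed(keys, max_seed=100000):
    """Brute-force a seed under which every key lands in its own slot."""
    n = len(keys)
    for seed in range(max_seed):
        if len({slot(key, seed, n) for key in keys}) == n:
            return seed
    raise ValueError("no collision-free seed found; raise max_seed")

# A fixed key set known in advance, as with the k-mers of a read set.
kmers = ["ACGTA", "CGTAC", "GTACG", "TACGT", "AAAAA"]
seed = find_perfect_seed(kmers)

# Values live in a flat array addressed by slot: no buckets, no chaining.
counts = [0] * len(kmers)
for kmer in ["ACGTA", "AAAAA", "ACGTA"]:
    counts[slot(kmer, seed, len(kmers))] += 1

print(counts[slot("ACGTA", seed, len(kmers))])
```

Because the key set is fixed, a lookup is one hash evaluation plus one array access. Note that a key outside the set still maps to some slot, so a real index needs a separate membership check.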

Installation

With pip:

pip install aindex2

To install the package from source, or if pip has no build for your system, run the following commands:

git clone https://github.com/ad3002/aindex.git
cd aindex
make
pip install .

This will create the necessary executables in the bin directory.

To uninstall:

pip uninstall aindex2

To clean up the compiled files, run:

make clean

Mac Compilation Command

The Makefile does not currently support macOS, but you can try to compile the Python wrapper manually with the following command:

g++ -c -std=c++11 -fPIC python_wrapper.cpp -o python_wrapper.o && \
g++ -c -std=c++11 -fPIC kmers.cpp kmers.hpp debrujin.cpp debrujin.hpp hash.cpp hash.hpp read.cpp read.hpp settings.hpp settings.cpp && \
g++ -shared -Wl,-install_name,python_wrapper.so -o python_wrapper.so python_wrapper.o kmers.o debrujin.o hash.o read.o settings.o

Usage

Compute all binary arrays:

FASTQ1=./tests/raw_reads.101bp.IS350bp25_1.fastq
FASTQ2=./tests/raw_reads.101bp.IS350bp25_2.fastq
OUTPUT_PREFIX=tests/raw_reads.101bp.IS350bp25

time python ~/Dropbox/workspace/aindex/scripts/compute_aindex.py -i $FASTQ1,$FASTQ2 -t fastq -o $OUTPUT_PREFIX --lu 2 -P 30
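The same invocation can be assembled from Python, which helps when indexing several read sets in a loop. The sketch below only builds the argument list from the flags shown above (`-i`, `-t`, `-o`, `--lu`, `-P`). The `build_compute_command` helper and the `compute_aindex.py` placeholder path are assumptions for illustration, and `--lu` and `-P` are passed through unchanged because their meaning is defined by the script itself.

```python
import shlex

def build_compute_command(script, fastq_files, output_prefix, lu, p):
    """Build the argv list for compute_aindex.py (hypothetical helper)."""
    return [
        "python", script,
        "-i", ",".join(fastq_files),   # input files are comma-separated
        "-t", "fastq",
        "-o", output_prefix,
        "--lu", str(lu),
        "-P", str(p),
    ]

cmd = build_compute_command(
    "compute_aindex.py",  # placeholder: point this at the script in your checkout
    ["./tests/raw_reads.101bp.IS350bp25_1.fastq",
     "./tests/raw_reads.101bp.IS350bp25_2.fastq"],
    "tests/raw_reads.101bp.IS350bp25",
    lu=2,
    p=30,
)

# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

To run the command, pass the list to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting problems with unusual file names.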

Usage from Python

You can simply run demo.py or:

import aindex

prefix_path = "tests/raw_reads.101bp.IS350bp25"
index = aindex.get_aindex(prefix_path)
kmer2tf = index  # alias: the calls below refer to the index as kmer2tf

kmer = "A"*23
rkmer = "T"*23
kid = kmer2tf.get_kid_by_kmer(kmer)
print(kmer2tf.get_kmer_info_by_kid(kid))
print(kmer2tf[kmer], kid, kmer2tf.get_kmer_by_kid(kid), len(kmer2tf.pos(kmer)), kmer2tf.get_strand(kmer), kmer2tf.get_strand(rkmer))

pos = kmer2tf.pos(kmer)[0]
print(pos)

print(kmer2tf.get_kid_by_kmer(kmer), kmer2tf.get_kid_by_kmer(rkmer))

print(kmer2tf.get_hash_size())

print(kmer2tf.get_read(0, 123, 0))

print(kmer2tf.get_read(0, 123, 1))


k = 23
for p in kmer2tf.pos("GCAGCTCAGCAGGACGGCCAACC"):
  print(kmer2tf.get_read(p, p+k))
  break


print(kmer2tf["GCAGCTCAGCAGGACGGCCAACC"])

sequence = kmer2tf.get_read(0, 1023, 0)

for kmer, tf in kmer2tf.iter_sequence_kmers(sequence):
  print(kmer, tf)
  break
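In the example above, `kmer` and `rkmer` are reverse complements of each other (`"A"*23` and `"T"*23`), which is why both are passed to `get_kid_by_kmer` and `get_strand`. A minimal sketch for deriving the reverse complement of an arbitrary k-mer follows. The `reverse_complement` helper is an assumption for illustration and is not part of the aindex API shown here.

```python
# Translation table for the four DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(kmer):
    """Return the reverse complement of a DNA string (hypothetical helper)."""
    return kmer.translate(_COMPLEMENT)[::-1]

kmer = "A" * 23
rkmer = reverse_complement(kmer)
print(rkmer == "T" * 23)

# Applying the operation twice returns the original k-mer.
query = "GCAGCTCAGCAGGACGGCCAACC"
print(reverse_complement(reverse_complement(query)) == query)
```

Characters other than A, C, G and T (for example N) are left unchanged by `str.translate`, so they are reversed but not complemented.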


k = 23
sequence = (
    "TAAGTTATTATTTAGTTAATACTTTTAACAATATTATTAAGGTATTTAAAAAATACTATTATAGTATTTAACATAGTTAAATACCTTCCTTAATACTGTTAAATTATATTCAATCAATACATATATAATATTATTAAAATACTTGATAAGTATTATTTAGATATTAGACAAATACTAATTTTATATTGCTTTAATACTTAATAAATACTACTTATGTATTAAGTAAATATTACTGTAATACTAATAACAATATTATTACAATATGCTAGAATAATATTGCTAGTATCAATAATTACTAATATAGTATTAGGAAAATACCATAATAATATTTCTACATAATACTAAGTTAATACTATGTGTAGAATAATAAATAATCAGATTAAAAAAATTTTATTTATCTGAAACATATTTAATCAATTGAACTGATTATTTTCAGCAGTAATAATTACATATGTACATAGTACATATGTAAAATATCATTAATTTCTGTTATATATAATAGTATCTATTTTAGAGAGTATTAATTATTACTATAATTAAGCATTTATGCTTAATTATAAGCTTTTTATGAACAAAATTATAGACATTTTAGTTCTTATAATAAATAATAGATATTAAAGAAAATAAAAAAATAGAAATAAATATCATAACCCTTGATAACCCAGAAATTAATACTTAATCAAAAATGAAAATATTAATTAATAAAAGTGAATTGAATAAAATTTTGAAAAAAATGAATAACGTTATTATTTCCAATAACAAAATAAAACCACATCATTCATATTTTTTAATAGAGGCAAAAGAAAAAGAAATAAACTTTTATGCTAACAATGAATACTTTTCTGTCAAATGTAATTTAAATAAAAATATTGATATTCTTGAACAAGGCTCCTTAATTGTTAAAGGAAAAATTTTTAACGATCTTATTAATGGCATAAAAGAAGAGATTATTACTATTCAAGAAAAAGATCAAACACTTTTGGTTAAAACAAAAAAAACAAGTATTAATTTAAACACAATTAATGTGAATGAATTTCCAAGAATAAGGTTTAATGAAAAAAACGATTTAAGTGAATTTAATCAATTCAAAATAAATTATTCACTTTTAGTAAAAGGCATTAAAAAAATTTTTCACTCAGTTTCAAATAATCGTGAAATATCTTCTAAATTTAATGGAGTAAATTTCAATGGATCCAATGGAAAAGAAATATTTTTAGAAGCTTCTGACACTTATAAACTATCTGTTTTTGAGATAAAGCAAGAAACAGAACCATTTGATTTCATTTTGGAGAGTAATTTACTTAGTTTCATTAATTCTTTTAATCCTGAAGAAGATAAATCTATTGTTTTTTATTACAGAAAAGATAATAAAGATAGCTTTAGTACAGAAATGTTGATTTCAATGGATAACTTTATGATTAGTTACACATCGGTTAATGAAAAATTTCCAGAGGTAAACTACTTTTTTGAATTTGAACCTGAAACTAAAATAGTTGTTCAAAAAAATGAATTAAAAGATGCACTTCAAAGAATTCAAACTTTGGCTCAAAATGAAAGAACTTTTTTATGCGATATGCAAATTAACAGTTCTGAATTAAAAATAAGAGCTATTGTTAATAATATCGGAAATTCTCTTGAGGAAATTTCTTGTCTTAAATTTGAAGGTTATAAACTTAATATTTCTTTTAACCCAAGTTCTCTATTAGATCACATAGAGTCTTTTGAATCAAATGAAATAAATTTTGATTTCCAAGGAAATAGTAAGTATTTTTTGATAACCTCTAAAAGTGAACCTGAACTTAAGCAAATATTGGTTCCTTCAAGATAATGAATCTTTACGATCTTTTAGAACTACCAACTACAGCATCAATAAAAGAAATAAAAATTGCTTATAAAAGATTAGCAAAGCGTTATCACCCTGATGTAAATAAATTAGGTTCGCAAACTTTTGTTGAAATTAATAATGCTTATTCAATATTAAGTGATCC"
    "TAACCAAAAGGAAAAATATGATTCAATGCTGAAAGTTAATGATTTTCAAAATCGCATCAAAAATTTAGATATTAGTGTTAGATGACATGAAAATTTCATGGAAGAACTCGAACTTCGTAAGAACTGAGAATTTGATTTTTTTTCATCTGATGAAGATTTCTTTTATTCTCCATTTACAAAAA"
)

test_kmer = "TAAGTTATTATTTAGTTAATACT"
right_kmer = "AGTTAATACTTTTAACAATATTA"

# The helper functions used below (iter_reads_by_kmer, get_left_right_distances,
# get_layout_for_kmer, iter_reads_by_sequence) are assumed to be already in scope.

print("Task 1. Get kmer frequency")
input("\nReady?")
for i in range(len(sequence)-k+1):
    kmer = sequence[i:i+k]
    print("Position %s kmer %s freq = %s" % (i, kmer, index[kmer]))

print("Task 2. Iter read by read, print the first 20 reads")
input("\nReady?")
for i, read in enumerate(index.iter_reads()):
    if i == 20:
        break
    print(i, read)

print("Task 3. Iter reads by kmer, returns (start, next_read_start, read, pos_if_uniq|None, all_poses)")
input("\nReady?")
for read in iter_reads_by_kmer(test_kmer, index):
    print(read)

print("Task 4. Get distances in reads for two kmers, returns a list of (rid, left_kmer_pos, right_kmer_pos) tuples.")
input("\nReady?")
print(get_left_right_distances(test_kmer, right_kmer, index))

print("Task 5. Get layout for kmer, returns (max_pos, reads, lefts, rights, rids, starts), for details see source code")
input("\nReady?")
max_pos, reads, lefts, rights, rids, starts = get_layout_for_kmer(right_kmer, index)
print("Central layout:")
for read in reads:
    print(read)
print("Left flanks:")
print(lefts)
print("Right flanks:")
print(rights)

print("Task 6. Iter reads by sequence, returns (start, next_read_start, read, pos_if_uniq|None, all_poses)")
input("\nReady?")
sequence = "AATATTATTAAGGTATTTAAAAAATACTATTATAGTATTTAACATA"
for read in iter_reads_by_sequence(sequence, index):
    print(read)
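To make the shape of Task 1 and Task 4 concrete without a built index, the sketch below reproduces them on a few toy reads in plain Python. It is a stand-in and not how aindex computes these values: `iter_kmers`, `count_kmers` and `left_right_positions` are hypothetical helpers, a `Counter` plays the role of `index[kmer]`, and only the first occurrence of each k-mer in a read is reported.

```python
from collections import Counter

def iter_kmers(sequence, k):
    """Yield (position, kmer) for every window of length k."""
    for i in range(len(sequence) - k + 1):
        yield i, sequence[i:i + k]

def count_kmers(reads, k):
    """Count k-mer occurrences over all reads (stand-in for index[kmer])."""
    counts = Counter()
    for read in reads:
        counts.update(kmer for _, kmer in iter_kmers(read, k))
    return counts

def left_right_positions(reads, left_kmer, right_kmer):
    """Return (rid, left_pos, right_pos) for reads that contain both k-mers."""
    hits = []
    for rid, read in enumerate(reads):
        left_pos = read.find(left_kmer)
        right_pos = read.find(right_kmer)
        if left_pos != -1 and right_pos != -1:
            hits.append((rid, left_pos, right_pos))
    return hits

reads = ["ACGTACGT", "CGTACGTT", "TTTTACGT"]
counts = count_kmers(reads, 4)

# Task 1 analogue: frequency of each k-mer along a query sequence.
for i, kmer in iter_kmers("ACGTT", 4):
    print("Position %s kmer %s freq = %s" % (i, kmer, counts[kmer]))

# Task 4 analogue: positions of two k-mers within the same read.
print(left_right_positions(reads, "TACG", "CGTT"))
```

A `Counter` returns 0 for k-mers it has never seen, which keeps the Task 1 loop free of special cases.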

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

aindex2-1.0.3-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (475.3 kB)
PyPy, manylinux: glibc 2.17+ x86-64

aindex2-1.0.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (475.3 kB)
PyPy, manylinux: glibc 2.17+ x86-64

aindex2-1.0.3-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (475.3 kB)
PyPy, manylinux: glibc 2.17+ x86-64

aindex2-1.0.3-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (475.3 kB)
PyPy, manylinux: glibc 2.17+ x86-64

aindex2-1.0.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (475.3 kB)
CPython 3.12, manylinux: glibc 2.17+ x86-64

aindex2-1.0.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (475.3 kB)
CPython 3.11, manylinux: glibc 2.17+ x86-64

aindex2-1.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (475.3 kB)
CPython 3.10, manylinux: glibc 2.17+ x86-64

aindex2-1.0.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (475.3 kB)
CPython 3.9, manylinux: glibc 2.17+ x86-64

aindex2-1.0.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (475.3 kB)
CPython 3.8, manylinux: glibc 2.17+ x86-64

aindex2-1.0.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (475.3 kB)
CPython 3.7m, manylinux: glibc 2.17+ x86-64

aindex2-1.0.3-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (475.3 kB)
CPython 3.6m, manylinux: glibc 2.17+ x86-64
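The wheel names above follow the standard wheel naming scheme: distribution, version, an optional build tag, then Python tag, ABI tag and platform tag, with dots separating alternatives inside a tag. The sketch below splits a file name into those parts to help pick the right file for an interpreter. The `parse_wheel_filename` helper is an assumption for illustration; for production use, a packaging library is the safer choice.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its tagged components (hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of fields in %r" % filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "build": parts[2] if len(parts) == 6 else None,
        # Each tag may hold several dot-separated alternatives.
        "python": python_tag.split("."),
        "abi": abi_tag.split("."),
        "platform": platform_tag.split("."),
    }

info = parse_wheel_filename(
    "aindex2-1.0.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info["python"], info["abi"], info["platform"])
```

Here the platform tag lists two compatible targets, `manylinux_2_17_x86_64` and `manylinux2014_x86_64`, which matches the "glibc 2.17+ x86-64" note next to each file.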

File details

File hashes

Hashes for aindex2-1.0.3-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c598cbcbade18b0986f9abd3d18d917c518c3f90c6ba1e1d79d8b5f602d8fe7f
MD5 9bf41e665a0b46470fbc2dce73a4edde
BLAKE2b-256 70940e19dd8572a4cd00dfdf75f8deaaf6041420a5be1be4f5421b6fae26a81a

Hashes for aindex2-1.0.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 7bed70f4c1d96f8114b9c2a54d24c15ab2750077523638bddaa98e25f04b4893
MD5 6c5229ec7b0f0e8e78142525b8e0811f
BLAKE2b-256 1420c14fc8d26b96b690383b352eabde41ad321495dab62754f751f35323fc70

Hashes for aindex2-1.0.3-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 66c14abc641000acb467b3751b9e050bee0b016010c8ddeed4e3c91e965ebf30
MD5 dbcb0205df60941724423073f295753c
BLAKE2b-256 1e8d6b7188b6ec0281aa475a577b24b3b3f03d3a85ee235be3a9c0fcf916a904

Hashes for aindex2-1.0.3-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 7552ece6f5a22059f5a9ef6e64113f8cf52c347998e233e86dbccffed42e4362
MD5 125c4b0a76eec2f4726873b66c881538
BLAKE2b-256 150e5d2bcba2eca2824185ea306b71594d225127c5f84f06ab9dc1e4bcc7fa1c

Hashes for aindex2-1.0.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e51380bcf3b8404d3f3970bc05fb485c4a002a08f5877538d365ec072f30b49a
MD5 5e24d07e46a417327aa9f811e7fb3e87
BLAKE2b-256 ec7c7febf72c6081315a900e312384b884d5139db9c422d593e21839dd57cb88

Hashes for aindex2-1.0.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8ea4fd26af1d77bf3f5cfa5ac6677b6f238e1cff6497167e613d0e9b37904662
MD5 a85d6e353dc0c7521e9e2909bac2d45b
BLAKE2b-256 db0ec6c93c60071c1fe116ea8e0e140c30028dcc26a234f8300c9b3067e4e543

Hashes for aindex2-1.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 57107c5f1d04dcbcfcfa7c026f8122a90ba22309651e5281a78861f4ef47fb7e
MD5 e18926339dfaa3be7af33875e010fbd4
BLAKE2b-256 c0f78d42fb8b721c9097fd8581ac0d0d63bde341f80edd9f1dd78f88af4aaa12

Hashes for aindex2-1.0.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a2b88aa14116b32f0d7bf9121066b0c1e6656f2f698daaa10a22185f5b06b299
MD5 48597269c75e9556a31972bf29b247eb
BLAKE2b-256 9faef081a6793624347ab760124512e15da77887274d4efa265f36089f11babb

Hashes for aindex2-1.0.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 24c58730b771ce2007cd7ee921d4fefc6d9f61cd6cacf991ad38bbdff0e4f209
MD5 cee980c7a3463b32cdd0a908c6f0b5bd
BLAKE2b-256 95ba04319af6596603241ee53dc9eb3f7f912f9a4f14040346fa4e07d2bbf2e3

Hashes for aindex2-1.0.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 49659be9a635a6e261ef926b2be6b1601534187e05fbc948d95626638adb2764
MD5 5bb677a082dfab8708d0d7767b4b0a26
BLAKE2b-256 3eac4530c42ecb89e1fb61c459fe051b47d2cf1bd1345c9d074a863d906c5eac

Hashes for aindex2-1.0.3-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 021bb143dcc4744d7bae50b5e82764a7dff0acbed8a0deeb0a26cd7b086190aa
MD5 c5edc025b07c9f4819b2d80a4b969961
BLAKE2b-256 34c164c2d01ed519b7ffa74c506f8d3c5e4fb92305cb63ce567cf8808eed640c
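A downloaded wheel can be checked against the SHA256 digests listed above before installing it. The sketch below streams a file through `hashlib.sha256` and compares the result with an expected hex digest. The `file_sha256` and `matches_sha256` helpers are assumptions for illustration, and the demonstration uses a temporary file so that it runs without a downloaded wheel.

```python
import hashlib
import os
import tempfile

def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_sha256(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return file_sha256(path) == expected_hex.strip().lower()

# Demonstration on a temporary file standing in for a downloaded wheel.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"example payload")
    demo_path = handle.name
try:
    expected = hashlib.sha256(b"example payload").hexdigest()
    print(matches_sha256(demo_path, expected))
finally:
    os.remove(demo_path)
```

For a real check, pass the path of the downloaded `.whl` file and the SHA256 value listed for that exact file name.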
