Merge and intersect sorted numpy arrays.

Sortednp

The package to intersect or merge sorted numpy arrays.

Numpy arrays are a great tool. However, intersecting and merging sorted numpy arrays with numpy alone is comparatively slow: the current numpy implementation concatenates the two arrays and sorts the combination. If the arrays you want to merge or intersect are already sorted, there is a much faster way, which exploits the fact that the inputs, and therefore the result, are sorted.
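The idea can be illustrated with a minimal pure-Python sketch. It is only meant to show the principle and makes no claim about how sortednp is implemented internally; the helper name merge_sorted is hypothetical. Two cursors walk over the sorted inputs once, so no final sort is needed, and the result matches the concatenate-and-sort baseline.

```python
import numpy as np


def merge_sorted(a, b):
    """Merge two sorted 1-D arrays in a single pass (illustrative helper)."""
    out = np.empty(len(a) + len(b), dtype=np.result_type(a, b))
    i = j = k = 0
    # Always copy the smaller head element; each input is read once.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out[k] = a[i]
            i += 1
        else:
            out[k] = b[j]
            j += 1
        k += 1
    # One input is exhausted; copy the remainder of the other one.
    rest_a = len(a) - i
    out[k:k + rest_a] = a[i:]
    out[k + rest_a:] = b[j:]
    return out


a = np.array([0, 3, 4, 6, 7])
b = np.array([1, 2, 3, 5, 7, 9])

single_pass = merge_sorted(a, b)
baseline = np.sort(np.concatenate((a, b)))  # concatenate, then sort
print(np.array_equal(single_pass, baseline))
```

The Python loop itself is slow; the point is only that the work grows linearly with the input size, whereas the baseline has to sort the concatenated array.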

Sortednp (sorted numpy) operates on sorted numpy arrays to calculate the intersection or the union of two numpy arrays efficiently. The result is again a sorted numpy array, which can be merged or intersected with the next array. The intended use case is that sorted numpy arrays are stored as the basic data structure and merged or intersected on request. Typical applications include information retrieval, and search engines in particular.

It is also possible to implement a k-way merging or intersecting algorithm, which operates on an arbitrary number of arrays at the same time. This package is intended for arrays with $10^6$ or even $10^{10}$ items. Arrays of this size are usually too large to keep more than two of them in memory at the same time. The package therefore implements methods to merge and intersect multiple arrays that are loaded on demand.
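The memory constraint can be sketched as follows. This is an illustration under stated assumptions, not the package's own k-way API: the helpers intersect_sorted and intersect_on_demand are hypothetical, and each "loader" is simply a callable that returns one sorted array when called (in practice it might read the array from disk). Only the running result and the most recently loaded array are alive at any time.

```python
import numpy as np


def intersect_sorted(a, b):
    """Two-cursor intersection of two sorted 1-D arrays (illustrative helper)."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            common.append(a[i])
            i += 1
            j += 1
    return np.array(common, dtype=a.dtype)


def intersect_on_demand(loaders):
    """Intersect many sorted arrays, loading them one at a time."""
    loaders = iter(loaders)
    result = next(loaders)()  # load the first array
    for load in loaders:
        if len(result) == 0:
            break  # nothing left to match; skip the remaining loads
        result = intersect_sorted(result, load())
    return result


# Hypothetical loaders; real ones could read each array from a file.
loaders = [
    lambda: np.array([1, 3, 5, 7, 9]),
    lambda: np.array([3, 4, 5, 9]),
    lambda: np.array([0, 3, 9, 12]),
]
print(intersect_on_demand(loaders))
```

Folding the arrays pairwise keeps the memory footprint small; a true k-way algorithm would instead advance one cursor per array simultaneously.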

Installation from PyPI

You can install the package directly from PyPI using pip.

$ pip install sortednp

Numpy Dependency

In some cases the installation fails because of a build-time dependency on numpy. Usually, the problem can be solved by manually installing a recent numpy version via pip install -U numpy.

Basic Usage

Two-way intersection

Two sorted numpy arrays can be intersected with the intersect function, which takes two numpy arrays and returns their sorted intersection.

## intersect.py
import numpy as np
import sortednp as snp

a = np.array([0, 3, 4, 6, 7])
b = np.array([1, 2, 3, 5, 7, 9])

i = snp.intersect(a, b)
print(i)

If you run this, you should see the intersection of both arrays as a sorted numpy array.

$ python3 intersect.py
[3 7]

Two-way union

Two sorted numpy arrays can be merged with the merge function, which takes two numpy arrays and returns their sorted union.

## merge.py
import numpy as np
import sortednp as snp

a = np.array([0, 3, 4, 6, 7])
b = np.array([1, 2, 3, 5, 7, 9])

m = snp.merge(a, b)
print(m)

If you run this, you should see the union of both arrays as a sorted numpy array. Values that occur in both inputs, here 3 and 7, appear twice in the result.

$ python3 merge.py
[0 1 2 3 3 4 5 6 7 7 9]
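The output above contains 3 and 7 twice. If a duplicate-free union is needed, one option is to drop repeated neighbours afterwards with plain numpy, as in the sketch below. The helper name drop_adjacent_duplicates is hypothetical, and the array m is copied from the output shown above so that the example runs without sortednp installed.

```python
import numpy as np


def drop_adjacent_duplicates(sorted_arr):
    """Remove repeated values from an already sorted 1-D array."""
    if sorted_arr.size == 0:
        return sorted_arr
    keep = np.empty(sorted_arr.size, dtype=bool)
    keep[0] = True  # the first element is always kept
    # Keep an element only if it differs from its left neighbour.
    keep[1:] = sorted_arr[1:] != sorted_arr[:-1]
    return sorted_arr[keep]


m = np.array([0, 1, 2, 3, 3, 4, 5, 6, 7, 7, 9])  # output of merge.py
unique_m = drop_adjacent_duplicates(m)
print(unique_m)  # [0 1 2 3 4 5 6 7 9]
```

Because the input is sorted, equal values are always adjacent, so a single comparison per element is enough and no further sorting is required.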
