
Merge and intersect sorted numpy arrays.


Sortednp

A package to intersect or merge sorted numpy arrays.


Numpy and its arrays are a great tool. However, intersecting and merging multiple sorted numpy arrays with plain numpy is comparatively slow: the current numpy implementation concatenates the two arrays and sorts the combination. If the arrays you want to merge or intersect are already sorted, there is a much faster way that exploits this ordering and produces the sorted result directly.

Sortednp (sorted numpy) operates on sorted numpy arrays to calculate the intersection or the union of two numpy arrays efficiently. The result is again a sorted numpy array, which can be merged or intersected with the next array. The intended use case is that sorted numpy arrays are stored as the basic data structure and merged or intersected on request. Typical applications include information retrieval, and search engines in particular.

It is also possible to implement a k-way merging or intersecting algorithm, which operates on an arbitrary number of arrays at the same time. This package is intended to deal with arrays of $10^6$ to $10^{10}$ items. Usually, such arrays are too large to keep more than two of them in memory at the same time. This package implements methods to merge and intersect multiple arrays that can be loaded on demand.
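The k-way idea can be illustrated with a minimal pure-Python sketch built on the standard library. This is not the implementation used by sortednp; the names `load_sorted`, `lazy_kway_merge` and `lazy_kway_intersect` are hypothetical, and `load_sorted` only stands in for whatever reads a sorted array from storage on demand.

```python
import heapq
from itertools import groupby, repeat


def load_sorted(values):
    """Hypothetical loader: yields the items of one sorted array lazily."""
    for value in values:
        yield value


def lazy_kway_merge(*iterables):
    """Merge any number of sorted iterables into one sorted stream.

    heapq.merge pulls items on demand and keeps only one pending item
    per input in its heap.
    """
    return heapq.merge(*iterables)


def lazy_kway_intersect(*iterables):
    """Yield the values that occur in every sorted input."""
    k = len(iterables)
    # Tag each value with the index of the input it came from.
    tagged = [zip(it, repeat(idx)) for idx, it in enumerate(iterables)]
    merged = heapq.merge(*tagged)
    # Equal values are adjacent in the merged stream, so group them and
    # keep a value only if all k inputs contributed to its group.
    for value, group in groupby(merged, key=lambda pair: pair[0]):
        sources = {idx for _, idx in group}
        if len(sources) == k:
            yield value


a = load_sorted([1, 2, 3, 5])
b = load_sorted([2, 3, 4, 5])
c = load_sorted([0, 2, 5, 9])
print(list(lazy_kway_intersect(a, b, c)))  # [2, 5]
```

Because the inputs are consumed as iterators, no input has to be fully materialized before the first result is produced.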


Installation from PyPI

You can install the package directly from PyPI using pip.

$ pip install sortednp

Numpy Dependency

The installation can fail in some cases because of a build-time dependency on numpy. Usually, the problem can be solved by first installing a recent numpy version manually with pip install -U numpy.


Basic Usage

Two-way intersection

Two sorted numpy arrays can be intersected with the intersect method, which takes two numpy arrays and returns the sorted intersection of the two arrays.

## intersect.py
import numpy as np
import sortednp as snp

a = np.array([0, 3, 4, 6, 7])
b = np.array([1, 2, 3, 5, 7, 9])

i = snp.intersect(a, b)
print(i)

If you run this, you should see the intersection of both arrays as a sorted numpy array.

$ python3 intersect.py
[3 7]
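To see why sorted inputs make this cheap, here is a minimal pure-Python sketch of a two-pointer intersection. It only illustrates the general technique and is not the code sortednp runs internally; the function name `intersect_sorted` is hypothetical.

```python
def intersect_sorted(a, b):
    """Return the values present in both sorted sequences, in sorted order."""
    i, j = 0, 0
    out = []
    # Advance whichever pointer sits on the smaller value; on a match,
    # record the value and advance both pointers.
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            out.append(a[i])
            i += 1
            j += 1
    return out


a = [0, 3, 4, 6, 7]
b = [1, 2, 3, 5, 7, 9]
print(intersect_sorted(a, b))  # [3, 7]
```

Each step advances at least one pointer, so the loop runs at most len(a) + len(b) times and no sorting is needed.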

Two-way union

Two sorted numpy arrays can be merged with the merge method, which takes two numpy arrays and returns the sorted union of the two arrays.

## merge.py
import numpy as np
import sortednp as snp

a = np.array([0, 3, 4, 6, 7])
b = np.array([1, 2, 3, 5, 7, 9])

m = snp.merge(a, b)
print(m)

If you run this, you should see the union of both arrays as a sorted numpy array. Values that occur in both inputs, here 3 and 7, appear twice in the output.

$ python3 merge.py
[0 1 2 3 3 4 5 6 7 7 9]
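The output above can be reproduced with a minimal pure-Python sketch of a linear two-pointer merge. This is only an illustration of the technique under the assumption that both inputs are sorted; it is not sortednp's implementation, and the name `merge_sorted` is hypothetical.

```python
def merge_sorted(a, b):
    """Merge two sorted sequences into one sorted list, keeping duplicates."""
    i, j = 0, 0
    out = []
    # Repeatedly take the smaller of the two current items.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # One input is exhausted; the rest of the other is already sorted.
    out.extend(a[i:])
    out.extend(b[j:])
    return out


a = [0, 3, 4, 6, 7]
b = [1, 2, 3, 5, 7, 9]
print(merge_sorted(a, b))  # [0, 1, 2, 3, 3, 4, 5, 6, 7, 7, 9]
```

Every item is visited once, in contrast to concatenating both arrays and sorting the result.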

