Merge and intersect sorted numpy arrays.
Sortednp
A package to intersect or merge sorted numpy arrays.
Numpy and numpy arrays are great tools. However, intersecting and merging multiple sorted numpy arrays is comparatively slow: the current numpy implementation concatenates the two arrays and sorts the combination. If you want to merge or intersect multiple sorted numpy arrays, there is a much faster way that exploits the fact that the input arrays are already sorted.
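A minimal pure-Python sketch of the faster idea, assuming plain sorted lists in place of numpy arrays so that it runs without dependencies: two cursors walk the inputs in step and always emit the smaller head element, so each element is visited once (O(n + m)) instead of re-sorting the concatenation. It only illustrates the principle and is not sortednp's actual implementation.

```python
def merge_sorted(a, b):
    """Merge two ascending sequences into one ascending list in O(len(a) + len(b))."""
    result = []
    i, j = 0, 0
    # Advance whichever cursor points at the smaller element.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    # One input is exhausted; the remainder of the other is already sorted.
    result.extend(a[i:])
    result.extend(b[j:])
    return result


a = [0, 3, 4, 6, 7]
b = [1, 2, 3, 5, 7, 9]

# Same result as the concatenate-and-sort approach, without a full sort.
assert merge_sorted(a, b) == sorted(a + b)
print(merge_sorted(a, b))  # [0, 1, 2, 3, 3, 4, 5, 6, 7, 7, 9]
```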
Sortednp (sorted numpy) operates on sorted numpy arrays to calculate the intersection or the union of two numpy arrays efficiently. The result is again a sorted numpy array, which can be merged or intersected with the next array. The intended use case is to store sorted numpy arrays as the basic data structure and to merge or intersect them on request. Typical applications include information retrieval, in particular search engines.
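As a sketch of that workflow, the snippet below treats each sorted list as a hypothetical posting list of document IDs for one search term and intersects them one after another with `functools.reduce`, so every intermediate result is again sorted and feeds the next step. The plain lists and the helper `intersect_sorted` are illustrative assumptions, not part of sortednp; with the package installed, `snp.intersect` should be able to play the helper's role on numpy arrays.

```python
from functools import reduce


def intersect_sorted(a, b):
    """Return the sorted intersection of two ascending sequences in one linear pass."""
    result = []
    i, j = 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            # Equal elements: keep one copy and advance both cursors.
            result.append(a[i])
            i += 1
            j += 1
    return result


# Hypothetical posting lists: sorted IDs of documents containing each query term.
postings = {
    "sorted": [1, 4, 7, 9, 12, 15],
    "numpy": [2, 4, 9, 12, 20],
    "arrays": [4, 5, 9, 15, 20],
}

# Documents containing all three terms; each partial result stays sorted.
hits = reduce(intersect_sorted, postings.values())
print(hits)  # [4, 9]
```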
It is also possible to implement a k-way merging or intersecting algorithm, which operates on an arbitrary number of arrays at the same time. However, this package is intended to deal with arrays with $10^6$ or $10^{10}$ items. Usually, such arrays are too large to keep more than two of them in memory at the same time. This package therefore implements methods to merge and intersect multiple arrays that can be loaded on demand.
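A minimal sketch of the on-demand idea, under stated assumptions: `load_arrays` is a hypothetical generator standing in for reading arrays from disk one at a time, and plain lists replace numpy arrays. Because the generator produces an array only when it is requested, folding it with `functools.reduce` holds only the running result and the next array (plus the output being built) in memory, rather than all k inputs at once. The pairwise step uses the standard library's `heapq.merge`; sortednp's own multi-array functions may work differently.

```python
import heapq
from functools import reduce


def load_arrays(chunks):
    """Hypothetical loader: yields one sorted array at a time, as if read from disk."""
    for chunk in chunks:
        # A real loader might read or memory-map a file here; we just copy a list.
        yield list(chunk)


def merge_two(acc, nxt):
    """Merge the running result with the next sorted array."""
    return list(heapq.merge(acc, nxt))


# Stand-in for arrays stored on disk.
stored = [[1, 5, 9], [2, 3, 10], [0, 5, 6]]

# reduce pulls arrays from the generator one by one, so only the accumulated
# result and the newly loaded array (plus the merged output) are alive at once.
merged = reduce(merge_two, load_arrays(stored))
print(merged)  # [0, 1, 2, 3, 5, 5, 6, 9, 10]
```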
Installation from PyPI
You can install the package directly from PyPI using pip.
$ pip install sortednp
Numpy Dependency
In some cases the installation fails because of a build-time dependency on numpy. Usually, the problem can be solved by manually installing a recent numpy version:
$ pip install -U numpy
Basic Usage
Two-way intersection
Two sorted numpy arrays can be intersected with the intersect function, which takes two numpy arrays and returns the sorted intersection of the two arrays.
## intersect.py
import numpy as np
import sortednp as snp
a = np.array([0, 3, 4, 6, 7])
b = np.array([1, 2, 3, 5, 7, 9])
i = snp.intersect(a, b)
print(i)
If you run this, you should see the intersection of both arrays as a sorted numpy array.
$ python3 intersect.py
[3 7]
Two-way union
Two sorted numpy arrays can be merged with the merge function, which takes two numpy arrays and returns the sorted union of the two arrays, including duplicates.
## merge.py
import numpy as np
import sortednp as snp
a = np.array([0, 3, 4, 6, 7])
b = np.array([1, 2, 3, 5, 7, 9])
m = snp.merge(a, b)
print(m)
If you run this, you should see the union of both arrays as a sorted numpy array.
$ python3 merge.py
[0 1 2 3 3 4 5 6 7 7 9]
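If a set-style union without repeated values is wanted instead (the output above contains 3 and 7 twice), one way is to drop adjacent duplicates from the merged result, which suffices because equal values sit next to each other in a sorted sequence. This sketch assumes plain lists and uses `itertools.groupby`; sortednp may provide its own option for handling duplicates, so check its documentation before relying on this approach.

```python
from itertools import groupby


def drop_adjacent_duplicates(sorted_values):
    """Collapse runs of equal values; in a sorted sequence this removes all duplicates."""
    return [value for value, _run in groupby(sorted_values)]


# The merged output from merge.py, as a plain list.
merged = [0, 1, 2, 3, 3, 4, 5, 6, 7, 7, 9]
print(drop_adjacent_duplicates(merged))  # [0, 1, 2, 3, 4, 5, 6, 7, 9]
```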
Download files
Built Distributions
Hashes for sortednp-0.5.0rc0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 906f7a26967a6b01e4cf76498dd137ef01b510012cdcdc12665b5b2663234aa7
MD5 | a830b3cf776fc95e0a86696bf08f8153
BLAKE2b-256 | 4d81d5bc4a873d20aa210e581a4719ed5ddb7a0864b1b0767e024026e216e497
Hashes for sortednp-0.5.0rc0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 1a5b8be114231221bada32bff2df2a0a4437b4003751515fdb2b7923985086f7
MD5 | ebd14347ac986300e1f97736c6eea601
BLAKE2b-256 | bff1cc55433a3e04d0d64c49fcdc8e3144b1f81613471805f4545426b7cca74d
Hashes for sortednp-0.5.0rc0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | cb00b85423a36ced6d5ae072eca160f5b20792a3aedfc906fddef101eb2a8f2f
MD5 | a18f59d129402f1b147578a683ac1985
BLAKE2b-256 | fd76c4563b3a7f3ee317483199ff8588fdf09b0746b93539db439adc13ea66a1
Hashes for sortednp-0.5.0rc0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | df72a9aea28cbce4ae62f6c99808ee82b3d5639f1e9e93ff55044cee5f610c2a
MD5 | 06df239fad6f828a9e2835a9a9f28dd2
BLAKE2b-256 | 576804983a581f3acfe7f036dd14e1afd111cb4325ed0d3c5ab44d1fc8aaee34
Hashes for sortednp-0.5.0rc0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 1bf256f51699d213906b2994bcccadf6315ed3c00d78a57379be49d68376fb72
MD5 | f614952e13a9a6c55abfebe54633e21e
BLAKE2b-256 | 03461f1690a6ceb2daeaedb0d1960a05f25472002cc4ee0a8ae31bc72762c7bd