String Analysis
This package contains a set of data structures and functions used to perform fast analyses for comparing strings.
Installation
To build, you need Rust nightly installed on your machine:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup default nightly
Then you can install via pip:
pip install <path>/shsdict
Usage
The tests are a great way to see how to use the available methods; a summary follows.
Creation
from stringanalysis.shsdict import shsdict
my_dict = shsdict()
Insertion
my_dict.insert('key', 'value')
my_dict.insert_pairs([('key1', 'value1'), ('key2', 'value2')])
my_dict.insert_keys_and_values(['key1', 'otherkey'], ['value1', 'value3'])
Retrieval
my_dict.get('key1') # => 'value1'
my_dict.get_by_prefix('key') # => ['value1', 'value2']
my_dict.get_by_any_prefix(['key', 'other']) # => ['value1', 'value2', 'value3']
my_dict.get_by_any_prefix_vectorized([['key'], ['other']]) # => [['value1', 'value2'], ['value3']]
my_dict.get_by_superstring('prefix_key1_suffix') # => ['value1']
my_dict.get_by_any_superstring(['prefix_key1_suffix', 'prefix_key2_suffix']) # => ['value1', 'value2']
my_dict.get_by_any_superstring_vectorized([['prefix_key1_suffix', 'prefix_key2_suffix'], ['a_otherkey_b']]) # => [['value1', 'value2'], ['value3']]
# Use of `get_fuzzy` requires a call to `finalize`, which indexes the data for the fuzzy search
my_dict.finalize()
my_dict.get_fuzzy('key5', 1) # => ['value1', 'value2']
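The prefix and superstring lookups above can be modelled in plain Python. This is only a sketch of the semantics the examples suggest, not the package's implementation: the helper names `by_prefix` and `by_superstring` are hypothetical, and the store is assumed to hold just `key1`, `key2`, and `otherkey`.

```python
# Plain-Python model of the lookup semantics shown in the examples.
# Assumption: this mirrors what the examples return; the real package is
# implemented in Rust and may differ in ordering and edge cases.
store = {"key1": "value1", "key2": "value2", "otherkey": "value3"}


def by_prefix(store, prefix):
    """Return the values whose key starts with `prefix`."""
    return [value for key, value in store.items() if key.startswith(prefix)]


def by_superstring(store, text):
    """Return the values whose key occurs somewhere inside `text`."""
    return [value for key, value in store.items() if key in text]


print(by_prefix(store, "key"))
print(by_superstring(store, "prefix_key1_suffix"))
print(by_superstring(store, "a_otherkey_b"))
```

A linear scan like this is fine for illustration but becomes slow on large key sets, which is presumably where a dedicated data structure pays off.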
A few things to keep in mind:
- get_fuzzy can typically only handle distances of 1 or 2. Beyond that it will error, as the search space is too large.
- The prefix getters and the superstring getters accept an additional argument to limit minimum string lengths to retrieve values. If a key is shorter than that value, the method will return either None or [].
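To make the distance argument concrete, here is a brute-force sketch. The source does not say which metric get_fuzzy uses; this assumes Levenshtein edit distance (insertions, deletions, and substitutions each cost 1). The names `levenshtein` and `fuzzy_lookup` are hypothetical and are not part of the package.

```python
# Brute-force fuzzy lookup under an ASSUMED Levenshtein metric.
# The package indexes its data in finalize(); this sketch simply scans every key.
def levenshtein(a, b):
    """Edit distance between strings a and b using two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_lookup(store, query, max_distance):
    """Return the values whose key is within max_distance edits of query."""
    return [value for key, value in store.items()
            if levenshtein(key, query) <= max_distance]


store = {"key1": "value1", "key2": "value2", "otherkey": "value3"}
print(fuzzy_lookup(store, "key5", 1))
```

With these three keys, `key5` is one substitution away from both `key1` and `key2`, while `otherkey` is much further, which matches the get_fuzzy example above.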
Debugging
If you're getting segfaults and want to debug, you need a debug build of Python, which you can get via something like:
CONFIGURE_OPTS=--enable-shared pyenv install 3.7.2 -g
Create a virtualenv using that Python version. You can just update .python-version to be 3.7.2-debug and make a new virtualenv.
In test.sh, comment out the maturin develop command and uncomment the command it says to uncomment for a debug build.
Then, when you get a segfault, you should also get a Python stack trace showing where the segfault occurred.