Crunch 100+ GB Strings in Python with ease

Stringzilla

Crunch 100+ GB Strings in Python with ease, leveraging SIMD Assembly

Stringzilla was born many years ago as a tutorial on SIMD-accelerated string processing. But one day, while processing 100+ GB Chemistry and AI datasets, I decided to turn it into a library. It is designed to replace open(...).readlines(), str().splitlines(), and many other common workloads on very long strings.

Benchmark               IoT        Arm Laptop   x86 Server
Python: str.find        4 MB/s     14 MB/s      11 MB/s
C++: std::string::find  560 MB/s   1.2 GB/s     1.3 GB/s
Stringzilla             4.3 GB/s   12 GB/s      12.1 GB/s
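
To get a comparable baseline on your own machine, a rough throughput measurement for Python's built-in str.find can be sketched as below. This is not the project's scripts/bench.py; the function name find_all_throughput and the synthetic haystack are assumptions made for illustration, and the numbers it prints will differ from the table above.

```python
import time


def find_all_throughput(haystack: str, needle: str) -> float:
    """Return the MB/s achieved while locating every occurrence of `needle`."""
    if not needle:
        raise ValueError("needle must be non-empty")
    # Size is computed before the timer starts, so encoding cost is excluded.
    size_mb = len(haystack.encode("utf-8")) / 1e6
    started = time.perf_counter()
    position = haystack.find(needle)
    while position != -1:
        # Resume one character past the previous match start.
        position = haystack.find(needle, position + 1)
    elapsed = time.perf_counter() - started
    # Guard against a zero reading on very small inputs.
    return size_mb / max(elapsed, 1e-9)


if __name__ == "__main__":
    sample = "lorem ipsum dolor sit amet " * 200_000  # about 5.4 MB of synthetic text
    print(f"str.find: {find_all_throughput(sample, 'dolor'):.1f} MB/s")
```

Results depend heavily on how often the pattern occurs, so measure with a file and pattern that resemble your real workload.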

Usage

pip install stringzilla

There are two classes, Str and File, that you can use interchangeably:

from stringzilla import Str, File, Slices

text: str = 'some-string'
text: Str = Str('some-string')
text: File = File('some-file.txt')
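
One plausible way a file-backed string can offer the same interface as an in-memory one is memory mapping. The sketch below uses only the standard library; the class MappedText is hypothetical and is not Stringzilla's File. Note that it works on bytes, so offsets are byte offsets, and that mmap refuses to map an empty file.

```python
import mmap
import os
import tempfile
from typing import Optional


class MappedText:
    """Hypothetical stand-in for a file-backed string; not Stringzilla's File."""

    def __init__(self, path: str) -> None:
        self._handle = open(path, "rb")
        # Map the file read-only; the OS loads pages on demand.
        self._view = mmap.mmap(self._handle.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self) -> int:
        return len(self._view)

    def __contains__(self, needle: str) -> bool:
        return self.find(needle) != -1

    def find(self, needle: str, start: int = 0, end: Optional[int] = None) -> int:
        stop = len(self._view) if end is None else end
        return self._view.find(needle.encode("utf-8"), start, stop)

    def close(self) -> None:
        self._view.close()
        self._handle.close()


def demo():
    # Write a small file in binary mode so line endings are identical on every OS.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"first line\nsecond line\n")
    text = MappedText(tmp.name)
    result = (len(text), "second" in text, text.find("line"))
    text.close()
    os.remove(tmp.name)
    return result


if __name__ == "__main__":
    print(demo())  # (23, True, 6)
```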

Once constructed, the following interfaces are supported:

len(text) -> int
'substring' in text -> bool
text[42] -> str

text.contains(
    'substring',
    start=0, # optional
    end=9223372036854775807, # optional
) -> bool

text.find(
    'substring',
    start=0, # optional
    end=9223372036854775807, # optional
) -> int

text.count(
    'substring',
    start=0, # optional
    end=9223372036854775807, # optional
    *, # non-traditional arguments:
    allowoverlap=False, # optional
) -> int
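
The effect of allowoverlap is easiest to see in a pure-Python reference. The sketch below assumes that allowoverlap=False resumes the search after the end of each match (as str.count does) and that allowoverlap=True resumes one character after each match start; the source does not spell this out, so treat the semantics as an assumption. The helper name count_occurrences is hypothetical. The default end of 9223372036854775807 in the signatures above equals sys.maxsize on 64-bit Python builds.

```python
import sys


def count_occurrences(text, pattern, start=0, end=sys.maxsize, *, allowoverlap=False):
    """Count matches of `pattern` in text[start:end] using str.find."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    total = 0
    position = text.find(pattern, start, end)
    while position != -1:
        total += 1
        # Overlapping mode advances by one character, otherwise by a whole match.
        step = 1 if allowoverlap else len(pattern)
        position = text.find(pattern, position + step, end)
    return total


if __name__ == "__main__":
    print(count_occurrences("aaaa", "aa"))                     # 2
    print(count_occurrences("aaaa", "aa", allowoverlap=True))  # 3
```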

text.splitlines(
    keeplinebreaks=False, # optional
    *, # non-traditional arguments:
    separator='\n', # optional
) -> Slices # similar to list[str]

text.split(
    separator=' ', # optional
    maxsplit=9223372036854775807, # optional
    *, # non-traditional arguments:
    keepseparator=False, # optional
) -> Slices # similar to list[str]
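
As a reading aid for the split signature, here is a pure-Python reference of what such a call could return. It assumes that keepseparator=True leaves each separator attached to the end of the piece before it, by analogy with str.splitlines(keepends=True); that behavior is an assumption, not something the source confirms. The helper name split_reference is hypothetical, and it returns a plain list rather than Slices.

```python
import sys
from typing import List


def split_reference(
    text: str,
    separator: str = " ",
    maxsplit: int = sys.maxsize,
    *,
    keepseparator: bool = False,
) -> List[str]:
    """Split `text` on `separator`, optionally keeping separators in the pieces."""
    if not separator:
        raise ValueError("separator must be non-empty")
    pieces = []
    cursor = 0
    while len(pieces) < maxsplit:
        hit = text.find(separator, cursor)
        if hit == -1:
            break
        # Either cut before the separator or include it in the current piece.
        cut = hit + len(separator) if keepseparator else hit
        pieces.append(text[cursor:cut])
        cursor = hit + len(separator)
    # Whatever follows the last processed separator forms the final piece.
    pieces.append(text[cursor:])
    return pieces


if __name__ == "__main__":
    print(split_reference("a b c"))                      # ['a', 'b', 'c']
    print(split_reference("a b c", keepseparator=True))  # ['a ', 'b ', 'c']
    print(split_reference("a b c", maxsplit=1))          # ['a', 'b c']
```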

Development

rm -rf build && pip install -e . && pytest scripts/test.py -s -x

To benchmark on some custom file and pattern combination:

python scripts/bench.py --path "your file" --pattern "your pattern"

To validate packaging:

cibuildwheel --platform linux


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See the tutorial on generating distribution archives.

Built Distributions

stringzilla-0.1.3-cp311-cp311-win_amd64.whl (97.9 kB view hashes)

Uploaded CPython 3.11 Windows x86-64

stringzilla-0.1.3-cp311-cp311-manylinux_2_28_x86_64.whl (243.0 kB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.28+ x86-64

stringzilla-0.1.3-cp311-cp311-manylinux_2_28_aarch64.whl (236.3 kB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.28+ ARM64

stringzilla-0.1.3-cp311-cp311-macosx_11_0_arm64.whl (123.5 kB view hashes)

Uploaded CPython 3.11 macOS 11.0+ ARM64

stringzilla-0.1.3-cp311-cp311-macosx_10_9_x86_64.whl (128.3 kB view hashes)

Uploaded CPython 3.11 macOS 10.9+ x86-64

stringzilla-0.1.3-cp311-cp311-macosx_10_9_universal2.whl (250.7 kB view hashes)

Uploaded CPython 3.11 macOS 10.9+ universal2 (ARM64, x86-64)

stringzilla-0.1.3-cp310-cp310-win_amd64.whl (97.8 kB view hashes)

Uploaded CPython 3.10 Windows x86-64

stringzilla-0.1.3-cp310-cp310-manylinux_2_28_x86_64.whl (243.4 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.28+ x86-64

stringzilla-0.1.3-cp310-cp310-manylinux_2_28_aarch64.whl (236.5 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.28+ ARM64

stringzilla-0.1.3-cp310-cp310-macosx_11_0_arm64.whl (123.4 kB view hashes)

Uploaded CPython 3.10 macOS 11.0+ ARM64

stringzilla-0.1.3-cp310-cp310-macosx_10_9_x86_64.whl (128.3 kB view hashes)

Uploaded CPython 3.10 macOS 10.9+ x86-64

stringzilla-0.1.3-cp310-cp310-macosx_10_9_universal2.whl (250.7 kB view hashes)

Uploaded CPython 3.10 macOS 10.9+ universal2 (ARM64, x86-64)

stringzilla-0.1.3-cp39-cp39-win_amd64.whl (97.8 kB view hashes)

Uploaded CPython 3.9 Windows x86-64

stringzilla-0.1.3-cp39-cp39-manylinux_2_28_x86_64.whl (243.3 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.28+ x86-64

stringzilla-0.1.3-cp39-cp39-manylinux_2_28_aarch64.whl (236.6 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.28+ ARM64

stringzilla-0.1.3-cp39-cp39-macosx_11_0_arm64.whl (123.6 kB view hashes)

Uploaded CPython 3.9 macOS 11.0+ ARM64

stringzilla-0.1.3-cp39-cp39-macosx_10_9_x86_64.whl (128.5 kB view hashes)

Uploaded CPython 3.9 macOS 10.9+ x86-64

stringzilla-0.1.3-cp39-cp39-macosx_10_9_universal2.whl (251.0 kB view hashes)

Uploaded CPython 3.9 macOS 10.9+ universal2 (ARM64, x86-64)

stringzilla-0.1.3-cp38-cp38-win_amd64.whl (97.3 kB view hashes)

Uploaded CPython 3.8 Windows x86-64

stringzilla-0.1.3-cp38-cp38-manylinux_2_28_x86_64.whl (242.8 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.28+ x86-64

stringzilla-0.1.3-cp38-cp38-manylinux_2_28_aarch64.whl (236.3 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.28+ ARM64

stringzilla-0.1.3-cp38-cp38-macosx_11_0_arm64.whl (123.3 kB view hashes)

Uploaded CPython 3.8 macOS 11.0+ ARM64

stringzilla-0.1.3-cp38-cp38-macosx_10_9_x86_64.whl (128.2 kB view hashes)

Uploaded CPython 3.8 macOS 10.9+ x86-64

stringzilla-0.1.3-cp38-cp38-macosx_10_9_universal2.whl (250.5 kB view hashes)

Uploaded CPython 3.8 macOS 10.9+ universal2 (ARM64, x86-64)

stringzilla-0.1.3-cp37-cp37m-win_amd64.whl (97.0 kB view hashes)

Uploaded CPython 3.7m Windows x86-64

stringzilla-0.1.3-cp37-cp37m-manylinux_2_28_x86_64.whl (245.5 kB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.28+ x86-64

stringzilla-0.1.3-cp37-cp37m-manylinux_2_28_aarch64.whl (241.1 kB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.28+ ARM64

stringzilla-0.1.3-cp37-cp37m-macosx_10_9_x86_64.whl (124.2 kB view hashes)

Uploaded CPython 3.7m macOS 10.9+ x86-64

stringzilla-0.1.3-cp36-cp36m-win_amd64.whl (96.9 kB view hashes)

Uploaded CPython 3.6m Windows x86-64

stringzilla-0.1.3-cp36-cp36m-manylinux_2_28_x86_64.whl (247.1 kB view hashes)

Uploaded CPython 3.6m manylinux: glibc 2.28+ x86-64

stringzilla-0.1.3-cp36-cp36m-manylinux_2_28_aarch64.whl (241.2 kB view hashes)

Uploaded CPython 3.6m manylinux: glibc 2.28+ ARM64

stringzilla-0.1.3-cp36-cp36m-macosx_10_9_x86_64.whl (124.5 kB view hashes)

Uploaded CPython 3.6m macOS 10.9+ x86-64
