Crunch 100+ GB Strings in Python with ease

Stringzilla

Crunch 100+ GB Strings in Python with ease, leveraging SIMD Assembly

Stringzilla was born many years ago as a tutorial for SIMD-accelerated string processing. One day, while processing 100+ GB chemistry and AI datasets, I decided to turn it into a library. It is designed to replace open(...).readlines(), str().splitlines(), and many other common workloads on very long strings.

Benchmark                IoT        Arm Laptop   x86 Server
Python: str.find         4 MB/s     14 MB/s      11 MB/s
C++: std::string::find   560 MB/s   1.2 GB/s     1.3 GB/s
Stringzilla              4.3 GB/s   12 GB/s      12.1 GB/s

Usage

pip install stringzilla

There are two classes, Str and File, that you can use interchangeably:

from stringzilla import Str, File, Slices

text: str = 'some-string'
text: Str = Str('some-string')
text: File = File('some-file.txt')

Once constructed, the following interfaces are supported:

len(text) -> int
'substring' in text -> bool
text[42] -> str

text.contains(
    'substring',
    start=0, # optional
    end=9223372036854775807, # optional
) -> bool

text.find(
    'substring',
    start=0, # optional
    end=9223372036854775807, # optional
) -> int

text.count(
    'substring',
    start=0, # optional
    end=9223372036854775807, # optional
    *, # non-traditional, keyword-only arguments:
    allowoverlap=False, # optional
) -> int

text.splitlines(
    keeplinebreaks=False, # optional
    *, # non-traditional, keyword-only arguments:
    separator='\n', # optional
) -> Slices # similar to list[str]

text.split(
    separator=' ', # optional
    maxsplit=9223372036854775807, # optional
    *, # non-traditional, keyword-only arguments:
    keepseparator=False, # optional
) -> Slices # similar to list[str]
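
The listing above does not spell out what allowoverlap changes in count(). A minimal plain-Python sketch of the presumed semantics follows, built only on the built-in str.find. It is an illustration, not Stringzilla's implementation, and the helper name count_occurrences is hypothetical. The default end value 9223372036854775807 equals 2**63 - 1, which is what sys.maxsize reports on 64-bit CPython builds.

```python
import sys


def count_occurrences(
    haystack: str,
    needle: str,
    start: int = 0,
    end: int = sys.maxsize,
    allowoverlap: bool = False,
) -> int:
    """Count matches of `needle` in haystack[start:end] using only str.find.

    Hypothetical helper: it mirrors the argument names of the count()
    signature above, but the behavior is an assumption, not library code.
    """
    if not needle:
        raise ValueError("needle must be non-empty")
    # After a match, either advance by one character (overlaps allowed)
    # or jump past the whole match (overlaps skipped).
    step = 1 if allowoverlap else len(needle)
    count = 0
    position = haystack.find(needle, start, end)
    while position != -1:
        count += 1
        position = haystack.find(needle, position + step, end)
    return count


print(count_occurrences("aaaa", "aa"))                     # 2
print(count_occurrences("aaaa", "aa", allowoverlap=True))  # 3
```

With overlaps disabled, the result matches the built-in str.count; with overlaps enabled, matches starting at every position are counted.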

Development

rm -rf build && pip install -e . && pytest scripts/test.py -s -x

To benchmark a custom file and pattern combination:

python scripts/bench.py --path "your file" --pattern "your pattern"
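
The contents of scripts/bench.py are not reproduced here. As a rough standalone sketch of how a throughput figure like those in the benchmark table can be measured, the snippet below times the built-in bytes.find with time.perf_counter. The function name find_throughput, the synthetic data, and the best-of-N strategy are all assumptions made for illustration, not the project's methodology.

```python
import time


def find_throughput(haystack: bytes, needle: bytes, repeats: int = 5) -> float:
    """Return the best observed scan rate of bytes.find, in bytes per second."""
    best = float("inf")
    position = -1
    for _ in range(repeats):
        started = time.perf_counter()
        position = haystack.find(needle)
        elapsed = time.perf_counter() - started
        best = min(best, elapsed)
    # Bytes actually scanned: up to the end of the match, or the whole
    # buffer when the needle is absent.
    scanned = len(haystack) if position == -1 else position + len(needle)
    # Guard against a zero reading from the timer on tiny inputs.
    return scanned / max(best, 1e-9)


# Synthetic data stands in for a real file; the needle is absent on purpose,
# so every run scans the entire buffer.
data = b"abcdefgh" * 1_000_000
rate = find_throughput(data, b"xyz")
print(f"bytes.find: {rate / 1e6:.0f} MB/s")
```

Taking the fastest of several runs reduces noise from caches and scheduling; results depend heavily on the machine and on how early the pattern occurs.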

To validate packaging:

cibuildwheel --platform linux


Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

stringzilla-0.1.1-cp311-cp311-manylinux_2_28_x86_64.whl (228.7 kB), CPython 3.11, manylinux: glibc 2.28+ x86-64
stringzilla-0.1.1-cp311-cp311-manylinux_2_28_aarch64.whl (223.1 kB), CPython 3.11, manylinux: glibc 2.28+ ARM64
stringzilla-0.1.1-cp311-cp311-macosx_11_0_arm64.whl (106.6 kB), CPython 3.11, macOS 11.0+ ARM64
stringzilla-0.1.1-cp311-cp311-macosx_10_9_x86_64.whl (109.4 kB), CPython 3.11, macOS 10.9+ x86-64
stringzilla-0.1.1-cp311-cp311-macosx_10_9_universal2.whl (214.5 kB), CPython 3.11, macOS 10.9+ universal2 (ARM64, x86-64)
stringzilla-0.1.1-cp310-cp310-manylinux_2_28_x86_64.whl (228.7 kB), CPython 3.10, manylinux: glibc 2.28+ x86-64
stringzilla-0.1.1-cp310-cp310-manylinux_2_28_aarch64.whl (223.1 kB), CPython 3.10, manylinux: glibc 2.28+ ARM64
stringzilla-0.1.1-cp310-cp310-macosx_11_0_arm64.whl (106.7 kB), CPython 3.10, macOS 11.0+ ARM64
stringzilla-0.1.1-cp310-cp310-macosx_10_9_x86_64.whl (109.4 kB), CPython 3.10, macOS 10.9+ x86-64
stringzilla-0.1.1-cp310-cp310-macosx_10_9_universal2.whl (214.6 kB), CPython 3.10, macOS 10.9+ universal2 (ARM64, x86-64)
stringzilla-0.1.1-cp39-cp39-manylinux_2_28_x86_64.whl (228.8 kB), CPython 3.9, manylinux: glibc 2.28+ x86-64
stringzilla-0.1.1-cp39-cp39-manylinux_2_28_aarch64.whl (223.3 kB), CPython 3.9, manylinux: glibc 2.28+ ARM64
stringzilla-0.1.1-cp39-cp39-macosx_11_0_arm64.whl (106.7 kB), CPython 3.9, macOS 11.0+ ARM64
stringzilla-0.1.1-cp39-cp39-macosx_10_9_x86_64.whl (109.6 kB), CPython 3.9, macOS 10.9+ x86-64
stringzilla-0.1.1-cp39-cp39-macosx_10_9_universal2.whl (214.8 kB), CPython 3.9, macOS 10.9+ universal2 (ARM64, x86-64)
stringzilla-0.1.1-cp38-cp38-manylinux_2_28_x86_64.whl (228.5 kB), CPython 3.8, manylinux: glibc 2.28+ x86-64
stringzilla-0.1.1-cp38-cp38-manylinux_2_28_aarch64.whl (223.0 kB), CPython 3.8, manylinux: glibc 2.28+ ARM64
stringzilla-0.1.1-cp38-cp38-macosx_11_0_arm64.whl (106.6 kB), CPython 3.8, macOS 11.0+ ARM64
stringzilla-0.1.1-cp38-cp38-macosx_10_9_x86_64.whl (109.3 kB), CPython 3.8, macOS 10.9+ x86-64
stringzilla-0.1.1-cp38-cp38-macosx_10_9_universal2.whl (214.3 kB), CPython 3.8, macOS 10.9+ universal2 (ARM64, x86-64)
stringzilla-0.1.1-cp37-cp37m-manylinux_2_28_x86_64.whl (229.5 kB), CPython 3.7m, manylinux: glibc 2.28+ x86-64
stringzilla-0.1.1-cp37-cp37m-manylinux_2_28_aarch64.whl (225.9 kB), CPython 3.7m, manylinux: glibc 2.28+ ARM64
stringzilla-0.1.1-cp37-cp37m-macosx_10_9_x86_64.whl (107.0 kB), CPython 3.7m, macOS 10.9+ x86-64
stringzilla-0.1.1-cp36-cp36m-manylinux_2_28_x86_64.whl (229.2 kB), CPython 3.6m, manylinux: glibc 2.28+ x86-64
stringzilla-0.1.1-cp36-cp36m-manylinux_2_28_aarch64.whl (224.4 kB), CPython 3.6m, manylinux: glibc 2.28+ ARM64
stringzilla-0.1.1-cp36-cp36m-macosx_10_9_x86_64.whl (107.0 kB), CPython 3.6m, macOS 10.9+ x86-64

File details

Details for the file stringzilla-0.1.1-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 726046a6f54447d3921219ecbaa1abc79efa70a3fbc549b84bf694f9a996198d
MD5 7be8b94dc0faa79fca4e479006d0c81f
BLAKE2b-256 bd7cdd4d5fe643323f123ac1da4fc37975009526569e4c7d4d4f34043fa3818e

File details

Details for the file stringzilla-0.1.1-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 7ebadb540d181b4875fe67c91ee84ed6549401aedc77f320ef9cc289247b51e8
MD5 4f190e262f76e2198d83758b9db2fb64
BLAKE2b-256 2572769e2a79b2176a0a0610ff095a730fe92d45d7ae05daccdd7344f6941cad

File details

Details for the file stringzilla-0.1.1-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4b3e0bc54602aabda53c6a4c22fbc25abef9458fa4dc1f72e5356a5678b37786
MD5 7c09f76055889030206cf9e26a11ec92
BLAKE2b-256 e0c2bc80af8f2be4b52af164c8bfd9608d62c5ee8556b25a50b6552c9010e41a

File details

Details for the file stringzilla-0.1.1-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 6663f30d7bb9a255e6298dcbec2636ff5a21286d04ba7a320cbf996631adb322
MD5 57f1265f97ccdd70c82d9afbad1441e8
BLAKE2b-256 5e486213c50dd4f415f8061ceae9a08b64af7bfc1f8ab196527eb3e8ac546f83

File details

Details for the file stringzilla-0.1.1-cp311-cp311-macosx_10_9_universal2.whl.

File hashes

Hashes for stringzilla-0.1.1-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 fddfb659988036bcb7eb99cfaed1b290c15c77209bf45adb9249738835449cd2
MD5 8936fa1dd4c9201bb4db3d2892ddbe0e
BLAKE2b-256 1ad954846642dd54ae1e4f80fc303388f402d5499ba9c48c73487b36e00f8b8d

File details

Details for the file stringzilla-0.1.1-cp310-cp310-manylinux_2_28_x86_64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 a7a54febb9de815a9ada5a6c369d8ab8c08ab4083a1fd26d261d99c641ece131
MD5 7edbb632d9d96118c2979cf91355ce51
BLAKE2b-256 30989136f92a4f99fc3488f915891392ee3a533ddcf7e5a9e4359841279c9236

File details

Details for the file stringzilla-0.1.1-cp310-cp310-manylinux_2_28_aarch64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp310-cp310-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 b8168af10727372e58b844bd3345fb280e847e7e54aafe5c38d5feb7796af810
MD5 ed51981c039ebbd0201f9b6cb177b3c4
BLAKE2b-256 4f261a8015efde2c025b5c5831d3c92f53285f94ed0fd308d16f05ec8c7ef774

File details

Details for the file stringzilla-0.1.1-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 65fe6e8e3e9008b3a4b1569067f84244dd24002745ccda0d0082c3bc53fa048e
MD5 e125fcf6c07c82279c5719781a514031
BLAKE2b-256 b576cd9e2ad5faead576bf9175c1c0663f2cac5bc450176106a4642c924393e8

File details

Details for the file stringzilla-0.1.1-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 cdf1a79e3a581f2cef87cee61eabe55fca8e55dc622cba2dbec4a6b9b2c05b1c
MD5 4c90e24caf39a3d851b27232a085424b
BLAKE2b-256 80db871752aeaa357b230d1123ebcc5d636ba67d88956a88206a6bbb74356e82

File details

Details for the file stringzilla-0.1.1-cp310-cp310-macosx_10_9_universal2.whl.

File hashes

Hashes for stringzilla-0.1.1-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 e6f312bc22b8d8ad629f665f799c662ae8088218ae891087ea7cdf23623ae60d
MD5 bd4c87f7cb17718bb394047694eca871
BLAKE2b-256 60099d3d000325d048735119ec24b190244c1cc313b11d14d503b776a92e4ee5

File details

Details for the file stringzilla-0.1.1-cp39-cp39-manylinux_2_28_x86_64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp39-cp39-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 00beeb1c2d488edd225b1ee6c5701ff914dfb6148428615dcb6a3ee3b930c66a
MD5 df31c9572a04ac5cdc884fba679c7223
BLAKE2b-256 76718de2a4ba2edc65ec82cbde5257ef18f99f420a2810d55c302b26737bd7b6

File details

Details for the file stringzilla-0.1.1-cp39-cp39-manylinux_2_28_aarch64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp39-cp39-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 32db10ea47a3a152cbd0e9b9b237f60372ddcefd537840b8de7f376a515d365c
MD5 372dfa4e140b022095fd31f6e48db9db
BLAKE2b-256 ec447e19c2c04b095ee94b28203a972d2c58db3e7b6f5fb93287b3c1afba6bcf

File details

Details for the file stringzilla-0.1.1-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 7848596e8cbe0104b6bd4a3b81456f6c28a2f336e048893c0c4041fa3aff6eb9
MD5 2994a9f302b4160c9d66e0f49ff96c01
BLAKE2b-256 3f6572801842fb988ff7b14967f6afcc317378326abd953ca2789dd4cf445da9

File details

Details for the file stringzilla-0.1.1-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 35df7011217d061b7ead13e2135f11643a27bf7780e129930a3c310feef1a945
MD5 33a18af24e61ec20b082dd7be10aa710
BLAKE2b-256 7fcdd3ca1f81ad38d420515a1bd72a47aee9234c5174fbd3791778ca34c78dc6

File details

Details for the file stringzilla-0.1.1-cp39-cp39-macosx_10_9_universal2.whl.

File hashes

Hashes for stringzilla-0.1.1-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 44d6ca4ac665e111dc4e876dba90af3adb2d38183557b204c23cbe276a309c70
MD5 53a0e56e6470e424d6f27eb00d2373d6
BLAKE2b-256 cbec3a067cbc7ad3e530f4e6f19991951fb6909480e85b5c0f6c647c75d90db8

File details

Details for the file stringzilla-0.1.1-cp38-cp38-manylinux_2_28_x86_64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp38-cp38-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 9d86a99f44b8a36632d95454bca79ffc5978f5548164531cb8ded16e309ff28b
MD5 165cecfa3fb156a55b08fdf9bb6e9774
BLAKE2b-256 0a0c11ababfac60376d0cca9d3affcdadcd1cceeef3455a6ebce4e1664898620

File details

Details for the file stringzilla-0.1.1-cp38-cp38-manylinux_2_28_aarch64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp38-cp38-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 ba42f1da28c1bf50b3d1e34e410d1a01c6a816c5068c1747a26e3a528c8e5d44
MD5 2b333ba43c8784417b01949da2b8dc57
BLAKE2b-256 abc15c61bc907e79b6ea671fce1e6891fdf7cf6c22565a6ef5450dd1055ffc9e

File details

Details for the file stringzilla-0.1.1-cp38-cp38-macosx_11_0_arm64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp38-cp38-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 99f76f0b7473037078a4d4bdb48241b7f6fb007c7a49cac6e5c78d735717fe2d
MD5 806bbb7575a7a95abb47f0201bd6b49b
BLAKE2b-256 c836e0bf7a38d2865c9dfc516e7fa869545942e23ec774f372261e7b43e17613

File details

Details for the file stringzilla-0.1.1-cp38-cp38-macosx_10_9_x86_64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 cd332fb0088daa6afcbeeab66a8b8186f7e6568befa9f5f6481ae245d6767a75
MD5 02d8d6c378241af96acdc5dcee957fd2
BLAKE2b-256 b49e5c1c08a29fd8688e049cb5fe9772af30f9baf56c5d3c1f134f5a3e0533ee

File details

Details for the file stringzilla-0.1.1-cp38-cp38-macosx_10_9_universal2.whl.

File hashes

Hashes for stringzilla-0.1.1-cp38-cp38-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 46eb480e112645df717a2a5ba3d58b55790c25a2205792e2213affd9039461b8
MD5 d8f6502bdba4391cfbefd49d0dcf474b
BLAKE2b-256 7ee0a2be3b0a59ff8bc51ee952a9f640b4e62d0d3a3afbd698b060604ae43f9e

File details

Details for the file stringzilla-0.1.1-cp37-cp37m-manylinux_2_28_x86_64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp37-cp37m-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 b44925e8ce84e882263d1b4a83248de426bf0b7eaba3be071716ba5ba20d4bea
MD5 fb1c048aa7b7e9ac2dcb0b44a76f51bf
BLAKE2b-256 a7c2bd07b03766ea2b80eab32e8337b8d28799cbab54d69796310b18518a1f56

File details

Details for the file stringzilla-0.1.1-cp37-cp37m-manylinux_2_28_aarch64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp37-cp37m-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 25472dc290bf33dc260bb475748b79c88e52f52b4967e520920ace976340f2b5
MD5 39a9f135bd256267723ce1700dbe43a6
BLAKE2b-256 c3c60c41fab4f8a706b45531ea3aa8e944fd34a7a83b37a583ff2d2563a750eb

File details

Details for the file stringzilla-0.1.1-cp37-cp37m-macosx_10_9_x86_64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 fe85363b1aefca73424200438acb93c121dda3d0962b0b78d7faf505d9094e3c
MD5 d29cd250f0abd7dac39d994854605418
BLAKE2b-256 1fc00ddef88789124cba5a8b153a543c19a5520c99319befeb05153ff074b705

File details

Details for the file stringzilla-0.1.1-cp36-cp36m-manylinux_2_28_x86_64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp36-cp36m-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 b8cbd4de5fd8e202dfdc652268e128e42f9317bcb46bc8ee0c5018d717dc70dc
MD5 21e982d642f9bb5b03da3c46c8d83e98
BLAKE2b-256 fdf6135aba73a17d77ef2c35b1fca76a85b148671871b2f3ca0aa86165a79e27

File details

Details for the file stringzilla-0.1.1-cp36-cp36m-manylinux_2_28_aarch64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp36-cp36m-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 df730f30c08e4e66ded61858b1f1e30726d7428b677e5cca38b9aad3e8af77cc
MD5 0a20411094057b5f7090c6e2e1f01122
BLAKE2b-256 b40fc0fae8781602720b8ca09ecb64835f9eb3f96fe3f348709d99f00b0e2952

File details

Details for the file stringzilla-0.1.1-cp36-cp36m-macosx_10_9_x86_64.whl.

File hashes

Hashes for stringzilla-0.1.1-cp36-cp36m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 3aa0706408dd98e6c649ba3c1d8679a20b4955873622c4c8a1c3b556c660b399
MD5 09df453d8c9c47f2366620dedc5094d2
BLAKE2b-256 de07aa280b28611af266fb481515d0b4b2d51f7d921f9829858c557dbe68902d
