Fast TF-IDF vectorization with Rust-backed preprocessing
Project description
is-it-slop-preprocessing
Fast TF-IDF text vectorization for training AI text detection models.
Implemented in Rust with Python bindings.
The Python bindings let the same Rust-based text preprocessing run at both training and inference time.
Features
- Token n-grams: Uses tiktoken BPE token sequences (not characters/words)
- sklearn-compatible API: Drop-in replacement for training pipelines
- Parallel processing: Automatic multi-threading via Rust/rayon
Installation
```shell
pip install is-it-slop-preprocessing
```
Quick Start
```python
from is_it_slop_preprocessing import TfidfVectorizer, VectorizerParams

# train_texts and test_texts are assumed to be lists of str from your own corpus.

# Configure the vectorizer
params = VectorizerParams(
    ngram_range=(3, 5),  # 3- to 5-token n-grams
    min_df=10,           # ignore n-grams that appear in fewer than 10 documents
    max_df=0.8,          # ignore n-grams that appear in more than 80% of documents
    sublinear_tf=True,   # apply log scaling to term frequencies
)

# Fit on the training data and transform it in one pass
vectorizer, X_train = TfidfVectorizer.fit_transform(train_texts, params)

# Transform test data with the already-fitted vectorizer
X_test = vectorizer.transform(test_texts)

# Save the fitted vectorizer for inference
vectorizer.save("tfidf_vectorizer.bin")
```
API Overview
VectorizerParams
Configuration for text processing:
- ngram_range: Tuple of (min_n, max_n) for the token n-gram range
- min_df: Minimum document frequency (proportion or count)
- max_df: Maximum document frequency (proportion or count)
- sublinear_tf: Apply 1 + log(tf) scaling
TfidfVectorizer
Main vectorizer class:
- fit_transform(texts, params): Fit and transform in one pass (faster than calling fit and then transform)
- fit(texts, params): Fit the vocabulary only
- transform(texts): Transform texts into a TF-IDF matrix
- save(path): Save to bincode format
- load(path): Load from bincode format
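The split between `fit` and `transform` matters because inference must reuse the vocabulary and IDF weights learned during training. The sketch below is a hypothetical pure-Python class, `TinyTfidf`, that illustrates this division of work. It assumes scikit-learn-style smoothed IDF (`ln((1 + n) / (1 + df)) + 1`) and L2 row normalization, and the Rust implementation's exact weighting may differ. Here `fit_transform` simply chains the two calls, whereas the package describes its version as a faster single pass.

```python
import math
from collections import Counter


class TinyTfidf:
    """Hypothetical TF-IDF vectorizer illustrating fit vs. transform.

    Assumes sklearn-style smoothed IDF and L2 normalization; not the package's code.
    """

    def fit(self, docs: list[list[str]]) -> "TinyTfidf":
        n = len(docs)
        df = Counter(term for doc in docs for term in set(doc))
        terms = sorted(df)  # sorted for a deterministic column order
        # Vocabulary: term -> column index
        self.vocab = {term: i for i, term in enumerate(terms)}
        # Smoothed IDF: rarer terms get larger weights
        self.idf = [math.log((1 + n) / (1 + df[t])) + 1 for t in terms]
        return self

    def transform(self, docs: list[list[str]]) -> list[dict[int, float]]:
        """Return one sparse row (column index -> weight) per document."""
        rows = []
        for doc in docs:
            # Terms not seen during fit are ignored
            counts = Counter(t for t in doc if t in self.vocab)
            row = {self.vocab[t]: c * self.idf[self.vocab[t]] for t, c in counts.items()}
            norm = math.sqrt(sum(v * v for v in row.values()))
            # L2-normalize non-empty rows
            rows.append({j: v / norm for j, v in row.items()} if norm else row)
        return rows

    def fit_transform(self, docs: list[list[str]]) -> list[dict[int, float]]:
        return self.fit(docs).transform(docs)
```

Saving the fitted state (here `vocab` and `idf`) is what allows identical features at inference time. The package persists its equivalent state to bincode through `save` and `load`.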
Why Token N-grams?
Unlike character n-grams or word n-grams, this uses sequences of BPE tokens:
- With ngram_range=(3,5), extracts runs of 3-5 consecutive tiktoken tokens
- Better captures patterns in AI-generated text that span multiple sub-word units
- More compact vocabulary than character n-grams
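To make "3-5 consecutive tokens" concrete, here is a minimal sketch that enumerates every n-gram in a sequence of BPE token IDs for a given `ngram_range`. The function `token_ngrams` is a hypothetical illustration, not part of the package. It operates on plain integer IDs, so it does not depend on any particular tokenizer.

```python
def token_ngrams(token_ids: list[int], min_n: int, max_n: int) -> list[tuple[int, ...]]:
    """Return every run of min_n..max_n consecutive token IDs.

    Mirrors the meaning of ngram_range=(min_n, max_n) over BPE tokens.
    """
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the sequence
        for start in range(len(token_ids) - n + 1):
            ngrams.append(tuple(token_ids[start:start + n]))
    return ngrams
```

For example, `token_ngrams([1, 2, 3, 4], 3, 5)` yields two 3-grams and one 4-gram, and no 5-gram because the sequence is too short. To use real BPE tokens, you could pass in the output of `tiktoken.get_encoding("cl100k_base").encode(text)`. The choice of `cl100k_base` is an assumption, because the README does not say which encoding the package uses.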
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file is_it_slop_preprocessing-0.4.0.tar.gz.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0.tar.gz
- Upload date:
- Size: 32.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b90d5bd0c46138bc31b4c7b66d23267cb6da1082634d288dfe826f0ba2c200cf |
| MD5 | e6e788673afd79771ffb49abe5a3e4a5 |
| BLAKE2b-256 | 2c4e8bffd0bae7759cc9f6e891927eefda9d2a7b6d7ce75387f2969a4b4bd96b |
File details
Details for the file is_it_slop_preprocessing-0.4.0-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 3.3 MB
- Tags: PyPy, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d8bad909e0167e0ce10367a20fa0d70980f16fc1b7adb33ae6494679eb47da04 |
| MD5 | e70faf99093914f6a67db513d80ec846 |
| BLAKE2b-256 | 10afa3f7d9aecc9de76fec62e5f2b3d09c022d1fe4faf47700b8f0516eec6bdc |
File details
Details for the file is_it_slop_preprocessing-0.4.0-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
- Upload date:
- Size: 3.2 MB
- Tags: PyPy, manylinux: glibc 2.28+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 42f9279d96e0f308bd635c1385d0096c7c723d398de8292630ae85e4cb5f460d |
| MD5 | abdf375d2e7efa0a6addb56998473d1d |
| BLAKE2b-256 | 36e5e6c095bb1e81296ebe702039c9531363f8268fa7e440bbae1586bf416d09 |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp314-cp314t-manylinux_2_28_aarch64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp314-cp314t-manylinux_2_28_aarch64.whl
- Upload date:
- Size: 3.2 MB
- Tags: CPython 3.14t, manylinux: glibc 2.28+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6a55af57d5fa0defba3562e233d4fef8aa4c3a73d7cb67734a469ca31d90a4ae |
| MD5 | e2316402f65276f99100602f393af3f9 |
| BLAKE2b-256 | f94d01df2ceb1adeda8eef9f83781db74087c6ccc3f03de6e3fcf5a4cd8fd271 |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp314-cp314-win_amd64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp314-cp314-win_amd64.whl
- Upload date:
- Size: 2.9 MB
- Tags: CPython 3.14, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2536df4788094d5fdac6ad8e7066188ae4d9008268567aef44d8fe4c5622fcbb |
| MD5 | dbcdbd811a0824f6c0c005028a768ca6 |
| BLAKE2b-256 | c727cd7a5342321b3ed4af87bc722d4c476cc9e6ab0acac17a07534310b0c67d |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp314-cp314-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp314-cp314-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 3.3 MB
- Tags: CPython 3.14, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dce8d98b3d0a08c245a1f919c6e9e137f00d14c67ee540ef98c2faa1a8e8e951 |
| MD5 | 79fc0b09f95ea984e5eedf35aa79099c |
| BLAKE2b-256 | 4cdd5d81fa2b9ffe2cdd1219d454ef28ab5218b5c617a14e5c66a1c47b96d634 |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp314-cp314-manylinux_2_28_aarch64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp314-cp314-manylinux_2_28_aarch64.whl
- Upload date:
- Size: 3.2 MB
- Tags: CPython 3.14, manylinux: glibc 2.28+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5edb37e73376fd1c3e483e4ef4ef3906fca8cf850ade46d51ae2bcc3dfa827c |
| MD5 | f560cdc1ac2c43dcd4d9081cf9577d20 |
| BLAKE2b-256 | df86966c6174f2319800aec10af4676a8a6c3cc43c19742301a4e02610749540 |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp314-cp314-macosx_11_0_arm64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp314-cp314-macosx_11_0_arm64.whl
- Upload date:
- Size: 3.1 MB
- Tags: CPython 3.14, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 922e23f3b310f5f4d3e25038bc73b741744a1d8291accc152a5a67f4b086d44d |
| MD5 | d95d32da69ab8d645727a3426b79f49c |
| BLAKE2b-256 | a9490d6fea11cb352c88786aeee2ee68809a890335bc60fe20f92269b99974e8 |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp313-cp313t-manylinux_2_28_aarch64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp313-cp313t-manylinux_2_28_aarch64.whl
- Upload date:
- Size: 3.2 MB
- Tags: CPython 3.13t, manylinux: glibc 2.28+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 25d9ab11b85e59f1c6ca11ac979a3dadba282b9d1a3a9967601405fdb5df274d |
| MD5 | a682c00f1b870ba39b7548a5d192cd37 |
| BLAKE2b-256 | a2b2e4437e26b17475921d19ceeb1d5b2d5e0a1f05c37f0e507e8860116f4842 |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp313-cp313-win_amd64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp313-cp313-win_amd64.whl
- Upload date:
- Size: 2.9 MB
- Tags: CPython 3.13, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 71c880d5830988042c11ca1a464fbafc42306a4a6ae35ff11817e62c8d3d09e1 |
| MD5 | 38fba44d08014d5ecdf45e88ef7ad09f |
| BLAKE2b-256 | 817fcb11df7185911f8c7908c62d860fb041bcf37a41e03cdda5b208030aa04f |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp313-cp313-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp313-cp313-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 3.3 MB
- Tags: CPython 3.13, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 978534338fe68f3bdbbb298376626d519459f2356044bc64aaa19001e5eabc5a |
| MD5 | f76b94bf2a470a340d6f4c95beaa72f7 |
| BLAKE2b-256 | 06b2f602e287d7d34b3c468accdec7d1fcc00243358c9dc7e12c15f5ae6bb829 |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp313-cp313-manylinux_2_28_aarch64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp313-cp313-manylinux_2_28_aarch64.whl
- Upload date:
- Size: 3.2 MB
- Tags: CPython 3.13, manylinux: glibc 2.28+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b854aff2aa52c84b6c96fb8f49a27cccc33656c10cac85b5fa004509b5f27bd |
| MD5 | 0f217346a5e78140f475f2f8b5dc19f6 |
| BLAKE2b-256 | d623cf2e0b389f044bbfaa710d4f84ff1e083d60aa06ab8208c1d0e5523f2ee3 |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp313-cp313-macosx_11_0_arm64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp313-cp313-macosx_11_0_arm64.whl
- Upload date:
- Size: 3.1 MB
- Tags: CPython 3.13, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 796d82d0ef26aef9511e892dadec49d0c7fdf77d2de501233e2c50c619e45c07 |
| MD5 | 8dbbeaf7f4366d418486ef03174ec310 |
| BLAKE2b-256 | ce39bb3d47bd3cb6747e6796e984a9c2c4694845352ad2712422a2228da9e217 |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp313-cp313-macosx_10_12_x86_64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp313-cp313-macosx_10_12_x86_64.whl
- Upload date:
- Size: 3.1 MB
- Tags: CPython 3.13, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2d68800deac5a66c09f9d2abc3aed5ae4d21e3f918f079871fc86755a58f69ba |
| MD5 | f527084ac90ee119075ca4fbdaf7fce5 |
| BLAKE2b-256 | acdd2cc7a39828ff88deba16531094d105496d9f3c4b3ff848882cd3dea79f76 |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp312-cp312-win_amd64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp312-cp312-win_amd64.whl
- Upload date:
- Size: 2.9 MB
- Tags: CPython 3.12, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8d740dcd5dd0d819ca8c63e78b087333219004cc165d8c690d9826fbf19a5b93 |
| MD5 | 80dd83ce7696a077fa92339d39765ef6 |
| BLAKE2b-256 | a2ea02c9594fb817f574c8a4c939c8e8dfce5549905c7ac2dc65490f33749197 |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp312-cp312-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp312-cp312-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 3.3 MB
- Tags: CPython 3.12, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cf4c24aeb5d42d2bd1b7e3a01b780a56ac4c24abf12224e5ac5eb3420129b728 |
| MD5 | 3988965e352f397caa33caffa80b3bb9 |
| BLAKE2b-256 | a385e5224307d62465974122cb1a425a260557809bc70a527728ddcdb8467f7f |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp312-cp312-manylinux_2_28_aarch64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp312-cp312-manylinux_2_28_aarch64.whl
- Upload date:
- Size: 3.2 MB
- Tags: CPython 3.12, manylinux: glibc 2.28+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aa4b6f7f280234a2a03815fe6b7451515e241eda4d3c3555b505866789b63dd1 |
| MD5 | 3785f44dfbd13cb8b453618968738f0b |
| BLAKE2b-256 | 35167a48acb08df1d7b135020eb9dc74f5de783c4f96b427601d87279b9a456e |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp312-cp312-macosx_11_0_arm64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp312-cp312-macosx_11_0_arm64.whl
- Upload date:
- Size: 3.1 MB
- Tags: CPython 3.12, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fc80094289e271da93b0b0f6d64e75b46a94e857b373d78503a56f2ad1e3d43e |
| MD5 | 87e8b97c2932572bb290d025598e1afd |
| BLAKE2b-256 | 14635ae5cfaadacad675e4bbff59f62d9cc3e32bb0bdd744365817b91a91b97d |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp312-cp312-macosx_10_12_x86_64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp312-cp312-macosx_10_12_x86_64.whl
- Upload date:
- Size: 3.1 MB
- Tags: CPython 3.12, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4faf1b3ce12fc674578ae653cd786c665f2ade3292ac17cd10d5f24351021f17 |
| MD5 | ce03d96ffd68a4f21f61db65eca0a88c |
| BLAKE2b-256 | 80bb7c53ca4f7b7f51ffdb922eb1d7b8033f6a94e9e9da5a5c0f09cfe49c7dd6 |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp311-cp311-win_amd64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp311-cp311-win_amd64.whl
- Upload date:
- Size: 2.9 MB
- Tags: CPython 3.11, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7b913043d76da252f5fc4e9b48a844fe0c4bee88eacb55c5de09b196448d1407 |
| MD5 | e8c627d1a874b0cdeb4143fb7a512ef2 |
| BLAKE2b-256 | 7a3549577aa249c1f044056644f2c3ce91b24b9db00f4d1a4c53d844acaddc9f |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp311-cp311-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp311-cp311-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 3.3 MB
- Tags: CPython 3.11, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 11387a3d1dc9cd3305c8c283718767c4ba28fee03b0f700191c75d36bac7c93c |
| MD5 | 0c98dcbb1302aafdf77d814ee38f029a |
| BLAKE2b-256 | 2f5271c8633d9036ddee4b8075a5ff73d2857c31cd093017b193287a75de8a67 |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp311-cp311-manylinux_2_28_aarch64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp311-cp311-manylinux_2_28_aarch64.whl
- Upload date:
- Size: 3.2 MB
- Tags: CPython 3.11, manylinux: glibc 2.28+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 775a2ea2bbace31952ec8504240d713595117a1c89d7db766a180681b043ae9f |
| MD5 | 061d8754acef27ff60ec7a15b11a238a |
| BLAKE2b-256 | 56d73c98391f23833572876d060d0a14815537274aa8f02e6bbbad3d2be1a31f |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp311-cp311-macosx_11_0_arm64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp311-cp311-macosx_11_0_arm64.whl
- Upload date:
- Size: 3.1 MB
- Tags: CPython 3.11, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0103439057649649fdaea68892c5d74049edb7d96b3d382346bc6177fc2ac9f9 |
| MD5 | e3ca395673a18fba95002c7174c51708 |
| BLAKE2b-256 | b890d72dda2e9b9fedc1f5de3699637cad256b2dd2a62351f21127407cfd3625 |
File details
Details for the file is_it_slop_preprocessing-0.4.0-cp311-cp311-macosx_10_12_x86_64.whl.
File metadata
- Download URL: is_it_slop_preprocessing-0.4.0-cp311-cp311-macosx_10_12_x86_64.whl
- Upload date:
- Size: 3.1 MB
- Tags: CPython 3.11, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0cfb79fa3e67dc70752258d3018b9defe8d86ff44344fcbab6616ea15fe17842 |
| MD5 | 5c2fbd2d8eacc65b0fa9c13809987d49 |
| BLAKE2b-256 | 68eb0c3dc2e9faf4a2e7978bb804b19b8ac62dbc982956bcf2335169eb9aaf48 |