High-performance text parser with smart chunking and multiple file format support

Project description

CrabParser

A high-performance text parsing library written in Rust with Python bindings.

Features

  • Fast semantic text chunking
  • Respects paragraph and sentence boundaries
  • Configurable chunk sizes
  • Written in Rust for optimal performance
  • Easy-to-use Python API
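
To make "respects paragraph and sentence boundaries" concrete, here is a minimal pure-Python sketch of boundary-aware chunking. It is not CrabParser's Rust implementation. The function `chunk_text` is hypothetical, its parameter names are borrowed from the `TextParser` arguments shown under Usage, and it assumes `chunk_size` is a limit in characters, which the library may define differently.

```python
import re


def chunk_text(text, chunk_size=500, respect_paragraphs=True, respect_sentences=True):
    """Greedily pack paragraphs, then sentences, into chunks of at most chunk_size characters.

    Illustrative only: a hypothetical helper, not part of the crabparser API.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    # Split on blank lines, or treat the whole text as a single unit.
    units = re.split(r"\n\s*\n", text) if respect_paragraphs else [text]

    pieces = []
    for unit in units:
        unit = unit.strip()
        if not unit:
            continue
        if len(unit) <= chunk_size or not respect_sentences:
            pieces.append(unit)
        else:
            # Oversized paragraph: fall back to sentence boundaries.
            pieces.extend(s for s in re.split(r"(?<=[.!?])\s+", unit) if s)

    chunks, current = [], ""
    for piece in pieces:
        # A piece that is still too long gets a hard split as a last resort.
        while len(piece) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:chunk_size])
            piece = piece[chunk_size:]
        # Pieces are joined with a single space, so paragraph breaks are not preserved.
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

The sketch prefers the largest boundary that fits: whole paragraphs first, sentences only when a paragraph exceeds the limit, and a hard character split only when a single sentence is too long.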

Installation

pip install crabparser

Usage

from crabparser import TextParser

# Create parser with custom settings
parser = TextParser(
    chunk_size=500,
    respect_paragraphs=True,
    respect_sentences=True
)

# Parse text
text = "Your long text here..."
chunks = parser.parse(text)

# Parse file
chunks = parser.parse_file("document.txt")

# Save chunks to files
parser.save_chunks(chunks, "output_dir", "base_name")

Download files

Source Distribution

crabparser-0.1.0.tar.gz (20.0 kB)

Uploaded Source

Built Distribution

crabparser-0.1.0-cp39-abi3-manylinux_2_34_x86_64.whl (1.7 MB)

Uploaded CPython 3.9+ manylinux: glibc 2.34+ x86-64

File details

Details for the file crabparser-0.1.0.tar.gz.

File metadata

  • Download URL: crabparser-0.1.0.tar.gz
  • Upload date:
  • Size: 20.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for crabparser-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d73ff93775c7aab3d4427c9930b350960d099915449f6a1acc68a4bd89eeadcd
MD5 93fbc3028d6d945cf06de76b8e1db2c1
BLAKE2b-256 391e7e0d2b8aeff68f17c40c27757e1f414c1b3a558c6af87509a0e023e7df3d
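
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The sketch below is one way to do it. The helper names `sha256_of_file` and `matches_published_hash` are illustrative, and the local path passed in is whatever location you saved the archive to.

```python
import hashlib

# SHA256 digest published above for crabparser-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = "d73ff93775c7aab3d4427c9930b350960d099915449f6a1acc68a4bd89eeadcd"


def sha256_of_file(path, block_size=65536):
    """Return the hex SHA256 digest of a file, reading it in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SDIST_SHA256):
    """Compare a local file's digest with the expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_published_hash("crabparser-0.1.0.tar.gz")` returns `True` only if the local archive is byte-for-byte identical to the file whose digest is listed here.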

File details

Details for the file crabparser-0.1.0-cp39-abi3-manylinux_2_34_x86_64.whl.

File hashes

Hashes for crabparser-0.1.0-cp39-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 b664d0999b62da5aeb5551920fe215282e65556edfd0d94a5a0264c7e7e75e1b
MD5 f3ee2f451561eafd5d4be7e7571c26b6
BLAKE2b-256 3c6e34143052b8eb7eb7e39b33550d5260e942a95e5a3b9c584b0e66dc2cb165
