Python wrapper for LiteParse - fast, lightweight PDF and document parsing
LiteParse Python
Python wrapper for LiteParse - fast, lightweight document parsing with optional OCR.
Installation
pip install liteparse
Prerequisites: The LiteParse Node.js CLI must be installed:
npm install -g liteparse
# or
npx liteparse --version
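Before calling the wrapper, it can help to confirm that the CLI is reachable. The following is a minimal sketch using only the standard library. It assumes the globally installed executable is named `liteparse` (as the `npm install -g liteparse` command suggests), and `find_cli` is a hypothetical helper, not part of the package. It does not cover the `npx` route.

```python
import shutil


def find_cli(name="liteparse"):
    """Return the absolute path of the executable, or None if it is not on PATH.

    The default name "liteparse" is an assumption based on the npm command above.
    """
    return shutil.which(name)


path = find_cli()
if path is None:
    print("LiteParse CLI not found on PATH; try: npm install -g liteparse")
else:
    print(f"LiteParse CLI found at {path}")
```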
Quick Start
from liteparse import LiteParse
# Create parser
parser = LiteParse()
# Parse a document
result = parser.parse("document.pdf")
print(result.text)
# Access structured data
for page in result.pages:
    print(f"Page {page.pageNum}: {len(page.textItems)} text items")
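The per-page loop above can be wrapped in a small helper. This sketch relies only on the `pageNum` and `textItems` attributes shown in the Quick Start; `summarize_pages` is a hypothetical name, and the stand-in pages are fabricated objects so the example runs without the CLI installed.

```python
from types import SimpleNamespace


def summarize_pages(pages):
    """Return a (page number, text item count) pair for each page."""
    return [(page.pageNum, len(page.textItems)) for page in pages]


# Stand-in pages that mimic the two attributes used in the Quick Start.
fake_pages = [
    SimpleNamespace(pageNum=1, textItems=["Title", "Intro"]),
    SimpleNamespace(pageNum=2, textItems=["Body"]),
]
print(summarize_pages(fake_pages))  # [(1, 2), (2, 1)]
```

With a real result, the call would presumably be `summarize_pages(result.pages)`.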
Configuration
from liteparse import LiteParse
parser = LiteParse()
result = parser.parse(
    "document.pdf",
    ocr_enabled=False,
    max_pages=10,
    dpi=150,
    preserve_small_text=True,
)
print(result.text)
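If the same options are reused across many calls, one possible pattern is to keep them in a dictionary and unpack them as keyword arguments. The sketch below is an illustration only: `EXAMPLE_OPTIONS` copies the values from the example above (they are not the library's documented defaults), and `build_options` is a hypothetical helper that rejects keys not in that set.

```python
# Values copied from the Configuration example; not the library's defaults.
EXAMPLE_OPTIONS = {
    "ocr_enabled": False,
    "max_pages": 10,
    "dpi": 150,
    "preserve_small_text": True,
}


def build_options(**overrides):
    """Merge overrides into EXAMPLE_OPTIONS, rejecting keys not listed there."""
    unknown = set(overrides) - set(EXAMPLE_OPTIONS)
    if unknown:
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")
    return {**EXAMPLE_OPTIONS, **overrides}


options = build_options(dpi=300)
print(options)
# Intended use (requires liteparse and the CLI):
# result = parser.parse("document.pdf", **options)
```

Checking keys up front catches a misspelled option name before it reaches the parser.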
Batch Processing
When parsing multiple files, batch mode is significantly faster because it reuses the PDF engine across files:
from liteparse import LiteParse
parser = LiteParse(ocr_enabled=False)
# Parse all documents in a directory
result = parser.batch_parse(
    input_dir="./documents",
    output_dir="./output",
    recursive=True,  # Include subdirectories
    extension_filter=".pdf",  # Only PDF files
)
print(f"Parsed {result.success_count} files in {result.total_time_seconds}s")
print(f"Average: {result.avg_time_ms}ms per file")
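To preview which files a batch run might pick up, the directory can be walked with `pathlib` first. This is a sketch under assumptions: `list_candidates` is a hypothetical helper that mirrors the `recursive` and `extension_filter` arguments shown above, and the matching rule (case-insensitive suffix comparison) is a guess. The wrapper's actual selection logic may differ.

```python
import tempfile
from pathlib import Path


def list_candidates(input_dir, recursive=True, extension_filter=".pdf"):
    """List files under input_dir whose suffix matches extension_filter.

    Case-insensitive matching is an assumption, not documented behavior.
    """
    root = Path(input_dir)
    walker = root.rglob("*") if recursive else root.glob("*")
    wanted = extension_filter.lower()
    return sorted(p for p in walker if p.is_file() and p.suffix.lower() == wanted)


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "sub").mkdir()
    (base / "a.pdf").write_bytes(b"")
    (base / "sub" / "b.pdf").write_bytes(b"")
    (base / "c.txt").write_bytes(b"")
    found_all = [p.name for p in list_candidates(base)]
    found_top = [p.name for p in list_candidates(base, recursive=False)]

print(found_all)  # ['a.pdf', 'b.pdf']
print(found_top)  # ['a.pdf']
```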
Supported Formats
- PDF (.pdf)
- Microsoft Office (.docx, .xlsx, .pptx, etc.) - requires LibreOffice
- OpenDocument (.odt, .ods, .odp) - requires LibreOffice
- Images (.png, .jpg, .tiff, etc.) - requires ImageMagick
- And more!
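The dependency list above can be expressed as a lookup table, which is handy for checking a file before parsing it. The mapping below contains only the extensions named in the list; `required_tool` is a hypothetical helper, and any extension covered by "etc." or "And more!" is deliberately treated as unknown rather than guessed.

```python
from pathlib import Path

# Extension -> external tool, taken from the Supported Formats list.
# None means no extra tool is listed for that format.
REQUIRED_TOOL = {
    ".pdf": None,
    ".docx": "LibreOffice",
    ".xlsx": "LibreOffice",
    ".pptx": "LibreOffice",
    ".odt": "LibreOffice",
    ".ods": "LibreOffice",
    ".odp": "LibreOffice",
    ".png": "ImageMagick",
    ".jpg": "ImageMagick",
    ".tiff": "ImageMagick",
}


def required_tool(filename):
    """Return the external tool listed for this file type, or None if none is listed."""
    suffix = Path(filename).suffix.lower()
    if suffix not in REQUIRED_TOOL:
        raise ValueError(f"Extension {suffix!r} is not in the documented list")
    return REQUIRED_TOOL[suffix]


print(required_tool("report.docx"))  # LibreOffice
print(required_tool("scan.png"))     # ImageMagick
print(required_tool("paper.pdf"))    # None
```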
Performance Tips
- Disable OCR if your documents have selectable text:
    parser = LiteParse(ocr_enabled=False)
- Use batch mode for multiple files to avoid cold-start overhead:
    parser.batch_parse("./input", "./output")
- Limit pages if you only need specific pages:
    result = parser.parse("doc.pdf", target_pages="1-5")
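To make the `target_pages="1-5"` value concrete, the sketch below expands such a string into page numbers. It is an illustration under assumptions: ranges are taken to be 1-based and inclusive, and the comma-separated form (for example `"1-3,7"`) is a guess, since only `"1-5"` appears above. `expand_pages` is a hypothetical helper and not part of the package.

```python
def expand_pages(spec):
    """Expand a page spec such as "1-5" into a list of page numbers.

    Assumes inclusive, 1-based ranges; comma-separated parts are an assumption.
    """
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(n) for n in part.split("-", 1))
            if start > end:
                raise ValueError(f"Invalid range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


print(expand_pages("1-5"))    # [1, 2, 3, 4, 5]
print(expand_pages("1-3,7"))  # [1, 2, 3, 7]
```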
Download files
File details
Details for the file liteparse-1.0.1.tar.gz.
File metadata
- Download URL: liteparse-1.0.1.tar.gz
- Upload date:
- Size: 28.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 971c31d77aea18809e07163157b2fc4ef103a13d2eea570d2cf0926dfdf4fcc8 |
| MD5 | 971a178d0f854b7370ff2fc856af35c3 |
| BLAKE2b-256 | 564225bd4109f1733e51fc9c47e360cc26e2c9abbfbd4f06f67bdcf4e7a27fca |
File details
Details for the file liteparse-1.0.1-py3-none-any.whl.
File metadata
- Download URL: liteparse-1.0.1-py3-none-any.whl
- Upload date:
- Size: 7.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ea9569fffd3015bc545187c45c1368b038f7e2c4446d5a9ffe46605969d9dfb |
| MD5 | 834375072838b377fdb8abafa5da983e |
| BLAKE2b-256 | a8fd6bf2b0fc2b36672f69be58221f5866fbc6007c18d428cc2b91932ceabcc8 |