🧠 Intelliparse
Smart File Parsing & Content Extraction Made Simple
Intelliparse is your all-in-one solution to extract text, images, tables, and metadata from various file formats - from common documents to complex CAD drawings. Powered by AI for intelligent content understanding. 🚀
from intelliparse.parsers import FileParser
from intelliparse.types import RawFile
# Parse any file with AI-powered insights
file = RawFile.from_path("contract.pdf")
parser = FileParser()
parsed_file = parser.parse(file)
print(f"🔍 Found {len(parsed_file.sections)} sections!")
print(f"📄 Text: {parsed_file.sections[0].text[:200]}...")
🌟 Features
- Broad Format Support - PDF, DOCX, PPTX, images, audio, video, CAD, and more
- AI-Powered Insights - Automatic image descriptions, audio transcriptions, and content analysis
- Structured Extraction (WIP) - Text, tables, images, metadata, and document structure
- Easy Extension - Add custom parsers in <10 lines of code
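To give a feel for what a small, extension-keyed parser plug-in can look like, here is a minimal self-contained sketch. Every name in it (`SketchParsedFile`, `register_parser`, `parse`) is hypothetical and is not Intelliparse's actual extension API; it only illustrates the general registry pattern.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# Hypothetical stand-in type: NOT part of Intelliparse's API.
@dataclass
class SketchParsedFile:
    name: str
    text: str


ParserFn = Callable[[str, bytes], SketchParsedFile]
_REGISTRY: Dict[str, ParserFn] = {}


def register_parser(extension: str) -> Callable[[ParserFn], ParserFn]:
    """Associate a parser function with a file extension (case-insensitive)."""
    def decorator(fn: ParserFn) -> ParserFn:
        _REGISTRY[extension.lower()] = fn
        return fn
    return decorator


@register_parser(".log")
def parse_log(name: str, data: bytes) -> SketchParsedFile:
    # Decode leniently so a stray byte does not abort the whole parse.
    return SketchParsedFile(name=name, text=data.decode("utf-8", errors="replace"))


def parse(name: str, data: bytes) -> SketchParsedFile:
    """Dispatch to the parser registered for the file's extension."""
    dot = name.rfind(".")
    extension = name[dot:].lower() if dot != -1 else ""
    if extension not in _REGISTRY:
        raise ValueError(f"No parser registered for {extension!r}")
    return _REGISTRY[extension](name, data)
```

With this sketch, `parse("app.log", b"started")` returns a `SketchParsedFile` whose `text` is `"started"`, and an unregistered extension raises `ValueError`.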
📦 Installation
# Install core library
pip install intelliparse
# Install system dependencies (choose your OS)
# Ubuntu/Debian
sudo apt-get install libmagic1
# macOS
brew install libmagic
# Windows (via Chocolatey)
choco install magic
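After installing, a quick way to check whether the system library is discoverable is to ask Python's dynamic-library lookup for it. This is a rough sketch using only the standard library; the helper name `libmagic_available` is made up for illustration, and on Windows the DLL can be named differently, so a negative result there is not conclusive.

```python
from ctypes.util import find_library


def libmagic_available() -> bool:
    """Return True if the loader can locate a shared library named 'magic'."""
    return find_library("magic") is not None


if __name__ == "__main__":
    print("libmagic found" if libmagic_available() else "libmagic not found")
```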
🚀 Basic Usage
Parse Any File
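from intelliparse.parsers import FileParser
from intelliparse.types import RawFile
# Build a RawFile from in-memory bytes plus a file name
file = RawFile.from_bytes(b"file content", "secret_data.xlsx")
parsed = FileParser().parse(file)  # returns a ParsedFile
for section in parsed.sections:
    print(f"Section {section.number}:")
    print(f"- Text: {section.text[:100]}...")
    print(f"- Found {len(section.images)} images!")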
Extract Tables
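# Reuses `parsed` from the previous example.
# TablePageItem must be imported first (its module is not shown in this README).
table_data = parsed.sections[0].items[0]
if isinstance(table_data, TablePageItem):
    print("📊 Perfect Table Found!")
    # Show the first three CSV rows
    print("\n".join(table_data.csv.split("\n")[:3]))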
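Since the `csv` attribute used above is split on newlines, it appears to be plain CSV text. Assuming that is the case, the standard `csv` module can turn it into rows of cells, which also copes with quoted cells that contain commas or line breaks. The helper name and the sample data below are made up for illustration.

```python
import csv
import io
from typing import List


def csv_text_to_rows(csv_text: str) -> List[List[str]]:
    """Parse CSV text into rows of cells, skipping completely empty lines."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    return [row for row in reader if row]


# Illustrative sample; with Intelliparse this would be `table_data.csv`.
sample_csv = 'item,price\n"Widget, large",9.50\nGadget,3.25\n'
rows = csv_text_to_rows(sample_csv)
header, body = rows[0], rows[1:]
print(header)
for row in body:
    print(row)
```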
🔍 Advanced Usage
AI-Powered Parsing
from intellibricks.agents import Agent
from intellibricks.llms import TextTranscriptionSynapse, Synapse
from intellibricks.llms.types import (
    GenerationConfig,
    ChainOfThought,
    VisualMediaDescription,
    AudioDescription,
)
from intelliparse.parsers import FileParser
from intelliparse.types import RawFile
# Use AI to describe images and transcribe audio
parser = FileParser(
    strategy="high",
    visual_description_agent=Agent(
        task="Detailed description of visual elements.",
        instructions=[
            # Adjacent string literals are concatenated, so each piece ends with a space
            "Describe the provided visual elements in a "
            "detailed manner, following the instructions. "
            "Descriptions must be in Portuguese.",
        ],
        metadata={
            "name": "Visual Elements Descriptor",
            "description": "Description of visual elements in Portuguese.",
        },
        synapse=Synapse.of("google/genai/gemini-1.5-flash"),
        response_model=ChainOfThought[VisualMediaDescription],
        output_language="en",
        generation_config=GenerationConfig(timeout=60, max_retries=1),
    ),
    audio_description_agent=Agent(
        task="Audio transcription",
        instructions=[
            "Transcribe the provided audio in a "
            "clear and precise manner, following the instructions. "
            "Transcriptions must be in Portuguese.",
        ],
        metadata={
            "name": "Audio Transcriber",
            "description": "Audio transcription in Portuguese.",
        },
        synapse=Synapse.of("google/genai/gemini-1.5-flash"),
        audio_transcriptions_synapse=TextTranscriptionSynapse.of(
            "groq/api/whisper-large-v3-turbo"
        ),
        response_model=ChainOfThought[AudioDescription],
    ),
)
parsed = parser.parse(RawFile.from_path("presentation.mp4"))
print(f"📽 Video Description: {parsed.md}")
📚 Supported Formats
| Category | Formats |
|---|---|
| Documents | PDF, DOCX, PPTX, XLSX, TXT, XML |
| Images | PNG, JPG, TIFF, BMP, GIF, SVG, WEBP |
| Audio/Video | MP3, WAV, FLAC, AAC, MP4, AVI, MOV |
| CAD/Design | DWG |
| Archives | ZIP, RAR, 7Z, TAR, GZ |
| Specialized | PKT (Cisco - TODO) |
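As a small illustration of how the table above could be used in calling code, the sketch below maps a file name to its category by extension. It is independent of how Intelliparse itself detects file types; `FORMAT_CATEGORIES` and `category_for` are hypothetical names, and PKT is left out because it is still marked TODO.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the supported-formats table (TODO entries omitted).
FORMAT_CATEGORIES = {
    "Documents": {"pdf", "docx", "pptx", "xlsx", "txt", "xml"},
    "Images": {"png", "jpg", "tiff", "bmp", "gif", "svg", "webp"},
    "Audio/Video": {"mp3", "wav", "flac", "aac", "mp4", "avi", "mov"},
    "CAD/Design": {"dwg"},
    "Archives": {"zip", "rar", "7z", "tar", "gz"},
}


def category_for(filename: str) -> Optional[str]:
    """Return the table category for a file name, or None if it is not listed."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in FORMAT_CATEGORIES.items():
        if suffix in extensions:
            return category
    return None
```

A check like `category_for(path) is None` can be used to skip files before handing them to a parser. Only the last suffix is considered, so `backup.tar.gz` is matched through `gz`.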
🤝 Contributing
We welcome contributors! To get started:
git clone https://github.com/arthurbrenno/intelliparse.git
cd intelliparse
uv sync
Run the tests (TODO: the suite is not available yet; the planned command is):
pytest tests/ --verbose
📜 License
Apache 2.0 - Made with ❤️ by Arthur Brenno