DocFusion
DocFusion is a Python library for deep document visual understanding. It provides a unified interface for a suite of tasks such as layout detection, OCR, table extraction, and reading order detection. By abstracting away the setup of pipelines across different libraries and models, DocFusion makes it easier to integrate and optimize document analysis workflows.
🚀 Why DocFusion?
Working with multiple document analysis tools can be challenging due to differences in APIs, outputs, and data formats. DocFusion addresses these pain points by:
- Unifying APIs: A consistent interface for all tasks, irrespective of the underlying library or model.
- Pipeline Optimization: Pre-built, customizable pipelines for end-to-end document processing.
- Interoperability: Smooth integration of outputs from different models into cohesive workflows.
- Ease of Use: Focus on high-level functionality without worrying about the underlying complexities.
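To make the interoperability point concrete, here is a minimal sketch of the kind of output normalization a unified API has to do. It is not DocFusion's implementation: the `Region` record, both adapter functions, and the two raw output shapes are hypothetical, chosen only to show two tools describing the same table in different formats.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Region:
    """Hypothetical common record: a label plus an (x0, y0, x1, y1) box."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def from_xywh(raw: Dict) -> Region:
    """Adapter for an imaginary tool that reports x, y, width, height."""
    x, y, w, h = raw["bbox"]
    return Region(raw["category"], x, y, x + w, y + h)


def from_corners(raw: Dict) -> Region:
    """Adapter for an imaginary tool that reports two corner points."""
    (x0, y0), (x1, y1) = raw["top_left"], raw["bottom_right"]
    return Region(raw["type"], x0, y0, x1, y1)


# The same table region, as two different tools might describe it.
tool_a_output = [{"category": "table", "bbox": [10, 20, 100, 50]}]
tool_b_output = [{"type": "table", "top_left": (10, 20), "bottom_right": (110, 70)}]

regions_a = [from_xywh(r) for r in tool_a_output]
regions_b = [from_corners(r) for r in tool_b_output]

# After normalization, downstream code no longer cares which tool ran.
print(regions_a == regions_b)
```

Once every backend is wrapped in an adapter like this, later steps can consume `Region` objects without knowing which tool produced them.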
✨ Features
- Layout Detection: Identify the structure of documents with popular models and tools.
- OCR: Extract text from images or scanned PDFs with support for multiple OCR engines.
- Table Extraction: Parse and extract data from tables in documents.
- Reading Order Detection: Determine the logical reading sequence of elements.
- Custom Pipelines: Easily configure and extend pipelines to meet specific use cases.
- Scalability: Built to handle large-scale document processing tasks.
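As an illustration of what reading order detection involves, the sketch below sorts bounding boxes top to bottom and then left to right within each line. This is a naive heuristic for single-column pages, not the method DocFusion or any particular model uses; the function name `reading_order` and the `line_tolerance` parameter are made up for this example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def reading_order(boxes: List[Box], line_tolerance: float = 10.0) -> List[Box]:
    """Order boxes top-to-bottom, then left-to-right within a line.

    Boxes whose top edge lies within `line_tolerance` pixels of the first
    box in the current row are treated as belonging to the same line.
    """
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[1]):  # sort by top edge
        if rows and abs(box[1] - rows[-1][0][1]) <= line_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    # Within each row, read left to right.
    return [b for row in rows for b in sorted(row, key=lambda b: b[0])]


boxes = [
    (300, 12, 500, 40),   # right half of the first line
    (10, 10, 200, 40),    # left half of the first line
    (10, 100, 500, 140),  # second line
]
ordered = reading_order(boxes)
print(ordered)
```

Multi-column layouts, sidebars, and rotated text break this heuristic, which is why dedicated reading order models exist.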
🔧 Installation
Prerequisites
- Python 3.8 or higher
- pip package manager
To install DocFusion, run:
pip install docfusion
🛠️ Getting Started
Here's a quick example that loads a document and runs layout detection, OCR, and table extraction on it:
from docfusion import DocFusion
# Initialize DocFusion
docfusion = DocFusion()
# Load a document
doc = docfusion.load_document("sample.pdf")
# Or load an image instead
# doc = docfusion.load_image("sample.png")
# Detect layout
layout = docfusion.detect_layout(doc)
# Perform OCR
text = docfusion.extract_text(doc)
# Extract tables
tables = docfusion.extract_tables(doc)
# Print results
print("Layout:", layout)
print("Text:", text)
print("Tables:", tables)
📚 Supported Models and Libraries
DocFusion is designed to integrate with a variety of popular document analysis tools. The list of supported models and libraries will be updated soon.
🏗️ How It Works
DocFusion organizes document processing tasks into modular components. Each component corresponds to a specific task and offers:
- A Unified Interface: Consistent input and output formats.
- Model Independence: Switch between libraries or models effortlessly.
- Pipeline Flexibility: Combine components to create custom workflows.
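The three properties above can be sketched with a small component protocol. This is an illustration of the design idea only, under the assumption that each component maps a page record to an enriched page record. The `Component` protocol, the `Pipeline` class, and the stand-in backends are all hypothetical and are not part of DocFusion's published API.

```python
from typing import Dict, List, Protocol


class Component(Protocol):
    """Unified interface: take a page dict, return an enriched copy."""

    def run(self, page: Dict) -> Dict: ...


class WhitespaceOCR:
    """Stand-in OCR backend: collapses whitespace in the fake 'pixels' text."""

    def run(self, page: Dict) -> Dict:
        return {**page, "text": " ".join(page["pixels"].split())}


class UppercaseOCR:
    """A second stand-in OCR backend exposing the same interface."""

    def run(self, page: Dict) -> Dict:
        return {**page, "text": page["pixels"].strip().upper()}


class LineCounter:
    """Stand-in layout step: records how many lines the page has."""

    def run(self, page: Dict) -> Dict:
        return {**page, "num_lines": len(page["pixels"].splitlines())}


class Pipeline:
    """Runs components in order, feeding each one the previous output."""

    def __init__(self, components: List[Component]) -> None:
        self.components = components

    def run(self, page: Dict) -> Dict:
        for component in self.components:
            page = component.run(page)
        return page


page = {"pixels": "Invoice  42\nTotal: 10"}

# Swapping the OCR backend changes one list entry, nothing else.
result_a = Pipeline([LineCounter(), WhitespaceOCR()]).run(page)
result_b = Pipeline([LineCounter(), UppercaseOCR()]).run(page)

print(result_a["text"])
print(result_b["num_lines"])
```

Because both OCR stand-ins satisfy the same protocol, the pipeline code never changes when a backend is replaced, and both results carry the same keys.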
📈 Roadmap
- Add support for semantic understanding tasks (e.g., entity extraction).
- Integrate pre-trained transformer models for context-aware document analysis.
- Expand pipelines for multilingual document processing.
- Add CLI support for batch processing.
🤝 Contributing
We welcome contributions to DocFusion! Here's how you can help:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Commit your changes and open a pull request.
For more details, refer to our CONTRIBUTING.md.
🛡️ License
This project is licensed under multiple licenses, depending on the models and libraries you use in your pipeline. Please refer to the individual licenses of each component for specific terms and conditions.
🌟 Support the Project
If you find DocFusion helpful, please give us a ⭐ on GitHub and share it with others in the community.
🗨️ Join the Community
For discussions, questions, or feedback:
- Issues: Report bugs or suggest features on the project's issue tracker.
- Email: Reach out at adithyaskolavi@gmail.com
File details
Details for the file docfusion_ai-0.1.0.tar.gz.
File metadata
- Download URL: docfusion_ai-0.1.0.tar.gz
- Upload date:
- Size: 3.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.10.12 Linux/5.15.0-1042-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5b6055865b7fb4160787b6ea0471e009bc42e248e9b17c928b743450929653f0
MD5 | 4756bfe30bcc117bd846d6c537df9fc9
BLAKE2b-256 | 05dfc4cac8aa3e405dec209d5efe66cc67e18674456bc0d465f1590115dfdb2f
File details
Details for the file docfusion_ai-0.1.0-py3-none-any.whl.
File metadata
- Download URL: docfusion_ai-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.10.12 Linux/5.15.0-1042-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 807d8dd62d6dccb990b93712c3d5a4f6413b79a5ed97af37330d527b66095f77
MD5 | 060f63de11ae2c4c4ec8ae5015c62054
BLAKE2b-256 | 1fce26e7e5e11f469006c88820ca81104a6bfac020d95a708a70d6baebb36b93