DocFusion

DocFusion is a Python library for visual document understanding. It provides a unified interface for tasks such as layout detection, OCR, table extraction, and reading order detection. By hiding the differences in setup between the underlying libraries and models, DocFusion makes it easier to integrate and tune document analysis workflows.

🚀 Why DocFusion?

Working with multiple document analysis tools can be challenging due to differences in APIs, outputs, and data formats. DocFusion addresses these pain points by:

  • Unifying APIs: A consistent interface for all tasks, irrespective of the underlying library or model.
  • Pipeline Optimization: Pre-built, customizable pipelines for end-to-end document processing.
  • Interoperability: Smooth integration of outputs from different models into cohesive workflows.
  • Ease of Use: Focus on high-level functionality without worrying about the underlying complexities.

✨ Features

  • Layout Detection: Identify the structure of documents with popular models and tools.
  • OCR: Extract text from images or scanned PDFs with support for multiple OCR engines.
  • Table Extraction: Parse and extract data from tables in documents.
  • Reading Order Detection: Determine the logical reading sequence of elements.
  • Custom Pipelines: Easily configure and extend pipelines to meet specific use cases.
  • Scalability: Built to handle large-scale document processing tasks.
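To make the reading order task concrete, here is a minimal sketch of one simple heuristic: group bounding boxes into rows by their top edge, then read each row left to right. This is an illustration only, not DocFusion's implementation; the `Box` type, the `reading_order` function, and the `row_tolerance` parameter are all hypothetical names.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) with the origin at the top-left.
Box = Tuple[float, float, float, float]


def reading_order(boxes: List[Box], row_tolerance: float = 10.0) -> List[Box]:
    """Order boxes top to bottom, then left to right within each row.

    A box joins the current row when its top edge is within
    `row_tolerance` of the top edge of the row's first box.
    """
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    # Flatten the rows, sorting each one by its left edge.
    return [box for row in rows for box in sorted(row, key=lambda b: b[0])]


# A full-width title followed by two columns on roughly the same line.
boxes = [(300, 102, 500, 120), (50, 100, 250, 120), (50, 20, 500, 60)]
ordered = reading_order(boxes)
print(ordered)
```

A heuristic like this breaks down on multi-column pages where a column should be read to the bottom before moving right, which is one reason model-based reading order detection exists.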

🔧 Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

To install DocFusion, run:

pip install docfusion-ai

🛠️ Getting Started

Here's a quick example to demonstrate the power of DocFusion:

from docfusion import DocFusion

# Initialize DocFusion
docfusion = DocFusion()

# Load a document
doc = docfusion.load_document("sample.pdf")
# Or load an image instead
# doc = docfusion.load_image("sample.png")

# Detect layout
layout = docfusion.detect_layout(doc)

# Perform OCR
text = docfusion.extract_text(doc)

# Extract tables
tables = docfusion.extract_tables(doc)

# Print results
print("Layout:", layout)
print("Text:", text)
print("Tables:", tables)

📚 Supported Models and Libraries

DocFusion integrates seamlessly with a variety of popular tools, including:

(will be updated soon)

🏗️ How It Works

DocFusion organizes document processing tasks into modular components. Each component corresponds to a specific task and offers:

  1. A Unified Interface: Consistent input and output formats.
  2. Model Independence: Switch between libraries or models effortlessly.
  3. Pipeline Flexibility: Combine components to create custom workflows.
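The three properties above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not DocFusion's actual internals: `Document`, `Component`, `FunctionComponent`, `Pipeline`, and the two stand-in backends are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Protocol


@dataclass
class Document:
    """Hypothetical container passed between components."""
    path: str
    results: Dict[str, Any] = field(default_factory=dict)


class Component(Protocol):
    """Hypothetical unified interface: each task takes and returns a Document."""
    name: str

    def run(self, doc: Document) -> Document: ...


@dataclass
class FunctionComponent:
    """Adapts any backend callable to the Component interface.

    Swapping the backend changes the model without changing the pipeline.
    """
    name: str
    backend: Callable[[Document], Any]

    def run(self, doc: Document) -> Document:
        doc.results[self.name] = self.backend(doc)
        return doc


class Pipeline:
    """Runs components in order, threading one Document through all of them."""

    def __init__(self, components: List[Component]) -> None:
        self.components = components

    def run(self, doc: Document) -> Document:
        for component in self.components:
            doc = component.run(doc)
        return doc


# Stand-in backends; a real setup would call a layout model or an OCR engine.
def fake_layout(doc: Document) -> List[Dict[str, Any]]:
    return [{"type": "title", "bbox": (0, 0, 100, 20)}]


def fake_ocr(doc: Document) -> str:
    regions = doc.results.get("layout", [])
    return f"text from {len(regions)} region(s)"


pipeline = Pipeline([
    FunctionComponent("layout", fake_layout),
    FunctionComponent("ocr", fake_ocr),
])
result = pipeline.run(Document(path="sample.pdf"))
print(result.results["ocr"])
```

Because every component reads from and writes to the same `Document`, a later step such as OCR can use the output of an earlier one such as layout detection, and replacing `fake_ocr` with another callable leaves the rest of the pipeline untouched.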

📈 Roadmap

  • Add support for semantic understanding tasks (e.g., entity extraction).
  • Integrate pre-trained transformer models for context-aware document analysis.
  • Expand pipelines for multilingual document processing.
  • Add CLI support for batch processing.

🤝 Contributing

We welcome contributions to DocFusion! Here's how you can help:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Commit your changes and open a pull request.

For more details, refer to our CONTRIBUTING.md.

🛡️ License

This project is licensed under multiple licenses, depending on the models and libraries you use in your pipeline. Please refer to the individual licenses of each component for specific terms and conditions.

🌟 Support the Project

If you find DocFusion helpful, please give us a ⭐ on GitHub and share it with others in the community.

🗨️ Join the Community

For discussions, questions, or feedback:

