Doc Master 📚

Doc Master is a lightweight Python library for automated file reading and content extraction. It reads a range of file formats into string representations for use in data processing, content analysis, and document management systems.

🚀 Features

  • Universal File Reading: Seamlessly handle multiple file formats including:

    • PDF documents
    • Microsoft Word documents (.docx)
    • Excel spreadsheets
    • Text files
    • XML documents
    • Images (with base64 encoding)
    • Binary files
  • Smart Format Detection: Automatic file type detection and appropriate processing

  • Flexible Output: Choose between string or dictionary output formats

  • Batch Processing: Process entire folders of documents efficiently

  • Encoding Detection: Smart encoding detection for text files

  • Enterprise-Ready: Built with stability and performance in mind
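
To make the format detection, encoding detection, and base64 bullets concrete, here is a minimal standard-library sketch of suffix-based dispatch. It is not Doc Master's implementation: the function names (read_as_string, decode_with_fallback), the suffix groups, and the list of candidate encodings are all assumptions made for illustration.

```python
import base64
from pathlib import Path

# Hypothetical suffix groups; the library's real mapping may differ.
TEXT_SUFFIXES = {".txt", ".md", ".csv", ".xml"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"}


def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each candidate encoding in order and return the first clean decode."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 accepts every byte, so this line is only reached when a
    # caller passes a custom list of encodings that all fail.
    return raw.decode("utf-8", errors="replace")


def read_as_string(path):
    """Return a string representation of a file, chosen by its suffix."""
    path = Path(path)
    raw = path.read_bytes()
    suffix = path.suffix.lower()
    if suffix in TEXT_SUFFIXES:
        return decode_with_fallback(raw)
    if suffix in IMAGE_SUFFIXES:
        # Images become base64 text, mirroring the feature list above.
        return base64.b64encode(raw).decode("ascii")
    # Any other format falls back to a hex dump of the raw bytes.
    return raw.hex()
```

Trying encodings in a fixed order is the simplest form of encoding detection. A statistical detector would guess better on ambiguous input, at the cost of an extra dependency.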

📦 Installation

pip install -U doc-master

🔧 Quick Start

from doc_master import doc_master

# Read all files in a folder
results = doc_master(folder_path="path/to/folder", output_type="dict")

# Or read a single file
content = doc_master(file_path="path/to/file.docx")

📋 Requirements

  • Python 3.8+
  • pandas
  • pypdf
  • python-docx
  • Pillow
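
If an import fails after installation, a quick check of the requirements above can narrow down the cause. The sketch below uses only the standard library. The helper name missing_requirements is hypothetical, while the import names are the ones these distributions install (python-docx provides the docx module, Pillow provides PIL).

```python
import importlib.util
import sys

# Distribution name -> importable module name.
REQUIRED_MODULES = {
    "pandas": "pandas",
    "pypdf": "pypdf",
    "python-docx": "docx",
    "Pillow": "PIL",
}


def missing_requirements(required=REQUIRED_MODULES, min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means none."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d+ is required" % min_python)
    for distribution, module in required.items():
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(module) is None:
            problems.append(f"{distribution} is not installed")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```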

🤝 Contributing

We love your input! We want to make contributing to Doc Master as easy and transparent as possible. Here's how you can help:

  1. Fork the repo
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Check out our Contributing Guidelines for more details.

🌟 Support the Project

If you find Doc Master useful, please consider:

  • Starring the repository ⭐
  • Following us on GitHub
  • Joining our Discord community
  • Sharing the project with others

📖 Documentation

For detailed documentation, visit our Wiki.

Basic Usage Examples

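# Assumption: these names are importable from the top-level package
from doc_master import AutoFileReader, doc_master, read_single_file

# Read a PDF file
content = read_single_file("document.pdf")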

# Read an Excel file with specific sheet
reader = AutoFileReader()
content = reader.read_file("spreadsheet.xlsx", sheet_name="Data")

# Process a folder of documents
results = doc_master(
    folder_path="documents/",
    output_type="dict"
)
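
The exact shape of the string and dict outputs is not specified here. As a rough mental model only, the sketch below shows one way folder processing with two output types could work. The function read_folder, its reader parameter, the "string" option name, and the separator format are all hypothetical and are not part of Doc Master's API.

```python
from pathlib import Path


def read_folder(folder_path, output_type="string", reader=None):
    """Read every regular file in a folder, in sorted name order."""
    if output_type not in ("string", "dict"):
        raise ValueError("output_type must be 'string' or 'dict'")
    if reader is None:
        # Stand-in reader: decode as UTF-8, replacing undecodable bytes.
        def reader(path):
            return path.read_bytes().decode("utf-8", errors="replace")

    contents = {
        path.name: reader(path)
        for path in sorted(Path(folder_path).iterdir())
        if path.is_file()
    }
    if output_type == "dict":
        # One entry per file, keyed by file name.
        return contents
    # Otherwise join everything into a single string with a header per file.
    return "\n\n".join(f"--- {name} ---\n{text}" for name, text in contents.items())
```

A dict keeps per-file boundaries for downstream processing, while a single string is convenient when the whole folder is passed on as one blob of text.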

🔍 Error Handling

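Wrap calls in a try/except block to catch any exception raised while a file is being read:

# Assumption: read_single_file is importable from the top-level package
from doc_master import read_single_file

try:
    content = read_single_file("file.pdf")
except Exception as e:
    print(f"Error processing file: {e}")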
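
When reading many files, it is often useful to record failures per file instead of stopping at the first one. The sketch below shows that pattern with a hypothetical helper, read_many, which accepts any reader callable (for example read_single_file). It catches Exception broadly only because the specific exception types the library raises are not documented here.

```python
def read_many(paths, reader):
    """Apply `reader` to each path, collecting failures instead of stopping."""
    contents, errors = {}, {}
    for path in paths:
        try:
            contents[path] = reader(path)
        except Exception as exc:  # narrow this once the raised types are known
            errors[path] = f"{type(exc).__name__}: {exc}"
    return contents, errors
```

The caller gets two dictionaries back: successfully read content keyed by path, and an error message for each path that failed.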

🛣️ Roadmap

  • Add OCR capabilities for image processing
  • Support for additional file formats
  • Performance optimizations for large files
  • Async file processing
  • CLI interface

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • All our amazing contributors
  • The open-source community
  • The Swarm Corporation team

Made with ❤️ by The Swarm Corporation

