A lightweight Python library to read various data formats including PDF, images, YAML, and more.

# Sandyie Read 📚

[![PyPI version](https://img.shields.io/pypi/v/sandyie_read?color=blue)](https://pypi.org/project/sandyie-read/)
[![Downloads](https://img.shields.io/pypi/dm/sandyie_read)](https://pypi.org/project/sandyie-read/)
[![License](https://img.shields.io/github/license/sandyie/sandyie-read)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3.7%2B-blue.svg)](https://www.python.org/downloads/)

**Sandyie Read** is a lightweight Python library for reading and extracting data from a variety of file formats, including PDF, images (JPG, PNG), YAML, and more, with clean logging and custom exception handling.

---

## 🔧 Features

- ✅ Read and extract content from:
  - PDF (text-based and scanned with OCR)
  - Image files (JPG, PNG)
  - YAML files
  - Text files
  - CSV and Excel files (.xlsx, .xls)
- 🧠 OCR support for scanned documents using Tesseract
- 📋 Clean, human-readable logging
- 🛡️ Custom exception handling (via `SandyieException`)

---

## 📦 Installation

```bash
pip install sandyie_read
```

## 🚀 Quick Start

```python
from sandyie_read import read

data = read("example.pdf")
print(data)
```

## 📁 Supported File Types & Examples

### 1. 📄 PDF (Text-based or Scanned)

```python
data = read("sample.pdf")
print(data)
```

🟢 Returns: A string containing all extracted text. OCR is applied automatically to scanned PDFs.


### 2. 🖼️ Image Files (PNG, JPG)

```python
data = read("photo.jpg")
print(data)
```

🟢 Returns: A string of text extracted with OCR (via Tesseract).


### 3. ⚙️ YAML Files

```python
data = read("config.yaml")
print(data)
```

🟢 Returns: A dictionary representing the parsed YAML structure.
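
Because the parsed YAML is described as a plain dictionary, nested settings can be read with ordinary dictionary access. The helper below is a minimal sketch of looking up a dotted key path with a default value. `get_nested` and the sample `config` contents are hypothetical and are not part of `sandyie_read`.

```python
from typing import Any


def get_nested(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Walk a nested dict along a dotted path such as "database.port"."""
    current: Any = config
    for part in dotted_key.split("."):
        # Stop as soon as the path leaves the dict structure or a key is missing.
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


# Stand-in for the dictionary that read("config.yaml") is described as returning.
config = {"database": {"host": "localhost", "port": 5432}}

print(get_nested(config, "database.port"))
print(get_nested(config, "database.user", default="unknown"))
```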


### 4. 📄 Text Files (.txt)

```python
data = read("notes.txt")
print(data)
```

🟢 Returns: A string containing the full content of the file.


### 5. 📊 CSV Files

```python
data = read("data.csv")
print(data)
```

🟢 Returns: A `pandas.DataFrame` of structured tabular data.


### 6. 📈 Excel Files (.xlsx, .xls)

```python
data = read("report.xlsx")
print(data)
```

🟢 Returns: A `pandas.DataFrame` or a dictionary of DataFrames (if multiple sheets exist).
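
Since the result may be either a single table or a dictionary of tables, calling code may want to normalize it to one shape before processing. The following is a minimal sketch of that idea. `as_sheets` and the `"Sheet1"` placeholder name are hypothetical, and the dictionary is assumed (not confirmed here) to be keyed by sheet name. Plain lists stand in for DataFrames so the example runs without pandas.

```python
from typing import Any, Dict


def as_sheets(result: Any, default_name: str = "Sheet1") -> Dict[str, Any]:
    """Return a {name: table} mapping for either documented return shape."""
    if isinstance(result, dict):
        # Multiple sheets: assumed to be keyed by sheet name already.
        return dict(result)
    # Single sheet: wrap it under a placeholder name chosen by this sketch.
    return {default_name: result}


# Plain lists stand in for DataFrames so the sketch runs without pandas.
single = [["a", 1], ["b", 2]]
multi = {"Q1": [["a", 1]], "Q2": [["b", 2]]}

for name, table in as_sheets(multi).items():
    print(name, len(table))
```

With this shape, the same loop works whether the workbook has one sheet or several.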


## ⚠️ Error Handling

All exceptions are wrapped in a custom `SandyieException` class, providing clean and traceable messages.
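
The wrapping pattern described here can be sketched as follows. This is an illustration only: the import path for `SandyieException` is not documented above, so the snippet defines a local stand-in class, and `fake_read` and `read_or_none` are hypothetical helpers rather than library functions.

```python
class SandyieException(Exception):
    """Local stand-in; the real class ships with sandyie_read."""


def fake_read(path: str) -> str:
    """Hypothetical stand-in for read() that wraps low-level errors."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError as exc:
        # Chain the original error so the root cause stays in the traceback.
        raise SandyieException(f"Could not read {path!r}: {exc}") from exc


def read_or_none(path: str):
    """Return the file content, or None if the read fails."""
    try:
        return fake_read(path)
    except SandyieException as err:
        print(f"Read failed: {err}")
        return None


result = read_or_none("no_such_dir/does_not_exist.txt")
```

With the real package, the same `try`/`except` shape would presumably apply around `read(...)`, with `SandyieException` imported from wherever the package exposes it.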


## 🧪 Logging

Logs show:

- File type detection
- Successful/failed read attempts
- Detailed file handling info
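
If the library emits these messages through Python's standard `logging` module (an assumption, since the logging backend is not stated here), they can be made visible by configuring logging before reading a file. The logger name `"sandyie_read"` below is a guess used only for illustration.

```python
import logging

# Assumption: the library emits records through the standard logging module.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

# "sandyie_read" as a logger name is a guess used only for illustration.
logger = logging.getLogger("sandyie_read")
logger.setLevel(logging.DEBUG)

# Example record, shaped like the "file type detection" messages listed above.
logger.info("Detected file type: %s", "pdf")
```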

## 📚 Auto-Generated Docs

Coming soon at https://sandyie.in/docs. The docs will include:

- API Reference
- Exception documentation
- Usage notebooks

## 🤝 Contribution

Found a bug or want a new feature? Feel free to create an issue or submit a PR.

## 📄 License

This project is licensed under the MIT License – see the LICENSE file for details.

## 📬 Author

Sanju (aka Sandyie)

- 📧 Email: dksanjay39@gmail.com
- 🔗 Portfolio: https://sandyie.in
- 🐍 PyPI: https://pypi.org/project/sandyie-read

