A lightweight Python library to read various data formats including PDF, images, YAML, and more.

📚 Sandyie Read

Read files of many types (PDFs, images, YAML, CSV, Excel, and more) through a single read function, with built-in logging and custom exceptions.


⚠️ Python Compatibility

🐍 Requires Python 3.7 or higher
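
A script that depends on this package could check the interpreter version before importing it. The following is a minimal sketch using only the standard library; the helper name `meets_minimum` is made up for this illustration and is not part of sandyie_read.

```python
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in this README


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if not meets_minimum():
    print("This interpreter is older than Python 3.7; sandyie_read may not install.")
```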


🔧 Features

  • ✅ Read and extract content from:
    • Documents → PDF, DOCX, TXT, HTML, Markdown
    • Data files → CSV, TSV, Excel, Parquet
    • Serialized files → Pickle, Models
    • Configs → JSON, YAML, JS
    • Archives → ZIP
    • Images → JPG, PNG, SVG (OCR-enabled)
  • 🧠 OCR support via Tesseract
  • 📋 Human-friendly logging
  • 🛡️ Consistent error handling with SandyieException
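
To illustrate the idea of one entry point for many formats, here is a rough sketch of extension-based dispatch. It is not the library's implementation: `read_any`, `READERS`, and the private helpers are hypothetical names, and only a few stdlib-readable formats are covered.

```python
import csv
import json
from pathlib import Path


# Hypothetical per-format readers; the real library's internals may differ.
def _read_text(path):
    return Path(path).read_text(encoding="utf-8")


def _read_json(path):
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def _read_csv(path):
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


# Map lowercase file extensions to the function that handles them.
READERS = {
    ".txt": _read_text,
    ".md": _read_text,
    ".json": _read_json,
    ".csv": _read_csv,
}


def read_any(path):
    """Pick a reader from the file extension and return its result."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix!r}") from None
    return reader(path)
```

Keeping the mapping in a dictionary means a new format only needs one reader function and one entry in `READERS`.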

📦 Installation

# Upgrade pip and setuptools first
python -m pip install --upgrade pip setuptools

# Clear old caches (optional)
pip cache purge

# Install sandyie_read
pip install sandyie_read

🚀 Quick Start

from sandyie_read import read

# Example: Reading a PDF
data = read("example.pdf", pages=[0])
print(data)

📁 Supported File Types & Examples

1. 📄 Pickle / Model Files

from sandyie_read import read

data = read("model.pkl")
print(data)

🟢 Returns: A Python object / model container.
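
To try this without a trained model at hand, a sample pickle file can be produced with the standard library. The sketch below writes a small dictionary as a stand-in for a model and loads it back with `pickle.load`; whether sandyie_read uses `pickle.load` internally is an assumption. Unpickling can execute arbitrary code, so only load pickle files from sources you trust.

```python
import pickle
import tempfile
from pathlib import Path

# A plain dictionary standing in for a trained model object.
sample = {"name": "demo-model", "weights": [0.1, 0.2, 0.3]}

# Write the sample to a temporary model.pkl file.
path = Path(tempfile.mkdtemp()) / "model.pkl"
with open(path, "wb") as fh:
    pickle.dump(sample, fh)

# Load it back the way a pickle reader would.
with open(path, "rb") as fh:
    restored = pickle.load(fh)

print(restored == sample)
```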


2. 🖼️ Images (PNG, JPG, SVG)

from sandyie_read import read

data = read("photo.jpg")
print(data)

🟢 Returns: OCR-extracted text as a string or NumPy array.


3. 📊 Parquet

from sandyie_read import read

data = read("data.parquet")
print(data)

🟢 Returns: pandas.DataFrame.


4. 📊 CSV / Excel

from sandyie_read import read

data = read("data.csv")
print(data)

🟢 Returns: pandas.DataFrame.


⚠️ Error Handling

All exceptions are wrapped in a custom SandyieException, making debugging simple and consistent.
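
The import path of SandyieException is not shown in this README, so the sketch below uses a stand-in class, `ReaderError`, to illustrate the wrapping pattern described above: low-level errors are re-raised as one exception type, with the original error kept as `__cause__`. The function name `read_text_file` is also hypothetical.

```python
class ReaderError(Exception):
    """Stand-in for a library-wide exception such as SandyieException."""


def read_text_file(path):
    """Read a UTF-8 text file, wrapping low-level failures in ReaderError."""
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    except (OSError, UnicodeDecodeError) as exc:
        # Chain the original exception so the root cause stays visible.
        raise ReaderError(f"Could not read {path!r}: {exc}") from exc


try:
    read_text_file("does-not-exist.txt")
except ReaderError as err:
    print(f"read failed: {err}")
    print(f"original error type: {type(err.__cause__).__name__}")
```

With this pattern, calling code needs only one `except` clause regardless of which file format failed.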


🧪 Logging

Logs include:

  • File type detection
  • Success/failure reports
  • Processing details
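
As an illustration of the three kinds of messages listed above, here is a small sketch built on Python's standard `logging` module. Whether sandyie_read uses `logging` internally, and under which logger name, is not stated here; `reader_demo` and `describe_read` are hypothetical names.

```python
import logging
from pathlib import Path

logger = logging.getLogger("reader_demo")  # hypothetical logger name


def describe_read(path):
    """Log file type detection, processing details, and success or failure."""
    suffix = Path(path).suffix.lower() or "<none>"
    logger.info("Detected file type %s for %s", suffix, path)
    try:
        size = Path(path).stat().st_size
    except OSError as exc:
        logger.error("Failed to read %s: %s", path, exc)
        return False
    logger.debug("File %s is %d bytes", path, size)
    logger.info("Successfully read %s", path)
    return True


# Show DEBUG and above on the console with a compact format.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
describe_read("missing.txt")
```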

📚 Documentation

📖 Full documentation (with API reference and usage notebooks) will be available soon at 👉 sandyie.in/docs


🗺️ Roadmap

  • Cloud Storage Support: Read files directly from S3, Azure Blob, and Google Cloud Storage.
  • Streaming Files: Process large files without loading the entire content into memory.
  • Improved Performance: Optimize parsing for various file formats.
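
Until streaming support lands, large files can be processed in pieces with plain Python. The generator below is a generic sketch of chunked reading, not a preview of the planned API; `iter_chunks` is a hypothetical name.

```python
def iter_chunks(path, chunk_size=64 * 1024):
    """Yield successive byte chunks so the whole file never sits in memory."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of file
                break
            yield chunk


def count_bytes(path):
    """Example consumer: total size computed one chunk at a time."""
    return sum(len(chunk) for chunk in iter_chunks(path))
```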

🤝 Contributing

Got an idea or found a bug?

  • Open an Issue
  • Or submit a Pull Request 🚀

📄 License

Licensed under the MIT License. See LICENSE for details.


👤 Author

Sanju (aka Sandyie)

Download files

Source Distribution

sandyie_read-1.2.1.tar.gz (13.3 kB)

Built Distribution

sandyie_read-1.2.1-py3-none-any.whl (18.5 kB)

File details

Details for the file sandyie_read-1.2.1.tar.gz.

File metadata

  • Download URL: sandyie_read-1.2.1.tar.gz
  • Size: 13.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for sandyie_read-1.2.1.tar.gz
Algorithm Hash digest
SHA256 83eda23496ab5ee7a02f3d2667de0774edf9d307b3d42118347a5433baf9bdb9
MD5 3a7b7382ca36057a368ab71f81ee3c43
BLAKE2b-256 e6bb241f3fc4931c1e11e313d226cb0c91bcc58ff80a4891a23acc19955a7a2c
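
A downloaded file can be checked against the SHA256 digest listed above with the standard `hashlib` module. This is a minimal sketch; the helper name `sha256_of` is made up for the example, and the usage lines assume the source distribution was saved to the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest published for sandyie_read-1.2.1.tar.gz
EXPECTED = "83eda23496ab5ee7a02f3d2667de0774edf9d307b3d42118347a5433baf9bdb9"

# Usage, assuming the file sits in the current directory:
# if sha256_of("sandyie_read-1.2.1.tar.gz") != EXPECTED:
#     raise SystemExit("hash mismatch: do not install this file")
```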

File details

Details for the file sandyie_read-1.2.1-py3-none-any.whl.

File metadata

  • Download URL: sandyie_read-1.2.1-py3-none-any.whl
  • Size: 18.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for sandyie_read-1.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 437259bef5fa87b7191efaf6054ba0b9f612652d7839c5d19e5fb638aaef833b
MD5 07a689310f24ac177b97f9dad6aa273b
BLAKE2b-256 57f4dca67b63744b33076d0a54e9955d3c4624a62fcfb6cb16b136b09c7d6805
