A lightweight Python library to read various data formats including PDF, images, YAML, and more.
📚 Sandyie Read
Effortlessly read files of almost any type — PDFs, images, YAML, CSV, Excel, and more — with built-in logging and custom exceptions.
⚠️ Python Compatibility
🐍 Requires Python 3.7 or higher
🔧 Features
- ✅ Read and extract content from:
  - Documents → PDF, DOCX, TXT, HTML, Markdown
  - Data files → CSV, TSV, Excel, Parquet
  - Serialized files → Pickle, Models
  - Configs → JSON, YAML, JS
  - Archives → ZIP
  - Images → JPG, PNG, SVG (OCR-enabled)
- 🧠 OCR support via Tesseract
- 📋 Human-friendly logging
- 🛡️ Consistent error handling with SandyieException
📦 Installation
# Upgrade pip and setuptools first
python -m pip install --upgrade pip setuptools
# Clear old caches (optional)
pip cache purge
# Install sandyie_read
pip install sandyie_read
🚀 Quick Start
from sandyie_read import read
# Example: Reading a PDF
data = read("example.pdf", pages=[0])
print(data)
📁 Supported File Types & Examples
1. 📄 Pickle / Model Files
data = read("model.pkl")
print(data)
🟢 Returns: A Python object / model container.
2. 🖼️ Images (PNG, JPG, SVG)
data = read("photo.jpg")
print(data)
🟢 Returns: OCR-extracted text as a string or NumPy array.
3. 📊 Parquet
data = read("data.parquet")
print(data)
🟢 Returns: pandas.DataFrame.
4. 📊 CSV / Excel
data = read("data.csv")
print(data)
🟢 Returns: pandas.DataFrame.
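The examples above all go through a single read call. One common way for such an entry point to choose a parser is to dispatch on the file extension. The sketch below illustrates that idea only; it is not sandyie_read's implementation. `read_by_extension`, `READERS`, and the helper functions are hypothetical names, and because the sketch uses only the standard library, CSV rows come back as dictionaries rather than a pandas.DataFrame.

```python
import csv
import json
from pathlib import Path


def _read_json(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def _read_csv(path):
    # newline="" is how the csv module expects files to be opened.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def _read_text(path):
    return Path(path).read_text(encoding="utf-8")


# Hypothetical mapping from file extension to reader function.
READERS = {
    ".json": _read_json,
    ".csv": _read_csv,
    ".txt": _read_text,
    ".md": _read_text,
}


def read_by_extension(path):
    """Pick a reader based on the lowercase file extension."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
    return reader(path)
```

Adding a format in this design means writing one small function and registering it in the mapping, which keeps the public entry point unchanged.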
⚠️ Error Handling
All exceptions are wrapped in a custom SandyieException, making debugging simple and consistent.
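The wrapping pattern described above can be sketched in plain Python. This is a minimal illustration, not the library's code: `ReadError`, `read_text`, and `safe_read` are hypothetical stand-ins for SandyieException and read, whose exact import paths and signatures are not shown in this README.

```python
class ReadError(Exception):
    """Hypothetical stand-in for a library-wide exception such as SandyieException."""


def read_text(path):
    """Read a text file, re-raising any low-level failure as ReadError."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except (OSError, UnicodeDecodeError) as exc:
        # Chain the original exception so its traceback stays available.
        raise ReadError(f"Could not read {path!r}: {exc}") from exc


def safe_read(path):
    """Callers only need to catch one exception type."""
    try:
        return read_text(path)
    except ReadError as err:
        print(f"Read failed: {err}")
        return None
```

With this shape, calling code handles a single exception class regardless of which underlying parser failed, and the original error remains reachable through the exception's `__cause__` attribute.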
🧪 Logging
Logs include:
- File type detection
- Success/failure reports
- Processing details
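To make these three kinds of log entries concrete, here is a minimal sketch using Python's standard logging module. Whether sandyie_read uses logging internally, and under which logger name, is an assumption; `demo_reader` and `read_json_with_logs` are hypothetical names chosen for illustration.

```python
import json
import logging
import os

logger = logging.getLogger("demo_reader")  # hypothetical logger name


def read_json_with_logs(path):
    """Load a JSON file, logging detection, details, and the outcome."""
    extension = os.path.splitext(path)[1].lower()
    logger.info("Detected file type: %s", extension or "(none)")
    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError is a subclass of ValueError.
        logger.error("Failed to read %s: %s", path, exc)
        raise
    logger.debug("Top-level JSON type: %s", type(data).__name__)
    logger.info("Successfully read %s", path)
    return data


if __name__ == "__main__":
    # Show every level, including the debug-level processing details.
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(levelname)s %(name)s: %(message)s",
    )
```

Raising the level passed to `logging.basicConfig` to `logging.INFO` would hide the processing details while keeping the detection and success/failure messages.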
📚 Documentation
📖 Full documentation (with API reference and usage notebooks) will be available soon at 👉 sandyie.in/docs
🗺️ Roadmap
- Cloud Storage Support: Read files directly from S3, Azure Blob, and Google Cloud Storage.
- Streaming Files: Process large files without loading the entire content into memory.
- Improved Performance: Optimize parsing for various file formats.
🤝 Contributing
Got an idea or found a bug?
- Open an Issue
- Or submit a Pull Request 🚀
📄 License
Licensed under the MIT License. See LICENSE for details.
👤 Author
Sanju (aka Sandyie)
- 🌐 Website: www.sandyie.in
- 📧 Email: business@sandyie.in
- 🐍 PyPI: sandyie-read
Download files
File details
Details for the file sandyie_read-1.2.1.tar.gz.
File metadata
- Download URL: sandyie_read-1.2.1.tar.gz
- Upload date:
- Size: 13.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 83eda23496ab5ee7a02f3d2667de0774edf9d307b3d42118347a5433baf9bdb9 |
| MD5 | 3a7b7382ca36057a368ab71f81ee3c43 |
| BLAKE2b-256 | e6bb241f3fc4931c1e11e313d226cb0c91bcc58ff80a4891a23acc19955a7a2c |
File details
Details for the file sandyie_read-1.2.1-py3-none-any.whl.
File metadata
- Download URL: sandyie_read-1.2.1-py3-none-any.whl
- Upload date:
- Size: 18.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 437259bef5fa87b7191efaf6054ba0b9f612652d7839c5d19e5fb638aaef833b |
| MD5 | 07a689310f24ac177b97f9dad6aa273b |
| BLAKE2b-256 | 57f4dca67b63744b33076d0a54e9955d3c4624a62fcfb6cb16b136b09c7d6805 |