A standardized interface for data providers with sync and async support
Data Retrieval Module
A standardized interface for data providers with both synchronous and asynchronous support. This module provides abstract base classes that enable consistent data retrieval patterns across different data sources (APIs, databases, files, etc.).
Features
- 🔄 Dual API Support: Both sync and async interfaces
- 🏗️ Abstract Base Classes: Standardized patterns for data providers
- 🔌 Connection Management: Built-in connection handling with context managers
- 🔄 Retry Logic: Automatic retry with configurable parameters
- 📊 Pagination Support: Standardized pagination with QueryResult
- 🎣 Hook Methods: Customizable validation and transformation
- 🧪 Type Safety: Full type hints and generic support
- ✅ Well Tested: Comprehensive unit test coverage
Installation
Basic Installation
pip install data-retrieval-module
With Async Support
pip install "data-retrieval-module[async]"
Development Installation
pip install "data-retrieval-module[dev]"
All Features
pip install "data-retrieval-module[all]"
Quick Start
Synchronous Data Provider
from data_retrieval import DataProvider, QueryResult
from data_retrieval.model import ProviderStatus

# User and Database are not defined here; they stand for your own model and client classes.
class UserProvider(DataProvider[User]):
    def _connect(self) -> None:
        self._db = Database.connect(...)

    def _disconnect(self) -> None:
        self._db.close()

    def fetch(self, *args, **kwargs) -> QueryResult[User]:
        filters = kwargs.get("filters", {})
        users = self._db.users.find(filters)
        return QueryResult(
            data=users,
            total_count=len(users),
            metadata={"source": "database"},
        )

# Usage
provider = UserProvider()
with provider.connection(host="localhost", port=5432):
    result = provider.fetch(filters={"active": True})
    for user in result.data:
        print(user.name)
Asynchronous Data Provider
import asyncio

from data_retrieval import AsyncDataProvider, QueryResult

# User and Database are not defined here; they stand for your own model and async client classes.
class AsyncUserProvider(AsyncDataProvider[User]):
    async def _connect(self) -> None:
        self._db = await Database.connect(...)

    async def _disconnect(self) -> None:
        await self._db.close()

    async def fetch(self, *args, **kwargs) -> QueryResult[User]:
        filters = kwargs.get("filters", {})
        users = await self._db.users.find(filters)
        return QueryResult(
            data=users,
            total_count=len(users),
            metadata={"source": "database"},
        )

# Usage
async def main():
    provider = AsyncUserProvider()
    async with provider.async_connection(host="localhost", port=5432) as p:
        result = await p.fetch(filters={"active": True})
        for user in result.data:
            print(user.name)

asyncio.run(main())
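One reason to choose the async variant is that several fetches can run concurrently. The sketch below illustrates the idea with the standard library only. It does not use data_retrieval: `FAKE_TABLE` and the standalone `fetch` coroutine are hypothetical stand-ins for a real provider and its backing store.

```python
import asyncio
from typing import Any, Dict, List

# Hypothetical in-memory data standing in for a real database.
FAKE_TABLE: List[Dict[str, Any]] = [
    {"name": "Ada", "active": True},
    {"name": "Bob", "active": False},
    {"name": "Cy", "active": True},
]


async def fetch(filters: Dict[str, Any]) -> List[Dict[str, Any]]:
    # asyncio.sleep(0) yields control to the event loop, standing in for network I/O.
    await asyncio.sleep(0)
    return [
        row for row in FAKE_TABLE
        if all(row.get(key) == value for key, value in filters.items())
    ]


async def main() -> List[List[Dict[str, Any]]]:
    # gather() schedules both fetches concurrently and preserves argument order.
    return await asyncio.gather(
        fetch({"active": True}),
        fetch({"active": False}),
    )


active, inactive = asyncio.run(main())
```

With a real AsyncDataProvider subclass, the same pattern would presumably apply to its `fetch` coroutine, provided the underlying client tolerates concurrent queries on one connection.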
Core Classes
DataProvider (Synchronous)
Abstract base class for synchronous data providers.
Key Methods:
- connect(**config) - Establish connection
- disconnect() - Close connection
- fetch(*args, **kwargs) - Retrieve data
- fetch_or_raise(*args, **kwargs) - Fetch with error handling
- with_retry(operation, max_retries, retry_delay) - Retry logic
Hook Methods:
- validate(data) - Validate data
- transform(data) - Transform data
- health_check() - Health status
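The Quick Start has subclasses implement `_connect()` and `_disconnect()`, while callers use the public connection methods. That split resembles the template-method pattern, which can be sketched as follows. This is a minimal illustration and not the library's implementation: `SketchProvider` and `ListProvider` are hypothetical names, and a `health_check()` that returns a bool is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SketchProvider(ABC):
    """Hypothetical stand-in base class; NOT the library's DataProvider."""

    def __init__(self) -> None:
        self.config: Dict[str, Any] = {}
        self.connected = False

    def connect(self, **config: Any) -> None:
        # Public entry point: store the config, then delegate to the subclass hook.
        self.config = config
        self._connect()
        self.connected = True

    def disconnect(self) -> None:
        self._disconnect()
        self.connected = False

    def health_check(self) -> bool:
        # Default hook (assumed shape); subclasses may override with a real probe.
        return self.connected

    @abstractmethod
    def _connect(self) -> None: ...

    @abstractmethod
    def _disconnect(self) -> None: ...

    @abstractmethod
    def fetch(self, *args: Any, **kwargs: Any) -> List[Any]: ...


class ListProvider(SketchProvider):
    """Concrete provider backed by a plain list."""

    def _connect(self) -> None:
        self._rows = [1, 2, 3]

    def _disconnect(self) -> None:
        self._rows = []

    def fetch(self, *args: Any, **kwargs: Any) -> List[Any]:
        return list(self._rows)


provider = ListProvider()
provider.connect(host="localhost")
rows = provider.fetch()
healthy_while_connected = provider.health_check()
provider.disconnect()
```

Because the hooks are abstract, instantiating `SketchProvider` directly raises `TypeError`, which catches incomplete subclasses early.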
AsyncDataProvider (Asynchronous)
Abstract base class for asynchronous data providers.
Key Methods:
- async connect(**config) - Establish connection
- async disconnect() - Close connection
- async fetch(*args, **kwargs) - Retrieve data
- async fetch_or_raise(*args, **kwargs) - Fetch with error handling
- async with_retry(operation, max_retries, retry_delay) - Retry logic
QueryResult
Standardized container for query results.
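from dataclasses import dataclass
from typing import Any, Dict, List

# The class QueryResult[T] syntax requires Python 3.12 or later.
@dataclass
class QueryResult[T]:
    data: List[T]
    total_count: int
    metadata: Dict[str, Any]

    def is_empty(self) -> bool:
        return self.total_count == 0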
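The feature list mentions pagination through QueryResult. A minimal sketch of how a page of results might be wrapped is shown here. It uses a local stand-in for QueryResult written with `Generic[T]` so that it also runs on Python versions before 3.12. The `paginate` helper and the `page` and `page_size` metadata keys are hypothetical and not part of the library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Generic, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class QueryResult(Generic[T]):
    """Local stand-in mirroring the fields shown above."""

    data: List[T]
    total_count: int
    metadata: Dict[str, Any] = field(default_factory=dict)

    def is_empty(self) -> bool:
        return self.total_count == 0


def paginate(items: Sequence[T], page: int, page_size: int) -> QueryResult[T]:
    """Return one 1-indexed page of items (hypothetical helper)."""
    start = (page - 1) * page_size
    return QueryResult(
        data=list(items[start:start + page_size]),
        # Here total_count is the size of the full result set, not of this page.
        total_count=len(items),
        metadata={"page": page, "page_size": page_size},
    )


first = paginate(list(range(25)), page=1, page_size=10)
last = paginate(list(range(25)), page=3, page_size=10)
```

In this sketch `total_count` counts all matching items so a caller can compute the number of pages. Whether the library intends `total_count` to mean that or the length of `data` is not stated in this document.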
Advanced Usage
Custom Validation
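class ValidatedProvider(DataProvider[User]):
    def validate(self, data: User) -> bool:
        # Custom validation logic; bool() ensures a real boolean is returned
        # even when data.email is None or an empty string.
        return bool(data.email and "@" in data.email)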
Data Transformation
class TransformingProvider(DataProvider[User]):
    def transform(self, data: dict) -> User:
        # Convert raw data to User object
        return User(**data)
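To show how the two hooks might work together, here is a standalone sketch that transforms raw rows into objects and then keeps only the valid ones. The order (transform first, then validate) and the `load_users` helper are assumptions made for illustration. This document does not specify when or in which order the library invokes the hooks.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class User:
    """Hypothetical model used only for this sketch."""

    name: str
    email: Optional[str] = None


def transform(raw: Dict[str, Any]) -> User:
    # Convert a raw mapping into a User object.
    return User(**raw)


def validate(user: User) -> bool:
    # Same rule as the validation example: an email containing "@".
    return bool(user.email and "@" in user.email)


def load_users(rows: List[Dict[str, Any]]) -> List[User]:
    # Assumed pipeline: transform every row, then drop invalid results.
    users = [transform(row) for row in rows]
    return [user for user in users if validate(user)]


rows = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Bob", "email": None},
    {"name": "Eve", "email": "eve-at-example"},
]
valid_users = load_users(rows)
```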
Retry Logic
provider = MyProvider()
# Retry with custom parameters
result = provider.with_retry(
    operation=lambda: provider.fetch(filters={"id": "123"}),
    max_retries=5,
    retry_delay=2.0,
    parameters={}
)
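For readers curious how a retry helper of this shape can work, the following is a minimal standalone sketch. It is not the library's `with_retry`. It assumes that `max_retries` counts retries after the first attempt, that every exception is retryable, and that the delay between attempts is fixed. The library may differ on each of these points.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retry(
    operation: Callable[[], T],
    max_retries: int = 3,
    retry_delay: float = 1.0,
) -> T:
    """Call operation, retrying up to max_retries times on any exception."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    last_error: Optional[BaseException] = None
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
            # Sleep only if another attempt will follow.
            if attempt < max_retries:
                time.sleep(retry_delay)
    assert last_error is not None
    raise last_error


# Demo: an operation that fails twice before succeeding.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


result = with_retry(flaky, max_retries=5, retry_delay=0.0)
```

A production helper would usually restrict which exceptions are retried and might add exponential backoff. Both are left out here to keep the sketch short.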
Context Managers
# Automatic connection management
with provider.connection(host="localhost") as p:
    data = p.fetch()

# Async version (must run inside an async function)
async with provider.async_connection(host="localhost") as p:
    data = await p.fetch()
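The value of the context manager is that the connection is closed even when the body raises. A minimal sketch of that behavior, built with `contextlib.contextmanager`, is shown here. `InMemoryProvider` is a hypothetical stand-in and the library's own `connection()` may be implemented differently.

```python
from contextlib import contextmanager
from typing import Any, Iterator, List


class InMemoryProvider:
    """Hypothetical provider that records connection events."""

    def __init__(self) -> None:
        self.connected = False
        self.events: List[str] = []

    def connect(self, **config: Any) -> None:
        self.connected = True
        self.events.append("connect")

    def disconnect(self) -> None:
        self.connected = False
        self.events.append("disconnect")

    @contextmanager
    def connection(self, **config: Any) -> Iterator["InMemoryProvider"]:
        self.connect(**config)
        try:
            yield self
        finally:
            # Runs on normal exit and when the with-body raises.
            self.disconnect()


provider = InMemoryProvider()
try:
    with provider.connection(host="localhost") as p:
        was_connected = p.connected
        raise RuntimeError("query failed")
except RuntimeError:
    pass
```

After the `with` block exits through the exception, `provider.connected` is `False` again, which is the guarantee that manual `connect()` and `disconnect()` calls do not give without an explicit `try`/`finally`.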
Error Handling
The module provides specific exception types:
# Note: importing ConnectionError here shadows Python's built-in ConnectionError in this module.
from data_retrieval.model.exceptions import (
    DataProviderError,
    ConnectionError,
    QueryError,
    ValidationError,
)

try:
    result = provider.fetch(filters={"invalid": "field"})
except ConnectionError as e:
    print(f"Connection failed: {e}")
except QueryError as e:
    print(f"Query failed: {e}")
except DataProviderError as e:
    print(f"General error: {e}")
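The order of the `except` clauses above suggests that DataProviderError is a common base class, because the most general handler comes last. The sketch below illustrates why that order matters, using a hypothetical hierarchy defined locally. The subclass relationships are an assumption, and `ProviderConnectionError` is a made-up name chosen to avoid shadowing the built-in `ConnectionError`.

```python
class DataProviderError(Exception):
    """Assumed base class for all provider errors."""


class ProviderConnectionError(DataProviderError):
    """Hypothetical stand-in for the library's ConnectionError."""


class QueryError(DataProviderError):
    """Raised when a query cannot be executed."""


def describe(exc: Exception) -> str:
    """Map an exception to a message, most specific handler first."""
    try:
        raise exc
    except ProviderConnectionError as e:
        return f"Connection failed: {e}"
    except QueryError as e:
        return f"Query failed: {e}"
    except DataProviderError as e:
        # Catches any provider error not handled above.
        return f"General error: {e}"


messages = [
    describe(ProviderConnectionError("timeout")),
    describe(QueryError("unknown field")),
    describe(DataProviderError("unexpected")),
]
```

If the `DataProviderError` handler came first, it would also catch the two subclasses and the specific messages would never be produced.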
Development
Setup Development Environment
# Clone repository
git clone https://github.com/AbigailWilliams1692/data-retrieval-module.git
cd data-retrieval-module
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
pip install -e ".[dev]"
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=data_retrieval --cov-report=html
# Run specific test file
pytest tests/test_data_provider.py
Code Quality
# Format code
black data_retrieval/ tests/
# Sort imports
isort data_retrieval/ tests/
# Type checking
mypy data_retrieval/
# Linting
flake8 data_retrieval/ tests/
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
See CHANGELOG.md for a list of changes and version history.
Related Projects
- SQLAlchemy - SQL toolkit and ORM
- Django ORM - Django's database ORM
- Tortoise ORM - Async ORM for Python
Made with ❤️ by AbigailWilliams1692
File details
Details for the file data_retrieval_module-1.0.1.tar.gz.
File metadata
- Download URL: data_retrieval_module-1.0.1.tar.gz
- Upload date:
- Size: 19.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5111b972193eb11f3ad7138e0c38bc860e3ae2aa68fdfb48ad679bd80bebf8d2 |
| MD5 | 76772e831a58cac895f3bc2f2bcf746a |
| BLAKE2b-256 | 7ffb78f44303769e0d9028ac843438d3a1c8c07fb25bb57adfe4fc86f1a49e99 |
Provenance
The following attestation bundles were made for data_retrieval_module-1.0.1.tar.gz:
Publisher: publish.yml on AbigailWilliams1692/data_retrieval
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: data_retrieval_module-1.0.1.tar.gz
  - Subject digest: 5111b972193eb11f3ad7138e0c38bc860e3ae2aa68fdfb48ad679bd80bebf8d2
- Sigstore transparency entry: 852301780
- Sigstore integration time:
- Permalink: AbigailWilliams1692/data_retrieval@dba8a981130ef7cbeaead98e19d536fda84f4668
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/AbigailWilliams1692
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@dba8a981130ef7cbeaead98e19d536fda84f4668
- Trigger Event: release
File details
Details for the file data_retrieval_module-1.0.1-py3-none-any.whl.
File metadata
- Download URL: data_retrieval_module-1.0.1-py3-none-any.whl
- Upload date:
- Size: 19.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6916a8b587fb4bdb21c2c134d89f23bd8323510b3ab8f68080cdedc8bb604a3a |
| MD5 | 4963f55edc82833f437c8511e9572651 |
| BLAKE2b-256 | 6b3bd841a87476769ac629817e3bc939a7c9870e2009d96c82f4ffdae93b8059 |
Provenance
The following attestation bundles were made for data_retrieval_module-1.0.1-py3-none-any.whl:
Publisher: publish.yml on AbigailWilliams1692/data_retrieval
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: data_retrieval_module-1.0.1-py3-none-any.whl
  - Subject digest: 6916a8b587fb4bdb21c2c134d89f23bd8323510b3ab8f68080cdedc8bb604a3a
- Sigstore transparency entry: 852301831
- Sigstore integration time:
- Permalink: AbigailWilliams1692/data_retrieval@dba8a981130ef7cbeaead98e19d536fda84f4668
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/AbigailWilliams1692
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@dba8a981130ef7cbeaead98e19d536fda84f4668
- Trigger Event: release