E-commerce data extraction and processing platform with AI-powered enrichment
Zerve Data Platform
An enterprise-grade ETL and data processing platform for automated e-commerce data extraction, AI-powered enrichment, and pipeline orchestration.
Features
- Multi-stage Pipeline Framework - Orchestrate complex ETL workflows with checkpointing and progress tracking
- Web Scraping Automation - Selenium-based browser automation for e-commerce sites
- AI-Powered Data Enrichment - Multiple LLM provider support (OpenAI, Google Gemini, Ollama, HuggingFace)
- Cloud Integration - AWS S3 and Spark data lake support
- Database Connectors - PostgreSQL and Spark SQL with auto-schema generation
- Distributed Processing - Apache Spark for big data ETL workflows
Installation
Development Installation
```bash
# Clone the repository
git clone https://github.com/zerveme/zervemedata.git
cd zervedataplatform

# Install in editable mode with development dependencies
pip install -e ".[dev]"
```
Production Installation
```bash
pip install zervedataplatform
```
Quick Start
```python
# Import the package
from pipeline import DataPipeline, DataConnectorBase
from connectors.ai import GenAIManager
from connectors.sql_connectors import PostgresSqlConnector
from connectors.cloud_storage_connectors import S3CloudConnector
from utils import Utility

# Configure your pipeline
config = Utility.read_in_json_file("config.json")

# Create AI connector
ai_manager = GenAIManager(config["ai_config"])

# Create database connector
db = PostgresSqlConnector(config["db_config"])

# Create and run pipeline
pipeline = DataPipeline()
# ... add your jobs
pipeline.run_data_pipeline()
```
Architecture
```text
zervedataplatform/
├── abstractions/                  # Abstract base classes and interfaces
├── connectors/                    # Database, cloud, and AI connectors
│   ├── ai/                        # OpenAI, Gemini, LangChain, Google Vision
│   ├── sql_connectors/            # PostgreSQL, Spark SQL
│   └── cloud_storage_connectors/  # S3, Spark Cloud
├── pipeline/                      # Pipeline orchestration framework
├── model_transforms/              # Database models and schemas
├── utils/                         # Utilities and helpers
└── test/                          # Unit tests
```
Key Components
Pipeline Framework
- 5-Stage Execution: `initialize → pre_validate → read → main → output`
- Activity Logging: JSON-based progress tracking with a hierarchical structure
- Checkpoint/Resume: resume long-running pipelines from failure points
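The function and file names below are illustrative, not the actual DataPipeline API. A minimal sketch of the checkpoint/resume idea: each completed stage is recorded in a JSON activity log, so a rerun skips work that already finished before a failure.

```python
import json
import os

STAGES = ["initialize", "pre_validate", "read", "main", "output"]

def run_with_checkpoints(job, log_path):
    """Run each stage once, skipping stages already marked done in the JSON log."""
    done = []
    if os.path.exists(log_path):
        with open(log_path) as f:
            done = json.load(f)["completed_stages"]
    for stage in STAGES:
        if stage in done:
            continue  # finished in a previous run; skip on resume
        getattr(job, stage)()  # e.g. job.read()
        done.append(stage)
        with open(log_path, "w") as f:  # checkpoint after every stage
            json.dump({"completed_stages": done}, f)
    return done
```

If a run crashes during `main`, the next invocation reads the log, sees the first three stages are complete, and resumes at `main`.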
AI Connectors
- Multi-Provider Support: OpenAI, Google Gemini, Ollama (local), HuggingFace
- Unified Interface: LangChain abstraction layer
- Auto-Detection: Configuration-driven provider selection
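The class and key names here are hypothetical stand-ins (the real GenAIManager wraps LangChain clients); a sketch of what configuration-driven provider selection behind a unified interface looks like:

```python
# Stub backends sharing one .ask() interface (illustrative only)
class OpenAIChat:
    def ask(self, prompt): return f"[openai] {prompt}"

class GeminiChat:
    def ask(self, prompt): return f"[gemini] {prompt}"

class OllamaChat:
    def ask(self, prompt): return f"[ollama] {prompt}"

PROVIDERS = {"openai": OpenAIChat, "gemini": GeminiChat, "ollama": OllamaChat}

def make_chat(ai_config):
    """Pick a backend from the config's 'provider' key."""
    try:
        return PROVIDERS[ai_config["provider"]]()
    except KeyError:
        raise ValueError(f"unknown provider: {ai_config.get('provider')!r}")
```

Calling code depends only on `.ask()`, so swapping OpenAI for a local Ollama model is a one-line config change.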
Data Processing
- Spark Integration: Distributed processing for large datasets
- Pandas/Spark: Seamless DataFrame conversions
- ETL Utilities: High-level operations for common ETL tasks
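As a flavor of what a high-level ETL utility does (the real utilities operate on pandas/Spark DataFrames; this helper and its field names are invented for illustration), here is a small cleaning pass over scraped e-commerce rows:

```python
def clean_products(rows):
    """Illustrative ETL pass: drop rows without a price,
    parse strings like "$1,299.00" to floats, normalize names."""
    out = []
    for row in rows:
        price = row.get("price")
        if price is None:
            continue  # unusable row; skip rather than fail the batch
        if isinstance(price, str):
            price = float(price.replace("$", "").replace(",", ""))
        out.append({"name": row["name"].strip().title(), "price": price})
    return out
```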
Configuration
Create configuration files in default_configs/:
configuration.json:

```json
{
  "db_config": "default_configs/db_config.json",
  "run_config": "default_configs/run.json",
  "ai_api_config": "default_configs/google_api_config.json",
  "web_config": "default_configs/web_config.json",
  "cloud_config": "default_configs/s3_config.json"
}
```
See the default_configs/ directory for configuration examples.
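The top-level config above maps names to paths of further JSON files. A sketch of resolving that tree into one dict (the helper name is hypothetical; the platform's own loader is `Utility.read_in_json_file`):

```python
import json

def load_config_tree(path):
    """Read a top-level config whose values are paths to sub-config
    JSON files, and return one dict of parsed sub-configs."""
    with open(path) as f:
        top = json.load(f)
    merged = {}
    for key, sub_path in top.items():
        with open(sub_path) as f:
            merged[key] = json.load(f)  # e.g. merged["db_config"]
    return merged
```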
Requirements
- Python 3.11+
- Apache Spark 3.5.2
- PostgreSQL (optional, for SQL connector)
- AWS credentials (optional, for S3 connector)
- Google Cloud credentials (optional, for Vision API)
Development
```bash
# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=. --cov-report=html

# Format code
black .

# Lint code
flake8
```
License
Proprietary - © 2025 Zerveme
Support
For issues and questions, please contact: support@zerveme.com
File details
Details for the file zervedataplatform-0.1.1.tar.gz:

- Download URL: zervedataplatform-0.1.1.tar.gz
- Size: 91.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `702631b84637d7ac5632d0a6f870b9c7e0356799db4318818cfe2b8f5a105bdf` |
| MD5 | `1bbc803a4f3c9c3bc91dea0fc45ca2c4` |
| BLAKE2b-256 | `a934cb467ef0b1676ef7ced6c6e799673dd127bc912f622ed310cca09cb90823` |
File details
Details for the file zervedataplatform-0.1.1-py3-none-any.whl:

- Download URL: zervedataplatform-0.1.1-py3-none-any.whl
- Size: 74.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `f18094ea05f49af1d3a8fcc02d9b66a9db8714efe643f1a468a5985b6a556a03` |
| MD5 | `c4167fa01639f1bc97b77c0082a781db` |
| BLAKE2b-256 | `7960edb7c4aabf91f126a888d626d2ac6f33e279f6ff4134fb371e516dc6fad7` |