
E-commerce data extraction and processing platform with AI-powered enrichment

Project description

Zerve Data Platform

An enterprise-grade ETL and data processing platform for automated e-commerce data extraction, AI-powered enrichment, and pipeline orchestration.

Features

  • Multi-stage Pipeline Framework - Orchestrate complex ETL workflows with checkpointing and progress tracking
  • Web Scraping Automation - Selenium-based browser automation for e-commerce sites
  • AI-Powered Data Enrichment - Multiple LLM provider support (OpenAI, Google Gemini, Ollama, HuggingFace)
  • Cloud Integration - AWS S3 and Spark data lake support
  • Database Connectors - PostgreSQL and Spark SQL with auto-schema generation
  • Distributed Processing - Apache Spark for big data ETL workflows

Installation

Development Installation

# Clone the repository
git clone https://github.com/zerveme/zervemedata.git
cd zervemedata

# Install in editable mode with development dependencies
pip install -e ".[dev]"

Production Installation

pip install zervedataplatform

Quick Start

Import the package

from zervedataplatform.pipeline import DataPipeline, DataConnectorBase
from zervedataplatform.connectors.ai import GenAIManager
from zervedataplatform.connectors.sql_connectors import PostgresSqlConnector
from zervedataplatform.connectors.cloud_storage_connectors import S3CloudConnector
from zervedataplatform.utils import Utility

# Configure your pipeline
config = Utility.read_in_json_file("config.json")

# Create AI connector
ai_manager = GenAIManager(config["ai_config"])

# Create database connector
db = PostgresSqlConnector(config["db_config"])

# Create and run pipeline
pipeline = DataPipeline()
# ... add your jobs
pipeline.run_data_pipeline()

Architecture

zervedataplatform/
├── abstractions/          # Abstract base classes and interfaces
├── connectors/           # Database, cloud, and AI connectors
│   ├── ai/              # OpenAI, Gemini, LangChain, Google Vision
│   ├── sql_connectors/  # PostgreSQL, Spark SQL
│   └── cloud_storage_connectors/  # S3, Spark Cloud
├── pipeline/            # Pipeline orchestration framework
├── model_transforms/    # Database models and schemas
├── utils/              # Utilities and helpers
└── test/               # Unit tests

Key Components

Pipeline Framework

  • 5-Stage Execution: initialize → pre_validate → read → main → output
  • Activity Logging: JSON-based progress tracking with hierarchical structure
  • Checkpoint/Resume: Resume long-running pipelines from failure points
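The stage order above can be sketched as a minimal job class. The class and method names here are hypothetical and only illustrate the lifecycle and how checkpoint/resume falls out of it; they are not the package's actual API.

```python
# Illustrative five-stage job lifecycle (hypothetical names, not the real API).
class ExampleJob:
    def __init__(self):
        self.completed = []  # stands in for the JSON activity log

    def initialize(self):   self.completed.append("initialize")
    def pre_validate(self): self.completed.append("pre_validate")
    def read(self):         self.completed.append("read")
    def main(self):         self.completed.append("main")
    def output(self):       self.completed.append("output")

    def run(self, resume_from=0):
        # Stages run strictly in order; persisting the index of the last
        # completed stage is what makes resuming from a failure point possible.
        stages = [self.initialize, self.pre_validate,
                  self.read, self.main, self.output]
        for stage in stages[resume_from:]:
            stage()
        return self.completed
```

A fresh run executes all five stages; passing a saved checkpoint index skips the stages that already completed.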

AI Connectors

  • Multi-Provider Support: OpenAI, Google Gemini, Ollama (local), HuggingFace
  • Unified Interface: LangChain abstraction layer
  • Auto-Detection: Configuration-driven provider selection
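A configuration-driven selector might look like the following sketch. The config key and provider strings are assumptions for illustration; the real GenAIManager defines its own config schema.

```python
# Hypothetical provider auto-detection (the real config keys may differ).
SUPPORTED = {"openai", "gemini", "ollama", "huggingface"}

def select_provider(ai_config: dict) -> str:
    """Pick an LLM provider from configuration, case-insensitively."""
    name = str(ai_config.get("provider", "")).strip().lower()
    if name not in SUPPORTED:
        raise ValueError(f"unsupported provider: {name!r}")
    return name
```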

Data Processing

  • Spark Integration: Distributed processing for large datasets
  • Pandas/Spark: Seamless DataFrame conversions
  • ETL Utilities: High-level operations for common ETL tasks
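As a sketch of the kind of high-level operation such utilities cover, the helper below normalizes column names and deduplicates rows in a pandas DataFrame; the function name is hypothetical, not part of the package.

```python
import pandas as pd

# Hypothetical ETL helper: normalize column names and drop duplicate rows,
# two steps that recur in most extract-transform workflows.
def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    df = df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
    return df.drop_duplicates().reset_index(drop=True)
```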

Configuration

Create configuration files in default_configs/:

// configuration.json
{
  "db_config": "default_configs/db_config.json",
  "run_config": "default_configs/run.json",
  "ai_api_config": "default_configs/google_api_config.json",
  "web_config": "default_configs/web_config.json",
  "cloud_config": "default_configs/s3_config.json"
}
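Each referenced file holds the settings for one connector. As an illustration, a db_config.json for the PostgreSQL connector might look like the fragment below; the actual keys are defined by the package, so treat these as placeholders.

```json
// default_configs/db_config.json (illustrative keys only)
{
  "host": "localhost",
  "port": 5432,
  "database": "ecommerce",
  "user": "etl_user",
  "password": "changeme"
}
```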

See the default_configs/ directory for configuration examples.

Requirements

  • Python 3.11+
  • Apache Spark 3.5.2
  • PostgreSQL (optional, for SQL connector)
  • AWS credentials (optional, for S3 connector)
  • Google Cloud credentials (optional, for Vision API)

Development

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=. --cov-report=html

# Format code
black .

# Lint code
flake8

License

Proprietary - © 2025 Zerveme

Support

For issues and questions, please contact: support@zerveme.com

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

zervedataplatform-0.1.1.tar.gz (91.8 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

zervedataplatform-0.1.1-py3-none-any.whl (74.0 kB)

Uploaded Python 3

File details

Details for the file zervedataplatform-0.1.1.tar.gz.

File metadata

  • Download URL: zervedataplatform-0.1.1.tar.gz
  • Upload date:
  • Size: 91.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for zervedataplatform-0.1.1.tar.gz:

  • SHA256: 702631b84637d7ac5632d0a6f870b9c7e0356799db4318818cfe2b8f5a105bdf
  • MD5: 1bbc803a4f3c9c3bc91dea0fc45ca2c4
  • BLAKE2b-256: a934cb467ef0b1676ef7ced6c6e799673dd127bc912f622ed310cca09cb90823

See more details on using hashes here.

File details

Details for the file zervedataplatform-0.1.1-py3-none-any.whl.


File hashes

Hashes for zervedataplatform-0.1.1-py3-none-any.whl:

  • SHA256: f18094ea05f49af1d3a8fcc02d9b66a9db8714efe643f1a468a5985b6a556a03
  • MD5: c4167fa01639f1bc97b77c0082a781db
  • BLAKE2b-256: 7960edb7c4aabf91f126a888d626d2ac6f33e279f6ff4134fb371e516dc6fad7

See more details on using hashes here.
