
E-commerce data extraction and processing platform with AI-powered enrichment

Project description

Zerve Data Platform

An enterprise-grade ETL and data processing platform for automated e-commerce data extraction, AI-powered enrichment, and pipeline orchestration.

Features

  • Multi-stage Pipeline Framework - Orchestrate complex ETL workflows with checkpointing and progress tracking
  • Web Scraping Automation - Selenium-based browser automation for e-commerce sites
  • AI-Powered Data Enrichment - Multiple LLM provider support (OpenAI, Google Gemini, Ollama, HuggingFace)
  • Cloud Integration - AWS S3 and Spark data lake support
  • Database Connectors - PostgreSQL and Spark SQL with auto-schema generation
  • Distributed Processing - Apache Spark for big data ETL workflows

Installation

Development Installation

# Clone the repository
git clone https://github.com/zerveme/zervemedata.git
cd zervemedata

# Install in editable mode with development dependencies
pip install -e ".[dev]"

Production Installation

pip install zervedataplatform

Quick Start

Import the package

from zervedataplatform.pipeline import DataPipeline, DataConnectorBase
from zervedataplatform.connectors.ai import GenAIManager
from zervedataplatform.connectors.sql_connectors import PostgresSqlConnector
from zervedataplatform.connectors.cloud_storage_connectors import S3CloudConnector
from zervedataplatform.utils import Utility

# Configure your pipeline
config = Utility.read_in_json_file("config.json")

# Create AI connector
ai_manager = GenAIManager(config["ai_config"])

# Create database connector
db = PostgresSqlConnector(config["db_config"])

# Create and run pipeline
pipeline = DataPipeline()
# ... add your jobs
pipeline.run_data_pipeline()

Architecture

zervedataplatform/
├── abstractions/          # Abstract base classes and interfaces
├── connectors/           # Database, cloud, and AI connectors
│   ├── ai/              # OpenAI, Gemini, LangChain, Google Vision
│   ├── sql_connectors/  # PostgreSQL, Spark SQL
│   └── cloud_storage_connectors/  # S3, Spark Cloud
├── pipeline/            # Pipeline orchestration framework
├── model_transforms/    # Database models and schemas
├── utils/              # Utilities and helpers
└── test/               # Unit tests

Key Components

Pipeline Framework

  • 5-Stage Execution: initialize → pre_validate → read → main → output
  • Activity Logging: JSON-based progress tracking with hierarchical structure
  • Checkpoint/Resume: Resume long-running pipelines from failure points
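The stage order above can be sketched as a minimal job class. The class and method names here are hypothetical and only illustrate the lifecycle and how checkpoint/resume falls out of it; they are not the package's actual API.

```python
# Illustrative five-stage job lifecycle (hypothetical names, not the real API).
class ExampleJob:
    def __init__(self):
        self.completed = []  # stands in for the JSON activity log

    def initialize(self):   self.completed.append("initialize")
    def pre_validate(self): self.completed.append("pre_validate")
    def read(self):         self.completed.append("read")
    def main(self):         self.completed.append("main")
    def output(self):       self.completed.append("output")

    def run(self, resume_from=0):
        # Stages run strictly in order; persisting the index of the last
        # completed stage is what makes resuming from a failure point possible.
        stages = [self.initialize, self.pre_validate,
                  self.read, self.main, self.output]
        for stage in stages[resume_from:]:
            stage()
        return self.completed
```

A fresh run executes all five stages; passing a saved checkpoint index skips the stages that already completed.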

AI Connectors

  • Multi-Provider Support: OpenAI, Google Gemini, Ollama (local), HuggingFace
  • Unified Interface: LangChain abstraction layer
  • Auto-Detection: Configuration-driven provider selection
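A configuration-driven selector might look like the following sketch. The config key and provider strings are assumptions for illustration; the real GenAIManager defines its own config schema.

```python
# Hypothetical provider auto-detection (the real config keys may differ).
SUPPORTED = {"openai", "gemini", "ollama", "huggingface"}

def select_provider(ai_config: dict) -> str:
    """Pick an LLM provider from configuration, case-insensitively."""
    name = str(ai_config.get("provider", "")).strip().lower()
    if name not in SUPPORTED:
        raise ValueError(f"unsupported provider: {name!r}")
    return name
```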

Data Processing

  • Spark Integration: Distributed processing for large datasets
  • Pandas/Spark: Seamless DataFrame conversions
  • ETL Utilities: High-level operations for common ETL tasks
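As a sketch of the kind of high-level operation such utilities cover, the helper below normalizes column names and deduplicates rows in a pandas DataFrame; the function name is hypothetical, not part of the package.

```python
import pandas as pd

# Hypothetical ETL helper: normalize column names and drop duplicate rows,
# two steps that recur in most extract-transform workflows.
def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    df = df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
    return df.drop_duplicates().reset_index(drop=True)
```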

Configuration

Create configuration files in default_configs/:

// configuration.json
{
  "db_config": "default_configs/db_config.json",
  "run_config": "default_configs/run.json",
  "ai_api_config": "default_configs/google_api_config.json",
  "web_config": "default_configs/web_config.json",
  "cloud_config": "default_configs/s3_config.json"
}
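Each referenced file holds the settings for one connector. As an illustration, a db_config.json for the PostgreSQL connector might look like the fragment below; the actual keys are defined by the package, so treat these as placeholders.

```json
// default_configs/db_config.json (illustrative keys only)
{
  "host": "localhost",
  "port": 5432,
  "database": "ecommerce",
  "user": "etl_user",
  "password": "changeme"
}
```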

See the default_configs/ directory for configuration examples.

Requirements

  • Python 3.11+
  • Apache Spark 3.5.2
  • PostgreSQL (optional, for SQL connector)
  • AWS credentials (optional, for S3 connector)
  • Google Cloud credentials (optional, for Vision API)

Development

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=. --cov-report=html

# Format code
black .

# Lint code
flake8

License

Proprietary - © 2025 Zerveme

Support

For issues and questions, please contact: support@zerveme.com

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

zervedataplatform-0.1.1.tar.gz (91.8 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

zervedataplatform-0.1.1-py3-none-any.whl (74.0 kB)

Uploaded Python 3

File details

Details for the file zervedataplatform-0.1.1.tar.gz.

File metadata

  • Download URL: zervedataplatform-0.1.1.tar.gz
  • Upload date:
  • Size: 91.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for zervedataplatform-0.1.1.tar.gz:

  • SHA256: 702631b84637d7ac5632d0a6f870b9c7e0356799db4318818cfe2b8f5a105bdf
  • MD5: 1bbc803a4f3c9c3bc91dea0fc45ca2c4
  • BLAKE2b-256: a934cb467ef0b1676ef7ced6c6e799673dd127bc912f622ed310cca09cb90823

See more details on using hashes here.

File details

Details for the file zervedataplatform-0.1.1-py3-none-any.whl.


File hashes

Hashes for zervedataplatform-0.1.1-py3-none-any.whl:

  • SHA256: f18094ea05f49af1d3a8fcc02d9b66a9db8714efe643f1a468a5985b6a556a03
  • MD5: c4167fa01639f1bc97b77c0082a781db
  • BLAKE2b-256: 7960edb7c4aabf91f126a888d626d2ac6f33e279f6ff4134fb371e516dc6fad7

See more details on using hashes here.
