
Enterprise-grade Headless ETL Engine with Interactive UI


⚡ PyQuery Core: The Engine ⚙️

Pure Logic. Zero Fluff. The Backend of the Data OS.


🧠 The Brain Behind the Operation

PyQuery Core is the headless, high-performance ETL and Analytics engine that powers the PyQuery ecosystem.

Previously buried inside a general-purpose monorepo, it has now been extracted into its own pure-Python library. It handles the heavy lifting: file I/O, data transformation, statistical analytics, and machine learning.

It has no UI. It has no CLI. It is just raw, unadulterated Polars power wrapped in a strict, type-safe architecture.


⚡ Key Features

  • 🚀 Lazy-First Architecture: Built on Polars LazyFrames. Nothing executes until you call collect().
  • 🛡️ Strict Type Safety: Every transform, every parameter, and every I/O operation is validated with Pydantic models. No more stringly-typed chaos.
  • 🔌 Universal I/O:
    • Readers: CSV, Parquet, Excel, JSON, IPC.
    • Healers: Auto-detects encoding issues and "heals" broken CSVs on the fly.
  • 🧪 Analytics Module:
    • Built-in scikit-learn integration for Clustering and Regression.
    • Automatic "What-If" simulation engines.
  • 🔧 Transform Registry: A modular plugin system for registering data transformation steps.
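
To make the type-safety and registry ideas above concrete, here is a minimal sketch in plain Python. It is not PyQuery Core's actual API: `register_transform`, `FilterParams`, `filter_rows`, and `build_step` are hypothetical names, and a standard-library dataclass stands in for the Pydantic models the library is described as using.

```python
import operator as op
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]

# Comparison operators this sketch accepts in a filter step.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    ">": op.gt, "<": op.lt, "==": op.eq,
    ">=": op.ge, "<=": op.le, "!=": op.ne,
}

# Hypothetical registry: step name -> (params class, transform function).
_REGISTRY: Dict[str, Tuple[type, Callable]] = {}


@dataclass(frozen=True)
class FilterParams:
    """Stand-in for a Pydantic model: validates when constructed."""
    column: str
    operator: str
    value: Any

    def __post_init__(self) -> None:
        if not isinstance(self.column, str) or not self.column:
            raise ValueError("column must be a non-empty string")
        if self.operator not in OPS:
            raise ValueError(f"unsupported operator: {self.operator!r}")


def register_transform(name: str, params_cls: type) -> Callable:
    """Decorator that records a transform under `name`."""
    def decorator(func: Callable) -> Callable:
        if name in _REGISTRY:
            raise KeyError(f"transform already registered: {name}")
        _REGISTRY[name] = (params_cls, func)
        return func
    return decorator


@register_transform("filter", FilterParams)
def filter_rows(rows: List[Row], params: FilterParams) -> List[Row]:
    compare = OPS[params.operator]
    return [row for row in rows if compare(row[params.column], params.value)]


def build_step(step: Dict[str, Any]) -> Callable[[List[Row]], List[Row]]:
    """Validate a raw step dict and return a ready-to-call function."""
    if step["type"] not in _REGISTRY:
        raise KeyError(f"unknown transform: {step['type']}")
    params_cls, func = _REGISTRY[step["type"]]
    params = params_cls(**step["params"])  # raises on bad or missing fields
    return lambda rows: func(rows, params)


rows = [{"region": "EU", "revenue": 1500}, {"region": "US", "revenue": 500}]
step = build_step(
    {"type": "filter", "params": {"column": "revenue", "operator": ">", "value": 1000}}
)
kept = step(rows)
```

The point of the design is that a malformed step fails when it is built, before any data is touched, rather than halfway through a run.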

📦 Installation

pip install pyquery-core

💻 Usage (The SDK)

This is a library for builders. Use it to construct your own data pipelines.

1. The Engine

The PyQueryEngine is the orchestrator.

from pyquery_core.core import PyQueryEngine
from pyquery_core.io.files import FileLoader

# Initialize
engine = PyQueryEngine()

# Load Data (Lazy)
df = FileLoader.read_csv("massive_data.csv")

# Define a Pipeline (a list of step dicts)
pipeline = [
    {"type": "filter", "params": {"column": "revenue", "operator": ">", "value": 1000}},
    {"type": "group_by", "params": {"by": "region", "agg": {"revenue": "sum"}}}
]

# Execute
result = engine.run(df, pipeline)
print(result.collect())
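
To illustrate what "lazy" means for a step list like the one above, here is a minimal sketch that uses plain Python generators instead of Polars LazyFrames. `LazyRows` and `run_pipeline` are hypothetical stand-ins written for this illustration, not the PyQueryEngine implementation, and the sketch only supports a `filter` step and a single-column `sum` aggregation.

```python
import operator as op
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
OPS = {">": op.gt, "<": op.lt, "==": op.eq}


class LazyRows:
    """Holds a recipe for producing rows; does no work until collect()."""

    def __init__(self, source: Callable[[], Iterable[Row]]):
        self._source = source

    def filter(self, column: str, operator: str, value: Any) -> "LazyRows":
        compare = OPS[operator]
        source = self._source
        return LazyRows(lambda: (r for r in source() if compare(r[column], value)))

    def group_sum(self, by: str, column: str) -> "LazyRows":
        source = self._source

        def grouped() -> Iterable[Row]:
            totals: Dict[Any, float] = defaultdict(float)
            for r in source():
                totals[r[by]] += r[column]
            for key, total in totals.items():
                yield {by: key, column: total}

        return LazyRows(grouped)

    def collect(self) -> List[Row]:
        # Only here is the source read and the chain of steps evaluated.
        return list(self._source())


def run_pipeline(rows: LazyRows, pipeline: List[Dict[str, Any]]) -> LazyRows:
    """Chain each step onto the plan without executing anything."""
    for step in pipeline:
        p = step["params"]
        if step["type"] == "filter":
            rows = rows.filter(p["column"], p["operator"], p["value"])
        elif step["type"] == "group_by":
            [(column, agg)] = p["agg"].items()  # sketch: exactly one aggregation
            if agg != "sum":
                raise ValueError(f"unsupported aggregation: {agg}")
            rows = rows.group_sum(p["by"], column)
        else:
            raise ValueError(f"unknown step type: {step['type']}")
    return rows


reads: List[int] = []


def load() -> List[Row]:
    reads.append(1)  # record that the source was actually read
    return [
        {"region": "EU", "revenue": 1500},
        {"region": "EU", "revenue": 2000},
        {"region": "US", "revenue": 800},
        {"region": "US", "revenue": 1200},
    ]


pipeline = [
    {"type": "filter", "params": {"column": "revenue", "operator": ">", "value": 1000}},
    {"type": "group_by", "params": {"by": "region", "agg": {"revenue": "sum"}}},
]

plan = run_pipeline(LazyRows(load), pipeline)
reads_before_collect = len(reads)  # still 0: building the plan reads nothing
result = plan.collect()
```

Building the plan is cheap and side-effect free; the data source is touched once, at `collect()`. A real lazy engine such as Polars can also optimize the plan before running it, which this sketch does not attempt.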

2. Analytics & ML

Run complex statistical analysis without the boilerplate.

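from pyquery_core.analytics.ml import ClusterEngine
from pyquery_core.io.files import FileLoader

# Load data lazily, as in the engine example above
df = FileLoader.read_csv("massive_data.csv")

# Auto-Clustering
model = ClusterEngine(data=df, n_clusters=3)
segments = model.fit_predict()
print(segments)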
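
For readers new to clustering, a `fit_predict`-style call conventionally returns one cluster label per input row. The sketch below makes that concrete with a small, deterministic k-means on a single numeric column in pure Python. It is only an illustration: the library is described as building on scikit-learn, and `kmeans_1d` is a hypothetical helper, not part of `pyquery_core`.

```python
from typing import List, Optional


def kmeans_1d(values: List[float], n_clusters: int, max_iter: int = 100) -> List[int]:
    """Assign each value to one of n_clusters groups; returns one label per value."""
    if not 1 <= n_clusters <= len(values):
        raise ValueError("n_clusters must be between 1 and len(values)")

    # Deterministic start: spread the initial centroids across the sorted values.
    ordered = sorted(values)
    if n_clusters == 1:
        centroids = [sum(values) / len(values)]
    else:
        step = (len(ordered) - 1) / (n_clusters - 1)
        centroids = [float(ordered[round(i * step)]) for i in range(n_clusters)]

    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [
            min(range(n_clusters), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(n_clusters):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels if labels is not None else []


revenue = [100, 120, 110, 900, 950, 5000]
segments = kmeans_1d(revenue, n_clusters=3)
print(segments)  # -> [0, 0, 0, 1, 1, 2]
```

The three small values, the two mid-range values, and the single outlier end up in separate segments. A production implementation would also handle multiple columns, feature scaling, and randomized restarts.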

📂 Architecture

The library is structured for modularity:

Module                    Description
pyquery_core.io           Input/Output. Smart loaders for Excel, CSV, Parquet, and SQL.
pyquery_core.transforms   Logic. Atomic data manipulation steps (Filter, Sort, Mutate).
pyquery_core.analytics    Intelligence. Statistical tests, ML models, and forecasting.
pyquery_core.recipes      Orchestration. JSON-serializable pipeline definitions.
pyquery_core.jobs         Async Workers. Background task management for long-running ops.
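
Since recipes are described as JSON-serializable pipeline definitions, a step list like the one in the usage section should survive a round trip through the standard `json` module. The sketch below assumes a recipe is a plain list of `{"type", "params"}` dicts; `dump_recipe` and `load_recipe` are hypothetical helpers written for illustration, not functions from `pyquery_core.recipes`.

```python
import json
from typing import Any, Dict, List

Step = Dict[str, Any]


def dump_recipe(pipeline: List[Step]) -> str:
    """Serialize a pipeline to stable, human-readable JSON text."""
    return json.dumps(pipeline, indent=2, sort_keys=True)


def load_recipe(text: str) -> List[Step]:
    """Parse JSON text and check the basic shape of every step."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("a recipe must be a JSON array of steps")
    for i, step in enumerate(data):
        if not isinstance(step, dict) or set(step) != {"type", "params"}:
            raise ValueError(f"step {i} must have exactly 'type' and 'params'")
        if not isinstance(step["type"], str) or not isinstance(step["params"], dict):
            raise ValueError(f"step {i} has a malformed 'type' or 'params'")
    return data


pipeline = [
    {"type": "filter", "params": {"column": "revenue", "operator": ">", "value": 1000}},
    {"type": "group_by", "params": {"by": "region", "agg": {"revenue": "sum"}}},
]

text = dump_recipe(pipeline)
restored = load_recipe(text)
```

Keeping recipes as plain data means they can be stored in version control, sent over the wire, or diffed like any other JSON file.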

🤝 Contributing

This is the Core. Code quality here is paramount.

  1. Fork it.
  2. Branch it (git checkout -b feature/fancy-algo).
  3. Test it. (If it breaks the engine, we break your PR).
  4. Push it, then open a pull request.

📜 License

GPL-3.0. Open source forever. 💖


Made with ☕, 🦀 (Rust), and 💖 by Sudharshan TK

