
Enterprise-grade Headless ETL Engine with Interactive UI


⚡ PyQuery Core: The Engine ⚙️

Pure Logic. Zero Fluff. The Backend of the Data OS.


🧠 The Brain Behind the Operation

PyQuery Core is the headless, high-performance ETL and Analytics engine that powers the PyQuery ecosystem.

Previously buried inside a monorepo, it has now been extracted into its own pure-Python library. It handles the heavy lifting: File I/O, Data Transformation, Statistical Analytics, and Machine Learning.

It has no UI. It has no CLI. It is just raw, unadulterated Polars power wrapped in a strict, type-safe architecture.


⚡ Key Features

  • 🚀 Lazy-First Architecture: Built on Polars LazyFrames. Nothing executes until you say so.
  • 🛡️ Strict Type Safety: Every transform, every parameter, and every I/O operation is validated with Pydantic models. No more stringly-typed chaos.
  • 🔌 Universal I/O:
    • Readers: CSV, Parquet, Excel, JSON, IPC.
    • Healers: Auto-detects encoding issues and "heals" broken CSVs on the fly.
  • 🧪 Analytics Module:
    • Built-in scikit-learn integration for Clustering and Regression.
    • Automatic "What-If" simulation engines.
  • 🔧 Transform Registry: A modular plugin system for registering data transformation steps.
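
To make the lazy-first and registry ideas above concrete, here is a minimal standard-library sketch. It is not PyQuery Core's implementation: `TRANSFORMS`, `register`, `LazyPlan`, `filter_gt`, and `select` are hypothetical names, and plain lists of dictionaries stand in for Polars LazyFrames. It only illustrates the pattern of registering named steps and deferring all work until `collect()` is called.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]
Transform = Callable[..., Iterator[Row]]

# Hypothetical registry: maps a step name to the function that implements it.
TRANSFORMS: Dict[str, Transform] = {}


def register(name: str) -> Callable[[Transform], Transform]:
    """Decorator that adds a transform to the registry under `name`."""
    def decorator(func: Transform) -> Transform:
        if name in TRANSFORMS:
            raise ValueError(f"transform {name!r} is already registered")
        TRANSFORMS[name] = func
        return func
    return decorator


@register("filter_gt")
def filter_gt(rows: Iterable[Row], column: str, value: float) -> Iterator[Row]:
    """Lazily yield rows whose `column` is strictly greater than `value`."""
    return (row for row in rows if row[column] > value)


@register("select")
def select(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Lazily yield rows reduced to the requested columns."""
    return ({name: row[name] for name in columns} for row in rows)


class LazyPlan:
    """Records steps without running them; collect() executes the plan."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = rows
        self._steps: List[Tuple[str, Dict[str, Any]]] = []

    def step(self, name: str, **params: Any) -> "LazyPlan":
        # Unknown step names fail here, before any data is touched.
        if name not in TRANSFORMS:
            raise KeyError(f"unknown transform {name!r}")
        self._steps.append((name, params))
        return self

    def collect(self) -> List[Row]:
        # Chain the generators, then materialize the result once.
        rows: Iterable[Row] = iter(self._rows)
        for name, params in self._steps:
            rows = TRANSFORMS[name](rows, **params)
        return list(rows)


rows = [
    {"region": "north", "revenue": 1500},
    {"region": "south", "revenue": 800},
    {"region": "east", "revenue": 2200},
]
plan = (
    LazyPlan(rows)
    .step("filter_gt", column="revenue", value=1000)
    .step("select", columns=["region"])
)
result = plan.collect()
print(result)
```

Building the plan only appends to a list of steps. The rows are read when `collect()` runs, which mirrors, in a much simplified form, how a lazy query is built first and executed on demand.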

📦 Installation

pip install pyquery-core

💻 Usage (The SDK)

This is a library for builders. Use it to construct your own data pipelines.

1. The Engine

The PyQueryEngine is the orchestrator.

from pyquery_core.core import PyQueryEngine
from pyquery_core.io.files import FileLoader

# Initialize
engine = PyQueryEngine()

# Load Data (Lazy)
df = FileLoader.read_csv("massive_data.csv")

# Define a pipeline as an ordered list of steps
pipeline = [
    {"type": "filter", "params": {"column": "revenue", "operator": ">", "value": 1000}},
    {"type": "group_by", "params": {"by": "region", "agg": {"revenue": "sum"}}}
]

# Execute
result = engine.run(df, pipeline)
print(result.collect())
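
The steps above share a `type` plus `params` shape. As a rough illustration of what validating that shape before execution could look like, here is a sketch that uses standard-library dataclasses as a stand-in for Pydantic models. `FilterParams`, `GroupByParams`, `PARAM_MODELS`, `validate_pipeline`, and the set of allowed operators are all assumptions made for this example, not the library's actual classes or rules.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Assumed operator set for illustration; the real engine may accept others.
ALLOWED_OPERATORS = {">", ">=", "<", "<=", "==", "!="}


@dataclass(frozen=True)
class FilterParams:
    column: str
    operator: str
    value: Any

    def __post_init__(self) -> None:
        if not isinstance(self.column, str) or not self.column:
            raise ValueError("filter: 'column' must be a non-empty string")
        if self.operator not in ALLOWED_OPERATORS:
            raise ValueError(f"filter: unsupported operator {self.operator!r}")


@dataclass(frozen=True)
class GroupByParams:
    by: str
    agg: Dict[str, str]

    def __post_init__(self) -> None:
        if not self.agg:
            raise ValueError("group_by: 'agg' must name at least one column")


# Hypothetical mapping from a step's "type" to its parameter model.
PARAM_MODELS = {"filter": FilterParams, "group_by": GroupByParams}


def validate_pipeline(pipeline: List[Dict[str, Any]]) -> List[Any]:
    """Return one validated parameter object per step, or raise ValueError."""
    validated = []
    for index, step in enumerate(pipeline):
        model = PARAM_MODELS.get(step.get("type"))
        if model is None:
            raise ValueError(f"step {index}: unknown type {step.get('type')!r}")
        try:
            validated.append(model(**step.get("params", {})))
        except TypeError as exc:
            # TypeError covers missing or unexpected parameter names.
            raise ValueError(f"step {index}: {exc}") from exc
    return validated


pipeline = [
    {"type": "filter", "params": {"column": "revenue", "operator": ">", "value": 1000}},
    {"type": "group_by", "params": {"by": "region", "agg": {"revenue": "sum"}}},
]
steps = validate_pipeline(pipeline)
print(steps)
```

Validating every step up front means a typo in a step name or parameter surfaces as a clear error before any data is read.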

2. Analytics & ML

Run complex statistical analysis without the boilerplate.

from pyquery_core.analytics.ml import ClusterEngine

# Auto-Clustering
model = ClusterEngine(data=df, n_clusters=3)
segments = model.fit_predict()
print(segments)

📂 Architecture

The library is structured for modularity:

| Module | Description |
| --- | --- |
| pyquery_core.io | Input/Output. Smart loaders for Excel, CSV, Parquet, and SQL. |
| pyquery_core.transforms | Logic. Atomic data manipulation steps (Filter, Sort, Mutate). |
| pyquery_core.analytics | Intelligence. Statistical tests, ML models, and forecasting. |
| pyquery_core.recipes | Orchestration. JSON-serializable pipeline definitions. |
| pyquery_core.jobs | Async Workers. Background task management for long-running ops. |
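
Because pipeline steps are plain dictionaries, a recipe can be stored and restored with nothing more than the standard `json` module. The sketch below shows that round trip. The wrapper keys `name` and `steps` are hypothetical and chosen only for this example; the actual schema used by `pyquery_core.recipes` is not shown here.

```python
import json

# Same step format as in the usage example.
pipeline = [
    {"type": "filter", "params": {"column": "revenue", "operator": ">", "value": 1000}},
    {"type": "group_by", "params": {"by": "region", "agg": {"revenue": "sum"}}},
]

# Hypothetical recipe wrapper: a name plus the ordered list of steps.
recipe = {"name": "revenue_by_region", "steps": pipeline}

# Serialize to text (for a file or a database column), then load it back.
recipe_text = json.dumps(recipe, indent=2)
restored = json.loads(recipe_text)

# The restored steps compare equal to the originals.
round_trip_ok = restored["steps"] == pipeline
print(round_trip_ok)
```

Keeping steps limited to JSON-friendly values (strings, numbers, lists, dictionaries) is what makes this round trip lossless.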

🤝 Contributing

This is the Core. Code quality here is paramount.

  1. Fork it.
  2. Branch it (git checkout -b feature/fancy-algo).
  3. Test it. (If it breaks the engine, we break your PR).
  4. Push it.

📜 License

GPL-3.0. Open source forever. 💖


Made with ☕, 🦀 (Rust), and 💖 by Sudharshan TK
