Enterprise-grade Headless ETL Engine with Interactive UI
⚡ PyQuery Core: The Engine ⚙️
Pure Logic. Zero Fluff. The Backend of the Data OS.
🧠 The Brain Behind the Operation
PyQuery Core is the headless, high-performance ETL and Analytics engine that powers the PyQuery ecosystem.
Previously hidden inside a generic monorepo, it has now been extracted into its own pure-Python library. It handles the heavy lifting: File I/O, Data Transformation, Statistical Analytics, and Machine Learning.
It has no UI. It has no CLI. It is just raw, unadulterated Polars power wrapped in a strict, type-safe architecture.
⚡ Key Features
- 🚀 Lazy-First Architecture: Built on Polars LazyFrames. Nothing executes until you say so.
- 🛡️ Strict Type Safety: Every transform, every parameter, and every I/O operation is validated with Pydantic models. No more stringly-typed chaos.
- 🔌 Universal I/O:
  - Readers: CSV, Parquet, Excel, JSON, IPC.
  - Healers: Auto-detects encoding issues and "heals" broken CSVs on the fly.
- 🧪 Analytics Module:
  - Built-in scikit-learn integration for Clustering and Regression.
  - Automatic "What-If" simulation engines.
- 🔧 Transform Registry: A modular plugin system for registering data transformation steps.
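To make the Transform Registry idea concrete, here is a minimal sketch of a decorator-based registry. It is not the library's actual API: the names TRANSFORMS, register, and run are hypothetical, and plain lists of dicts stand in for Polars LazyFrames so the example runs with the standard library only.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Transform = Callable[[List[Row], dict], List[Row]]

# Hypothetical registry: maps a step "type" string to the function implementing it.
TRANSFORMS: Dict[str, Transform] = {}


def register(name: str) -> Callable[[Transform], Transform]:
    """Decorator that adds a transform to the registry under `name`."""
    def decorator(func: Transform) -> Transform:
        if name in TRANSFORMS:
            raise ValueError(f"transform {name!r} is already registered")
        TRANSFORMS[name] = func
        return func
    return decorator


@register("filter")
def filter_rows(rows: List[Row], params: dict) -> List[Row]:
    """Keep rows whose `column` value is greater than `value`."""
    return [r for r in rows if r[params["column"]] > params["value"]]


@register("sort")
def sort_rows(rows: List[Row], params: dict) -> List[Row]:
    """Sort rows in ascending order of the `by` column."""
    return sorted(rows, key=lambda r: r[params["by"]])


def run(rows: List[Row], pipeline: List[dict]) -> List[Row]:
    """Apply each step in order, looking its function up by type."""
    for step in pipeline:
        transform = TRANSFORMS[step["type"]]  # KeyError for unknown step types
        rows = transform(rows, step["params"])
    return rows


rows = [
    {"region": "EU", "revenue": 1500},
    {"region": "US", "revenue": 900},
    {"region": "APAC", "revenue": 1200},
]
pipeline = [
    {"type": "filter", "params": {"column": "revenue", "value": 1000}},
    {"type": "sort", "params": {"by": "revenue"}},
]
result = run(rows, pipeline)
print(result)
```

The point of the pattern is that new step types plug in by decoration alone; the runner never changes.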
📦 Installation
pip install pyquery-core
💻 Usage (The SDK)
This is a library for builders. Use it to construct your own data pipelines.
1. The Engine
The PyQueryEngine is the orchestrator.
from pyquery_core.core import PyQueryEngine
from pyquery_core.io.files import FileLoader
# Initialize
engine = PyQueryEngine()
# Load Data (Lazy)
df = FileLoader.read_csv("massive_data.csv")
# Register a Pipeline
pipeline = [
{"type": "filter", "params": {"column": "revenue", "operator": ">", "value": 1000}},
{"type": "group_by", "params": {"by": "region", "agg": {"revenue": "sum"}}}
]
# Execute
result = engine.run(df, pipeline)
print(result.collect())
2. Analytics & ML
Run complex statistical analysis without the boilerplate.
from pyquery_core.analytics.ml import ClusterEngine
# Auto-Clustering
model = ClusterEngine(data=df, n_clusters=3)
segments = model.fit_predict()
print(segments)
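What ClusterEngine does internally is not documented here. As a rough point of comparison, this is the kind of scikit-learn boilerplate such a wrapper would save, assuming k-means is the algorithm in play (an assumption, not a confirmed detail). The sample points are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six illustrative points forming three well-separated pairs.
X = np.array(
    [
        [1.0, 1.0],
        [1.2, 0.9],
        [10.0, 10.0],
        [10.1, 9.8],
        [20.0, 1.0],
        [19.8, 1.3],
    ]
)

# n_init and random_state are set explicitly so the run is reproducible.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X)
print(labels)
```

The label numbers themselves are arbitrary; only which points share a label is meaningful.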
📂 Architecture
The library is structured for modularity:
| Module | Description |
|---|---|
| pyquery_core.io | Input/Output. Smart loaders for Excel, CSV, Parquet, and SQL. |
| pyquery_core.transforms | Logic. Atomic data manipulation steps (Filter, Sort, Mutate). |
| pyquery_core.analytics | Intelligence. Statistical tests, ML models, and forecasting. |
| pyquery_core.recipes | Orchestration. JSON-serializable pipeline definitions. |
| pyquery_core.jobs | Async Workers. Background task management for long-running ops. |
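"JSON-serializable" has a practical consequence: a pipeline in the dict format shown in the usage section can be saved and reloaded with nothing but the standard library. The sketch below does not use pyquery_core.recipes, whose API is not shown here, and the "steps" wrapper key is a hypothetical choice.

```python
import json

# Same step format as the usage example: plain dicts, strings, and numbers.
pipeline = [
    {"type": "filter", "params": {"column": "revenue", "operator": ">", "value": 1000}},
    {"type": "group_by", "params": {"by": "region", "agg": {"revenue": "sum"}}},
]

# Serialize to text (for a file, a database column, or an HTTP payload).
recipe_text = json.dumps({"steps": pipeline}, indent=2)

# Load it back; the round trip preserves the structure exactly.
restored = json.loads(recipe_text)["steps"]
print(restored == pipeline)
```

Because the steps are pure data, they can be versioned, diffed, and shared without importing any code.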
🤝 Contributing
This is the Core. Code quality here is paramount.
- Fork it.
- Branch it (git checkout -b feature/fancy-algo).
- Test it. (If it breaks the engine, we break your PR).
- Push it.
📜 License
GPL-3.0. Open source forever. 💖
Made with ☕, 🦀 (Rust), and 💖 by Sudharshan TK
File details
Details for the file pyquery_core-5.0.0b3.tar.gz.
File metadata
- Download URL: pyquery_core-5.0.0b3.tar.gz
- Upload date:
- Size: 67.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | baa7cf7b7da8304a70ea4099442f19038e2e95c24419f9e1dd3c50cf42f448f1 |
| MD5 | df9958486656d8c2f5631bbd6a1c9a92 |
| BLAKE2b-256 | 64bb10b2bca0218d5a3fb8fe25d7147b82bf7af6e5af99c2e6f4534bfd1b5cbd |
File details
Details for the file pyquery_core-5.0.0b3-py3-none-any.whl.
File metadata
- Download URL: pyquery_core-5.0.0b3-py3-none-any.whl
- Upload date:
- Size: 90.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 572b6d2c7d59c84cd160ce4eeb8f96e9bdb15f0060a43b752f9751d65ffe7b73 |
| MD5 | ecd1ad50c6907be0df0e6769cd33e40e |
| BLAKE2b-256 | 979e3a98a4cb98f245bd69095e867dfe06ef584ebc62a1db2e64844e9ab0aa3a |