ixtract
Deterministic Adaptive Extraction Runtime
ixtract extracts data from databases to files with automatic parallelism optimization, feedback-driven learning, and full explainability. It treats extraction as a closed-loop control problem — not static configuration.
pip install ixtract
The problem
You tune worker counts by hand. You guess at chunk sizes. When throughput drops, you don't know why. When a job takes twice as long as yesterday, there's no explanation — just a number that got worse.
ixtract was built to answer the questions data engineers actually ask:
- Why is this job slower today than yesterday?
- How many workers should I actually use?
- Why does adding more workers make it worse?
- Am I overloading my source database?
- Is it safe to run this during business hours?
- Can I trust this to run unattended?
Quickstart
pip install ixtract
from ixtract import plan, execute, ExtractionIntent
intent = ExtractionIntent(
    source_type="postgresql",
    source_config={"host": "localhost", "database": "mydb", "user": "app"},
    object_name="orders",
)
result = plan(intent)
if result.is_safe:
    execution = execute(result)
    print(f"{execution.rows_extracted:,} rows in {execution.duration_seconds:.1f}s")
Or use the CLI:
# Profile your source (run once before first extraction)
ixtract profile orders --database mydb --user app
# Preview the plan
ixtract plan orders --database mydb --user app
# Extract
ixtract execute orders --database mydb --user app --output ./data
# Check health before next run
ixtract inspect orders
How it works
Profile → Plan → Execute → Learn
Run 1: Profile source → plan with profiler recommendation → extract → record
Run 2: Controller adjusts workers based on throughput signal
Run 3+: Controller converges toward optimal worker count
Every run: Deviation diagnosed, anomalies detected, results explained
The controller uses direction-aware hill-climbing: if adding a worker improved throughput, it tries adding another; if it hurt, it reverses direction. Under stable conditions it converges to a near-optimal worker count within 5 runs.
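To make the hill-climbing rule concrete, here is a minimal sketch of one way such a controller could work. It is not ixtract's actual controller: the `HillClimbState` and `observe` names, the 2% tolerance, the worker bounds, and the simulated throughput curve are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HillClimbState:
    workers: int                        # worker count to use for the next run
    direction: int = 1                  # +1 = try more workers, -1 = try fewer
    best_workers: Optional[int] = None  # best worker count seen so far
    best_throughput: float = 0.0        # rows/sec observed at best_workers
    reversals: int = 0                  # how many times a step failed to help
    converged: bool = False


def observe(state: HillClimbState, throughput: float,
            min_workers: int = 1, max_workers: int = 16,
            tolerance: float = 0.02) -> HillClimbState:
    """Fold one run's throughput into the state and pick the next worker count."""
    if state.converged:
        return state  # hold the converged worker count

    improved = (state.best_workers is None
                or throughput > state.best_throughput * (1 + tolerance))
    if improved:
        # The last step helped: remember it and keep going the same way.
        state.best_workers, state.best_throughput = state.workers, throughput
    else:
        # The last step hurt (or was within noise): reverse direction.
        state.direction = -state.direction
        state.reversals += 1
        if state.reversals >= 2:
            # Both neighbours of the best point are no better: stop there.
            state.converged = True
            state.workers = state.best_workers
            return state

    candidate = state.best_workers + state.direction
    state.workers = max(min_workers, min(max_workers, candidate))
    return state


def simulated_throughput(workers: int) -> float:
    # Hypothetical concave curve that peaks at 6 workers.
    return 1000.0 - 25.0 * (workers - 6) ** 2


state = HillClimbState(workers=4)
tried = []
while not state.converged:
    tried.append(state.workers)
    state = observe(state, simulated_throughput(state.workers))

print(tried, state.workers)
```

On this simulated curve the sketch tries 4, 5, 6, 7 and 5 workers, then settles on 6. A real controller would also need to handle noisy measurements and changing load, which this sketch ignores.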
CLI Commands
ixtract profile <table> --database <db> # Profile source
ixtract plan <table> --database <db> # Preview plan
ixtract execute <table> --database <db> --output <dir> # Extract
ixtract inspect <table> # Health check
ixtract diagnose --object <table> # Diagnose last run
ixtract explain --object <table> # Explain last run
ixtract history <table> # Run history
ixtract metrics --object <table> # Run metrics
ixtract benchmark <table> --database <db> # Calibrate throughput
ixtract replay --run-id <id> --database <db> # Replay exact run
inspect — operational health
$ ixtract inspect orders
Inspect: orders (postgresql)
Last Run rx-20260414-173011 11.7s 856K/s SUCCESS
Controller CONVERGED at 8 workers stable (drift -3.8%)
Anomalies None
Profile ✔ up-to-date
Health
HEALTHY ✔
System is stable and operating within expected bounds.
Exit codes are a contract:
ixtract inspect orders
echo $? # 0=HEALTHY 1=NEEDS ATTENTION 2=DEGRADED
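Because the exit codes are stable, a scheduler can gate an extraction on them. The sketch below shows one way to do that from Python. Only the three exit codes and the `ixtract inspect <table>` invocation come from this README; the helper names and the gating policy are assumptions.

```python
import subprocess

# Exit-code contract as documented above.
HEALTH_BY_EXIT_CODE = {0: "HEALTHY", 1: "NEEDS ATTENTION", 2: "DEGRADED"}


def health_from_exit_code(code: int) -> str:
    """Translate an `ixtract inspect` exit code into its documented status."""
    return HEALTH_BY_EXIT_CODE.get(code, f"UNKNOWN ({code})")


def should_extract(code: int, allow_needs_attention: bool = False) -> bool:
    """Hypothetical policy: run only when healthy, unless the caller opts in."""
    return code == 0 or (allow_needs_attention and code == 1)


def run_inspect(table: str) -> int:
    """Run `ixtract inspect <table>` and return its exit code."""
    return subprocess.run(["ixtract", "inspect", table]).returncode
```

A nightly job could call `run_inspect("orders")`, log `health_from_exit_code(code)`, and skip the extraction when `should_extract(code)` is false.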
diagnose — root cause analysis
$ ixtract diagnose --object events
Skew Analysis:
Severity: Severe (43.2x max/median)
Slowest: chunk_001 (2.07s, 1,502,847 rows)
Fastest: chunk_006 (0.03s, 10,241 rows)
Suggestion:
→ Severe skew detected. Work-stealing is active and mitigating.
→ Consider density-aware chunking for a permanent fix.
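The skew figure in the output above is a max/median ratio over chunks. The sketch below computes that ratio over chunk durations. The severity thresholds and the middle durations in the sample are assumptions; only the slowest (2.07s) and fastest (0.03s) values come from the output above, so the result differs slightly from the 43.2x shown.

```python
from statistics import median
from typing import Sequence


def skew_ratio(durations: Sequence[float]) -> float:
    """Return max/median of per-chunk durations (1.0 means perfectly even)."""
    if not durations:
        raise ValueError("need at least one chunk duration")
    mid = median(durations)
    if mid <= 0:
        raise ValueError("median duration must be positive")
    return max(durations) / mid


def classify_skew(ratio: float, moderate: float = 2.0, severe: float = 10.0) -> str:
    """Map a skew ratio to a label; both thresholds are illustrative assumptions."""
    if ratio >= severe:
        return "Severe"
    if ratio >= moderate:
        return "Moderate"
    return "Low"


# Illustrative chunk durations in seconds; only 2.07 and 0.03 appear in the README.
chunk_seconds = [2.07, 0.05, 0.06, 0.04, 0.05, 0.03, 0.05]
ratio = skew_ratio(chunk_seconds)
print(f"{classify_skew(ratio)} ({ratio:.1f}x max/median)")
```

Using the median rather than the mean keeps a single huge chunk from inflating its own baseline, so the ratio stays sensitive to exactly the kind of outlier shown above.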
replay — deterministic replay
$ ixtract replay --run-id rx-20260408-001 --database mydb
Decision Check
Workers 8 8
Chunks 20 20
Strategy range_chunking range_chunking
Plan Hash f6b8048a... f6b8048a... ✔ identical
Determinism: ✔ Verified (plan_fingerprint match)
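A plan-hash comparison like the one above depends on hashing a canonical form of the plan. This README does not document how ixtract computes its fingerprint, so the following is only a sketch of the general technique: serialize the plan with sorted keys and hash the bytes. The `plan_fingerprint` function and the plan fields are assumptions.

```python
import hashlib
import json


def plan_fingerprint(plan: dict) -> str:
    """Hash a canonical JSON form so key order never changes the result."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical plans with the same decisions but different key order.
original = {"workers": 8, "chunks": 20, "strategy": "range_chunking"}
replayed = {"strategy": "range_chunking", "chunks": 20, "workers": 8}
changed = {"workers": 4, "chunks": 20, "strategy": "range_chunking"}

print(plan_fingerprint(original) == plan_fingerprint(replayed))
print(plan_fingerprint(original) == plan_fingerprint(changed))
```

The first comparison is true and the second is false: identical decisions yield identical fingerprints regardless of key order, while any changed decision yields a different one.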
RuntimeContext — declare your environment
# Tell ixtract about your environment — it adjusts the plan accordingly
ixtract execute orders --database mydb \
--source-load high \
--network-quality degraded \
--priority low
# Workers reduced from 8 → 2 via multipliers
# Result: 920K rows/sec at 2 workers (faster than 8 at high load)
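The 8 → 2 reduction above comes from multipliers applied to the planned worker count. This README does not list the actual multiplier values, so the table below is entirely assumed, chosen only so that the sketch reproduces the 8 → 2 example. The value names other than `high`, `degraded`, and `low` are also hypothetical.

```python
import math

# Assumed multipliers; ixtract's real values are not documented in this README.
SOURCE_LOAD = {"normal": 1.0, "high": 0.5}
NETWORK_QUALITY = {"good": 1.0, "degraded": 0.75}
PRIORITY = {"normal": 1.0, "low": 0.75}


def adjusted_workers(base_workers: int, source_load: str = "normal",
                     network_quality: str = "good",
                     priority: str = "normal") -> int:
    """Scale the planned worker count by the declared environment, never below 1."""
    factor = (SOURCE_LOAD[source_load]
              * NETWORK_QUALITY[network_quality]
              * PRIORITY[priority])
    return max(1, math.floor(base_workers * factor))


print(adjusted_workers(8, "high", "degraded", "low"))
```

Rounding down and clamping at one worker keeps the adjustment conservative: a declared constraint can only reduce concurrency, and the job can still run.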
Sources & Outputs
| Source | Status |
|---|---|
| PostgreSQL | ✅ |
| MySQL | ✅ |
| SQL Server | ✅ |
| Output | Status |
|---|---|
| Parquet (local) | ✅ |
| CSV (local) | ✅ |
| Amazon S3 | ✅ |
| Google Cloud Storage | ✅ |
Real-world validation
Tested across local PostgreSQL and Azure SQL Server:
| Run | Table | Result |
|---|---|---|
| Baseline | pgbench_accounts (10M rows) | 856K rows/sec, 11.7s, 8 workers |
| RuntimeContext | Same, high load declared | 2 workers → 920K rows/sec (faster) |
| Skewed table | skewed_events (CV=2.05) | 43x skew detected, work-stealing active |
| Cloud SQL Server | Azure, p50=30ms latency | 8.7K rows/sec, anomaly flagged at 44.3σ |
| Replay | Run 1 replayed exactly | Plan hash ✔ identical, +0.3% throughput delta |
540 tests passing across 15 simulation suites.
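An anomaly reported in σ, as in the Cloud SQL Server row above, measures how far an observation sits from the historical mean in units of standard deviation. The sketch below shows that calculation. It is not ixtract's detector: the function names, the 3σ threshold, and the sample history are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def sigma_deviation(history: Sequence[float], observed: float) -> float:
    """Distance of `observed` from the historical mean, in standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    spread = stdev(history)
    if spread == 0:
        raise ValueError("history has no variance")
    return abs(observed - mean(history)) / spread


def is_anomalous(history: Sequence[float], observed: float,
                 threshold: float = 3.0) -> bool:
    """Flag observations beyond `threshold` sigmas (threshold is an assumption)."""
    return sigma_deviation(history, observed) > threshold


# Hypothetical throughput history in thousands of rows per second.
history = [100.0, 102.0, 98.0, 101.0, 99.0]
print(is_anomalous(history, 100.5), is_anomalous(history, 30.0))
```

A tight history makes even a moderate drop register as many sigmas, which is why a stable workload can produce very large σ values when conditions change abruptly.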
Key guarantees
| Guarantee | Definition |
|---|---|
| Deterministic planning | Same inputs → same plan, every time |
| Explainable decisions | Every plan choice has a structured justification |
| Bounded adaptation | No single adjustment exceeds configured step limits |
| Source safety | Conservative bias under uncertainty — never overloads source |
| Idempotent execution | Retries produce no duplicates |
| Snapshot consistency | REPEATABLE READ — no missing or duplicate rows |
| Deterministic replay | Any run can be re-executed exactly from its stored plan |
Installation
# Core (PostgreSQL)
pip install ixtract
# With MySQL support
pip install "ixtract[mysql]"
# With SQL Server support
pip install "ixtract[sqlserver]"
# With cloud writers (S3, GCS)
pip install "ixtract[cloud]"
# Everything
pip install "ixtract[all]"
Development
git clone https://github.com/ixtractr/ixtract.git
cd ixtract
pip install -e ".[dev]"
pytest tests/simulation/ -q
Product family
| Product | License | Purpose |
|---|---|---|
| ixtract | MIT | Extraction runtime — self-tuning, deterministic, explainable |
| iPoxy | MIT (coming soon) | Pipeline reinforcement |
| ixora | Commercial (coming soon) | Fleet intelligence |
License
MIT — see LICENSE.