Deterministic adaptive extraction runtime

ixtract

Deterministic Adaptive Extraction Runtime

ixtract extracts data from databases to files with automatic parallelism optimization, feedback-driven learning, and full explainability. It treats extraction as a closed-loop control problem rather than a static configuration.

pip install ixtract

The problem

You tune worker counts by hand. You guess at chunk sizes. When throughput drops, you don't know why. When a job takes twice as long as yesterday, there's no explanation — just a number that got worse.

ixtract was built to answer the questions data engineers actually ask:

Why is this job slower today than yesterday?
How many workers should I actually use?
Why does adding more workers make it worse?
Am I overloading my source database?
Is it safe to run this during business hours?
Can I trust this to run unattended?

Quickstart

pip install ixtract

from ixtract import plan, execute, ExtractionIntent

# Describe what to extract: the source system, its connection settings, and the table.
intent = ExtractionIntent(
    source_type="postgresql",
    source_config={"host": "localhost", "database": "mydb", "user": "app"},
    object_name="orders",
)

# Build the plan first, then run it only if the planner marks it safe.
result = plan(intent)
if result.is_safe:
    execution = execute(result)
    print(f"{execution.rows_extracted:,} rows in {execution.duration_seconds:.1f}s")

Or use the CLI:

# Profile your source (run once before first extraction)
ixtract profile orders --database mydb --user app

# Preview the plan
ixtract plan orders --database mydb --user app

# Extract
ixtract execute orders --database mydb --user app --output ./data

# Check health before next run
ixtract inspect orders

How it works

Profile → Plan → Execute → Learn

Run 1:  Profile source → plan with profiler recommendation → extract → record
Run 2:  Controller adjusts workers based on throughput signal
Run 3+: Controller converges toward optimal worker count
Every:  Deviation diagnosed, anomaly detected, results explained

The controller uses direction-aware hill-climbing: if adding a worker helped, it tries adding another; if it hurt, it reverses direction. Under stable conditions it converges to a near-optimal worker count in at most 5 runs.
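
The rule described above can be sketched in a few lines. This is an illustration only, not ixtract's internal controller: `ControllerState`, `next_state`, the 2% noise tolerance, and the 1 to 16 worker bounds are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControllerState:
    workers: int        # worker count used in the most recent run
    direction: int = 1  # +1 if the last step added a worker, -1 if it removed one


def next_state(state, prev_throughput, curr_throughput,
               min_workers=1, max_workers=16, tolerance=0.02):
    """Pick the worker count for the next run (hypothetical helper).

    Keeps moving in the same direction while throughput improves,
    reverses when it drops, and holds when the change is within noise.
    """
    if prev_throughput <= 0:
        raise ValueError("prev_throughput must be positive")

    change = (curr_throughput - prev_throughput) / prev_throughput
    if abs(change) <= tolerance:
        # Change is within the assumed noise band: hold the current setting.
        return state

    # Improvement keeps the direction; regression flips it.
    direction = state.direction if change > 0 else -state.direction
    # Move by exactly one worker per run and stay inside the bounds.
    workers = max(min_workers, min(max_workers, state.workers + direction))
    return ControllerState(workers=workers, direction=direction)


if __name__ == "__main__":
    s = ControllerState(workers=4)
    s = next_state(s, 500_000, 600_000)  # throughput improved: add a worker
    s = next_state(s, 600_000, 550_000)  # throughput dropped: reverse
    print(s)
```

Moving by a single worker per run keeps each adjustment small, which matches the spirit of bounded adaptation, though the real step limits are configured in ixtract itself.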

CLI Commands

ixtract profile <table>   --database <db>                    # Profile source
ixtract plan <table>      --database <db>                    # Preview plan
ixtract execute <table>   --database <db> --output <dir>     # Extract
ixtract inspect <table>                                       # Health check
ixtract diagnose          --object <table>                   # Diagnose last run
ixtract explain           --object <table>                   # Explain last run
ixtract history <table>                                       # Run history
ixtract metrics           --object <table>                   # Run metrics
ixtract benchmark <table> --database <db>                    # Calibrate throughput
ixtract replay            --run-id <id> --database <db>      # Replay exact run

`inspect` — operational health

$ ixtract inspect orders

Inspect: orders (postgresql)

Last Run          rx-20260414-173011     11.7s    856K/s    SUCCESS
Controller        CONVERGED at 8 workers    stable (drift -3.8%)
Anomalies         None
Profile           ✔ up-to-date

Health
  HEALTHY ✔
  System is stable and operating within expected bounds.

Exit codes are a contract:

ixtract inspect orders
echo $?   # 0=HEALTHY  1=NEEDS ATTENTION  2=DEGRADED
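
Because the exit codes are stable, a scheduler or wrapper script can branch on them. The following is a minimal sketch of such a wrapper; the `Health` enum and both helper functions are hypothetical names for this example, and only the `ixtract inspect <table>` command and the three exit codes come from the text above.

```python
import subprocess
from enum import IntEnum


class Health(IntEnum):
    """Exit codes documented for `ixtract inspect`."""
    HEALTHY = 0
    NEEDS_ATTENTION = 1
    DEGRADED = 2


def classify_exit_code(code: int) -> Health:
    """Map a process exit code to a Health value, rejecting unknown codes."""
    try:
        return Health(code)
    except ValueError:
        raise RuntimeError(f"unexpected exit code from ixtract inspect: {code}") from None


def inspect_health(table: str) -> Health:
    """Run `ixtract inspect <table>` and classify its exit code."""
    # check=False: a non-zero exit code is a health signal here, not a failure.
    completed = subprocess.run(["ixtract", "inspect", table], check=False)
    return classify_exit_code(completed.returncode)
```

A pre-flight check might then call `inspect_health("orders")` and skip the next extraction unless the result is `Health.HEALTHY`.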

`diagnose` — root cause analysis

$ ixtract diagnose --object events

Skew Analysis:
  Severity:   Severe (43.2x max/median)
  Slowest:    chunk_001 (2.07s, 1,502,847 rows)
  Fastest:    chunk_006 (0.03s, 10,241 rows)

Suggestion:
  → Severe skew detected. Work-stealing is active and mitigating.
  → Consider density-aware chunking for a permanent fix.
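
The max/median figure in the report is a simple ratio. The sketch below shows one way to compute it, assuming it is taken over per-chunk durations (the output does not say whether durations or row counts are used). The function names and the severity thresholds are illustrative assumptions, not ixtract's actual values.

```python
from statistics import median


def skew_ratio(chunk_durations):
    """Return max/median of per-chunk durations in seconds."""
    if not chunk_durations:
        raise ValueError("need at least one chunk duration")
    mid = median(chunk_durations)
    if mid <= 0:
        raise ValueError("median duration must be positive")
    return max(chunk_durations) / mid


def classify_skew(ratio, moderate=2.0, severe=10.0):
    """Bucket a skew ratio; the thresholds are assumptions for this example."""
    if ratio >= severe:
        return "severe"
    if ratio >= moderate:
        return "moderate"
    return "low"


if __name__ == "__main__":
    durations = [1.0, 1.0, 4.0]  # made-up sample, not data from a real run
    ratio = skew_ratio(durations)
    print(f"{ratio:.1f}x max/median -> {classify_skew(ratio)}")
```

Using the median rather than the mean as the baseline keeps one very slow chunk from inflating the denominator and hiding its own skew.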

`replay` — deterministic replay

$ ixtract replay --run-id rx-20260408-001 --database mydb

  Decision Check
  Workers        8         8
  Chunks         20        20
  Strategy       range_chunking    range_chunking
  Plan Hash      f6b8048a...       f6b8048a... ✔ identical

  Determinism: ✔ Verified (plan_fingerprint match)
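
A plan hash comparison like the one above depends on serializing the plan the same way every time. The sketch below shows one common approach, canonical JSON hashed with SHA-256. The actual hash algorithm and plan schema used by ixtract are not stated here, so `plan_fingerprint` and the dictionary fields are assumptions for illustration.

```python
import hashlib
import json


def plan_fingerprint(plan: dict) -> str:
    """Hash a plan so that equal plans always produce equal fingerprints."""
    # Sorted keys and fixed separators make the JSON text independent of
    # dictionary insertion order and whitespace.
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    original = {"workers": 8, "chunks": 20, "strategy": "range_chunking"}
    replayed = {"strategy": "range_chunking", "chunks": 20, "workers": 8}
    same = plan_fingerprint(original) == plan_fingerprint(replayed)
    print("identical" if same else "mismatch")
```

Any change to a plan field changes the fingerprint, so a match is strong evidence that the replay used the same decisions as the original run.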

RuntimeContext — declare your environment

# Tell ixtract about your environment — it adjusts the plan accordingly
ixtract execute orders --database mydb \
  --source-load high \
  --network-quality degraded \
  --priority low

# Workers reduced from 8 → 2 via multipliers
# Result: 920K rows/sec at 2 workers (faster than 8 at high load)
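
The comment above says the worker count is reduced via multipliers. As a rough sketch of how that could work, each declared condition contributes a factor and the product scales the baseline. The multiplier values, the `"normal"` default level, and the `adjust_workers` function are all hypothetical; they were chosen only so that the example reproduces the 8 to 2 reduction shown above.

```python
import math

# Hypothetical multiplier tables; ixtract's real values are not documented here.
SOURCE_LOAD = {"normal": 1.0, "high": 0.5}
NETWORK_QUALITY = {"normal": 1.0, "degraded": 0.75}
PRIORITY = {"normal": 1.0, "low": 0.75}


def adjust_workers(base_workers, source_load="normal",
                   network_quality="normal", priority="normal"):
    """Scale a baseline worker count by the declared environment."""
    factor = (SOURCE_LOAD[source_load]
              * NETWORK_QUALITY[network_quality]
              * PRIORITY[priority])
    # Round down to stay conservative, but never go below one worker.
    return max(1, math.floor(base_workers * factor))


if __name__ == "__main__":
    print(adjust_workers(8, "high", "degraded", "low"))
```

Rounding down rather than to the nearest integer is a deliberate choice in this sketch: under uncertainty it errs toward fewer connections to the source.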

Sources & Outputs

Source	Status
PostgreSQL	✅
MySQL	✅
SQL Server	✅

Output	Status
Parquet (local)	✅
CSV (local)	✅
Amazon S3	✅
Google Cloud Storage	✅

Real-world validation

Tested across local PostgreSQL and Azure SQL Server:

Run	Table	Result
Baseline	pgbench_accounts (10M rows)	856K rows/sec, 11.7s, 8 workers
RuntimeContext	Same, high load declared	2 workers → 920K rows/sec (faster)
Skewed table	skewed_events (CV=2.05)	43x skew detected, work-stealing active
Cloud SQL Server	Azure, p50=30ms latency	8.7K rows/sec, anomaly flagged at 44.3σ
Replay	Run 1 replayed exactly	Plan hash ✔ identical, +0.3% throughput delta

540 tests passing across 15 simulation suites.

Key guarantees

Guarantee	Definition
Deterministic planning	Same inputs → same plan, every time
Explainable decisions	Every plan choice has a structured justification
Bounded adaptation	No single adjustment exceeds configured step limits
Source safety	Conservative bias under uncertainty — never overloads source
Idempotent execution	Retries produce no duplicates
Snapshot consistency	REPEATABLE READ — no missing or duplicate rows
Deterministic replay	Any run can be re-executed exactly from its stored plan

Installation

# Core (PostgreSQL)
pip install ixtract

# With MySQL support
pip install "ixtract[mysql]"

# With SQL Server support
pip install "ixtract[sqlserver]"

# With cloud writers (S3, GCS)
pip install "ixtract[cloud]"

# Everything
pip install "ixtract[all]"

Development

git clone https://github.com/ixtractr/ixtract.git
cd ixtract
pip install -e ".[dev]"
pytest tests/simulation/ -q

Product family

Product	License	Purpose
ixtract	MIT	Extraction runtime — self-tuning, deterministic, explainable
iPoxy	MIT (coming soon)	Pipeline reinforcement
ixora	Commercial (coming soon)	Fleet intelligence

License

MIT — see LICENSE.

This version

0.9.4

Apr 14, 2026

Download files

Source Distribution

ixtract-0.9.4.tar.gz (108.3 kB)

Uploaded Apr 14, 2026 Source

Built Distribution

ixtract-0.9.4-py3-none-any.whl (126.8 kB)

Uploaded Apr 14, 2026 Python 3

File details

Details for the file ixtract-0.9.4.tar.gz.

File metadata

Download URL: ixtract-0.9.4.tar.gz
Upload date: Apr 14, 2026
Size: 108.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for ixtract-0.9.4.tar.gz
Algorithm	Hash digest
SHA256	`8b1fa8dcef37cbb04ecfd27e5fe8a2aee8e081a5499f4f816832b616fdafaf66`
MD5	`ae71c2c2456a14f52172f24dfd8eeb0a`
BLAKE2b-256	`8bc819f80c8449c07def5ea34a6ecf5298c1d5151c7ead4e890e1bfb3d24e695`

File details

Details for the file ixtract-0.9.4-py3-none-any.whl.

File metadata

Download URL: ixtract-0.9.4-py3-none-any.whl
Upload date: Apr 14, 2026
Size: 126.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for ixtract-0.9.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`65299fed99b152d27c81d15b9986aa61d8c419b14254a46fe1ad1dddb55bc45a`
MD5	`dff7d92d2eff11313b269dde6fa03820`
BLAKE2b-256	`c51cb4db85ecd3af03eca56c5699816bfe524c22d55819bed85d444b85074469`
