Quantum data preparation — the missing preprocessing layer between classical datasets and quantum computing frameworks

QuPrep — Quantum Data Preparation

The missing preprocessing layer between classical datasets and quantum computing frameworks.


QuPrep converts classical datasets into a quantum-circuit-ready format. It is not a quantum computing framework, simulator, or training tool. It is the preprocessing step that feeds into Qiskit, PennyLane, Cirq, TKET, and any other quantum workflow.

Think of QuPrep as the pandas of quantum data preparation: a focused, composable tool that does one thing exceptionally well.

CSV / DataFrame / NumPy  →  QuPrep  →  circuit-ready output for your framework

What QuPrep does

  • Ingest CSV, NumPy arrays, and Pandas DataFrames
  • Clean missing values, outliers, and categorical features
  • Reduce dimensionality to fit your hardware qubit budget (PCA, LDA, DFT, UMAP, hardware-aware)
  • Normalize data correctly per encoding method — automatically
  • Encode data using 12 encoding methods: Angle, Amplitude, Basis, IQP, Entangled Angle, Re-uploading, Hamiltonian, ZZFeatureMap, PauliFeatureMap, RandomFourier, TensorProduct, QAOAProblem
  • Recommend the best encoding for your dataset and task
  • Suggest a qubit budget based on dataset size and target task
  • Compare encoders side-by-side on cost, depth, and NISQ safety
  • Export circuits to OpenQASM 3.0, Qiskit, PennyLane, Cirq, TKET, Amazon Braket, Q#, IQM
  • Save entire batches of circuits as individual QASM files
  • Visualize circuits as ASCII diagrams or matplotlib figures
  • Save and reload fitted pipelines without re-fitting
  • Detect data drift between training and new data automatically
  • Formulate combinatorial optimization problems as QUBO / Ising models (Max-Cut, TSP, Knapsack, Portfolio, Graph Colouring, Scheduling, Number Partitioning)
  • Solve with exact brute-force (n ≤ 20) or simulated annealing (any n)
  • Generate QAOA circuits and export to D-Wave Ocean SDK format
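
To make the reduce and normalize steps in the list above concrete, here is a minimal NumPy-only sketch. It is not QuPrep's implementation: the helper name `reduce_and_scale` is hypothetical, PCA is done with a plain SVD, and the [0, π] target range is an assumption that suits rotation-angle encodings.

```python
import numpy as np

def reduce_and_scale(X, n_qubits):
    """Project X onto its top n_qubits principal components, then
    rescale every component to [0, pi] (assumed range for angle encoding)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ vt[:n_qubits].T                     # PCA projection
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)       # avoid division by zero
    return np.pi * (Z - lo) / span

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                    # 50 samples, 10 features
angles = reduce_and_scale(X, n_qubits=4)
print(angles.shape)
```

Each row of `angles` then holds one rotation angle per qubit, which is the shape an angle-style encoder needs.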

What QuPrep does NOT do

It does not train models, simulate circuits, run on quantum hardware, optimize variational parameters, or replace any existing framework.


Installation

pip install quprep

With optional framework exports:

pip install quprep[qiskit]     # Qiskit QuantumCircuit
pip install quprep[pennylane]  # PennyLane QNode
pip install quprep[cirq]       # Cirq Circuit
pip install quprep[tket]       # TKET/pytket Circuit
pip install quprep[viz]        # matplotlib circuit diagrams
pip install quprep[all]        # everything above

Requirements: Python ≥ 3.10. Core dependencies: numpy, scipy, pandas, scikit-learn.


Quickstart

One-liner

import quprep as qd

result = qd.prepare("data.csv", encoding="angle", framework="qasm")
print(result.circuit)
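
For intuition about what an angle-encoded OpenQASM 3.0 circuit contains, the sketch below builds one by hand using only the standard library. It is an illustration under stated assumptions, not QuPrep's output format: the function name `angle_encode_qasm`, the choice of Ry gates, and the linear mapping of each feature onto [0, π] are all assumptions.

```python
import math

def angle_encode_qasm(features, lo, hi):
    """Map each feature to an Ry angle in [0, pi] and return OpenQASM 3.0 text.

    lo and hi are the per-feature minimum and maximum used for scaling.
    """
    if not (len(features) == len(lo) == len(hi)):
        raise ValueError("features, lo and hi must have the same length")
    lines = [
        "OPENQASM 3.0;",
        'include "stdgates.inc";',
        f"qubit[{len(features)}] q;",
    ]
    for i, (x, a, b) in enumerate(zip(features, lo, hi)):
        t = 0.0 if b == a else (x - a) / (b - a)   # scale to [0, 1]
        t = min(max(t, 0.0), 1.0)                  # clip out-of-range values
        lines.append(f"ry({math.pi * t:.6f}) q[{i}];")
    return "\n".join(lines)

print(angle_encode_qasm([0.0, 5.0, 10.0], [0.0, 0.0, 0.0], [10.0, 10.0, 10.0]))
```

One qubit per feature and one gate per qubit is what gives this family of encodings its constant depth.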

Encoding recommendation

import quprep as qd

rec = qd.recommend("data.csv", task="classification", qubits=8)
print(rec)                    # ranked table with reasoning
result = rec.apply("data.csv")

Pipeline API

import quprep as qd  # all public classes on the top-level namespace

pipeline = qd.Pipeline(
    reducer=qd.PCAReducer(n_components=8),
    encoder=qd.IQPEncoder(reps=2),
    exporter=qd.PennyLaneExporter(),   # pip install quprep[pennylane]
)
result = pipeline.fit_transform("data.csv")
qnode = result.circuit   # callable qml.QNode

Circuit visualization

import quprep as qd

# `result` is the output of the Pipeline example above

# ASCII (no dependencies)
print(qd.draw_ascii(result.encoded[0]))

# matplotlib — pip install quprep[viz]
qd.draw_matplotlib(result.encoded[0], filename="circuit.png")

QUBO / combinatorial optimization

from quprep.qubo import max_cut, knapsack, solve_brute, solve_sa, qaoa_circuit
import numpy as np

# Max-Cut on a weighted graph
adj = np.array([[0,1,1],[1,0,1],[1,1,0]], dtype=float)
q = max_cut(adj)
print(q.evaluate(np.array([0., 1., 1.])))  # -2.0

# Brute-force (n ≤ 20) or simulated annealing (any n)
sol = solve_brute(q)        # exact
sol = solve_sa(q, seed=42)  # heuristic, scales to n ~ 500+

# Generate a QAOA circuit
qasm = qaoa_circuit(q, p=2)

# D-Wave Ocean SDK export
bqm_dict = q.to_dwave()   # {(i, j): coeff}
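
The -2.0 in the Max-Cut example can be reproduced by hand. The sketch below uses one common Max-Cut QUBO formulation, minimizing the negated cut weight, with plain NumPy. It is an assumption that QuPrep uses the same sign convention and matrix layout; the helper names `max_cut_qubo`, `energy`, `brute_force`, and `to_coeff_dict` are hypothetical.

```python
import itertools
import numpy as np

def max_cut_qubo(adj):
    """Upper-triangular Q such that x^T Q x = -(weight of the cut defined by x)."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    Q = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w = adj[i, j]
            Q[i, i] -= w          # -w * x_i
            Q[j, j] -= w          # -w * x_j
            Q[i, j] += 2.0 * w    # +2w * x_i * x_j
    return Q

def energy(Q, x):
    x = np.asarray(x, dtype=float)
    return float(x @ Q @ x)

def brute_force(Q):
    """Exhaustive search over all 2^n bitstrings; only practical for small n."""
    n = Q.shape[0]
    best = min(itertools.product([0, 1], repeat=n), key=lambda b: energy(Q, b))
    return np.array(best), energy(Q, best)

def to_coeff_dict(Q):
    """Non-zero coefficients as {(i, j): coeff}, the shape shown for the D-Wave export."""
    n = Q.shape[0]
    return {(i, j): float(Q[i, j]) for i in range(n) for j in range(i, n) if Q[i, j] != 0.0}

adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
Q = max_cut_qubo(adj)
print(energy(Q, [0, 1, 1]))   # -2.0: two of the three triangle edges are cut
print(brute_force(Q))
print(to_coeff_dict(Q))
```

Exhaustive search visits 2^n assignments, which is why an exact solver needs a cap on n such as the n ≤ 20 limit mentioned above.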

Qubit suggestion

import quprep as qd

s = qd.suggest_qubits("data.csv", task="classification")
print(s.n_qubits)        # recommended qubit count
print(s.encoding_hint)   # e.g. "angle"
print(s.warning)         # set if dataset exceeds NISQ ceiling

Data drift detection

import quprep as qd

# X_train and X_new are feature matrices you supply (NumPy arrays or DataFrames)
det = qd.DriftDetector()
pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), drift_detector=det)
pipeline.fit(X_train)

result = pipeline.transform(X_new)
print(result.drift_report.overall_drift)      # True / False
print(result.drift_report.drifted_features)   # list of feature names
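
The README does not say which statistic the drift detector uses. As a rough illustration of the idea only, the sketch below flags a feature when the mean of the new data sits many standard errors away from the training mean. The function name `drifted_features` and the threshold of 3.0 are assumptions, and a real detector would likely use a stronger test.

```python
import numpy as np

def drifted_features(X_ref, X_new, names, threshold=3.0):
    """Return names of features whose mean shifted by more than
    `threshold` standard errors relative to the reference data."""
    X_ref = np.asarray(X_ref, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    mu = X_ref.mean(axis=0)
    sigma = X_ref.std(axis=0, ddof=1)
    sigma = np.where(sigma > 0, sigma, 1.0)      # guard constant features
    std_err = sigma / np.sqrt(X_new.shape[0])
    z = np.abs(X_new.mean(axis=0) - mu) / std_err
    return [name for name, score in zip(names, z) if score > threshold]

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
X_new = X_train.copy()
X_new[:, 1] += 5.0                               # shift only the second feature
print(drifted_features(X_train, X_new, ["a", "b"]))
```

A mean-shift check misses changes in variance or distribution shape, so treat it as a starting point rather than a substitute for the library's report.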

Pipeline save / load

import quprep as qd

# X_train and X_new are feature matrices you supply (NumPy arrays or DataFrames)
pipeline = qd.Pipeline(reducer=qd.PCAReducer(n_components=8), encoder=qd.AngleEncoder())
pipeline.fit(X_train)
pipeline.save("pipeline.pkl")

loaded = qd.Pipeline.load("pipeline.pkl")
result = loaded.transform(X_new)   # no re-fitting needed

Validation & cost estimation

import quprep as qd

# Define expected schema and attach to pipeline
schema = qd.DataSchema([
    qd.FeatureSpec("age",    dtype="continuous", min_value=0, max_value=120),
    qd.FeatureSpec("income", dtype="continuous", min_value=0),
])
pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), schema=schema)
result = pipeline.fit_transform("data.csv")

# Cost estimate is computed automatically at fit time
print(result.cost.nisq_safe)    # True
print(result.cost.circuit_depth)
result.summary()                # audit table + cost breakdown

CLI

quprep convert data.csv --encoding angle --framework qasm
quprep convert data.csv --encoding iqp --framework pennylane
quprep convert data.csv --encoding angle --save-dir circuits/  # save each sample as a file

quprep recommend data.csv --task classification --qubits 8
quprep suggest data.csv --task classification       # qubit budget recommendation
quprep compare data.csv --task classification       # side-by-side encoder comparison

quprep validate data.csv                              # shape, columns, NaN report
quprep validate data.csv --infer-schema schema.json  # infer schema and save
quprep validate data.csv --schema schema.json        # enforce schema (exit 1 on violation)

quprep qubo maxcut --adjacency "0,1,1;1,0,1;1,1,0" --solve
quprep qubo knapsack --weights "2,3,4" --values "3,4,5" --capacity 5
quprep qubo qaoa maxcut --adjacency "0,1,1;1,0,1;1,1,0" --p 2 --output circuit.qasm

Supported encodings

| Encoding | Qubits | Depth | NISQ-safe | Best for |
|---|---|---|---|---|
| Angle (Ry/Rx/Rz) | n = d | O(1) | ✅ Excellent | Most QML tasks |
| Amplitude | ⌈log₂ d⌉ | O(2ⁿ) | ❌ Poor | Qubit-limited scenarios |
| Basis | n = d | O(1) | ✅ Excellent | Binary features / QAOA |
| Entangled Angle | n = d | O(d · layers) | ✅ Good | Feature correlations |
| IQP | n = d | O(d² · reps) | ⚠️ Medium | Kernel methods |
| Re-uploading | n = d | O(d · layers) | ✅ Good | High-expressivity QNNs |
| Hamiltonian | n = d | O(d · steps) | ⚠️ Medium | Physics simulation / VQE |
| ZZ Feature Map | n = d | O(d² · reps) | ⚠️ Medium | Quantum kernel methods |
| Pauli Feature Map | n = d | O(d² · reps) | ⚠️ Medium | Configurable kernel methods |
| Random Fourier | n_components | O(1) | ✅ Excellent | RBF kernel approximation |
| Tensor Product | ⌈d/2⌉ | O(1) | ✅ Excellent | Qubit-efficient encoding |
| QAOA Problem | n = d | O(p) | ✅ Good | QAOA warm-start, problem-inspired maps |
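
The Qubits column above translates directly into a small lookup, sketched below. It assumes d is the number of features after reduction. The function name `qubits_required` and the encoding keys are hypothetical labels for this sketch, not QuPrep identifiers, and the minimum of one qubit for amplitude encoding when d = 1 is also an assumption.

```python
import math

def qubits_required(encoding, d, n_components=None):
    """Qubit count for a d-feature sample, following the table's Qubits column."""
    if d < 1:
        raise ValueError("d must be at least 1")
    if encoding == "amplitude":
        # ceil(log2(d)) computed with integers to avoid float rounding
        return max(1, (d - 1).bit_length())
    if encoding == "tensor_product":
        return math.ceil(d / 2)
    if encoding == "random_fourier":
        if n_components is None:
            raise ValueError("random_fourier needs n_components")
        return n_components
    return d  # every other encoding in the table uses n = d

for name in ("angle", "amplitude", "tensor_product"):
    print(name, qubits_required(name, d=8))
```

Comparing these counts against a hardware qubit budget shows quickly whether a dataset needs dimensionality reduction before encoding.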

Supported export frameworks

| Framework | Install | Output |
|---|---|---|
| OpenQASM 3.0 | (included) | str |
| Qiskit | pip install quprep[qiskit] | QuantumCircuit |
| PennyLane | pip install quprep[pennylane] | qml.QNode |
| Cirq | pip install quprep[cirq] | cirq.Circuit |
| TKET | pip install quprep[tket] | pytket.Circuit |
| Amazon Braket | pip install quprep[braket] | braket.Circuit |
| Q# | pip install quprep[qsharp] | Q# operation string |
| IQM | pip install quprep[iqm] | IQM circuit JSON |

Documentation

Full documentation at docs.quprep.org


Examples

See the examples/ directory. Launch any notebook directly:

| # | Topic | Launch |
|---|---|---|
| 01 | Quickstart: prepare() one-liner | Colab, Binder |
| 02 | Full pipeline: clean → encode → export → save/load | Colab, Binder |
| 03 | All encoders compared | Colab, Binder |
| 04 | Framework export: QASM, Qiskit, PennyLane, Cirq, TKET, Braket, Q#, IQM | Colab, Binder |
| 05 | Encoding recommendation | Colab, Binder |
| 06 | Circuit visualization: ASCII + matplotlib | Colab, Binder |
| 07 | QUBO / Ising: Max-Cut, Knapsack, solvers, D-Wave export, QAOA | Colab, Binder |
| 08 | Validation, schema & cost | Colab, Binder |
| 09 | Data drift detection | Colab, Binder |
| 10 | Qubit suggestion: suggest_qubits, task hints, NISQ ceiling | Colab, Binder |
| 11 | Plugin system: register custom encoders and exporters | Colab, Binder |

Contributing

Contributions are welcome. Please read CONTRIBUTING.md before opening a pull request.


License

Apache 2.0 — see LICENSE.


Citation

If you use QuPrep in your research, please cite:

@software{quprep2026,
  author    = {Perera, Hasarindu},
  title     = {QuPrep: Quantum Data Preparation},
  year      = {2026},
  publisher = {Zenodo},
  version   = {0.5.0},
  doi       = {10.5281/zenodo.19286258},
  url       = {https://doi.org/10.5281/zenodo.19286258},
  license   = {Apache-2.0},
}
