QuPrep — Quantum Data Preparation
The missing preprocessing layer between classical datasets and quantum computing frameworks.
QuPrep converts classical datasets into quantum-circuit-ready format. It is not a quantum computing framework, simulator, or training tool. It is the preprocessing step that feeds into Qiskit, PennyLane, Cirq, TKET, and any other quantum workflow.
Think of QuPrep as the pandas of quantum data preparation: a focused, composable tool that does one thing exceptionally well.
CSV / DataFrame / NumPy → QuPrep → circuit-ready output for your framework
What QuPrep does
- Ingest CSV, NumPy arrays, and Pandas DataFrames
- Clean missing values, outliers, and categorical features
- Reduce dimensionality to fit your hardware qubit budget (PCA, LDA, DFT, UMAP, hardware-aware)
- Normalize data correctly per encoding method — automatically
- Encode data using 12 encoding methods: Angle, Amplitude, Basis, IQP, Entangled Angle, Re-uploading, Hamiltonian, ZZFeatureMap, PauliFeatureMap, RandomFourier, TensorProduct, QAOAProblem
- Recommend the best encoding for your dataset and task
- Suggest a qubit budget based on dataset size and target task
- Compare encoders side-by-side on cost, depth, and NISQ safety
- Export circuits to OpenQASM 3.0, Qiskit, PennyLane, Cirq, TKET, Amazon Braket, Q#, IQM
- Save entire batches of circuits as individual QASM files
- Visualize circuits as ASCII diagrams or matplotlib figures
- Save and reload fitted pipelines without re-fitting
- Detect data drift between training and new data automatically
- Formulate combinatorial optimization problems as QUBO / Ising models (Max-Cut, TSP, Knapsack, Portfolio, Graph Colouring, Scheduling, Number Partitioning)
- Solve with exact brute-force (n ≤ 20) or simulated annealing (any n)
- Generate QAOA circuits and export to D-Wave Ocean SDK format
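To make the "normalize per encoding method" item concrete, here is a minimal NumPy sketch of the two most common cases. It is not QuPrep's implementation: the function names `scale_for_angle` and `normalize_for_amplitude` are hypothetical, and the target ranges (rotation angles in [0, π], unit L2 norm padded to a power-of-two length) are standard conventions assumed for illustration.

```python
import numpy as np

def scale_for_angle(X, lo=0.0, hi=np.pi):
    """Min-max scale each column into [lo, hi] for use as rotation angles."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns map to lo
    return lo + (X - col_min) / col_range * (hi - lo)

def normalize_for_amplitude(X):
    """Zero-pad each row to a power-of-two length, then rescale to unit L2 norm."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    padded_len = 1 << max(d - 1, 0).bit_length()  # smallest power of two >= d
    padded = np.zeros((X.shape[0], padded_len))
    padded[:, :d] = X
    norms = np.linalg.norm(padded, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows untouched
    return padded / norms

angles = scale_for_angle([[1, 2, 3], [3, 2, 1]])
amplitudes = normalize_for_amplitude([[3, 4, 0]])
```

Angle encoders need bounded angles, while amplitude encoders need a valid state vector, which is why a single shared scaler would not suit both.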
What QuPrep does NOT do
It does not train models, simulate circuits, run on quantum hardware, optimize variational parameters, or replace any existing framework.
Installation
pip install quprep
With optional framework exports:
pip install quprep[qiskit] # Qiskit QuantumCircuit
pip install quprep[pennylane] # PennyLane QNode
pip install quprep[cirq] # Cirq Circuit
pip install quprep[tket] # TKET/pytket Circuit
pip install quprep[viz] # matplotlib circuit diagrams
pip install quprep[all] # everything above
Requirements: Python ≥ 3.10. Core dependencies: numpy, scipy, pandas, scikit-learn.
Quickstart
One-liner
import quprep as qd
result = qd.prepare("data.csv", encoding="angle", framework="qasm")
print(result.circuit)
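For intuition about what an angle-encoded OpenQASM 3.0 circuit can look like, the following hand-rolled sketch builds one without QuPrep. The helper `angle_encode_qasm` and its `lo`/`hi` parameters are hypothetical, and the exact text QuPrep emits may differ.

```python
import math

def angle_encode_qasm(sample, lo, hi):
    """Return an OpenQASM 3.0 string that loads one sample with Ry rotations.

    Each feature is min-max scaled from [lo, hi] to [0, pi] and applied as
    ry(theta) on its own qubit, so the circuit uses len(sample) qubits.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    lines = [
        "OPENQASM 3.0;",
        'include "stdgates.inc";',
        f"qubit[{len(sample)}] q;",
    ]
    for i, x in enumerate(sample):
        theta = (x - lo) / (hi - lo) * math.pi
        lines.append(f"ry({theta:.6f}) q[{i}];")
    return "\n".join(lines)

qasm_text = angle_encode_qasm([0.0, 0.5, 1.0], lo=0.0, hi=1.0)
print(qasm_text)
```

One rotation per qubit and no entangling gates is what gives angle encoding its constant depth.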
Encoding recommendation
import quprep as qd
rec = qd.recommend("data.csv", task="classification", qubits=8)
print(rec) # ranked table with reasoning
result = rec.apply("data.csv")
Pipeline API
import quprep as qd # all public classes on the top-level namespace
pipeline = qd.Pipeline(
    reducer=qd.PCAReducer(n_components=8),
    encoder=qd.IQPEncoder(reps=2),
    exporter=qd.PennyLaneExporter(),  # pip install quprep[pennylane]
)
result = pipeline.fit_transform("data.csv")
qnode = result.circuit # callable qml.QNode
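The reducer step in the pipeline above shrinks the feature count to match a qubit budget. As a rough illustration of the idea, here is plain PCA via NumPy's SVD. This is an assumption about what a PCA reducer does in general, not a description of `PCAReducer` internals, and `pca_reduce` is a hypothetical name.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project centred data onto its top principal components."""
    X = np.asarray(X, dtype=float)
    if not 1 <= n_components <= min(X.shape):
        raise ValueError("n_components must be between 1 and min(n_samples, n_features)")
    X_centred = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centred, full_matrices=False)
    return X_centred @ Vt[:n_components].T

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 6))
Z_demo = pca_reduce(X_demo, n_components=3)  # 6 features -> 3 features
```

With one feature per qubit, as in angle encoding, reducing to three components means three qubits instead of six.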
Circuit visualization
import quprep as qd
# ASCII — no dependencies
print(qd.draw_ascii(result.encoded[0]))
# matplotlib — pip install quprep[viz]
qd.draw_matplotlib(result.encoded[0], filename="circuit.png")
QUBO / combinatorial optimization
from quprep.qubo import max_cut, knapsack, solve_brute, solve_sa, qaoa_circuit
import numpy as np
# Max-Cut on a weighted graph
adj = np.array([[0,1,1],[1,0,1],[1,1,0]], dtype=float)
q = max_cut(adj)
print(q.evaluate(np.array([0., 1., 1.]))) # -2.0
# Brute-force (n ≤ 20) or simulated annealing (any n)
sol = solve_brute(q) # exact
sol = solve_sa(q, seed=42) # heuristic, scales to n ~ 500+
# Generate a QAOA circuit
qasm = qaoa_circuit(q, p=2)
# D-Wave Ocean SDK export
bqm_dict = q.to_dwave() # {(i, j): coeff}
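The `-2.0` in the example above is consistent with the textbook Max-Cut QUBO, where the energy of an assignment is minus the weight of the edges it cuts. The sketch below builds that matrix by hand and checks it by brute force. It assumes this sign convention and an upper-triangular layout, and the helper names are hypothetical; QuPrep's internal representation may differ.

```python
import itertools
import numpy as np

def max_cut_qubo(adj):
    """Upper-triangular Q with x^T Q x = -(total weight of edges cut by x)."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    Q = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w = adj[i, j]
            # Edge (i, j) contributes -w * (x_i + x_j - 2 * x_i * x_j).
            Q[i, i] -= w
            Q[j, j] -= w
            Q[i, j] += 2 * w
    return Q

def energy(Q, x):
    x = np.asarray(x, dtype=float)
    return float(x @ Q @ x)

def brute_force(Q):
    """Enumerate all 2^n binary assignments and return the lowest-energy one."""
    n = Q.shape[0]
    best = min(itertools.product([0, 1], repeat=n), key=lambda x: energy(Q, x))
    return best, energy(Q, best)

adj_demo = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
Q_demo = max_cut_qubo(adj_demo)
e_demo = energy(Q_demo, [0, 1, 1])
best_x, best_e = brute_force(Q_demo)
```

Enumeration visits 2^n assignments, which is why exact solving is only practical for small n.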
Qubit suggestion
import quprep as qd
s = qd.suggest_qubits("data.csv", task="classification")
print(s.n_qubits) # recommended qubit count
print(s.encoding_hint) # e.g. "angle"
print(s.warning) # set if dataset exceeds NISQ ceiling
Data drift detection
import quprep as qd
det = qd.DriftDetector()
pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), drift_detector=det)
pipeline.fit(X_train)
result = pipeline.transform(X_new)
print(result.drift_report.overall_drift) # True / False
print(result.drift_report.drifted_features) # list of feature names
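The README does not say which statistic `DriftDetector` uses. As a deliberately simple stand-in, the sketch below flags a feature when the mean of the new batch lies more than `threshold` standard errors from the training mean. The function name, the threshold, and the returned dictionary (which reports column indices rather than feature names) are all assumptions made for illustration.

```python
import numpy as np

def mean_shift_drift(X_train, X_new, threshold=3.0):
    """Flag columns whose new-batch mean is far from the training mean."""
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0, ddof=1)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant columns
    std_err = sigma / np.sqrt(X_new.shape[0])
    z = np.abs(X_new.mean(axis=0) - mu) / std_err
    drifted = [int(i) for i in np.flatnonzero(z > threshold)]
    return {"overall_drift": bool(drifted), "drifted_features": drifted}

train_demo = np.column_stack([np.arange(5.0), np.arange(10.0, 15.0)])
new_demo = np.column_stack([np.arange(5.0), np.arange(15.0, 20.0)])  # column 1 shifted by +5
report_demo = mean_shift_drift(train_demo, new_demo)
```

A mean-shift check misses changes in spread or shape, so a distribution-level test is usually the better choice in practice.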
Pipeline save / load
import quprep as qd
pipeline = qd.Pipeline(reducer=qd.PCAReducer(n_components=8), encoder=qd.AngleEncoder())
pipeline.fit(X_train)
pipeline.save("pipeline.pkl")
loaded = qd.Pipeline.load("pipeline.pkl")
result = loaded.transform(X_new) # no re-fitting needed
Validation & cost estimation
import quprep as qd
# Define expected schema and attach to pipeline
schema = qd.DataSchema([
    qd.FeatureSpec("age", dtype="continuous", min_value=0, max_value=120),
    qd.FeatureSpec("income", dtype="continuous", min_value=0),
])
pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), schema=schema)
result = pipeline.fit_transform("data.csv")
# Cost estimate is computed automatically at fit time
print(result.cost.nisq_safe) # True
print(result.cost.circuit_depth)
result.summary() # audit table + cost breakdown
CLI
quprep convert data.csv --encoding angle --framework qasm
quprep convert data.csv --encoding iqp --framework pennylane
quprep convert data.csv --encoding angle --save-dir circuits/ # save each sample as a file
quprep recommend data.csv --task classification --qubits 8
quprep suggest data.csv --task classification # qubit budget recommendation
quprep compare data.csv --task classification # side-by-side encoder comparison
quprep validate data.csv # shape, columns, NaN report
quprep validate data.csv --infer-schema schema.json # infer schema and save
quprep validate data.csv --schema schema.json # enforce schema (exit 1 on violation)
quprep qubo maxcut --adjacency "0,1,1;1,0,1;1,1,0" --solve
quprep qubo knapsack --weights "2,3,4" --values "3,4,5" --capacity 5
quprep qubo qaoa maxcut --adjacency "0,1,1;1,0,1;1,1,0" --p 2 --output circuit.qasm
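The `--adjacency` strings above separate rows with semicolons and entries with commas. A small parser along these lines shows how such a string maps to a matrix. The function `parse_adjacency` is hypothetical, and the square and symmetric checks are assumptions suited to undirected Max-Cut graphs, not documented CLI behaviour.

```python
def parse_adjacency(text):
    """Parse "0,1,1;1,0,1;1,1,0" into a list of float rows."""
    rows = [[float(v) for v in row.split(",")] for row in text.strip().split(";")]
    n = len(rows)
    if any(len(r) != n for r in rows):
        raise ValueError("adjacency matrix must be square")
    for i in range(n):
        for j in range(i + 1, n):
            if rows[i][j] != rows[j][i]:
                raise ValueError("adjacency matrix must be symmetric")
    return rows

triangle = parse_adjacency("0,1,1;1,0,1;1,1,0")
```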
Supported encodings
| Encoding | Qubits | Depth | NISQ-safe | Best for |
|---|---|---|---|---|
| Angle (Ry/Rx/Rz) | n = d | O(1) | ✅ Excellent | Most QML tasks |
| Amplitude | ⌈log₂ d⌉ | O(2ⁿ) | ❌ Poor | Qubit-limited scenarios |
| Basis | n = d | O(1) | ✅ Excellent | Binary features / QAOA |
| Entangled Angle | n = d | O(d · layers) | ✅ Good | Feature correlations |
| IQP | n = d | O(d² · reps) | ⚠️ Medium | Kernel methods |
| Re-uploading | n = d | O(d · layers) | ✅ Good | High-expressivity QNNs |
| Hamiltonian | n = d | O(d · steps) | ⚠️ Medium | Physics simulation / VQE |
| ZZ Feature Map | n = d | O(d² · reps) | ⚠️ Medium | Quantum kernel methods |
| Pauli Feature Map | n = d | O(d² · reps) | ⚠️ Medium | Configurable kernel methods |
| Random Fourier | n_components | O(1) | ✅ Excellent | RBF kernel approximation |
| Tensor Product | ⌈d/2⌉ | O(1) | ✅ Excellent | Qubit-efficient encoding |
| QAOA Problem | n = d | O(p) | ✅ Good | QAOA warm-start, problem-inspired maps |
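The Qubits column can be turned into a quick budget check. The helper below encodes three of the formulas from the table for a dataset with `d` features. The function name and the string keys are hypothetical labels, not QuPrep identifiers, and the floor of one qubit for amplitude encoding is an added assumption for the `d = 1` edge case.

```python
def qubits_required(d, encoding):
    """Qubit count for d features, following the table's formulas."""
    if d < 1:
        raise ValueError("d must be at least 1")
    if encoding in ("angle", "basis"):
        return d                              # n = d
    if encoding == "amplitude":
        return max(1, (d - 1).bit_length())   # ceil(log2(d)), at least one qubit
    if encoding == "tensor_product":
        return (d + 1) // 2                   # ceil(d / 2)
    raise ValueError(f"no formula recorded for {encoding!r}")

for name in ("angle", "amplitude", "tensor_product"):
    print(name, qubits_required(16, name))
```

For 16 features this gives 16, 4, and 8 qubits respectively, which shows why amplitude encoding suits qubit-limited scenarios despite its deeper circuits.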
Supported export frameworks
| Framework | Install | Output |
|---|---|---|
| OpenQASM 3.0 | (included) | str |
| Qiskit | pip install quprep[qiskit] | QuantumCircuit |
| PennyLane | pip install quprep[pennylane] | qml.QNode |
| Cirq | pip install quprep[cirq] | cirq.Circuit |
| TKET | pip install quprep[tket] | pytket.Circuit |
| Amazon Braket | pip install quprep[braket] | braket.Circuit |
| Q# | pip install quprep[qsharp] | Q# operation string |
| IQM | pip install quprep[iqm] | IQM circuit JSON |
Documentation
Full documentation at docs.quprep.org
Examples
See the examples/ directory for notebooks.
Contributing
Contributions are welcome. Please read CONTRIBUTING.md before opening a pull request.
- Open an issue for bugs or feature requests
- Start a discussion for questions or ideas
License
Apache 2.0 — see LICENSE.
Citation
If you use QuPrep in your research, please cite:
@software{quprep2026,
  author    = {Perera, Hasarindu},
  title     = {QuPrep: Quantum Data Preparation},
  year      = {2026},
  publisher = {Zenodo},
  version   = {0.5.0},
  doi       = {10.5281/zenodo.19286258},
  url       = {https://doi.org/10.5281/zenodo.19286258},
  license   = {Apache-2.0},
}