reparatio — Python SDK
Alpha software. The API surface may change without notice between versions. Pin to a specific version in production.
Python client library for the Reparatio data conversion API.
Inspect, convert, merge, append, and query CSV, Excel, Parquet, JSON, GeoJSON, SQLite, and 30+ other formats with a single function call.
See also: reparatio-cli (command-line tool) · reparatio-mcp (MCP server for AI assistants)
Installation
```
pip install reparatio
```
Requires Python 3.9 or later.
Quick start
```python
from reparatio import Reparatio

client = Reparatio(api_key="rp_YOUR_KEY")

# Inspect a file
result = client.inspect("sales.csv")
print(f"{result.rows} rows, {len(result.columns)} columns")
for col in result.columns:
    print(f"  {col.name}: {col.dtype}")

# Convert to Parquet
out = client.convert("sales.csv", "parquet")
with open(out.filename, "wb") as f:
    f.write(out.content)

# SQL query
out = client.query("events.parquet", "SELECT region, SUM(revenue) FROM data GROUP BY region ORDER BY 2 DESC")
with open("by_region.csv", "wb") as f:
    f.write(out.content)
```
Authentication
The API key can be supplied in three ways, in order of precedence:

- Passed directly: `Reparatio(api_key="rp_...")`
- Environment variable: `REPARATIO_API_KEY=rp_...`
- Omitted entirely (only `inspect` and `formats` work without a key)
Get a key at reparatio.app (Professional plan — $79/mo). API access requires the Professional plan; the Standard plan ($29/mo) covers web UI only.
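The precedence rule can be sketched in plain Python. `resolve_key` below is our own illustrative helper, not part of the SDK: an explicit argument wins over the environment variable, and `None` means keyless mode:

```python
import os

def resolve_key(api_key=None):
    """Explicit argument first, then $REPARATIO_API_KEY, else None (keyless)."""
    return api_key or os.environ.get("REPARATIO_API_KEY")

os.environ["REPARATIO_API_KEY"] = "rp_from_env"
print(resolve_key())               # rp_from_env
print(resolve_key("rp_explicit"))  # rp_explicit
```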
Context manager
The client holds an HTTP connection pool. Use it as a context manager to ensure it is closed:
```python
with Reparatio(api_key="rp_...") as client:
    result = client.inspect("data.csv")
```
Or call client.close() explicitly when done.
Reference
Reparatio(api_key, base_url, timeout)
| Parameter | Default | Description |
|---|---|---|
| `api_key` | `$REPARATIO_API_KEY` | Your `rp_...` API key |
| `base_url` | `https://reparatio.app` | Override the API host |
| `timeout` | `120.0` | Request timeout in seconds |
client.formats() → FormatsResult
Return the list of supported input and output formats. No API key required.
```python
f = client.formats()
print(f.input)   # ["csv", "tsv", "xlsx", ...]
print(f.output)  # ["csv", "parquet", "json.gz", ...]
```
client.me() → MeResult
Return subscription details for the current API key.
```python
me = client.me()
print(me.email, me.plan, me.active)
```

MeResult fields: `email`, `plan`, `active`, `api_access`, `expires_at`
client.inspect(file, ...) → InspectResult
Detect encoding, count rows, list column types and statistics, and return a data preview. No API key required.
```python
result = client.inspect(
    "data.csv",
    preview_rows=20,
    fix_encoding=True,
)
print(result.rows, result.detected_encoding)
for col in result.columns:
    print(col.name, col.dtype, col.null_count, col.unique_count)
for row in result.preview:
    print(row)
```
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `file` | str / Path / bytes | required | File path or raw bytes |
| `filename` | str | `"file"` | Original filename (required when passing bytes) |
| `no_header` | bool | `False` | Treat first row as data |
| `fix_encoding` | bool | `True` | Auto-detect and repair encoding |
| `preview_rows` | int | `8` | Number of preview rows (1–100) |
| `delimiter` | str | `""` | Custom delimiter (auto-detected if blank) |
| `sheet` | str | `""` | Sheet name for Excel, ODS, or SQLite |
InspectResult fields: `filename`, `detected_encoding`, `rows`, `columns` (list of ColumnInfo), `preview`, `sheets`
ColumnInfo fields: `name`, `dtype`, `null_count`, `unique_count`
client.convert(file, target_format, ...) → ConvertResult
Convert a file from any supported input format to any supported output format. Requires a Professional plan key ($79/mo).
```python
from pathlib import Path

# Basic conversion
out = client.convert("sales.csv", "parquet")
Path(out.filename).write_bytes(out.content)

# Select and rename columns, compress output
out = client.convert(
    "big.csv",
    "csv.gz",
    select_columns=["date", "region", "revenue"],
    columns=["Date", "Region", "Revenue"],
)

# Deduplicate and take a 10% sample
out = client.convert("events.csv", "xlsx", deduplicate=True, sample_frac=0.1)
```
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `file` | str / Path / bytes | required | File path or raw bytes |
| `target_format` | str | required | Output format (see formats) |
| `filename` | str | `"file"` | Original filename (required when passing bytes) |
| `no_header` | bool | `False` | Treat first row as data |
| `fix_encoding` | bool | `True` | Repair encoding |
| `delimiter` | str | `""` | Custom delimiter for CSV-like input |
| `sheet` | str | `""` | Sheet or table to read |
| `columns` | list[str] | `[]` | Rename all columns (new names in order) |
| `select_columns` | list[str] | `[]` | Columns to include (all if empty) |
| `deduplicate` | bool | `False` | Remove duplicate rows |
| `sample_n` | int | `0` | Random sample of N rows |
| `sample_frac` | float | `0.0` | Random sample fraction (e.g. 0.1 for 10%) |
| `geometry_column` | str | `"geometry"` | WKT geometry column for GeoJSON output |
| `cast_columns` | dict | `{}` | Override inferred column types (see below) |
| `null_values` | list[str] | `[]` | Strings to treat as null at load time, e.g. `["N/A", "NULL", "-"]` |
| `encoding_override` | str | `None` | Force a specific encoding, bypassing auto-detection. Any Python codec name: `"cp037"` (EBCDIC US), `"cp500"` (EBCDIC International), `"cp1026"` (EBCDIC Turkish), `"cp1140"` (EBCDIC US+Euro), `"latin-1"`, etc. |
cast_columns format:
```python
# Cast price to Float64 and parse date from "31/12/2024" format
out = client.convert(
    "sales.csv",
    "parquet",
    cast_columns={
        "price": {"type": "Float64"},
        "date": {"type": "Date", "format": "%d/%m/%Y"},
    },
)
```
Supported types: String, Int8–Int64, UInt8–UInt64, Float32, Float64,
Boolean, Date, Datetime, Time. Values that cannot be cast are silently set to null.
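The `format` string appears to use standard strftime/strptime directives (as the `%d/%m/%Y` example suggests); you can sanity-check a format locally with the stdlib before sending it:

```python
from datetime import datetime

# "%d/%m/%Y" parses day-first dates such as "31/12/2024"
parsed = datetime.strptime("31/12/2024", "%d/%m/%Y").date()
print(parsed.isoformat())  # 2024-12-31
```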
EBCDIC and encoding override:
```python
# Convert an EBCDIC mainframe file (cp037 = EBCDIC US)
out = client.convert("mainframe.dat", "csv", encoding_override="cp037")
Path(out.filename).write_bytes(out.content)

# EBCDIC International (cp500)
out = client.convert("ibm_export.dat", "parquet", encoding_override="cp500")
Path(out.filename).write_bytes(out.content)
```
When encoding_override is set, chardet auto-detection is skipped entirely and the
specified codec is used directly. Omit the parameter (or pass None) to use the
default auto-detection behaviour, which also includes an EBCDIC heuristic.
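To see why the override matters, round-trip some text through Python's built-in `cp037` codec locally (stdlib only, no API call). EBCDIC byte values share nothing with ASCII, so decoding with the wrong codec yields plausible-looking garbage rather than an error:

```python
# "HELLO,WORLD" encoded in EBCDIC US (cp037)
ebcdic = "HELLO,WORLD".encode("cp037")
print(ebcdic.hex())            # c8c5d3d3d66be6d6d9d3c4
print(ebcdic.decode("cp037"))  # HELLO,WORLD
# The wrong codec decodes "successfully" into nonsense:
print(ebcdic.decode("latin-1"))
```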
client.batch_convert(zip_file, target_format, ...) → ConvertResult
Convert every file inside a ZIP archive to a common format.
Returns a ZIP archive in result.content. Files that fail to parse are skipped;
their names and errors are available as a JSON string in result.warning.
Requires a Professional plan key ($79/mo).
```python
import json
from pathlib import Path

out = client.batch_convert("monthly_reports.zip", "parquet")
Path("converted.zip").write_bytes(out.content)

if out.warning:
    for err in json.loads(out.warning):
        print(f"Skipped {err['file']}: {err['error']}")
```
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `zip_file` | str / Path / bytes | required | ZIP archive path or raw bytes |
| `target_format` | str | required | Output format for every file |
| `filename` | str | `"batch.zip"` | Original filename (when passing bytes) |
| `no_header` | bool | `False` | Treat first row as data |
| `fix_encoding` | bool | `True` | Repair encoding |
| `delimiter` | str | `""` | Custom delimiter |
| `select_columns` | list[str] | `[]` | Columns to include (all if empty) |
| `deduplicate` | bool | `False` | Remove duplicate rows from each file |
| `sample_n` | int | `0` | Random sample of N rows per file |
| `sample_frac` | float | `0.0` | Random sample fraction per file |
| `cast_columns` | dict | `{}` | Column type overrides for every file |
client.merge(file1, file2, operation, target_format, ...) → ConvertResult
Merge or join two files. Requires a Professional plan key ($79/mo).
```python
out = client.merge(
    "orders.csv",
    "customers.xlsx",
    operation="left",
    target_format="parquet",
    join_on="customer_id",
)
Path(out.filename).write_bytes(out.content)

if out.warning:
    print("Warning:", out.warning)
```
Operations:
| Value | Behaviour |
|---|---|
| `append` | Stack all rows from both files; missing columns filled with null |
| `left` | All rows from file 1; matching columns from file 2 |
| `right` | All rows from file 2; matching columns from file 1 |
| `outer` | All rows from both files; nulls where no match |
| `inner` | Only rows present in both files |
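As a mental model only (not the server implementation), `left` behaves like this pure-Python sketch; the real API fills unmatched columns with null, while the sketch simply leaves them out:

```python
def left_join(rows1, rows2, key):
    """All rows from rows1, plus matching columns from rows2."""
    index = {row[key]: row for row in rows2}
    return [{**row, **index.get(row[key], {})} for row in rows1]

orders = [{"customer_id": 1, "total": 50}, {"customer_id": 2, "total": 75}]
customers = [{"customer_id": 1, "name": "Ada"}]
print(left_join(orders, customers, "customer_id"))
# [{'customer_id': 1, 'total': 50, 'name': 'Ada'}, {'customer_id': 2, 'total': 75}]
```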
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `file1` | str / Path / bytes | required | First file |
| `file2` | str / Path / bytes | required | Second file |
| `operation` | str | required | Join type (see table above) |
| `target_format` | str | required | Output format |
| `filename1` | str | `"file1"` | Original name of file1 (when passing bytes) |
| `filename2` | str | `"file2"` | Original name of file2 (when passing bytes) |
| `join_on` | str | `""` | Comma-separated column(s) to join on |
| `no_header` | bool | `False` | Treat first row as data |
| `fix_encoding` | bool | `True` | Repair encoding |
| `geometry_column` | str | `"geometry"` | WKT geometry column for GeoJSON output |
client.append(files, target_format, ...) → ConvertResult
Stack rows from two or more files vertically. Column mismatches are handled gracefully — missing values are filled with null. Requires a Professional plan key ($79/mo).
```python
import glob

paths = sorted(glob.glob("monthly/*.csv"))
out = client.append(paths, "parquet")
Path("all_months.parquet").write_bytes(out.content)
```
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `files` | list[str / Path / bytes] | required | File paths or bytes (minimum 2) |
| `target_format` | str | required | Output format |
| `filenames` | list[str] | auto | Original filenames (when passing bytes) |
| `no_header` | bool | `False` | Treat first row as data |
| `fix_encoding` | bool | `True` | Repair encoding |
client.query(file, sql, ...) → ConvertResult
Run a SQL query against a file.
The file is loaded as a table named `data`.
Requires a Professional plan key ($79/mo).
```python
out = client.query(
    "events.parquet",
    "SELECT region, SUM(revenue) AS total FROM data WHERE year = 2025 GROUP BY region ORDER BY total DESC",
    target_format="json",
)
print(out.content.decode())
```
Supports SELECT, WHERE, GROUP BY, ORDER BY, LIMIT, aggregations, and most scalar functions.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `file` | str / Path / bytes | required | File path or raw bytes |
| `sql` | str | required | SQL query (table name: `data`) |
| `filename` | str | `"file"` | Original filename (required when passing bytes) |
| `target_format` | str | `"csv"` | Output format |
| `no_header` | bool | `False` | Treat first row as data |
| `fix_encoding` | bool | `True` | Repair encoding |
| `delimiter` | str | `""` | Custom delimiter for CSV-like input |
| `sheet` | str | `""` | Sheet or table to read |
ConvertResult
Returned by convert(), merge(), append(), and query().
| Field | Type | Description |
|---|---|---|
| `content` | bytes | Raw file content |
| `filename` | str | Suggested output filename |
| `warning` | str or None | Server warning (e.g. column mismatch) |
Write the file to disk:
```python
out = client.convert("data.csv", "parquet")
Path(out.filename).write_bytes(out.content)
```
Supported formats
Input
CSV, TSV, CSV.GZ, CSV.BZ2, CSV.ZST, CSV.ZIP, TSV.GZ, TSV.BZ2, TSV.ZST, TSV.ZIP, GZ (any supported format), ZIP (any supported format), BZ2 (any supported format), ZST (any supported format), Excel (.xlsx / .xls), ODS, JSON, JSON.GZ, JSON.BZ2, JSON.ZST, JSON.ZIP, JSON Lines, GeoJSON, Parquet, Feather, Arrow, ORC, Avro, SQLite, YAML, BSON, SRT, VTT, HTML, Markdown, XML, SQL dump, PDF (text layer)
Output
CSV, TSV, CSV.GZ, CSV.BZ2, CSV.ZST, CSV.ZIP, TSV.GZ, TSV.BZ2, TSV.ZST, TSV.ZIP, Excel (.xlsx), ODS, JSON, JSON.GZ, JSON.BZ2, JSON.ZST, JSON.ZIP, JSON Lines, JSON Lines.GZ, JSON Lines.BZ2, JSON Lines.ZST, JSON Lines.ZIP, GeoJSON, GeoJSON.GZ, GeoJSON.BZ2, GeoJSON.ZST, GeoJSON.ZIP, Parquet, Feather, Arrow, ORC, Avro, SQLite, YAML, BSON, SRT, VTT
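One recurring chore is deriving the `target_format` string from a desired output path. A small helper sketch; `format_from_path` is our own convention, assuming format names match the compound extensions listed above:

```python
COMPRESSION_SUFFIXES = {"gz", "bz2", "zst", "zip"}

def format_from_path(path: str) -> str:
    """Map 'out/data.csv.gz' -> 'csv.gz' and 'report.parquet' -> 'parquet'."""
    parts = path.rsplit("/", 1)[-1].lower().split(".")
    if len(parts) >= 3 and parts[-1] in COMPRESSION_SUFFIXES:
        # Compression suffixes stack on a base format, e.g. "json.zst"
        return ".".join(parts[-2:])
    return parts[-1]

print(format_from_path("out/data.csv.gz"))  # csv.gz
print(format_from_path("report.parquet"))   # parquet
```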
Error handling
All errors are subclasses of reparatio.ReparatioError:
| Exception | Cause |
|---|---|
| `AuthenticationError` | Missing, invalid, or expired API key |
| `InsufficientPlanError` | Operation requires a Professional plan |
| `FileTooLargeError` | File exceeds the server's size limit |
| `ParseError` | File could not be parsed in the detected format |
| `APIError` | Unexpected server error (has `.status_code` and `.detail`) |
```python
from reparatio import Reparatio, AuthenticationError, ParseError

try:
    out = client.convert("bad.csv", "parquet")
except AuthenticationError:
    print("Check your API key")
except ParseError as e:
    print(f"Could not read file: {e}")
```
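Transient `APIError`s (the exception class for unexpected server failures) are natural retry candidates. A generic backoff helper sketch; the retry policy here is our own choice, not SDK behaviour:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, retryable=(Exception,)):
    """Call fn(); on a retryable error, back off exponentially and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)

# Hypothetical usage against the SDK:
# out = with_retries(lambda: client.convert("data.csv", "parquet"),
#                    retryable=(APIError,))
```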
Running the Examples
The repository includes 15 runnable examples covering every API method.
```shell
# clone and install
git clone https://github.com/jfrancis42/reparatio-sdk-py
cd reparatio-sdk-py
pip install -e .

# run all examples
REPARATIO_API_KEY=EXAMPLE-EXAMPLE-EXAMPLE \
    python examples/examples.py

# run a single example
python -c "from examples.examples import ex_formats; ex_formats()"
```
Set REPARATIO_API_KEY to your API key before running. Examples that require authentication (all except ex_formats) will fail without a valid key.
Privacy
Files are sent to the Reparatio API at reparatio.app for processing.
Files are handled in memory and never stored — see the Privacy Policy.