wrang: lightning-fast data wrangling toolkit for the terminal
wrang is a terminal-native data analysis toolkit. Load, inspect, clean, transform, and export datasets without writing a single line of boilerplate. Use it interactively, script it as a Python library, or wire it into CI pipelines.
Install
pip install wrang
Optional extras:
pip install wrang[viz] # matplotlib + seaborn
pip install wrang[advanced] # duckdb + connectorx
pip install wrang[full] # everything
Requires Python 3.10+.
Migrating from ride-cli?
Both the `wrang` and `ride` commands are installed; they are identical aliases, so your existing `ride ...` scripts continue to work unchanged.
Quick start
# Interactive mode
wrang
# Load a file directly
wrang data.csv
# Quick inspect (CI-friendly JSON output)
wrang data.csv --inspect --output-format json
# SQL query via DuckDB
wrang data.csv --sql "SELECT dept, AVG(salary) FROM data GROUP BY dept"
# Generate self-contained HTML profile report
wrang data.csv --profile
# Compare two datasets
wrang --compare before.csv after.csv
# Stream large files in chunks
wrang large.csv --chunk-size 50000
# Export to Parquet
wrang data.csv --export clean.parquet
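The `--chunk-size` flag streams a large file in fixed-size batches instead of loading it whole. A minimal sketch of that idea in plain Python, using only the stdlib `csv` module (this is an illustration, not wrang's implementation):

```python
import csv
import io

def iter_chunks(fileobj, chunk_size):
    """Yield (header, rows) pairs with at most chunk_size rows each."""
    reader = csv.reader(fileobj)
    header = next(reader)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield header, chunk
            chunk = []
    if chunk:
        yield header, chunk

data = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(5))
chunks = list(iter_chunks(io.StringIO(data), 2))
# 5 data rows in chunks of 2 -> chunk sizes 2, 2, 1
```

Each chunk can be processed and discarded before the next is read, so memory stays bounded by the chunk size.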
Interactive menu
wrang
The full interactive session gives you a menu-driven workflow:
| Option | Action |
|---|---|
| 1 | Load dataset (CSV / Excel / Parquet / JSON) |
| 2 | Inspect — shape, types, missing values, quality report |
| 3 | Explore — correlations, distributions, outliers, plots |
| 4 | Clean — impute, deduplicate, handle outliers, fix types |
| 5 | Transform — encode, scale, polynomial features, binning |
| 6 | Visualize — terminal histograms, scatter, heatmap |
| 7 | Export — save to any supported format |
| 8 | Settings — configure wrang preferences |
| 9 | SQL Query — run DuckDB SQL against the current dataset |
| 10 | HTML Profile — generate a full standalone HTML report |
| 11 | Validate — check data against a JSON/YAML schema |
| $ | Quick export — save current dataset instantly |
| q | Exit |
Python API
wrang is also a full Python library. Every module is independently importable.
Load & save
from wrang import FastDataLoader, DataSaver
loader = FastDataLoader()
df = loader.load("sales.csv") # auto-detects format
df_lazy = loader.scan_lazy("big.parquet") # lazy frame for large files
saver = DataSaver()
saver.save(df, "output.parquet")
Inspect
from wrang import DataInspector
inspector = DataInspector(df)
info = inspector.get_basic_info()
print(info["n_rows"], info["missing_values_total"])
inspector.display_overview() # rich terminal output
inspector.display_data_quality()
Explore
from wrang import DataExplorer
explorer = DataExplorer(df)
corr = explorer.analyze_correlations(method="pearson")
outliers = explorer.detect_outliers(method="iqr")
normality = explorer.test_normality()
explorer.plot_histogram("age")
explorer.plot_scatter("age", "salary")
explorer.plot_correlation_heatmap()
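The `method="iqr"` outlier rule flags values outside `[Q1 - k*IQR, Q3 + k*IQR]`, with `k = 1.5` by default. A minimal pure-Python sketch of that rule (illustrative only, not wrang's code):

```python
import statistics

def iqr_outliers(values, factor=1.5):
    """Flag values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < lo or v > hi]

salaries = [48, 50, 52, 51, 49, 53, 400]  # 400 is an obvious outlier
print(iqr_outliers(salaries))  # [400]
```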
Clean
from wrang import DataCleaner
from wrang.config import ImputationStrategy
cleaned = (
DataCleaner(df)
.handle_missing_values(ImputationStrategy.MEDIAN, columns=["age", "salary"])
.handle_missing_values(ImputationStrategy.MODE, columns=["dept"])
.remove_duplicates()
.handle_outliers(method="iqr", action="remove")
.get_cleaned_data()
)
Supported imputation strategies: DROP, MEAN, MEDIAN, MODE, FORWARD_FILL, BACKWARD_FILL, CUSTOM_VALUE, DISTRIBUTION, KNN.
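To make the strategies concrete, here is what MEDIAN and MODE imputation do to a column, sketched in pure Python (an illustration of the concept, independent of wrang's internals):

```python
import statistics

def impute(values, strategy):
    """Replace None with the column's median or mode (sketch of two strategies)."""
    present = [v for v in values if v is not None]
    if strategy == "median":
        fill = statistics.median(present)
    elif strategy == "mode":
        fill = statistics.mode(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

print(impute([30, None, 40, 50], "median"))        # fills the gap with 40
print(impute(["eng", None, "eng", "hr"], "mode"))  # fills the gap with "eng"
```

MEDIAN suits skewed numeric columns like `salary`; MODE is the natural choice for categoricals like `dept`, which is why the chained example above applies them to different column sets.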
Transform
from wrang import DataTransformer, create_pipeline
from wrang.config import EncodingMethod, ScalingMethod
result = (
DataTransformer(df)
.encode_categorical_features(method=EncodingMethod.ONEHOT, columns=["dept"])
.scale_features(method=ScalingMethod.STANDARD, columns=["age", "salary"])
.get_transformed_data()
)
# Or use the pipeline builder
result = (
create_pipeline(df)
.encode_categorical_features(method=EncodingMethod.LABEL)
.scale_features(method=ScalingMethod.ROBUST)
.create_polynomial_features(degree=2)
.get_transformed_data()
)
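Under the hood, standard scaling and one-hot encoding amount to simple column transforms. A self-contained sketch of both (illustrative pure Python, not wrang's implementation):

```python
import statistics

def standard_scale(values):
    """(x - mean) / stdev, i.e. StandardScaler-style z-scores."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def one_hot(values):
    """Map each category to a 0/1 indicator vector, one slot per category."""
    cats = sorted(set(values))
    return [[1 if v == c else 0 for c in cats] for v in values]

print(standard_scale([1, 2, 3]))   # [-1.0, 0.0, 1.0]
print(one_hot(["hr", "eng", "hr"]))
```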
Validate
from wrang import DataSchema, ColumnSchema, DataValidator
schema = DataSchema(columns=[
ColumnSchema(name="id", dtype="Int64", nullable=False, unique=True),
ColumnSchema(name="salary", dtype="Float64", nullable=False, min_value=0.0),
ColumnSchema(name="dept", dtype="String", allowed_values=["eng", "hr"]),
])
result = DataValidator(schema).validate(df)
print(result.passed) # True / False
for v in result.violations:
print(v.severity, v.message)
# Infer schema from data and save to file
from wrang import infer_schema
infer_schema(df).to_json("schema.json")
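The constraints a `ColumnSchema` expresses (`nullable`, `unique`, `min_value`) boil down to per-column checks. A minimal sketch of such a check in pure Python (`check_column` is a hypothetical helper for illustration, not part of wrang's API):

```python
def check_column(values, nullable=True, unique=False, min_value=None):
    """Return a list of human-readable violations for one column (sketch)."""
    violations = []
    if not nullable and any(v is None for v in values):
        violations.append("null values present")
    non_null = [v for v in values if v is not None]
    if unique and len(set(non_null)) != len(non_null):
        violations.append("duplicate values present")
    if min_value is not None and any(v < min_value for v in non_null):
        violations.append(f"values below {min_value}")
    return violations

print(check_column([1, 2, 2, None], nullable=False, unique=True, min_value=0))
```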
Configuration
from wrang.config import get_config, update_config, reset_config
config = get_config()
print(config.outlier_factor) # 1.5
update_config(outlier_factor=2.0, chunk_size=5000)
reset_config()
User config is persisted at ~/.wrang/config.json.
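Persisting overrides to a JSON file can be sketched as follows; the `DEFAULTS` values here are illustrative (only `outlier_factor = 1.5` is documented above), and a temp directory stands in for `~/.wrang`:

```python
import json
import tempfile
from pathlib import Path

DEFAULTS = {"outlier_factor": 1.5, "chunk_size": 10000}  # chunk_size default assumed

def save_config(path, **overrides):
    """Merge overrides onto defaults and write them as JSON."""
    cfg = {**DEFAULTS, **overrides}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(cfg, indent=2))
    return cfg

with tempfile.TemporaryDirectory() as d:
    path = Path(d) / ".wrang" / "config.json"
    save_config(path, outlier_factor=2.0)
    loaded = json.loads(path.read_text())

print(loaded["outlier_factor"])  # 2.0
```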
Notebook usage
import polars as pl
from wrang import DataInspector, DataCleaner, DataExplorer
from wrang.config import ImputationStrategy
df = pl.read_csv("titanic.csv")
# Profile the data
DataInspector(df).display_overview()
# Clean
df_clean = (
DataCleaner(df)
.handle_missing_values(ImputationStrategy.MEDIAN)
.remove_duplicates()
.get_cleaned_data()
)
# Explore
explorer = DataExplorer(df_clean.select(["Age", "Fare", "Pclass"]))
explorer.plot_histogram("Age")
explorer.plot_scatter("Age", "Fare")
Supported file formats
| Format | Read | Write | Notes |
|---|---|---|---|
| CSV | ✓ | ✓ | Auto delimiter detection |
| Excel (.xlsx) | ✓ | ✓ | via openpyxl |
| Excel (.xls) | ✓ | — | via xlrd |
| Parquet | ✓ | ✓ | Columnar, fast |
| JSON / JSON Lines | ✓ | ✓ | Auto schema inference |
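Auto-detection, as in `loader.load("sales.csv")`, can be done by dispatching on the file extension. A hypothetical sketch mirroring the table above (the mapping and reader names are illustrative, not wrang's actual dispatch):

```python
from pathlib import Path

READERS = {  # hypothetical extension -> reader mapping
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".parquet": "read_parquet",
    ".json": "read_json",
    ".jsonl": "read_ndjson",
}

def detect_reader(path):
    """Pick a reader name from the file extension, case-insensitively."""
    suffix = Path(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported format: {suffix}") from None

print(detect_reader("sales.CSV"))  # read_csv
```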
Non-interactive CLI reference
wrang [FILE] [OPTIONS]
Options:
--inspect Print dataset overview and exit
--output-format {text,json} Output format (default: text)
--profile Generate standalone HTML report
--sql QUERY Run DuckDB SQL against FILE (table: "data")
--compare FILE_A FILE_B Diff two datasets
--chunk-size N Stream FILE in N-row chunks
--export PATH Export dataset to PATH
--format {csv,excel,parquet,json} Export format
--version Show version and exit
--help-topic {usage,examples,formats,config}
--debug / --verbose
Testing
pip install wrang[dev]
pytest tests/ -v
# 217 passed, 2 xfailed
Contributing
- Fork the repo
- Create a feature branch
- Run the test suite — all tests must pass
- Open a pull request
Bug reports and feature requests → GitHub Issues.
License
MIT — see LICENSE.
Project details
File details
Details for the file wrang-0.2.1.tar.gz.
File metadata
- Download URL: wrang-0.2.1.tar.gz
- Upload date:
- Size: 100.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 13cf159d1120afead63fa9f32118096606dec5eeb3e464a4318f4c3909dba3ee |
| MD5 | 86342e506f0d8654dd98cd050be5644f |
| BLAKE2b-256 | 0c08858b6373237e31f6cd3a3f41005acdc70ea945d9ddc25c5cf5ed52eadd7d |
Provenance
The following attestation bundles were made for wrang-0.2.1.tar.gz:
Publisher: publish.yml on sudhanshumukherjeexx/wrang

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: wrang-0.2.1.tar.gz
- Subject digest: 13cf159d1120afead63fa9f32118096606dec5eeb3e464a4318f4c3909dba3ee
- Sigstore transparency entry: 1282915841
- Sigstore integration time:
- Permalink: sudhanshumukherjeexx/wrang@c08c21b5f3fcb8facafaf69be595e1720f9e1a5f
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/sudhanshumukherjeexx
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c08c21b5f3fcb8facafaf69be595e1720f9e1a5f
- Trigger Event: push
File details
Details for the file wrang-0.2.1-py3-none-any.whl.
File metadata
- Download URL: wrang-0.2.1-py3-none-any.whl
- Upload date:
- Size: 94.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f3bafca5af3bf81e156e2570ddf1364c1be8db3c120be9166b557ebc7d0928e |
| MD5 | 82995ffe2294231b28b23dccdb3a6e3d |
| BLAKE2b-256 | 3224840bc67aa23e2e3951664e106278fe0a07a529091a5add27d9eca98844ee |
Provenance
The following attestation bundles were made for wrang-0.2.1-py3-none-any.whl:
Publisher: publish.yml on sudhanshumukherjeexx/wrang

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: wrang-0.2.1-py3-none-any.whl
- Subject digest: 0f3bafca5af3bf81e156e2570ddf1364c1be8db3c120be9166b557ebc7d0928e
- Sigstore transparency entry: 1282915845
- Sigstore integration time:
- Permalink: sudhanshumukherjeexx/wrang@c08c21b5f3fcb8facafaf69be595e1720f9e1a5f
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/sudhanshumukherjeexx
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c08c21b5f3fcb8facafaf69be595e1720f9e1a5f
- Trigger Event: push