wrang: lightning-fast data wrangling toolkit for the terminal


wrang is a terminal-native data analysis toolkit. Load, inspect, clean, transform, and export datasets without writing a single line of boilerplate. Use it interactively, script it as a Python library, or wire it into CI pipelines.


Install

pip install wrang

Optional extras:

pip install wrang[viz]       # matplotlib + seaborn
pip install wrang[advanced]  # duckdb + connectorx
pip install wrang[full]      # everything

Requires Python 3.10+.

Migrating from ride-cli? Both the wrang and ride commands are installed as identical aliases, so your existing ride ... scripts will continue to work unchanged.


Quick start

# Interactive mode
wrang

# Load a file directly
wrang data.csv

# Quick inspect (CI-friendly JSON output)
wrang data.csv --inspect --output-format json

# SQL query via DuckDB
wrang data.csv --sql "SELECT dept, AVG(salary) FROM data GROUP BY dept"

# Generate self-contained HTML profile report
wrang data.csv --profile

# Compare two datasets
wrang --compare before.csv after.csv

# Stream large files in chunks
wrang large.csv --chunk-size 50000

# Export to Parquet
wrang data.csv --export clean.parquet

Interactive menu

wrang

The full interactive session gives you a menu-driven workflow:

Option  Action
1       Load dataset (CSV / Excel / Parquet / JSON)
2       Inspect — shape, types, missing values, quality report
3       Explore — correlations, distributions, outliers, plots
4       Clean — impute, deduplicate, handle outliers, fix types
5       Transform — encode, scale, polynomial features, binning
6       Visualize — terminal histograms, scatter, heatmap
7       Export — save to any supported format
8       Settings — configure wrang preferences
9       SQL Query — run DuckDB SQL against the current dataset
10      HTML Profile — generate a full standalone HTML report
11      Validate — check data against a JSON/YAML schema
$       Quick export — save current dataset instantly
q       Exit

Python API

wrang is also a full Python library. Every module is independently importable.

Load & save

from wrang import FastDataLoader, DataSaver

loader = FastDataLoader()
df = loader.load("sales.csv")           # auto-detects format
df_lazy = loader.scan_lazy("big.parquet")   # lazy frame for large files

saver = DataSaver()
saver.save(df, "output.parquet")

Inspect

from wrang import DataInspector

inspector = DataInspector(df)
info = inspector.get_basic_info()
print(info["n_rows"], info["missing_values_total"])

inspector.display_overview()         # rich terminal output
inspector.display_data_quality()

Explore

from wrang import DataExplorer

explorer = DataExplorer(df)

corr = explorer.analyze_correlations(method="pearson")
outliers = explorer.detect_outliers(method="iqr")
normality = explorer.test_normality()

explorer.plot_histogram("age")
explorer.plot_scatter("age", "salary")
explorer.plot_correlation_heatmap()
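The iqr detector follows the conventional Tukey fence: values beyond 1.5 × IQR from the quartiles are flagged (1.5 matches the outlier_factor default under Configuration). As a rough illustration of the rule itself, in plain Python rather than wrang's actual implementation:

```python
from statistics import quantiles

def iqr_outliers(values, factor=1.5):
    """Flag values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lo, hi = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < lo or v > hi]

print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))  # [95]
```

Raising the factor widens the fences and flags fewer points, which is what update_config(outlier_factor=2.0) does library-wide.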

Clean

from wrang import DataCleaner
from wrang.config import ImputationStrategy

cleaned = (
    DataCleaner(df)
    .handle_missing_values(ImputationStrategy.MEDIAN, columns=["age", "salary"])
    .handle_missing_values(ImputationStrategy.MODE,   columns=["dept"])
    .remove_duplicates()
    .handle_outliers(method="iqr", action="remove")
    .get_cleaned_data()
)

Supported imputation strategies: DROP, MEAN, MEDIAN, MODE, FORWARD_FILL, BACKWARD_FILL, CUSTOM_VALUE, DISTRIBUTION, KNN.
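FORWARD_FILL and BACKWARD_FILL propagate the nearest observed value along the column. A minimal standalone sketch of forward fill, using plain Python with None as the missing marker (not wrang's implementation):

```python
def forward_fill(values):
    """Replace each None with the most recent non-missing value.
    Leading Nones stay missing: there is nothing to carry forward."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

print(forward_fill([None, 3, None, None, 7, None]))  # [None, 3, 3, 3, 7, 7]
```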

Transform

from wrang import DataTransformer, create_pipeline
from wrang.config import EncodingMethod, ScalingMethod

result = (
    DataTransformer(df)
    .encode_categorical_features(method=EncodingMethod.ONEHOT, columns=["dept"])
    .scale_features(method=ScalingMethod.STANDARD, columns=["age", "salary"])
    .get_transformed_data()
)

# Or use the pipeline builder
result = (
    create_pipeline(df)
    .encode_categorical_features(method=EncodingMethod.LABEL)
    .scale_features(method=ScalingMethod.ROBUST)
    .create_polynomial_features(degree=2)
    .get_transformed_data()
)
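ScalingMethod.STANDARD corresponds to classic z-score standardization: subtract the column mean, divide by its standard deviation, so scaled values have mean 0 and unit variance. A plain-Python sketch of the math (not wrang's implementation):

```python
from statistics import mean, pstdev

def standard_scale(values):
    """Z-score each value: (x - mean) / population std deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

scaled = standard_scale([10.0, 20.0, 30.0])
print([round(s, 3) for s in scaled])  # [-1.225, 0.0, 1.225]
```

ROBUST scaling differs only in the statistics used: median and IQR instead of mean and standard deviation, which makes it less sensitive to the outliers discussed above.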

Validate

from wrang import DataSchema, ColumnSchema, DataValidator

schema = DataSchema(columns=[
    ColumnSchema(name="id",     dtype="Int64",   nullable=False, unique=True),
    ColumnSchema(name="salary", dtype="Float64", nullable=False, min_value=0.0),
    ColumnSchema(name="dept",   dtype="String",  allowed_values=["eng", "hr"]),
])

result = DataValidator(schema).validate(df)
print(result.passed)           # True / False
for v in result.violations:
    print(v.severity, v.message)

# Infer schema from data and save to file
from wrang import infer_schema
infer_schema(df).to_json("schema.json")
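The serialized schema is ordinary JSON. Its exact field names are wrang's to define, but given the ColumnSchema arguments above, a file along these lines is the shape to expect (illustrative only, not a guaranteed format):

```json
{
  "columns": [
    {"name": "id",     "dtype": "Int64",   "nullable": false, "unique": true},
    {"name": "salary", "dtype": "Float64", "nullable": false, "min_value": 0.0},
    {"name": "dept",   "dtype": "String",  "allowed_values": ["eng", "hr"]}
  ]
}
```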

Configuration

from wrang.config import get_config, update_config, reset_config

config = get_config()
print(config.outlier_factor)   # 1.5

update_config(outlier_factor=2.0, chunk_size=5000)
reset_config()

User config is persisted at ~/.wrang/config.json.
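Based on the settings exercised above, that file is plain JSON; a hypothetical snapshot after the update_config call (any keys beyond outlier_factor and chunk_size are wrang's to define):

```json
{
  "outlier_factor": 2.0,
  "chunk_size": 5000
}
```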


Notebook usage

import polars as pl
from wrang import DataInspector, DataCleaner, DataExplorer
from wrang.config import ImputationStrategy

df = pl.read_csv("titanic.csv")

# Profile the data
DataInspector(df).display_overview()

# Clean
df_clean = (
    DataCleaner(df)
    .handle_missing_values(ImputationStrategy.MEDIAN)
    .remove_duplicates()
    .get_cleaned_data()
)

# Explore
explorer = DataExplorer(df_clean.select(["Age", "Fare", "Pclass"]))
explorer.plot_histogram("Age")
explorer.plot_scatter("Age", "Fare")

Supported file formats

Format             Read  Write  Notes
CSV                ✓     ✓      Auto delimiter detection
Excel (.xlsx)      ✓     ✓      via openpyxl
Excel (.xls)       ✓            via xlrd (read-only)
Parquet            ✓     ✓      Columnar, fast
JSON / JSON Lines  ✓     ✓      Auto schema inference

Non-interactive CLI reference

wrang [FILE] [OPTIONS]

Options:
  --inspect                  Print dataset overview and exit
  --output-format {text,json}  Output format (default: text)
  --profile                  Generate standalone HTML report
  --sql QUERY                Run DuckDB SQL against FILE (table: "data")
  --compare FILE_A FILE_B    Diff two datasets
  --chunk-size N             Stream FILE in N-row chunks
  --export PATH              Export dataset to PATH
  --format {csv,excel,parquet,json}  Export format
  --version                  Show version and exit
  --help-topic {usage,examples,formats,config}
  --debug / --verbose

Testing

pip install wrang[dev]
pytest tests/ -v
# 217 passed, 2 xfailed

Contributing

  1. Fork the repo
  2. Create a feature branch
  3. Run the test suite — all tests must pass
  4. Open a pull request

Bug reports and feature requests → GitHub Issues.


License

MIT — see LICENSE.


Built with Polars and Rich.
