
wrang

Lightning-fast data wrangling for the terminal.


wrang is a terminal-native data analysis toolkit. Load, inspect, clean, transform, and export datasets without writing a single line of boilerplate. Use it interactively, script it as a Python library, or wire it into CI pipelines.


Install

pip install wrang

Optional extras:

pip install wrang[viz]       # matplotlib + seaborn
pip install wrang[advanced]  # duckdb + connectorx
pip install wrang[full]      # everything

Requires Python 3.10+.

Migrating from ride-cli? Both wrang and ride commands are installed — they are identical aliases. Your existing ride ... scripts will continue to work unchanged.


Quick start

# Interactive mode
wrang

# Load a file directly
wrang data.csv

# Quick inspect (CI-friendly JSON output)
wrang data.csv --inspect --output-format json

# SQL query via DuckDB
wrang data.csv --sql "SELECT dept, AVG(salary) FROM data GROUP BY dept"

# Generate self-contained HTML profile report
wrang data.csv --profile

# Compare two datasets
wrang --compare before.csv after.csv

# Stream large files in chunks
wrang large.csv --chunk-size 50000

# Export to Parquet
wrang data.csv --export clean.parquet

Interactive menu

wrang

The full interactive session gives you a menu-driven workflow:

Option Action
1 Load dataset (CSV / Excel / Parquet / JSON)
2 Inspect — shape, types, missing values, quality report
3 Explore — correlations, distributions, outliers, plots
4 Clean — impute, deduplicate, handle outliers, fix types
5 Transform — encode, scale, polynomial features, binning
6 Visualize — terminal histograms, scatter, heatmap
7 Export — save to any supported format
8 Settings — configure wrang preferences
9 SQL Query — run DuckDB SQL against the current dataset
10 HTML Profile — generate a full standalone HTML report
11 Validate — check data against a JSON/YAML schema
$ Quick export — save current dataset instantly
q Exit

Python API

wrang is also a full Python library. Every module is independently importable.

Load & save

from wrang import FastDataLoader, DataSaver

loader = FastDataLoader()
df = loader.load("sales.csv")           # auto-detects format
df_lazy = loader.scan_lazy("big.parquet")   # lazy frame for large files

saver = DataSaver()
saver.save(df, "output.parquet")

Inspect

from wrang import DataInspector

inspector = DataInspector(df)
info = inspector.get_basic_info()
print(info["n_rows"], info["missing_values_total"])

inspector.display_overview()         # rich terminal output
inspector.display_data_quality()

Explore

from wrang import DataExplorer

explorer = DataExplorer(df)

corr = explorer.analyze_correlations(method="pearson")
outliers = explorer.detect_outliers(method="iqr")
normality = explorer.test_normality()

explorer.plot_histogram("age")
explorer.plot_scatter("age", "salary")
explorer.plot_correlation_heatmap()
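For reference, the IQR rule behind detect_outliers(method="iqr") flags values outside [Q1 − k·IQR, Q3 + k·IQR], where k is the configurable outlier_factor (1.5 by default). A minimal pure-Python sketch of that rule, independent of wrang's implementation:

```python
from statistics import quantiles

def iqr_outliers(values, factor=1.5):
    """Return the values falling outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < lo or v > hi]

print(iqr_outliers([10, 12, 12, 13, 12, 11, 500]))  # [500]
```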

Clean

from wrang import DataCleaner
from wrang.config import ImputationStrategy

cleaned = (
    DataCleaner(df)
    .handle_missing_values(ImputationStrategy.MEDIAN, columns=["age", "salary"])
    .handle_missing_values(ImputationStrategy.MODE,   columns=["dept"])
    .remove_duplicates()
    .handle_outliers(method="iqr", action="remove")
    .get_cleaned_data()
)

Supported imputation strategies: DROP, MEAN, MEDIAN, MODE, FORWARD_FILL, BACKWARD_FILL, CUSTOM_VALUE, DISTRIBUTION, KNN.
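As a rough illustration of what the MEDIAN and MODE strategies do, here is a plain-Python sketch (not wrang's implementation): fill each missing entry with the median (numeric) or mode (categorical) of the non-missing values.

```python
from statistics import median, mode

def impute(values, strategy):
    """Fill None entries using the median or mode of the non-missing values."""
    present = [v for v in values if v is not None]
    fill = median(present) if strategy == "median" else mode(present)
    return [fill if v is None else v for v in values]

print(impute([30, None, 40, 50], "median"))        # fills with 40
print(impute(["eng", None, "eng", "hr"], "mode"))  # fills with "eng"
```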

Transform

from wrang import DataTransformer, create_pipeline
from wrang.config import EncodingMethod, ScalingMethod

result = (
    DataTransformer(df)
    .encode_categorical_features(method=EncodingMethod.ONEHOT, columns=["dept"])
    .scale_features(method=ScalingMethod.STANDARD, columns=["age", "salary"])
    .get_transformed_data()
)

# Or use the pipeline builder
result = (
    create_pipeline(df)
    .encode_categorical_features(method=EncodingMethod.LABEL)
    .scale_features(method=ScalingMethod.ROBUST)
    .create_polynomial_features(degree=2)
    .get_transformed_data()
)
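For intuition, ScalingMethod.STANDARD corresponds to the usual z-score transform: subtract each column's mean and divide by its standard deviation. A minimal stdlib sketch of the idea, separate from wrang's API:

```python
from statistics import mean, pstdev

def standard_scale(values):
    """z-score a column: subtract the mean, divide by the population std dev."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

scaled = standard_scale([10.0, 20.0, 30.0])
print(scaled)  # centered on zero
```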

Validate

from wrang import DataSchema, ColumnSchema, DataValidator

schema = DataSchema(columns=[
    ColumnSchema(name="id",     dtype="Int64",   nullable=False, unique=True),
    ColumnSchema(name="salary", dtype="Float64", nullable=False, min_value=0.0),
    ColumnSchema(name="dept",   dtype="String",  allowed_values=["eng", "hr"]),
])

result = DataValidator(schema).validate(df)
print(result.passed)           # True / False
for v in result.violations:
    print(v.severity, v.message)

# Infer schema from data and save to file
from wrang import infer_schema
infer_schema(df).to_json("schema.json")
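To make the constraint semantics concrete, the checks shown above (nullable, unique, min_value, allowed_values) can be mimicked in a few lines of plain Python. This is a hypothetical sketch, not wrang's validator:

```python
def check_column(values, nullable=True, unique=False,
                 min_value=None, allowed_values=None):
    """Return human-readable violation messages for one column."""
    violations = []
    present = [v for v in values if v is not None]
    if not nullable and len(present) < len(values):
        violations.append("null values present")
    if unique and len(set(present)) < len(present):
        violations.append("duplicate values present")
    if min_value is not None and any(v < min_value for v in present):
        violations.append(f"values below minimum {min_value}")
    if allowed_values is not None and any(v not in allowed_values for v in present):
        violations.append("values outside the allowed set")
    return violations

print(check_column([1, 2, 2], nullable=False, unique=True))
print(check_column(["eng", "ops"], allowed_values=["eng", "hr"]))
```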

Configuration

from wrang.config import get_config, update_config, reset_config

config = get_config()
print(config.outlier_factor)   # 1.5

update_config(outlier_factor=2.0, chunk_size=5000)
reset_config()

User config is persisted at ~/.wrang/config.json.
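The persisted file is plain JSON. A hypothetical example of what ~/.wrang/config.json might look like after the update_config call above (field names assumed to mirror the keyword arguments shown):

```json
{
  "outlier_factor": 2.0,
  "chunk_size": 5000
}
```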


Notebook usage

import polars as pl
from wrang import DataInspector, DataCleaner, DataExplorer
from wrang.config import ImputationStrategy

df = pl.read_csv("titanic.csv")

# Profile the data
DataInspector(df).display_overview()

# Clean
df_clean = (
    DataCleaner(df)
    .handle_missing_values(ImputationStrategy.MEDIAN)
    .remove_duplicates()
    .get_cleaned_data()
)

# Explore
explorer = DataExplorer(df_clean.select(["Age", "Fare", "Pclass"]))
explorer.plot_histogram("Age")
explorer.plot_scatter("Age", "Fare")

Supported file formats

Format              Notes
CSV                 Auto delimiter detection
Excel (.xlsx)       via openpyxl
Excel (.xls)        via xlrd
Parquet             Columnar, fast
JSON / JSON Lines   Auto schema inference

Non-interactive CLI reference

wrang [FILE] [OPTIONS]

Options:
  --inspect                  Print dataset overview and exit
  --output-format {text,json}  Output format (default: text)
  --profile                  Generate standalone HTML report
  --sql QUERY                Run DuckDB SQL against FILE (table: "data")
  --compare FILE_A FILE_B    Diff two datasets
  --chunk-size N             Stream FILE in N-row chunks
  --export PATH              Export dataset to PATH
  --format {csv,excel,parquet,json}  Export format
  --version                  Show version and exit
  --help-topic {usage,examples,formats,config}
  --debug / --verbose

Testing

pip install wrang[dev]
pytest tests/ -v
# 217 passed, 2 xfailed

Contributing

  1. Fork the repo
  2. Create a feature branch
  3. Run the test suite — all tests must pass
  4. Open a pull request

Bug reports and feature requests → GitHub Issues.


License

MIT — see LICENSE.


Built with Polars and Rich.
