A tiny, fast, Rust-backed transformation core for Python table data

🪶 feathertail

A high-performance Python DataFrame library powered by Rust — designed for flexibility, blazing speed, and intelligent type handling. Built for production with comprehensive features, advanced analytics, and enterprise-grade performance.


✨ Key Features

🚀 Core DataFrame Operations

  • ✅ Build TinyFrame from Python dict records (from_dicts)
  • ✅ Automatic type inference, including mixed-type and optional columns
  • ✅ Intelligent fallback to Python objects when Rust-native types aren't possible (stored by runtime pointer identity for the lifetime of the frame—keep references alive while using TinyFrame)
  • ✅ Flexible fillna to handle missing data
  • ✅ Powerful cast_column to convert columns between types
  • ✅ Smart edit_column: edits that automatically adjust column type if needed
  • ✅ Drop or rename columns easily
  • ✅ Export back to Python dicts (to_dicts)

🔗 Advanced Data Operations

  • Join Operations: Inner, left, right, outer, and cross joins
  • Filtering & Sorting: Advanced filtering with multiple conditions and multi-column sorting
  • GroupBy Aggregations: TinyGroupBy with string key columns — sum, mean, min, max, std, var, median, first, last, count, size (call each aggregation separately)
  • Window Functions: Rolling and expanding window operations
  • Ranking Functions: Rank calculation with multiple methods and percentage change

📊 Advanced Analytics

  • Descriptive Statistics: describe(), skew(), kurtosis(), quantile(), mode(), nunique()
  • Correlation & Covariance: Full correlation/covariance matrices and pairwise calculations
  • Time Series Operations: DateTime parsing (strict: invalid or empty strings raise ValueError), component extraction, time differences, and shifting
  • String Operations: Case conversion, whitespace removal, replacement, splitting, pattern matching, length, and concatenation
  • Data Validation: Not null, range, pattern, uniqueness validation with comprehensive reporting

Performance & Optimization

  • SIMD Operations: x86_64 optimized numerical operations for blazing speed
  • Parallel Processing: Multi-core operations using Rayon for GroupBy, filtering, and sorting
  • Memory Optimization: String interning, lazy evaluation, and copy-on-write optimizations
  • Chunked Processing: Handle large datasets efficiently with streaming operations
  • Rust-backed Core: Lightweight, fast, and dependency-light
  • Cross-Platform Builds: Automated CI/CD with pre-built wheels for all major platforms

🛠️ Developer Experience

  • Comprehensive Documentation: Sphinx-generated API docs with tutorials and guides
  • Logging & Debugging: Built-in logging system with performance monitoring
  • Profiling Tools: Performance profiling and optimization insights
  • Development Tools: Pre-commit hooks, automated testing, and development scripts
  • 250+ Comprehensive Tests: Full test coverage running in well under one second locally

📦 Installation

pip install feathertail

✅ Cross-Platform Support: Pre-built wheels are available for Python 3.8 through 3.12 on:

  • Linux (x86_64)
  • macOS (ARM64/aarch64)
  • Windows (x86_64)

Building from Source

# Clone the repository
git clone https://github.com/eddiethedean/feathertail.git
cd feathertail

# Install dependencies and build
pip install maturin
maturin develop --release

# Or install in development mode
pip install -e .

🧑‍💻 Quickstart

Basic DataFrame Operations

import feathertail as ft

records = [
    {"name": "Alice", "age": 30, "city": "New York", "score": 95.5},
    {"name": "Bob", "age": None, "city": "Paris", "score": 85.0},
    {"name": "Charlie", "age": 25, "city": "New York", "score": None},
]

frame = ft.TinyFrame.from_dicts(records)
print(frame)

Output:

TinyFrame(rows=3, columns=4, cols={ 'name': 'Str', 'age': 'OptInt', 'city': 'Str', 'score': 'OptFloat' })
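
The variant names in that output can be approximated in plain Python. This is a minimal sketch, not feathertail's implementation: `infer_variant` is a hypothetical helper, and its rules (exact type match, `Opt` prefix when any value is `None`, `Mixed` otherwise) are assumptions drawn from the Supported Types table.

```python
from typing import Any, Iterable

# Hypothetical mapping from Python types to column variant names.
VARIANT_NAMES = {int: "Int", float: "Float", bool: "Bool", str: "Str"}


def infer_variant(values: Iterable[Any]) -> str:
    """Guess a column variant name from a column's values (illustrative only)."""
    values = list(values)
    has_null = any(v is None for v in values)
    # Use exact types: bool is a subclass of int in Python.
    kinds = {type(v) for v in values if v is not None}
    if len(kinds) == 1 and next(iter(kinds)) in VARIANT_NAMES:
        base = VARIANT_NAMES[next(iter(kinds))]
    else:
        base = "Mixed"  # assumption: anything else falls back to Mixed
    return "Opt" + base if has_null else base


records = [
    {"name": "Alice", "age": 30, "city": "New York", "score": 95.5},
    {"name": "Bob", "age": None, "city": "Paris", "score": 85.0},
    {"name": "Charlie", "age": 25, "city": "New York", "score": None},
]
columns = {key: infer_variant(r.get(key) for r in records) for key in records[0]}
print(columns)
```

The real inference may differ, for example in how it treats columns that mix ints and floats.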

Advanced Filtering and Sorting

# Filter and sort data
filtered = frame.filter("age", ">", 25)
sorted_frame = frame.sort_values(["city", "age"], ascending=[True, False])
print(sorted_frame.to_dicts())

GroupBy Aggregations

# Group keys must be string columns. Build TinyGroupBy, then aggregate with the frame:
gb = ft.TinyGroupBy(frame, ["city"])
mean_age = gb.mean(frame, "age")
max_score = gb.max(frame, "score")
row_counts = gb.count(frame)
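
To make the shape of a grouped mean concrete, here is a plain-Python sketch over the same kind of records. It does not call feathertail; `group_mean` is a hypothetical helper, and skipping nulls (rather than counting them as 0) is an assumption about the semantics, not documented behavior.

```python
from collections import defaultdict


def group_mean(rows, key, column):
    """Mean of `column` per distinct `key` value (illustrative only)."""
    groups = defaultdict(list)
    for row in rows:
        bucket = groups[row[key]]  # create the group even if every value is null
        value = row.get(column)
        if value is not None:  # assumption: nulls are ignored, not treated as 0
            bucket.append(value)
    # A group with no non-null values has no mean; report None for it.
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}


records = [
    {"name": "Alice", "age": 30, "city": "New York"},
    {"name": "Bob", "age": None, "city": "Paris"},
    {"name": "Charlie", "age": 25, "city": "New York"},
]
print(group_mean(records, "city", "age"))
```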

Join Operations

# Inner join with another DataFrame
other_data = [
    {"city": "New York", "population": 8_000_000},
    {"city": "Paris", "population": 2_000_000},
]
other_frame = ft.TinyFrame.from_dicts(other_data)

joined = frame.join(other_frame, "city", "city", "inner")
print(joined.to_dicts())

Join semantics

  • Composite keys: A row is used for matching only if every join-key column is non-null (SQL-style). If any key component is null, that row does not appear in the join key index.
  • Column names: The result keeps one name per logical column. If the same basename appears as a non-join column on both sides, or if a join-key name on one frame collides with a non-key column on the other, feathertail raises ValueError—rename on one frame first. Automatic _x / _y suffixing (pandas-style) is not implemented yet.
  • cross_join: Left and right must have disjoint column names; overlaps raise ValueError.
  • Python object fallback: Fallback object storage from both sides is merged on join outputs so to_dicts() can resolve references. Conflicting reuse of the same internal id for different objects raises a runtime error (very rare).
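
The null-key and column-name rules above can be sketched in plain Python over lists of dicts. This is an illustration under stated assumptions, not the library's code: `inner_join` and `columns_of` are hypothetical helpers, only a single key column is handled, and the error message is made up.

```python
def columns_of(rows):
    """Column names in first-seen order."""
    cols = []
    for row in rows:
        for key in row:
            if key not in cols:
                cols.append(key)
    return cols


def inner_join(left, right, left_on, right_on):
    """Inner join on one key column with SQL-style null handling (illustrative only)."""
    left_extra = [c for c in columns_of(left) if c != left_on]
    right_extra = [c for c in columns_of(right) if c != right_on]
    # Collisions: a shared non-key name, or a key name reused as a
    # non-key column on the other side.
    clashes = (
        (set(left_extra) & set(right_extra))
        | ({left_on} & set(right_extra))
        | ({right_on} & set(left_extra))
    )
    if clashes:
        raise ValueError(f"column name collision: {sorted(clashes)}")
    # Rows with a null key never enter the index, so null never matches null.
    index = {}
    for row in right:
        if row.get(right_on) is not None:
            index.setdefault(row[right_on], []).append(row)
    out = []
    for row in left:
        key = row.get(left_on)
        if key is None:
            continue
        for match in index.get(key, []):
            merged = dict(row)
            merged.update({c: match.get(c) for c in right_extra})
            out.append(merged)
    return out


people = [{"name": "Alice", "city": "New York"}, {"name": "Bob", "city": None}]
cities = [{"city": "New York", "population": 8_000_000}, {"city": None, "population": 0}]
joined = inner_join(people, cities, "city", "city")
print(joined)
```

Bob's row is dropped even though the right side also has a null city, which is the SQL-style behavior the first bullet describes.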

Advanced Analytics

# Descriptive statistics
description = frame.describe("score")
print(description.to_dicts())

# Correlation analysis
correlation = frame.corr("age", "score")
print(f"Age-Score correlation: {correlation}")

# Time series operations
time_data = [
    {"timestamp": "2023-01-01 10:00:00", "value": 100},
    {"timestamp": "2023-01-01 11:00:00", "value": 120},
]
time_frame = ft.TinyFrame.from_dicts(time_data)
time_frame = time_frame.to_timestamps("timestamp")  # adds `timestamp_timestamp` (Unix seconds)
time_frame = time_frame.dt_year("timestamp")      # still parses from original string column
print(time_frame.to_dicts())
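
For intuition about strict parsing to Unix seconds, here is a standard-library sketch. It does not use feathertail: `to_unix_seconds` is a hypothetical helper, and both the format string and the UTC interpretation are assumptions, since the timezone handling of `to_timestamps` is not stated here.

```python
from datetime import datetime, timezone


def to_unix_seconds(text: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> int:
    """Strictly parse `text` and return Unix seconds (illustrative only)."""
    # strptime raises ValueError for empty or malformed input,
    # so nothing is silently coerced to a default value.
    parsed = datetime.strptime(text, fmt)
    # Assumption: naive timestamps are interpreted as UTC.
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


print(to_unix_seconds("2023-01-01 10:00:00"))  # 1672567200

try:
    to_unix_seconds("")
except ValueError as exc:
    print(f"rejected: {exc}")
```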

Window Functions

# Rolling window operations
data = [{"value": i} for i in range(1, 11)]
window_frame = ft.TinyFrame.from_dicts(data)
rolling_mean = window_frame.rolling_mean("value", 3)
print(rolling_mean.to_dicts())
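
The arithmetic behind a rolling mean with window 3 can be checked in plain Python. This sketch is not feathertail's implementation: the function below is a hypothetical stand-in, and emitting `None` until the window is full is an assumption about how incomplete windows are handled.

```python
from typing import List, Optional


def rolling_mean(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing-window mean; None until `window` values are available."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # assumption: incomplete windows yield a null
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


print(rolling_mean(list(range(1, 11)), 3))
# [None, None, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```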

String Operations

# String manipulation
text_data = [{"text": "  hello world  "}, {"text": "foo bar"}]
text_frame = ft.TinyFrame.from_dicts(text_data)
processed = text_frame.str_upper("text").str_strip("text")
print(processed.to_dicts())

Data Validation

# Data quality checks
validation = frame.validate_not_null("age")
validation_summary = frame.validation_summary("age")
print(f"Validation summary: {validation_summary}")

🚀 Performance Features

SIMD-Accelerated Operations

# Automatic SIMD optimization for numerical operations
large_data = [{"category": "A" if i % 2 == 0 else "B", "value": i * 1.5} for i in range(100000)]
large_frame = ft.TinyFrame.from_dicts(large_data)

# TinyGroupBy keys must be string columns; aggregates run over numeric columns
gb = ft.TinyGroupBy(large_frame, ["category"])
sum_result = gb.sum(large_frame, "value")

Parallel Processing

# Multi-core operations for large datasets
# Automatically uses all available CPU cores
filtered = large_frame.filter("value", ">", 50000)
sorted_data = large_frame.sort_values("value")

Memory Optimization

# String interning and lazy evaluation
# Memory usage is automatically optimized
frame = ft.TinyFrame.from_dicts(records)
# Operations are optimized for memory efficiency

🛠️ Developer Tools

Logging and Debugging

# Enable comprehensive logging
ft.init_logging_with_config("info", log_memory=True, log_performance=True, log_operations=True)

# Enable debug mode
ft.enable_debug()

# Enable profiling
ft.enable_profiling()

# Your operations will be logged and profiled
frame = ft.TinyFrame.from_dicts(records)  # records from the Quickstart; it has the "age" column used below
result = frame.filter("age", ">", 25)

# View profiling report
ft.print_profiling_report()

Performance Monitoring

# Get operation statistics
stats = ft.get_operation_stats("filter")
print(f"Filter operations: {stats}")

# Get overall performance metrics
overall_stats = ft.get_overall_stats()
print(f"Total operations: {overall_stats['total_operations']}")

⚙️ Supported Types

Type  | Column variants | Description
int   | Int, OptInt     | 64-bit integers with optional null support
float | Float, OptFloat | 64-bit floats with optional null support
bool  | Bool, OptBool   | Boolean values with optional null support
str   | Str, OptStr     | UTF-8 strings with optional null support
mixed | Mixed, OptMixed | Mixed types with automatic Python object fallback

cast_column and strings. Casting a non-optional Str column to int or float is strict: each cell must parse; otherwise ValueError is raised (values are not coerced to 0). Casting optional OptStr to numeric optional types maps unparseable strings to missing values where applicable.
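
The difference between the strict and optional casts can be illustrated without the library. This is a sketch of the semantics described above, not the `cast_column` API: both helper names are hypothetical, and Python's `int()` is used as the parser, which may accept inputs that feathertail's parser rejects (or the reverse).

```python
from typing import List, Optional


def cast_str_to_int(values: List[str]) -> List[int]:
    """Strict cast: every cell must parse, otherwise ValueError is raised."""
    return [int(v) for v in values]  # no coercion to 0 on failure


def cast_optstr_to_optint(values: List[Optional[str]]) -> List[Optional[int]]:
    """Optional cast: missing or unparseable cells become None."""
    out: List[Optional[int]] = []
    for v in values:
        try:
            out.append(int(v))
        except (TypeError, ValueError):  # TypeError covers None input
            out.append(None)
    return out


print(cast_str_to_int(["1", "2"]))               # [1, 2]
print(cast_optstr_to_optint(["1", "n/a", None]))  # [1, None, None]
```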


📚 Documentation


🏗️ Build System & CI/CD

Automated Cross-Platform Builds

feathertail uses GitHub Actions to automatically build and test wheels for all major platforms:

  • 15 build configurations covering Python 3.8-3.12
  • 3 operating systems: Linux (Ubuntu), macOS (ARM64), Windows
  • Automated testing with wheel installation verification
  • Artifact management with 30-day retention
  • PyPI deployment on version tags

Build Matrix

Platform | Python versions            | Architecture
Ubuntu   | 3.8, 3.9, 3.10, 3.11, 3.12 | x86_64
macOS    | 3.8, 3.9, 3.10, 3.11, 3.12 | ARM64 (aarch64)
Windows  | 3.8, 3.9, 3.10, 3.11, 3.12 | x86_64

Quality Assurance

  • Rust compilation with proper target architecture
  • Python wheel building with maturin
  • CI test matrix: pytest on Python 3.8–3.12 on Ubuntu, macOS, and Windows (release wheels built for the same range)
  • Installation testing from temp directories
  • Import verification to ensure module works correctly
  • Cross-platform compatibility testing

🧪 Testing

# Run all tests (Rust + Python; 250+ Python unit tests plus Rust tests)
make test

# Run specific test categories
python -m pytest tests/python/unit/test_tinyframe.py
python -m pytest tests/python/unit/test_joins.py
python -m pytest tests/python/unit/test_analytics.py

🏗️ Building from Source

# Clone the repository
git clone https://github.com/eddiethedean/feathertail.git
cd feathertail

# Set up development environment
make dev

# Build the package
make build

# Run tests
make test

# Build documentation
make docs

🐉 Why "feathertail"?

In Fourth Wing, a "feathertail" is a juvenile dragon — small, golden, and nonviolent, known for grace rather than brute force.

This library follows the same spirit: gentle on dependencies, elegant in design, and capable of handling complex data types with ease — but with the power and performance of a full-grown dragon when you need it.


📊 Performance Benchmarks

  • 250+ Python unit tests plus Rust tests run in well under one second locally
  • SIMD-accelerated numerical operations
  • Parallel processing for multi-core performance
  • Memory-optimized with string interning and lazy evaluation
  • Production-ready with comprehensive error handling and logging

❤️ Contributing

Contributions, ideas, and feedback are always welcome! Please see our Contributing Guide for details.


📄 License

MIT


🎯 Roadmap

  • Cross-platform PyPI builds - ✅ Automated builds for Linux, macOS, and Windows
  • Additional time series functions
  • More statistical distributions
  • Enhanced plotting integration
  • Database connectors
  • Arrow/Parquet integration

Built with ❤️ using Rust and Python

