eastat

CSV column statistics powered by Ea SIMD kernels.

Computes count, mean, stddev, min, max, p25, p50, p75 for every numeric column. String columns get length statistics.

Install

pip install eastat

Pre-built wheels include compiled SIMD kernels for Linux x86_64, Linux aarch64, and Windows x86_64. No compiler needed.

Usage

eastat data.csv               # statistics for every column
eastat --json data.csv        # JSON output
eastat -d '\t' data.tsv       # custom delimiter (here: tab)
eastat -c 0,2,4 data.csv      # only the listed columns
eastat --no-quotes data.csv   # force fast scan (skip quote detection)
eastat --quoted data.csv      # force quote-aware scan

Or from Python:

from eastat import process

results, headers, n_rows, col_count, timings = process("data.csv")

How it works

Four Ea kernels form a zero-copy pipeline over a memory-mapped file:

| Kernel | What it does |
| --- | --- |
| csv_scan | AVX2 structural scanner: finds delimiter and newline positions using u8x32 comparison + movemask. Two modes: fast (no quotes) and quote-aware. Includes count_positions_quoted for the two-pass large-file strategy. |
| csv_layout | Builds row boundary arrays and a per-row delimiter index via merge-scan. O(n_delims + n_rows). |
| csv_parse | Batch ASCII-to-float parser with whitespace/quote trimming. Field length stats for string columns. |
| csv_stats | f32x8 dual-accumulator FMA reduction for sum, min, max, sum-of-squares in one pass. SIMD binary-search percentiles (p25/p50/p75). |
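
To make the first two stages concrete, here is a minimal pure-Python sketch of the idea, not the Ea kernels themselves. It emulates a 32-byte compare + movemask step, skips chunks whose mask is zero, and then merge-scans the two sorted position lists into per-row delimiter ranges. The names `movemask_eq`, `scan_fast` and `layout` are hypothetical and chosen for illustration only.

```python
CHUNK = 32  # bytes per step, mirroring one u8x32 vector


def movemask_eq(chunk: bytes, a: int, b: int) -> int:
    """Emulate compare + movemask: bit i is set when chunk[i] equals a or b."""
    mask = 0
    for i, byte in enumerate(chunk):
        if byte == a or byte == b:
            mask |= 1 << i
    return mask


def scan_fast(buf: bytes, delim: int = ord(","), newline: int = ord("\n")):
    """Return (delimiter positions, newline positions), ignoring quotes."""
    delims, newlines = [], []
    for base in range(0, len(buf), CHUNK):
        mask = movemask_eq(buf[base:base + CHUNK], delim, newline)
        while mask:  # a zero mask means the whole chunk is skipped
            bit = (mask & -mask).bit_length() - 1  # index of lowest set bit
            pos = base + bit
            (delims if buf[pos] == delim else newlines).append(pos)
            mask &= mask - 1  # clear the lowest set bit
    return delims, newlines


def layout(delims, newlines):
    """Merge-scan two sorted position lists into per-row delimiter ranges.

    Returns one (first_delim_index, end_delim_index) pair per row. Each
    delimiter and each newline is visited once: O(n_delims + n_rows).
    """
    rows, d = [], 0
    for nl in newlines:
        start = d
        while d < len(delims) and delims[d] < nl:
            d += 1
        rows.append((start, d))
    return rows


data = b"a,b,c\n1,2,3\n4.5,,6\n"
delims, newlines = scan_fast(data)
rows = layout(delims, newlines)
```

With `rows` in hand, field k of a row starts right after delimiter `start + k - 1` (or after the previous newline for k = 0), so later stages can slice fields straight out of the mapped buffer without copying.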

Scan modes

eastat auto-detects whether the CSV contains quoted fields by sampling the first 4 KB:

  • Fast scan — no quote handling. SIMD chunk-skip via movemask. Best throughput.
  • Quoted scan — tracks quote state to ignore delimiters/newlines inside quoted fields.
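
The difference between the two modes can be sketched in a few lines of plain Python. This is an illustration under assumptions, not eastat's code: the detection rule used here (any quote byte in the first 4 KB) is a guess at what "sampling" means, and `looks_quoted` and `scan_quoted` are hypothetical names.

```python
SAMPLE_BYTES = 4096  # "first 4 KB" from the text above


def looks_quoted(buf: bytes, quote: int = ord('"')) -> bool:
    """Assumed heuristic: the file is 'quoted' if the sample has a quote byte."""
    return quote in buf[:SAMPLE_BYTES]


def scan_quoted(buf: bytes, delim: int = ord(","), newline: int = ord("\n"),
                quote: int = ord('"')):
    """Collect delimiter/newline positions that lie outside quoted fields."""
    delims, newlines = [], []
    in_quotes = False
    for pos, byte in enumerate(buf):
        if byte == quote:
            # An escaped quote ("") toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
        elif not in_quotes:
            if byte == delim:
                delims.append(pos)
            elif byte == newline:
                newlines.append(pos)
    return delims, newlines


data = b'id,name\n1,"Smith, J"\n2,"multi\nline"\n'
quoted = looks_quoted(data)
delims, newlines = scan_quoted(data)
```

The comma inside `"Smith, J"` and the newline inside `"multi\nline"` are not reported, which is exactly what the fast scan would get wrong on this input.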

For large files (>128 MB), a two-pass strategy avoids over-allocation: count_positions_quoted counts positions first, then exact-sized buffers are allocated for the SIMD scan pass.
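
A rough sketch of the count-then-fill pattern, assuming nothing about the real kernel beyond what the paragraph above states. `count_structural` is a hypothetical Python stand-in for count_positions_quoted, and `scan_two_pass` is an illustrative name.

```python
from array import array


def count_structural(buf: bytes, delim: int = 44, newline: int = 10,
                     quote: int = 34):
    """Pass 1: count structural bytes outside quotes without storing them."""
    n_delims = n_newlines = 0
    in_quotes = False
    for byte in buf:
        if byte == quote:
            in_quotes = not in_quotes
        elif not in_quotes:
            n_delims += byte == delim
            n_newlines += byte == newline
    return n_delims, n_newlines


def scan_two_pass(buf: bytes, delim: int = 44, newline: int = 10,
                  quote: int = 34):
    """Pass 2: fill buffers that were allocated at their exact final size."""
    n_delims, n_newlines = count_structural(buf, delim, newline, quote)
    delims = array("Q", [0]) * n_delims      # exact-sized, never regrown
    newlines = array("Q", [0]) * n_newlines
    d = n = 0
    in_quotes = False
    for pos, byte in enumerate(buf):
        if byte == quote:
            in_quotes = not in_quotes
        elif not in_quotes:
            if byte == delim:
                delims[d] = pos
                d += 1
            elif byte == newline:
                newlines[n] = pos
                n += 1
    return delims, newlines


counts = count_structural(b'a,"b,c"\n1,2\n')
delims, newlines = scan_two_pass(b'a,"b,c"\n1,2\n')
```

The trade-off is one extra read of the file in exchange for never reserving a worst-case buffer (one slot per input byte), which matters most once the input is hundreds of megabytes.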

Override auto-detection with --no-quotes or --quoted.

Precision & fairness

eastat uses a hybrid strategy — Eä SIMD kernels where they genuinely outperform, NumPy where it's the right tool:

| Statistic | Engine | Precision | Notes |
| --- | --- | --- | --- |
| sum, min, max, sumsq | f32x8 SIMD | f32 | Dual-accumulator FMA reduction, faster than NumPy f64 |
| percentiles (p25/p50/p75) | np.percentile | f64 | Same algorithm as pandas: fair comparison, O(n) partial sort |
| CSV parsing | batch_atof | f32 | Handles integers, decimals, signed values, scientific notation (1.5e-3, -2.0E+5) |
| structural scan | Eä AVX2/NEON | | movemask-based delimiter/newline detection |
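
The single-pass reduction in the first row can be illustrated in plain Python. This is a structural sketch only: Python floats are f64, so it does not reproduce f32 rounding, and the function names are hypothetical. It also assumes the population form of the standard deviation; the source does not say which form eastat reports.

```python
import math

LANES = 8  # one f32x8 vector


def reduce_stats(values):
    """One pass over `values` keeping sum, sum of squares, min and max.

    Two independent sets of 8-lane accumulators (a and b) consume alternating
    blocks of 8 values; in a real SIMD loop this breaks the dependency chain
    so that consecutive FMAs can overlap.
    """
    sum_a, sum_b = [0.0] * LANES, [0.0] * LANES
    sq_a, sq_b = [0.0] * LANES, [0.0] * LANES
    lo, hi = math.inf, -math.inf
    n = len(values)
    full = n - n % (2 * LANES)
    for base in range(0, full, 2 * LANES):
        for lane in range(LANES):
            x = values[base + lane]
            y = values[base + LANES + lane]
            sum_a[lane] += x
            sum_b[lane] += y
            sq_a[lane] += x * x  # a fused multiply-add in the vector version
            sq_b[lane] += y * y
            lo = min(lo, x, y)
            hi = max(hi, x, y)
    total = sum(sum_a) + sum(sum_b)      # horizontal reduction of the lanes
    total_sq = sum(sq_a) + sum(sq_b)
    for x in values[full:]:              # scalar tail
        total += x
        total_sq += x * x
        lo = min(lo, x)
        hi = max(hi, x)
    return n, total, total_sq, lo, hi


def mean_stddev(n, total, total_sq):
    """Derive mean and (population, by assumption) stddev from the sums."""
    mean = total / n
    var = max(total_sq / n - mean * mean, 0.0)  # clamp tiny negative results
    return mean, math.sqrt(var)


values = [float(i) for i in range(1, 21)]
n, total, total_sq, lo, hi = reduce_stats(values)
mean, std = mean_stddev(n, total, total_sq)
```

Deriving the variance from sum and sum-of-squares needs only one pass, but it subtracts two large, nearly equal numbers when the mean is large relative to the spread. In f32 that cancellation costs accuracy, which is the usual price of this formulation.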

Scientific notation (e/E with optional +/- sign) is fully supported in numeric parsing, matching pandas/polars behavior.
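
As a rough illustration of what the parse stage has to accept, the sketch below trims whitespace and surrounding quotes, parses the text, and rounds the result to f32. It leans on Python's built-in `float()`, which is more permissive than a hand-written ASCII parser (it also accepts `inf`, `nan` and digit-group underscores), so treat it as a stand-in for batch_atof rather than a description of it. Returning NaN for non-numeric fields is an assumption, and `to_f32` and `parse_fields` are hypothetical names.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (f64) to the nearest f32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def parse_fields(fields):
    """Parse a batch of raw byte fields into f32-rounded floats."""
    out = []
    for raw in fields:
        text = raw.strip().strip(b'"').strip()  # whitespace, then quotes
        try:
            out.append(to_f32(float(text.decode("ascii"))))
        except (ValueError, OverflowError):
            out.append(math.nan)  # assumed marker for non-numeric input
    return out


parsed = parse_fields([b" 1.5e-3 ", b'"-2.0E+5"', b"42", b"abc"])
```

The first value comes back as the f32 nearest to 0.0015 rather than the f64 one, which is the precision the table above lists for parsed columns.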

Building from source

Only needed if there's no pre-built wheel for your platform.

# Install the Ea compiler
# See https://github.com/petlukk/eacompute/releases

# Compile kernels
EA_BIN=./ea ./build_kernels.sh

# Install
pip install -e .

Requirements

  • Python 3.9+
  • NumPy
