eastat

CSV column statistics powered by Ea SIMD kernels.

Computes count, mean, stddev, min, max, p25, p50, p75 for every numeric column. String columns get length statistics.
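As a plain-Python reference for what these numbers mean, the sketch below computes the same set of statistics for one column. It is not eastat's implementation: the function name `column_stats`, the use of population (not sample) standard deviation, and linear interpolation for percentiles are all assumptions made for illustration.

```python
import math

def column_stats(values):
    """Reference statistics for one numeric column (pure Python, no SIMD)."""
    n = len(values)
    if n == 0:
        raise ValueError("empty column")
    total = 0.0
    total_sq = 0.0
    lo = math.inf
    hi = -math.inf
    for v in values:  # single pass: sum, sum of squares, min, max
        total += v
        total_sq += v * v
        lo = min(lo, v)
        hi = max(hi, v)
    mean = total / n
    # Population variance from the running sums (assumption); clamp tiny
    # negative values caused by floating-point rounding.
    variance = max(total_sq / n - mean * mean, 0.0)

    ordered = sorted(values)

    def percentile(q):
        # Linear interpolation between the two nearest ranks (assumption).
        pos = q * (n - 1)
        i = int(pos)
        j = min(i + 1, n - 1)
        return ordered[i] + (ordered[j] - ordered[i]) * (pos - i)

    return {
        "count": n, "mean": mean, "stddev": math.sqrt(variance),
        "min": lo, "max": hi,
        "p25": percentile(0.25), "p50": percentile(0.50), "p75": percentile(0.75),
    }

stats = column_stats([1.0, 2.0, 3.0, 4.0, 5.0])
```

A reference like this can be handy for spot-checking eastat's output on a small file, keeping in mind that the percentile and stddev conventions may differ.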

Install

pip install eastat

Pre-built wheels include compiled SIMD kernels for Linux x86_64, Linux aarch64, and Windows x86_64. No compiler needed.

Usage

eastat data.csv
eastat --json data.csv
eastat -d '\t' data.tsv
eastat -c 0,2,4 data.csv
eastat --no-quotes data.csv   # force fast scan (skip quote detection)
eastat --quoted data.csv      # force quote-aware scan

Or from Python:

from eastat import process

results, headers, n_rows, col_count, timings = process("data.csv")

How it works

Four Ea kernels form a zero-copy pipeline over a memory-mapped file:

  • csv_scan: AVX2 structural scanner. Finds delimiter and newline positions using u8x32 comparison + movemask. Two modes: fast (no quotes) and quote-aware. Includes count_positions_quoted for the two-pass large-file strategy.
  • csv_layout: Builds row boundary arrays and a per-row delimiter index via merge-scan. O(n_delims + n_rows).
  • csv_parse: Batch ASCII-to-float parser with whitespace/quote trimming. Field length stats for string columns.
  • csv_stats: f32x8 dual-accumulator FMA reduction for sum, min, max, and sum-of-squares in one pass. SIMD binary-search percentiles (p25/p50/p75).
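To make the first two stages concrete, here is a scalar Python sketch of the same idea: one pass collects delimiter and newline offsets, then a merge-scan over the two sorted lists yields row boundaries and, for each row, the index of its first delimiter. The function names (`scan_positions`, `build_layout`, `row_fields`) and the returned layout are hypothetical and do not mirror the real kernel signatures; quotes and CRLF line endings are ignored here.

```python
def scan_positions(data: bytes, delim: bytes = b","):
    """Scalar stand-in for the structural scan: offsets of delimiters and newlines."""
    d = delim[0]
    delim_pos, newline_pos = [], []
    for i, byte in enumerate(data):
        if byte == d:
            delim_pos.append(i)
        elif byte == 0x0A:  # '\n'
            newline_pos.append(i)
    return delim_pos, newline_pos

def build_layout(delim_pos, newline_pos, n_bytes):
    """Merge-scan: row start/end offsets plus the index of each row's first delimiter."""
    row_start, row_end, row_first_delim = [], [], []
    if n_bytes == 0:
        return row_start, row_end, row_first_delim
    ends = list(newline_pos)
    if not ends or ends[-1] != n_bytes - 1:
        ends.append(n_bytes)  # final row without a trailing newline
    start = 0
    d = 0  # cursor into delim_pos; only ever moves forward
    for end in ends:
        row_start.append(start)
        row_end.append(end)
        row_first_delim.append(d)
        while d < len(delim_pos) and delim_pos[d] < end:
            d += 1
        start = end + 1
    return row_start, row_end, row_first_delim

def row_fields(data, layout, delim_pos, r):
    """Slice row r into raw byte fields using the layout arrays."""
    row_start, row_end, first = layout
    stop = first[r + 1] if r + 1 < len(first) else len(delim_pos)
    cuts = [row_start[r] - 1] + delim_pos[first[r]:stop] + [row_end[r]]
    return [data[a + 1:b] for a, b in zip(cuts, cuts[1:])]

data = b"x,y\n1,2\n3,4\n"
delim_pos, newline_pos = scan_positions(data)
layout = build_layout(delim_pos, newline_pos, len(data))
last_row = row_fields(data, layout, delim_pos, 2)
```

Because the delimiter cursor never moves backwards, the layout step touches each delimiter and each row once, which matches the O(n_delims + n_rows) bound stated for csv_layout.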

Scan modes

eastat samples the first 4 KB of the file to detect whether the CSV contains quoted fields, then picks one of two scan modes:

  • Fast scan — no quote handling. SIMD chunk-skip via movemask. Best throughput.
  • Quoted scan — tracks quote state to ignore delimiters/newlines inside quoted fields.
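The difference between the two modes can be sketched in scalar Python. This is only an illustration: the detection rule shown (any double-quote byte in the first 4096 bytes) is an assumption about how the sampling works, and `has_quotes` and `scan_quoted` are hypothetical names.

```python
def has_quotes(data: bytes, sample_size: int = 4096) -> bool:
    """Assumed detection rule: look for a double quote in the leading sample."""
    return b'"' in data[:sample_size]

def scan_quoted(data: bytes, delim: bytes = b","):
    """Quote-aware scan: skip delimiters and newlines that sit inside quoted fields."""
    d = delim[0]
    in_quotes = False
    delim_pos, newline_pos = [], []
    for i, byte in enumerate(data):
        if byte == 0x22:  # '"' toggles the state; an escaped "" toggles twice
            in_quotes = not in_quotes
        elif not in_quotes:
            if byte == d:
                delim_pos.append(i)
            elif byte == 0x0A:  # '\n'
                newline_pos.append(i)
    return delim_pos, newline_pos

sample = b'a,"b,c"\n"x\ny",z\n'
quoted = has_quotes(sample)
delims, newlines = scan_quoted(sample)
```

In `sample`, the comma at offset 4 and the newline at offset 10 are inside quotes, so only the structural positions at 1, 13 (delimiters) and 7, 15 (newlines) are reported. A fast scan would report all of them, which is why forcing --no-quotes on a file with quoted fields can split rows incorrectly.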

For large files (>128 MB), a two-pass strategy avoids over-allocation: count_positions_quoted counts positions first, then exact-sized buffers are allocated for the SIMD scan pass.
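The count-then-allocate pattern looks roughly like the sketch below, written in plain Python with the standard `array` module. The helper names `count_positions` and `scan_into` are hypothetical stand-ins, not the real kernel entry points, and the 128 MB constant simply restates the threshold mentioned above.

```python
from array import array

LARGE_FILE_BYTES = 128 * 1024 * 1024  # threshold quoted in the text

def count_positions(data: bytes, delim: bytes = b","):
    """Pass 1: count structural bytes outside quotes without storing offsets."""
    d = delim[0]
    in_quotes = False
    n_delims = n_newlines = 0
    for byte in data:
        if byte == 0x22:  # '"'
            in_quotes = not in_quotes
        elif not in_quotes:
            if byte == d:
                n_delims += 1
            elif byte == 0x0A:  # '\n'
                n_newlines += 1
    return n_delims, n_newlines

def scan_into(data: bytes, delim_out, newline_out, delim: bytes = b","):
    """Pass 2: write offsets into buffers that were sized by pass 1."""
    d = delim[0]
    in_quotes = False
    di = ni = 0
    for i, byte in enumerate(data):
        if byte == 0x22:
            in_quotes = not in_quotes
        elif not in_quotes:
            if byte == d:
                delim_out[di] = i
                di += 1
            elif byte == 0x0A:
                newline_out[ni] = i
                ni += 1

data = b'a,"b,c"\n"x\ny",z\n'
n_delims, n_newlines = count_positions(data)
delim_out = array("q", [0]) * n_delims      # exact-sized 64-bit offset buffers
newline_out = array("q", [0]) * n_newlines
scan_into(data, delim_out, newline_out)
```

The trade-off is reading the file twice in exchange for buffers that hold exactly as many offsets as the file contains, instead of reserving space for a worst case.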

Override the auto-detected scan mode with --no-quotes or --quoted.

Building from source

Only needed if there's no pre-built wheel for your platform.

# Install the Ea compiler
# See https://github.com/petlukk/eacompute/releases

# Compile kernels
EA_BIN=./ea ./build_kernels.sh

# Install
pip install -e .

Requirements

  • Python 3.9+
  • NumPy

