eastat
CSV column statistics powered by Ea SIMD kernels.
Computes count, mean, stddev, min, max, p25, p50, p75 for every numeric column. String columns get length statistics.
Install
pip install eastat
Pre-built wheels include compiled SIMD kernels for Linux x86_64, Linux aarch64, and Windows x86_64. No compiler needed.
Usage
eastat data.csv
eastat --json data.csv
eastat -d '\t' data.tsv
eastat -c 0,2,4 data.csv
eastat --no-quotes data.csv # force fast scan (skip quote detection)
eastat --quoted data.csv # force quote-aware scan
Or from Python:
from eastat import process
results, headers, n_rows, col_count, timings = process("data.csv")
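To cross-check eastat's numeric output on a small file, the same eight statistics can be computed independently with NumPy. This is a sketch, not eastat's implementation: `reference_stats` is a hypothetical helper, and both the population standard deviation (`ddof=0`) and linear-interpolation percentiles are assumptions, since the conventions eastat uses are not stated here. Small differences are also plausible because the stats kernel is described as a 32-bit float reduction.

```python
import numpy as np

def reference_stats(values):
    """Hypothetical helper: reference statistics for one numeric column."""
    x = np.asarray(values, dtype=np.float64)
    # NumPy's default percentile method is linear interpolation (assumption
    # that this is comparable to what eastat reports).
    p25, p50, p75 = np.percentile(x, [25, 50, 75])
    return {
        "count": int(x.size),
        "mean": float(x.mean()),
        "stddev": float(x.std()),  # population stddev, ddof=0 (assumption)
        "min": float(x.min()),
        "max": float(x.max()),
        "p25": float(p25),
        "p50": float(p50),
        "p75": float(p75),
    }

ref = reference_stats([1, 2, 3, 4, 5])
```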
How it works
Four Ea kernels form a zero-copy pipeline over a memory-mapped file:
| Kernel | What it does |
|---|---|
| csv_scan | AVX2 structural scanner: finds delimiter and newline positions using u8x32 comparison + movemask. Two modes: fast (no quotes) and quote-aware. Includes count_positions_quoted for the two-pass large-file strategy. |
| csv_layout | Builds row boundary arrays and a per-row delimiter index via merge-scan. O(n_delims + n_rows). |
| csv_parse | Batch ASCII-to-float parser with whitespace/quote trimming. Field length stats for string columns. |
| csv_stats | f32x8 dual-accumulator FMA reduction for sum, min, max, sum-of-squares in one pass. SIMD binary-search percentiles (p25/p50/p75). |
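The data flow between the four stages can be illustrated in plain NumPy. This is a sketch of the idea only, not the Ea kernels: the function names `scan`, `layout`, `parse_column`, and `one_pass_stats` are hypothetical, `np.searchsorted` stands in for the merge-scan (same result, different complexity), and the example assumes fast mode (no quotes), a header row, and a file that ends with a newline.

```python
import numpy as np

NEWLINE = ord("\n")

def scan(buf, delim=ord(",")):
    """Stage 1 (fast mode): byte offsets of every delimiter and every newline."""
    return np.flatnonzero(buf == delim), np.flatnonzero(buf == NEWLINE)

def layout(delim_pos, nl_pos):
    """Stage 2: start offset of each row and the index of its first delimiter."""
    row_start = np.concatenate(([0], nl_pos[:-1] + 1))
    # searchsorted replaces the merge-scan here; it is simpler but not O(n).
    first_delim = np.searchsorted(delim_pos, row_start)
    return row_start, first_delim

def parse_column(data, col, delim_pos, nl_pos, row_start, first_delim):
    """Stage 3: parse one column of the data rows (row 0 is the header)."""
    # Appending len(delim_pos) gives the last row an end marker as well.
    delim_end = np.append(first_delim[1:], len(delim_pos))
    values = []
    for r in range(1, len(nl_pos)):
        n_delims = delim_end[r] - first_delim[r]
        lo = row_start[r] if col == 0 else delim_pos[first_delim[r] + col - 1] + 1
        hi = delim_pos[first_delim[r] + col] if col < n_delims else nl_pos[r]
        values.append(float(data[lo:hi].strip()))  # trims surrounding whitespace
    return np.array(values, dtype=np.float32)

def one_pass_stats(x):
    """Stage 4: mean and stddev derived from sum and sum-of-squares."""
    x = x.astype(np.float64)  # float64 keeps the check stable; the kernel is f32
    # These accumulators could all be updated in a single loop over the data.
    n, s, ss = x.size, x.sum(), np.dot(x, x)
    mean = s / n
    var = max(ss / n - mean * mean, 0.0)  # population variance (assumption)
    return {"count": n, "mean": mean, "stddev": var ** 0.5,
            "min": x.min(), "max": x.max()}

data = b"a,b\n1,2.5\n3,4.5\n5,6.5\n"
buf = np.frombuffer(data, dtype=np.uint8)  # zero-copy view over the bytes
delim_pos, nl_pos = scan(buf)
row_start, first_delim = layout(delim_pos, nl_pos)
col_b = parse_column(data, 1, delim_pos, nl_pos, row_start, first_delim)
stats_b = one_pass_stats(col_b)
```

`np.frombuffer` plays the role of the memory-mapped view: the scan and layout stages only produce offset arrays, and field bytes are read in place during parsing.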
Scan modes
eastat auto-detects whether the CSV contains quoted fields by sampling the first 4 KB:
- Fast scan: no quote handling. SIMD chunk-skip via movemask. Best throughput.
- Quoted scan: tracks quote state to ignore delimiters/newlines inside quoted fields.
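The difference between the two modes can be sketched with a quote-parity mask. This is one common way to track quote state, shown here in NumPy as an assumption. It is not necessarily how the csv_scan kernel does it, and `scan_quoted` is a hypothetical name. A byte counts as inside a quoted field when an odd number of quote characters has been seen up to that point, and a doubled quote (`""`) toggles the state twice, so it leaves the state unchanged.

```python
import numpy as np

def scan_quoted(buf, delim=ord(","), quote=ord('"')):
    """Offsets of delimiters and newlines that lie outside quoted fields."""
    # Running count of quote bytes; odd parity means "inside quotes".
    inside = np.cumsum(buf == quote) % 2 == 1
    delim_pos = np.flatnonzero((buf == delim) & ~inside)
    nl_pos = np.flatnonzero((buf == ord("\n")) & ~inside)
    return delim_pos, nl_pos

# The comma at offset 4 and the newline at offset 12 sit inside quotes.
sample = np.frombuffer(b'a,"b,c"\n1,"x\ny"\n', dtype=np.uint8)
q_delims, q_newlines = scan_quoted(sample)
```

A fast scan of the same bytes would report three delimiters and three newlines, which would split both quoted fields incorrectly.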
For large files (>128 MB), a two-pass strategy avoids over-allocation: count_positions_quoted counts positions first, then exact-sized buffers are allocated for the SIMD scan pass.
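The count-then-fill idea can be sketched as follows. This is an illustration under assumptions, not the actual kernel interface: `scan_two_pass` and `_structural_mask` are hypothetical names, delimiters and newlines share one output array for brevity, and the chunk size is arbitrary. The quote state is carried from chunk to chunk so that both passes agree on which bytes are structural.

```python
import numpy as np

QUOTE, NEWLINE = ord('"'), ord("\n")

def _structural_mask(chunk, delim, in_quote):
    """Mask of structural bytes in chunk, plus the quote state after it."""
    parity = (np.cumsum(chunk == QUOTE) + int(in_quote)) % 2
    mask = ((chunk == delim) | (chunk == NEWLINE)) & (parity == 0)
    end_state = bool(parity[-1]) if chunk.size else in_quote
    return mask, end_state

def scan_two_pass(buf, delim=ord(","), chunk_size=1 << 16):
    # Pass 1: count structural positions without storing them.
    total, state = 0, False
    for off in range(0, buf.size, chunk_size):
        mask, state = _structural_mask(buf[off:off + chunk_size], delim, state)
        total += int(np.count_nonzero(mask))

    # Pass 2: allocate exactly `total` slots, then fill them.
    positions = np.empty(total, dtype=np.int64)
    filled, state = 0, False
    for off in range(0, buf.size, chunk_size):
        mask, state = _structural_mask(buf[off:off + chunk_size], delim, state)
        idx = np.flatnonzero(mask) + off
        positions[filled:filled + idx.size] = idx
        filled += idx.size
    return positions

big = np.frombuffer(b'a,"b,c"\n1,"x\ny"\n', dtype=np.uint8)
# A tiny chunk size forces quoted fields to straddle chunk boundaries.
two_pass_positions = scan_two_pass(big, chunk_size=3)
```

The trade-off is reading the input twice in exchange for never allocating a worst-case buffer with one slot per input byte.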
Override with --no-quotes or --quoted.
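A minimal sketch of the detection step, assuming it amounts to checking the first 4 KB for a quote character. The function name `choose_scan_mode` and its `force` parameter are hypothetical; `force` only mirrors the effect of the two command-line overrides and is not part of eastat's Python API as documented here.

```python
def choose_scan_mode(path, force=None, sample_size=4096, quote=b'"'):
    """Return "quoted" or "fast" for the file at path."""
    # An explicit choice wins, like --quoted / --no-quotes on the command line.
    if force in ("quoted", "fast"):
        return force
    # Otherwise look only at the first sample_size bytes of the file.
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    return "quoted" if quote in sample else "fast"
```

Under this assumption, a file whose first quote appears after the sampled prefix would be classified as fast, which is the kind of case the overrides can correct.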
Building from source
Only needed if there's no pre-built wheel for your platform.
# Install the Ea compiler
# See https://github.com/petlukk/eacompute/releases
# Compile kernels
EA_BIN=./ea ./build_kernels.sh
# Install
pip install -e .
Requirements
- Python 3.9+
- NumPy