File to Analysis — Automatically perform descriptive statistical analysis and visualization from any data source
f2a - File to Analysis
One line of code to analyze any data file. Automatic descriptive statistics, visualizations, and interactive HTML reports from 24+ file formats and HuggingFace datasets.
Installation
```bash
pip install f2a
```
All formats (HuggingFace, Excel, Parquet, SPSS, DuckDB, etc.) are supported out of the box. No extras needed.
Quick Start
```python
import f2a

# Local file
report = f2a.analyze("data/sales.csv")
report.show()              # Print summary to console
report.to_html("output/")  # Save interactive HTML report

# HuggingFace dataset (multiple input styles)
report = f2a.analyze("https://huggingface.co/datasets/imdb")
report = f2a.analyze("hf://imdb")
report = f2a.analyze("imdb")  # bare dataset ID or org/dataset pattern, auto-detected

# Access results programmatically
report.stats.summary             # Summary statistics (DataFrame)
report.stats.correlation_matrix  # Correlation matrix
report.schema.columns            # Column type information
report.to_dict()                 # Everything as a dictionary
```
Multi-Subset HuggingFace Datasets
Datasets with multiple subsets (configs) and splits are automatically discovered and analyzed individually.
```python
# Auto-discover all subsets x splits
report = f2a.analyze("https://huggingface.co/datasets/FINAL-Bench/ALL-Bench-Leaderboard")
print(f"Total: {report.shape[0]} rows across {len(report.subsets)} subsets")
for s in report.subsets:
    print(f"  {s.subset}/{s.split}: {s.shape[0]} rows x {s.shape[1]} cols")

# Load specific subset via URL path
report = f2a.analyze("https://huggingface.co/datasets/FINAL-Bench/ALL-Bench-Leaderboard/viewer/agent")

# Or via explicit parameters
report = f2a.analyze("FINAL-Bench/ALL-Bench-Leaderboard", config="agent", split="train")
```
The HTML report includes tabbed navigation so each subset/split gets its own analysis page.
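The three input styles shown earlier (full URL, `hf://` prefix, bare identifier) and the `/viewer/<subset>` URL form all have to be reduced to a dataset ID plus an optional subset. The following is a minimal sketch of how such normalization could work. `HFSource` and `parse_hf_source` are hypothetical names invented for this illustration; they are not part of f2a's documented API, and f2a's actual parsing may differ.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

HF_URL_PREFIX = "/datasets/"


@dataclass
class HFSource:
    """Hypothetical container, not an f2a class."""
    repo_id: str                  # e.g. "imdb" or "org/dataset"
    config: Optional[str] = None  # subset name, if the input pinned one


def parse_hf_source(source: str) -> HFSource:
    """Hypothetical helper: normalize a HuggingFace source string."""
    # Style 1: hf://<dataset-id>
    if source.startswith("hf://"):
        return HFSource(repo_id=source[len("hf://"):].strip("/"))

    # Style 2: https://huggingface.co/datasets/<dataset-id>[/viewer/<subset>]
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc == "huggingface.co":
        if not parsed.path.startswith(HF_URL_PREFIX):
            raise ValueError(f"not a dataset URL: {source}")
        parts = [p for p in parsed.path[len(HF_URL_PREFIX):].split("/") if p]
        config = None
        if "viewer" in parts:
            i = parts.index("viewer")
            rest = parts[i + 1:]
            config = rest[0] if rest else None
            parts = parts[:i]
        if not parts:
            raise ValueError(f"missing dataset ID: {source}")
        return HFSource(repo_id="/".join(parts), config=config)

    # Style 3: bare identifier such as "imdb" or "org/dataset"
    return HFSource(repo_id=source.strip("/"))
```

Returning the subset separately from the dataset ID mirrors the explicit `config=` parameter, so a URL with a `/viewer/agent` suffix and a call with `config="agent"` would resolve to the same request.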
HTML Report
report.to_html() generates a self-contained HTML file with:
- Overview cards - row count, column count, type breakdown, memory usage
- Summary statistics table - horizontally scrollable with drag support and sticky first column
- Visualizations - distribution histograms, boxplots, correlation heatmap, missing data matrix
- Warnings - high correlation alerts, high missing ratio alerts
- Tabbed UI for multi-subset datasets
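To make the two warning types concrete, here is a rough sketch of how high-missing-ratio and high-correlation alerts could be derived from raw columns. The function names and both thresholds (50% missing, |r| >= 0.9) are assumptions chosen for illustration; the README does not state which cut-offs or message formats f2a uses.

```python
import math
from itertools import combinations

# Assumed cut-offs; the README does not state the values f2a uses.
MISSING_THRESHOLD = 0.5
CORRELATION_THRESHOLD = 0.9


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def collect_warnings(columns):
    """columns maps a column name to a list of numbers (None/NaN = missing)."""
    warnings = []
    # Per-column missing ratio
    for name, values in columns.items():
        ratio = sum(map(is_missing, values)) / len(values) if values else 0.0
        if ratio > MISSING_THRESHOLD:
            warnings.append(f"{name}: {ratio:.0%} missing")
    # Pairwise correlation on rows where both values are present
    for (a, xs), (b, ys) in combinations(columns.items(), 2):
        pairs = [(x, y) for x, y in zip(xs, ys)
                 if not (is_missing(x) or is_missing(y))]
        if len(pairs) < 2:
            continue
        r = pearson([x for x, _ in pairs], [y for _, y in pairs])
        if abs(r) >= CORRELATION_THRESHOLD:  # False when r is NaN
            warnings.append(f"{a} ~ {b}: |r| = {abs(r):.2f}")
    return warnings
```

For example, `collect_warnings({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [None, None, None, 1.0]})` flags column `c` as mostly missing and the pair `a`/`b` as highly correlated.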
Supported Formats (24+)
| Category | Formats |
|---|---|
| Delimited | .csv .tsv .txt .dat .tab .fwf |
| JSON | .json .jsonl .ndjson |
| Spreadsheet | .xlsx .xls .xlsm .xlsb |
| OpenDocument | .ods |
| Columnar | .parquet .pq .feather .ftr .arrow .ipc .orc |
| HDF5 | .hdf .hdf5 .h5 |
| Statistical | .dta (Stata) .sas7bdat .xpt (SAS) .sav .zsav (SPSS) |
| Database | .sqlite .sqlite3 .db .duckdb |
| Pickle | .pkl .pickle |
| Markup | .xml .html .htm |
| HuggingFace | hf:// / URL / org/dataset |
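A table like this maps naturally onto an extension lookup. The sketch below shows one way a loader could be selected by file suffix; `FORMAT_CATEGORIES` and `format_category` are hypothetical names for this illustration, and f2a's real detection logic is not described in this README.

```python
from pathlib import Path

# Extensions copied from the table above, grouped by category.
FORMAT_CATEGORIES = {
    "Delimited": {".csv", ".tsv", ".txt", ".dat", ".tab", ".fwf"},
    "JSON": {".json", ".jsonl", ".ndjson"},
    "Spreadsheet": {".xlsx", ".xls", ".xlsm", ".xlsb"},
    "OpenDocument": {".ods"},
    "Columnar": {".parquet", ".pq", ".feather", ".ftr", ".arrow", ".ipc", ".orc"},
    "HDF5": {".hdf", ".hdf5", ".h5"},
    "Statistical": {".dta", ".sas7bdat", ".xpt", ".sav", ".zsav"},
    "Database": {".sqlite", ".sqlite3", ".db", ".duckdb"},
    "Pickle": {".pkl", ".pickle"},
    "Markup": {".xml", ".html", ".htm"},
}

# Invert to a flat extension -> category map for O(1) lookup.
EXTENSION_TO_CATEGORY = {
    ext: category
    for category, extensions in FORMAT_CATEGORIES.items()
    for ext in extensions
}


def format_category(path: str) -> str:
    """Hypothetical helper: return the table category for a file path."""
    ext = Path(path).suffix.lower()  # case-insensitive match on the last suffix
    if ext not in EXTENSION_TO_CATEGORY:
        raise ValueError(f"unsupported file extension: {ext!r}")
    return EXTENSION_TO_CATEGORY[ext]
```

Here `format_category("data/sales.csv")` returns `"Delimited"`. HuggingFace sources are omitted because they are identified by URL or dataset ID rather than by file extension.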
Analysis Features
| Feature | Details |
|---|---|
| Descriptive Statistics | Mean, median, std, min/max, quartiles, unique count, mode |
| Distribution Analysis | Skewness, kurtosis, normality tests |
| Correlation Analysis | Pearson, Spearman, multicollinearity warnings |
| Missing Data | Per-column missing ratio, overall missing alerts |
| Type Inference | Auto-detect numeric, categorical, text, datetime, boolean |
| Visualization | Histograms, boxplots, correlation heatmaps, missing data matrix |
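As a reference for what the descriptive and distribution rows mean, the following sketch computes those quantities for one numeric column using only the Python standard library. It is not f2a's implementation: the function name `describe`, the inclusive quartile method, and the population-moment definitions of skewness and excess kurtosis are assumptions, and other tools may use different conventions.

```python
import statistics


def describe(values):
    """Descriptive statistics for a list of at least two numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Quartiles by linear interpolation, treating the data as the full population.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

    # Shape measures from population central moments (one common convention).
    pstdev = statistics.pstdev(values)
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / pstdev ** 3 if pstdev else float("nan")
    excess_kurtosis = m4 / pstdev ** 4 - 3 if pstdev else float("nan")

    return {
        "count": n,
        "mean": mean,
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "unique": len(set(values)),
        "mode": statistics.mode(values),  # first mode if several tie
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }
```

Calling `describe([1, 2, 3, 4, 5])` gives a mean and median of 3, quartiles of 2 and 4, and a skewness of 0, as expected for a symmetric sample.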
API Reference
f2a.analyze(source, **kwargs)
| Parameter | Description |
|---|---|
| `source` | File path, URL, or HuggingFace dataset identifier |
| `config` | HuggingFace dataset config/subset name (optional) |
| `split` | HuggingFace dataset split name (optional) |
| `**kwargs` | Additional arguments passed to the data loader |
Returns: AnalysisReport
AnalysisReport
| Attribute / Method | Description |
|---|---|
| `.shape` | (rows, columns) tuple |
| `.schema` | Column types and metadata |
| `.stats` | Statistical analysis results |
| `.viz` | Visualization access |
| `.subsets` | List of SubsetReport (multi-subset HF datasets) |
| `.warnings` | List of warning messages |
| `.show()` | Print summary to console |
| `.to_html(output_dir)` | Save interactive HTML report |
| `.to_dict()` | Export all results as dictionary |
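The attributes above are enough to build a custom text summary, for instance for logging in a pipeline. The sketch below relies only on `.shape`, `.subsets`, and `.warnings`, plus the `.subset` and `.split` fields used in the multi-subset example. The `summarize` helper is hypothetical, and the object passed to it here is a hand-built stand-in rather than a real `AnalysisReport`, so the snippet runs without f2a installed; the values in it are made up.

```python
from types import SimpleNamespace


def summarize(report) -> str:
    """Hypothetical helper: one-screen text summary of a report-like object."""
    rows, cols = report.shape
    lines = [f"{rows} rows x {cols} columns"]
    # .subsets is only populated for multi-subset HuggingFace datasets.
    for s in getattr(report, "subsets", None) or []:
        lines.append(f"  {s.subset}/{s.split}: {s.shape[0]} rows x {s.shape[1]} cols")
    for message in report.warnings:
        lines.append(f"warning: {message}")
    return "\n".join(lines)


# Stand-in object with invented values, used in place of f2a.analyze(...).
fake_report = SimpleNamespace(
    shape=(150, 4),
    subsets=[SimpleNamespace(subset="agent", split="train", shape=(150, 4))],
    warnings=["example warning text"],
)
print(summarize(fake_report))
```

With a real report, the same function would be called as `summarize(f2a.analyze(source))`, assuming the attributes behave as documented in the table.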
License
MIT License - See LICENSE for details.