File to Analysis: automatically perform descriptive statistical analysis and visualization from any data source

f2a - File to Analysis

One line of code to analyze any data file. Automatic descriptive statistics, visualizations, and interactive HTML reports from 24+ file formats and HuggingFace datasets.


Installation

pip install f2a

All formats (HuggingFace, Excel, Parquet, SPSS, DuckDB, etc.) are supported out of the box. No extras needed.


Quick Start

import f2a

# Local file
report = f2a.analyze("data/sales.csv")
report.show()                    # Print summary to console
report.to_html("output/")       # Save interactive HTML report

# HuggingFace dataset (multiple input styles)
report = f2a.analyze("https://huggingface.co/datasets/imdb")
report = f2a.analyze("hf://imdb")
report = f2a.analyze("imdb")    # Bare dataset ID; org/dataset IDs are auto-detected too

# Access results programmatically
report.stats.summary             # Summary statistics (DataFrame)
report.stats.correlation_matrix  # Correlation matrix
report.schema.columns            # Column type information
report.to_dict()                 # Everything as a dictionary
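
The Quick Start accepts local paths, Hub URLs, hf:// strings, and dataset IDs through the same call. The sketch below is one plausible way such input styles could be told apart. It is not f2a's actual detection logic: classify_source, HF_URL_PREFIX, and HF_ID_PATTERN are hypothetical names, and the rules are assumptions made for illustration.

```python
import re

HF_URL_PREFIX = "https://huggingface.co/datasets/"
# Hypothetical pattern: "name" or "org/name" made of letters, digits, "_", "." and "-".
HF_ID_PATTERN = re.compile(r"^[\w.-]+(/[\w.-]+)?$")


def classify_source(source: str) -> str:
    """Return "huggingface" or "file" for a source string (illustrative rules only)."""
    if source.startswith("hf://") or source.startswith(HF_URL_PREFIX):
        return "huggingface"
    # A dot inside the last path segment is treated as a file extension.
    last_segment = source.replace("\\", "/").rsplit("/", 1)[-1]
    if "." in last_segment.strip("."):
        return "file"
    # At most one "/" and no extension: assume a dataset ID.
    if HF_ID_PATTERN.match(source):
        return "huggingface"
    return "file"


for example in ("data/sales.csv", "hf://imdb", "imdb", "data/raw/sales"):
    print(example, "->", classify_source(example))
```

A real loader would likely also check whether the path exists on disk before falling back to the Hub, since a bare name such as "imdb" is ambiguous.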

Multi-Subset HuggingFace Datasets

Datasets with multiple subsets (configs) and splits are automatically discovered and analyzed individually.

# Auto-discover all subsets x splits
report = f2a.analyze("https://huggingface.co/datasets/FINAL-Bench/ALL-Bench-Leaderboard")
print(f"Total: {report.shape[0]} rows across {len(report.subsets)} subsets")

for s in report.subsets:
    print(f"  {s.subset}/{s.split}: {s.shape[0]} rows x {s.shape[1]} cols")

# Load specific subset via URL path
report = f2a.analyze("https://huggingface.co/datasets/FINAL-Bench/ALL-Bench-Leaderboard/viewer/agent")

# Or via explicit parameters
report = f2a.analyze("FINAL-Bench/ALL-Bench-Leaderboard", config="agent", split="train")

The HTML report includes tabbed navigation so each subset/split gets its own analysis page.
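
To illustrate how a subset can be read from a URL path such as the /viewer/agent example above, here is a minimal sketch using only the standard library. parse_hf_dataset_url is a hypothetical helper, not part of f2a, and it assumes the segment right after "viewer" names the config.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def parse_hf_dataset_url(url: str) -> Tuple[str, Optional[str]]:
    """Split a dataset URL into (repo_id, config); config is None when absent."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if not parts or parts[0] != "datasets":
        raise ValueError(f"not a dataset URL: {url}")
    parts = parts[1:]
    config = None
    if "viewer" in parts:
        idx = parts.index("viewer")
        # Assumption: the segment right after "viewer" is the config name.
        if idx + 1 < len(parts):
            config = parts[idx + 1]
        parts = parts[:idx]
    if not parts:
        raise ValueError(f"no dataset ID in URL: {url}")
    return "/".join(parts), config


repo_id, config = parse_hf_dataset_url(
    "https://huggingface.co/datasets/FINAL-Bench/ALL-Bench-Leaderboard/viewer/agent"
)
print(repo_id, config)
```

The resulting pair corresponds to the explicit form shown above, where the dataset ID is the first argument and the subset is passed as config.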


HTML Report

report.to_html(output_dir) writes a self-contained HTML file to the given directory. The report contains:

  • Overview cards - row count, column count, type breakdown, memory usage
  • Summary statistics table - horizontally scrollable with drag support and sticky first column
  • Visualizations - distribution histograms, boxplots, correlation heatmap, missing data matrix
  • Warnings - high correlation alerts, high missing ratio alerts
  • Tabbed UI for multi-subset datasets

Supported Formats (24+)

Category      Formats
Delimited     .csv .tsv .txt .dat .tab .fwf
JSON          .json .jsonl .ndjson
Spreadsheet   .xlsx .xls .xlsm .xlsb
OpenDocument  .ods
Columnar      .parquet .pq .feather .ftr .arrow .ipc .orc
HDF5          .hdf .hdf5 .h5
Statistical   .dta (Stata), .sas7bdat .xpt (SAS), .sav .zsav (SPSS)
Database      .sqlite .sqlite3 .db .duckdb
Pickle        .pkl .pickle
Markup        .xml .html .htm
HuggingFace   hf:// prefix, dataset URL, or org/dataset ID
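
As an illustration of how the extensions above map to categories, the following sketch builds a lookup from the table and resolves a path by its suffix. FORMAT_CATEGORIES and format_category are hypothetical names. How f2a actually selects a loader is not documented here, so treat this as an assumption.

```python
from pathlib import Path
from typing import Optional

# Extension groups copied from the table above.
FORMAT_CATEGORIES = {
    "Delimited": [".csv", ".tsv", ".txt", ".dat", ".tab", ".fwf"],
    "JSON": [".json", ".jsonl", ".ndjson"],
    "Spreadsheet": [".xlsx", ".xls", ".xlsm", ".xlsb"],
    "OpenDocument": [".ods"],
    "Columnar": [".parquet", ".pq", ".feather", ".ftr", ".arrow", ".ipc", ".orc"],
    "HDF5": [".hdf", ".hdf5", ".h5"],
    "Statistical": [".dta", ".sas7bdat", ".xpt", ".sav", ".zsav"],
    "Database": [".sqlite", ".sqlite3", ".db", ".duckdb"],
    "Pickle": [".pkl", ".pickle"],
    "Markup": [".xml", ".html", ".htm"],
}

# Invert the table so each extension points at its category.
EXTENSION_TO_CATEGORY = {
    ext: category for category, exts in FORMAT_CATEGORIES.items() for ext in exts
}


def format_category(path: str) -> Optional[str]:
    """Return the table category for a file path, or None if the suffix is unknown."""
    return EXTENSION_TO_CATEGORY.get(Path(path).suffix.lower())


print(format_category("data/sales.csv"))
print(format_category("survey.SAV"))
print(len(EXTENSION_TO_CATEGORY), "extensions listed")
```

Lowercasing the suffix makes the lookup case-insensitive, so SURVEY.SAV and survey.sav resolve to the same category.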

Analysis Features

Feature                 Details
Descriptive Statistics  Mean, median, std, min/max, quartiles, unique count, mode
Distribution Analysis   Skewness, kurtosis, normality tests
Correlation Analysis    Pearson, Spearman, multicollinearity warnings
Missing Data            Per-column missing ratio, overall missing alerts
Type Inference          Auto-detect numeric, categorical, text, datetime, boolean
Visualization           Histograms, boxplots, correlation heatmaps, missing data matrix
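
To make the descriptive and distribution measures concrete, here is a minimal standard-library sketch that computes them for one numeric column. describe is a hypothetical helper. The quartile method and the population skewness formula are choices made for this example and may differ from what f2a reports.

```python
import statistics
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Summarize one numeric column with the standard library only."""
    mean = statistics.fmean(values)
    # Quartiles via the "inclusive" method (min and max are the 0th and 100th percentiles).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Population skewness: third central moment divided by variance ** 1.5.
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
        "unique": len(set(values)),
        "mode": statistics.mode(values),
        "skewness": m3 / m2 ** 1.5 if m2 else 0.0,
    }


summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
for key, value in summary.items():
    print(f"{key}: {value}")
```

For this sample the mean is 5.0, the median is 4.5, and the skewness is positive because the values 7 and 9 pull the right tail out.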

API Reference

f2a.analyze(source, **kwargs)

Parameter  Description
source     File path, URL, or HuggingFace dataset identifier
config     HuggingFace dataset config/subset name (optional)
split      HuggingFace dataset split name (optional)
**kwargs   Additional arguments passed to the data loader

Returns: AnalysisReport

AnalysisReport

Attribute / Method    Description
.shape                (rows, columns) tuple
.schema               Column types and metadata
.stats                Statistical analysis results
.viz                  Visualization access
.subsets              List of SubsetReport (multi-subset HF datasets)
.warnings             List of warning messages
.show()               Print summary to console
.to_html(output_dir)  Save interactive HTML report
.to_dict()            Export all results as dictionary

License

MIT License - See LICENSE for details.
