File to Analysis — Automatically perform descriptive statistical analysis and visualization from any data source

f2a - File to Analysis

One line of code to analyze any data file. Automatic descriptive statistics, visualizations, and interactive HTML reports from 24+ file formats and HuggingFace datasets.


Installation

pip install f2a

All formats (HuggingFace, Excel, Parquet, SPSS, DuckDB, etc.) are supported out of the box. No extras needed.


Quick Start

import f2a

# Local file
report = f2a.analyze("data/sales.csv")
report.show()                    # Print summary to console
report.to_html("output/")       # Save interactive HTML report

# HuggingFace dataset (multiple input styles)
report = f2a.analyze("https://huggingface.co/datasets/imdb")
report = f2a.analyze("hf://imdb")
report = f2a.analyze("imdb")    # dataset identifier auto-detected (org/dataset also works)

# Access results programmatically
report.stats.summary             # Summary statistics (DataFrame)
report.stats.correlation_matrix  # Correlation matrix
report.schema.columns            # Column type information
report.to_dict()                 # Everything as a dictionary

Multi-Subset HuggingFace Datasets

Datasets with multiple subsets (configs) and splits are automatically discovered and analyzed individually.

# Auto-discover all subsets x splits
report = f2a.analyze("https://huggingface.co/datasets/FINAL-Bench/ALL-Bench-Leaderboard")
print(f"Total: {report.shape[0]} rows across {len(report.subsets)} subsets")

for s in report.subsets:
    print(f"  {s.subset}/{s.split}: {s.shape[0]} rows x {s.shape[1]} cols")

# Load specific subset via URL path
report = f2a.analyze("https://huggingface.co/datasets/FINAL-Bench/ALL-Bench-Leaderboard/viewer/agent")

# Or via explicit parameters
report = f2a.analyze("FINAL-Bench/ALL-Bench-Leaderboard", config="agent", split="train")

The HTML report includes tabbed navigation so each subset/split gets its own analysis page.
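The input styles shown in the examples above (`hf://` prefix, full URL with an optional `/viewer/<config>` path, bare identifier) can be told apart with a little string handling. The following is a minimal sketch of that idea using only the standard library. `parse_hf_source` is a hypothetical helper written for illustration; it is not part of f2a's API, and f2a's actual detection logic may differ.

```python
from urllib.parse import urlparse


def parse_hf_source(source):
    """Split a HuggingFace source string into (dataset_id, config).

    Hypothetical helper for illustration only; not f2a's implementation.
    """
    if source.startswith("hf://"):
        return source[len("hf://"):].strip("/"), None

    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc == "huggingface.co":
        parts = [p for p in parsed.path.split("/") if p]
        # Assumed path shape: datasets/<org>/<name>[/viewer/<config>]
        if parts and parts[0] == "datasets":
            parts = parts[1:]
        if "viewer" in parts:
            idx = parts.index("viewer")
            config = parts[idx + 1] if len(parts) > idx + 1 else None
            return "/".join(parts[:idx]), config
        return "/".join(parts), None

    # Bare identifier such as "imdb" or "org/dataset"
    return source, None


print(parse_hf_source("hf://imdb"))
print(parse_hf_source(
    "https://huggingface.co/datasets/FINAL-Bench/ALL-Bench-Leaderboard/viewer/agent"
))
```

The second call returns the dataset identifier together with `"agent"` as the config, which corresponds to passing `config="agent"` explicitly.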


HTML Report

report.to_html() generates a self-contained HTML file with:

  • Overview cards - row count, column count, type breakdown, memory usage
  • Summary statistics table - horizontally scrollable with drag support and sticky first column
  • Visualizations - distribution histograms, boxplots, correlation heatmap, missing data matrix
  • Warnings - high correlation alerts, high missing ratio alerts
  • Tabbed UI for multi-subset datasets
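To make the "high missing ratio" alerts listed above concrete, here is a rough sketch of how such a warning could be derived from raw rows. The function name `missing_ratio_warnings` and the 0.5 threshold are assumptions chosen for this example; the source does not state which threshold f2a uses or how it words its warnings.

```python
def missing_ratio_warnings(rows, threshold=0.5):
    """Return warning strings for columns whose missing ratio exceeds threshold.

    `rows` is a list of dicts; a value of None (or an absent key) counts as
    missing. The default threshold of 0.5 is an assumption for illustration.
    """
    if not rows:
        return []
    columns = sorted({key for row in rows for key in row})
    warnings = []
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) is None)
        ratio = missing / len(rows)
        if ratio > threshold:
            warnings.append(f"{col}: {ratio:.0%} missing")
    return warnings


rows = [
    {"price": 10.0, "region": None},
    {"price": None, "region": None},
    {"price": 12.5, "region": "EU"},
    {"price": 11.0, "region": None},
]
print(missing_ratio_warnings(rows))  # ['region: 75% missing']
```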

Supported Formats (24+)

Category     | Formats
Delimited    | .csv .tsv .txt .dat .tab .fwf
JSON         | .json .jsonl .ndjson
Spreadsheet  | .xlsx .xls .xlsm .xlsb
OpenDocument | .ods
Columnar     | .parquet .pq .feather .ftr .arrow .ipc .orc
HDF5         | .hdf .hdf5 .h5
Statistical  | .dta (Stata), .sas7bdat .xpt (SAS), .sav .zsav (SPSS)
Database     | .sqlite .sqlite3 .db .duckdb
Pickle       | .pkl .pickle
Markup       | .xml .html .htm
HuggingFace  | hf:// URI, dataset URL, org/dataset identifier
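Dispatching on file extension is one plausible way to route a path to the right loader. The sketch below copies the extension lists from the table above into a lookup and resolves a path to its category. `FORMAT_CATEGORIES` and `format_category` are hypothetical names for illustration; how f2a actually selects a loader is not described in the source.

```python
from pathlib import Path

# Extension lists copied from the Supported Formats table.
FORMAT_CATEGORIES = {
    "Delimited": {".csv", ".tsv", ".txt", ".dat", ".tab", ".fwf"},
    "JSON": {".json", ".jsonl", ".ndjson"},
    "Spreadsheet": {".xlsx", ".xls", ".xlsm", ".xlsb"},
    "OpenDocument": {".ods"},
    "Columnar": {".parquet", ".pq", ".feather", ".ftr", ".arrow", ".ipc", ".orc"},
    "HDF5": {".hdf", ".hdf5", ".h5"},
    "Statistical": {".dta", ".sas7bdat", ".xpt", ".sav", ".zsav"},
    "Database": {".sqlite", ".sqlite3", ".db", ".duckdb"},
    "Pickle": {".pkl", ".pickle"},
    "Markup": {".xml", ".html", ".htm"},
}


def format_category(source):
    """Return the table category for a file path, or None if unrecognized."""
    suffix = Path(source).suffix.lower()  # case-insensitive match
    for category, extensions in FORMAT_CATEGORIES.items():
        if suffix in extensions:
            return category
    return None


print(format_category("data/sales.csv"))     # Delimited
print(format_category("warehouse.PARQUET"))  # Columnar
print(format_category("notes.md"))           # None
```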

Analysis Features

Feature                | Details
Descriptive Statistics | Mean, median, std, min/max, quartiles, unique count, mode
Distribution Analysis  | Skewness, kurtosis, normality tests
Correlation Analysis   | Pearson, Spearman, multicollinearity warnings
Missing Data           | Per-column missing ratio, overall missing alerts
Type Inference         | Auto-detect numeric, categorical, text, datetime, boolean
Visualization          | Histograms, boxplots, correlation heatmaps, missing data matrix
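For readers who want to see what several of the listed statistics amount to for a single numeric column, here is a small standard-library sketch. `describe` is a hypothetical function for illustration and does not reproduce f2a's implementation; in particular, the quartile convention ("inclusive" interpolation) and the population skewness formula are assumptions, and other tools may use different definitions.

```python
import statistics


def describe(values):
    """Compute a handful of descriptive statistics for a numeric column."""
    n = len(values)
    mean = statistics.fmean(values)
    pstd = statistics.pstdev(values)  # population standard deviation
    # Quartiles via linear interpolation over the observed range
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Population skewness: third central moment divided by pstd cubed
    if pstd == 0:
        skewness = float("nan")  # undefined for a constant column
    else:
        skewness = sum((v - mean) ** 3 for v in values) / (n * pstd ** 3)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
        "unique": len(set(values)),
        "mode": statistics.mode(values),
        "skewness": skewness,
    }


stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(f"{name}: {value}")
```

A positive skewness, as in this sample, indicates a longer tail on the right side of the distribution.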

API Reference

f2a.analyze(source, **kwargs)

Parameter | Description
source    | File path, URL, or HuggingFace dataset identifier
config    | HuggingFace dataset config/subset name (optional)
split     | HuggingFace dataset split name (optional)
**kwargs  | Additional arguments passed to the data loader

Returns: AnalysisReport

AnalysisReport

Attribute / Method   | Description
.shape               | (rows, columns) tuple
.schema              | Column types and metadata
.stats               | Statistical analysis results
.viz                 | Visualization access
.subsets             | List of SubsetReport (multi-subset HF datasets)
.warnings            | List of warning messages
.show()              | Print summary to console
.to_html(output_dir) | Save interactive HTML report
.to_dict()           | Export all results as dictionary
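As a usage sketch, the attributes listed above are enough to build a compact one-line summary, for example for logging in a batch job. `summarize_report` is a hypothetical helper, and the stand-in object below only mimics the documented attributes so the example runs without f2a installed. Whether `.subsets` is empty or absent for single-file sources is not stated in the source, so the helper treats both cases the same.

```python
from types import SimpleNamespace


def summarize_report(report):
    """Build a one-line summary from the documented report attributes."""
    rows, cols = report.shape
    # Assumption: .subsets may be missing or empty for non-HuggingFace sources
    subsets = getattr(report, "subsets", None) or []
    return (
        f"{rows} rows x {cols} cols | "
        f"subsets: {len(subsets)} | warnings: {len(report.warnings)}"
    )


# Stand-in for demonstration; with f2a installed, pass f2a.analyze(...) instead.
fake_report = SimpleNamespace(
    shape=(100, 5),
    subsets=[],
    warnings=["high missing ratio: region"],
)
print(summarize_report(fake_report))  # 100 rows x 5 cols | subsets: 0 | warnings: 1
```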

License

MIT License - See LICENSE for details.
