
File to Analysis — Automatically perform descriptive statistical analysis and visualization from any data source


f2a - File to Analysis

One line of code to analyze any data file. Automatic descriptive statistics, visualizations, and interactive HTML reports from 24+ file formats and HuggingFace datasets.



Installation

pip install f2a

All formats (HuggingFace, Excel, Parquet, SPSS, DuckDB, etc.) are supported out of the box. No extras needed.


Quick Start

import f2a

# Local file
report = f2a.analyze("data/sales.csv")
report.show()                    # Print summary to console
report.to_html("output/")       # Save interactive HTML report

# HuggingFace dataset (multiple input styles)
report = f2a.analyze("https://huggingface.co/datasets/imdb")
report = f2a.analyze("hf://imdb")
report = f2a.analyze("imdb")    # bare dataset name; org/dataset identifiers are auto-detected too

# Access results programmatically
report.stats.summary             # Summary statistics (DataFrame)
report.stats.correlation_matrix  # Correlation matrix
report.schema.columns            # Column type information
report.to_dict()                 # Everything as a dictionary

Multi-Subset HuggingFace Datasets

Datasets with multiple subsets (configs) and splits are automatically discovered and analyzed individually.

# Auto-discover all subsets x splits
report = f2a.analyze("https://huggingface.co/datasets/FINAL-Bench/ALL-Bench-Leaderboard")
print(f"Total: {report.shape[0]} rows across {len(report.subsets)} subsets")

for s in report.subsets:
    print(f"  {s.subset}/{s.split}: {s.shape[0]} rows x {s.shape[1]} cols")

# Load specific subset via URL path
report = f2a.analyze("https://huggingface.co/datasets/FINAL-Bench/ALL-Bench-Leaderboard/viewer/agent")

# Or via explicit parameters
report = f2a.analyze("FINAL-Bench/ALL-Bench-Leaderboard", config="agent", split="train")
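
The input styles shown above (full URL, hf:// prefix, bare or org/dataset identifier, and a /viewer/<config> URL path) could be normalized along the following lines. This is a minimal sketch and not f2a's actual implementation; HFSource and parse_hf_source are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class HFSource:
    """Parsed HuggingFace dataset reference (hypothetical helper type)."""
    dataset_id: str
    config: Optional[str] = None


def parse_hf_source(source: str) -> HFSource:
    """Normalize a source string into a dataset id and an optional config."""
    if source.startswith("hf://"):
        return HFSource(source[len("hf://"):].strip("/"))

    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc == "huggingface.co":
        parts = [p for p in parsed.path.split("/") if p]
        if not parts or parts[0] != "datasets":
            raise ValueError(f"not a dataset URL: {source}")
        parts = parts[1:]
        config = None
        if "viewer" in parts:
            idx = parts.index("viewer")
            # The segment right after /viewer/ names the subset (config), if present.
            config = parts[idx + 1] if idx + 1 < len(parts) else None
            parts = parts[:idx]
        if not parts:
            raise ValueError(f"no dataset id in URL: {source}")
        return HFSource("/".join(parts), config)

    # Anything else is treated as a bare name or an org/dataset identifier.
    return HFSource(source.strip("/"))
```

Under these assumptions, an explicit config= argument would simply take precedence over whatever the URL path yields.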

The HTML report includes tabbed navigation so each subset/split gets its own analysis page.


HTML Report

report.to_html() generates a self-contained HTML file with:

  • Overview cards - row count, column count, type breakdown, memory usage
  • Summary statistics table - horizontally scrollable with drag support and sticky first column
  • Visualizations - distribution histograms, boxplots, correlation heatmap, missing data matrix
  • Warnings - high correlation alerts, high missing ratio alerts
  • Tabbed UI for multi-subset datasets
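
To illustrate what the high-correlation and high-missing-ratio alerts in the list above amount to, here is a small standard-library sketch. The thresholds and the function names are assumptions made for this example; the values and logic f2a actually uses are not documented here.

```python
import math
from itertools import combinations

# Hypothetical thresholds, chosen only for this illustration.
MISSING_RATIO_THRESHOLD = 0.3
CORRELATION_THRESHOLD = 0.9


def missing_ratio(values):
    """Fraction of entries that are None."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    if vx == 0 or vy == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / math.sqrt(vx * vy)


def collect_warnings(columns):
    """Return warning strings for a dict of column name -> list of numbers or None."""
    warnings = []
    for name, values in columns.items():
        ratio = missing_ratio(values)
        if ratio > MISSING_RATIO_THRESHOLD:
            warnings.append(f"high missing ratio: {name} ({ratio:.0%})")
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if not math.isnan(r) and abs(r) > CORRELATION_THRESHOLD:
            warnings.append(f"high correlation: {a} ~ {b} (r={r:.2f})")
    return warnings


data = {
    "price": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "tax": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "discount": [None, None, None, 2.0, 9.0, 4.0],
}
for message in collect_warnings(data):
    print(message)
```

With this sample data, one missing-ratio warning (discount) and one correlation warning (price and tax) are produced.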

Supported Formats (24+)

Category      Formats
------------  ----------------------------------------------------
Delimited     .csv .tsv .txt .dat .tab .fwf
JSON          .json .jsonl .ndjson
Spreadsheet   .xlsx .xls .xlsm .xlsb
OpenDocument  .ods
Columnar      .parquet .pq .feather .ftr .arrow .ipc .orc
HDF5          .hdf .hdf5 .h5
Statistical   .dta (Stata), .sas7bdat .xpt (SAS), .sav .zsav (SPSS)
Database      .sqlite .sqlite3 .db .duckdb
Pickle        .pkl .pickle
Markup        .xml .html .htm
HuggingFace   hf://, URL, org/dataset
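
As a rough illustration of how a file extension could be mapped to one of the categories in the table above, the sketch below builds a lookup from the listed extensions. The names FORMAT_CATEGORIES and detect_category are hypothetical, and f2a's real dispatch logic may differ.

```python
from pathlib import Path

# Extension groups copied from the table above (HuggingFace sources are not files).
FORMAT_CATEGORIES = {
    "Delimited": [".csv", ".tsv", ".txt", ".dat", ".tab", ".fwf"],
    "JSON": [".json", ".jsonl", ".ndjson"],
    "Spreadsheet": [".xlsx", ".xls", ".xlsm", ".xlsb"],
    "OpenDocument": [".ods"],
    "Columnar": [".parquet", ".pq", ".feather", ".ftr", ".arrow", ".ipc", ".orc"],
    "HDF5": [".hdf", ".hdf5", ".h5"],
    "Statistical": [".dta", ".sas7bdat", ".xpt", ".sav", ".zsav"],
    "Database": [".sqlite", ".sqlite3", ".db", ".duckdb"],
    "Pickle": [".pkl", ".pickle"],
    "Markup": [".xml", ".html", ".htm"],
}

# Invert the grouping so a suffix can be looked up directly.
EXTENSION_TO_CATEGORY = {
    ext: category
    for category, extensions in FORMAT_CATEGORIES.items()
    for ext in extensions
}


def detect_category(path):
    """Return the table category for a file path, or None if the suffix is unlisted."""
    return EXTENSION_TO_CATEGORY.get(Path(path).suffix.lower())


print(detect_category("data/sales.csv"))
print(detect_category("warehouse/events.PARQUET"))
```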

Analysis Features

Feature                 Details
----------------------  ---------------------------------------------------------------
Descriptive Statistics  Mean, median, std, min/max, quartiles, unique count, mode
Distribution Analysis   Skewness, kurtosis, normality tests
Correlation Analysis    Pearson, Spearman, multicollinearity warnings
Missing Data            Per-column missing ratio, overall missing alerts
Type Inference          Auto-detect numeric, categorical, text, datetime, boolean
Visualization           Histograms, boxplots, correlation heatmaps, missing data matrix
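
For readers who want to see what the descriptive statistics listed above mean for a single numeric column, here is a standard-library sketch. It is not f2a's code: the describe function, its output keys, the use of None for missing values, and the choice of population skewness are all assumptions made for this example.

```python
import statistics


def describe(values):
    """Summarize a numeric column; None marks a missing entry.

    Needs at least two present values (required by stdev and quantiles).
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    mean = statistics.fmean(present)
    pstd = statistics.pstdev(present)
    # Population skewness: third central moment divided by the cubed population std.
    third_moment = sum((v - mean) ** 3 for v in present) / len(present)
    skewness = third_moment / pstd ** 3 if pstd else 0.0
    return {
        "count": len(present),
        "missing_ratio": 1 - len(present) / len(values),
        "mean": mean,
        "median": statistics.median(present),
        "std": statistics.stdev(present),  # sample standard deviation
        "min": min(present),
        "max": max(present),
        "q1": q1,
        "q3": q3,
        "unique": len(set(present)),
        "mode": statistics.mode(present),
        "skewness": skewness,
    }


summary = describe([1, 2, 2, 3, None, 7])
for key, value in summary.items():
    print(f"{key}: {value}")
```

Quartile values depend on the interpolation method; statistics.quantiles defaults to the "exclusive" method, which may not match other tools.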

API Reference

f2a.analyze(source, **kwargs)

Parameter  Description
---------  -------------------------------------------------
source     File path, URL, or HuggingFace dataset identifier
config     HuggingFace dataset config/subset name (optional)
split      HuggingFace dataset split name (optional)
**kwargs   Additional arguments passed to the data loader

Returns: AnalysisReport

AnalysisReport

Attribute / Method    Description
--------------------  ----------------------------------------------
.shape                (rows, columns) tuple
.schema               Column types and metadata
.stats                Statistical analysis results
.viz                  Visualization access
.subsets              List of SubsetReport (multi-subset HF datasets)
.warnings             List of warning messages
.show()               Print summary to console
.to_html(output_dir)  Save interactive HTML report
.to_dict()            Export all results as dictionary
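
The attributes in the table above can be consumed by ordinary Python code. The sketch below builds a short text overview from .shape, .subsets, and .warnings. The report_overview helper is hypothetical, and a stand-in object is used so the example runs without f2a installed; it assumes that warnings can be formatted as strings and that each subset exposes .subset, .split, and .shape as in the multi-subset example earlier.

```python
from types import SimpleNamespace


def report_overview(report):
    """Build a short text overview from the documented AnalysisReport attributes."""
    rows, cols = report.shape
    lines = [f"{rows} rows x {cols} columns"]
    # .subsets is only populated for multi-subset HuggingFace datasets.
    for subset in getattr(report, "subsets", None) or []:
        lines.append(
            f"  {subset.subset}/{subset.split}: "
            f"{subset.shape[0]} rows x {subset.shape[1]} cols"
        )
    for warning in getattr(report, "warnings", None) or []:
        lines.append(f"warning: {warning}")
    return "\n".join(lines)


# Stand-in for an AnalysisReport; with the library installed, pass the value
# returned by f2a.analyze(...) instead.
fake_report = SimpleNamespace(
    shape=(120, 4),
    subsets=[SimpleNamespace(subset="agent", split="train", shape=(120, 4))],
    warnings=["high missing ratio: discount"],
)
print(report_overview(fake_report))
```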

License

MIT License - See LICENSE for details.

