File to Analysis — Automatically perform descriptive statistical analysis and visualization from any data source

f2a - File to Analysis

One line of code to analyze any data file. Automatic descriptive statistics, visualizations, and interactive HTML reports from 24+ file formats and HuggingFace datasets.


Installation

pip install f2a

All formats (HuggingFace, Excel, Parquet, SPSS, DuckDB, etc.) are supported out of the box. No extras needed.


Quick Start

import f2a

# Local file
report = f2a.analyze("data/sales.csv")
report.show()                    # Print summary to console
report.to_html("output/")       # Save interactive HTML report

# HuggingFace dataset (multiple input styles)
report = f2a.analyze("https://huggingface.co/datasets/imdb")
report = f2a.analyze("hf://imdb")
report = f2a.analyze("imdb")    # Bare dataset ID; org/dataset IDs are auto-detected too

# Access results programmatically
report.stats.summary             # Summary statistics (DataFrame)
report.stats.correlation_matrix  # Correlation matrix
report.schema.columns            # Column type information
report.to_dict()                 # Everything as a dictionary
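
To try the calls above without a dataset at hand, the sketch below writes a small CSV with the standard library and then runs the same three calls on it. The column names and the helpers `write_sample_csv` and `analyze_sample` are made up for illustration; only `f2a.analyze`, `report.show()`, and `report.to_html()` come from the Quick Start itself.

```python
import csv
from pathlib import Path


def write_sample_csv(path: Path) -> Path:
    """Write a tiny CSV with a categorical, a numeric, and a partly missing column."""
    rows = [
        {"region": "north", "units": 12, "price": "9.50"},
        {"region": "south", "units": 7, "price": ""},  # empty cell = missing value
        {"region": "north", "units": 15, "price": "11.00"},
    ]
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["region", "units", "price"])
        writer.writeheader()
        writer.writerows(rows)
    return path


def analyze_sample(workdir: Path):
    """Run the Quick Start calls on the sample file (hypothetical helper).

    Returns None when f2a is not installed, so the sketch never fails on import.
    """
    try:
        import f2a
    except ImportError:
        return None
    output_dir = workdir / "output"
    output_dir.mkdir(exist_ok=True)  # assumption: create the directory up front
    report = f2a.analyze(str(write_sample_csv(workdir / "sales.csv")))
    report.show()
    report.to_html(str(output_dir))
    return report
```

Calling `analyze_sample(Path("."))` in a scratch directory should leave `sales.csv` and an `output/` folder behind, assuming f2a is installed.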

Multi-Subset HuggingFace Datasets

Datasets with multiple subsets (configs) and splits are automatically discovered and analyzed individually.

# Auto-discover all subsets x splits
report = f2a.analyze("https://huggingface.co/datasets/FINAL-Bench/ALL-Bench-Leaderboard")
print(f"Total: {report.shape[0]} rows across {len(report.subsets)} subsets")

for s in report.subsets:
    print(f"  {s.subset}/{s.split}: {s.shape[0]} rows x {s.shape[1]} cols")

# Load specific subset via URL path
report = f2a.analyze("https://huggingface.co/datasets/FINAL-Bench/ALL-Bench-Leaderboard/viewer/agent")

# Or via explicit parameters
report = f2a.analyze("FINAL-Bench/ALL-Bench-Leaderboard", config="agent", split="train")
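
The input styles shown so far (full URL, `hf://` URI, plain ID, and a `/viewer/<config>` URL path) can all be reduced to a repository ID plus an optional config. The sketch below shows one way to do that with the standard library. It is not f2a's own parser: `HFSource` and `parse_hf_source` are hypothetical names, and the handling of the `/viewer/` segment is inferred from the example URL above.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class HFSource(NamedTuple):
    repo_id: str
    config: Optional[str]


def parse_hf_source(source: str) -> HFSource:
    """Reduce a dataset URL, hf:// URI, or plain ID to (repo_id, config)."""
    if source.startswith("hf://"):
        return HFSource(source[len("hf://"):].strip("/"), None)

    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        parts = [p for p in parsed.path.split("/") if p]
        if parts[:1] != ["datasets"]:
            raise ValueError(f"not a dataset URL: {source}")
        parts = parts[1:]
        config = None
        if "viewer" in parts:
            # Assumption: the segment after /viewer/ names the config (subset).
            idx = parts.index("viewer")
            tail = parts[idx + 1:]
            config = tail[0] if tail else None
            parts = parts[:idx]
        return HFSource("/".join(parts), config)

    # Anything else is treated as a plain dataset ID such as "imdb" or "org/dataset".
    return HFSource(source.strip("/"), None)


print(parse_hf_source("https://huggingface.co/datasets/FINAL-Bench/ALL-Bench-Leaderboard/viewer/agent"))
```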

The HTML report includes tabbed navigation so each subset/split gets its own analysis page.


HTML Report

report.to_html() generates a self-contained HTML file with:

  • Overview cards - row count, column count, type breakdown, memory usage
  • Summary statistics table - horizontally scrollable with drag support and sticky first column
  • Visualizations - distribution histograms, boxplots, correlation heatmap, missing data matrix
  • Warnings - high correlation alerts, high missing ratio alerts
  • Tabbed UI for multi-subset datasets
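
To make the two warning types concrete, here is a minimal standard-library sketch of how such alerts could be derived from column data. It is an illustration, not f2a's implementation: the thresholds (0.9 for correlation, 0.5 for the missing ratio), the message wording, and the function names are all assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional

Column = List[Optional[float]]  # None marks a missing value


def missing_ratio(values: Column) -> float:
    """Fraction of entries that are missing."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def pearson(xs: Column, ys: Column) -> float:
    """Pearson r over rows where both values are present; NaN if undefined."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def collect_warnings(
    columns: Dict[str, Column],
    corr_threshold: float = 0.9,     # assumed cut-off, not taken from f2a
    missing_threshold: float = 0.5,  # assumed cut-off, not taken from f2a
) -> List[str]:
    """Return missing-ratio alerts first, then high-correlation alerts."""
    warnings = []
    for name, values in columns.items():
        ratio = missing_ratio(values)
        if ratio > missing_threshold:
            warnings.append(f"{name}: {ratio:.0%} missing")
    for (a, xs), (b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if not math.isnan(r) and abs(r) >= corr_threshold:
            warnings.append(f"{a} ~ {b}: |r| = {abs(r):.2f}")
    return warnings
```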

Supported Formats (24+)

Category       Formats
Delimited      .csv .tsv .txt .dat .tab .fwf
JSON           .json .jsonl .ndjson
Spreadsheet    .xlsx .xls .xlsm .xlsb
OpenDocument   .ods
Columnar       .parquet .pq .feather .ftr .arrow .ipc .orc
HDF5           .hdf .hdf5 .h5
Statistical    .dta (Stata), .sas7bdat .xpt (SAS), .sav .zsav (SPSS)
Database       .sqlite .sqlite3 .db .duckdb
Pickle         .pkl .pickle
Markup         .xml .html .htm
HuggingFace    hf:// / URL / org/dataset
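
The format table can also be read as a lookup from file extension to category. The sketch below encodes it that way, which is handy for checking up front whether a file is likely to be accepted. How f2a actually picks a loader is not documented here, so `FORMAT_CATEGORIES` and `categorize` are illustrative names that only mirror the table.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the table above, grouped by category.
FORMAT_CATEGORIES = {
    "Delimited": {".csv", ".tsv", ".txt", ".dat", ".tab", ".fwf"},
    "JSON": {".json", ".jsonl", ".ndjson"},
    "Spreadsheet": {".xlsx", ".xls", ".xlsm", ".xlsb"},
    "OpenDocument": {".ods"},
    "Columnar": {".parquet", ".pq", ".feather", ".ftr", ".arrow", ".ipc", ".orc"},
    "HDF5": {".hdf", ".hdf5", ".h5"},
    "Statistical": {".dta", ".sas7bdat", ".xpt", ".sav", ".zsav"},
    "Database": {".sqlite", ".sqlite3", ".db", ".duckdb"},
    "Pickle": {".pkl", ".pickle"},
    "Markup": {".xml", ".html", ".htm"},
}


def categorize(path: str) -> Optional[str]:
    """Return the table category for a file path, or None if the extension is not listed."""
    suffix = Path(path).suffix.lower()
    for category, extensions in FORMAT_CATEGORIES.items():
        if suffix in extensions:
            return category
    return None


print(categorize("data/sales.csv"))
```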

Analysis Features

Feature                  Details
Descriptive Statistics   Mean, median, std, min/max, quartiles, unique count, mode
Distribution Analysis    Skewness, kurtosis, normality tests
Correlation Analysis     Pearson, Spearman, multicollinearity warnings
Missing Data             Per-column missing ratio, overall missing alerts
Type Inference           Auto-detect numeric, categorical, text, datetime, boolean
Visualization            Histograms, boxplots, correlation heatmaps, missing data matrix
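
For readers who want to see what several of these per-column figures mean, the sketch below computes them for one numeric column with Python's `statistics` module. It is independent of f2a, and the definitions f2a uses may differ. Three choices here are assumptions: the sample standard deviation, linearly interpolated ("inclusive") quartiles, and population skewness m3 / m2^1.5.

```python
import statistics
from typing import Dict, List, Optional


def describe(values: List[Optional[float]]) -> Dict[str, float]:
    """Summarize one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        raise ValueError("need at least two non-missing values")

    # Quartiles with linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    mean = statistics.fmean(present)

    # Population central moments, used for the skewness estimate.
    m2 = sum((v - mean) ** 2 for v in present) / len(present)
    m3 = sum((v - mean) ** 3 for v in present) / len(present)

    return {
        "mean": mean,
        "median": statistics.median(present),
        "std": statistics.stdev(present),  # sample standard deviation (n - 1)
        "min": min(present),
        "max": max(present),
        "q1": q1,
        "q3": q3,
        "unique": len(set(present)),
        "missing_ratio": (len(values) - len(present)) / len(values),
        "skewness": m3 / m2 ** 1.5 if m2 else 0.0,
    }


print(describe([1, 2, 3, 4, 5, None]))
```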

API Reference

f2a.analyze(source, **kwargs)

Parameter   Description
source      File path, URL, or HuggingFace dataset identifier
config      HuggingFace dataset config/subset name (optional)
split       HuggingFace dataset split name (optional)
**kwargs    Additional arguments passed to the data loader

Returns: AnalysisReport

AnalysisReport

Attribute / Method     Description
.shape                 (rows, columns) tuple
.schema                Column types and metadata
.stats                 Statistical analysis results
.viz                   Visualization access
.subsets               List of SubsetReport (multi-subset HF datasets)
.warnings              List of warning messages
.show()                Print summary to console
.to_html(output_dir)   Save interactive HTML report
.to_dict()             Export all results as dictionary
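
As a usage sketch, the helper below builds a short text digest from the attributes listed above. `summarize` is a hypothetical function, not part of f2a. It reads only `.shape`, `.subsets`, and `.warnings`, plus the `.subset`, `.split`, and `.shape` fields that the multi-subset example uses. So that it runs without f2a installed, it is demonstrated on a stand-in object with made-up values.

```python
from types import SimpleNamespace


def summarize(report) -> str:
    """Build a short text digest from a report-like object."""
    rows, cols = report.shape
    lines = [f"{rows} rows x {cols} columns"]
    # Assumption: .subsets may be absent or empty for single-file reports.
    for sub in getattr(report, "subsets", None) or []:
        lines.append(f"  {sub.subset}/{sub.split}: {sub.shape[0]} rows x {sub.shape[1]} cols")
    for message in report.warnings:
        lines.append(f"  warning: {message}")
    return "\n".join(lines)


# Stand-in for an AnalysisReport; every value here is invented for the demo.
demo = SimpleNamespace(
    shape=(150, 4),
    subsets=[SimpleNamespace(subset="agent", split="train", shape=(150, 4))],
    warnings=["example warning"],
)
print(summarize(demo))
```

With a real report, the call would be `summarize(f2a.analyze(source))`.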

License

MIT License - See LICENSE for details.
