
Public battery cell test data, harmonized and sealed in one schema (Parquet + JSON).


celljar



celljar reads raw files from 9 published sources - ORNL Leaf, HNEI Kollmeyer, MATR (Severson 2019), CLO (Attia 2020), BILLS eVTOL, MOHTAT 2021, NASA PCoE, SNL Preger, Naumann - and writes them to a canonical schema with four entities: cell_metadata + test_metadata (JSON), timeseries + cycle_summary (Parquet). Query all sources via one SQL statement (DuckDB / pandas / Polars).

Scope: harmonization only. celljar focuses on measurements - unit conversion and schema normalization. It deliberately leaves fitting and modeling to downstream tools that specialize in those steps.
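As a rough illustration of what "unit conversion and schema normalization" can mean for a single sample, the sketch below renames raw columns, scales them to the canonical units, and flips the current sign where a source reports discharge as positive. The raw column names and their units are hypothetical (real cycler exports differ per source), and `harmonize_row` is not a celljar function; only the canonical names `timestamp_s`, `voltage_V`, `current_A`, and `temperature_C` come from this README.

```python
# Hypothetical raw column names and units; real cycler exports vary by source.
# Each entry maps raw name -> (canonical name, multiplicative scale factor).
RAW_TO_CANONICAL = {
    "Time(min)": ("timestamp_s", 60.0),     # minutes -> seconds
    "Voltage(mV)": ("voltage_V", 1e-3),     # millivolts -> volts
    "Current(mA)": ("current_A", 1e-3),     # milliamps -> amps
    "Temp(C)": ("temperature_C", 1.0),      # already in degrees Celsius
}


def harmonize_row(raw_row, discharge_positive=False):
    """Rename columns, scale to canonical units, and keep gaps as explicit None."""
    out = {}
    for raw_name, (canonical_name, scale) in RAW_TO_CANONICAL.items():
        value = raw_row.get(raw_name)
        out[canonical_name] = None if value is None else value * scale
    # Canonical convention: positive current = charge. Flip the sign when the
    # source reports discharge as positive.
    if discharge_positive and out["current_A"] is not None:
        out["current_A"] = -out["current_A"]
    return out


raw = {"Time(min)": 1.5, "Voltage(mV)": 3650.0, "Current(mA)": 2000.0, "Temp(C)": None}
print(harmonize_row(raw, discharge_positive=True))
```

Keeping the mapping in one table makes each source's unit assumptions easy to review, and the missing temperature stays `None` rather than being filled in.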

Quick start

The full harmonized bundle lives at huggingface.co/datasets/mihnathul/celljar. Query it directly - no clone needed:

import duckdb
df = duckdb.sql("""
    SELECT * FROM 'https://huggingface.co/datasets/mihnathul/celljar/resolve/main/timeseries.parquet'
    WHERE test_id = 'ORNL_LEAF_2013_HPPC_25C'
""").df()

Pandas and Polars work the same way against the HuggingFace URL.

Browser viewer - clone the repo (a PyPI release is on the roadmap):

git clone https://github.com/mihnathul/celljar.git
cd celljar
pip install -e ".[viewer]"
streamlit run apps/viewer.py    # fetches from HuggingFace by default

Pin a release for reproducibility: CELLJAR_HF_REVISION=v0.2.1 streamlit run apps/viewer.py.
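The same pinning idea can be applied to direct queries. The sketch below builds a revision-pinned URL by swapping `main` in the Quick start URL for a revision name. It assumes the value used for `CELLJAR_HF_REVISION` (for example `v0.2.1`) is a tag or branch on the HuggingFace dataset repo; `bundle_url` is a hypothetical helper, not part of celljar, and reading the environment variable here is an illustrative choice.

```python
import os

HF_DATASET = "mihnathul/celljar"  # dataset repo named in the Quick start


def bundle_url(filename, revision=None):
    """Return a HuggingFace 'resolve' URL for one bundle file at a given revision.

    Falls back to CELLJAR_HF_REVISION, then to 'main'. Assumption: the revision
    names a tag or branch that exists on the dataset repo.
    """
    revision = revision or os.environ.get("CELLJAR_HF_REVISION", "main")
    return f"https://huggingface.co/datasets/{HF_DATASET}/resolve/{revision}/{filename}"


print(bundle_url("timeseries.parquet", revision="v0.2.1"))
```

The returned string can be dropped into the `FROM '...'` clause of the DuckDB query shown in the Quick start.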

To regenerate the bundle locally from raw sources, use the same setup, run python examples/demo_end_to_end.py, then start the viewer with CELLJAR_LOCAL=1 streamlit run apps/viewer.py.

Sources

| Source | Chemistry | Cells | Test types | Raw data |
|---|---|---|---|---|
| ORNL Leaf 2013 | mixed (LMO/NCA pouch) | 1 | HPPC × 3 temperatures | bundled |
| HNEI (Kollmeyer) | NCA (Panasonic NCR18650PF) | 1 | HPPC, drive cycle, capacity_check, cycle_aging | download |
| MATR (Severson 2019) | LFP (A123 18650) | 119 | Cycling-to-failure | download |
| CLO (Attia 2020) | LFP (A123 18650) | 45 | Cycling, BO-optimized fast-charge | download |
| BILLS / eVTOL (Bills 2023) | NMC (Sony US18650VTC6) | 22 | Drive cycle (flight-duty) + RPTs | download |
| MOHTAT (Mohtat 2021) | NMC (UMich NMC532 pouch) | 31 | Cycle aging + synchronous expansion | download |
| NASA PCoE | LCO (vendor undisclosed, 2.0 Ah 18650) | 34 | Cycle aging | download |
| SNL Preger 2020 | LFP / NMC / NCA grid (18650) | 87 | Cycle aging across T × DoD × C-rate | download |
| Naumann 2018/2020 | LFP / graphite | 17 calendar + 17 cycle | Calendar + cycle aging (summary-only) | download |

Schema

Four entities joined by cell_id and test_id:

cell_metadata.json       hardware (chemistry, capacity, form factor)
test_metadata.json       protocol, SOH, provenance, license
timeseries.parquet       V / I / T per-sample + signed running coulomb count (∫I dt)
cycle_summary.parquet    per-cycle aggregates (capacity, R_DC, …) for aging studies
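To make the join keys concrete, here is a minimal sketch with toy rows: each timeseries sample is resolved to its test via `test_id`, and each test to its cell via `cell_id`. The IDs and the `chemistry` field name are illustrative assumptions, and `attach_metadata` is a hypothetical helper; the authoritative field list is in schemas/.

```python
# Toy rows; the IDs and the "chemistry" field name are illustrative only.
cell_metadata = [
    {"cell_id": "CELL_A", "chemistry": "NCA"},
    {"cell_id": "CELL_B", "chemistry": "LFP"},
]
test_metadata = [
    {"test_id": "TEST_1", "cell_id": "CELL_A", "test_type": "hppc"},
    {"test_id": "TEST_2", "cell_id": "CELL_B", "test_type": "cycle_aging"},
]
timeseries = [
    {"test_id": "TEST_1", "timestamp_s": 0.0, "voltage_V": 4.1},
    {"test_id": "TEST_2", "timestamp_s": 0.0, "voltage_V": 3.3},
]


def attach_metadata(samples, tests, cells):
    """Resolve each sample's test via test_id, then that test's cell via cell_id."""
    tests_by_id = {t["test_id"]: t for t in tests}
    cells_by_id = {c["cell_id"]: c for c in cells}
    joined = []
    for sample in samples:
        test = tests_by_id[sample["test_id"]]
        cell = cells_by_id[test["cell_id"]]
        # Later dicts win on key clashes, so sample fields take precedence.
        joined.append({**cell, **test, **sample})
    return joined


for row in attach_metadata(timeseries, test_metadata, cell_metadata):
    print(row["test_id"], row["chemistry"], row["test_type"], row["voltage_V"])
```

In practice the same two-step lookup is a pair of SQL joins on `test_id` and `cell_id`.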

Conventions: SI units; timestamps are relative; missing data is an explicit null; current is positive on charge (into the cell) and negative on discharge.
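The signed running coulomb count follows directly from this sign convention: integrating current over time grows on charge and shrinks on discharge. The sketch below uses trapezoidal integration and returns coulombs (A·s); the integration rule, the output unit, and the function name are assumptions for illustration, not a description of how celljar computes its stored column. It also assumes the inputs contain no nulls.

```python
def coulomb_count_As(timestamp_s, current_A):
    """Signed running charge in A*s (coulombs) via trapezoidal integration.

    Positive current = charge, so the count rises while charging and falls
    while discharging. Assumes equal-length inputs with no missing samples.
    """
    if len(timestamp_s) != len(current_A):
        raise ValueError("timestamp_s and current_A must have the same length")
    if not timestamp_s:
        return []
    running = 0.0
    counts = [0.0]  # the count starts at zero at the first sample
    for k in range(1, len(timestamp_s)):
        dt = timestamp_s[k] - timestamp_s[k - 1]
        running += 0.5 * (current_A[k] + current_A[k - 1]) * dt
        counts.append(running)
    return counts


# One hour of 2 A charge, sampled every 30 minutes
print(coulomb_count_As([0.0, 1800.0, 3600.0], [2.0, 2.0, 2.0]))
```

Dividing the result by 3600 gives ampere-hours, which is often the more convenient unit when comparing against rated capacity.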

The authoritative field list and types live in schemas/ (JSON Schema). Pandera schemas mirror them for runtime validation in celljar/harmonize/harmonize_schema.py.

Querying

-- Single test's timeseries
SELECT timestamp_s, voltage_V, current_A, temperature_C
FROM 'data/harmonized/timeseries.parquet'
WHERE test_id = 'ORNL_LEAF_2013_HPPC_25C'
ORDER BY timestamp_s;

-- Cross-source filter - same query works across all sources
SELECT cell_id, test_id, temperature_C_min
FROM 'data/harmonized/tests/*.json'
WHERE test_type = 'hppc' AND temperature_C_min = 25;

The same patterns work from Python via duckdb.sql(...).df(), pd.read_parquet(..., filters=[...]), or pl.scan_parquet(...).filter(...).

Use cases

Parameterization · modeling · aging studies · cross-source analysis.

Out of scope: field/fleet telemetry; ML cycle-life prediction (for that, use BatteryLife (KDD 2025) - 990 cells, 18 baselines). OCV/R0 extractors, ECM/SPM/DFN fitting, and ML modeling all live in separate companion repos.

How this relates to other battery data tools

celljar tries to fit alongside, not replace, the other excellent tools in this space:

  • Battery Data Commons - registry indexing 300+ public battery datasets. Great for discovery; celljar complements it by providing a harmonized data layer for a subset of those sources.
  • Iontech (Shiyun Liu) - curated index of open-source battery monitoring & modeling datasets (RWTH home-storage, NREL failure databank, Stanford second-life, etc.) with paper links. Another good starting point for discovering datasets celljar hasn't yet harmonized.
  • BatteryLife / BatteryML - cycling-to-failure ML benchmark (KDD 2025). Optimized for lifetime-prediction ML; celljar keeps the full V/I/T timeseries that physics-based parameterization (ECM/SPM/DFN) needs.

Roadmap

  • More sources (CALCE, RWTH, HUST, Tongji, XJTU; Ecker 2015 + Chen 2020 for DFN parameterization)
  • PyPI release (pip install celljar)
  • SOH methodology iteration
  • BDF-export converter

Contributing

See CONTRIBUTING.md. Issues, ideas, and PRs welcome.

License & citation

The science here belongs to the original authors; celljar simply puts their data in one place with a shared schema. Please cite their papers when you use the data, and, if it's helpful, celljar alongside.

  • celljar code (this repository): MIT (LICENSE).
  • Harmonized bundle (packaging, schema, derived fields): CC-BY-4.0.
  • Upstream raw data retains each publisher's original license - see per-source provenance in data/raw/<source>/.

To make attribution easy, every test_metadata row carries its own source_doi, source_citation, source_license, and source_license_url. You can pull the references for any analysis with one query:

import duckdb
duckdb.sql("""
    SELECT DISTINCT source_doi, source_citation, source_license
    FROM 'data/harmonized/tests/*.json'
    WHERE test_id IN ('ORNL_LEAF_2013_HPPC_25C', 'HNEI_NCA_HPPC_25C')
""").df()

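Once those rows are in hand, turning them into a reference list is a few lines of Python. This is a minimal sketch: `format_references` is a hypothetical helper, the output format is arbitrary, and the rows below are placeholders rather than real citations. With the query above, rows could be obtained as plain dicts via `.df().to_dict("records")`.

```python
def format_references(rows):
    """Turn attribution rows into one de-duplicated reference line per source."""
    seen = set()
    lines = []
    for row in rows:
        key = (row["source_doi"], row["source_citation"])
        if key in seen:
            continue  # the same source can back several tests
        seen.add(key)
        doi = row["source_doi"] or "no DOI recorded"  # missing values are None here
        lines.append(f'{row["source_citation"]} [{doi}] License: {row["source_license"]}')
    return lines


# Placeholder rows for illustration only; real values come from test_metadata.
rows = [
    {"source_doi": "10.0000/placeholder-a",
     "source_citation": "Author A (year). Dataset A.",
     "source_license": "CC-BY-4.0"},
    {"source_doi": "10.0000/placeholder-a",
     "source_citation": "Author A (year). Dataset A.",
     "source_license": "CC-BY-4.0"},
    {"source_doi": None,
     "source_citation": "Author B (year). Dataset B.",
     "source_license": "see source_license_url"},
]
for line in format_references(rows):
    print(line)
```

De-duplicating on the DOI and citation pair keeps the list short when results from several queries are combined.
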
If you'd like to cite celljar:

@software{celljar,
  author = {Mihna Neerulpan},
  title  = {celljar: Public Battery Test Dataset Harmonization with a Canonical Schema},
  year   = {2026},
  url    = {https://github.com/mihnathul/celljar},
}

Acknowledgments

celljar exists because of the labs and authors who designed, ran, and openly published these experiments - work that took years of careful instrumentation and analysis. Thank you to:

Phillip Kollmeyer (HNEI) · G. Wiggins, S. Allu, H. Wang (ORNL) · K. Severson, P. Attia et al. (MATR, CLO; Stanford / MIT / TRI) · A. Bills et al. (BILLS; CMU) · P. Mohtat et al. (UMich) · B. Saha, K. Goebel (NASA PCoE) · Y. Preger et al. (Sandia) · M. Naumann et al. (TUM) · M. Ecker et al. (RWTH Aachen)
