Public battery cell test data, harmonized and sealed in one schema (Parquet + JSON).
celljar
celljar reads raw files from nine published sources (ORNL Leaf, HNEI Kollmeyer, MATR (Severson 2019), CLO (Attia 2020), BILLS eVTOL, MOHTAT 2021, NASA PCoE, SNL Preger, Naumann) and writes them to a canonical schema with four entities: cell_metadata and test_metadata (JSON), timeseries and cycle_summary (Parquet). Every source can then be queried with one SQL statement in DuckDB, or loaded with pandas or Polars.
Scope: harmonization only. celljar focuses on measurements - unit conversion and schema normalization. It deliberately leaves fitting and modeling to downstream tools that specialize in those steps.
Quick start
The full harmonized bundle lives at huggingface.co/datasets/mihnathul/celljar. Query it directly - no clone needed:
import duckdb
df = duckdb.sql("""
SELECT * FROM 'https://huggingface.co/datasets/mihnathul/celljar/resolve/main/timeseries.parquet'
WHERE test_id = 'ORNL_LEAF_2013_HPPC_25C'
""").df()
Pandas and Polars work the same way against the HuggingFace URL.
Browser viewer - clone the repo (a PyPI release is on the roadmap):
git clone https://github.com/mihnathul/celljar.git
cd celljar
pip install -e ".[viewer]"
streamlit run apps/viewer.py # fetches from HuggingFace by default
Pin a release for reproducibility: CELLJAR_HF_REVISION=v0.2.1 streamlit run apps/viewer.py.
Regenerate locally from raw sources: same setup, then python examples/demo_end_to_end.py and CELLJAR_LOCAL=1 streamlit run apps/viewer.py.
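The same pinning can be applied to direct queries, because Hugging Face serves dataset files at /resolve/<revision>/<path>. The following is a minimal sketch, not part of celljar: bundle_url and load_test are hypothetical helper names, reusing CELLJAR_HF_REVISION outside the viewer is an assumption, and it presumes the v0.2.1 tag exists on the dataset repo.

```python
import os

HF_DATASET = "https://huggingface.co/datasets/mihnathul/celljar"


def bundle_url(filename, revision=None):
    """Build a Hugging Face 'resolve' URL for one file of the bundle.

    Falls back to the CELLJAR_HF_REVISION environment variable, then "main".
    """
    revision = revision or os.environ.get("CELLJAR_HF_REVISION", "main")
    return f"{HF_DATASET}/resolve/{revision}/{filename}"


def load_test(test_id, revision=None):
    """Fetch one test's timeseries as a DataFrame (needs duckdb, pandas, network)."""
    import duckdb  # imported lazily so bundle_url works without it

    url = bundle_url("timeseries.parquet", revision)
    # The URL is interpolated; the test_id is passed as a bound parameter.
    query = f"SELECT * FROM '{url}' WHERE test_id = ? ORDER BY timestamp_s"
    return duckdb.execute(query, [test_id]).df()
```

For example, load_test("ORNL_LEAF_2013_HPPC_25C", revision="v0.2.1") would read the pinned file rather than whatever main currently points at.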
Sources
| Source | Chemistry | Cells | Test types | Raw data |
|---|---|---|---|---|
| ORNL Leaf 2013 | mixed (LMO/NCA pouch) | 1 | HPPC × 3 temperatures | bundled |
| HNEI (Kollmeyer) | NCA (Panasonic NCR18650PF) | 1 | HPPC, drive cycle, capacity_check, cycle_aging | download |
| MATR (Severson 2019) | LFP (A123 18650) | 119 | Cycling-to-failure | download |
| CLO (Attia 2020) | LFP (A123 18650) | 45 | Cycling, BO-optimized fast-charge | download |
| BILLS / eVTOL (Bills 2023) | NMC (Sony US18650VTC6) | 22 | Drive cycle (flight-duty) + RPTs | download |
| MOHTAT (Mohtat 2021) | NMC (UMich NMC532 pouch) | 31 | Cycle aging + synchronous expansion | download |
| NASA PCoE | LCO (vendor undisclosed, 2.0 Ah 18650) | 34 | Cycle aging | download |
| SNL Preger 2020 | LFP / NMC / NCA grid (18650) | 87 | Cycle aging across T × DoD × C-rate | download |
| Naumann 2018/2020 | LFP / graphite | 17 calendar + 17 cycle | Calendar + cycle aging (summary-only) | download |
Schema
Four entities joined by cell_id and test_id:
- cell_metadata.json: hardware (chemistry, capacity, form factor)
- test_metadata.json: protocol, SOH, provenance, license
- timeseries.parquet: V / I / T per-sample + signed running coulomb count (∫I dt)
- cycle_summary.parquet: per-cycle aggregates (capacity, R_DC, …) for aging studies
Conventions: SI units. Timestamps are relative. Missing data is an explicit null. Current is positive on charge (into the cell) and negative on discharge.
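To make the sign convention concrete, here is a small sketch of a signed running coulomb count. It is an illustration only: the trapezoidal rule, the output unit (coulombs, A·s), and the null handling are assumptions, not a description of how celljar computes its stored column. The argument names follow the timestamp_s and current_A columns.

```python
def running_coulomb_count(timestamp_s, current_A):
    """Cumulative charge in coulombs (A*s) via the trapezoidal rule.

    Positive current means charge, so the count rises on charge and
    falls on discharge. A sample with a null (None) current gets a None
    count; the integral then bridges the gap between the neighbouring
    valid samples (an assumption made for this sketch).
    """
    counts = []
    total = 0.0
    prev_t = prev_i = None
    for t, i in zip(timestamp_s, current_A):
        if i is None:
            counts.append(None)
            continue
        if prev_t is not None:
            total += 0.5 * (i + prev_i) * (t - prev_t)
        counts.append(total)
        prev_t, prev_i = t, i
    return counts


# 10 s of 2 A charge, then 10 s of 1 A discharge
t = [0.0, 10.0, 10.0, 20.0]
i = [2.0, 2.0, -1.0, -1.0]
print(running_coulomb_count(t, i))  # [0.0, 20.0, 20.0, 10.0]
```

The count climbs to 20 C during charge and drops back to 10 C after the discharge step, which is the behaviour the positive = charge convention implies.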
The authoritative field list and types live in schemas/ (JSON Schema). Pandera schemas mirror them at runtime in celljar/harmonize/harmonize_schema.py.
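Because the entities share cell_id and test_id, a timeseries row can be traced back to its cell's hardware in one query. The sketch below uses the standard-library sqlite3 module as a stand-in for DuckDB, purely to stay dependency-free. The toy IDs and values are invented, and the assumption that test_metadata carries both test_id and cell_id is inferred from the example queries rather than from the schema files.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE cell_metadata (cell_id TEXT PRIMARY KEY, chemistry TEXT);
CREATE TABLE test_metadata (test_id TEXT PRIMARY KEY, cell_id TEXT, test_type TEXT);
CREATE TABLE timeseries (test_id TEXT, timestamp_s REAL, voltage_V REAL);

-- Toy rows; IDs and values are made up for illustration.
INSERT INTO cell_metadata VALUES ('CELL_A', 'LFP'), ('CELL_B', 'NCA');
INSERT INTO test_metadata VALUES
    ('TEST_A1', 'CELL_A', 'hppc'), ('TEST_B1', 'CELL_B', 'hppc');
INSERT INTO timeseries VALUES
    ('TEST_A1', 0.0, 3.30), ('TEST_A1', 1.0, 3.25),
    ('TEST_B1', 0.0, 4.10);
""")

# timeseries -> test_metadata on test_id, then -> cell_metadata on cell_id
rows = con.execute("""
    SELECT c.chemistry, t.test_id, COUNT(*) AS n_samples, MIN(ts.voltage_V) AS v_min
    FROM timeseries AS ts
    JOIN test_metadata AS t ON t.test_id = ts.test_id
    JOIN cell_metadata AS c ON c.cell_id = t.cell_id
    GROUP BY c.chemistry, t.test_id
    ORDER BY t.test_id
""").fetchall()
print(rows)  # [('LFP', 'TEST_A1', 2, 3.25), ('NCA', 'TEST_B1', 1, 4.1)]
```

The SELECT itself is plain SQL, so it should carry over to DuckDB with the table names swapped for the Parquet and JSON paths.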
Querying
-- Single test's timeseries
SELECT timestamp_s, voltage_V, current_A, temperature_C
FROM 'data/harmonized/timeseries.parquet'
WHERE test_id = 'ORNL_LEAF_2013_HPPC_25C'
ORDER BY timestamp_s;
-- Cross-source filter - same query works across all sources
SELECT cell_id, test_id, temperature_C_min
FROM 'data/harmonized/tests/*.json'
WHERE test_type = 'hppc' AND temperature_C_min = 25;
The same patterns work from Python via duckdb.sql(...).df(), pd.read_parquet(..., filters=[...]), or pl.scan_parquet(...).filter(...).collect().
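When no query engine is installed, the cross-source metadata filter can be approximated with the standard library alone. This is a sketch under assumptions: find_tests is a hypothetical helper, and it accepts either one record per JSON file or a list of records because the exact file layout is not stated here. The toy records below are invented.

```python
import glob
import json
import os
import tempfile


def find_tests(metadata_dir, **criteria):
    """Return metadata records whose fields equal every given criterion."""
    matches = []
    for path in sorted(glob.glob(os.path.join(metadata_dir, "*.json"))):
        with open(path, encoding="utf-8") as fh:
            loaded = json.load(fh)
        # Accept either one record per file or a list of records.
        records = loaded if isinstance(loaded, list) else [loaded]
        for record in records:
            if all(record.get(key) == value for key, value in criteria.items()):
                matches.append(record)
    return matches


# Demo on two made-up records written to a temporary directory.
toy = [
    {"test_id": "TEST_A1", "cell_id": "CELL_A", "test_type": "hppc", "temperature_C_min": 25},
    {"test_id": "TEST_B1", "cell_id": "CELL_B", "test_type": "cycle_aging", "temperature_C_min": 25},
]
with tempfile.TemporaryDirectory() as tmp:
    for record in toy:
        with open(os.path.join(tmp, record["test_id"] + ".json"), "w", encoding="utf-8") as fh:
            json.dump(record, fh)
    hits = find_tests(tmp, test_type="hppc", temperature_C_min=25)

print([r["test_id"] for r in hits])  # ['TEST_A1']
```

Against a local bundle the call would look like find_tests("data/harmonized/tests", test_type="hppc", temperature_C_min=25), mirroring the SQL filter above.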
Use cases
Parameterization · modeling · aging studies · cross-source analysis.
Out of scope: field/fleet telemetry; ML cycling-life prediction (use BatteryLife (KDD 2025) - 990 cells, 18 baselines). OCV/R0 extractors, ECM/SPM/DFN fitting, ML modeling all live in separate companion repos.
How this relates to other battery data tools
celljar tries to fit alongside, not replace, the other excellent tools in this space:
- Battery Data Commons - registry indexing 300+ public battery datasets. Great for discovery; celljar complements it by providing a harmonized data layer for a subset of those sources.
- Iontech (Shiyun Liu) - curated index of open-source battery monitoring & modeling datasets (RWTH home-storage, NREL failure databank, Stanford second-life, etc.) with paper links. Another good starting point for discovering datasets celljar hasn't yet harmonized.
- BatteryLife / BatteryML - cycling-to-failure ML benchmark (KDD 2025). Optimized for lifetime-prediction ML; celljar keeps the full V/I/T timeseries that physics-based parameterization (ECM/SPM/DFN) needs.
Roadmap
- More sources (CALCE, RWTH, HUST, Tongji, XJTU; Ecker 2015 + Chen 2020 for DFN parameterization)
- PyPI release (pip install celljar)
- SOH methodology iteration
- BDF-export converter
Contributing
See CONTRIBUTING.md. Issues, ideas, and PRs welcome.
License & citation
The science here belongs to the original authors; celljar simply puts their data in one place with a shared schema. Please cite their papers when you use the data, and, if it's helpful, celljar alongside.
- celljar code (this repository): MIT (LICENSE).
- Harmonized bundle (packaging, schema, derived fields): CC-BY-4.0.
- Upstream raw data retains each publisher's original license - see per-source provenance in data/raw/<source>/.
To make attribution easy, every test_metadata row carries its own source_doi, source_citation, source_license, and source_license_url. You can pull the references for any analysis with one query:
import duckdb
duckdb.sql("""
SELECT DISTINCT source_doi, source_citation, source_license
FROM 'data/harmonized/tests/*.json'
WHERE test_id IN ('ORNL_LEAF_2013_HPPC_25C', 'HNEI_NCA_HPPC_25C')
""").df()
If you'd like to cite celljar:
@software{celljar,
author = {Mihna Neerulpan},
title = {celljar: Public Battery Test Dataset Harmonization with a Canonical Schema},
year = {2026},
url = {https://github.com/mihnathul/celljar},
}
Acknowledgments
celljar exists because of the labs and authors who designed, ran, and openly published these experiments - work that took years of careful instrumentation and analysis. Thank you to:
Phillip Kollmeyer (HNEI) · G. Wiggins, S. Allu, H. Wang (ORNL) · K. Severson, P. Attia et al. (MATR, CLO; Stanford / MIT / TRI) · A. Bills et al. (BILLS; CMU) · P. Mohtat et al. (UMich) · B. Saha, K. Goebel (NASA PCoE) · Y. Preger et al. (Sandia) · M. Naumann et al. (TUM) · M. Ecker et al. (RWTH Aachen)