Reader, converter, and CSV tooling for Campbell Scientific data files

Maintainers

spirrobe

Project description

csiio

Python reader and converter for Campbell Scientific Inc. (CSI) data files, usable through either an object-oriented or a functional API.

This package focuses on practical ingestion and conversion workflows for CSI formats (including TOA5, TOACI1, TOB1, TOB3, and CSIXML), with DataFrame-native handling for downstream analysis pipelines.

Citation: If you use csiio in published work and want to cite it, see the citation guidance in How To Cite.

Install

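# From PyPI (the distribution is published as csiio-py)
pip install csiio-py

# From a local checkout of the repository (editable install)
pip install -e .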

Core Functionality

Read CSI files into pandas DataFrames.
Auto-detect file type during reads.
Convert between supported CSI formats.
Export single CSV outputs or time-window-split CSVs.
Use either CLI workflows or Python API workflows.

Changelog Policy

Release notes live in CHANGELOG.md.
Add user-visible changes under [Unreleased] in the section that fits best: Added, Changed, Fixed, CI/Build, or Docs.
On release, run python scripts/release_changelog.py 0.2.0 to move [Unreleased] into a dated version section, then tag the release.
Use python scripts/release_changelog.py 0.2.0 --dry-run to preview the edit without writing.
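
As a rough illustration of what moving [Unreleased] into a dated version section involves, the sketch below rewrites a changelog text in memory. It is not the code in scripts/release_changelog.py: the helper name release_unreleased, the "## [Unreleased]" heading style, and the example date are assumptions made for this example.

```python
import datetime
import re


def release_unreleased(changelog: str, version: str, today: datetime.date) -> str:
    """Turn the [Unreleased] heading into a dated version and add a fresh one.

    Hypothetical helper for illustration; assumes headings like '## [Unreleased]'.
    """
    heading = re.compile(r"^## \[Unreleased\][ \t]*$", flags=re.MULTILINE)
    if not heading.search(changelog):
        raise ValueError("no [Unreleased] heading found")
    dated = f"## [Unreleased]\n\n## [{version}] - {today.isoformat()}"
    # Replace only the first match so older sections stay untouched.
    return heading.sub(lambda _match: dated, changelog, count=1)


text = "# Changelog\n\n## [Unreleased]\n\n### Fixed\n- Example entry\n"
released = release_unreleased(text, "0.2.0", datetime.date(2026, 1, 15))
# Printing instead of writing back mirrors the idea of a dry run.
print(released)
```

The existing entries end up under the new dated heading, and an empty [Unreleased] section remains on top for the next cycle.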

Python API Usage

from csiio import CSIDataFile, read_csi_files, convert_csi_file
import pandas as pd

# High-level object workflow
reader = CSIDataFile(["a.dat", "b.dat"])
df = reader.read()
normalized_meta = reader.meta
per_file_meta = reader.file_meta
csv_files = reader.to_csv("/tmp/out.csv", split_window="1h")

# Functional workflow
df2, meta = read_csi_files("/path/to/file.dat")
df_many, meta_many = read_csi_files(["/path/to/a.dat", "/path/to/b.dat"], max_workers=2)
converted = convert_csi_file("/path/to/in.dat", "/tmp/TOA5_out.dat", "TOA5")
split_outputs = convert_csi_file("/path/to/in.dat", "/tmp/TOA5_out.dat", "TOA5", split_window="1h")
split_outputs_limited = convert_csi_file(
    "/path/to/in.dat", "/tmp/TOA5_out.dat", "TOA5", split_window="1h", max_workers=2
)

# Initialize from an existing pandas DataFrame
frame = pd.DataFrame(
    {"air_temp (degC)": [12.1, 12.4, 12.8]},
    index=pd.date_range("2024-01-01 00:00:00", periods=3, freq="30min"),
)
from_df = CSIDataFile(data=frame)
csv_files = from_df.to_csv("/tmp/out.csv")
split_csv_files = from_df.to_csv("/tmp/out.csv", split_window="1h", max_workers=2)
converted_file = from_df.convert("/tmp/out.dat", "TOB3", max_workers=2)

CLI (command line) Usage

csiio --help

# Read and print DataFrame summary
csiio read /path/to/file.dat

# Read metadata only, useful to figure out contained columns
csiio read /path/to/file.dat --meta-only

# Stream CSV to stdout (good for shell pipelines)
csiio read /path/to/file.dat --as-csv

# Read many files with explicit worker limit
csiio read /path/to/a.dat /path/to/b.dat --max-workers 2

# Convert to another CSI format
csiio convert /path/to/in.dat --output-format TOB3 --output /tmp/TOB3_out.dat

# Convert to another CSI format and split by time window (useful, for example, for eddy-covariance processing)
csiio convert /path/to/in.dat --output-format TOB1 --split-window 1h --output /tmp/TOB1_out.dat

# Split conversion with explicit worker limit
csiio convert /path/to/in.dat --output-format TOB1 --split-window 1h --output /tmp/TOB1_out.dat --max-workers 2

# Export CSV split by time window; the window is any pandas frequency string, see https://pandas.pydata.org/docs/user_guide/timeseries.html#dateoffset-objects
csiio to-csv /path/to/in.dat --output /tmp/out.csv --split-window 1h

# Split CSV export with explicit worker limit
csiio to-csv /path/to/in.dat --output /tmp/out.csv --split-window 1h --max-workers 2

Typical Use Cases and Functionality

Supported File Formats

csiio is designed to work with the common Campbell Scientific formats. Most of them start with four header lines that describe the logger (line 1), the column names (line 2), the units (line 3), and the aggregation (line 4):

Format	Description	Pro	Con
TOA5	ASCII file	Human-readable	Large file size
TOACI1	TOA5 with shorter header	Human-readable	Large file size
TOB1	Binary file with one timestamp per line	Small file size	Not human-readable
TOB3	Binary file with 4 header lines and record frames that ensure integrity; native format of CSI loggers	Data integrity	Binary and complex
CSIXML	XML-based format	More descriptive	Tree structure is awkward for many data-related tasks
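
To make the four-line header concrete, the sketch below parses the header of a small TOA5-style text with the standard library only. The sample content, the station and column names, and the helper parse_toa5_header are illustrative assumptions; this is not how csiio reads files internally.

```python
import csv
import io

# Made-up sample: four header lines followed by one data row.
SAMPLE = (
    '"TOA5","station1","CR1000","12345","CR1000.Std.32","CPU:prog.CR1","1234","Table1"\n'
    '"TIMESTAMP","RECORD","AirTC_Avg"\n'
    '"TS","RN","Deg C"\n'
    '"","","Avg"\n'
    '"2024-01-01 00:00:00",0,12.1\n'
)


def parse_toa5_header(stream):
    """Read the four header lines and return them as a dict."""
    reader = csv.reader(stream)
    logger, names, units, aggregation = (next(reader) for _ in range(4))
    return {
        "logger": logger,                              # line 1: logger description
        "columns": names,                              # line 2: column names
        "units": dict(zip(names, units)),              # line 3: unit per column
        "aggregation": dict(zip(names, aggregation)),  # line 4: aggregation per column
    }


header = parse_toa5_header(io.StringIO(SAMPLE))
print(header["columns"])
print(header["units"]["AirTC_Avg"])
```

For real files, csiio read with --meta-only is the supported way to inspect this information.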

Main Use Cases

1) Read CSI Data to DataFrame

Use this when you need immediate pandas-based analysis from Python. The CLI shows a summary by default; use --as-csv if you want the data on stdout.

CLI summary:

csiio read /path/to/file.dat

CLI data stream:

csiio read /path/to/file.dat --as-csv

Python:

from csiio import read_csi_files

# read one file
data, meta = read_csi_files("/path/to/file.dat")

# read many files
data, meta = read_csi_files(["/path/to/file.dat", "/path/to/another/file.dat"])

Outcome:

Data is returned as a pandas DataFrame.
read_csi_files(...) returns per-file metadata alongside the DataFrame.
CSIDataFile.meta stores normalized metadata for the combined in-memory DataFrame.
CSIDataFile.file_meta stores per-file metadata as a dictionary keyed by filename.

2) Read Metadata Only

Use this when validating variable names or file headers without loading full data payloads.

CLI:

csiio read /path/to/file.dat --meta-only

Python:

from csiio import read_csi_files

meta = read_csi_files("/path/to/file.dat", meta_only=True)

When using CSIDataFile, read(meta_only=True) updates:

reader.meta with normalized metadata for the current in-memory view
reader.file_meta with per-file metadata keyed by filename

3) Convert to another CSI format

Use this when standardizing logger output or generating downstream-compatible files.

CLI:

csiio convert /path/to/in.dat --output-format TOB3 --output /tmp/out.dat

Python:

from csiio import convert_csi_file

out = convert_csi_file("/path/to/in.dat", "/tmp/out.dat", "TOB3")

4) Convert to another CSI format and split by frequency

Use this when downstream tools expect one file per time window, for example hourly files for eddy-covariance processing.

CLI:

csiio convert /path/to/in.dat --output-format TOB3 --output /tmp/out.dat --split-window 1h

Python:

from csiio import convert_csi_file

outputs = convert_csi_file("/path/to/in.dat", "/tmp/out.dat", "TOB3", split_window="1h")

# Limit split-window writer threads
outputs_limited = convert_csi_file(
    "/path/to/in.dat", "/tmp/out.dat", "TOB3", split_window="1h", max_workers=2
)
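
Conceptually, a split window partitions the time-indexed data into consecutive bins and produces one output per bin. The pandas sketch below shows that partitioning for a "1h" window. It illustrates the idea rather than csiio's writer, and the helper name split_by_window is hypothetical.

```python
import pandas as pd


def split_by_window(frame: pd.DataFrame, window: str) -> dict:
    """Return {window start: sub-frame} for every non-empty time window."""
    groups = frame.groupby(pd.Grouper(freq=window))
    return {start: part for start, part in groups if not part.empty}


frame = pd.DataFrame(
    {"air_temp (degC)": [12.1, 12.4, 12.8]},
    index=pd.date_range("2024-01-01 00:00:00", periods=3, freq="30min"),
)
parts = split_by_window(frame, "1h")
for start, part in parts.items():
    # Each sub-frame would become one output file in a split conversion.
    print(start, len(part))
```

With 30-minute records, the first hourly window holds two rows and the second holds one.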

5) Export CSV Outputs

Use this when delivering data to tools that expect CSV.

Single-file export (Python):

from csiio import CSIDataFile

reader = CSIDataFile("/path/to/in.dat")
reader.read()
outputs = reader.to_csv("/tmp/out.csv")

Time-window split export (Python):

outputs = reader.to_csv("/tmp/out.csv", split_window="1h")
# Limit split-window writer threads
outputs_limited = reader.to_csv("/tmp/out.csv", split_window="1h", max_workers=2)

CLI equivalent:

csiio to-csv /path/to/in.dat --output /tmp/out.csv --split-window 1h

6) Initialize from a pandas DataFrame

Use this when your data is already in pandas and you want to export or convert it without first writing an intermediate CSI file.

Python:

import pandas as pd
from csiio import CSIDataFile

frame = pd.DataFrame(
    {
        "air_temp (degC)": [12.1, 12.4, 12.8],
        "co2_flux (umol m-2 s-1)": [0.01, -0.02, 0.03],
    },
    index=pd.date_range("2024-01-01 00:00:00", periods=3, freq="30min"),
)

reader = CSIDataFile(data=frame)

# Export CSV directly
csv_files = reader.to_csv("/tmp/out.csv")

# Export split CSV files
split_csv_files = reader.to_csv("/tmp/out.csv", split_window="1h")

# Convert directly to a CSI format
tob3_file = reader.convert("/tmp/out.dat", "TOB3")

Outcome:

CSIDataFile normalizes the DataFrame to a TIMESTAMP index and adds RECORD (RN) when missing.
reader.meta is auto-generated from the DataFrame columns.
You can export CSV or CSI files directly from the in-memory DataFrame.
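
The column labels in the example follow a "name (unit)" pattern. How csiio derives metadata from such labels is not described here, so the sketch below only shows one plausible way to separate name and unit with the standard library. The helper split_label and the regular expression are assumptions.

```python
import re

# Lazy name, then a trailing parenthesized unit without nested parentheses.
LABEL = re.compile(r"^(?P<name>.*?)\s*\((?P<unit>[^()]*)\)\s*$")


def split_label(label: str) -> tuple:
    """Split 'name (unit)' into (name, unit); unit is '' when absent."""
    match = LABEL.match(label)
    if match is None:
        return label.strip(), ""
    return match.group("name"), match.group("unit")


for label in ["air_temp (degC)", "co2_flux (umol m-2 s-1)", "RECORD"]:
    print(split_label(label))
```

Inspecting reader.meta after construction shows what csiio actually generated for a given DataFrame.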

Parallelism Controls

Parallel workers are enabled by default for:
- reading multiple files via list input
- split-window conversion writes
- split-window CSV writes
Default worker count is max(1, cpu_count // 4), capped by number of tasks.
You can override with max_workers in Python API calls and --max-workers in CLI commands.
Safety check: max_workers must be an integer between 1 and available CPU count.
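
The default and the safety check can be written out as a short sketch. The function names default_workers and check_max_workers, and the choice of exception types, are assumptions for illustration; only the formula and the allowed range come from the text above.

```python
import os


def default_workers(n_tasks: int, cpu_count=None) -> int:
    """Default from the text: max(1, cpu_count // 4), capped by task count."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, min(max(1, cpus // 4), n_tasks))


def check_max_workers(max_workers, cpu_count=None) -> int:
    """Accept only integers between 1 and the available CPU count."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    if isinstance(max_workers, bool) or not isinstance(max_workers, int):
        raise TypeError("max_workers must be an integer")
    if not 1 <= max_workers <= cpus:
        raise ValueError(f"max_workers must be between 1 and {cpus}")
    return max_workers


print(default_workers(n_tasks=10, cpu_count=16))  # 16 // 4 = 4 workers
print(default_workers(n_tasks=2, cpu_count=16))   # capped by the 2 tasks
print(default_workers(n_tasks=10, cpu_count=2))   # 2 // 4 = 0, raised to 1
```

On a 16-core machine this means four workers by default, so passing max_workers explicitly is mainly useful to go lower on shared systems or higher for I/O-heavy batches.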

7) Stream Data Through Shell Pipelines

Use this when integrating with command-line tooling.

csiio read /path/to/in.dat --as-csv | gzip > out.csv.gz
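
A compressed file produced this way can be read back in Python without unpacking it first. The sketch below uses only the standard library and a small stand-in file. The column names in the demo are made up, and it assumes the stream starts with a single header row, which is an assumption about the --as-csv output.

```python
import csv
import gzip
import os
import tempfile


def read_gzipped_csv(path):
    """Return (header, rows) from a gzip-compressed CSV file."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        return header, list(reader)


# Demo with a stand-in file instead of real csiio output.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv.gz")
    with gzip.open(path, "wt", newline="") as handle:
        handle.write("TIMESTAMP,RECORD,air_temp\n2024-01-01 00:00:00,0,12.1\n")
    header, rows = read_gzipped_csv(path)

print(header, len(rows))
```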

API Surface Summary

The package currently exposes:

Instance

CSIDataFile

Instance / Reading

read_csi_files
read_csi_meta
read_csi_toa5
read_csi_tob1
read_csi_tob3
read_csi_csixml

Conversion

convert_csi_file

Writing

write_csi_toa5
write_csi_tob1
write_csi_tob3
write_csi_csixml

Notes and Constraints

pandas DataFrames are central: csiio keeps its data internally as a DataFrame, so users get the usual ease of use of pandas.
The main purpose of csiio is reading the binary, frame-oriented TOB3 format and converting files directly within a pipeline instead of going through CardConvert. Because frames are handled in Python, reading very large TOB3 files may be slower than with proprietary tooling.

How To Cite

If csiio supports published work and you have the opportunity to cite it, cite the software repository and the version you used.

Suggested BibTeX entry:

@software{csiio,
	title = {csiio: Reader and Converter for Campbell Scientific Data Files},
	author = {Spirig, Robert},
	year = {2026},
	url = {https://github.com/spirrobe/csiio},
	version = {0.1.0}
}

If relevant/available for your workflow, include a commit hash or release tag for exact reproducibility.

Testing

Generate dual-status conversion audits (strict plus tolerance) with reason codes:

python tests/generate_conversion_audit.py

Outputs are written to:

tests/reports/conversion_audit_dual.csv
tests/reports/conversion_audit_dual_summary.csv

Project details

Release history

0.2.1 (this version): May 21, 2026
0.1.0: May 19, 2026

Download files

Download the file for your platform.

Source Distribution

csiio_py-0.2.1.tar.gz (24.9 kB)

Uploaded May 21, 2026 Source

Built Distribution

csiio_py-0.2.1-py3-none-any.whl (21.0 kB)

Uploaded May 21, 2026 Python 3

File details

Details for the file csiio_py-0.2.1.tar.gz.

File metadata

Download URL: csiio_py-0.2.1.tar.gz
Upload date: May 21, 2026
Size: 24.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for csiio_py-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`739983149900513cf73449bf014f3182e3251cc333717558c1088de8a5124ab0`
MD5	`bae53992fa2e63c68c9d510dc4dff44a`
BLAKE2b-256	`45f38166aab3395906e30d8cccddebc3899a0ff4d9adff678ef1c7903db0d7c5`

Provenance

The following attestation bundles were made for csiio_py-0.2.1.tar.gz:

Publisher: publish-to-pypi.yml on spirrobe/csiio

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: csiio_py-0.2.1.tar.gz
- Subject digest: 739983149900513cf73449bf014f3182e3251cc333717558c1088de8a5124ab0
- Sigstore transparency entry: 1592007019
- Sigstore integration time: May 21, 2026
Source repository:
- Permalink: spirrobe/csiio@0211a1e3d77c73d29e4f023390154431a201375d
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/spirrobe
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@0211a1e3d77c73d29e4f023390154431a201375d
- Trigger Event: push

File details

Details for the file csiio_py-0.2.1-py3-none-any.whl.

File metadata

Download URL: csiio_py-0.2.1-py3-none-any.whl
Upload date: May 21, 2026
Size: 21.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for csiio_py-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1fb2aa036c3c8e82487d8f06e7327b767b4ad6f66e38220f8cb48ac228c8ed61`
MD5	`4a634214a710b6b069be9245091b3f73`
BLAKE2b-256	`e9e6718062ae24ebacdefc1c8e41e4a8d449291cb781fcffef5701ed314ddaed`

Provenance

The following attestation bundles were made for csiio_py-0.2.1-py3-none-any.whl:

Publisher: publish-to-pypi.yml on spirrobe/csiio

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: csiio_py-0.2.1-py3-none-any.whl
- Subject digest: 1fb2aa036c3c8e82487d8f06e7327b767b4ad6f66e38220f8cb48ac228c8ed61
- Sigstore transparency entry: 1592007078
- Sigstore integration time: May 21, 2026
Source repository:
- Permalink: spirrobe/csiio@0211a1e3d77c73d29e4f023390154431a201375d
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/spirrobe
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@0211a1e3d77c73d29e4f023390154431a201375d
- Trigger Event: push
