dsr-files

File handling library for creating, saving, and loading various file types (CSV, JSON, JOBLIB, PDF, PARQUET).

Version 3.0.0: Introduced Cloud-Native Pathing via cloudpathlib, standardized Universal Parameter Filtering to prevent engine-level crashes, and updated all signatures to support audit-ready return values.

Features

  • CSV: Read and write CSV files with pandas.
  • JSON: Save and load JSON data with recursive sanitization; now supports .jsonl (JSON Lines) for large datasets.
  • JOBLIB: Serialize Python objects and ML models with joblib.
  • Excel: Save and load Excel workbooks; supports .xlsx, .xls, .xlsm, and .xlsb formats.
  • PDF: Generate interactive, indexed audit reports with Matplotlib and ReportLab.
  • PARQUET: High-performance columnar storage; now supports .pq as a valid logical extension.
  • YAML: Save and load YAML files with recursive logic and strict key validation to prevent duplicate entries in configuration files.
  • FileType Utilities: The FileType enum now includes is_valid_extension() for checking that a file name's extension is logically consistent with its declared format, without requiring filesystem access. This is ideal for pre-validating configuration files in ML pipelines (see the sketch after this list).
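
A minimal sketch of the extension check; the exact enum member names used below (FileType.PARQUET, FileType.CSV) are assumptions:

from dsr_files import FileType

# Logical consistency check between a file name and its declared format,
# with no filesystem access required
assert FileType.PARQUET.is_valid_extension("metrics.pq")       # .pq accepted per the feature list
assert not FileType.CSV.is_valid_extension("metrics.parquet")  # mismatched extension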

Installation

pip install dsr-files

Requirements

  • Python: >= 3.10
  • PyYAML: >= 6.0.2
  • Pandas: Required for CSV and Excel operations
  • Joblib: Required for object serialization
  • dsr-utils: >= 1.6.0
  • cloudpathlib: Required for AnyPath and CloudPath support

Optional Dependencies

For Excel support:

pip install dsr-files[excel]

For PDF support:

pip install dsr-files[pdf]

For full cloud support (S3, GCS, Azure):

pip install cloudpathlib[all]

Development Installation

pip install -e ".[dev,excel,pdf]"

Usage

Universal Parameter Filtering

All handlers now support safe_call=True. This leverages dsr-utils to filter out incompatible keyword arguments that would otherwise cause TypeErrors in underlying engines like pyarrow or fastparquet.

Any parameters that are not compatible with the specific engine are returned in a rejected dictionary for debugging and audit logging.
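
A minimal sketch of the pattern with save_parquet; the deliberately invalid bogus_option keyword below is a made-up illustration:

from dsr_files import save_parquet
import pandas as pd
from pathlib import Path

df = pd.DataFrame({"A": [1, 2, 3]})

# bogus_option is not a valid pyarrow keyword; with safe_call=True it is
# filtered out and reported in `rejected` instead of raising a TypeError
full_path, rejected = save_parquet(
    df, Path("."), "data", engine="pyarrow", safe_call=True, bogus_option=True
)
print(rejected)  # expected to contain {"bogus_option": True}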

CSV Operations

from dsr_files import save_csv, load_csv, create_csv
import pandas as pd
from pathlib import Path

# Create from dictionary
data = {"name": ["Alice", "Bob"], "age": [30, 25]}
df = create_csv(data)

# Save to CSV
full_path, rejected = save_csv(df, Path("."), "data")

# Load from CSV
df, rejected = load_csv(Path("data.csv"))

JSON Operations

from dsr_files import save_json, load_json
from pathlib import Path

data = {"key": "value", "number": 42}

# Save to JSON
full_path, rejected = save_json(data, Path("."), "data")

# Load from JSON
data, rejected = load_json(Path("data.json"))

JOBLIB Operations

from dsr_files import save_joblib, load_joblib
from pathlib import Path

# Save any Python object
model = {"weights": [1, 2, 3], "config": {}}
full_path, rejected = save_joblib(model, Path("."), "model")

# Load from JOBLIB
model, rejected = load_joblib(Path("model.joblib"))

Excel Operations

from dsr_files import save_excel, load_excel, ExcelSheetConfig
from pathlib import Path
import pandas as pd

sales = pd.DataFrame({"region": ["NA", "EU"], "revenue": [120, 95]})
costs = pd.DataFrame({"region": ["NA", "EU"], "cost": [80, 70]})

# Save multi-sheet workbook
full_path, rejected = save_excel(
    [
        ExcelSheetConfig(data=sales, sheet_name="Sales"),
        ExcelSheetConfig(data=costs, sheet_name="Costs"),
    ],
    Path("."),
    "report",
)

# Load first sheet
df, rejected = load_excel(Path("report.xlsx"))
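
The default load returns the first sheet. A hypothetical variant for picking a sheet by name, assuming extra keyword arguments are forwarded to pandas.read_excel in line with the parameter filtering described above:

# Load a specific sheet; sheet_name is pandas.read_excel's keyword
df_costs, rejected = load_excel(Path("report.xlsx"), sheet_name="Costs")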

PDF Operations (Interactive Reports)

from dsr_files import PDFDocument, PageConfiguration, PageSize, PageOrientation, PageColors
from pathlib import Path

# Configure document style
config = PageConfiguration(
    page_size=PageSize.LETTER,
    orientation=PageOrientation.PORTRAIT,
    colors=PageColors(page_num="#000000", title="#444444"),
    margins=(0.07, 0.93, 0.90, 0.10)
)

doc = PDFDocument("Audit Report", config)
page = doc.create_new_page("Summary")
# ... Add Matplotlib content to page.fig ...

doc.render_table_of_contents()
full_path, rejected = doc.save(Path("."), "audit_report")
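
The placeholder comment above stands in for ordinary Matplotlib work; a minimal sketch, assuming page.fig is a standard Matplotlib Figure:

ax = page.fig.add_subplot(111)
ax.plot([1, 2, 3], [2, 4, 8])
ax.set_title("Findings over time")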

PARQUET Operations

from dsr_files import save_parquet, load_parquet
import pandas as pd
from pathlib import Path

df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})

# Save to Parquet
full_path, rejected = save_parquet(df, Path("."), "data", engine="pyarrow")

# Load from Parquet
df, rejected = load_parquet(Path("data.parquet"))

YAML Operations

from dsr_files import save_yaml, load_yaml
from pathlib import Path

data = {"project": "dsr-orchestrator", "steps": ["ingest", "analyze"]}

# Save to YAML
full_path, rejected = save_yaml(data, Path("."), "config")

# Load from YAML using the new UniqueKeyLoader
# This will raise a ConstructorError if duplicate keys are detected,
# protecting your project settings from conflicting edits.
data, rejected = load_yaml(Path("config.yaml"))
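
A minimal sketch of the duplicate-key guard, assuming load_yaml surfaces PyYAML's ConstructorError unchanged:

import yaml
from pathlib import Path
from dsr_files import load_yaml

# A config with the same top-level key defined twice
Path("dup.yaml").write_text("project: alpha\nproject: beta\n")

try:
    data, rejected = load_yaml(Path("dup.yaml"))
except yaml.constructor.ConstructorError as err:
    print(f"Duplicate key rejected: {err}")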

Cloud-Native Pathing

dsr-files now supports both local and cloud filesystems (S3, GCS, Azure) out of the box using cloudpathlib. You can pass raw URI strings, pathlib.Path objects, or CloudPath objects directly to any handler.

from dsr_files import save_csv

# Local path
full_path, rejected = save_csv(df, "./data", "local_audit") 

# Cloud path (requires cloudpathlib[s3])
full_path, rejected = save_csv(df, "s3://my-bucket/audits", "remote_audit")
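
CloudPath and AnyPath objects work the same way. A brief sketch using cloudpathlib's AnyPath, which dispatches to a local Path or a CloudPath depending on the string it is given:

from cloudpathlib import AnyPath

target = AnyPath("s3://my-bucket/audits")  # resolves to an S3Path here
full_path, rejected = save_csv(df, target, "remote_audit")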

Testing

pytest tests/
pytest tests/ --cov=src/dsr_files

License

MIT
