dsr-files

File handling library for creating, saving, and loading common file types: CSV, JSON, JOBLIB, Excel, PDF, PARQUET, and YAML.

Version 2.2.0: Added UniqueKeyLoader to YAML operations to ensure configuration integrity by preventing duplicate keys in project files.

Features

  • CSV: Read and write CSV files with pandas
  • JSON: Save and load JSON data with recursive sanitization for NumPy/Pandas types
  • JOBLIB: Serialize Python objects and ML models with joblib
  • Excel: Save and load Excel workbooks (single or multi-sheet)
  • PDF: Generate interactive, indexed audit reports with Matplotlib and ReportLab
  • PARQUET: High-performance columnar storage using PyArrow or FastParquet
  • YAML: Save and load YAML files with recursive logic and strict key validation that rejects duplicate keys in configuration files

Installation

pip install dsr-files

Requirements

  • Python: >= 3.10
  • PyYAML: >= 6.0.2
  • Pandas: Required for CSV and Excel operations
  • Joblib: Required for object serialization

Optional Dependencies

For Excel support:

pip install dsr-files[excel]

For PDF support:

pip install dsr-files[pdf]

Development Installation

pip install -e ".[dev,excel,pdf]"

Usage

CSV Operations

from dsr_files import save_csv, load_csv, create_csv
import pandas as pd
from pathlib import Path

# Create from dictionary
data = {"name": ["Alice", "Bob"], "age": [30, 25]}
df = create_csv(data)

# Save to CSV
save_csv(df, Path("."), "data")

# Load from CSV
df = load_csv(Path("data.csv"))
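The calls above suggest a convention: save functions take a directory and a base name, and the extension is appended for you, so `save_csv(df, Path("."), "data")` produces `data.csv`. The sketch below only illustrates that convention. `build_output_path` is a hypothetical helper, not part of dsr-files, and the library may resolve paths differently.

```python
from pathlib import Path


def build_output_path(directory: Path, name: str, extension: str) -> Path:
    """Join a directory, a base file name, and an extension (hypothetical helper)."""
    # Accept "csv" or ".csv" so callers do not have to remember the dot.
    suffix = extension if extension.startswith(".") else f".{extension}"
    return directory / f"{name}{suffix}"


target = build_output_path(Path("."), "data", "csv")
print(target)  # data.csv
```

The same pattern would explain why the JSON, JOBLIB, Excel, and Parquet examples save with a base name and load with the full file name.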

JSON Operations

from dsr_files import save_json, load_json
from pathlib import Path

data = {"key": "value", "number": 42}

# Save to JSON
save_json(data, Path("."), "data")

# Load from JSON
data = load_json(Path("data.json"))
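The "recursive sanitization" mentioned in the feature list can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the library's implementation: `sanitize` is a hypothetical name, and NumPy arrays, NumPy scalars, and pandas Series are detected by duck typing (they all expose `tolist()`) so that the example runs without either package installed.

```python
import json
from datetime import date, datetime
from pathlib import Path
from typing import Any


def sanitize(obj: Any) -> Any:
    """Recursively convert obj into JSON-serializable builtins (illustrative)."""
    if obj is None or isinstance(obj, (str, bool, int, float)):
        return obj
    if isinstance(obj, dict):
        # JSON object keys must be strings.
        return {str(key): sanitize(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [sanitize(value) for value in obj]
    if isinstance(obj, (datetime, date)):
        # pandas.Timestamp subclasses datetime, so it lands here too.
        return obj.isoformat()
    if isinstance(obj, Path):
        return str(obj)
    if hasattr(obj, "tolist"):
        # Assumption: array-like objects (NumPy arrays and scalars,
        # pandas Series) convert themselves to builtins via tolist().
        return sanitize(obj.tolist())
    # Last resort: fall back to the string representation.
    return str(obj)


payload = {"when": date(2024, 1, 2), "dims": (3, 4), "path": Path("data.json")}
print(json.dumps(sanitize(payload)))
# {"when": "2024-01-02", "dims": [3, 4], "path": "data.json"}
```

Values such as NaN are passed through unchanged in this sketch; a stricter sanitizer might map them to `None`.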

JOBLIB Operations

from dsr_files import save_joblib, load_joblib
from pathlib import Path

# Save any Python object
model = {"weights": [1, 2, 3], "config": {}}
save_joblib(model, Path("."), "model")

# Load from JOBLIB
model = load_joblib(Path("model.joblib"))

Excel Operations

from dsr_files import save_excel, load_excel, ExcelSheetConfig
from pathlib import Path
import pandas as pd

sales = pd.DataFrame({"region": ["NA", "EU"], "revenue": [120, 95]})
costs = pd.DataFrame({"region": ["NA", "EU"], "cost": [80, 70]})

# Save multi-sheet workbook
save_excel(
    [
        ExcelSheetConfig(data=sales, sheet_name="Sales"),
        ExcelSheetConfig(data=costs, sheet_name="Costs"),
    ],
    Path("."),
    "report",
)

# Load first sheet
df = load_excel(Path("report.xlsx"))

PDF Operations (Interactive Reports)

from dsr_files import PDFDocument, PageConfiguration, PageSize, PageOrientation, PageColors
from pathlib import Path

# Configure document style
config = PageConfiguration(
    page_size=PageSize.LETTER,
    orientation=PageOrientation.PORTRAIT,
    colors=PageColors(page_num="#000000", title="#444444"),
    margins=(0.07, 0.93, 0.90, 0.10)
)

doc = PDFDocument("Audit Report", config)
page = doc.create_new_page("Summary")
# ... Add Matplotlib content to page.fig ...

doc.render_table_of_contents()
doc.save(Path("."), "audit_report")

PARQUET Operations

from dsr_files import save_parquet, load_parquet
import pandas as pd
from pathlib import Path

df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})

# Save to Parquet
save_parquet(df, Path("."), "data", engine="pyarrow")

# Load from Parquet
df = load_parquet(Path("data.parquet"))
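Because Parquet support depends on either PyArrow or FastParquet, it can help to choose the `engine` argument based on what is installed. The following is a sketch using only the standard library; `pick_parquet_engine` is a hypothetical helper and is not provided by dsr-files.

```python
from importlib.util import find_spec


def pick_parquet_engine(candidates=("pyarrow", "fastparquet")) -> str:
    """Return the first importable engine name from candidates (hypothetical helper)."""
    for name in candidates:
        # find_spec returns None when a top-level package is not installed.
        if find_spec(name) is not None:
            return name
    raise ImportError(
        f"No Parquet engine found; install one of: {', '.join(candidates)}"
    )


# Intended use, assuming dsr-files and at least one engine are installed:
#   engine = pick_parquet_engine()
#   save_parquet(df, Path("."), "data", engine=engine)
```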

YAML Operations

from dsr_files import save_yaml, load_yaml
from pathlib import Path

data = {"project": "dsr-orchestrator", "steps": ["ingest", "analyze"]}

# Save to YAML
save_yaml(data, Path("config.yaml"))

# Load from YAML using the new UniqueKeyLoader
# This will raise a ConstructorError if duplicate keys are detected,
# protecting your project settings from conflicting edits.
data = load_yaml(Path("config.yaml"))
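For readers curious how duplicate-key detection can work with PyYAML, here is one common approach: subclass `yaml.SafeLoader` and check keys before the mapping is built. This is a sketch under stated assumptions, not the library's `UniqueKeyLoader`. The class name `DuplicateKeyCheckingLoader` is hypothetical, and YAML merge keys (`<<`) are skipped rather than validated.

```python
from collections.abc import Hashable

import yaml
from yaml.constructor import ConstructorError

MERGE_TAG = "tag:yaml.org,2002:merge"


class DuplicateKeyCheckingLoader(yaml.SafeLoader):
    """SafeLoader variant that rejects mappings with repeated keys (illustrative)."""

    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _value_node in node.value:
            if key_node.tag == MERGE_TAG:
                # Merge keys are resolved by the base class; not checked here.
                continue
            key = self.construct_object(key_node, deep=deep)
            if not isinstance(key, Hashable):
                # Leave unhashable keys for the base class to report.
                continue
            if key in seen:
                raise ConstructorError(
                    "while constructing a mapping",
                    node.start_mark,
                    f"found duplicate key {key!r}",
                    key_node.start_mark,
                )
            seen.add(key)
        return super().construct_mapping(node, deep=deep)


try:
    yaml.load("project: a\nproject: b\n", Loader=DuplicateKeyCheckingLoader)
except ConstructorError as exc:
    print(f"rejected: {exc.problem}")
```

By default PyYAML keeps the last value silently when a key repeats, which is why an explicit check like this is needed to surface conflicting edits.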

Testing

pytest tests/
pytest tests/ --cov=src/dsr_files

License

MIT
