Settings and file I/O management using a configuration YAML file

config-versioned

A Python package for YAML-based configuration management in data pipelines, with versioned directory support and automatic file I/O by extension.

Installation

pip install config-versioned

Install optional extras for specific file formats:

# Quote the requirement so shells such as zsh do not treat the brackets as a glob
pip install "config-versioned[pandas]"   # CSV, TSV, Excel, Stata
pip install "config-versioned[geo]"      # Shapefiles, GeoJSON, GeoPackage, etc.
pip install "config-versioned[raster]"   # GeoTIFF, rasterio formats
pip install "config-versioned[xarray]"   # NetCDF
pip install "config-versioned[dbfread]"  # DBF files
pip install "config-versioned[all]"      # All of the above

Quick Start

1. Create a config YAML file

# project_config.yaml
project_name: 'my_analysis'

directories:
  raw_data:
    versioned: false
    path: '~/data/raw'
    files:
      input_table: 'records.csv'

  results:
    versioned: true
    path: '~/data/results'
    files:
      output_table: 'processed.csv'
      summary: 'summary.txt'

versions:
  results: 'v1'

2. Load the config

from config_versioned import Config

cfg = Config('project_config.yaml')

3. Access settings

cfg.get('project_name')           # 'my_analysis'
cfg.get('versions', 'results')    # 'v1'
cfg.get()                         # full config dict
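
The calls above suggest that each extra argument to cfg.get descends one level into the parsed YAML. The following is a minimal sketch of that lookup pattern on a plain dict. It is not the package's implementation, and nested_get and EXAMPLE_SETTINGS are hypothetical names used only for illustration.

```python
from functools import reduce
from typing import Any

# Hypothetical stand-in for the parsed Quick Start YAML (not loaded by the package).
EXAMPLE_SETTINGS = {
    "project_name": "my_analysis",
    "versions": {"results": "v1"},
}


def nested_get(settings: dict, *keys: str) -> Any:
    """Descend one nesting level per key; with no keys, return the whole dict."""
    return reduce(lambda node, key: node[key], keys, settings)


print(nested_get(EXAMPLE_SETTINGS, "project_name"))
print(nested_get(EXAMPLE_SETTINGS, "versions", "results"))
```

In this sketch a missing key raises KeyError. How the package itself handles missing keys is not stated here.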

4. Build paths

# Non-versioned: returns ~/data/raw
cfg.get_dir_path('raw_data')

# Versioned: returns ~/data/results/v1
cfg.get_dir_path('results')

# With a custom version override
cfg.get_dir_path('results', custom_version='v2')

# Full file path
cfg.get_file_path('raw_data', 'input_table')   # ~/data/raw/records.csv
cfg.get_file_path('results', 'output_table')   # ~/data/results/v1/processed.csv

All path methods return pathlib.Path objects.
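
To make the path rules concrete, here is a rough sketch of how a versioned directory could be resolved from the config structure shown in step 1. It only mirrors the behaviour described above and is not the package's code. resolve_dir, resolve_file and EXAMPLE_LAYOUT are hypothetical names, and PurePosixPath is used so that the "~" is left unexpanded, which is an assumption.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical stand-in for the parsed Quick Start YAML.
EXAMPLE_LAYOUT = {
    "directories": {
        "raw_data": {
            "versioned": False,
            "path": "~/data/raw",
            "files": {"input_table": "records.csv"},
        },
        "results": {
            "versioned": True,
            "path": "~/data/results",
            "files": {"output_table": "processed.csv"},
        },
    },
    "versions": {"results": "v1"},
}


def resolve_dir(layout: dict, name: str, custom_version: Optional[str] = None) -> PurePosixPath:
    """Return the base path, with the version appended for versioned directories."""
    entry = layout["directories"][name]
    base = PurePosixPath(entry["path"])
    if not entry["versioned"]:
        return base
    # An explicit override wins over the version recorded in the config.
    version = custom_version or layout["versions"][name]
    return base / version


def resolve_file(layout: dict, dir_name: str, file_key: str) -> PurePosixPath:
    """Join the resolved directory with the file name registered under file_key."""
    filename = layout["directories"][dir_name]["files"][file_key]
    return resolve_dir(layout, dir_name) / filename


print(resolve_dir(EXAMPLE_LAYOUT, "results", custom_version="v2"))
print(resolve_file(EXAMPLE_LAYOUT, "results", "output_table"))
```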

5. Read and write files

import pandas as pd

# Read a file (path resolved from config)
df = cfg.read('raw_data', 'input_table')

# Process data
processed = df.head(10)

# Write results (directory must exist)
cfg.write(processed, 'results', 'output_table')
cfg.write(['Summary: 10 rows written\n'], 'results', 'summary')

# Write the config itself to the results directory
cfg.write_self('results')
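
Because the target directory must exist before writing, a pipeline may want to create it first. The sketch below uses only the standard library; ensure_dir is a hypothetical helper, not part of config-versioned. In practice it could be called on the path returned by cfg.get_dir_path('results') before cfg.write.

```python
import tempfile
from pathlib import Path


def ensure_dir(path) -> Path:
    """Create the directory and any missing parents; do nothing if it already exists."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


# Demonstrate on a throwaway location rather than a real results directory.
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_dir(Path(tmp) / "results" / "v1")
    print(target.is_dir())
```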

6. Override versions at load time

# Run the same pipeline with a new version
cfg_v2 = Config('project_config.yaml', versions={'results': 'v2'})
cfg_v2.get_dir_path('results')  # ~/data/results/v2
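
One plausible way to think about the override is as a dictionary merge in which the values passed at load time take precedence over those in the YAML file. The sketch below illustrates that idea only. apply_version_overrides is a hypothetical function, and the package may implement overrides differently.

```python
from typing import Optional


def apply_version_overrides(settings: dict, overrides: Optional[dict] = None) -> dict:
    """Return a copy of settings whose 'versions' entries are updated by overrides."""
    merged = dict(settings)  # shallow copy so the caller's dict is not modified
    merged["versions"] = {**settings.get("versions", {}), **(overrides or {})}
    return merged


base = {"project_name": "my_analysis", "versions": {"results": "v1"}}
updated = apply_version_overrides(base, {"results": "v2"})
print(updated["versions"])
print(base["versions"])
```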

Standalone autoread / autowrite

from config_versioned import autoread, autowrite

# Read by extension
df = autoread('data/records.csv')
config = autoread('config.yaml')
lines = autoread('notes.txt')

# Write by extension
autowrite(df, 'output/results.csv')
autowrite({'key': 'value'}, 'output/config.yaml')
autowrite(['line one\n', 'line two\n'], 'output/notes.txt')
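
Choosing a reader or writer by file extension is commonly done with a lookup table keyed on the suffix. The sketch below shows that pattern with standard-library formats only (text and JSON, the latter standing in for YAML so that no third-party package is needed). read_by_extension, write_by_extension and the two tables are hypothetical names and do not describe how autoread and autowrite are actually written.

```python
import json
import tempfile
from pathlib import Path


def _read_text(path):
    with open(path, encoding="utf-8") as handle:
        return handle.readlines()


def _write_text(lines, path):
    with open(path, "w", encoding="utf-8") as handle:
        handle.writelines(lines)


def _read_json(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def _write_json(obj, path):
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(obj, handle)


# Suffix -> handler tables; supporting a new format means adding one entry to each.
READERS = {".txt": _read_text, ".json": _read_json}
WRITERS = {".txt": _write_text, ".json": _write_json}


def _lookup(table, path):
    suffix = Path(path).suffix.lower()
    if suffix not in table:
        raise ValueError(f"Unsupported extension: {suffix!r}")
    return table[suffix]


def read_by_extension(path):
    return _lookup(READERS, path)(path)


def write_by_extension(obj, path):
    _lookup(WRITERS, path)(obj, path)


with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes.txt"
    write_by_extension(["line one\n", "line two\n"], notes)
    print(read_by_extension(notes))
```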

Supported File Extensions

Format              Extensions                                   Requires
CSV / TSV           csv, tsv, gz, bz2                            pandas
Excel               xls, xlsx                                    pandas, openpyxl
Stata               dta                                          pandas
DBF                 dbf                                          dbfread
YAML                yaml, yml                                    (core)
Text                txt                                          (core)
Shapefile / Vector  shp, geojson, gpkg, fgb, gml, kml, and more  geopandas
Raster              tif, geotiff                                 rasterio
NetCDF              nc                                           xarray

For raster files, autoread returns {"data": np.ndarray, "profile": dict} and autowrite accepts that same structure (or a (data, profile) tuple).
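
Code that handles raster values may need to accept both shapes described above. The following sketch normalises either form to a (data, profile) pair. split_raster is a hypothetical helper, and a nested list stands in for the NumPy array so the example runs without extra dependencies.

```python
def split_raster(raster):
    """Accept {"data": ..., "profile": ...} or a (data, profile) tuple; return the pair."""
    if isinstance(raster, dict):
        return raster["data"], raster["profile"]
    data, profile = raster  # raises ValueError if the tuple does not have two items
    return data, profile


pixels = [[0, 1], [2, 3]]                 # stand-in for an np.ndarray
profile = {"count": 1, "dtype": "uint8"}  # illustrative profile keys only

print(split_raster({"data": pixels, "profile": profile}) == split_raster((pixels, profile)))
```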

Example Config File

A bundled example is included with the package:

import importlib.resources as r
from config_versioned import Config

path = str(r.files("config_versioned") / "data" / "example_config.yaml")
cfg = Config(path)
