
Python library for storing and working with monthly-period data.


monthpack

monthpack is a Python library for organizing period-based data sources, such as bank statements, income statements, and similar records.

The project is centered around local source.config.json files that define:

  • base metadata (entries without a period)
  • persistent changes that take effect from a given period onward
  • temporary changes that apply to one specific period only
  • path templates with placeholders such as {period}, {period.year}, and {period.month}
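A minimal sketch of how such placeholders could be filled. It relies on the attribute lookup built into Python's str.format, so {period.year} reads an attribute of the period object. The Period class and the expand helper are hypothetical and not part of monthpack, and whether the library zero-pads {period.month} is not stated here; this sketch leaves it unpadded unless a format spec such as :02d is given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """Hypothetical stand-in for a YYYYMM period such as 202401."""

    value: int

    @property
    def year(self) -> int:
        return self.value // 100

    @property
    def month(self) -> int:
        return self.value % 100

    def __format__(self, spec: str) -> str:
        # A bare "{period}" renders the raw YYYYMM value.
        return format(self.value, spec)


def expand(template: str, period: int, **extra: str) -> str:
    """Fill {period}, {period.year}, {period.month} and any extra keys (hypothetical)."""
    return template.format(period=Period(period), **extra)


print(expand("{period.year}/{period}_{name}.bin", 202401, name="main"))
print(expand("{period.year}-{period.month:02d}", 202401))
```

The first call prints 2024/202401_main.bin and the second prints 2024-01.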

Current Layout

monthpack/
  data/
  src/
    monthpack/
  pyproject.toml
  README.md

Example

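from monthpack import Source

# Load and parse the source configuration
source = Source.from_path("data/source/source.config.json")

# Resolve metadata for January 2024 against the first storage item (index 0)
metadata = source.resolve_metadata(202401, storage=0)

print(metadata.period)     # the requested period
print(metadata.year)
print(metadata.month)
print(metadata.inpath)     # resolved keys are available as attributes...
print(metadata["reader"])  # ...or with dictionary-style access

# Read January through June 2024; with skip_error=True, missing-read cases return None instead of raising
data = source.read((202401, 202406), storage=0, skip_error=True)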

You can also pass admin_user and preprocessors when initializing the source:

source = Source.from_path(
    "data/source/source.config.json",
    admin_user=True,
    # preprocess_main and preprocess_backup are your own preprocessing functions
    preprocessors=[preprocess_main, preprocess_backup],
)

You can also override input or output when loading the config:

source = Source.from_path(
    "data/source/source.config.json",
    input={"root": None, "path": "D:/raw/monthpack"},
    output={"path": "processed_alt"},
)

Config Templates

monthpack also exposes a helper function for generating a starter source.config.json file:

from monthpack import write_sample_config

write_sample_config("data/sample/source.config.json")

This helper writes one example file with three storages already configured (storage name: writer settings):

  • dataframe: writer = "pandas", pandas_type = "dataframe"
  • series: writer = "pandas", pandas_type = "series"
  • pickle: writer = "pickle"

source.config.json

In general terms, a source.config.json file is structured like this:

{
    "input": {
        "root": ".",
        "path": "input"
    },
    "output": {
        "root": ".",
        "path": "output"
    },
    "storage": [
        {
            "name": "main",
            "writer": "pandas",
            "pandas_type": "dataframe",
            "collection": "concat",
            "concat_axis": 0,
            "period_label": "period",
            "persistence": true,
            "metadata": [
                {
                    "outpath": "{period.year}/{period}_{name}.bin"
                }
            ]
        }
    ],
    "metadata": [
        {
            "inpath": "**/{period}_*.csv",
            "reader": "csv"
        },
        {
            "period": 202507,
            "inpath": "**/{period}_*.xlsx",
            "reader": "excel"
        }
    ]
}

Field overview:

  • metadata: temporal metadata definitions. Entries without a period are base values; entries with a period override from that month onward; entries with temporary: true apply only to that exact month.
  • storage: processed-data storage definitions. Each item defines its writer and collection behavior and can also contain its own metadata list.
  • input: optional input path configuration. root defines the base used to interpret path.
  • output: optional output path configuration. root defines the base used to interpret path.

Path resolution rules:

  • root = ".": path is resolved relative to the folder containing the JSON file.
  • root = null: path is used as-is.
  • root = "some/path": path is resolved relative to that explicit root.
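The three root rules can be sketched with pathlib. resolve_path and its parameters are hypothetical names for illustration, not monthpack's API, and the sketch does not normalize or validate roots.

```python
from pathlib import Path
from typing import Optional


def resolve_path(root: Optional[str], path: str, config_file: str) -> Path:
    """Resolve an input/output path using the three root rules (hypothetical helper)."""
    if root is None:
        # root = null: path is used as-is
        return Path(path)
    if root == ".":
        # root = ".": relative to the folder containing the JSON file
        return Path(config_file).parent / path
    # Any other root: relative to that explicit root
    return Path(root) / path


print(resolve_path(".", "input", "data/source/source.config.json"))
```

This mirrors the override example earlier in the document: input={"root": None, "path": "D:/raw/monthpack"} would take the root = null branch and use the path unchanged.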

At runtime, Source.from_path(...) reads this file, resolves input and output, and builds a Source instance from it. The same method also accepts optional input={...} and output={...} overrides: only the keys you pass are changed, and those values take precedence over the JSON file. admin_user and preprocessors can be passed to the same method.
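The override rule (only the keys you pass change) amounts to a shallow dictionary merge. A sketch, assuming overrides are plain dicts like the ones in the Example section; apply_override is a hypothetical helper, not part of monthpack.

```python
from typing import Optional


def apply_override(from_json: dict, override: Optional[dict]) -> dict:
    """Return the JSON section with override keys taking precedence (hypothetical)."""
    if not override:
        return dict(from_json)
    # Keys present in override replace the JSON values; all other keys are kept.
    return {**from_json, **override}


print(apply_override({"root": ".", "path": "output"}, {"path": "processed_alt"}))
```

An explicit None in the override (as in input={"root": None, ...}) is kept, because passing the key is what counts, not its value.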

Source.resolve_metadata(...) returns a Metadata object. Resolved keys are available both as attributes and as dictionary-style accessors, so user preprocessors can use either metadata.inpath or metadata["inpath"]. The period itself is exposed as metadata.period, not as metadata["period"].
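One way to offer both access styles is a small class with __getitem__ and __getattr__. This Metadata class is a hypothetical sketch, not monthpack's implementation. It keeps period as a plain attribute so that metadata["period"] raises KeyError, and it assumes year and month are derived from a YYYYMM integer.

```python
from typing import Any, Dict, Optional


class Metadata:
    """Hypothetical sketch: resolved keys via attribute or item access."""

    def __init__(self, period: Optional[int], values: Dict[str, Any]) -> None:
        self.period = period         # exposed only as an attribute
        self._values = dict(values)  # resolved metadata keys

    @property
    def year(self) -> Optional[int]:
        return None if self.period is None else self.period // 100

    @property
    def month(self) -> Optional[int]:
        return None if self.period is None else self.period % 100

    def __getitem__(self, key: str) -> Any:
        return self._values[key]

    def __getattr__(self, key: str) -> Any:
        # Only called when normal attribute lookup fails.
        values = self.__dict__.get("_values", {})
        try:
            return values[key]
        except KeyError:
            raise AttributeError(key) from None


meta = Metadata(202401, {"inpath": "**/202401_*.csv", "reader": "csv"})
print(meta.inpath, meta["reader"], meta.year, meta.month)
```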

When period=None, resolve_metadata(...) returns only the base metadata, without applying any periodic or temporary entries.
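The layering rules from the field overview, including the period=None case, can be sketched as a three-step merge over a metadata list. Assumptions: later entries override earlier keys one by one, periodic entries are applied in chronological order, and placeholders are left unexpanded. resolve is a hypothetical helper, not the library's code.

```python
from typing import Any, Dict, List, Optional


def resolve(entries: List[Dict[str, Any]], period: Optional[int]) -> Dict[str, Any]:
    """Merge base, periodic, and temporary metadata entries for one period (hypothetical)."""

    def payload(entry: Dict[str, Any]) -> Dict[str, Any]:
        return {k: v for k, v in entry.items() if k not in ("period", "temporary")}

    resolved: Dict[str, Any] = {}
    # 1. Base entries (no period) always apply.
    for entry in entries:
        if "period" not in entry:
            resolved.update(payload(entry))
    if period is None:
        return resolved  # base metadata only
    # 2. Periodic entries apply from their month onward, oldest first.
    periodic = [e for e in entries if "period" in e and not e.get("temporary", False)]
    for entry in sorted(periodic, key=lambda e: e["period"]):
        if entry["period"] <= period:
            resolved.update(payload(entry))
    # 3. Temporary entries apply only to their exact month.
    for entry in entries:
        if entry.get("temporary", False) and entry.get("period") == period:
            resolved.update(payload(entry))
    return resolved
```

With the two top-level entries from the JSON example, resolve(entries, 202401)["reader"] is "csv" and resolve(entries, 202508)["reader"] is "excel".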

Storage references can be passed either as:

  • an index, for example storage=0
  • a storage name, for example storage="main"

When a storage item defines name, that name must be unique across the configuration.
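A sketch of index-or-name lookup with the uniqueness check, assuming storage items are plain dicts as in the JSON example. check_unique_names and get_storage are hypothetical helpers, and the exception types are assumptions.

```python
from typing import Any, Dict, List, Union


def check_unique_names(storages: List[Dict[str, Any]]) -> None:
    """Raise if two storage items share the same name (hypothetical)."""
    names = [s["name"] for s in storages if "name" in s]
    duplicates = {n for n in names if names.count(n) > 1}
    if duplicates:
        raise ValueError(f"duplicate storage names: {sorted(duplicates)}")


def get_storage(storages: List[Dict[str, Any]], ref: Union[int, str]) -> Dict[str, Any]:
    """Look up a storage item by index or by name (hypothetical)."""
    if isinstance(ref, int):
        return storages[ref]
    for storage in storages:
        if storage.get("name") == ref:
            return storage
    raise KeyError(f"no storage named {ref!r}")
```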

Read Behavior

  • source.read(period, ...) reads one period.
  • source.read(None, ...) reads the atemporal (base) case.
  • source.read([period1, period2, ...], ...) respects the exact order of the list.
  • source.read((start, end), ...) expands a continuous monthly range, ascending or descending according to the tuple order.
  • source.read_one(period, ...) is the single-period helper used internally.
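Expanding a (start, end) tuple into consecutive months is easiest after converting YYYYMM integers to a running month count. A minimal sketch, assuming periods are YYYYMM integers as in the examples; expand_periods and its helpers are hypothetical, not monthpack's internals.

```python
from typing import List, Optional, Tuple, Union

PeriodSpec = Union[None, int, List[int], Tuple[int, int]]


def _to_index(period: int) -> int:
    # YYYYMM -> months since year 0, so ranges can step month by month.
    return (period // 100) * 12 + (period % 100 - 1)


def _from_index(index: int) -> int:
    year, month0 = divmod(index, 12)
    return year * 100 + month0 + 1


def expand_periods(spec: PeriodSpec) -> List[Optional[int]]:
    """Turn a read() period argument into an ordered list of periods (hypothetical)."""
    if spec is None or isinstance(spec, int):
        return [spec]      # single period, or the base case
    if isinstance(spec, list):
        return list(spec)  # exact order of the list
    start, end = (_to_index(p) for p in spec)
    step = 1 if end >= start else -1  # tuple order sets the direction
    return [_from_index(i) for i in range(start, end + step, step)]
```

For example, (202311, 202402) expands to [202311, 202312, 202401, 202402], and the reversed tuple gives the same months in descending order.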

skip_error=True returns None for missing-read cases such as a missing processed file or a missing persistence anchor. With skip_error=False, those cases raise FileNotFoundError. Programming errors inside preprocessors are not swallowed.
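The skip_error contract can be expressed as a wrapper that catches only FileNotFoundError, so other exceptions (such as a bug inside a preprocessor) still propagate. read_with_skip is a hypothetical helper, not monthpack's API.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def read_with_skip(read: Callable[[], T], skip_error: bool) -> Optional[T]:
    """Return None for missing-read cases when skip_error=True (hypothetical)."""
    try:
        return read()
    except FileNotFoundError:
        if skip_error:
            return None
        raise  # skip_error=False: surface the missing file
    # Any other exception type is deliberately not caught.
```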

Storage Options

Within each storage item:

  • name: optional unique identifier that lets the storage be referenced by name instead of only by index.
  • writer: currently supports pandas and pickle.
  • pandas_type: required when writer = "pandas". Use dataframe or series.
  • collection: one of list, dict, or concat.
  • concat_axis: axis used when collection = "concat".
  • period_label: when defined, adds the requested period to pandas outputs during collection reads. For DataFrame, it is used as a column name; for Series, it is used as the outer index level name.
  • persistence: when true, only periodic metadata entries (those with a period and without temporary: true) act as anchors; later periods reuse the most recent valid anchor.
  • metadata: storage-specific metadata. This is also where outpath should be declared.
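The "reuse the most recent valid anchor" rule is a predecessor search over sorted anchor periods. A sketch using bisect, assuming anchors are the YYYYMM periods of the storage's periodic metadata entries and that a period with no earlier anchor is the "missing persistence anchor" case that raises FileNotFoundError. find_anchor is hypothetical.

```python
import bisect
from typing import List


def find_anchor(anchor_periods: List[int], period: int) -> int:
    """Return the latest anchor period <= period (hypothetical)."""
    anchors = sorted(anchor_periods)
    # bisect_right places period after any equal anchor, so pos - 1 is the predecessor.
    pos = bisect.bisect_right(anchors, period)
    if pos == 0:
        raise FileNotFoundError(f"no persistence anchor at or before {period}")
    return anchors[pos - 1]
```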

Within storage metadata:

  • outpath: output path template for the stored artifact.

User Mode

Source can run in read-only user mode:

source.set_user()
data = source.read(202401)

In user mode:

  • read(...) only returns already processed data.
  • missing processed files are not regenerated from raw inputs.
  • save(...) is not available.
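A minimal sketch of the user-mode rules, assuming that outside user mode a missing processed result is regenerated and saved, and that in user mode a missing result raises FileNotFoundError. SourceSketch and _process_raw are hypothetical; monthpack's Source has a richer interface.

```python
from typing import Any, Dict


class SourceSketch:
    """Hypothetical model of read-only user mode."""

    def __init__(self) -> None:
        self.processed: Dict[int, Any] = {}  # period -> already processed data
        self.user_mode = False

    def set_user(self) -> None:
        self.user_mode = True

    def read(self, period: int) -> Any:
        if period in self.processed:
            return self.processed[period]
        if self.user_mode:
            # User mode never regenerates from raw inputs.
            raise FileNotFoundError(f"no processed data for {period}")
        data = self._process_raw(period)
        self.save(period, data)
        return data

    def save(self, period: int, data: Any) -> None:
        if self.user_mode:
            raise PermissionError("save() is not available in user mode")
        self.processed[period] = data

    def _process_raw(self, period: int) -> Any:
        return {"period": period}  # placeholder for reading raw inputs
```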



Download files

Source Distribution

monthpack-0.1.2.tar.gz (14.0 kB)

Built Distribution

monthpack-0.1.2-py3-none-any.whl (11.5 kB)

File details

Details for the file monthpack-0.1.2.tar.gz.

File metadata

  • Download URL: monthpack-0.1.2.tar.gz
  • Upload date:
  • Size: 14.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for monthpack-0.1.2.tar.gz
  • SHA256: f532df6c237b77229ff4d0ac6e66828fa1158cee284bed65980387a4f28486e7
  • MD5: 7032472663792d8521bf7fae21033f1b
  • BLAKE2b-256: 4d95d4b5c7c8c86645540b6779df5c2607ff6a63c3ab90439d632294b48b6475


File details

Details for the file monthpack-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: monthpack-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 11.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for monthpack-0.1.2-py3-none-any.whl
  • SHA256: 8ee9cafff706a90743f33d121bcc0264405e097707785a7b5d20320fb51b6d86
  • MD5: 3c04e344a6becbd7b2b5e331a90dd927
  • BLAKE2b-256: ad994a345a972ceec73f06ab62d0376d3e4281b2865be492a8bf848ffe111af6

