Skip to main content

Provides an abstraction layer for creating and parsing paths in a programmatic way via templates.

Project description

Andar Package

Caminante, no hay camino, se hace camino al andar.

Antonio Machado

Andar is a python package that provides an abstraction layer for managing path structures, helping to create and parse paths in a programmatic way via templated file paths.

Install Package

With pip:

pip install andar

Quick start:

Simple PathModel definition using default field configurations:

from andar import PathModel

simple_path_model = PathModel(
    template="/{base_folder}/{subfolder}/{base_name}__{suffix}.{extension}"
)

Generate a path:

result_path = simple_path_model.get_path(
    base_folder="parent_folder",
    subfolder="other_folder",
    base_name="mydata",
    suffix="2000-01-01",
    extension="csv",
)
print(result_path)
"/parent_folder/other_folder/mydata__2000-01-01.csv"

Parse a path:

file_path = "/data/reports/summary__2025-12-31.csv"
parsed_fields = simple_path_model.parse_path(file_path)
print(parsed_fields)
{
    'base_folder': 'data', 
    'subfolder': 'reports', 
    'base_name': 'summary', 
    'suffix': '2025-12-31', 
    'extension': 'csv',
}

Examples

How to create a path generator / parser for a date tree structure

Define a PathModel following a date tree folder structure with datetime a suffix using the next template and fields:

from andar import FieldConf, PathModel, SafePatterns

date_archived_pm = PathModel(
    template="{base_path}/{subfolder}/{date_path}/{date_prefix}_{name}_{datetime_suffix}.{ext}",
    fields={
        "base_path": FieldConf(pattern=SafePatterns.DIRPATH),
        "subfolder": FieldConf(pattern=SafePatterns.NAME),
        "date_path": FieldConf(pattern=r"\d{4}/\d{2}/\d{2}", date_format="%Y/%m/%d"),
        "date_prefix": FieldConf(pattern=r"\d{4}-\d{2}-\d{2}", date_format="%Y-%m-%d"),
        "name": FieldConf(pattern=SafePatterns.FIELD),
        "datetime_suffix": FieldConf(pattern=r"\d{8}_\d{6}", datetime_format="%Y%m%d_%H%M%S"),
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),
    },
)

Then, for generating the paths just iterate over dates:

import datetime as dt

base_path = "/company/reports"
subfolder = "finance"
report_name = "revenue"
extension = "xls"
start_date = dt.date(2025, 12, 1)
report_date_list = [start_date + dt.timedelta(days=d) for d in range(10)]

for report_date in report_date_list:
    creation_datetime = dt.datetime.now()
    report_path = date_archived_pm.get_path(
        base_path=base_path,
        subfolder=subfolder,
        date_path=report_date,
        date_prefix=report_date,
        name=report_name,
        datetime_suffix=creation_datetime,
        ext=extension,
    )
    print(report_path)

For parsing already existing paths use a library that allows to recursive search (e.g. pathlib, glob, os, etc) and output a fullpath for each file:

import pathlib
base_path = "/company/reports"
search_folder = pathlib.Path(base_path)
path_list = [str(i) for i in search_folder.rglob("*") if i.is_file()]

for file_path in path_list:
    parsed_fields = date_archived_pm.parse_path(file_path)
    print(parsed_fields)

How to define path conventions for a datalake

For example Data Mesh propose conventions for separating data into domains, layers and products. This could be implemented with the following PathModel template and fields:

from andar import FieldConf, PathModel, SafePatterns

data_mesh_pm = PathModel(
    template="/{domain}/{layer}/{product}/{aggregation}/{date}_{product}.{ext}",
    fields={
        "domain": FieldConf(pattern=SafePatterns.NAME),  # sales, marketing, HR, finance, etc
        "layer": FieldConf(pattern=SafePatterns.NAME),  # raw, intermediate, mart, etc
        "product": FieldConf(pattern=SafePatterns.NAME),  # orders, revenues, taxes, campaigns, etc
        "aggregation": FieldConf(pattern=SafePatterns.NAME),  # daily, weekly, monthly, etc
        "date": FieldConf(pattern=r"\d{8}", datetime_format="%Y%m%d"),  # product date
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),  # csv, xls, parquet, etc
    },
)

For improving traceability, it's a good practice to also include run datetime (i.e. generation date) as a simple version system:

from andar import FieldConf, PathModel, SafePatterns

data_mesh_pm = PathModel(
    template="/{domain}/{layer}/{product}/{aggregation}/{product_date}_{product}_{run_datetime}.{ext}",
    fields={
        "domain": FieldConf(pattern=SafePatterns.NAME),  # sales, marketing, HR, finance, etc
        "layer": FieldConf(pattern=SafePatterns.NAME),  # raw, intermediate, mart, etc
        "product": FieldConf(pattern=SafePatterns.NAME),  # orders, revenues, taxes, campaigns, etc
        "aggregation": FieldConf(pattern=SafePatterns.NAME),  # daily, weekly, monthly, etc
        "product_date": FieldConf(pattern=r"\d{8}", datetime_format="%Y%m%d"),  # product target date
        "run_datetime": FieldConf(pattern=r"\d{8}_\d{6}", datetime_format="%Y%m%d_%H%M%S"),  # generation datetime
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),  # csv, xls, parquet, etc
    },
)

How to reorganize files and folders in a datalake

In this example we will reorganize a flatten file structure into a nested one. First define the two PathModels, the old one and the new one:

from andar import FieldConf, PathModel, SafePatterns

old_flat_pm = PathModel(
    template="{base_path}/{category}_{name}_{date}.{ext}",
    fields={
        "base_path": FieldConf(pattern=SafePatterns.DIRPATH),
        "category": FieldConf(pattern=SafePatterns.NAME),
        "name": FieldConf(pattern=SafePatterns.FIELD),
        "date": FieldConf(pattern=r"\d{8}", datetime_format="%Y%m%d"),
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),
    },
)

# we can just update the template if the fields are de same
new_nested_pm = old_flat_pm.update(
    template="{base_path}/{category}/{date}/{name}.{ext}"
)

Example of file creating in a temporary directory using a flatten structure with the old PathModel:

import pathlib
import tempfile
import datetime as dt

base_path = tempfile.mkdtemp()
start_date = dt.datetime(2025, 12, 1)
date_list = [start_date + dt.timedelta(days=d) for d in range(10)]

for date in date_list:
    creation_datetime = dt.datetime.now()
    file_path = old_flat_pm.get_path(
        base_path=base_path,
        category="sales",
        name="orders",
        date=date,
        ext="csv",
    )
    print(file_path)
    pathlib.Path(file_path).touch()  # create an empty file

Example of nesting file paths using the parser of the old PathModel and the get_path of the new PathModel:

# First list existing files in target base path
search_folder = pathlib.Path(base_path)
path_list = [str(i) for i in search_folder.rglob("*") if i.is_file()]

for file_path in path_list:
    parsed_fields = old_flat_pm.parse_path(file_path)
    # As the fields are the same we can reuse them directly
    new_file_path = new_nested_pm.get_path(**parsed_fields)
    # create new parent directories
    pathlib.Path(new_file_path).parent.mkdir(parents=True, exist_ok=True)
    # move old file to new location using the new name
    pathlib.Path(file_path).replace(new_file_path)

The same strategy could be adapted to flatten a nested path structure using PathModels.

Documentation

See the official documentation to learn more.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

andar-0.1.4.tar.gz (15.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

andar-0.1.4-py3-none-any.whl (11.9 kB view details)

Uploaded Python 3

File details

Details for the file andar-0.1.4.tar.gz.

File metadata

  • Download URL: andar-0.1.4.tar.gz
  • Upload date:
  • Size: 15.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.10 {"installer":{"name":"uv","version":"0.9.10"},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for andar-0.1.4.tar.gz
Algorithm Hash digest
SHA256 1d6bd1692d54571d34c2ce069ae5711227a0f6a7827e8b820f65a74dfbe3881f
MD5 937af4447ec269783d3fe55d036839fc
BLAKE2b-256 a2d2d94814f23119d287084ca9eda3e0f64502f2fd78f7beedb4c2df3b3ab0e1

See more details on using hashes here.

File details

Details for the file andar-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: andar-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 11.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.10 {"installer":{"name":"uv","version":"0.9.10"},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for andar-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 e6a51687650b43ce80357f5d9603c9d71381e80764d2c63611c73affd5f7b0f7
MD5 440072c6b23b44191ae4cb10e3ec4b34
BLAKE2b-256 aba631c0f7f21c737f85025a61baab7356a30409a8d173347bf2498e55580b97

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page