Provides an abstraction layer for creating and parsing paths in a programmatic way via templates.
Andar Package
Caminante, no hay camino, se hace camino al andar. ("Traveler, there is no path; the path is made by walking.")
Antonio Machado
Andar is a Python package that provides an abstraction layer for managing path structures. It helps you create and parse paths programmatically from templated file paths.
Install Package
With pip:
pip install andar
Quick start:
Simple PathModel definition using default field configurations:
from andar import PathModel
simple_path_model = PathModel(
    template="/{base_folder}/{subfolder}/{base_name}__{suffix}.{extension}"
)
Generate a path:
result_path = simple_path_model.get_path(
    base_folder="parent_folder",
    subfolder="other_folder",
    base_name="mydata",
    suffix="2000-01-01",
    extension="csv",
)
print(result_path)
"/parent_folder/other_folder/mydata__2000-01-01.csv"
Parse a path:
file_path = "/data/reports/summary__2025-12-31.csv"
parsed_fields = simple_path_model.parse_path(file_path)
print(parsed_fields)
{
    'base_folder': 'data',
    'subfolder': 'reports',
    'base_name': 'summary',
    'suffix': '2025-12-31',
    'extension': 'csv',
}
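To make the round trip between a template and its fields more concrete, here is a minimal standard-library sketch of the general idea: the template is compiled into a regular expression with one named group per field. This is not andar's implementation. The helper names `template_to_regex` and `parse_with_template`, and the default field pattern, are assumptions made for illustration only.

```python
import re
import string

# Assumption: a field holds letters, digits and hyphens only.
DEFAULT_PATTERN = r"[A-Za-z0-9-]+"
TEMPLATE = "/{base_folder}/{subfolder}/{base_name}__{suffix}.{extension}"


def template_to_regex(template: str) -> re.Pattern:
    """Turn a str.format-style template into a regex with named groups."""
    parts = []
    for literal, field, _spec, _conversion in string.Formatter().parse(template):
        parts.append(re.escape(literal))  # separators are matched literally
        if field is not None:
            parts.append(f"(?P<{field}>{DEFAULT_PATTERN})")
    return re.compile("".join(parts))


def parse_with_template(template: str, path: str) -> dict:
    """Return the field values found in path, or raise ValueError."""
    match = template_to_regex(template).fullmatch(path)
    if match is None:
        raise ValueError(f"{path!r} does not follow {template!r}")
    return match.groupdict()


path = "/data/reports/summary__2025-12-31.csv"
fields = parse_with_template(TEMPLATE, path)
print(fields)

# Generating is the reverse operation: fill the template with the fields.
assert TEMPLATE.format(**fields) == path
```

The sketch does not handle a template that repeats the same field name, since a regular expression cannot define the same named group twice.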
Examples
How to create a path generator / parser for a date tree structure
Define a PathModel that follows a date tree folder structure with a datetime suffix, using the following template and fields:
from andar import FieldConf, PathModel, SafePatterns
date_archived_pm = PathModel(
    template="{base_path}/{subfolder}/{date_path}/{date_prefix}_{name}_{datetime_suffix}.{ext}",
    fields={
        "base_path": FieldConf(pattern=SafePatterns.DIRPATH),
        "subfolder": FieldConf(pattern=SafePatterns.NAME),
        "date_path": FieldConf(pattern=r"\d{4}/\d{2}/\d{2}", date_format="%Y/%m/%d"),
        "date_prefix": FieldConf(pattern=r"\d{4}-\d{2}-\d{2}", date_format="%Y-%m-%d"),
        "name": FieldConf(pattern=SafePatterns.FIELD),
        "datetime_suffix": FieldConf(pattern=r"\d{8}_\d{6}", datetime_format="%Y%m%d_%H%M%S"),
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),
    },
)
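Each date field above pairs a regular expression with a strftime format, and the two have to agree for generated paths to be parseable again. The following standard-library sketch checks that agreement for the three pairs used in this model. It does not call andar; the helper name `render_and_check` and the sample datetime are assumptions.

```python
import datetime as dt
import re

# (regex pattern, strftime format) pairs copied from the field definitions above
FIELD_FORMATS = {
    "date_path": (r"\d{4}/\d{2}/\d{2}", "%Y/%m/%d"),
    "date_prefix": (r"\d{4}-\d{2}-\d{2}", "%Y-%m-%d"),
    "datetime_suffix": (r"\d{8}_\d{6}", "%Y%m%d_%H%M%S"),
}


def render_and_check(moment: dt.datetime) -> dict:
    """Render each field and confirm the text matches its regex and parses back."""
    rendered = {}
    for name, (pattern, fmt) in FIELD_FORMATS.items():
        text = moment.strftime(fmt)
        if re.fullmatch(pattern, text) is None:
            raise ValueError(f"{name}: {text!r} does not match {pattern!r}")
        dt.datetime.strptime(text, fmt)  # raises ValueError if fmt cannot parse text
        rendered[name] = text
    return rendered


rendered = render_and_check(dt.datetime(2025, 12, 1, 9, 30, 0))
print(rendered)
```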
Then, to generate the paths, iterate over the dates:
import datetime as dt
base_path = "/company/reports"
subfolder = "finance"
report_name = "revenue"
extension = "xls"
start_date = dt.date(2025, 12, 1)
report_date_list = [start_date + dt.timedelta(days=d) for d in range(10)]
for report_date in report_date_list:
    creation_datetime = dt.datetime.now()
    report_path = date_archived_pm.get_path(
        base_path=base_path,
        subfolder=subfolder,
        date_path=report_date,
        date_prefix=report_date,
        name=report_name,
        datetime_suffix=creation_datetime,
        ext=extension,
    )
    print(report_path)
To parse existing paths, use a library that supports recursive search (e.g. pathlib, glob, os) and returns the full path of each file:
import pathlib
base_path = "/company/reports"
search_folder = pathlib.Path(base_path)
path_list = [str(i) for i in search_folder.rglob("*") if i.is_file()]
for file_path in path_list:
    parsed_fields = date_archived_pm.parse_path(file_path)
    print(parsed_fields)
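A recursive search with `"*"` returns every file under the base path, including files that do not follow the template. How `parse_path` reacts to such files is not covered here, so one cautious option is to narrow the search before parsing. The sketch below uses only pathlib and a temporary directory; the helper name `list_report_files` and the sample file names are assumptions.

```python
import pathlib
import tempfile


def list_report_files(base_path: str, extension: str = "xls") -> list[str]:
    """Return the full path of every file with the given extension, sorted."""
    search_folder = pathlib.Path(base_path)
    return sorted(
        str(p) for p in search_folder.rglob(f"*.{extension}") if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    nested = root / "finance" / "2025" / "12" / "01"
    nested.mkdir(parents=True)
    (nested / "2025-12-01_revenue_20251201_093000.xls").touch()
    (root / "notes.txt").touch()  # unrelated file that the search should skip

    found = list_report_files(tmp)
    relative = [pathlib.Path(p).relative_to(root).as_posix() for p in found]

print(relative)
```

The resulting list can then be passed to the parsing loop in place of `path_list`.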
How to define path conventions for a datalake
For example, Data Mesh proposes conventions for separating data into domains, layers, and products. These conventions could be implemented with the following PathModel template and fields:
from andar import FieldConf, PathModel, SafePatterns
data_mesh_pm = PathModel(
    template="/{domain}/{layer}/{product}/{aggregation}/{date}_{product}.{ext}",
    fields={
        "domain": FieldConf(pattern=SafePatterns.NAME),  # sales, marketing, HR, finance, etc
        "layer": FieldConf(pattern=SafePatterns.NAME),  # raw, intermediate, mart, etc
        "product": FieldConf(pattern=SafePatterns.NAME),  # orders, revenues, taxes, campaigns, etc
        "aggregation": FieldConf(pattern=SafePatterns.NAME),  # daily, weekly, monthly, etc
        "date": FieldConf(pattern=r"\d{8}", datetime_format="%Y%m%d"),  # product date
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),  # csv, xls, parquet, etc
    },
)
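To visualize the layout this template describes, it can be filled with plain `str.format`. Note that `{product}` appears twice, once as a folder and once in the file name. This sketch bypasses andar entirely, so none of the field patterns are validated, and the sample values are assumptions.

```python
TEMPLATE = "/{domain}/{layer}/{product}/{aggregation}/{date}_{product}.{ext}"

# str.format fills every occurrence of a repeated placeholder with the same value
example_path = TEMPLATE.format(
    domain="sales",
    layer="mart",
    product="orders",
    aggregation="daily",
    date="20251201",
    ext="parquet",
)
print(example_path)  # /sales/mart/orders/daily/20251201_orders.parquet
```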
To improve traceability, it is good practice to also include the run datetime (i.e. the generation datetime) as a simple versioning system:
from andar import FieldConf, PathModel, SafePatterns
data_mesh_pm = PathModel(
    template="/{domain}/{layer}/{product}/{aggregation}/{product_date}_{product}_{run_datetime}.{ext}",
    fields={
        "domain": FieldConf(pattern=SafePatterns.NAME),  # sales, marketing, HR, finance, etc
        "layer": FieldConf(pattern=SafePatterns.NAME),  # raw, intermediate, mart, etc
        "product": FieldConf(pattern=SafePatterns.NAME),  # orders, revenues, taxes, campaigns, etc
        "aggregation": FieldConf(pattern=SafePatterns.NAME),  # daily, weekly, monthly, etc
        "product_date": FieldConf(pattern=r"\d{8}", datetime_format="%Y%m%d"),  # product target date
        "run_datetime": FieldConf(pattern=r"\d{8}_\d{6}", datetime_format="%Y%m%d_%H%M%S"),  # generation datetime
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),  # csv, xls, parquet, etc
    },
)
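With a run datetime in the file name, several versions of the same product date can coexist, and a reader usually wants the most recent one. The sketch below picks the latest run per product date from a list of parsed field dictionaries. It assumes the date fields are available as strings in the formats shown above (whether andar returns strings or datetime objects is not stated here); in the `%Y%m%d_%H%M%S` format, string order matches chronological order. The helper name `latest_runs` and the sample records are hypothetical.

```python
def latest_runs(parsed_list: list[dict]) -> list[dict]:
    """Keep only the most recent run for each product and product date."""
    latest = {}
    for fields in parsed_list:
        key = (
            fields["domain"],
            fields["layer"],
            fields["product"],
            fields["aggregation"],
            fields["product_date"],
        )
        current = latest.get(key)
        # Zero-padded timestamps compare correctly as plain strings
        if current is None or fields["run_datetime"] > current["run_datetime"]:
            latest[key] = fields
    return list(latest.values())


base = {"domain": "sales", "layer": "mart", "product": "orders",
        "aggregation": "daily", "ext": "parquet"}
parsed = [
    {**base, "product_date": "20251201", "run_datetime": "20251202_010000"},
    {**base, "product_date": "20251201", "run_datetime": "20251203_010000"},
    {**base, "product_date": "20251202", "run_datetime": "20251203_010000"},
]
latest = latest_runs(parsed)
print([(f["product_date"], f["run_datetime"]) for f in latest])
```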
How to reorganize files and folders in a datalake
In this example, we will reorganize a flat file structure into a nested one. First, define the two PathModels, the old one and the new one:
from andar import FieldConf, PathModel, SafePatterns
old_flat_pm = PathModel(
    template="{base_path}/{category}_{name}_{date}.{ext}",
    fields={
        "base_path": FieldConf(pattern=SafePatterns.DIRPATH),
        "category": FieldConf(pattern=SafePatterns.NAME),
        "name": FieldConf(pattern=SafePatterns.FIELD),
        "date": FieldConf(pattern=r"\d{8}", datetime_format="%Y%m%d"),
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),
    },
)
# we can just update the template if the fields are the same
new_nested_pm = old_flat_pm.update(
    template="{base_path}/{category}/{date}/{name}.{ext}"
)
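Reusing the field definitions only works when both templates use exactly the same field names. A quick way to confirm that before migrating is to extract the placeholders from each template with the standard library. This is a sketch that does not rely on andar; the helper name `template_fields` is an assumption.

```python
import string


def template_fields(template: str) -> set[str]:
    """Return the set of placeholder names used in a str.format-style template."""
    return {
        field
        for _literal, field, _spec, _conversion in string.Formatter().parse(template)
        if field  # skips literal-only segments (None) and empty placeholders
    }


old_template = "{base_path}/{category}_{name}_{date}.{ext}"
new_template = "{base_path}/{category}/{date}/{name}.{ext}"

same_fields = template_fields(old_template) == template_fields(new_template)
print(sorted(template_fields(old_template)), same_fields)
```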
Example of creating files in a temporary directory using a flat structure with the old PathModel:
import pathlib
import tempfile
import datetime as dt
base_path = tempfile.mkdtemp()
start_date = dt.datetime(2025, 12, 1)
date_list = [start_date + dt.timedelta(days=d) for d in range(10)]
for date in date_list:
    file_path = old_flat_pm.get_path(
        base_path=base_path,
        category="sales",
        name="orders",
        date=date,
        ext="csv",
    )
    print(file_path)
    pathlib.Path(file_path).touch()  # create an empty file
Example of moving the files into the nested structure by parsing each path with the old PathModel and regenerating it with get_path of the new PathModel:
# First list existing files in target base path
search_folder = pathlib.Path(base_path)
path_list = [str(i) for i in search_folder.rglob("*") if i.is_file()]
for file_path in path_list:
    parsed_fields = old_flat_pm.parse_path(file_path)
    # As the fields are the same we can reuse them directly
    new_file_path = new_nested_pm.get_path(**parsed_fields)
    # create new parent directories
    pathlib.Path(new_file_path).parent.mkdir(parents=True, exist_ok=True)
    # move old file to new location using the new name
    pathlib.Path(file_path).replace(new_file_path)
The same strategy could be adapted to flatten a nested path structure using PathModels.
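Because moving files is hard to undo, it can help to compute the full list of moves first and check that no two sources map to the same target. The sketch below does this with stand-in parse and build functions written with `re` and `str.format`, so it runs without andar. The names `plan_moves`, `parse_flat`, and `build_nested`, the simplified regular expression, and the sample paths are all assumptions.

```python
import re
from collections import Counter
from typing import Callable, Iterable


def plan_moves(
    paths: Iterable[str],
    parse: Callable[[str], dict],
    build: Callable[..., str],
) -> list[tuple[str, str]]:
    """Return (source, target) pairs, refusing plans with colliding targets."""
    plan = [(src, build(**parse(src))) for src in paths]
    counts = Counter(dst for _src, dst in plan)
    duplicates = sorted(dst for dst, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"several sources map to the same target: {duplicates}")
    return plan


# Stand-ins for the old parser and the new generator (simplified patterns)
FLAT = re.compile(
    r"(?P<base_path>.+)/(?P<category>[a-z]+)_(?P<name>[a-z]+)_(?P<date>\d{8})\.(?P<ext>[a-z]+)"
)
NESTED = "{base_path}/{category}/{date}/{name}.{ext}"


def parse_flat(path: str) -> dict:
    match = FLAT.fullmatch(path)
    if match is None:
        raise ValueError(f"{path!r} does not follow the flat layout")
    return match.groupdict()


def build_nested(**fields: str) -> str:
    return NESTED.format(**fields)


moves = plan_moves(
    ["/lake/sales_orders_20251201.csv", "/lake/sales_orders_20251202.csv"],
    parse_flat,
    build_nested,
)
for src, dst in moves:
    print(src, "->", dst)
```

With the PathModels defined earlier, the same helper could presumably be called as `plan_moves(path_list, old_flat_pm.parse_path, new_nested_pm.get_path)`, and swapping the two models would give the plan for flattening a nested structure.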
Documentation
See the official documentation to learn more.