Skip to main content

Universal, extensible Python library for extracting structured information (groups, dates, times, custom patterns) from file names and paths.

Project description

file_path_parser

Universal, extensible Python library for extracting structured information (groups, dates, times, custom patterns) from file names and paths.

  • No hardcoded logic: you choose any number of groups (lists, enums, dicts, strings).
  • Automatic date and time search (many formats supported and validated).
  • Unlimited custom patterns: add your own regex groups.
  • Configurable priority: filename or path takes precedence.
  • Supports str and pathlib.Path.
  • Returns None if not found or not valid.

Installation

pip install file_path_parser

Supported Date and Time Formats

Date examples:

  • 20240622 (YYYYMMDD)
  • 2024-06-22 (YYYY-MM-DD)
  • 2024_06_22 (YYYY_MM_DD)
  • 22.06.2024 (DD.MM.YYYY)
  • 22-06-2024 (DD-MM-YYYY)
  • 220624 (YYMMDD)
  • 2024-6-2, 2024_6_2

Time examples:

  • 154212 (HHMMSS)
  • 1542 (HHMM)
  • 15-42-12 (HH-MM-SS)
  • 15_42_12 (HH_MM_SS)
  • 15-42, 15_42 (HH-MM, HH_MM)

All dates and times are validated. E.g. "20241341" is not a date; "246199" is not a time.


Usage Examples

1. Lists and Tuples as Groups

from file_path_parser import FilePathParser

animals = ["cat", "dog"]
shifts = ("night", "day")
departments = {"department": ["prod", "dev", "test"]}

parser = FilePathParser(
    animals,
    shifts,
    departments,
    date=True,
    time=True,
    patterns={"cam": r"cam\d{1,2}"}
)

result = parser.parse("cat_night_dev_cam08_20240622_1542.jpg")
print(result)
# {
#   "group1": "cat",
#   "group2": "night",
#   "department": "dev",
#   "date": "20240622",
#   "time": "1542",
#   "cam": "cam08"
# }

2. Enum as Groups

from file_path_parser import FilePathParser
from enum import Enum

class Shift(Enum):
    NIGHT = "night"
    DAY = "day"

class Animal(Enum):
    CAT = "cat"
    DOG = "dog"

parser = FilePathParser(
    Animal,
    Shift,
    date=True,
    time=True,
    patterns={"cam": r"cam\d{1,2}"}
)

result = parser.parse("dog_day_cam12_2024-06-23_1730.avi")
print(result)
# {
#   "animal": "dog",
#   "shift": "day",
#   "date": "2024-06-23",
#   "time": "1730",
#   "cam": "cam12"
# }

3. Dictionary as Group

from file_path_parser import FilePathParser

departments = {"department": ["it", "finance", "marketing"]}
levels = {"level": ("junior", "middle", "senior")}
flags = {"flag": "urgent"}

parser = FilePathParser(
    departments,
    levels,
    flags,
    date=True,
    patterns={"ticket": r"T\d{3,5}"}
)

result = parser.parse("finance_senior_urgent_T1004_20240601.txt")
print(result)
# {
#   "department": "finance",
#   "level": "senior",
#   "flag": "urgent",
#   "date": "20240601",
#   "ticket": "T1004"
# }

4. Mixed Groups: Enum, List, Custom Patterns, Date, and Time

from file_path_parser import FilePathParser
from enum import Enum

class Status(Enum):
    OPEN = "open"
    CLOSED = "closed"

parser = FilePathParser(
    Status,
    ["alpha", "beta"],
    date=True,
    time=True,
    patterns={"session": r"session\d+"}
)

result = parser.parse("beta_open_session27_2023-12-31_2359.txt")
print(result)
# {
#   "status": "open",
#   "group2": "beta",
#   "date": "2023-12-31",
#   "time": "2359",
#   "session": "session27"
# }

5. Only Custom Patterns and Date/Time

from file_path_parser import FilePathParser

parser = FilePathParser(
    date=True,
    time=True,
    patterns={"id": r"id\d+", "batch": r"batch\d{2,4}"}
)

result = parser.parse("id99_batch012_20240701_1430.log")
print(result)
# {
#   "date": "20240701",
#   "time": "1430",
#   "id": "id99",
#   "batch": "batch012"
# }

6. If both the path and filename contain a group or date, the value from the priority parameter wins.

from file_path_parser import FilePathParser

parser = FilePathParser(["prod", "test"], date=True, priority="filename")
# 'prod' есть в пути, 'test' — в имени файла
result = parser.parse("/data/prod/archive/test_20240620.csv")
print(result)
# Если priority="filename", group1 == "test"
# Если priority="path", group1 == "prod"

API Reference

class FilePathParser:
    def __init__(
        *groups: Any,        # Any number of lists, enums, dicts, or strings (group name auto-detected)
        date: bool = False,  # Extract date? (default: False)
        time: bool = False,  # Extract time? (default: False)
        separator: str = "_",
        priority: str = "filename", # or "path"
        patterns: dict = None,      # e.g. {"cam": r"cam\d+"}
    )

    def parse(self, full_path: Union[str, Path]) -> dict:
        """
        Returns a dict {group: value or None, ...}.
        """
  • Group name is auto-generated:
    • Enum: lowercase enum class name.
    • Dict: key as group name.
    • List/tuple/set: groupN (N = order of argument).
    • String: value as group name.
  • If group not found or invalid: returns None for that group.
  • Date and time always validated (returns None if not real date/time).

How It Works

  1. Splits filename and path into “blocks” (by _, -, ., /, etc).

  2. For each group, tries to find an exact match (for enums, lists, dicts).

  3. For date and time:

    • Matches all supported formats via regex.
    • Validates with datetime.strptime.
  4. For custom patterns:

    • Uses provided regex patterns.

If both path and filename have a group, the value from priority wins.


Notes

  • Group name in the result will be None if not found or not valid.
  • If both path and filename have the group, value from priority wins.
  • You can use any number of groups or patterns — no hard limit.

Contributing

Pull requests, bug reports and feature requests are welcome!


FAQ / Known Issues

Q: What happens if both the path and filename contain the same group, but with different values?

A: The result depends on the priority parameter:

  • If priority="filename" (default), the group value from the filename wins.
  • If priority="path", the value from the directory path wins.

Q: Can I use non-Latin or Unicode characters in group values?

A: Yes. Groups and blocks are matched in a case-insensitive way and support Unicode.

Q: What separators does the parser recognize between blocks?

A: By default, the parser splits by any of these: _, -, ., /, \, {}, or space.
If your files use custom separators, let us know!

Q: What if a value looks like a date/time, but is not real?

A: The parser validates all dates/times. "20241341" (wrong month/day) will not be recognized as a date, etc.

Known Issues

  • If your separator is unusual (not in the list above), you may need to pre-process filenames.
  • Extremely exotic date/time formats (not listed in "Supported formats") are not matched.
  • Path parsing supports both str and pathlib.Path, but network/multiplatform paths (e.g., UNC, SMB) are not specifically tested.

License

MIT


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

file_path_parser-0.1.0.tar.gz (8.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

file_path_parser-0.1.0-py3-none-any.whl (9.2 kB view details)

Uploaded Python 3

File details

Details for the file file_path_parser-0.1.0.tar.gz.

File metadata

  • Download URL: file_path_parser-0.1.0.tar.gz
  • Upload date:
  • Size: 8.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.11

File hashes

Hashes for file_path_parser-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7970ea3103867528c909b2b8086d8b7483eb253c74fa6938556c196a1c652899
MD5 b7e2e5b28c387e86bc3fbca1dc01b981
BLAKE2b-256 c76f3e2a9743bc6a548f714a3ebe48d996b12e40033efbfad3486f604ed0e0b5

See more details on using hashes here.

File details

Details for the file file_path_parser-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for file_path_parser-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d4e90d8d64c49d77de009588037e3def9f1085d96b03c03597040f61e31ecd5b
MD5 21f8415911afea14f7dc7e64a8f37541
BLAKE2b-256 821c76ceb98e57ca815128dcbf6cebb3eee9af3dba42a38047cb6a45fbf01324

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page